Post-processing and visualization of large-scale DEM simulation data with the open-source VELaSSCo platform

Regardless of its origin, in the near future the challenge will not be how to generate data, but rather how to manage big and highly distributed data to make it more easily handled and more accessible by users on their personal devices. VELaSSCo (Visualization for Extremely Large-Scale Scientific Computing) is a platform developed to provide new visual analysis methods for large-scale simulations serving the petabyte era. The platform adopts Big Data tools/architectures to enable in-situ processing for analytics of engineering and scientific data and hardware-accelerated interactive visualization. In large-scale simulations, the domain is partitioned across several thousand nodes, and the data (mesh and results) are stored on those nodes in a distributed manner. The VELaSSCo platform accesses this distributed information, processes the raw data, and returns the results to the users for local visualization by their specific visualization clients and tools. The global goal of VELaSSCo is to provide Big Data tools for the engineering and scientific community, in order to better manipulate simulations with billions of distributed records. The ability to easily handle large amounts of data will also enable larger, higher resolution simulations, which will allow the scientific and engineering communities to garner new knowledge from simulations previously considered too large to handle. This paper shows, by means of selected Discrete Element Method (DEM) simulation use cases, that the VELaSSCo platform facilitates distributed post-processing and visualization of large engineering datasets.

Download Full-text

Distributed Data Processing for Large-Scale Simulations on Cloud

10.1109/emc/si/pi/emceurope52599.2021.9559316 ◽

2021 ◽

Author(s):

Tianjian Lu ◽

Stephan Hoyer ◽

Qing Wang ◽

Lily Hu ◽

Yi-Fan Chen

Keyword(s):

Data Processing ◽

Large Scale ◽

Distributed Data ◽

Distributed Data Processing ◽

Large Scale Simulations

Download Full-text

Large-Scale Simulations Manager Tool for OMNeT++: Expediting Simulations and Post-Processing Analysis

IEEE Access ◽

10.1109/access.2020.3020745 ◽

2020 ◽

Vol 8 ◽

pp. 159291-159306

Author(s):

Pablo Andres Barbecho Bautista ◽

Luis Felipe Urquiza-Aguiar ◽

Leticia Lemus Cardenas ◽

Monica Aguilar Igartua

Keyword(s):

Large Scale ◽

Post Processing ◽

Large Scale Simulations

Download Full-text

A Machine Learning Approach for the Discovery of Ligand-Specific Functional Mechanisms of GPCRs

Molecules ◽

10.3390/molecules24112097 ◽

2019 ◽

Vol 24 (11) ◽

pp. 2097 ◽

Cited By ~ 10

Author(s):

Ambrose Plante ◽

Derek M. Shore ◽

Giulia Morra ◽

George Khelashvili ◽

Harel Weinstein

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Molecular Mechanisms ◽

Md Simulations ◽

Machine Learning Approach ◽

And Function ◽

Functional Mechanisms ◽

G Protein Coupled ◽

Large Scale Simulations

G protein-coupled receptors (GPCRs) play a key role in many cellular signaling mechanisms, and must select among multiple coupling possibilities in a ligand-specific manner in order to carry out a myriad of functions in diverse cellular contexts. Much has been learned about the molecular mechanisms of ligand-GPCR complexes from Molecular Dynamics (MD) simulations. However, to explore ligand-specific differences in the response of a GPCR to diverse ligands, as is required to understand ligand bias and functional selectivity, necessitates creating very large amounts of data from the needed large-scale simulations. This becomes a Big Data problem for the high dimensionality analysis of the accumulated trajectories. Here we describe a new machine learning (ML) approach to the problem that is based on transforming the analysis of GPCR function-related, ligand-specific differences encoded in the MD simulation trajectories into a representation recognizable by state-of-the-art deep learning object recognition technology. We illustrate this method by applying it to recognize the pharmacological classification of ligands bound to the 5-HT2A and D2 subtypes of class-A GPCRs from the serotonin and dopamine families. The ML-based approach is shown to perform the classification task with high accuracy, and we identify the molecular determinants of the classifications in the context of GPCR structure and function. This study builds a framework for the efficient computational analysis of MD Big Data collected for the purpose of understanding ligand-specific GPCR activity.

Download Full-text

Combining Strengths for Multi-genome Visual Analytics Comparison

Bioinformatics and Biology Insights ◽

10.1177/1177932218825127 ◽

2019 ◽

Vol 13 ◽

pp. 117793221882512 ◽

Cited By ~ 1

Author(s):

Sergio Diaz-del-Pino ◽

Pablo Rodriguez-Brazzarola ◽

Esteban Perez-Wohlfeil ◽

Oswaldo Trelles

Keyword(s):

Data Analysis ◽

Data Acquisition ◽

Visual Analytics ◽

Large Scale ◽

Visual Analysis ◽

Homo Sapiens ◽

Third Party ◽

Genome Comparison ◽

Post Processing ◽

Sequence Comparisons

The eclosion of data acquisition technologies has shifted the bottleneck in molecular biology research from data acquisition to data analysis. Such is the case in Comparative Genomics, where sequence analysis has transitioned from genes to genomes of several orders of magnitude larger. This fact has revealed the need to adapt software to work with huge experiments efficiently and to incorporate new data-analysis strategies to manage results from such studies. In previous works, we presented GECKO, a software to compare large sequences; now we address the representation, browsing, data exploration, and post-processing of the massive amount of information derived from such comparisons. GECKO-MGV is a web-based application organized as client-server architecture. It is aimed at visual analysis of the results from both pairwise and multiple sequences comparison studies combining a set of common commands for image exploration with improved state-of-the-art solutions. In addition, GECKO-MGV integrates different visualization analysis tools while exploiting the concept of layers to display multiple genome comparison datasets. Moreover, the software is endowed with capabilities for contacting external-proprietary and third-party services for further data post-processing and also presents a method to display a timeline of large-scale evolutionary events. As proof-of-concept, we present 2 exercises using bacterial and mammalian genomes which depict the capabilities of GECKO-MGV to perform in-depth, customizable analyses on the fly using web technologies. The first exercise is mainly descriptive and is carried out over bacterial genomes, whereas the second one aims to show the ability to deal with large sequence comparisons. In this case, we display results from the comparison of the first Homo sapiens chromosome against the first 5 chromosomes of Mus musculus.

Download Full-text

Big data and extreme-scale computing

The International Journal of High Performance Computing Applications ◽

10.1177/1094342018778123 ◽

2018 ◽

Vol 32 (4) ◽

pp. 435-479 ◽

Cited By ~ 22

Author(s):

M Asch ◽

T Moore ◽

R Badia ◽

M Beck ◽

P Beckman ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Large Scale ◽

Digital Data ◽

Distributed Sensors ◽

Large Scale Data ◽

Critical Problems ◽

Extreme Scale ◽

Shared Infrastructure ◽

Large Scale Simulations

Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments and other devices at the networks edge, and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm that combines both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.

Download Full-text

IoT Big Data provenance scheme using blockchain on Hadoop ecosystem

Journal Of Big Data ◽

10.1186/s40537-021-00505-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Houshyar Honar Pajooh ◽

Mohammed A. Rashid ◽

Fakhrul Alam ◽

Serge Demidenko

Keyword(s):

Big Data ◽

Data Storage ◽

Large Scale ◽

Data System ◽

Third Party ◽

Data Provenance ◽

Distributed Data ◽

Distributed Data Storage ◽

Blockchain Technology ◽

Iot Devices

AbstractThe diversity and sheer increase in the number of connected Internet of Things (IoT) devices have brought significant concerns associated with storing and protecting a large volume of IoT data. Storage volume requirements and computational costs are continuously rising in the conventional cloud-centric IoT structures. Besides, dependencies of the centralized server solution impose significant trust issues and make it vulnerable to security risks. In this paper, a layer-based distributed data storage design and implementation of a blockchain-enabled large-scale IoT system are proposed. It has been developed to mitigate the above-mentioned challenges by using the Hyperledger Fabric (HLF) platform for distributed ledger solutions. The need for a centralized server and a third-party auditor was eliminated by leveraging HLF peers performing transaction verifications and records audits in a big data system with the help of blockchain technology. The HLF blockchain facilitates storing the lightweight verification tags on the blockchain ledger. In contrast, the actual metadata are stored in the off-chain big data system to reduce the communication overheads and enhance data integrity. Additionally, a prototype has been implemented on embedded hardware showing the feasibility of deploying the proposed solution in IoT edge computing and big data ecosystems. Finally, experiments have been conducted to evaluate the performance of the proposed scheme in terms of its throughput, latency, communication, and computation costs. The obtained results have indicated the feasibility of the proposed solution to retrieve and store the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain. The experimental results show the throughput of about 600 transactions, 500 ms average response time, about 2–3% of the CPU consumption at the peer process and approximately 10–20% at the client node. The minimum latency remained below 1 s however, there is an increase in the maximum latency when the sending rate reached around 200 transactions per second (TPS).

Download Full-text

Survey on Scientific Data Visualization for Large-scale Simulations

JAMSTEC Report of Research and Development ◽

10.5918/jamstecr.13.35 ◽

2011 ◽

Vol 13 ◽

pp. 35-63

Author(s):

Daisuke Matsuoka ◽

Fumiaki Araki

Keyword(s):

Data Visualization ◽

Large Scale ◽

Scientific Data ◽

Scientific Data Visualization ◽

Large Scale Simulations

Download Full-text

An algorithm to generate high dense packing of particles with various shapes

MATEC Web of Conferences ◽

10.1051/matecconf/201821905004 ◽

2018 ◽

Vol 219 ◽

pp. 05004 ◽

Cited By ~ 1

Author(s):

Konrad Miśkiewicz ◽

Robert Banasiak ◽

Maciej Niedostatkiewicz ◽

Krzysztof Grudzień ◽

Laurent Babout

Keyword(s):

Large Scale ◽

Bulk Material ◽

Dense Packing ◽

Major Research ◽

Dem Simulation ◽

Loose Packing ◽

Mixed Material ◽

Large Scale Simulations ◽

Initial Packing ◽

Simulation Time

Discrete Element Method (DEM) is one of available numerical methods to compute movement of particles in large scale simulations. The method has been frequently applied to simulate the cases of grain or bulk material as the major research issue. The paper describes a new method of generating high dense packing with mixed material of two different shape used in DEM simulation. The initial packing is an important parameter to control, because have influence on the first few seconds after start the simulation. Some-times when the material in silo is arranged with loose packing before the start, the particles move downward gravity. These changes between the start and the first few seconds in simulations act strongly on the results at the end of a discharging process in silo. At the initial simulation time it is important to prepare proper packing with mixed material, in order to make sure that the particles will not move due to gravity action. This solution is a necessary step to integrate in the simulation procedure in order to compare later the computer simulation with experimental measurements of material discharge in a silo.

Download Full-text

Big Data Framework for storage Extraction and Identification of Data using Hadoop Distributed File system

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1002.1292s319 ◽

2019 ◽

Vol 9 (2S3) ◽

pp. 392-394

Keyword(s):

Big Data ◽

Data Storage ◽

Open Source Software ◽

Large Scale ◽

Mixed Data ◽

Software Framework ◽

Distributed Data ◽

Distributed Data Storage ◽

Data Framework ◽

Hadoop Distributed File System

Big data is all about the developing challenge that associations face in today’s world, As they manage enormous and quickly developing wellsprings of information or data, with the complex range of analysis and the problem includes computing infrastructure, accessing mixed data both structured and unstructured data from various sources such as networking, Recording and stored images. Hadoop is the open source software framework includes no of compartments that are specifically designed for solving large-scale distributed data storage. MapReduce is a parallel programming design for processing

Download Full-text

Ingenious Techniques for Creation of Smart Cities by Big Data Technology & Urban Modelling Simulation by Matsim

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1856.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 1922-1927

Keyword(s):

Big Data ◽

Smart City ◽

Urban Areas ◽

Large Scale ◽

Smart Cities ◽

Distributed Data ◽

Spatially Distributed ◽

Urban Modelling ◽

Different Sources ◽

Big Data Technology

Ingenious Techniques for creation of Smart Cities by Big Data Technology & Urban modeling simulation by MATSimas the smart cities are on nascent stage in India. The extension of huge information and the advancement of Internet of Things (IoT) innovations have assumed a significant job in the practicality of keen city activities. Enormous information offer the potential for urban areas to get significant bits of knowledge from a lot of information gathered through different sources, and the IoT permits the joining of sensors, radiofrequency recognizable proof, and Bluetooth in reality condition utilizing exceedingly organized administrations. Thus the job of urban reenactment models and their perception are utilized to help territorial arranging offices assess elective transportation ventures, land use guidelines, and natural insurance arrangements. Typical urban simulations provide spatially distributed data about number of inhabitants, land prices, traffic, and other variables for ex- MATSim is an activity-based transport simulation framework designed to simulate large scale scenarios. Such technologies which have been developed in the past few years have proven to be very effective in smart cities of various countries. This project is an attempt to study the feasibility of such modified system, by understanding the implementation of such technologies to improve the existing smart cities and those which are about to become one. This is done by proposing an idea that is by implementing a big data server in the proposed smart city, the data will be collected through smart sensors which will then be sent to server and the mined data will be converted to simplified data for planners, engineers etc. in order to make a economic, self-sustainable & fully automated smart city

Download Full-text