Big Data Analytics Framework for Real-Time Genome Analysis: A Comprehensive Approach

2019 ◽  
Vol 16 (8) ◽  
pp. 3419-3427
Author(s):  
Shishir K. Shandilya ◽  
S. Sountharrajan ◽  
Smita Shandilya ◽  
E. Suganya

Big Data technologies have become well accepted in recent years in biomedical and genome informatics. They are capable of processing gigantic, heterogeneous genome information with good precision and recall. With the rapid advancement of computation and storage technologies, the cost of acquiring and processing genomic data has decreased significantly. Upcoming sequencing platforms will produce vast amounts of data, which will imperatively require high-performance systems for on-demand analysis with time-bound efficiency. Recent bioinformatics tools are able to exploit the novel features of Hadoop in a more flexible way. In particular, big data technologies such as MapReduce and Hive provide a high-speed computational environment for the analysis of petabyte-scale datasets, which has drawn the attention of bio-scientists toward using big data applications to automate the entire genome analysis. The proposed framework is built on MapReduce and Java on an extended Hadoop platform to achieve parallel Big Data analysis. It assists the bioinformatics community by providing a comprehensive solution for Descriptive, Comparative, Exploratory, Inferential, Predictive and Causal Analysis of genome data. The proposed framework is user-friendly, fully customizable, scalable and suited to comprehensive real-time genome analysis, from data acquisition to predictive sequence analysis.
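
The abstract does not include the framework's code; purely as orientation, the sketch below shows what one MapReduce stage of such a pipeline might look like, written as a Hadoop Streaming job in Python rather than the authors' Java implementation. The k-mer length, input layout and launch command are assumptions, not taken from the paper.

```python
#!/usr/bin/env python3
"""Illustrative Hadoop Streaming job: count k-mers in sequencing reads.

A minimal sketch only -- not the framework proposed in the paper. It assumes
reads arrive one sequence per line on stdin (e.g. pre-extracted from
FASTA/FASTQ) and that the script is shipped to the cluster via Hadoop Streaming.
"""
import sys

K = 21  # assumed k-mer length


def mapper():
    # Map phase: emit (k-mer, 1) for every k-mer in every read.
    for line in sys.stdin:
        read = line.strip().upper()
        if not read or read.startswith((">", "@")):
            continue  # skip FASTA/FASTQ header lines
        for i in range(len(read) - K + 1):
            print(f"{read[i:i + K]}\t1")


def reducer():
    # Reduce phase: Hadoop delivers keys sorted, so sum consecutive runs.
    current, count = None, 0
    for line in sys.stdin:
        kmer, _, n = line.rstrip("\n").partition("\t")
        if kmer != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = kmer, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    # Run as "kmer_count.py map" for the mapper and "kmer_count.py reduce"
    # for the reducer when passed to hadoop-streaming via -mapper/-reducer.
    reducer() if len(sys.argv) > 1 and sys.argv[1] == "reduce" else mapper()
```

Under these assumptions the job would be submitted with something like `hadoop jar hadoop-streaming.jar -files kmer_count.py -mapper "kmer_count.py map" -reducer "kmer_count.py reduce" -input reads/ -output kmer_counts/`, and a Hive table could then be defined over the output directory for downstream comparative queries.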

Author(s):  
Armando Fandango ◽  
William Rivera

Scientific Big Data being gathered at exascale needs to be stored, retrieved and manipulated. The storage stack for scientific Big Data includes a file system at the system level for physical organization of the data, and a file format and input/output (I/O) system at the application level for logical organization of the data, both of a high-performance variety suited to exascale. The high-performance file system is designed for concurrent access, high-speed transmission and fault tolerance. High-performance file formats and I/O are designed to give parallel and distributed applications easy and fast access to Big Data. These specialized file formats make it easier to store and access Big Data for scientific visualization and predictive analytics. This chapter provides a brief review of the characteristics of high-performance file systems such as Lustre and GPFS, and high-performance file formats and I/O such as HDF5, NetCDF, MPI-IO, and HDFS.
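
To make the application-level layer concrete, here is a minimal Python sketch using the h5py binding to HDF5; the file name, dataset layout and chunk shape are illustrative assumptions, not taken from the chapter.

```python
# Minimal HDF5 sketch with h5py (names and sizes are illustrative only).
import numpy as np
import h5py

data = np.random.rand(1_000_000, 8)  # stand-in for a large scientific array

# Write: chunking and gzip compression are what make later partial,
# parallel-friendly reads of very large datasets practical.
with h5py.File("simulation.h5", "w") as f:
    dset = f.create_dataset(
        "timestep_0/field",
        data=data,
        chunks=(10_000, 8),      # chunk shape chosen to match the access pattern
        compression="gzip",
    )
    dset.attrs["units"] = "kelvin"  # self-describing metadata travels with the data

# Read back only a slice -- the library fetches just the chunks it needs.
with h5py.File("simulation.h5", "r") as f:
    window = f["timestep_0/field"][500_000:500_100, :]
    print(window.shape, f["timestep_0/field"].attrs["units"])
```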


In the current scenario, huge amounts of data are being generated at high speed from heterogeneous sources such as social networks, business applications, the government sector, marketing, health-care systems, sensors and machine log data. Big Data has therefore been chosen as one of the upcoming areas of research by several industries. In this paper, the author presents a wide collection of literature that has been reviewed and analyzed. The paper emphasizes Big Data technologies, applications and challenges, and presents a comparative study of the architectures, methodologies, tools and survey results proposed by various researchers.


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2322
Author(s):  
Xiaofei Ma ◽  
Xuan Liu ◽  
Xinxing Li ◽  
Yunfei Ma

With the rapid development of the Internet of Things (IoT), big data analytics has been widely used in the sports field. In this paper, a lightweight, self-powered sensor based on a triboelectric nanogenerator (TENG) for big data analytics in sports is demonstrated. The weight of each sensing unit is ~0.4 g, and the friction material consists of polyaniline (PANI) and polytetrafluoroethylene (PTFE). Based on the TENG, the device converts small amounts of mechanical energy into electrical signals that contain information about the hitting position and hitting velocity of table tennis balls. By collecting data from daily table tennis training in real time, a personalized training program can be adjusted. A practical application is exhibited in which table tennis information is collected in real time; based on these data, coaches can develop personalized training for amateurs to enhance their hand control and improve their table tennis skills. This work opens up a new direction in intelligent athletic facilities and big data analytics.
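
The paper does not describe its signal-processing pipeline, so the following Python sketch is purely illustrative of how a TENG voltage trace could be reduced to hit events: the sampling rate, detection threshold and the use of peak amplitude as a velocity proxy are all assumptions made for this sketch.

```python
# Illustrative only: reduce a (simulated) TENG voltage trace to hit events.
# Sampling rate, threshold and the amplitude-as-velocity-proxy are assumptions.
import numpy as np
from scipy.signal import find_peaks

FS = 10_000  # assumed sampling rate in Hz

# Simulate a 2-second trace with three hits of increasing strength.
t = np.arange(0, 2.0, 1 / FS)
trace = 0.02 * np.random.randn(t.size)
for hit_time, amp in [(0.4, 0.8), (1.0, 1.5), (1.6, 2.3)]:
    trace += amp * np.exp(-((t - hit_time) ** 2) / (2 * 0.002 ** 2))

# Detect hits as voltage peaks above a noise threshold, at most one per 100 ms.
peaks, props = find_peaks(trace, height=0.3, distance=int(0.1 * FS))

for idx, height in zip(peaks, props["peak_heights"]):
    # Peak amplitude is used here as a crude proxy for hitting velocity;
    # a real system would calibrate amplitude against measured ball speed.
    print(f"hit at t={t[idx]:.3f} s, peak voltage {height:.2f} V")
```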


Author(s):  
Lidong Wang

Visualization with graphs is popular in the data analysis of Information Technology (IT) networks, or computer networks. An IT network is often modelled as a graph, with hosts as nodes and traffic flows as edges. General visualization methods are introduced in this paper, and applications and technology progress of visualization in IT network analysis, as well as big data in IT network visualization, are presented. The challenges of visualization and Big Data analytics in IT network visualization are also discussed. Big Data analytics with High Performance Computing (HPC) techniques, especially Graphics Processing Units (GPUs), helps accelerate IT network analysis and visualization.
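
As a small illustration of the hosts-as-nodes, flows-as-edges model, the Python sketch below builds and draws a traffic graph with networkx and matplotlib; the flow records are invented, and a production system at big data scale would offload layout and rendering to GPU-accelerated libraries along the lines the paper discusses.

```python
# Minimal sketch of the "hosts as nodes, traffic flows as edges" model
# using networkx; the flow records below are invented for illustration.
import networkx as nx
import matplotlib.pyplot as plt

# (source host, destination host, bytes transferred)
flows = [
    ("10.0.0.1", "10.0.0.7", 12_500),
    ("10.0.0.2", "10.0.0.7", 480_000),
    ("10.0.0.7", "192.168.1.5", 3_200_000),
    ("10.0.0.3", "10.0.0.1", 9_100),
]

G = nx.DiGraph()
for src, dst, nbytes in flows:
    # Accumulate traffic volume on the edge if the same flow repeats.
    if G.has_edge(src, dst):
        G[src][dst]["bytes"] += nbytes
    else:
        G.add_edge(src, dst, bytes=nbytes)

# Scale edge width by traffic volume so heavy talkers stand out visually.
widths = [0.5 + G[u][v]["bytes"] / 1_000_000 for u, v in G.edges]
pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos, width=widths, node_color="lightsteelblue")
plt.axis("off")
plt.savefig("network_flows.png", dpi=150)
```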


2020 ◽  
Author(s):  
Marcus H. Hansen ◽  
Anita T. Simonsen ◽  
Hans B. Ommen ◽  
Charlotte G. Nyvold

Background: Rapid and practical DNA-sequencing processing has become essential for modern biomedical laboratories, especially in the fields of cancer, pathology and genetics. While sequencing turnaround time has been, and still is, a bottleneck in research and diagnostics, the field of bioinformatics is moving at a rapid pace, both in terms of hardware and software development. Here, we benchmarked the local performance of three of the most important Spark-enabled Genome Analysis Toolkit 4 (GATK4) tools in a targeted sequencing workflow: duplicate marking, base quality score recalibration (BQSR) and variant calling on targeted DNA sequencing, using a modest hyperthreaded 12-core single CPU and a high-speed PCI Express solid-state drive.

Results: Compared to the previous GATK version, the Spark-enabled BQSR and HaplotypeCaller make more efficient use of the available CPU cores and outperform the earlier GATK3.8 version with an order-of-magnitude reduction in processing time to analysis-ready variants, whereas MarkDuplicatesSpark was found to be three times as fast. Furthermore, HaplotypeCallerSpark and BQSRPipelineSpark were significantly faster than the equivalent GATK4 standard tools, with a combined ∼86% reduction in execution time, reaching a median rate of ten million processed bases per second, and duplicate-marking time was reduced by ∼42%. The called variants were in close agreement between the Spark and non-Spark versions, with an overall concordance of 98%. In this setup, the tools were also highly efficient when compared to execution on a small 72-virtual-CPU/18-node Google Cloud cluster.

Conclusion: GATK4 offers practical parallelization possibilities for DNA sequence processing, and the Spark-enabled tools optimize performance and utilization of local CPUs. Spark-based GATK variant calling is several times faster than the previous GATK3.8 multithreading on the same multi-core, single-CPU configuration. The improved opportunities for parallel computation have implications not only for high-performance clusters, but also for modest laboratory or research workstations performing targeted sequencing analysis, such as exome, panel or amplicon sequencing.
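
For readers who want to reproduce a similar local benchmark, the sketch below chains the three Spark-enabled steps from Python; the file names are placeholders, and the exact flag spelling and the `--spark-master` option should be checked against the GATK4 documentation for the installed version.

```python
# Hedged sketch of the benchmarked workflow (duplicate marking -> BQSR ->
# variant calling) driven from Python. File names are placeholders and the
# exact GATK4 flag spelling should be verified against the installed version;
# Spark options after "--" run the tools on local cores rather than a cluster.
import subprocess

SPARK_LOCAL = ["--", "--spark-master", "local[12]"]  # assumed 12-core CPU


def run(args):
    print(" ".join(args))
    subprocess.run(args, check=True)


run(["gatk", "MarkDuplicatesSpark",
     "-I", "sample.sorted.bam",
     "-O", "sample.md.bam",
     *SPARK_LOCAL])

run(["gatk", "BQSRPipelineSpark",
     "-I", "sample.md.bam",
     "-R", "reference.fasta",
     "--known-sites", "known_variants.vcf.gz",
     "-O", "sample.bqsr.bam",
     *SPARK_LOCAL])

run(["gatk", "HaplotypeCallerSpark",
     "-I", "sample.bqsr.bam",
     "-R", "reference.fasta",
     "-O", "sample.vcf.gz",
     *SPARK_LOCAL])
```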


Author(s):  
Amir A. Khwaja

The big data explosion has already happened, and the situation is only going to worsen with such a high number of data sources and high-end technology prevalent everywhere, generating data at a frantic pace. One of the most important aspects of big data is being able to capture, process, and analyze data as it is happening, in real time, to allow real-time business decisions. Alternative approaches must be investigated, especially ones based on highly parallel and real-time computation for big data processing. The chapter presents RealSpec, a real-time specification language that may be used for modeling big data analytics owing to inherent language features needed for real-time big data processing, such as concurrent processes, multi-threading, resource modeling, timing constraints, and exception handling. The chapter provides an overview of RealSpec and applies the language to a detailed big data event-recognition case study to demonstrate its applicability to big data framework and analytics modeling.
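
RealSpec's own syntax is not reproduced in this summary, so the sketch below is deliberately not RealSpec: it is a plain Python illustration of the ingredients the chapter highlights, namely concurrent processing, a per-event timing constraint and exception handling, applied to a toy event-recognition stream.

```python
# Not RealSpec: a plain-Python illustration of the ingredients the chapter
# lists for real-time big data processing -- concurrent workers, a timing
# constraint (deadline) per event, and explicit exception handling.
import queue
import threading
import time

DEADLINE_S = 0.050  # assumed 50 ms budget per event

events = queue.Queue()


def recognize(event):
    # Stand-in for the event-recognition analytic.
    return "alert" if event["value"] > 0.9 else "ok"


def worker():
    while True:
        event = events.get()
        if event is None:  # shutdown sentinel
            break
        start = time.monotonic()
        try:
            label = recognize(event)
            elapsed = time.monotonic() - start
            if elapsed > DEADLINE_S:
                raise TimeoutError(f"deadline missed by {elapsed - DEADLINE_S:.3f} s")
            print(f"event {event['id']}: {label}")
        except Exception as exc:  # exception handling keeps the stream alive
            print(f"event {event['id']} dropped: {exc}")
        finally:
            events.task_done()


threads = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for t in threads:
    t.start()

for i in range(10):  # toy event stream
    events.put({"id": i, "value": i / 9})
events.join()
for _ in threads:
    events.put(None)  # tell each worker to exit
```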

