Referential DNA Data Compression using Hadoop Map Reduce Framework

Raju Bhukya; Sumit Deshmuk

doi:10.34028/iajit/17/2/8

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/2/8 ◽

2019 ◽

Vol 17 (2) ◽

pp. 207-214

Author(s):

Raju Bhukya ◽

Sumit Deshmuk

Keyword(s):

Data Compression ◽

DNA Sequences ◽

High Performance ◽

Deoxyribonucleic Acid ◽

Genetic Data ◽

Disk Arrays ◽

Distributed Environment ◽

MapReduce Framework ◽

Hadoop MapReduce ◽

Time Required

The indispensable knowledge of Deoxyribonucleic Acid (DNA) sequences and the sharply falling cost of DNA sequencing techniques have attracted numerous researchers to the field of genetics. These sequences are becoming available at an exponential rate, swelling the size of molecular biology databases and making large disk arrays and compute clusters inevitable for analysis. In this paper, we propose referential DNA data compression using the Hadoop MapReduce framework to process very large amounts of genetic data in a distributed environment on high-performance compute clusters. Our method achieves a better balance between compression ratio and the time required for DNA data compression than other referential DNA data compression methods.
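The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea behind referential compression split into map and reduce steps, not the authors' method. It assumes a greedy k-mer matcher; the names `build_index`, `compress`, `decompress`, and `compress_in_chunks` are hypothetical, and the map and reduce phases are simulated in plain Python rather than run on Hadoop.

```python
from typing import Dict, List, Tuple, Union

# A token is either a (reference_position, length) match or one literal base.
Token = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Token]:
    """Greedily encode the target as matches against the reference plus literals."""
    index = build_index(reference, k)
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for pos in index.get(target[i:i + k], []):
            length = 0
            # Extend the match for as long as both sequences agree.
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(target[i])  # no usable match: store the base itself
            i += 1
    return tokens


def decompress(tokens: List[Token], reference: str) -> str:
    """Rebuild the target from tokens and the same reference."""
    parts = []
    for token in tokens:
        if isinstance(token, tuple):
            pos, length = token
            parts.append(reference[pos:pos + length])
        else:
            parts.append(token)
    return "".join(parts)


def compress_in_chunks(target: str, reference: str,
                       chunk_size: int, k: int = 4) -> List[Token]:
    """Simulated map/reduce: compress chunks independently, then merge in order."""
    chunks = [target[s:s + chunk_size] for s in range(0, len(target), chunk_size)]
    mapped = [(idx, compress(chunk, reference, k))      # "map" phase
              for idx, chunk in enumerate(chunks)]
    merged: List[Token] = []
    for _, chunk_tokens in sorted(mapped, key=lambda pair: pair[0]):  # "reduce"
        merged.extend(chunk_tokens)
    return merged


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"
tokens = compress(target, reference)
restored = decompress(tokens, reference)
chunked = compress_in_chunks(target, reference, chunk_size=5)
```

Because each chunk is encoded independently against the shared reference, the chunks can be processed in parallel. The cost is that matches cannot span chunk boundaries, which is one way a trade-off between compression ratio and running time can arise.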

High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2017.01.08 ◽

2017 ◽

Vol 9 (1) ◽

pp. 75-84 ◽

Cited By ~ 1

Author(s):

Guru Prasad M S ◽

Nagesh H R ◽

Swathi Prabhu

Keyword(s):

Big Data ◽

Performance Optimization ◽

High Performance ◽

Optimization Approach ◽

MapReduce Framework ◽

Transaction Data ◽

Hadoop MapReduce ◽

Frequent Item ◽

Mining Algorithm ◽

High Performance Computation

Optimizing the Hadoop MapReduce Framework with high-performance storage devices

The Journal of Supercomputing ◽

10.1007/s11227-015-1447-3 ◽

2015 ◽

Vol 71 (9) ◽

pp. 3525-3548 ◽

Cited By ~ 20

Author(s):

Sangwhan Moon ◽

Jaehwan Lee ◽

Xiling Sun ◽

Yang-suk Kee

Keyword(s):

High Performance ◽

MapReduce Framework ◽

Storage Devices ◽

Hadoop MapReduce

Denaturing High Performance Liquid Chromatography and Bioinformatics - Two Modern Tools for Extracellular Superoxide Dismutase (SOD3) Gene Promoter Analysis

Revista de Chimie ◽

10.37358/rc.08.7.1893 ◽

2008 ◽

Vol 59 (7) ◽

Author(s):

Corina Samoila ◽

Alfa Xenia Lupea ◽

Andrei Anghel ◽

Marilena Motoc ◽

Gabriela Otiman ◽

...

Keyword(s):

High Performance Liquid Chromatography ◽

Transcription Factors ◽

Liquid Chromatography ◽

DNA Sequences ◽

High Performance ◽

Zinc Finger Protein ◽

High Capacity ◽

Gene Promoter ◽

Experimental Approaches ◽

Myeloid Zinc Finger

Denaturing High Performance Liquid Chromatography (DHPLC) is a relatively new method for screening DNA sequences, characterized by a high capacity to detect mutations and polymorphisms. This study focuses on Transgenomic WAVE™ DNA Fragment Analysis (based on the DHPLC separation method) of a 485 bp fragment from the human EC-SOD gene promoter in order to detect single nucleotide polymorphisms (SNPs) associated with atherosclerosis and risk factors of cardiovascular disease. The fragment of interest was amplified by PCR and analyzed by DHPLC in 100 healthy subjects and 70 patients characterized by atheroma. No different melting profiles were detected for the analyzed DNA samples. A combination of computational methods was used to predict putative transcription factors in the fragment of interest. Several putative transcription factor binding sites were identified in the evolutionarily conserved regions. From the Ets-1 oncogene family: ETS member Elk-1, polyomavirus enhancer activator-3 (PEA3), protein C-Ets-1 (Ets-1), GA binding protein (GABP), and Spi-1 and Spi-B/PU.1 related transcription factors. From the Krueppel-like family: Gut-enriched Krueppel-like factor (GKLF), Erythroid Krueppel-like factor (EKLF), and Basic Krueppel-like factor (BKLF). Also identified were the GC box and myeloid zinc finger protein MZF-1. The bioinformatics results need to be investigated further in other studies by experimental approaches.

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
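To make the term "Pareto-optimal trade-off" concrete, here is a minimal sketch that filters a set of candidate runs down to those not dominated in both runtime and energy. It does not reproduce the statistical or machine learning models from the paper; the function name `pareto_front` and the sample numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each run is (runtime_seconds, energy_joules); lower is better for both.
Run = Tuple[float, float]


def pareto_front(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run beats or equals on both metrics."""
    front: List[Run] = []
    for i, (t_i, e_i) in enumerate(runs):
        dominated = any(
            # Another run dominates if it is no worse on both metrics
            # and strictly better on at least one.
            (t_j <= t_i and e_j <= e_i) and (t_j < t_i or e_j < e_i)
            for j, (t_j, e_j) in enumerate(runs)
            if j != i
        )
        if not dominated:
            front.append((t_i, e_i))
    return sorted(front)


# Hypothetical measurements, not data from the paper.
runs = [(10.0, 100.0), (11.0, 60.0), (12.0, 55.0), (12.0, 70.0), (15.0, 90.0)]
front = pareto_front(runs)
```

Every point left in `front` represents a different balance: moving along it trades a longer runtime for lower energy use, and no remaining option is worse than another on both counts.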

A New Statistic for Detecting Genetic Differentiation

Genetics ◽

10.1093/genetics/155.4.2011 ◽

2000 ◽

Vol 155 (4) ◽

pp. 2011-2014 ◽

Cited By ~ 10

Author(s):

Richard R Hudson

Keyword(s):

Genetic Differentiation ◽

DNA Sequences ◽

Genetic Data ◽

Island Model ◽

Wide Range ◽

Infinite Sites Model ◽

Parameter Values

A new statistic for detecting genetic differentiation of subpopulations is described. The statistic can be calculated when genetic data are collected on individuals sampled from two or more localities. It is assumed that haplotypic data are obtained, either in the form of DNA sequences or data on many tightly linked markers. Using a symmetric island model, and assuming an infinite-sites model of mutation, it is found that the new statistic is as powerful or more powerful than previously proposed statistics for a wide range of parameter values.

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal of Big Data ◽

10.1186/s40537-019-0265-5 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi ◽

Siavash Rezaei ◽

Ali HeydariGorji ◽

Hosein Bobarshad ◽

Vladimir Alves ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Distributed Processing ◽

Data Access ◽

Distributed Applications ◽

Process Data ◽

Storage Devices ◽

Hadoop MapReduce ◽

Big Data Applications ◽

Application Processor

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move process closer to data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform, that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported for running on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications on the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in the distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.

A high sensitivity wireless mass-loading surface acoustic wave DNA biosensor

Modern Physics Letters B ◽

10.1142/s0217984914500560 ◽

2014 ◽

Vol 28 (07) ◽

pp. 1450056 ◽

Cited By ~ 9

Author(s):

Hua-Lin Cai ◽

Yi Yang ◽

Yi-Han Zhang ◽

Chang-Jian Zhou ◽

Cang-Ran Guo ◽

...

Keyword(s):

Acoustic Wave ◽

Surface Acoustic Wave ◽

DNA Sequences ◽

Biological Treatment ◽

High Performance ◽

Processing System ◽

High Sensitivity ◽

Treatment Method ◽

SAW Sensor ◽

Target DNA

In this paper, a surface acoustic wave (SAW) biosensor with a gold delay area on a LiNbO3 substrate is proposed for detecting DNA sequences. With well-designed device parameters, the SAW sensor achieves high performance in the highly sensitive detection of target DNA. In addition, an effective biological treatment method for DNA immobilization and extensive experimental verification of the sensing effect make it a reliable device for DNA detection. The loading mass of the probe and target DNA sequences is obtained from the frequency shifts, which are large enough in this work owing to the effective biological treatment. The experimental results show that the biosensor has a high sensitivity of 1.2 pg/ml/Hz, and its high selectivity is verified by the weak responses to other substances. In combination with a wireless transceiver, we develop a wireless receiving and processing system that can directly display the detection results.

Electrochemically Active DNA Probes: Detection of Target DNA Sequences at Femtomole Level by High-Performance Liquid Chromatography with Electrochemical Detection

Analytical Biochemistry ◽

10.1006/abio.1994.1203 ◽

1994 ◽

Vol 218 (2) ◽

pp. 436-443 ◽

Cited By ~ 69

Author(s):

S. Takenaka ◽

Y. Uto ◽

H. Kondo ◽

T. Ihara ◽

M. Takagi

Keyword(s):

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Electrochemical Detection ◽

DNA Sequences ◽

High Performance ◽

DNA Probes ◽

Target DNA ◽

Electrochemically Active

High performance SAR processing on an heterogeneous distributed environment

High-Performance Computing and Networking - Lecture Notes in Computer Science ◽

10.1007/bfb0037262 ◽

1998 ◽

pp. 1018-1020 ◽

Cited By ~ 1

Author(s):

F. P. Lovergine ◽

N. Veneziani

Keyword(s):

High Performance ◽

Distributed Environment ◽

SAR Processing

Weighted Finite Automata Based Image Compression on Hadoop MapReduce Framework

2015 IEEE International Congress on Big Data ◽

10.1109/bigdatacongress.2015.101 ◽

2015 ◽

Cited By ~ 1

Author(s):

U.S.N. Raju ◽

Irlanki Sandeep ◽

Nattam Sai Karthik ◽

Rayapudi Siva Praveen ◽

Mayank Singh Sachan

Keyword(s):

Image Compression ◽

Finite Automata ◽

MapReduce Framework ◽

Hadoop MapReduce ◽

Weighted Finite Automata