Genome Subsequences Assembly Using Approximate Matching Techniques in Hadoop

Author(s):  
Govindan Raja ◽  
U. Srinivasulu Reddy

Sequencing DNA provides valuable insights into several aspects of human life. The major requirement in this domain is a faster and more accurate sequencing mechanism, and the process is made difficult by the huge size of DNA. This paper presents an effective genome assembly technique built on the Hadoop architecture using MapReduce. The fragment assembly first matches subsequences and then, depending on the matching levels, filters the final complete matching subsequences. The consensus alignment and recalibration are performed using greedy approximate matching techniques. The experimental results show that our approach is more accurate and exhibits better coverage; however, the processing time is found to be high. Future work will focus on reducing the processing time. Discussions of these techniques are also presented in this paper.
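The abstract describes the assembly only at a high level; as a hedged illustration of the greedy overlap-merge idea behind such a consensus step (a single-machine sketch under simplified assumptions, not the authors' MapReduce implementation), one might write:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap
    until a single consensus sequence remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, bi, bj = 0, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, bi, bj = k, i, j
        # Merge the best pair (plain concatenation if no overlap exists).
        merged = frags[bi] + frags[bj][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (bi, bj)]
        frags.append(merged)
    return frags[0]
```

In a MapReduce setting, the pairwise overlap computation is the natural map-side task, while filtering and merging the matching subsequences fits the reduce step.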

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraint. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of datasets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets; based on these results, DV-DVFS can achieve up to a 15% improvement in energy consumption.
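The abstract gives only the idea of the frequency estimator. A minimal sketch of the selection step under a naive cycles-per-second model (the function and its arguments are illustrative assumptions, not the paper's DV-DVFS implementation) could look like:

```python
def pick_frequency(remaining_cycles, time_to_deadline, available_freqs):
    """Choose the lowest available frequency (Hz) that still meets the deadline.

    Naive model: execution time = cycles / frequency, so the minimum
    feasible frequency is remaining_cycles / time_to_deadline.  Running
    at the lowest feasible DVFS step saves energy without missing the
    deadline (under this model).
    """
    if time_to_deadline <= 0:
        return max(available_freqs)          # already late: run flat out
    f_min = remaining_cycles / time_to_deadline
    feasible = [f for f in sorted(available_freqs) if f >= f_min]
    return feasible[0] if feasible else max(available_freqs)
```

Real DVFS governors must also account for memory-bound phases, where lowering the frequency barely changes runtime, which is precisely the variation the data-variety argument above is about.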


2012 ◽  
Vol 2012 ◽  
pp. 1-16 ◽  
Author(s):  
Khader Mohammad ◽  
Sos Agaian

Text embedded in an image contains useful information for applications in the medical, industrial, commercial, and research fields. While many systems have been designed to correctly identify text in images, no prior work addresses the recognition of degraded text on clear plastic. This paper proposes novel methods and an apparatus for extracting text from an image under practical conditions: (a) poor background contrast; (b) white, curved, and/or differing fonts or character widths between sets of images; (c) dotted text printed on curved reflective material; and/or (d) touching characters. The methods were evaluated using 100 unique test images containing a variety of texts captured from water bottles. These tests averaged a processing time of ~10 seconds (using MATLAB R2008A on an HP 8510W with 4 GB of RAM and a 2.3 GHz processor), and experimental results yielded an average recognition rate of 90 to 93% using customized systems generated by the proposed development.


2017 ◽  
Vol 14 (2) ◽  
pp. 347-367 ◽  
Author(s):  
Elkhan Jabarov ◽  
Byung-Won On ◽  
Gyu Choi ◽  
Myong-Soon Park

Nowadays, many applications use spatial data, for instance location information, so storing spatial data is important. We suggest using an R-Tree over PCM. Our objective is to design a PCM-sensitive R-Tree that can store spatial data as well as mitigate the endurance problem. We first examine how the R-Tree causes endurance problems in PCM, and we then optimize it for PCM. We propose doubling the leaf node size, writing a split node to a blank node, updating parent nodes only once, and not merging nodes after deletion when the minimum fill factor requirement is not met. Based on our experimental results on a benchmark dataset, the proposed R-Tree reduces the number of write operations to PCM by a factor of 56 on average. Moreover, the proposed scheme improves processing time by 23% on average compared to the original R-Tree.


2004 ◽  
Vol 120 ◽  
pp. 555-562
Author(s):  
D. Apelian ◽  
S. K. Chaudhury

Heat treatment and post-casting treatments of cast components have always been an important step in the control of microstructure and the resultant properties. In the past, the solutionizing, quenching, and ageing process steps may have “required” over 20 hours of processing time in total. With the advent of fluidized bed (FB) reactors, processing time has been dramatically reduced: for example, instead of 8-10 hours of solutionizing time in a conventional furnace, the time required in an FB is less than an hour. Experiments with an Al-Si-Mg alloy (both modified with Sr and unmodified) were performed for different diffusion distances (different DAS) and for different reaction times and temperatures. Both the model and the experimental results are presented and discussed.
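The paper's diffusion model is not reproduced in the abstract. As a back-of-the-envelope sketch of why the diffusion distance (DAS) matters, one can use the classic characteristic-time scaling t ~ L²/D, taking L as half the dendrite arm spacing (the function name and units here are illustrative assumptions, not the authors' model):

```python
def homogenization_time(das_um, d_um2_per_s):
    """Characteristic diffusion time t ~ L^2 / D, with L taken as half
    the dendrite arm spacing (DAS).  Inputs: DAS in micrometres and a
    diffusion coefficient in um^2/s; returns seconds."""
    half_spacing = das_um / 2.0
    return half_spacing ** 2 / d_um2_per_s
```

Under this scaling, halving the DAS cuts the required solutionizing time by a factor of four, which is consistent with the abstract's point that finer microstructures respond much faster to FB treatment.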


Author(s):  
Seung-Yong Yoon ◽  
Hirohisa Seki

We propose a parallel algorithm for mining non-redundant recurrent rules from a sequence database. Recurrent rules, proposed by Lo et al. [1], can express “Whenever a series of precedent events occurs, eventually a series of consequent events occurs,” and they have shown the usefulness of recurrent rules in various domains, including software specification and verification. Although some algorithms such as NR3 have been proposed, mining non-redundant recurrent rules still requires considerable processing time. To reduce the computation cost, we present a parallel approach to mining non-redundant recurrent rules, which fully utilizes the task-parallelism in NR3. We also give some experimental results, which show the effectiveness of our proposed method.


2021 ◽  
Author(s):  
Lele Yu ◽  
Shaowu Zhang ◽  
Yijia Zhang ◽  
Hongfei Lin

BACKGROUND Happiness refers to the joyful and pleasant emotions that humans produce subjectively. It is the positive part of emotion, and it affects the quality of human life. Therefore, understanding human happiness is a meaningful task in sentiment analysis. We mainly discuss two facets (Agency/Sociality) of happiness in this study. Through analysis and research on happiness, we can develop new concepts that define happiness and enrich our understanding of emotions. OBJECTIVE In this paper, we treat each happy moment as a sequence of short sentences and propose a short-text happiness detection model based on transfer learning to analyze the Agency and Sociality aspects of happiness. METHODS Happiness analysis is a novel and challenging research task, but the current datasets in this field are small. To solve this problem, we utilized the unlabeled training set and transfer learning to train a semantically enhanced language model in the target domain. The trained language model with domain characteristics was then combined with other deep learning models to obtain various models. Finally, we used an improved voting strategy to further improve the experimental results. RESULTS The proposed approach was evaluated on the public dataset. Experimental results showed that our approach significantly outperforms the baselines. When predicting the Agency aspect of happiness, our approach achieved an accuracy of 0.8574 and an F1 score of 0.90; when predicting Sociality, it achieved an accuracy of 0.928 and an F1 score of 0.9360. CONCLUSIONS The comparison results on this dataset demonstrate the effectiveness of our approach for happiness analysis. Experimental results confirm that our method achieves state-of-the-art performance and that transfer learning effectively improves happiness analysis.
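The "improved voting strategy" is not specified in the abstract. As a generic illustration of how multiple models' outputs can be combined, here is a plain weighted soft-voting sketch (purely illustrative, not the authors' strategy):

```python
def soft_vote(model_probs, weights=None):
    """Combine per-model class-probability lists by weighted averaging
    and return the index of the winning class.

    model_probs: list of per-model probability vectors, one per model.
    weights: optional per-model weights (e.g. validation accuracy).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    n_classes = len(model_probs[0])
    total_w = sum(weights)
    avg = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        for c, p in enumerate(probs):
            avg[c] += w * p / total_w
    return max(range(n_classes), key=lambda c: avg[c])
```

Weighting models by their validation performance is a common way such a voting scheme is "improved" over simple majority voting, but which refinement the paper uses is not stated in the abstract.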


2020 ◽  
Author(s):  
Leonardo Andrade Ribeiro ◽  
Felipe Ferreira Borges ◽  
Diego Junior do Carmo Oliveira

Set similarity join, which finds all pairs of similar sets in a collection, plays an important role in data cleaning and integration. Many algorithms have been proposed to efficiently answer set similarity join on single-attribute data. However, real-world data often contain multiple attributes. In this paper, we propose a framework to enhance existing algorithms with additional filters for dealing with multi-attribute data. We then present a simple, yet effective filter based on lightweight indexes, for which exact and probabilistic implementation alternatives are evaluated. Finally, we devise a cost model to identify the best attribute ordering to reduce processing time. Our experimental results show that our approach is effective and significantly outperforms previous work.
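The multi-attribute filters are the paper's contribution; for context, here is a compact single-attribute sketch of the classic prefix-filtering approach to set similarity join that such frameworks build on (illustrative baseline, not the authors' filter or cost model):

```python
import math
from collections import defaultdict

def jaccard(a, b):
    return len(a & b) / len(a | b)

def similarity_join(records, t):
    """Return all pairs (i, j), i < j, with Jaccard(records[i], records[j]) >= t.

    Prefix filtering: under a fixed global token order, two sets with
    Jaccard >= t must share at least one token among the first
    len(r) - ceil(t * len(r)) + 1 tokens of each, so only those prefix
    tokens need to be indexed and probed.
    """
    sorted_recs = [sorted(r) for r in records]   # fixed token order
    index = defaultdict(set)                     # token -> ids of earlier records
    pairs = []
    for i, rec in enumerate(sorted_recs):
        prefix_len = len(rec) - math.ceil(t * len(rec)) + 1
        candidates = set()
        for tok in rec[:prefix_len]:
            candidates |= index[tok]
            index[tok].add(i)
        for j in candidates:                     # verify surviving candidates
            if jaccard(set(records[i]), set(records[j])) >= t:
                pairs.append((j, i))
    return pairs
```

With multiple attributes, additional filters can prune candidate pairs before the (comparatively expensive) verification step, which is where an attribute ordering chosen by a cost model pays off.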



2020 ◽  
Vol 17 (1) ◽  
pp. 21-26
Author(s):  
Akilandeswari Jeyapal ◽  
Jothi Ganesan ◽  
Sabeenian Royappan Savarimuthu ◽  
Iyyanar Perumal ◽  
Paramasivam Muthan Eswaran ◽  
...  

Developing an automatic location identification and tracking system for visually impaired or challenged persons in an indoor environment is a very challenging task. In this paper, a comprehensive study of different feature detection and matching techniques is presented, namely the Minimum Eigenvalue (MinEigen) algorithm, the Harris–Stephens (Harris) algorithm, Speeded Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), Binary Robust Invariant Scalable Keypoints (BRISK), and Maximally Stable Extremal Regions (MSER). These algorithms are employed to detect and match the features of an image and retrieve the best-matched image. Based on our experiments, we compare these algorithms on parameters such as sum of squared differences (SSD), precision, recall, number of detected and matched features, and processing time. Empirically, we found that the SURF algorithm produces the minimum SSD score and achieves the best matching. The MSER and MinEigen algorithms extract the highest and lowest numbers of features, respectively. In terms of processing time, BRISK takes the most and FAST the least among the compared algorithms.
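The SSD criterion used to compare the matchers is simple to state. A minimal descriptor-matching sketch (plain nearest-neighbour matching by SSD over toy numeric descriptors; the function names and the `max_ssd` cutoff are illustrative, not the paper's pipeline):

```python
def ssd(d1, d2):
    """Sum of squared differences between two equal-length descriptors:
    lower means more similar."""
    return sum((a - b) ** 2 for a, b in zip(d1, d2))

def match_features(query, reference, max_ssd=None):
    """For each query descriptor, find the reference descriptor with the
    lowest SSD score; optionally reject matches above a cutoff."""
    matches = []
    for qi, qd in enumerate(query):
        best_j = min(range(len(reference)), key=lambda j: ssd(qd, reference[j]))
        score = ssd(qd, reference[best_j])
        if max_ssd is None or score <= max_ssd:
            matches.append((qi, best_j, score))
    return matches
```

Real systems compute such scores over SURF/BRISK descriptor vectors rather than raw coordinates, and typically add a ratio test or cross-check to discard ambiguous matches.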

