Genome Subsequences Assembly Using Approximate Matching Techniques in Hadoop

Author(s):  
Govindan Raja ◽  
U. Srinivasulu Reddy

Sequencing DNA provides valuable insights into several aspects of human life. The major requirement in this domain is a faster and more accurate sequencing mechanism, and the process is made difficult by the huge size of DNA. This paper presents an effective genome assembly technique built on the Hadoop architecture using MapReduce. The fragment assembly first matches subsequences and then, depending on the matching levels, filters the final complete matching subsequences. The consensus alignment and recalibration are performed using greedy approximate matching techniques. The experimental results show that our approach is more accurate and exhibits better coverage; however, the processing time is found to be high. Future work will focus on reducing the processing time. Discussions of these techniques are also presented in this paper.
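The abstract describes the assembly only at a high level; as a hedged illustration of the greedy overlap-merge idea behind such a consensus step (a single-machine sketch under simplified assumptions, not the authors' MapReduce implementation), one might write:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap
    until a single consensus sequence remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, bi, bj = 0, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, bi, bj = k, i, j
        # Merge the best pair (plain concatenation if no overlap exists).
        merged = frags[bi] + frags[bj][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (bi, bj)]
        frags.append(merged)
    return frags[0]
```

In a MapReduce setting, the pairwise overlap computation is the natural map-side task, while filtering and merging the matching subsequences fits the reduce step.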

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraint. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of datasets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets; based on these results, DV-DVFS can achieve up to a 15% improvement in energy consumption.
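The abstract gives only the idea of the frequency estimator. A minimal sketch of the selection step under a naive cycles-per-second model (the function and its arguments are illustrative assumptions, not the paper's DV-DVFS implementation) could look like:

```python
def pick_frequency(remaining_cycles, time_to_deadline, available_freqs):
    """Choose the lowest available frequency (Hz) that still meets the deadline.

    Naive model: execution time = cycles / frequency, so the minimum
    feasible frequency is remaining_cycles / time_to_deadline.  Running
    at the lowest feasible DVFS step saves energy without missing the
    deadline (under this model).
    """
    if time_to_deadline <= 0:
        return max(available_freqs)          # already late: run flat out
    f_min = remaining_cycles / time_to_deadline
    feasible = [f for f in sorted(available_freqs) if f >= f_min]
    return feasible[0] if feasible else max(available_freqs)
```

Real DVFS governors must also account for memory-bound phases, where lowering the frequency barely changes runtime, which is precisely the variation the data-variety argument above is about.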


2012 ◽  
Vol 2012 ◽  
pp. 1-16 ◽  
Author(s):  
Khader Mohammad ◽  
Sos Agaian

Text embedded in an image contains useful information for applications in the medical, industrial, commercial, and research fields. While many systems have been designed to correctly identify text in images, no prior work addresses the recognition of degraded text on clear plastic. This paper proposes novel methods and an apparatus for extracting text from an image under practical conditions: (a) poor background contrast; (b) white, curved, and/or differing fonts or character widths between sets of images; (c) dotted text printed on curved reflective material; and/or (d) touching characters. The methods were evaluated using 100 unique test images containing a variety of texts captured from water bottles. These tests averaged a processing time of ~10 seconds (using MATLAB R2008A on an HP 8510W with 4 GB of RAM and a 2.3 GHz processor), and experimental results yielded an average recognition rate of 90 to 93% using customized systems generated by the proposed development.


2017 ◽  
Vol 14 (2) ◽  
pp. 347-367 ◽  
Author(s):  
Elkhan Jabarov ◽  
Byung-Won On ◽  
Gyu Choi ◽  
Myong-Soon Park

Nowadays, many applications use spatial data, for instance location information, so storing spatial data is important. We suggest using an R-Tree over PCM. Our objective is to design a PCM-sensitive R-Tree that can store spatial data as well as mitigate the endurance problem. We first examine how the R-Tree causes endurance problems in PCM, and we then optimize it for PCM. We propose doubling the leaf node size, writing a split node to a blank node, updating parent nodes only once, and not merging nodes after deletion when the minimum fill factor requirement is not met. Based on our experimental results on a benchmark dataset, the proposed R-Tree reduces the number of write operations to PCM by a factor of 56 on average. Moreover, the proposed scheme improves processing time by 23% on average compared to the original R-Tree.


2004 ◽  
Vol 120 ◽  
pp. 555-562
Author(s):  
D. Apelian ◽  
S. K. Chaudhury

Heat treatment and post-casting treatments of cast components have always been an important step in the control of microstructure and the resultant properties. In the past, the solutionizing, quenching, and ageing process steps may have “required” over 20 hours of processing time in total. With the advent of fluidized bed (FB) reactors, processing time has been dramatically reduced: for example, instead of 8-10 hours of solutionizing time in a conventional furnace, the time required in an FB is less than an hour. Experiments with an Al-Si-Mg alloy (both modified with Sr and unmodified) were performed for different diffusion distances (different DAS) and for different reaction times and temperatures. Both the model and the experimental results are presented and discussed.
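The paper's diffusion model is not reproduced in the abstract. As a back-of-the-envelope sketch of why the diffusion distance (DAS) matters, one can use the classic characteristic-time scaling t ~ L²/D, taking L as half the dendrite arm spacing (the function name and units here are illustrative assumptions, not the authors' model):

```python
def homogenization_time(das_um, d_um2_per_s):
    """Characteristic diffusion time t ~ L^2 / D, with L taken as half
    the dendrite arm spacing (DAS).  Inputs: DAS in micrometres and a
    diffusion coefficient in um^2/s; returns seconds."""
    half_spacing = das_um / 2.0
    return half_spacing ** 2 / d_um2_per_s
```

Under this scaling, halving the DAS cuts the required solutionizing time by a factor of four, which is consistent with the abstract's point that finer microstructures respond much faster to FB treatment.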


Author(s):  
Seung-Yong Yoon ◽  
Hirohisa Seki

We propose a parallel algorithm for mining non-redundant recurrent rules from a sequence database. Recurrent rules, proposed by Lo et al. [1], can express “Whenever a series of precedent events occurs, eventually a series of consequent events occurs,” and they have shown the usefulness of recurrent rules in various domains, including software specification and verification. Although some algorithms such as NR3 have been proposed, mining non-redundant recurrent rules still requires considerable processing time. To reduce the computation cost, we present a parallel approach to mining non-redundant recurrent rules, which fully utilizes the task-parallelism in NR3. We also give some experimental results, which show the effectiveness of our proposed method.


2021 ◽  
Author(s):  
Lele Yu ◽  
Shaowu Zhang ◽  
Yijia Zhang ◽  
Hongfei Lin

BACKGROUND Happiness refers to the joyful and pleasant emotions that humans produce subjectively. It is the positive part of emotion, and it affects the quality of human life. Therefore, understanding human happiness is a meaningful task in sentiment analysis. We mainly discuss two facets (Agency/Sociality) of happiness in this study. Through analysis and research on happiness, we can develop new concepts that define happiness and enrich our understanding of emotions. OBJECTIVE In this paper, we treat each happy moment as a sequence of short sentences and propose a short-text happiness detection model based on transfer learning to analyze the Agency and Sociality aspects of happiness. METHODS Happiness analysis is a novel and challenging research task, but the current datasets in this field are small. To solve this problem, we utilized the unlabeled training set and transfer learning to train a semantically enhanced language model in the target domain. The trained language model with domain characteristics was then combined with other deep learning models to obtain various models. Finally, we used an improved voting strategy to further improve the experimental results. RESULTS The proposed approach was evaluated on the public dataset. Experimental results showed that our approach significantly outperforms the baselines. When predicting the Agency aspect of happiness, our approach achieved an accuracy of 0.8574 and an F1 score of 0.90; when predicting Sociality, it achieved an accuracy of 0.928 and an F1 score of 0.9360. CONCLUSIONS The comparison results on this dataset demonstrate the effectiveness of our approach for happiness analysis. Experimental results confirm that our method achieves state-of-the-art performance and that transfer learning effectively improves happiness analysis.
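The "improved voting strategy" is not specified in the abstract. As a generic illustration of how multiple models' outputs can be combined, here is a plain weighted soft-voting sketch (purely illustrative, not the authors' strategy):

```python
def soft_vote(model_probs, weights=None):
    """Combine per-model class-probability lists by weighted averaging
    and return the index of the winning class.

    model_probs: list of per-model probability vectors, one per model.
    weights: optional per-model weights (e.g. validation accuracy).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    n_classes = len(model_probs[0])
    total_w = sum(weights)
    avg = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        for c, p in enumerate(probs):
            avg[c] += w * p / total_w
    return max(range(n_classes), key=lambda c: avg[c])
```

Weighting models by their validation performance is a common way such a voting scheme is "improved" over simple majority voting, but which refinement the paper uses is not stated in the abstract.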


2020 ◽  
Author(s):  
Leonardo Andrade Ribeiro ◽  
Felipe Ferreira Borges ◽  
Diego Junior do Carmo Oliveira

Set similarity join, which finds all pairs of similar sets in a collection, plays an important role in data cleaning and integration. Many algorithms have been proposed to efficiently answer set similarity join on single-attribute data. However, real-world data often contain multiple attributes. In this paper, we propose a framework to enhance existing algorithms with additional filters for dealing with multi-attribute data. We then present a simple, yet effective filter based on lightweight indexes, for which exact and probabilistic implementation alternatives are evaluated. Finally, we devise a cost model to identify the best attribute ordering to reduce processing time. Our experimental results show that our approach is effective and significantly outperforms previous work.
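The multi-attribute filters are the paper's contribution; for context, here is a compact single-attribute sketch of the classic prefix-filtering approach to set similarity join that such frameworks build on (illustrative baseline, not the authors' filter or cost model):

```python
import math
from collections import defaultdict

def jaccard(a, b):
    return len(a & b) / len(a | b)

def similarity_join(records, t):
    """Return all pairs (i, j), i < j, with Jaccard(records[i], records[j]) >= t.

    Prefix filtering: under a fixed global token order, two sets with
    Jaccard >= t must share at least one token among the first
    len(r) - ceil(t * len(r)) + 1 tokens of each, so only those prefix
    tokens need to be indexed and probed.
    """
    sorted_recs = [sorted(r) for r in records]   # fixed token order
    index = defaultdict(set)                     # token -> ids of earlier records
    pairs = []
    for i, rec in enumerate(sorted_recs):
        prefix_len = len(rec) - math.ceil(t * len(rec)) + 1
        candidates = set()
        for tok in rec[:prefix_len]:
            candidates |= index[tok]
            index[tok].add(i)
        for j in candidates:                     # verify surviving candidates
            if jaccard(set(records[i]), set(records[j])) >= t:
                pairs.append((j, i))
    return pairs
```

With multiple attributes, additional filters can prune candidate pairs before the (comparatively expensive) verification step, which is where an attribute ordering chosen by a cost model pays off.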



2020 ◽  
Vol 17 (1) ◽  
pp. 21-26
Author(s):  
Akilandeswari Jeyapal ◽  
Jothi Ganesan ◽  
Sabeenian Royappan Savarimuthu ◽  
Iyyanar Perumal ◽  
Paramasivam Muthan Eswaran ◽  
...  

Developing an automatic location identification and tracking system for visually impaired or challenged persons in an indoor environment is a very challenging task. In this paper, a comprehensive study of different feature detection and matching techniques is presented, namely the Minimum Eigenvalue (MinEigen) algorithm, the Harris–Stephens (Harris) algorithm, Speeded Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), Binary Robust Invariant Scalable Keypoints (BRISK), and Maximally Stable Extremal Regions (MSER). These algorithms are employed to detect and match the features of an image and retrieve the best-matched image. Based on our experiments, we compare these algorithms on parameters such as sum of squared differences (SSD), precision, recall, number of detected and matched features, and processing time. Empirically, we found that the SURF algorithm produces the minimum SSD score and achieves the best matching. The MSER and MinEigen algorithms extract the highest and lowest numbers of features, respectively. In terms of processing time, BRISK takes the most and FAST the least among the compared algorithms.
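The SSD criterion used to compare the matchers is simple to state. A minimal descriptor-matching sketch (plain nearest-neighbour matching by SSD over toy numeric descriptors; the function names and the `max_ssd` cutoff are illustrative, not the paper's pipeline):

```python
def ssd(d1, d2):
    """Sum of squared differences between two equal-length descriptors:
    lower means more similar."""
    return sum((a - b) ** 2 for a, b in zip(d1, d2))

def match_features(query, reference, max_ssd=None):
    """For each query descriptor, find the reference descriptor with the
    lowest SSD score; optionally reject matches above a cutoff."""
    matches = []
    for qi, qd in enumerate(query):
        best_j = min(range(len(reference)), key=lambda j: ssd(qd, reference[j]))
        score = ssd(qd, reference[best_j])
        if max_ssd is None or score <= max_ssd:
            matches.append((qi, best_j, score))
    return matches
```

Real systems compute such scores over SURF/BRISK descriptor vectors rather than raw coordinates, and typically add a ratio test or cross-check to discard ambiguous matches.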

