Optimization of DNA Sequences Data to Accelerate DNA Sequence Alignment on FPGA

The number of alignments between two DNA sequences

International Journal of Biomathematics ◽

10.1142/s1793524516500534 ◽

2016 ◽

Vol 09 (04) ◽

pp. 1650053

Author(s):

Helena Andrade ◽

Juan J. Nieto ◽

Angela Torres

Keyword(s):

DNA Sequence ◽

Sequence Alignment ◽

DNA Sequences ◽

DNA Sequence Alignment ◽

Insight Into

We consider two DNA sequences and compare them. One of the crucial issues in bioinformatics is to measure the similarity of two DNA sequences. For this purpose, one has to consider the different alignments between the two sequences. The number of alignments grows very rapidly with the length of the sequences. In this paper we give exact, explicit and computable formulas for the number of different possible alignments and for some classes of reduced alignments. We provide a new insight into the theory of DNA sequence alignment.
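To make the growth rate concrete, here is a minimal sketch of one classical way to count alignments. It assumes the usual convention that each alignment column is either a pair of bases, a gap in the first sequence, or a gap in the second, which gives the Delannoy recurrence. This is an illustration only and not necessarily one of the formulas derived in the paper; the function name `count_alignments` is hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped alignments of two sequences of lengths m and n.

    Assumption: each column is a base pair, a gap in sequence 1, or a gap
    in sequence 2 (never two gaps), which yields the Delannoy recurrence.
    """
    # table[i][j] = number of alignments of a length-i prefix with a length-j prefix.
    # Row 0 and column 0 stay at 1: an empty prefix can only be aligned against gaps.
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j]        # last column: base of sequence 1 over a gap
                + table[i][j - 1]      # last column: gap over a base of sequence 2
                + table[i - 1][j - 1]  # last column: two bases paired
            )
    return table[m][n]


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, count_alignments(length, length))
```

Under this convention two sequences of length 3 already admit 63 alignments, which illustrates why exhaustive enumeration is impractical for realistic sequence lengths.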


Optimal Pair DNA Sequence Alignment based on Matching Regions and Multi-Zone Genetic Algorithm

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.19.27993 ◽

2018 ◽

Vol 7 (4.19) ◽

pp. 751

Author(s):

Sara Q. Abedulridha ◽

Eman S. Al-Shamery

Keyword(s):

Genetic Algorithm ◽

DNA Sequence ◽

Sequence Alignment ◽

DNA Sequences ◽

Search Space ◽

Pairwise Sequence Alignment ◽

Optimal Arrangement ◽

Fitness Score ◽

DNA Sequence Alignment ◽

Fitness Value

DNA sequence alignment is an important and challenging task in bioinformatics, used to find the optimal arrangement between two sequences. In this paper, two methods are proposed, applied in two stages, to solve the pairwise sequence alignment problem. The first method, Matching Regions (MR), splits the DNA into regions using adaptive interleaving windows, separating the DNA tape into matched and non-matched regions. In the second stage, a Multi-Zone Genetic Algorithm (MZGA) is proposed as an improved method. It segments a large non-matched region into smaller search spaces, and the MZGA is then run in parallel to save time. A genetic algorithm can be applied as an optimization tool to produce multiple solutions. The improvement focuses on enhancing the operators of the simple GA. In selection, the population is divided into three zones according to fitness score. A new crossover approach is proposed that depends on cut-points and the location of gaps. The proposed method guarantees that the fitness value either improves or converges in each successive generation, so the offspring populations have better fitness values. The system has been applied to a real-world DNA dataset with sequence lengths ranging from 66 to 26037 bases. As a result, the proposed technique achieved the best alignment score for the DNA sequences. Finally, it is worth mentioning that the proposed system proved to be generalizable.


FPGA ACCELERATION FOR DNA SEQUENCE ALIGNMENT

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126607003575 ◽

2007 ◽

Vol 16 (02) ◽

pp. 245-266 ◽

Cited By ~ 15

Author(s):

GABRIEL CAFFARENA ◽

CARLOS PEDREIRA ◽

CARLOS CARRERAS ◽

SLOBODAN BOJANIC ◽

OCTAVIO NIETO-TALADRIZ

Keyword(s):

DNA Sequence ◽

Sequence Alignment ◽

DNA Sequences ◽

Low Cost ◽

Regular Structure ◽

Clock Frequency ◽

Database Access ◽

DNA Sequence Alignment ◽

Field Programmable ◽

Hardware Architectures

In this paper, we present two new hardware architectures that implement the Smith–Waterman algorithm for DNA sequence alignment. Previous low-cost approaches based on Field Programmable Gate Array (FPGA) technology are reviewed in detail and then improved with the goal of increased performance at the same cost (i.e., area). This goal is achieved through low-level optimizations aimed at adapting the systolic structure implementing the algorithm to the regular structure of FPGAs, essentially finding the optimum granularity of the systolic cells. The proposed architectures achieve processing rates close to 1 Gbps, clearly outperforming previous approaches. Compared with reported FPGA results for computing the edit distance between two DNA sequences, throughput is doubled for the same clock frequency with a minimal area penalty. The design has been implemented on an FPGA-based prototyping board integrated into a bioinformatics system. This has allowed validating the approach in a real system (i.e., including I/O and database access), and comparing the proposed hardware solution to purely software approaches. As shown in the paper, the results are outstanding even for slow-rate buses.
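For readers unfamiliar with the algorithm being accelerated, the following is a minimal software sketch of the Smith–Waterman scoring recurrence with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration; they are not taken from the paper, and this is not the hardware architecture it describes.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # extend diagonally (match or mismatch)
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. Systolic implementations generally exploit this property by assigning cells to processing elements that advance one anti-diagonal per clock cycle.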


DNA Sequence Alignment and Critical Phenomena

MRS Proceedings ◽

10.1557/proc-463-75 ◽

1996 ◽

Vol 463 ◽

Author(s):

Dirk Drasdo ◽

Terence Hwa ◽

Michael Lässig

Keyword(s):

Random Walks ◽

DNA Sequence ◽

Critical Phenomena ◽

Sequence Alignment ◽

DNA Sequences ◽

Critical Phenomenon ◽

Recent Theory ◽

Similarity Detection ◽

DNA Sequence Alignment ◽

Alignment Algorithms

Alignment algorithms are commonly used to detect and quantify similarities between DNA sequences. We study these algorithms in the framework of a recent theory viewing similarity detection as a geometrical critical phenomenon of directed random walks. We show that the roughness of these random walks governs the fidelity of an alignment, i.e., its ability to capture the correlations between the sequences compared. Criteria for the optimization of alignment algorithms emerge from this theory.


Fast DNA Sequence Alignment Algorithm Based on Quality Score Using Improved Dynamic Programming and Fuzzy Gap Cost Control

Current Bioinformatics ◽

10.2174/1574893609666140523000227 ◽

2014 ◽

Vol 9 (5) ◽

pp. 540-547

Author(s):

Kwang Kim ◽

Hyun Park ◽

Doo Song

Keyword(s):

Dynamic Programming ◽

DNA Sequence ◽

Sequence Alignment ◽

Cost Control ◽

Quality Score ◽

Alignment Algorithm ◽

Sequence Alignment Algorithm ◽

DNA Sequence Alignment ◽

Improved Dynamic Programming


Design and Analysis of 8-bit Smith Waterman based DNA Sequence Alignment Accelerator's Core on ASIC Design Flow

2010 Fourth UKSim European Symposium on Computer Modeling and Simulation ◽

10.1109/ems.2010.31 ◽

2010 ◽

Cited By ~ 2

Author(s):

A.K. Halim ◽

Z.A. Majid ◽

M.A. Mansor ◽

S.A.M. Al Junid ◽

S. Mohamed ◽

...

Keyword(s):

DNA Sequence ◽

Sequence Alignment ◽

Design Flow ◽

ASIC Design ◽

DNA Sequence Alignment


A Memory-Efficient Accelerator for DNA Sequence Alignment with Two-Piece Affine Gap Tracebacks

2021 IEEE International Symposium on Circuits and Systems (ISCAS) ◽

10.1109/iscas51556.2021.9401771 ◽

2021 ◽

Author(s):

Jing-Ping Wu ◽

Yi-Chien Lin ◽

Ying-Wei Wu ◽

Shih-Wei Hsieh ◽

Ching-Hsuan Tai ◽

...

Keyword(s):

DNA Sequence ◽

Sequence Alignment ◽

DNA Sequence Alignment ◽

Memory Efficient


DISCo-microbe: design of an identifiable synthetic community of microbes

PeerJ ◽

10.7717/peerj.8534 ◽

2020 ◽

Vol 8 ◽

pp. e8534 ◽

Cited By ~ 1

Author(s):

Dana L. Carper ◽

Travis J. Lawrence ◽

Alyssa A. Carrell ◽

Dale A. Pelletier ◽

David J. Weston

Keyword(s):

Microbial Community ◽

16S rRNA ◽

DNA Sequence ◽

Sequence Alignment ◽

Amplicon Sequencing ◽

Ribosomal Database Project ◽

Community Members ◽

DNA Sequence Alignment ◽

Diverse Community

Background: Microbiomes are extremely important for their host organisms, providing many vital functions and extending their hosts’ phenotypes. Natural studies of host-associated microbiomes can be difficult to interpret due to the high complexity of microbial communities, which hinders our ability to track and identify individual members along with the many factors that structure or perturb those communities. For this reason, researchers have turned to synthetic or constructed communities in which the identities of all members are known. However, due to the lack of tracking methods and the difficulty of creating a more diverse and identifiable community that can be distinguished through next-generation sequencing, most such in vivo studies have used only a few strains. Results: To address this issue, we developed DISCo-microbe, a program for the design of an identifiable synthetic community of microbes for use in in vivo experimentation. The program is composed of two modules: (1) create, which allows the user to generate a highly diverse community list from an input DNA sequence alignment using a custom nucleotide distance algorithm, and (2) subsample, which subsamples the community list to either represent a number of grouping variables, including taxonomic proportions, or to reach a user-specified maximum number of community members. As an example, we demonstrate the generation of a synthetic microbial community that can be distinguished through amplicon sequencing. The synthetic microbial community in this example consisted of 2,122 members from a starting DNA sequence alignment of 10,000 16S rRNA sequences from the Ribosomal Database Project. We generated simulated Illumina sequencing data from the constructed community and demonstrate that DISCo-microbe is capable of designing diverse communities with members distinguishable by amplicon sequencing. Using the simulated data we were able to recover sequences from between 97–100% of community members using two different post-processing workflows. Furthermore, 97–99% of sequences were assigned to a community member with zero sequences being misidentified. We then subsampled the community list using taxonomic proportions to mimic a natural plant host–associated microbiome, ultimately yielding a diverse community of 784 members. Conclusions: DISCo-microbe can create a highly diverse community list of microbes that can be distinguished through 16S rRNA gene sequencing, and has the ability to subsample (i.e., design) the community for the desired number of members and taxonomic proportions. Although developed for bacteria, the program allows for any alignment input from any taxonomic group, making it broadly applicable. The software and data are freely available from GitHub (https://github.com/dlcarper/DISCo-microbe) and the Python Package Index (PyPI).
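The general idea of building a distinguishable community from an alignment can be sketched as a greedy filter on pairwise distances. The example below is an illustration under stated assumptions and is not DISCo-microbe's actual algorithm or interface: it uses plain Hamming distance as a stand-in for the program's custom nucleotide distance, and the names `hamming`, `select_distinct`, and `min_dist` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of alignment columns in which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Gap characters are compared like any other symbol in this sketch.
    return sum(x != y for x, y in zip(a, b))


def select_distinct(alignment: dict, min_dist: int) -> list:
    """Greedily keep sequences that differ from every kept one by >= min_dist.

    Assumption: plain Hamming distance stands in for a custom nucleotide
    distance; the result depends on the input order of the alignment.
    """
    chosen = []
    for name, seq in alignment.items():
        if all(hamming(seq, alignment[kept]) >= min_dist for kept in chosen):
            chosen.append(name)
    return chosen


if __name__ == "__main__":
    toy_alignment = {
        "s1": "ACGTAC",
        "s2": "ACGTAA",  # one difference from s1
        "s3": "TTGTAC",  # two differences from s1
        "s4": "TTGTAA",  # one difference from s3
    }
    print(select_distinct(toy_alignment, min_dist=2))
```

A greedy pass like this is order-dependent and does not guarantee the largest possible community; it is shown only because it is short and easy to follow.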
