Optimization of DNA Sequences Data to Accelerate DNA Sequence Alignment on FPGA

Author(s):  
S.A.M. Al Junid ◽  
M.A. Haron ◽  
Z. Abd Majid ◽  
F.N. Osman ◽  
H. Hashim ◽  
...  
2016 ◽  
Vol 09 (04) ◽  
pp. 1650053
Author(s):  
Helena Andrade ◽  
Juan J. Nieto ◽  
Angela Torres

We consider two DNA sequences and compare them. One of the crucial issues in bioinformatics is measuring the similarity of two DNA sequences. To this end, one has to consider the different alignments between the two sequences. The number of alignments grows very rapidly with the length of the sequences. In this paper we give exact, explicit and computable formulas for the number of possible alignments and for some classes of reduced alignments. We provide new insight into the theory of DNA sequence alignment.
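To make that rapid growth concrete, here is a minimal Python sketch under one common convention. This is an assumption and not necessarily the paper's definitions. An alignment of sequences of lengths m and n is a sequence of columns. Each column holds a residue from both sequences, a residue from the first over a gap, or a gap over a residue from the second. Gap-over-gap columns are forbidden. Conditioning on the last column gives the recurrence N(m, n) = N(m-1, n) + N(m, n-1) + N(m-1, n-1) with N(m, 0) = N(0, n) = 1, whose solution is the Delannoy number N(m, n) = sum over k of C(m, k) C(n, k) 2^k. A coarser count that identifies alignments sharing the same set of aligned residue pairs is also shown. Whether either count matches the paper's "reduced alignments" is not established here, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Alignments of lengths m and n, counted via the last-column recurrence."""
    if m == 0 or n == 0:
        return 1  # every remaining residue must be placed against gaps
    return (count_alignments(m - 1, n)           # last column: residue over gap
            + count_alignments(m, n - 1)         # last column: gap over residue
            + count_alignments(m - 1, n - 1))    # last column: residue over residue


def count_alignments_closed(m: int, n: int) -> int:
    """Closed form of the same count (Delannoy number)."""
    return sum(comb(m, k) * comb(n, k) * 2 ** k for k in range(min(m, n) + 1))


def count_alignments_by_pairs(m: int, n: int) -> int:
    """Hypothetical coarser count: alignments identified by their aligned pairs.

    Choosing k pairs that increase in both sequences gives C(m, k) * C(n, k);
    by Vandermonde's identity the sum equals C(m + n, m).
    """
    return sum(comb(m, k) * comb(n, k) for k in range(min(m, n) + 1))
```

For example, `count_alignments(3, 3)` returns 63, while `count_alignments_by_pairs(3, 3)` returns 20. Even for three-base sequences, the choice of equivalence changes the count substantially.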


2018 ◽  
Vol 7 (4.19) ◽  
pp. 751
Author(s):  
Sara Q. Abedulridha ◽  
Eman S. Al-Shamery

DNA sequence alignment is an important and challenging task in bioinformatics, used to find the optimal arrangement between two sequences. In this paper, two methods are proposed in two stages to solve the pairwise sequence alignment problem. The first method, Matching Regions (MR), splits the DNA into regions using adaptive interleaving windows to separate the DNA tape into matched and non-matched regions. In the second stage, a Multi-Zone Genetic Algorithm (MZGA) is proposed as an improved method. It segments a large non-matched region into smaller search spaces, and the MZGA is then run in parallel to save time. A genetic algorithm can be applied as an optimization tool to produce multiple solutions; here, the improvement focuses on enhancing the operators of a simple GA. In the selection step, the population is divided into three zones according to fitness score. A new crossover approach is proposed that depends on cut points and the locations of gaps. The proposed method guarantees that the fitness value tends to improve or converge in each successive generation, so the offspring populations will have better fitness values. The system has been applied to a real-world DNA dataset with sequence lengths ranging from 66 bases up to 26,037 bases. The proposed technique achieved the best alignment score for the DNA sequences. Finally, the proposed system proved to be generalizable.
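The fitness-based zoning in the selection step can be illustrated with a minimal Python sketch. Several assumptions apply. An individual is a pair of gapped rows of equal length, with '-' marking a gap. It is scored with a simple column-wise scheme: match +1, mismatch -1, residue against gap -2, gap over gap 0. The ranked population is then split into three equal-size zones. The scoring values, the equal thirds and the function names are illustrative assumptions. The abstract does not give the actual MZGA fitness function, the zone boundaries, or how each zone feeds the crossover.

```python
from typing import List, Tuple

Individual = Tuple[str, str]  # two gapped rows of equal length; '-' marks a gap


def fitness(ind: Individual, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Hypothetical column-wise alignment score used as GA fitness."""
    top, bottom = ind
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            continue  # an all-gap column carries no information
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def split_into_zones(population: List[Individual]):
    """Rank by fitness (best first) and cut into three zones of near-equal size."""
    ranked = sorted(population, key=fitness, reverse=True)
    k = len(ranked)
    a, b = k // 3, 2 * k // 3  # assumption: equal thirds
    return ranked[:a], ranked[a:b], ranked[b:]
```

With the population `[("AC-T", "ACGT"), ("ACGT", "ACGT"), ("A-CGT", "AC-GT")]`, the fitness values are 1, 4 and -1. The exact match therefore lands alone in the top zone.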


2007 ◽  
Vol 16 (02) ◽  
pp. 245-266 ◽  
Author(s):  
GABRIEL CAFFARENA ◽  
CARLOS PEDREIRA ◽  
CARLOS CARRERAS ◽  
SLOBODAN BOJANIC ◽  
OCTAVIO NIETO-TALADRIZ

In this paper, we present two new hardware architectures that implement the Smith–Waterman algorithm for DNA sequence alignment. Previous low-cost approaches based on Field Programmable Gate Array (FPGA) technology are reviewed in detail and then improved with the goal of increased performance at the same cost (i.e., area). This goal is achieved through low-level optimizations that adapt the systolic structure implementing the algorithm to the regular structure of FPGAs, essentially by finding the optimum granularity of the systolic cells. The proposed architectures achieve processing rates close to 1 Gbps, clearly outperforming previous approaches. Compared with reported FPGA results for computing the edit distance between two DNA sequences, throughput is doubled at the same clock frequency with a minimal area penalty. The design has been implemented on an FPGA-based prototyping board integrated into a bioinformatics system. This allowed the approach to be validated in a real system (i.e., including I/O and database access) and the proposed hardware solution to be compared with purely software approaches. As shown in the paper, the results are outstanding even for slow-rate buses.
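Systolic Smith–Waterman designs rely on the data dependencies of the recurrence H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] - g, H[i][j-1] - g). Every cell depends only on its upper, left and upper-left neighbours. All cells on one anti-diagonal (constant i + j) can therefore be computed at the same time, which is what a linear chain of processing elements does on successive clock cycles. The Python sketch below models only that schedule in software. It assumes a linear gap penalty and the illustrative scores match = 2, mismatch = -1, gap = 1. It is not the authors' architecture, which is optimized at the level of FPGA cells. It checks that the wavefront order returns the same best local score as the usual row-by-row loop.

```python
import random


def _cell(H, a, b, i, j, match, mismatch, gap):
    """Smith–Waterman update for cell (i, j) with a linear gap penalty."""
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(0, H[i - 1][j - 1] + s, H[i - 1][j] - gap, H[i][j - 1] - gap)


def sw_rowwise(a, b, match=2, mismatch=-1, gap=1):
    """Reference: fill the matrix row by row and return the best local score."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best


def sw_wavefront(a, b, match=2, mismatch=-1, gap=1):
    """Fill the matrix one anti-diagonal at a time, as a systolic array would."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):  # d = i + j identifies the anti-diagonal
        # Cells on this anti-diagonal only read diagonals d-1 and d-2,
        # so in hardware each one could be handled by a separate cell in parallel.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(100):
        a = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 20)))
        b = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 20)))
        assert sw_rowwise(a, b) == sw_wavefront(a, b)
    print("ok")
```

In hardware, one sequence is typically held in the processing elements while the other streams through. One anti-diagonal per cycle then gives throughput proportional to the number of elements, which is the quantity the paper's granularity optimizations trade against area.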


1996 ◽  
Vol 463 ◽  
Author(s):  
Dirk Drasdo ◽  
Terence Hwa ◽  
Michael Lässig

Alignment algorithms are commonly used to detect and quantify similarities between DNA sequences. We study these algorithms in the framework of a recent theory viewing similarity detection as a geometrical critical phenomenon of directed random walks. We show that the roughness of these random walks governs the fidelity of an alignment, i.e., its ability to capture the correlations between the sequences compared. Criteria for the optimization of alignment algorithms emerge from this theory.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8534 ◽  
Author(s):  
Dana L. Carper ◽  
Travis J. Lawrence ◽  
Alyssa A. Carrell ◽  
Dale A. Pelletier ◽  
David J. Weston

Background: Microbiomes are extremely important for their host organisms, providing many vital functions and extending their hosts’ phenotypes. Natural studies of host-associated microbiomes can be difficult to interpret due to the high complexity of microbial communities, which hinders our ability to track and identify individual members along with the many factors that structure or perturb those communities. For this reason, researchers have turned to synthetic or constructed communities in which the identities of all members are known. However, due to the lack of tracking methods and the difficulty of creating a more diverse and identifiable community that can be distinguished through next-generation sequencing, most such in vivo studies have used only a few strains. Results: To address this issue, we developed DISCo-microbe, a program for the design of an identifiable synthetic community of microbes for use in in vivo experimentation. The program is composed of two modules: (1) create, which allows the user to generate a highly diverse community list from an input DNA sequence alignment using a custom nucleotide distance algorithm, and (2) subsample, which subsamples the community list either to represent a number of grouping variables, including taxonomic proportions, or to reach a user-specified maximum number of community members. As an example, we demonstrate the generation of a synthetic microbial community that can be distinguished through amplicon sequencing. The synthetic microbial community in this example consisted of 2,122 members from a starting DNA sequence alignment of 10,000 16S rRNA sequences from the Ribosomal Database Project. We generated simulated Illumina sequencing data from the constructed community and demonstrate that DISCo-microbe is capable of designing diverse communities with members distinguishable by amplicon sequencing.
Using the simulated data, we were able to recover sequences from 97–100% of community members using two different post-processing workflows. Furthermore, 97–99% of sequences were assigned to a community member, with zero sequences misidentified. We then subsampled the community list using taxonomic proportions to mimic a natural plant host–associated microbiome, ultimately yielding a diverse community of 784 members. Conclusions: DISCo-microbe can create a highly diverse community list of microbes that can be distinguished through 16S rRNA gene sequencing, and it can subsample (i.e., design) the community for the desired number of members and taxonomic proportions. Although developed for bacteria, the program accepts alignment input from any taxonomic group, making it broadly applicable. The software and data are freely available from GitHub (https://github.com/dlcarper/DISCo-microbe) and the Python Package Index (PyPI).
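The idea of keeping only community members that remain distinguishable after sequencing can be illustrated with a greedy filter over an alignment. This Python sketch is hypothetical and is not DISCo-microbe's "custom nucleotide distance algorithm" or its API. It makes three assumptions. Records are (name, aligned sequence) pairs of equal length. The distance counts aligned columns where two sequences differ, so a base against a gap counts as a difference and a gap against a gap does not. A sequence is kept only if it is at least `min_dist` columns away from every member already kept, stopping at an optional `max_members`.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (identifier, aligned sequence with '-' for gaps)


def aligned_distance(s1: str, s2: str) -> int:
    """Hypothetical distance: number of alignment columns where s1 and s2 differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must come from the same alignment")
    # Gap-vs-gap columns compare equal, so they are not counted.
    return sum(1 for x, y in zip(s1.upper(), s2.upper()) if x != y)


def greedy_diverse_subset(records: List[Record], min_dist: int,
                          max_members: Optional[int] = None) -> List[Record]:
    """Keep each record that is at least min_dist away from all records kept so far."""
    chosen: List[Record] = []
    for name, seq in records:
        if all(aligned_distance(seq, kept) >= min_dist for _, kept in chosen):
            chosen.append((name, seq))
            if max_members is not None and len(chosen) >= max_members:
                break  # reached the user-specified community size
    return chosen
```

For instance, with `[("a", "ACGTAC"), ("b", "ACGTAT"), ("c", "TTGTAC"), ("d", "AC--AC")]` and `min_dist=2`, record b is rejected because it differs from a in only one column, and a, c and d are kept. Because the filter is greedy, the result depends on input order; a real tool may order or score candidates differently.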


2021 ◽  
Vol 3 (2) ◽  
pp. 41
Author(s):  
Artur Bąk ◽  
Grzegorz Migdałek ◽  
Chandra Shekhar Pareek ◽  
Kacper Żukowski
