A local alignment of DNA based on parallelized MUMmer algorithm

Author(s):  
Yan Li ◽  
Zhenzhou Ji
Author(s):  
M. Vidyasagar

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.


2019 ◽  
Vol 14 (2) ◽  
pp. 157-163
Author(s):  
Majid Hajibaba ◽  
Mohsen Sharifi ◽  
Saeid Gorgin

Background: One of the pivotal challenges in today's genomic research is the fast processing of voluminous data, such as the data generated by high-throughput Next-Generation Sequencing technologies. BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We have used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches each split, one by one, against a list of queries. Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory-aware. Experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST. Conclusion: We have shown that memory-awareness in formatting a bulky database, when running BLAST, can improve performance significantly while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
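The two-level partitioning described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (partition_equally, split_for_memory, plan), the fixed per-record size, and the byte budgets are all hypothetical, chosen only to show how a master might assign equal chunks and how each worker might then split its chunk to fit its own memory.

```python
def partition_equally(n_records, n_workers):
    """Master step (hypothetical): one near-equal index range per worker."""
    base, extra = divmod(n_records, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional record.
        size = base + (1 if w < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def split_for_memory(chunk, record_bytes, memory_bytes):
    """Worker step (hypothetical): cut a chunk into pieces that fit in memory."""
    start, end = chunk
    # Assumes every record has the same size; always keep at least one record.
    per_piece = max(1, memory_bytes // record_bytes)
    return [(s, min(s + per_piece, end)) for s in range(start, end, per_piece)]


def plan(n_records, record_bytes, worker_memory):
    """Combine both steps: equal chunks first, then per-worker memory splits."""
    chunks = partition_equally(n_records, len(worker_memory))
    return [
        split_for_memory(chunk, record_bytes, mem)
        for chunk, mem in zip(chunks, worker_memory)
    ]


if __name__ == "__main__":
    # Two workers with different memory budgets, ten 100-byte records.
    print(plan(10, 100, [250, 1000]))
```

With these toy numbers, the worker with the smaller budget ends up with three small pieces while the larger worker keeps its chunk whole. In the setting the abstract describes, each worker would then search its pieces one at a time against the query list.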


2019 ◽  
Vol 14 (7) ◽  
pp. 628-639 ◽  
Author(s):  
Bizhi Wu ◽  
Hangxiao Zhang ◽  
Limei Lin ◽  
Huiyuan Wang ◽  
Yubang Gao ◽  
...  

Background: The BLAST (Basic Local Alignment Search Tool) algorithm has been widely used for sequence similarity searching. Analogously, public phenotype images should be efficiently retrievable by using biological images as queries to identify phenotypes with high similarity. Despite the accumulation of genotype-phenotype-mapping data, a system for searching similar phenotypes is not available because of the bottleneck of image processing. Objective: In this study, we focus on the identification of similar query phenotypic images by searching the biological phenotype database, including information about loss-of-function and gain-of-function. Methods: We propose a deep convolutional autoencoder architecture to segment the biological phenotypic images and develop a phenotype retrieval system to enable a better understanding of genotype–phenotype correlation. Results: This study shows how a deep convolutional autoencoder architecture can be trained on images from biological phenotypes to achieve state-of-the-art performance in a phenotypic image retrieval system. Conclusion: Taken together, the phenotype analysis system can provide further information on the correlation between genotype and phenotype. Additionally, the neural network model of image segmentation and the phenotype retrieval system are equally suitable for any species that has enough phenotype images to train the neural network.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dimitri Boeckaerts ◽  
Michiel Stock ◽  
Bjorn Criel ◽  
Hans Gerstmans ◽  
Bernard De Baets ◽  
...  

Abstract Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.


Pathogens ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 318
Author(s):  
Bernardo Sachman-Ruiz ◽  
Luis Lozano ◽  
José J. Lira ◽  
Grecia Martínez ◽  
Carmen Rojas ◽  
...  

Cattle babesiosis is a socio-economically important tick-borne disease caused by Apicomplexa protozoa of the genus Babesia that are obligate intraerythrocytic parasites. The pathogenicity of Babesia parasites for cattle is determined by the interaction with the host immune system and the presence of the parasite’s virulence genes. A Babesia bigemina strain that has been maintained under a microaerophilic stationary phase in in vitro culture conditions for several years in the laboratory lost virulence for the bovine host and the capacity for being transmitted by the tick vector. In this study, we compared the virulome of the in vitro culture attenuated Babesia bigemina strain (S) and the virulent tick-transmitted parental Mexican B. bigemina strain (M). Preliminary results obtained by using the Basic Local Alignment Search Tool (BLAST) showed that out of 27 virulence genes described and analyzed in the B. bigemina virulent tick-transmitted strain, only five were fully identified in the attenuated laboratory strain. In all cases, the identity and coverage of the identified genes of the wild-type strain were higher than those of the laboratory strain. This finding is putatively associated with the continuous partial loss of virulence genes in the laboratory strain after several passages of the parasite population under optimal in vitro growth conditions. The loss of virulence factors might be reflected in the absence of symptoms of the disease in cattle inoculated with the attenuated strain despite the presence of infection in the bovine host cells.


2020 ◽  
Vol 19 (1) ◽  
Author(s):  
Ommer Mohammed Dafalla ◽  
Mohammed Alzahrani ◽  
Ahmed Sahli ◽  
Mohammed Abdulla Al Helal ◽  
Mohammad Mohammad Alhazmi ◽  
...  

Abstract Background Artemisinin-based combination therapy (ACT) is recommended at the initial phase for treatment of Plasmodium falciparum, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphism in portions of the P. falciparum gene encoding kelch (K13)-propeller domains is associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphism in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods One hundred and forty P. falciparum samples were collected from the Jazan region of southwest Saudi Arabia at three different times: 20 samples in 2011, 40 samples in 2016 and 80 samples in 2020 after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using a basic local alignment search tool (BLAST). Results This study obtained 51 non-synonymous (NS) mutations in three time groups, divided as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all the groups. The BLAST analysis revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than to Asian ones (46.7%). Interestingly, this proportion changed completely in 2020, when isolates became closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions Despite the diversity of the identified mutations in the k13-propeller gene, these data did not report widespread artemisinin-resistant polymorphisms in the Jazan region where these samples were collected.
Such a process would be expected to increase frequencies of mutations associated with the resistance of ACT.
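
The group percentages quoted in the Results of this abstract can be checked with a short calculation. The sketch below only re-derives the reported figures from the reported counts; the dictionary keys are labels paraphrased from the abstract, and the variable names are illustrative.

```python
# Counts of non-synonymous mutations per time group, as reported in the abstract.
counts = {
    "2011 only": 6,
    "2011 and 2016": 3,
    "2011 and 2020": 5,
    "2016 only": 5,
    "2016 and 2020": 8,
    "2020": 14,
    "all groups": 10,
}

# The groups should add up to the 51 mutations stated in the abstract.
total = sum(counts.values())

# Share of each group, rounded to one decimal place as in the abstract.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}

if __name__ == "__main__":
    print(total)
    for group, pct in percentages.items():
        print(f"{group}: {pct}%")
```

The counts sum to 51, and each rounded share matches the value given in the abstract.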


2015 ◽  
Vol 2015 ◽  
pp. 1-6 ◽  
Author(s):  
Congzhao Fan ◽  
Xiaojin Li ◽  
Jun Zhu ◽  
Jingyuan Song ◽  
Hui Yao

The medicinal plant Ferula has been widely used in Asian medicine, especially in Uyghur medicine in Xinjiang, China. Given that various substitutes and closely related species have similar morphological characteristics, Ferula is difficult to distinguish based on morphology alone, thereby causing confusion and threatening the safe use of Ferula. In this study, internal transcribed spacer 2 (ITS2) sequences were analyzed and assessed for the accurate identification of two salable Ferula species (Ferula sinkiangensis and Ferula fukangensis) and eight substitutes or closely related species. Results showed that the sequence length of ITS2 ranged from 451 bp to 45 bp, whereas guanine and cytosine contents (GC) were from 53.6% to 56.2%. A total of 77 variation sites were detected, including 63 base mutations and 14 insertion/deletion mutations. The ITS2 sequence correctly identified 100% of the samples at the species level using the basic local alignment search tool 1 and nearest-distance method. Furthermore, neighbor-joining tree successfully identified the genuine plants F. sinkiangensis and F. fukangensis from their succedaneum and closely related species. These results indicated that ITS2 sequence could be used as a valuable barcode to distinguish Uyghur medicine Ferula from counterfeits and closely related species. This study may broaden DNA barcoding application in the Uyghur medicinal plant field.


2016 ◽  
Vol 13 (123) ◽  
pp. 20160575 ◽  
Author(s):  
Adama Creppy ◽  
Franck Plouraboué ◽  
Olivier Praud ◽  
Xavier Druart ◽  
Sébastien Cazin ◽  
...  

New experimental evidence of self-motion of a confined active suspension is presented. Depositing a fresh semen sample in an annular-shaped microfluidic chip leads to a spontaneous vortex state of the fluid at sufficiently large sperm concentration. The rotation occurs unpredictably clockwise or counterclockwise and is robust and stable. Furthermore, for highly active and concentrated semen, richer dynamics can occur, such as self-sustained or damped rotation oscillations. Experimental results obtained with systematic dilution provide clear evidence of a phase transition towards collective motion associated with local alignment of spermatozoa akin to the Vicsek model. A macroscopic theory based on previously derived self-organized hydrodynamics models is adapted to this context and provides predictions consistent with the observed stationary motion.


2006 ◽  
Vol 69 (7) ◽  
pp. 1653-1661 ◽  
Author(s):  
H. J. KIM ◽  
S. H. PARK ◽  
T. H. LEE ◽  
B. H. NAHM ◽  
Y. H. CHUNG ◽  
...  

Salmonella enterica serovar Typhimurium is a major foodborne pathogen throughout the world. Until now, the specific target genes for the detection and identification of serovar Typhimurium have not been developed. To determine the specific probes for serovar Typhimurium, the genes of serovar Typhimurium LT2 that were expected to be unique were selected with the BLAST (Basic Local Alignment Search Tool) program within GenBank. The selected genes were compared with 11 genomic sequences of various Salmonella serovars by BLAST. Of these selected genes, 10 were expected to be specific to serovar Typhimurium and were not related to virulence factor genes of Salmonella pathogenicity island or to genes of the O and H antigens of Salmonella. Primers for the 10 selected genes were constructed, and PCRs were evaluated with various genomic DNAs of Salmonella and non-Salmonella strains for the specific identification of Salmonella serovar Typhimurium. Among all the primer sets for the 10 genes, STM4497 showed the highest degree of specificity to serovar Typhimurium. In this study, a specific primer set for Salmonella serovar Typhimurium was developed on the basis of the comparison of genomic sequences between Salmonella serovars and was validated with PCR. This method of comparative genomics to select target genes or sequences can be applied to the specific detection of microorganisms.

