alignment free
Recently Published Documents


TOTAL DOCUMENTS: 599 (FIVE YEARS: 202)

H-INDEX: 42 (FIVE YEARS: 8)

2021 ◽  
Author(s):  
Yana Hrytsenko ◽  
Noah M. Daniels ◽  
Rachel S. Schwartz

Background: Phylogenies enrich our understanding of how genes, genomes, and species evolve. Traditionally, alignment-based methods are used to construct phylogenies from genetic sequence data; however, this process can be time-consuming when analyzing the large amounts of genomic data available today. Additionally, these analyses face challenges due to differences in genome structure, synteny, and the need to identify similarities in the face of repeated substitutions resulting in loss of phylogenetic information contained in the sequence. Alignment Free (AF) approaches using k-mers (short subsequences) can be an efficient alternative due to their indifference to positional rearrangements in a sequence. However, these approaches may be sensitive to k-mer length and the distance between samples. Results: In this paper, we analyzed the sensitivity of an AF approach based on k-mer frequencies to these challenges using cosine and Euclidean distance metrics for both assembled genomes and unassembled sequencing reads. Quantification of the sensitivity of this AF approach for phylogeny reconstruction to branch length and k-mer length provides a better understanding of the necessary parameter ranges for accurate phylogeny reconstruction. Our results show that a frequency-based AF approach can result in accurate phylogeny reconstruction when using whole genomes, but not stochastically sequenced reads, so long as longer k-mers are used. Conclusions: In this study, we have shown an AF approach for phylogeny reconstruction is robust in analyzing assembled genome data for a range of numbers of substitutions using longer k-mers. Using simulated reads randomly selected from the genome by the Illumina sequencer had a detrimental effect on phylogeny estimation. Additionally, filtering out infrequent k-mers improved the computational efficiency of the method while preserving the accuracy of the results, thus suggesting the feasibility of using only a subset of data to improve computational efficiency in cases where large sets of genome-scale data are analyzed.
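
As a rough illustration of the kind of comparison this abstract describes, the sketch below builds k-mer relative-frequency profiles for two sequences and compares them with Euclidean and cosine distances. It is a minimal sketch, not the authors' pipeline: the function names (`kmer_frequencies`, `euclidean_distance`, `cosine_distance`) and the toy sequences are assumptions made for illustration, and no filtering of infrequent k-mers is attempted.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return relative frequencies of all overlapping k-mers in seq.

    Hypothetical helper for illustration; not taken from the paper.
    """
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def cosine_distance(p, q):
    """One minus the cosine similarity of two sparse frequency profiles."""
    dot = sum(v * q.get(x, 0.0) for x, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    a = kmer_frequencies("ACGTACGTAC", 3)
    b = kmer_frequencies("ACGTTCGTAC", 3)
    print(euclidean_distance(a, b), cosine_distance(a, b))
```

A pairwise matrix of such distances over all samples is what a distance-based tree-building method would take as input; the choice of `k` is the parameter whose sensitivity the abstract discusses.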


2021 ◽  
Vol 23 ◽  
Author(s):  
Rui Yin ◽  
Zihan Luo ◽  
Chee Keong Kwoh

Background: A novel coronavirus emerged and spread rapidly worldwide, and the World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronaviruses have attracted much attention because they can cause a wide variety of infectious diseases in humans, from mild to severe. Detecting the lethality of human coronaviruses is key to estimating viral toxicity and providing perspectives for treatment. Methods: We developed an alignment-free framework that utilizes machine learning approaches for an ultra-fast and highly accurate prediction of the lethality of human-adapted coronavirus using genomic sequences. We performed extensive experiments with six different feature transformation and machine learning algorithms, combined with digital signal processing, to identify the lethality of possible future novel coronaviruses using existing strains. Results: The results tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets show an average 96.7% prediction accuracy. We also provide preliminary analysis validating the effectiveness of our models through other human coronaviruses. Our alignment-free framework achieves high prediction performance based on RNA sequences alone, without genome annotations or specialized biological knowledge. Conclusion: The results demonstrate that, for any novel human coronavirus strain, this approach can offer a reliable real-time estimate of its viral lethality.


2021 ◽  
Author(s):  
Ivar Grytten ◽  
Knut D. Rand ◽  
Geir K. Sandve

One of the core applications of high-throughput sequencing is the characterization of individual genetic variation. Traditionally, variants have been inferred by comparing sequenced reads to a reference genome. There has recently been an emergence of genotyping methods, which instead infer variants of an individual based on variation present in population-scale repositories like the 1000 Genomes Project. However, commonly used methods for genotyping are slow since they still require mapping of reads to a reference genome. Also, since traditional reference genomes do not include genetic variation, traditional genotypers suffer from reference bias and poor accuracy in variation-rich regions where reads cannot accurately be mapped. We here present KAGE, a genotyper for SNPs and short indels that is inspired by recent developments within graph-based genome representations and alignment-free genotyping. We propose two novel ideas to improve both the speed and accuracy: we (1) use known genotypes from thousands of individuals in a Bayesian model to predict genotypes, and (2) propose a computationally efficient method for leveraging correlation between variants. We show through experiments on experimental data that KAGE is both faster and more accurate than other alignment-free genotypers. KAGE is able to genotype a new sample (15x coverage) in less than half an hour on a consumer laptop, more than 10 times faster than the fastest existing methods, making it ideal in clinical settings or when large numbers of individuals are to be genotyped at low computational cost.


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2896
Author(s):  
Binpeng Zhan ◽  
Chao Yang ◽  
Fuyuan Xie ◽  
Liang Hu ◽  
Weiting Liu ◽  
...  

Sensor–artery alignment has always been a significant problem in arterial tonometry devices and prevents their application to wearable continuous blood pressure (BP) monitoring. Traditional solutions are to use a complex servo system to search for the best measurement position or to use an inefficient pressure sensor array. In this study, a novel solid–liquid mixture pressure sensing module is proposed. A flexible film with unique liquid-filled structures greatly reduces the pulse measurement error caused by sensor misplacement. The ideal measuring location was defined as −2.5 to 2.5 mm from the center of the module, where the pressure variation was within 5.4%, which is acceptable for real applications. Even at a distance of ±4 mm from the module center, the pressure decays by 23.7%, and its dynamic waveform is maintained. In addition, the sensing module is also capable of measuring the pulse wave transit time as a complementary method for BP measurement. The capability of the developed alignment-free sensing module in BP measurement has been validated. Twenty subjects were selected for the BP measurement experiment, which followed IEEE standards. The experimental results showed that the mean error of SBP is −4.26 mmHg with a standard deviation of 7.0 mmHg, and the mean error of DBP is 2.98 mmHg with a standard deviation of 5.07 mmHg. The device is expected to provide a new solution for wearable continuous BP monitoring.


Life ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1246
Author(s):  
Maria Frolova ◽  
Sergey Yudin ◽  
Valentin Makarov ◽  
Olga Glazunova ◽  
Olga Alikina ◽  
...  

Alignment-free approaches employing short k-mers as barcodes for individual genomes have created a new strategy for taxonomic analysis and paved the way for high-resolution phylogeny. Here, we introduce this strategy for the Lacticaseibacillus paracasei species as a taxon requiring barcoding support for precise systematics. Using this approach for phylotyping of L. paracasei VKM B-1144 at the genus level, we identified four L. paracasei phylogroups and found that L. casei 12A belongs to one of them, rather than to the L. casei clade. Therefore, we propose to change the specification of this strain. At the genus level we found only one relative of L. paracasei VKM B-1144 among 221 genomes, complete or available in contigs, and showed that the coding potential of the genome of this “rare” strain allows its consideration as a potential probiotic component. Four sets of published metagenomes were used to assess the dependence of L. paracasei presence in the human gut microbiome on chronic diseases, dietary changes and antibiotic treatment. Only antibiotics significantly affected their presence, and strain-specific barcoding allowed the identification of the main scenarios of the adaptive response. Thus, in suggesting bacteria of this species for compensatory therapy, we also propose strain-specific barcoding for selecting optimal strains for target microbiomes.


2021 ◽  
Author(s):  
Metin Balaban ◽  
Nishat Anjum Bristy ◽  
Ahnaf Faisal ◽  
Md Shamsuzzoha Bayzid ◽  
Siavash Mirarab

While aligning sequences has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods have much appeal in terms of simplifying the process of inference, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for some emerging forms of data such as genome skims, which cannot be assembled. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is that they typically rely on simplified models of sequence evolution such as Jukes-Cantor. It is possible to compute pairwise distances under more complex models by computing frequencies of base substitutions provided that these quantities can be estimated in the alignment-free setting. A particular limitation is that for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the strand of DNA sequences is unknown. Under such conditions, the so-called no-strand bias models are the most complex models that can be used. Here, we show how to calculate distances under a no-strand bias restriction of the General Time Reversible (GTR) model called TK4 without relying on alignments. The method relies on replacing letters in the input sequences, and subsequent computation of Jaccard indices between k-mer sets. For the method to work on large genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance. We show in simulation that these alignment-free distances can be highly accurate when genomes evolve under the assumed models, and we examine the effectiveness of the method on real genomic data.
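
The Jaccard index between k-mer sets mentioned in this abstract can be sketched as follows. This is only a sketch of that single building block: the letter-replacement step, the TK4 distance formula, and the random-mismatch correction are not reproduced here. Collapsing each k-mer with its reverse complement (the "canonical" k-mer) is a common convention for strand-unknown data and is an assumption of this example, as are the function names.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption for illustration: a common way to make k-mers strand-agnostic.
    """
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(seq, k):
    """Return the set of canonical k-mers occurring in seq."""
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two k-mer sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    x = kmer_set("ACGTTGCAAC", 4)
    y = kmer_set("ACGTTGCTAC", 4)
    print(jaccard(x, y))
```

With canonical k-mers, a sequence and its reverse complement yield the same k-mer set, so the index does not depend on which strand was read.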


2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Oliver Schwengers ◽  
Lukas Jelonek ◽  
Marius Alfred Dieckmann ◽  
Sebastian Beyvers ◽  
Jochen Blom ◽  
...  

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yao-Qun Wu ◽  
Zu-Guo Yu ◽  
Run-Bin Tang ◽  
Guo-Sheng Han ◽  
Vo V. Anh

Alignment-based methods are at a disadvantage in sequence comparison and phylogeny reconstruction because of their high computational cost in time and space. On the other hand, alignment-free methods incur low computational costs and have recently gained popularity in the field of bioinformatics. Here we propose a new alignment-free method for phylogenetic tree reconstruction based on whole genome sequences. A key component is a measure called information-entropy position-weighted k-mer relative measure (IEPWRMkmer), which combines the position-weighted measure of k-mers proposed by our group and the information entropy of frequency of k-mers. The Manhattan distance is used to calculate the pairwise distance between species. Finally, we use the Neighbor-Joining method to construct the phylogenetic tree. To evaluate the performance of this method, we perform phylogenetic analysis on two datasets used by other researchers. The results demonstrate that the IEPWRMkmer method is efficient and reliable. The source code of our method is provided at https://github.com/wuyaoqun37/IEPWRMkmer.
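
Two of the ingredients named in this abstract, the information entropy of k-mer frequencies and the Manhattan distance between species, can be illustrated generically. The sketch below is not the IEPWRMkmer measure: the position weighting and the way the authors combine it with entropy are not given in the abstract, so plain k-mer frequency profiles stand in as an assumption. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_profile(seq, k):
    """Relative frequencies of overlapping k-mers (plain stand-in feature)."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def shannon_entropy(profile):
    """Shannon entropy (in bits) of a k-mer frequency distribution."""
    return -sum(p * log2(p) for p in profile.values() if p > 0)


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def distance_matrix(sequences, k):
    """Pairwise Manhattan distances for a {name: sequence} mapping."""
    names = sorted(sequences)
    profiles = [kmer_profile(sequences[name], k) for name in names]
    matrix = [[manhattan(p, q) for q in profiles] for p in profiles]
    return names, matrix


if __name__ == "__main__":
    # Toy input, invented for the example.
    toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTTCGTAC", "sp3": "TTTTACGGGG"}
    names, matrix = distance_matrix(toy, 3)
    print(names)
    for row in matrix:
        print([round(d, 3) for d in row])
```

A symmetric matrix like this is the usual input to a Neighbor-Joining implementation, which is left out here.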

