PHYSER: An Algorithm to Detect Sequencing Errors from Phylogenetic Information

Author(s):  
Jorge Álvarez-Jarreta ◽  
Elvira Mayordomo ◽  
Eduardo Ruiz-Pesini
2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
Nicholas Stoler ◽  
Anton Nekrutenko ◽  
...  

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorize the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
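The family grouping described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' toolset: the read representation `(tag, strand, sequence)`, the strand labels `"ab"`/`"ba"`, the function names, and the `min_size` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def group_families(reads):
    """Group reads into families keyed by tag, split by strand.

    `reads` is an iterable of (tag, strand, sequence) tuples, where strand
    is "ab" (forward) or "ba" (reverse). This layout is hypothetical.
    """
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)
    return families


def summarize(families, min_size=3):
    """Split tags into those with enough reads on both strands and the rest.

    `min_size` is an assumed minimum number of reads per strand; tags that
    fail it on either strand cannot yield a duplex consensus in this sketch.
    """
    duplex, incomplete = [], []
    for tag, strands in families.items():
        if len(strands["ab"]) >= min_size and len(strands["ba"]) >= min_size:
            duplex.append(tag)
        else:
            incomplete.append(tag)
    return sorted(duplex), sorted(incomplete)
```

Counting how many tags land in the second list gives a rough picture of the data loss the abstract refers to; a single error in a tag would place its read in a separate, incomplete family under this scheme.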


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yan Helen Yan ◽  
Sherry X. Chen ◽  
Lauren Y. Cheng ◽  
Alyssa Y. Rodriguez ◽  
Rui Tang ◽  
...  

Abstract Whole exome sequencing (WES) is used to identify mutations in a patient’s tumor DNA that are predictive of tumor behavior, including the likelihood of response or resistance to cancer therapy. WES has a mutation limit of detection (LoD) at variant allele frequencies (VAF) of 5%. Putative mutations called at ≤ 5% VAF are frequently due to sequencing errors; therefore, reporting these subclonal mutations incurs a risk of significant false positives. Here we performed ~ 1000 × WES on fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue biopsy samples from a non-small cell lung cancer patient, and identified 226 putative mutations at between 0.5 and 5% VAF. Each variant was then tested using NuProbe NGSure to confirm the original WES calls. NGSure utilizes Blocker Displacement Amplification to first enrich the allelic fraction of the mutation and then uses Sanger sequencing to determine mutation identity. Results showed that 52% (117) of the 226 putative variants were disconfirmed, among which 2% (5) putative variants were found to be misidentified in WES. In the 66 cancer-related variants, the disconfirmed rate was 82% (54/66). These data demonstrate that Blocker Displacement Amplification allelic enrichment coupled with Sanger sequencing can be used to confirm putative mutations ≤ 5% VAF. By implementing this method, next-generation sequencing can reliably report low-level variants at a high sensitivity, without the cost of high sequencing depth.


Author(s):  
Victoria A. Janes ◽  
Daan W. Notermans ◽  
Ingrid J.B. Spijkerman ◽  
Caroline E. Visser ◽  
Marja E. Jakobs ◽  
...  

Abstract Background Recognition of nosocomial outbreaks with antimicrobial resistant (AMR) pathogens and appropriate infection prevention measures are essential to limit the consequences of AMR pathogens to patients in hospitals. Because unrelated, but genetically similar AMR pathogens may circulate simultaneously, rapid high-resolution molecular typing methods are needed for outbreak management. We compared amplified fragment length polymorphism (AFLP) and whole genome sequencing (WGS) during a nosocomial outbreak of vancomycin-resistant Enterococcus faecium (VRE) that spanned 5 months. Methods Hierarchical clustering of AFLP profiles was performed using unweighted pair-grouping and similarity coefficients were calculated with Pearson correlation. For WGS-analysis, core single nucleotide polymorphisms (SNPs) were used to calculate the pairwise distance between isolates, construct a maximum likelihood phylogeny and establish a cut-off for relatedness of epidemiologically linked VRE isolates. SNP-variations in the vanB gene cluster were compared to increase the comparative resolution. Technical replicates of 2 isolates were sequenced to determine the number of core-SNPs derived from random sequencing errors. Results Of the 721 patients screened for VRE carriage, AFLP assigned isolates of 22 patients to the outbreak cluster. According to WGS, all 22 isolates belonged to ST117 but only 21 grouped in a tight phylogenetic cluster and carried vanB resistance gene clusters. Sequencing of technical replicates showed that 4–5 core-SNPs were derived by random sequencing errors. The cut-off for relatedness of epidemiologically linked VRE isolates was established at ≤7 core-SNPs. The discrepant isolate was separated from the index isolate by 61 core-SNPs and the vanB gene cluster was absent. In AFLP analysis this discrepant isolate was indistinguishable from the other outbreak isolates, forming a cluster with 92% similarity (cut-off for identical isolates ≥90%). 
The inclusion of the discrepant isolate in the outbreak resulted in the screening of 250 patients and quarantining of an entire ward. Conclusion AFLP was a rapid and affordable screening tool for characterising hospital VRE outbreaks. For in-depth understanding of the outbreak WGS was needed. Compared to AFLP, WGS provided higher resolution typing of VRE isolates with implications for outbreak management.
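The WGS relatedness criterion in this abstract (pairwise core-SNP distance compared against a cut-off of ≤7) can be sketched in Python as follows. This is only an illustration under stated assumptions: each isolate is represented as a string of concatenated core-SNP positions of equal length, ambiguous bases and gaps are not handled, and the function names are hypothetical rather than taken from the study's pipeline.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions at which two aligned core-SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def related_pairs(isolates, cutoff=7):
    """Return {(name1, name2): (distance, is_related)} for every pair.

    `isolates` maps isolate names to aligned core-SNP strings. The default
    cutoff of 7 follows the threshold reported in the abstract; a pair is
    flagged as related when its distance is at or below the cutoff.
    """
    result = {}
    for (n1, s1), (n2, s2) in combinations(sorted(isolates.items()), 2):
        d = snp_distance(s1, s2)
        result[(n1, n2)] = (d, d <= cutoff)
    return result
```

Under this rule, the discrepant isolate described above (61 core-SNPs from the index isolate) would fall well outside the cut-off, whereas differences of 4–5 core-SNPs, the level attributed to random sequencing errors, would not separate two isolates.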


2019 ◽  
Vol 11 (10) ◽  
pp. 2824-2849 ◽  
Author(s):  
Paweł Mackiewicz ◽  
Adam Dawid Urantówka ◽  
Aleksandra Kroczak ◽  
Dorota Mackiewicz

Abstract Mitochondrial genes are placed on one molecule, which implies that they should carry consistent phylogenetic information. Taking advantage of this, we present a well-supported phylogeny based on mitochondrial genomes from almost 300 representatives of Passeriformes, the most numerous and differentiated Aves order. The analyses resolved the phylogenetic position of paraphyletic Basal and Transitional Oscines. Passerida was divided into two groups, one containing Paroidea and Sylvioidea, and the other containing Passeroidea and Muscicapoidea. Analyses of mitogenomes showed four types of rearrangements including a duplicated control region (CR) with adjacent genes. Mapping the presence and absence of duplications onto the phylogenetic tree revealed that the duplication was the ancestral state for passerines and was maintained in early diverged lineages. Next, the duplication could have been lost and could have occurred independently at least four times according to the most parsimonious scenario. In some lineages, two CR copies have been inherited from an ancient duplication and have highly diverged, whereas in others, the second copy became similar to the first one due to concerted evolution. The second CR copies accumulated over twice as many substitutions as the first ones. However, the second CRs were not completely eliminated and were retained for a long time, which suggests that both regions can fulfill an important role in mitogenomes. Phylogenetic analyses based on CR sequences subjected to this complex evolution can produce tree topologies inconsistent with real evolutionary relationships between species. Passerines with two CRs showed a higher metabolic rate in relation to their body mass.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Michal Motyka ◽  
Dominik Kusy ◽  
Michal Masek ◽  
Matej Bocek ◽  
Yun Li ◽  
...  

Abstract Biologists have reported on the chemical defences and the phenetic similarity of net-winged beetles (Coleoptera: Lycidae) and their co-mimics. Nevertheless, our knowledge has remained fragmentary, and the evolution of mimetic patterns has not been studied in the phylogenetic context. We illustrate the general appearance of ~ 600 lycid species and ~ 200 co-mimics and their distribution. Further, we assemble the phylogeny using the transcriptomic backbone and ~ 570 species. Using phylogenetic information, we closely scrutinise the relationships among aposematically coloured species, the worldwide diversity, and the distribution of aposematic patterns. The emitted visual signals differ in conspicuousness. The uniformly coloured dorsum is ancestral and was followed by the evolution of bicoloured forms. The mottled patterns, i.e. fasciate, striate, punctate, and reticulate, originated later in the course of evolution. The highest number of sympatrically occurring patterns was recovered in New Guinea and the Andean mountain ecosystems (the areas of the highest abundance), and in continental South East Asia (an area of moderate abundance but high in phylogenetic diversity). Consequently, a large number of co-existing aposematic patterns in a single region and/or locality is the rule, in contrast with the theoretical prediction, and predators do not face a simple model-like choice but cope with complex mimetic communities. Lycids display an ancestral aposematic signal even though they sympatrically occur with differently coloured unprofitable relatives. We show that the highly conspicuous patterns evolve within communities predominantly formed by less conspicuous Müllerian mimics, and often only a single species displays a novel pattern. Our work is a forerunner to the detailed research into the aposematic signalling of net-winged beetles.


2011 ◽  
Vol 61 (10) ◽  
pp. 2520-2524 ◽  
Author(s):  
Elisa Salvetti ◽  
Giovanna E. Felis ◽  
Franco Dellaglio ◽  
Anna Castioni ◽  
Sandra Torriani ◽  
...  

The development of molecular tools and in particular the use of 16S rRNA gene sequencing has had a profound effect on the taxonomy of many bacterial groups. Gram-positive organisms that encompass the genera Lactobacillus and Clostridium within the Firmicutes are examples of taxa that have undergone major revisions based on phylogenetic information. A consequence of these reorganizations is that a number of organisms are now recognized as being misclassified. Previous studies have demonstrated that Lactobacillus catenaformis and Lactobacillus vitulinus are phylogenetically unrelated to Lactobacillus sensu stricto, being placed within the Clostridia rRNA cluster XVII. Based on the phenotypic, chemotaxonomic and phylogenetic data presented, it is proposed that L. catenaformis and L. vitulinus be reclassified in two new genera, named respectively Eggerthia gen. nov., with the type species Eggerthia catenaformis gen. nov., comb. nov. (type strain DSM 20559T = ATCC 25536T = CCUG 48174T = CIP 104817T = JCM 1121T) and Kandleria gen. nov., with the type species Kandleria vitulina gen. nov., comb. nov. (type strain LMG 18931T = ATCC 27783T = CCUG 32236T = DSM 20405T = JCM 1143T).


2009 ◽  
Vol 2009 ◽  
pp. 1-19 ◽  
Author(s):  
GongXin Yu

Chimpanzees and humans are closely related but differ in many deadly human diseases and other characteristics in physiology, anatomy, and pathology. In spite of decades of extensive research, crucial questions about the molecular mechanisms behind the differences are yet to be understood. Here I report ExonVar, a novel computational pipeline for Exon-based human-chimpanzee comparative Variant analysis. The objective is to comparatively analyze mutations, specifically those that caused frameshift and nonsense mutations, and to assess their scale and potential impacts on human-chimpanzee divergence. Genomewide analysis of human and chimpanzee exons with ExonVar identified a number of species-specific, exon-disrupting mutations in chimpanzees but far fewer in humans. Many were found in genes involved in important biological processes such as T cell lineage development, the pathogenesis of inflammatory diseases, and antigen-induced cell death. A “less-is-more” model was previously established to illustrate the role of gene inactivation and disruptions during human evolution. Here this analysis suggested a different model where the chimpanzee-specific exon-disrupting mutations may act as an additional evolutionary force that drove the human-chimpanzee divergence. Finally, the analysis revealed a number of sequencing errors in the chimpanzee and human genome sequences and further illustrated that they could be corrected without resequencing.


Archaea ◽  
2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yendi E. Navarro-Noya ◽  
César Valenzuela-Encinas ◽  
Alonso Sandoval-Yuriar ◽  
Norma G. Jiménez-Bueno ◽  
Rodolfo Marsch ◽  
...  

In this study the archaeal communities in extreme saline-alkaline soils of the former lake Texcoco, Mexico, with electrolytic conductivities (EC) ranging from 0.7 to 157.2 dS/m and pH from 8.5 to 10.5 were explored. Archaeal communities in the 0.7 dS/m pH 8.5 soil had the lowest alpha diversity values and were dominated by a limited number of phylotypes belonging to the mesophilic Candidatus Nitrososphaera. Diversity and species richness were higher in the soils with EC between 9.0 and 157.2 dS/m. The majority of OTUs detected in the hypersaline soil were members of the Halobacteriaceae family. Novel phylogenetic branches in the Halobacteriales class were detected in the soil, and more abundantly in soil with the higher pH (10.5), indicating that unknown and uncharacterized Archaea can be found in this soil. Thirteen different genera of the Halobacteriaceae family were identified and were distributed differently between the soils. Halobiforma, Halostagnicola, Haloterrigena, and Natronomonas were found in all soil samples. Methanogenic archaea were found only in soil with pH between 10.0 and 10.3. Retrieved methanogenic archaea belonged to the Methanosarcinales and Methanomicrobiales orders. The comparison of the archaeal community structures considering phylogenetic information (UniFrac distances) clearly clustered the communities by pH.

