Feature Frequency Profile-based phylogenies are inaccurate

Author(s):  
Yuanning Li ◽  
Kyle T. David ◽  
Xing-Xing Shen ◽  
Jacob L. Steenwyk ◽  
Kenneth M. Halanych ◽  
...  

Abstract: Choi and Kim (PNAS, 117: 3678-3686; first published February 4, 2020; https://doi.org/10.1073/pnas.1915766117) used the alignment-free Feature Frequency Profile (FFP) method to reconstruct a broad sketch of the tree of life based on proteome data from 4,023 taxa. The FFP-based reconstruction reports many relationships that strongly contradict the current consensus view of the tree of life, and its accuracy has not been tested. Comparison of FFP with current standard approaches, such as concatenation and coalescence, using simulation analyses shows that FFP performs poorly. We conclude that the phylogeny of the tree of life reconstructed by Choi and Kim is suspect based on methodology as well as prior phylogenetic evidence.

2020 ◽  
Vol 117 (7) ◽  
pp. 3678-3686 ◽  
Author(s):  
JaeJin Choi ◽  
Sung-Hou Kim

An organism tree of life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the organisms. Since the whole-genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a whole-genome sequence-based ToL can be an empirically derivable surrogate for the organism ToL. However, experimentally determining the whole-genome sequences of many diverse organisms was practically impossible until recently. We have constructed three types of ToLs for diversely sampled organisms using the sequences of whole genome, of whole transcriptome, and of whole proteome. Of the three, whole-proteome sequence-based ToL (whole-proteome ToL), constructed by applying information theory-based feature frequency profile method, an “alignment-free” method, gave the most topologically stable ToL. Here, we describe the main features of a whole-proteome ToL for 4,023 species with known complete or almost complete genome sequences on grouping and kinship among the groups at deep evolutionary levels. The ToL reveals 1) all extant organisms of this study can be grouped into 2 “Supergroups,” 6 “Major Groups,” or 35+ “Groups”; 2) the order of emergence of the “founders” of all of the groups may be assigned on an evolutionary progression scale; 3) all of the founders of the groups have emerged in a “deep burst” at the very beginning period near the root of the ToL—an explosive birth of life’s diversity.
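As a rough illustration of the alignment-free idea described above, the sketch below builds a feature frequency profile by counting overlapping k-mers in a sequence and compares two profiles with the Jensen-Shannon divergence. The function names, the toy sequences, and the choice of k are assumptions made for illustration only; this is not the authors' implementation, which also involves choosing an optimal feature length and building a tree from the resulting distance matrix.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(sequence, k):
    """Return the normalised frequencies of overlapping k-mers (hypothetical helper)."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the feature length k")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse frequency profiles."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pi = p.get(feature, 0.0)
        qi = q.get(feature, 0.0)
        mi = 0.5 * (pi + qi)  # mixture profile; positive for every feature visited
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Toy protein-like strings, invented for this example.
profile_a = feature_frequency_profile("MKVLAAGIVMKVLA", k=3)
profile_b = feature_frequency_profile("MKVLSAGIVMRVLA", k=3)
distance = jensen_shannon_divergence(profile_a, profile_b)
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (profiles with no shared features), so pairwise values can be collected into a distance matrix for a standard distance-based tree-building method.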


2020 ◽  
Vol 117 (50) ◽  
pp. 31580-31581 ◽  
Author(s):  
Yuanning Li ◽  
Kyle T. David ◽  
Xing-Xing Shen ◽  
Jacob L. Steenwyk ◽  
Kenneth M. Halanych ◽  
...  

2009 ◽  
Vol 106 (8) ◽  
pp. 2677-2682 ◽  
Author(s):  
Gregory E. Sims ◽  
Se-Ran Jun ◽  
Guohong A. Wu ◽  
Sung-Hou Kim

Trials ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jeremy Horwood ◽  
Melanie Chalder ◽  
Ben Ainsworth ◽  
James Denison-Day ◽  
Frank de Vocht ◽  
...  

Abstract Objectives To examine the effectiveness of randomising dissemination of the Germ Defence behaviour change website via GP practices across England, UK. Trial design A two-arm (1:1 ratio) cluster randomised controlled trial implementing Germ Defence via GP practices compared with usual care. Participants Setting: All primary care GP practices in England. Participants: All patients aged 16 years and over who were granted access by participating GP practices. Intervention and comparator Intervention: We will ask staff at GP practices randomised to the intervention arm to share the weblink to Germ Defence with all adult patients registered at their practice during the 4-month trial implementation period and care will otherwise follow current standard management. Germ Defence is an interactive website (http://GermDefence.org/) employing behaviour change techniques and practical advice on how to reduce the spread of infection in the home. The coronavirus version of Germ Defence helps people understand what measures to take and when to take them to avoid infection. This includes hand washing, avoiding sharing rooms and surfaces, dealing with deliveries and ventilating rooms. Using behaviour change techniques, it helps users think through and adopt better home hygiene habits and find ways to solve any barriers, providing personalised goal setting and tailored advice that fits users’ personal circumstances and problem solving to overcome barriers. Comparator: Patients at GP practices randomised to the usual care arm will receive current standard management for the 4-month trial period after which we will ask staff to share the link to Germ Defence with all adult patients registered at their practice. Main outcomes The primary outcome is the effects of implementing Germ Defence on prevalence of all respiratory tract infection diagnoses during the 4-month trial implementation period.
The secondary outcomes are: 1) incidence of COVID-19 diagnoses; 2) incidence of COVID-19 symptom presentation; 3) incidence of gastrointestinal infections; 4) number of primary care consultations; 5) antibiotic usage; 6) hospital admissions; 7) uptake of GP practices disseminating Germ Defence to their patients; 8) usage of the Germ Defence website by individuals who were granted access by their GP practice. Randomisation GP practices will be randomised on a 1:1 basis by the independent Bristol Randomised Trials Collaboration (BRTC). Clinical Commissioning Groups (CCGs) in England will be divided into blocks according to region, and equal numbers in each block will be randomly allocated to intervention or usual care. The randomisation schedule will be generated in Stata statistical software by a statistician not otherwise involved in the enrolment of general practices into the study. Blinding (masking) The principal investigators, the statistician and study collaborators will remain blinded from the identity of randomised practices until the end of the study. Numbers to be randomised (sample size) To detect planned effect size (based on PRIMIT trial, Little et al, 2015): 11.1 million respondents from 6822 active GP practices. Assuming 25% of these GP practices will engage, we will contact all GP practices in England spread across 135 Clinical Commissioning Groups. Trial status Protocol version 2.0, dated 13 January 2021. Implementation is ongoing. The implementation period started on 10 November 2020 and will end on 10 March 2021. Trial registration This trial was registered in the ISRCTN registry (isrctn.com/ISRCTN14602359) on 12 August 2020. Full protocol The full protocol is attached as an additional file, accessible from the Trials website (Additional file 1). In the interest of expediting dissemination of this material, the familiar formatting has been eliminated; this Letter serves as a summary of the key elements of the full protocol.
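The region-blocked 1:1 allocation described under Randomisation can be sketched roughly as follows. The protocol states that the actual schedule is generated in Stata; the Python function below, its name, the fixed seed, and the placeholder region and unit labels are all assumptions made for illustration, as is the rule for handling a block with an odd number of units.

```python
import random


def block_randomise(units_by_region, seed=0):
    """Allocate units 1:1 to two arms within each region block (hypothetical sketch)."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    allocation = {}
    for region in sorted(units_by_region):
        units = list(units_by_region[region])
        rng.shuffle(units)
        half = len(units) // 2
        # Assumption: in an odd-sized block the extra unit goes to a randomly chosen arm.
        if len(units) % 2 == 1 and rng.random() < 0.5:
            half += 1
        for unit in units[:half]:
            allocation[unit] = "intervention"
        for unit in units[half:]:
            allocation[unit] = "usual care"
    return allocation


# Placeholder labels, not real regions or Clinical Commissioning Groups.
example_blocks = {
    "Region A": ["unit-1", "unit-2", "unit-3", "unit-4"],
    "Region B": ["unit-5", "unit-6"],
}
schedule = block_randomise(example_blocks, seed=2020)
```

Shuffling within each block and then splitting the shuffled list keeps the two arms balanced inside every region, which is the purpose of blocking by region.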


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kujin Tang ◽  
Jie Ren ◽  
Fengzhu Sun

Abstract: Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.


2018 ◽  
Author(s):  
Diogo Pratas ◽  
Armando J. Pinho ◽  
Raquel M. Silva ◽  
João M. O. S. Rodrigues ◽  
Morteza Hosseini ◽  
...  

The general approaches to detect and quantify metagenomic sample composition are based on the alignment of the reads, according to an existing database containing reference microbial sequences. However, without proper parameterization, these methods are not suitable for ancient DNA. Quantifying somewhat dissimilar sequences by alignment methods is problematic, due to the need for fine-tuned thresholds, considering relaxed edit distances and the consequent increase of computational cost. Additionally, the choice of the thresholds poses the problem of how to quantify similarity without producing overestimated measures. We propose FALCON-meta, a compression-based method to infer metagenomic composition of next-generation sequencing samples. This unsupervised alignment-free method runs efficiently on FASTQ samples. FALCON-meta quickly learns how to give importance to the models that cooperate to predict similarity, incorporating parallelism and flexibility for multiple hardware characteristics. It shows substantial identification capabilities in ancient DNA without overestimation. In one of the examples, we found and authenticated an ancient Pseudomonas bacterium in a Mammoth mitogenome. FALCON-meta can be accessed at https://github.com/pratas/falcon.

