A comparative study of metagenomics analysis pipelines at the species level

2016
Author(s):
Yee Voan Teo
Nicola Neretti

Abstract Many metagenomics classification tools have been developed with the rapid growth of the metagenomics field. However, the classification of closely related species remains a challenge. Here, we compared MetaPhlAn2, kallisto and Kraken for their performance in two metagenomics settings: human metagenomics and environmental metagenomics. Our comparative study showed that kallisto had higher sensitivity than MetaPhlAn2 and Kraken and better quantification accuracy than Kraken at the species level. We also showed that classification tools that run on full reference genomes misidentified many species that were not truly present. To reduce false positives, we introduced marker genes from MetaPhlAn2 into our pipeline, which uses kallisto for the classification step, as an additional filtering step for species detection.
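The marker-gene filtering step described in this abstract can be illustrated with a minimal sketch. It assumes species abundances have already been derived from a kallisto run and that a separate set lists the species supported by MetaPhlAn2 marker genes. The function name, the input shapes, and the renormalisation of the remaining abundances are all assumptions made for illustration, not the authors' implementation.

```python
def filter_by_markers(abundances, marker_supported):
    """Keep only species with marker-gene support, then renormalise.

    abundances: dict mapping species name -> relative abundance
                (hypothetical summary of a kallisto run).
    marker_supported: set of species names detected via marker genes
                (hypothetical summary of a MetaPhlAn2 marker search).
    Renormalising to sum to 1 is an assumption of this sketch.
    """
    kept = {sp: ab for sp, ab in abundances.items() if sp in marker_supported}
    total = sum(kept.values())
    if total == 0:
        # Nothing survived the filter, so there is nothing to renormalise.
        return {}
    return {sp: ab / total for sp, ab in kept.items()}
```

A species reported from full-genome classification but lacking marker support is treated as a likely false positive and dropped, which is the idea the abstract describes.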

2021
Vol 503 (2)
pp. 1828-1846
Author(s):
Burger Becker
Mattia Vaccari
Matthew Prescott
Trienko Grobler

ABSTRACT The morphological classification of radio sources is important to gain a full understanding of galaxy evolution processes and their relation with local environmental properties. Furthermore, the complex nature of the problem, its appeal for citizen scientists, and the large data rates generated by existing and upcoming radio telescopes combine to make the morphological classification of radio sources an ideal test case for the application of machine learning techniques. One approach that has shown great promise recently is convolutional neural networks (CNNs). The literature, however, lacks two things when it comes to CNNs and radio galaxy morphological classification. First, a proper analysis of whether overfitting occurs when training CNNs to perform radio galaxy morphological classification using a small curated training set is needed. Second, a good comparative study regarding the practical applicability of the CNN architectures in the literature is required. Both of these shortcomings are addressed in this paper. Multiple performance metrics are used for the latter comparative study, such as inference time, model complexity, computational complexity, and mean per-class accuracy. As part of this study, we also investigate the effect that receptive field, stride length, and coverage have on recognition performance. For the sake of completeness, we also investigate the recognition performance gains that we can obtain by employing classification ensembles. A ranking system based upon recognition and computational performance is proposed. MCRGNet, Radio Galaxy Zoo, and ConvXpress (novel classifier) are the architectures that best balance computational requirements with recognition performance.
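Of the metrics listed in this abstract, mean per-class accuracy is the one most easily confused with overall accuracy. The sketch below uses the common definition (the unweighted average of each class's recall); whether the paper uses exactly this definition is an assumption, and the class labels in the usage note are illustrative only.

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average, over classes, of the fraction of each class predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("y_true is empty")
    # Each class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, with true labels `["FRI", "FRI", "FRI", "FRII"]` and predictions `["FRI", "FRI", "FRII", "FRII"]`, the per-class accuracies are 2/3 and 1, so the mean is 5/6, whereas overall accuracy is 3/4. The difference matters when a small curated training set has imbalanced classes.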


2019
Author(s):
Yu Liu
Paul W Bible
Bin Zou
Qiaoxing Liang
Cong Dong
...  

Abstract Motivation: Microbiome analyses of clinical samples with low microbial biomass are challenging because of the very small quantities of microbial DNA relative to the human host, ubiquitous contaminating DNA in sequencing experiments and the large and rapidly growing microbial reference databases. Results: We present computational subtraction-based microbiome discovery (CSMD), a bioinformatics pipeline specifically developed to generate accurate species-level microbiome profiles for clinical samples with low microbial loads. CSMD applies strategies for the maximal elimination of host sequences with minimal loss of microbial signal and effectively detects microorganisms present in the sample with minimal false positives using a stepwise convergent solution. CSMD was benchmarked in a comparative evaluation with other classic tools on previously published well-characterized datasets. It showed higher sensitivity and specificity in host sequence removal and higher specificity in microbial identification, which led to more accurate abundance estimation. All these features are integrated into a free and easy-to-use tool. Additionally, CSMD applied to cell-free plasma DNA showed that microbial diversity within these samples is substantially broader than previously believed. Availability and implementation: CSMD is freely available at https://github.com/liuyu8721/csmd. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Yu. A. Sakhno

This article deals with the study of the structural and semantic features of tactile verbs (hereinafter TVs) in English, German and Russian. Particular attention is paid to the comparative study of TVs, which allows us to identify structural and semantic similarities and differences of linguistic units studied. The structural and semantic classification of TVs in the compared languages is also provided.


2015
Vol 87 (1)
pp. 15-27
Author(s):
José D. Ferreira
Martín Zamorano
Ana Maria Ribeiro

The genus Panochthus represents the last lineage of "Panochthini" recorded in the Pleistocene. This genus has a wide latitudinal distribution in South America, and in Brazil it occurs in the southern and northeastern regions. In this paper we describe new material (isolated osteoderms and caudal tube fragments) assigned to Panochthus from the state of Rio Grande do Sul (southern Brazil) and discuss some taxonomic issues related to Panochthus tuberculatus and Panochthus greslebini based on this material. This occurrence of P. greslebini is the first outside the Brazilian Intertropical Region. In addition, we describe new diagnostic features to differentiate the osteoderms of P. greslebini and P. tuberculatus. Unfortunately, it was not possible to identify some osteoderms at the species level. Interestingly, they showed four distinct morphotypes characterized by their external morphology, and thus were attributed to Panochthus sp. Lastly, we conclude that in addition to P. tuberculatus, recorded from southern Brazil, there is another species of the genus, assignable to P. cf. P. greslebini. Our analysis reinforces the reliability of caudal tube characters for the classification of species of Panochthus.


Author(s):  
J. A. Allen

The survey of the sublittoral fauna of the Clyde Sea Area from 1949 onwards has shown that five species of the Protobranchiata are abundant throughout this region on a variety of substrata. Pelseneer (1891, 1899, 1911), Heath (1937), and Yonge (1939) have contributed much to the knowledge of the group as a whole, but little comparative work has been done at species level. Verrill & Bush (1897, 1898) studied the shell characters of the American Atlantic species. Moore (1931 a, b) worked on the faecal pellets of the British Nuculidae and attempted to distinguish the species by this means, while Winckworth (1930, 1931), mainly in the light of the latter work, attempted to clarify the nomenclature of these species. Winckworth (1932) lists six British species of the family Nuculidae: Nucula sulcata Bronn, N. nucleus (Linné), N. hanleyi Winckworth, N. turgida Leckenby & Marshall, N. moorei Winckworth and N. tenuis (Montagu); and four species of the family Nuculanidae: Nuculana minuta (Müller), Yoldiella lucida (Loven), Y. tomlini Winckworth and Phaseolus pusillus (Jeffreys). All species of Nucula, except N. hanleyi, were taken from the Clyde Sea Area, although the latter species is included in the Clyde fauna list (Scott Elliot, Laurie & Murdoch, 1901). Only Nuculana minuta of the Nuculanidae has been taken on the present survey. Yoldiella tomlini is included in the 1901 list but is noted as being ‘insufficiently attested’. Nucula hanleyi was obtained from the Marine Station, Port Erin, but Yoldiella and Phaseolus were unobtainable.


2017
Author(s):
Zhemin Zhou
Nina Luhmann
Nabil-Fareed Alikhan
Christopher Quince
Mark Achtman

Abstract Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying the sequencing reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.

