Taxonomy assignment approach determines the efficiency of identification of OTUs in marine nematodes

2017 ◽  
Vol 4 (8) ◽  
pp. 170315 ◽  
Author(s):  
Oleksandr Holovachov ◽  
Quiterie Haenel ◽  
Sarah J. Bourlat ◽  
Ulf Jondelius

Precision and reliability of barcode-based biodiversity assessment can be affected at several steps during acquisition and analysis of data. Identification of operational taxonomic units (OTUs) is one of the crucial steps in the process and can be accomplished using several different approaches, namely, alignment-based, probabilistic, tree-based and phylogeny-based. The number of identified sequences in the reference databases affects the precision of identification. This paper compares the identification of marine nematode OTUs using alignment-based, tree-based and phylogeny-based approaches. Because the nematode reference dataset is limited in its taxonomic scope, OTUs can only be assigned to higher taxonomic categories, families. The phylogeny-based approach using the evolutionary placement algorithm provided the largest number of positively assigned OTUs and was least affected by erroneous sequences and limitations of reference data, compared to alignment-based and tree-based approaches.

2016 ◽  
Author(s):  
Alexey M. Kozlov ◽  
Jiajie Zhang ◽  
Pelin Yilmaz ◽  
Frank Oliver Glöckner ◽  
Alexandros Stamatakis

Abstract Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labour-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (“mislabels”) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity / 91.7% precision) as well as correction (94.9% sensitivity / 89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
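The sensitivity and precision figures quoted in this abstract follow the standard definitions. A minimal Python sketch of those two ratios is given here; the function names and the counts in the usage lines are illustrative assumptions, not values or code from the SATIVA evaluation.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly mislabeled sequences that were flagged."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged sequences that are truly mislabeled."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 90, 10, 5
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```

A method can score high on one measure and low on the other, which is presumably why the abstract reports both for each task.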


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Elisa Banchi ◽  
Claudio G Ametrano ◽  
Samuele Greco ◽  
David Stanković ◽  
Lucia Muggia ◽  
...  

Abstract DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent an identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, the most widely used bioinformatic pipeline for metabarcoding analysis. As curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step towards a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful in any research dealing with sequence clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS.
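The idea of choosing a cluster representative according to the taxonomic composition of the cluster can be sketched as follows. This is not the bc4q implementation: the function name, the input layout and the majority-vote rule are assumptions made purely for illustration.

```python
from collections import Counter


def pick_representative(cluster):
    """Return the ID of the first member carrying the cluster's most common taxon.

    `cluster` is a list of (sequence_id, taxon_label) tuples (hypothetical layout).
    Ties between taxa are resolved in favour of the taxon seen first.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one sequence")
    counts = Counter(taxon for _, taxon in cluster)
    majority_taxon, _ = counts.most_common(1)[0]
    for seq_id, taxon in cluster:
        if taxon == majority_taxon:
            return seq_id


# Hypothetical cluster: two of the three members are labelled Fagus.
members = [("seq1", "Quercus"), ("seq2", "Fagus"), ("seq3", "Fagus")]
print(pick_representative(members))
```

A default centroid-based choice could have returned seq1 here, so that the cluster would carry a label shared by only a minority of its members.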


2010 ◽  
Vol 16 (2) ◽  
pp. 254-265 ◽  
Author(s):  
Žilvinas Stankevičius ◽  
Giedrė Beconytė ◽  
Aušra Kalantaitė

A unified geo‐reference data model is a very important part of national geographic information management. It was developed within the project of the Lithuanian geographic information infrastructure in 2006–2008. This model allows automated integration of large-scale (mainly municipal) geo‐reference data into the unified national geo‐reference database. It is based on unique object identifiers across all geo‐reference databases and on standard update and harmonisation procedures. The common stages of harmonisation of geo‐reference databases at different scales include: implementation of a unique identifier of geographic objects across all databases concerned; definition of the life cycle of the objects; definition of the cohesion boundary and of the harmonisation points along the boundary; maintenance of the local database and automatic update of the national database using a special service. When implemented, such a model will significantly facilitate maintenance of the national geo‐reference database and, within five years of full implementation, will have a significant economic effect. Summary An analysis of the spatial data collected by municipalities in Lithuania showed that only the municipalities of larger cities collect spatial data, and that the structures of these data differ. The spatial databases being developed at the national level are not harmonised with one another, and the process of collecting spatial data is duplicated because it is oriented towards the production of maps at different scales. The unified geo‐reference data model (VGDM) covers the conversion of geo‐reference data from official geographic datasets at various scales, and especially from municipal geo‐reference datasets, into the unified national geo‐reference database (VGDB), as well as procedures for continuous updating of the VGDB. The VGDB update technology is based on the life cycle of geo‐objects (elements of vector geographic data) and on change tracking.
The geo‐reference data model means that a path has been laid out towards effective interoperability of official databases at different scales.
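The update mechanism described in this abstract, in which unique identifiers and an object life cycle drive automatic updates of the national database, might look roughly like the sketch below. The record layout, the action names ("create", "update", "retire") and the function name are hypothetical; the abstract does not specify them.

```python
def apply_changes(national, changes):
    """Return a copy of the national database with local change records applied.

    `national` maps a unique object ID to an attribute dict.
    Each change is a dict with "id", "action" and, for create/update, "object".
    Retired objects are kept and marked, so their life cycle stays traceable.
    """
    updated = {oid: dict(obj) for oid, obj in national.items()}
    for change in changes:
        oid = change["id"]
        action = change["action"]
        if action in ("create", "update"):
            updated[oid] = {**change["object"], "status": "active"}
        elif action == "retire":
            if oid in updated:
                updated[oid]["status"] = "retired"
        else:
            raise ValueError(f"unknown action: {action!r}")
    return updated


# Hypothetical identifiers and objects.
national_db = {"LT-001": {"name": "Bridge", "status": "active"}}
local_changes = [
    {"id": "LT-002", "action": "create", "object": {"name": "Road"}},
    {"id": "LT-001", "action": "retire"},
]
print(apply_changes(national_db, local_changes))
```

Because every record is keyed by the same identifier in the local and national databases, no geometric matching is needed to decide which national object a local change refers to.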


Author(s):  
N. Soyama ◽  
K. Muramatsu ◽  
M. Daigo ◽  
F. Ochiai ◽  
N. Fujiwara

Validating the accuracy of land cover products using a reliable reference dataset is an important task. A reliable reference dataset is produced with information derived from ground truth data. Recently, the amount of ground truth data derived from information collected by volunteers has been increasing globally. The acquisition of volunteer-based reference data demonstrates great potential. However, the information given by volunteers contains only limited useful vegetation information for producing a complete reference dataset based on the plant functional type (PFT) with five specialized forest classes. In this study, we examined the availability and applicability of FLUXNET information to produce reference data with higher levels of reliability. FLUXNET information was especially useful for interpreting forest classes, in comparison with the reference dataset built from information given by volunteers.


Atmosphere ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 45
Author(s):  
Angelina Metaxatos ◽  
Sydonia Manibusan ◽  
Gediminas Mainelis

We characterized the composition, diversity, and potential bacterial aerosol sources in Athens’ urban air by DNA barcoding (analysis of 16S rRNA genes) during three seasons in 2019. Air samples were collected using the recently developed Rutgers Electrostatic Passive Sampler (REPS). It is the first field application of REPS to study bacterial aerosol diversity. REPS samplers captured a sufficient amount of biological material to demonstrate the diversity of airborne bacteria and their variability over time. Overall, in the air of Athens, we detected 793 operational taxonomic units (OTUs), which were fully classified into the six distinct taxonomic categories (Phylum, Class, Order, etc.). These OTUs belonged to Phyla Actinobacteria, Firmicutes, Proteobacteria, Bacteroidetes, Cyanobacteria, and Fusobacteria. We found a complex community of bacterial aerosols with several opportunistic or potential pathogens in Athens’ urban air. Referring to the available literature, we discuss the likely sources of observed airborne bacteria, including soil, plants, animals, and humans. Our results on bacterial diversity are comparable to earlier studies, even though the sampling sites are different or geographically distant. However, the exact functional and ecological role of bioaerosols and, even more importantly, their impact on public health and the ecosystem requires further air monitoring and analysis.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Wenjing Ma ◽  
Kenong Su ◽  
Hao Wu

Abstract Background Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. Results In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. Conclusions Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping).
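To illustrate what F-test feature selection computes, the sketch below evaluates a one-way ANOVA F statistic for a single gene across cell types. It assumes that "F-test" here means the one-way ANOVA F statistic, which the abstract does not state explicitly; the function name and expression values are invented, and the code is not taken from the authors' repository.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` is a list of lists, one list of expression values per cell type.
    Requires at least two groups and more observations than groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression of one gene in two cell types.
print(f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
```

Genes would then be ranked by this statistic, and the highest-scoring ones kept as input features for the classifier.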


2021 ◽  
Vol 4 ◽  
Author(s):  
Arne Beermann ◽  
Dominik Buchner ◽  
Florian Leese ◽  
Till-Hendrik Macher ◽  
Miroslav Ocadlik ◽  
...  

The Joint Danube Survey (JDS) is a multinational effort in monitoring the Danube’s water quality, including its major tributaries. The Danube river stretches over a distance of 2,800 km and flows through or borders 10 different countries, to which it is of utmost importance as a source of potable water and hydrodynamic power. The JDS is conducted every 6 years and provides a unique opportunity to collect comprehensive data on both abiotic parameters and organisms and to raise awareness of the importance of water as a natural resource. As part of JDS and as a biological quality element in many monitoring programs worldwide, macroinvertebrates are monitored as indicators for various environmental conditions. However, due to their diverse taxonomic composition, associated difficulties with their morphology-based identification as well as their sheer abundance, macroinvertebrates are often analysed with a low taxonomic resolution (i.e., above species level). As an alternative, DNA metabarcoding offers a promising approach to capture this species diversity more accurately. Here, we used DNA metabarcoding to investigate the macrozoobenthic diversity of 46 sites from the latest JDS sampling campaign in 2019. To analyse macroinvertebrate diversity, bulk samples were taken by kick-net sampling and analysed using two different approaches: analysing the bulk sample fixative and analysing homogenised organisms from complete bulk samples. DNA metabarcoding of the sample fixative revealed 1,146 Operational Taxonomic Units (OTUs) and 231 species compared to 833 OTUs and 333 species from homogenised sample analysis. While more dipterans, in particular Chironomidae, were detected in fixative (136 species) than homogenised bulk (90 species) analyses, the latter picked up more Trichoptera (19 vs. 2), Amphipoda (10 vs. 4) and Bivalvia species (13 vs. 5).
Even though these results of a DNA-based assessment already deliver new insights into the species richness and composition of the Danube’s macroinvertebrate communities from its source to its delta, it is evident that the majority of OTUs were not assigned to species. While filling this gap in reference sequences poses a major challenge, the JDS consortium also offers a unique opportunity to complement reference databases in a multinational effort towards a more comprehensive Danube assessment and monitoring.


2014 ◽  
Author(s):  
Benjamin Bomfleur ◽  
Guido W Grimm ◽  
Stephen McLoughlin

The systematic classification of Osmundaceae has long remained controversial. Recent molecular data indicate that Osmunda is paraphyletic, and needs to be separated into Osmundastrum and Osmunda s. str. Here we describe an exquisitely preserved Jurassic Osmunda rhizome (O. pulchella sp. nov.) that combines diagnostic features of Osmundastrum and Osmunda, calling molecular evidence for paraphyly into question. We assembled a new morphological matrix based on rhizome anatomy, and used network analyses to establish phylogenetic relationships between fossil and extant members of modern Osmundaceae. We re-analysed the original molecular data to evaluate root-placement support. Finally, we integrated morphological and molecular data-sets using the evolutionary placement algorithm. Osmunda pulchella and five additional, newly identified Jurassic Osmunda species show anatomical character suites intermediate between Osmundastrum and Osmunda. Molecular evidence for paraphyly is ambiguous: a previously unrecognized signal from spacer sequences favours an alternative root placement that would resolve Osmunda s.l. as monophyletic. Our evolutionary placement analysis identifies fossil species as ancestral members of modern genera and subgenera. Altogether, the seemingly conflicting evidence from morphological, anatomical, molecular, and palaeontological data can be elegantly reconciled under the assumption that Osmunda is indeed monophyletic; the recently proposed root-placement in Osmundaceae—based solely on molecular data—likely results from un- or misinformative out-group signals.


Author(s):  
Çaglar Bayık ◽  
Kazimierz Becek ◽  
Çetin Mekik ◽  
Mustafa Özendi

The digital elevation model (DEM) is one of the key geospatial datasets used in many fields of engineering and science for countless applications. In this contribution, we assess the vertical accuracy of the Advanced Land Observing Satellite (ALOS) World 3D-30m (AW3D30) DEM using the runway method (RWYM). The RWYM utilizes the longitudinal profiles of runways, which are reliable and ubiquitous reference data. The reference dataset used in this project consists of 36 runways located at various points throughout the world. The same dataset was previously used to test the accuracy of WorldDEM™. Our study indicates that AW3D30 has a remarkably low RMSE of 1.78 m (one σ). However, while analyzing the results, it has become apparent that it also contains a widespread elevation anomaly. We conclude that this anomaly is the result of uncompensated sensor noise and the data processing algorithm (downsampling of the higher resolution data). We believe that this issue should be communicated to the user community. Also, we would like to note that the traditional accuracy assessment of a DEM, e.g., statistical assessment of the elevation differences = model – reference, does not allow for identification of this type of anomaly in a DEM.
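The traditional assessment mentioned at the end of this abstract reduces the elevation differences (model minus reference) to a single statistic such as the RMSE. A minimal sketch follows; the function name and the sample elevations are illustrative assumptions and are not data from the study.

```python
import math


def rmse(model, reference):
    """Root-mean-square error of elevation differences (model - reference)."""
    if len(model) != len(reference) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [m - r for m, r in zip(model, reference)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical elevations in metres along one runway profile.
dem_heights = [101.0, 102.0, 103.0]
runway_heights = [100.0, 102.0, 105.0]
print(f"RMSE = {rmse(dem_heights, runway_heights):.3f} m")
```

Because the differences are squared and averaged, their spatial ordering is lost, which is consistent with the abstract's remark that such a statistic cannot reveal a systematic elevation anomaly.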

