Current practice in plankton metabarcoding: optimization and error management

2019 ◽  
Vol 41 (5) ◽  
pp. 571-582 ◽  
Author(s):  
Luciana F Santoferrara

High-throughput sequencing of a targeted genetic marker is being widely used to analyze biodiversity across taxa and environments. Amid a multitude of exciting findings, scientists have also identified and addressed technical and biological limitations. Improved study designs and alternative sampling, lab and bioinformatic procedures have progressively enhanced data quality, but some problems persist. This article provides a framework to recognize and bypass the main types of errors that can affect metabarcoding data: false negatives, false positives, artifactual variants, disproportions and incomplete or incorrect taxonomic identifications. It is crucial to discern potential error impacts on different ecological parameters (e.g. taxon distribution, community structure, alpha- and beta-diversity), as error management implies compromises and is thus directed by the research question. Synthesis of multiple plankton metabarcoding evaluations (mock sample sequencing or microscope comparisons) shows that high-quality data for qualitative and some semiquantitative goals can be achieved by implementing three checkpoints: first, rigorous protocol optimization; second, error minimization; and third, downstream analysis that considers potentially remaining biases. Conclusions inform us about the reliability of metabarcoding for plankton studies and, because plankton provides unique chances to compare genotypes and phenotypes, the robustness of this method in general.
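As a hedged illustration of the second checkpoint (error minimization), the sketch below removes likely artifactual variants from a variant-by-sample count table using a relative-abundance and occurrence filter. The thresholds and table layout are illustrative assumptions, not values taken from the article.

```python
import pandas as pd

def filter_artifacts(counts: pd.DataFrame,
                     min_rel_abundance: float = 1e-4,
                     min_samples: int = 2) -> pd.DataFrame:
    """Drop sequence variants (rows) that look like artifacts.

    counts: variants x samples table of read counts.
    A variant is kept only if it reaches `min_rel_abundance` of a sample's
    reads in at least `min_samples` samples. Thresholds are illustrative.
    """
    rel = counts.div(counts.sum(axis=0), axis=1)   # per-sample relative abundance
    keep = (rel >= min_rel_abundance).sum(axis=1) >= min_samples
    return counts.loc[keep]

# usage (hypothetical ASV table): filtered = filter_artifacts(asv_table)
```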

2018 ◽  
Author(s):  
Yu Fu ◽  
Pei-Hsuan Wu ◽  
Timothy Beane ◽  
Phillip D. Zamore ◽  
Zhiping Weng

RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule and identical but biologically meaningful reads from different molecules. We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries. Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
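A minimal sketch of the UMI-aware deduplication idea described here: reads sharing both mapping coordinates and a UMI are collapsed into one molecule, while identical coordinates with different UMIs remain separate. The tuple representation of reads is a simplified assumption, not the authors' actual pipeline.

```python
from collections import defaultdict

def dedupe_by_umi(reads):
    """Collapse PCR duplicates: reads sharing mapping position AND UMI are
    counted once; identical coordinates with different UMIs are kept as
    independent molecules.

    `reads` is an iterable of (chrom, pos, strand, umi) tuples -- a
    simplified stand-in for parsed alignments.
    """
    molecules = defaultdict(int)
    for chrom, pos, strand, umi in reads:
        molecules[(chrom, pos, strand, umi)] += 1   # duplicates pile up here
    # one representative per unique molecule
    return list(molecules.keys())

reads = [("chr1", 100, "+", "ACGT"),
         ("chr1", 100, "+", "ACGT"),   # PCR duplicate of the first read
         ("chr1", 100, "+", "TTGA")]   # same position, different molecule
assert len(dedupe_by_umi(reads)) == 2
```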


Author(s):  
Marietta Kokla ◽  
Anton Klåvus ◽  
Stefania Noerman ◽  
Ville M. Koistinen ◽  
Marjo Tuomainen ◽  
...  

Metabolomics analysis generates vast arrays of data, necessitating comprehensive workflows involving expertise in analytics, biochemistry and bioinformatics, in order to provide coherent and high-quality data that enables discovery of robust and biologically significant metabolic findings. In this protocol article, we introduce NoTaMe, an analytical workflow for non-targeted metabolic profiling approaches utilizing liquid chromatography–mass spectrometry analysis. We provide an overview of lab protocols and statistical methods that we commonly practice for the analysis of nutritional metabolomics data. The paper is divided into three main sections: the first and second sections introducing the background and the study designs available for metabolomics research, and the third section describing in detail the steps of the main methods and protocols used to produce, preprocess and statistically analyze metabolomics data, and finally to identify and interpret the compounds that have emerged as interesting.
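notame itself is an R workflow; purely as an illustration of one preprocessing step it covers (QC-based signal-drift correction), here is a minimal Python sketch for a single feature. The polynomial drift model and its degree are assumptions for illustration, not notame's implementation.

```python
import numpy as np

def correct_drift(intensities, injection_order, is_qc, degree=2):
    """Correct within-batch signal drift for one LC-MS feature using QC injections.

    intensities: 1-D array of feature intensities in injection order.
    is_qc: boolean mask marking pooled QC samples.
    A low-order polynomial fitted to the QC intensities versus injection
    order is divided out of every sample; the polynomial form and degree
    are illustrative assumptions.
    """
    coeffs = np.polyfit(injection_order[is_qc], intensities[is_qc], deg=degree)
    trend = np.polyval(coeffs, injection_order)
    return intensities / trend * np.median(intensities[is_qc])
```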


Author(s):  
Koustav Pal ◽  
Ilario Tagliaferri ◽  
Carmen Maria Livi ◽  
Francesco Ferrari

Summary: Genome-wide chromosome conformation capture based on high-throughput sequencing (Hi-C) has been widely adopted to study chromatin architecture by generating datasets of ever-increasing complexity and size. HiCBricks offers user-friendly and efficient solutions for handling large high-resolution Hi-C datasets. The package provides an R/Bioconductor framework with the bricks to build more complex data analysis pipelines and algorithms. HiCBricks already incorporates functions for calling domain boundaries and functions for high-quality data visualization.
Availability and implementation: http://bioconductor.org/packages/devel/bioc/html/HiCBricks.html
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
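HiCBricks is an R/Bioconductor package, so the Python sketch below does not use its API; it only illustrates the insulation-score idea that commonly underlies domain-boundary calling on a Hi-C contact matrix. The window size and the local-minimum rule are illustrative assumptions.

```python
import numpy as np

def insulation_score(contact_matrix, window=10):
    """Insulation score along the diagonal of a Hi-C contact matrix:
    for each bin, average the contacts in a square window straddling it.
    Local minima of this profile are candidate domain boundaries.
    The window size (in bins) is an illustrative choice.
    """
    n = contact_matrix.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window):
        block = contact_matrix[i - window:i, i + 1:i + 1 + window]
        score[i] = block.mean()
    return score

def call_boundaries(score):
    """Bins where the insulation score is a strict local minimum."""
    s = np.nan_to_num(score, nan=np.inf)
    return [i for i in range(1, len(s) - 1) if s[i] < s[i - 1] and s[i] < s[i + 1]]
```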


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Reem M. Sallam

With the introduction of recent high-throughput technologies to various fields of science and medicine, it is becoming clear that obtaining large amounts of data is no longer a problem in modern research laboratories. However, coherent study designs, optimal conditions for obtaining high-quality data, and compelling interpretation, in accordance with the evidence-based systems biology, are critical factors in ensuring the emergence of good science out of these recent technologies. This review focuses on the proteomics field and its new perspectives on cancer research. Cornerstone publications that have tremendously helped scientists and clinicians to better understand cancer pathogenesis; to discover novel diagnostic and/or prognostic biomarkers; and to suggest novel therapeutic targets will be presented. The author of this review aims at presenting some of the relevant literature data that helped as a step forward in bridging the gap between bench work results and bedside potentials. Undeniably, this review cannot include all the work that is being produced by expert research groups all over the world.


Metabolites ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 135 ◽  
Author(s):  
Anton Klåvus ◽  
Marietta Kokla ◽  
Stefania Noerman ◽  
Ville M. Koistinen ◽  
Marjo Tuomainen ◽  
...  

Metabolomics analysis generates vast arrays of data, necessitating comprehensive workflows involving expertise in analytics, biochemistry and bioinformatics in order to provide coherent and high-quality data that enable discovery of robust and biologically significant metabolic findings. In this protocol article, we introduce notame, an analytical workflow for non-targeted metabolic profiling approaches, utilizing liquid chromatography–mass spectrometry analysis. We provide an overview of lab protocols and statistical methods that we commonly practice for the analysis of nutritional metabolomics data. The paper is divided into three main sections: the first and second sections introducing the background and the study designs available for metabolomics research and the third section describing in detail the steps of the main methods and protocols used to produce, preprocess and statistically analyze metabolomics data and, finally, to identify and interpret the compounds that have emerged as interesting.
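As a complement to the drift-correction sketch above, the snippet below illustrates another generic preprocessing step mentioned in the workflow: flagging low-quality LC-MS features by QC-sample variability and detection rate. It is a Python illustration with rule-of-thumb thresholds, not notame's R implementation.

```python
import numpy as np
import pandas as pd

def flag_low_quality_features(data: pd.DataFrame, is_qc: np.ndarray,
                              max_qc_rsd: float = 0.2,
                              min_detection_rate: float = 0.7) -> pd.Series:
    """Flag unreliable LC-MS features (columns of a samples x features table).

    A feature is flagged when its relative standard deviation across QC
    injections exceeds `max_qc_rsd` or it is detected (non-missing) in
    fewer than `min_detection_rate` of samples. Thresholds are common
    rules of thumb, not values prescribed by the notame workflow.
    """
    qc = data.loc[is_qc]
    rsd = qc.std() / qc.mean()
    detection = data.notna().mean()
    return (rsd > max_qc_rsd) | (detection < min_detection_rate)
```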


2021 ◽  
Author(s):  
Olivier J. M. Béquignon ◽  
Brandon J. Bongers ◽  
Willem Jespers ◽  
Ad P. IJzerman ◽  
Bob van de Water ◽  
...  

With the recent rapid growth of publicly available ligand-protein bioactivity data, there is a trove of viable data that can be used to train machine learning algorithms. However, not all data are equal in terms of size and quality, and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge in its own right. In response, we have constructed the Papyrus dataset (DOI: 10.4121/16896406), comprising around 60 million datapoints. This dataset combines multiple large publicly available datasets, such as ChEMBL and ExCAPE-DB, with several smaller datasets containing high-quality data. The aggregated data have been standardised and normalised in a manner suitable for machine learning. We show how the data can be filtered in a variety of ways, and we also perform baseline quantitative structure-activity relationship analyses and proteochemometric modeling. Our ambition is that this pruned data collection will constitute a benchmark set that can be used for constructing predictive models, while also providing a solid baseline for related research.
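A hedged sketch of the kind of filtering and baseline QSAR modelling described: keep only high-quality datapoints for one target and fit a random-forest regressor on precomputed descriptors. The file name and column names ("quality", "target_id", "pchembl_value", "fp_*") are assumptions for illustration, not the dataset's documented schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Column names below are assumed for illustration only.
df = pd.read_csv("papyrus_subset.csv")            # hypothetical pre-extracted slice
df = df[df["quality"] == "high"]                  # keep only high-quality datapoints
df = df[df["target_id"] == "P29274"]              # single-target QSAR example

X = df.filter(like="fp_").values                  # precomputed molecular descriptors
y = df["pchembl_value"].values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("baseline QSAR R2:", r2_score(y_te, model.predict(X_te)))
```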


2020 ◽  
Author(s):  
James McDonagh ◽  
William Swope ◽  
Richard L. Anderson ◽  
Michael Johnston ◽  
David J. Bray

Digitization offers significant opportunities for the formulated product industry to transform the way it works and develop new methods of business. R&D is one area of operation where taking advantage of these technologies is challenging, owing to its high level of domain specialisation and creativity, but the benefits could be significant. Recent developments in base-level technologies such as artificial intelligence (AI)/machine learning (ML), robotics and high-performance computing (HPC), to name a few, present disruptive and transformative capabilities which could offer new insights, discovery methods and enhanced chemical control when combined in a digital ecosystem of connectivity, distributed services and decentralisation. At the fundamental level, research in these technologies has shown that new physical and chemical insights can be gained, which in turn can augment experimental R&D approaches through physics-based chemical simulation, data-driven models and hybrid approaches. In all of these cases, high-quality data are required to build and validate models, in addition to the skills and expertise to exploit such methods. In this article we give an overview of some of the digital technology demonstrators we have developed for formulated product R&D and discuss the challenges in building and deploying these demonstrators.


Author(s):  
Mary Kay Gugerty ◽  
Dean Karlan

Without high-quality data, even the best-designed monitoring and evaluation systems will collapse. Chapter 7 introduces some of the basics of collecting high-quality data and discusses how to address challenges that frequently arise. High-quality data must be clearly defined and have an indicator that validly and reliably measures the intended concept. The chapter then explains how to avoid common biases and measurement errors such as anchoring, social desirability bias, the experimenter demand effect, unclear wording, long recall periods, and translation context. It then guides organizations on how to find indicators, test data collection instruments, manage surveys, and train staff appropriately for data collection and entry.


2021 ◽  
Vol 13 (7) ◽  
pp. 1387
Author(s):  
Chao Li ◽  
Jinhai Zhang

The high-frequency channel of the lunar penetrating radar (LPR) onboard the Yutu-2 rover successfully collected high-quality data on the far side of the Moon, providing a chance to probe the shallow subsurface structures and the thickness of the lunar regolith. However, traditional methods cannot obtain a reliable dielectric permittivity model, especially when diffractions and reflections are strongly mixed, and such a model is essential for understanding and interpreting the composition of lunar subsurface materials. In this paper, we introduce an effective method to construct a reliable velocity model by separating diffractions from reflections and performing focusing analysis on the separated diffractions. We first use the plane-wave destruction method to extract weak-energy diffractions masked by strong reflections, separating the LPR data into two parts: diffractions and reflections. Then, we construct a macro-velocity model of the lunar subsurface by focusing analysis of the separated diffractions. Both synthetic ground-penetrating radar (GPR) and LPR data show that migration of the separated reflections yields much clearer subsurface structures than migration of the unseparated data. Our method produces accurate velocity estimates, which are vital for high-precision migration; in addition, they directly provide solid constraints on the dielectric permittivity at different depths.
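To make the focusing-analysis idea concrete, here is a simplified sketch: a constant-velocity diffraction-summation (Kirchhoff) migration applied to separated diffractions, with a varimax-style focusing measure scanned over candidate permittivities. It is a schematic stand-in, not the authors' plane-wave-destruction-based implementation; grid spacings and the focusing metric are illustrative assumptions.

```python
import numpy as np

C = 0.3  # speed of light in vacuum, m/ns

def kirchhoff_migrate(data, dt, dx, eps_r):
    """Constant-velocity diffraction-summation (Kirchhoff) migration.

    data: (time samples x traces) radargram containing separated diffractions.
    dt: sample interval in ns, dx: trace spacing in m, eps_r: candidate
    relative permittivity. All choices here are illustrative.
    """
    v = C / np.sqrt(eps_r)                     # wave speed in the medium, m/ns
    nt, nx = data.shape
    t = np.arange(nt) * dt
    x = np.arange(nx) * dx
    image = np.zeros_like(data, dtype=float)
    for ix, x0 in enumerate(x):                # image trace position
        for it, t0 in enumerate(t):            # image two-way time
            z = v * t0 / 2.0                   # depth of the image point
            t_d = 2.0 * np.sqrt(z**2 + (x - x0)**2) / v   # diffraction travel times
            idx = np.round(t_d / dt).astype(int)
            valid = idx < nt
            image[it, ix] = data[idx[valid], np.where(valid)[0]].sum()
    return image

def focusing_metric(image):
    """Varimax-style sparseness: larger when diffraction energy collapses to points."""
    e = image.ravel() ** 2
    return (e**2).sum() / (e.sum() ** 2 + 1e-12)

def estimate_permittivity(diffraction_data, dt, dx, eps_candidates):
    """Scan candidate permittivities; keep the one whose migration focuses best."""
    scores = [focusing_metric(kirchhoff_migrate(diffraction_data, dt, dx, e))
              for e in eps_candidates]
    return eps_candidates[int(np.argmax(scores))], scores
```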


Societies ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 65
Author(s):  
Clem Brooks ◽  
Elijah Harter

In an era of rising inequality, the U.S. public’s relatively modest support for redistributive policies has been a puzzle for scholars. Deepening the paradox is recent evidence that presenting information about inequality increases subjects’ support for redistributive policies by only a small amount. What explains inequality information’s limited effects? We extend partisan motivated reasoning scholarship to investigate whether political party identification confounds individuals’ processing of inequality information. Our study considers a much larger number of redistribution preference measures (12) than past scholarship. We offer a second novelty by bringing the dimension of historical time into hypothesis testing. Analyzing high-quality data from four American National Election Studies surveys, we find new evidence that partisanship confounds the interrelationship of inequality information and redistribution preferences. Further, our analyses find the effects of partisanship on redistribution preferences grew in magnitude from 2004 through 2016. We discuss implications for scholarship on information, motivated reasoning, and attitudes towards redistribution.

