PIMBA: a PIpeline for MetaBarcoding Analysis

DNA metabarcoding is an emerging monitoring method capable of assessing biodiversity from environmental DNA (eDNA) samples. The growing volume of next-generation sequencing data has required advances in computational tools. Tools for DNA metabarcoding analysis, such as MOTHUR, QIIME, Obitools, and mBRAVE, have been widely used in ecological studies. However, using custom databases with these tools can be difficult. Here we present PIMBA, a PIpeline for MetaBarcoding Analysis, which allows the use of customized databases as well as the reference databases used by the software mentioned above. PIMBA is an open-source, user-friendly pipeline that consolidates all analyses into just three command lines.

Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis

Proceedings of the National Academy of Sciences, 2018, Vol 116 (3), pp. 950-959. DOI: 10.1073/pnas.1808403116. Cited by ~16.

Author(s): Patrick Maffucci, Benedetta Bigio, Franck Rapaport, Aurélie Cobat, Alessandro Borghesi, ...

Keyword(s): Reference Genome, Low Complexity, Next Generation Sequencing Data, Sequencing Data, Human Patient, Reference Genome Assembly, Exome Analysis, User Friendly, Computational Analyses, Generation Sequencing

Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient’s exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11–65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.
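The blacklist idea in the abstract above can be illustrated with a minimal Python sketch. It is not ReFiNE's implementation: the function names, the representation of a variant as a plain string key, and the two thresholds (`min_cohort_freq`, `max_public_freq`) are all assumptions chosen for illustration. The sketch flags variants that are frequent in an in-house cohort but absent or rare in a public frequency table, then filters one exome against that list.

```python
from collections import Counter

def build_blacklist(cohort, public_freq, min_cohort_freq=0.01, max_public_freq=0.001):
    """Return variants frequent in the cohort but absent/rare in public data.

    cohort      : list of per-exome variant collections (hypothetical string keys)
    public_freq : dict mapping variant -> public allele frequency (missing = absent)
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    n_exomes = len(cohort)
    # Count each variant at most once per exome.
    carriers = Counter(v for exome in cohort for v in set(exome))
    return {
        v for v, count in carriers.items()
        if count / n_exomes >= min_cohort_freq
        and public_freq.get(v, 0.0) <= max_public_freq
    }

def apply_blacklist(exome, blacklist):
    """Drop blacklisted variants from one exome, keeping the original order."""
    return [v for v in exome if v not in blacklist]

# Toy data: "1:100:A>G" is carried by every exome yet absent from public data.
cohort = [
    ["1:100:A>G", "2:200:C>T"],
    ["1:100:A>G"],
    ["1:100:A>G", "3:300:G>A"],
    ["1:100:A>G", "4:400:T>C"],
]
public_freq = {"2:200:C>T": 0.2}
blacklist = build_blacklist(cohort, public_freq, min_cohort_freq=0.5)
filtered = apply_blacklist(cohort[0], blacklist)
```

With these toy inputs only the variant shared by all four exomes passes both thresholds, so it is the only one removed from the first exome.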

SAVE: A secure cloud-based pipeline for CRISPR pooled screen deconvolution

2017. DOI: 10.1101/110262. Cited by ~2.

Author(s): Hyun-Hwan Jeong, Seon Young Kim, Maxime WC Rosseaux, Huda Y Zoghbi, Zhandong Liu

Keyword(s): Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Analysis Pipeline, Web Based, Screening Experiments, Accelerate Development, User Friendly, Generation Sequencing

We present a user-friendly, cloud-based data analysis pipeline for the deconvolution of pooled screening data. This tool, termed SAVE (Screening Analysis Visual Explorer), serves the dual purpose of extracting, clustering, and analyzing raw next-generation sequencing files derived from pooled screening experiments while presenting the results in a user-friendly way on a secure web-based platform. Moreover, SAVE serves as a useful web-based pipeline for reanalysis of pooled CRISPR screening datasets. Taken together, the framework described in this study is expected to accelerate the development of web-based bioinformatics tools for studies that include next-generation sequencing data. SAVE is available at http://save.nrihub.org.

Disclosing the Nature of Computational Tools for the Analysis of Next Generation Sequencing Data

Current Topics in Medicinal Chemistry, 2012, Vol 12 (12), pp. 1320-1330. DOI: 10.2174/156802612801319007. Cited by ~7.

Author(s): Francesca Cordero, Marco Beccuti, Susanna Donatelli, Raffaele A. Calogero

Keyword(s): Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Computational Tools, Generation Sequencing

Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

BMC Bioinformatics, 2013, Vol 14 (S11). DOI: 10.1186/1471-2105-14-s11-s1. Cited by ~287.

Author(s): Min Zhao, Qingguo Wang, Quan Wang, Peilin Jia, Zhongming Zhao

Keyword(s): Next Generation Sequencing, Copy Number Variation, Copy Number, Next Generation Sequencing Data, Next Generation, Sequencing Data, Computational Tools, Number Variation, Generation Sequencing, CNV Detection

VDAP-GUI: a user-friendly pipeline for variant discovery and annotation of raw next-generation sequencing data

3 Biotech, 2016, Vol 6 (1). DOI: 10.1007/s13205-016-0382-1. Cited by ~5.

Author(s): Ramesh Menon, Namrata V. Patel, Amitbikram Mohapatra, Chaitanya G. Joshi

Keyword(s): Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Variant Discovery, User Friendly, Generation Sequencing

SIRA-HIV: A User-friendly System to Evaluate HIV-1 Drug Resistance from Next-generation Sequencing Data

Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies, 2020. DOI: 10.5220/0008874700930100.

Author(s): Letícia Raposo, Mônica Arruda, Rodrigo Brindeiro, Flavio Nobre

Keyword(s): Drug Resistance, Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, User Friendly, Generation Sequencing, HIV-1

GeMSTONE: Orchestrated Prioritization of Human Germline Mutations in the Cloud

2016. DOI: 10.1101/052001.

Author(s): Siwei Chen, Juan Felipe Beltrán, Xiaomu Wei, Steven Lipkin, Clara Esteban-Jurado, ...

Keyword(s): Next Generation Sequencing Data, Sequencing Data, Variant Prioritization, Computational Tools, Exome Sequencing Data, Comprehensive Collection, High Level, Scoring Tool, Data Libraries, Generation Sequencing

Integrative analysis of whole-genome/exome-sequencing data has been challenging, especially for the non-programming research community, as it requires leveraging an inordinate number of computational tools. Even computational biologists find it unexpectedly difficult to reproduce results from others or optimize their own strategies in an end-to-end workflow. We introduce Germline Mutation Scoring Tool fOr Next-generation sEquencing data (GeMSTONE), a cloud-based variant prioritization tool with high-level customization and a comprehensive collection of bioinformatics tools and data libraries (http://gemstone.yulab.org/). GeMSTONE generates and readily accepts a sharable 'recipe' file for each run to either replicate existing results or analyze new data with identical parameters.

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2014. DOI: 10.3410/f.718272765.793499663.

Author(s): Gary Bader, Mohamed Helmy

Keyword(s): Next Generation Sequencing, Network Analysis, Next Generation Sequencing Data, Cancer Genes, Next Generation, Sequencing Data, Generation Sequencing

Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2017. DOI: 10.3410/f.727775566.793536095.

Author(s): Steve Pereira

Keyword(s): Pancreatic Cancer, Next Generation Sequencing, Precision Medicine, Next Generation Sequencing Data, Next Generation, Sequencing Data, Assisted Analysis, Generation Sequencing

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genomes|Genetics, 2021. DOI: 10.1093/g3journal/jkab174.

Author(s): Anne Krogh Nøhr, Kristian Hanghøj, Genis Garcia Erill, Zilong Li, Ida Moltke, ...

Keyword(s): Next Generation Sequencing, Genetic Research, Likelihood Estimation, Software Tool, Estimation Methods, Next Generation Sequencing Data, Next Generation, Sequencing Data, NGS Data, Generation Sequencing

Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all take genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
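To illustrate why genotype likelihoods matter at low depth, the following Python sketch computes them for a single biallelic site under a simple binomial read-sampling model with a fixed per-read error rate. This is a textbook-style simplification assumed here for illustration. It is not NGSremix's model or code, and the function names and the default error rate of 0.01 are hypothetical.

```python
from math import comb

def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return [P(reads | g)] for g = 0, 1, 2 copies of the alternate allele.

    Assumes independent reads and a symmetric per-read error rate
    (both simplifying assumptions made for this sketch).
    """
    depth = n_ref + n_alt
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Probability that a single read shows the alternate allele.
        p_alt = frac_alt * (1 - error) + (1 - frac_alt) * error
        likelihoods.append(comb(depth, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref)
    return likelihoods

def normalize(likelihoods):
    """Scale likelihoods to sum to 1 (equivalent to assuming a flat genotype prior)."""
    total = sum(likelihoods)
    return [x / total for x in likelihoods]

# Two reads, both showing the reference allele: a heterozygote is still plausible.
low_depth = normalize(genotype_likelihoods(n_ref=2, n_alt=0))
# Twenty reference reads: the heterozygote is effectively ruled out.
high_depth = normalize(genotype_likelihoods(n_ref=20, n_alt=0))
```

At a depth of two, the heterozygous genotype keeps roughly a fifth of the normalized weight, so a hard genotype call would discard real uncertainty. Carrying all three likelihoods forward retains it.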
