pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive

F1000Research ◽

10.12688/f1000research.18676.1 ◽

2019 ◽

Vol 8 ◽

pp. 532 ◽

Cited By ~ 2

Author(s):

Saket Choudhary

Keyword(s):

Research Community ◽

Command Line ◽

Next Generation ◽

Multiple Use ◽

Sequencing Data ◽

Sequence Read Archive ◽

Python Package ◽

Generation Sequencing ◽

NCBI Sequence Read Archive

The NCBI Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility and to provide avenues for testing novel hypotheses on publicly available data. However, methods to programmatically access this data are limited. We introduce the Python package, pysradb, which provides a collection of command line methods to query and download metadata and data from SRA, utilizing the curated metadata database available through the SRAdb project. We demonstrate the utility of pysradb on multiple use cases for searching and downloading SRA datasets. It is available freely at https://github.com/saketkc/pysradb.

Methods for analyzing next-generation sequencing data II. From graphical user interface to command line interface

Japanese Journal of Lactic Acid Bacteria ◽

10.4109/jslab.25.166 ◽

2014 ◽

Vol 25 (3) ◽

pp. 166-174

Author(s):

Jianqiang Sun ◽

Min Tang ◽

Tasuku Nishioka ◽

Kentaro Shimizu ◽

Koji Kadota

Keyword(s):

User Interface ◽

Graphical User Interface ◽

Command Line ◽

Next Generation ◽

Sequencing Data ◽

Command Line Interface

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718272765.793499663 ◽

2014

Author(s):

Gary Bader ◽

Mohamed Helmy

Keyword(s):

Network Analysis ◽

Cancer Genes ◽

Next Generation ◽

Sequencing Data

Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727775566.793536095 ◽

2017

Author(s):

Steve Pereira

Keyword(s):

Pancreatic Cancer ◽

Precision Medicine ◽

Next Generation ◽

Sequencing Data ◽

Assisted Analysis

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation ◽

Sequencing Data ◽

NGS Data

Estimation of relatedness between pairs of individuals is important in many areas of genetic research. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all require genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.

recoup: flexible and versatile signal visualization from next generation sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03902-x ◽

2021 ◽

Vol 22 (1)

Author(s):

Panagiotis Moulos

Keyword(s):

Special Focus ◽

Next Generation ◽

Sequencing Data ◽

User Friendliness ◽

Computational Environment ◽

Level Data ◽

Data Signal

Background: The relentless continuing emergence of new genomic sequencing protocols and the resulting generation of ever larger datasets continue to challenge the meaningful summarization and visualization of the underlying signal generated to answer important qualitative and quantitative biological questions. As a result, the need for novel software able to reliably produce quick, comprehensive, and easily repeatable genomic signal visualizations in a user-friendly manner is rapidly re-emerging.

Results: recoup is a Bioconductor package for quick, flexible, versatile, and accurate visualization of genomic coverage profiles generated from Next Generation Sequencing data. Coupled with a database of precalculated genomic regions for multiple organisms, recoup offers processing mechanisms for quick, efficient, and multi-level data interrogation with minimal effort, while at the same time creating publication-quality visualizations. Special focus is given on plot reusability, reproducibility, and real-time exploration and formatting options, operations rarely supported in similar visualization tools in a profound way. recoup was assessed using several qualitative user metrics and found to balance the tradeoff between important package features, including speed, visualization quality, overall friendliness, and the reusability of the results with minimal additional calculations.

Conclusion: While some existing solutions for the comprehensive visualization of NGS data signal offer satisfying results, they are often compromised regarding issues such as effortless tracking of processing and preparation steps under a common computational environment, visualization quality and user friendliness. recoup is a unique package presenting a balanced tradeoff for a combination of assessment criteria while remaining fast and friendly.