A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!

Methods for analyzing next-generation sequencing data II. From graphical user interface to command line interface

Japanese Journal of Lactic Acid Bacteria ◽ 10.4109/jslab.25.166 ◽ 2014 ◽ Vol 25 (3) ◽ pp. 166-174

Author(s): Jianqiang Sun ◽ Min Tang ◽ Tasuku Nishioka ◽ Kentaro Shimizu ◽ Koji Kadota

Keyword(s): User Interface ◽ Next Generation Sequencing ◽ Graphical User Interface ◽ Next Generation Sequencing Data ◽ Command Line ◽ Next Generation ◽ Sequencing Data ◽ Command Line Interface ◽ Generation Sequencing

pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive

F1000Research ◽ 10.12688/f1000research.18676.1 ◽ 2019 ◽ Vol 8 ◽ pp. 532 ◽ Cited By ~ 2

Author(s): Saket Choudhary

Keyword(s): Next Generation Sequencing ◽ Research Community ◽ Command Line ◽ Next Generation ◽ Multiple Use ◽ Sequencing Data ◽ Sequence Read Archive ◽ Python Package ◽ Generation Sequencing ◽ NCBI Sequence Read Archive

The NCBI Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility and to provide avenues for testing novel hypotheses on publicly available data. However, methods to programmatically access this data are limited. We introduce the Python package, pysradb, which provides a collection of command line methods to query and download metadata and data from SRA, utilizing the curated metadata database available through the SRAdb project. We demonstrate the utility of pysradb on multiple use cases for searching and downloading SRA datasets. It is available freely at https://github.com/saketkc/pysradb.

A highly efficient method for extracting next-generation sequencing quality RNA from adipose tissue of recalcitrant animal species

Journal of Cellular Physiology ◽ 10.1002/jcp.25951 ◽ 2017 ◽ Vol 233 (3) ◽ pp. 1971-1974 ◽ Cited By ~ 2

Author(s): Davinder Sharma ◽ Naresh Golla ◽ Dheer Singh ◽ Suneel K. Onteru

Keyword(s): Adipose Tissue ◽ Next Generation Sequencing ◽ Efficient Method ◽ Animal Species ◽ Next Generation ◽ Highly Efficient ◽ Sequencing Quality ◽ Generation Sequencing

Effects of Improved DNA Integrity by Punch From Tissue Blocks as Compared to Pinpoint Extraction From Unstained Slides on Next-Generation Sequencing Quality Metrics

American Journal of Clinical Pathology ◽ 10.1093/ajcp/aqz014 ◽ 2019 ◽ Vol 152 (1) ◽ pp. 27-35 ◽ Cited By ~ 1

Author(s): Diana Morlote ◽ Karen M Janowski ◽ Rance C Siniard ◽ Rong Jun Guo ◽ Thomas Winokur ◽ ...

Keyword(s): Next Generation Sequencing ◽ Quality Metrics ◽ DNA Integrity ◽ Next Generation ◽ Sequencing Quality ◽ Generation Sequencing

Compression of next-generation sequencing quality scores using memetic algorithm

BMC Bioinformatics ◽ 10.1186/1471-2105-15-s15-s10 ◽ 2014 ◽ Vol 15 (Suppl 15) ◽ pp. S10 ◽ Cited By ~ 10

Author(s): Jiarui Zhou ◽ Zhen Ji ◽ Zexuan Zhu ◽ Shan He

Keyword(s): Next Generation Sequencing ◽ Memetic Algorithm ◽ Next Generation ◽ Sequencing Quality ◽ Generation Sequencing

easyfm: An easy software suite for file manipulation of Next Generation Sequencing data on desktops

10.22541/au.163845474.49811073/v1 ◽ 2021

Author(s): Hyungtaek Jung ◽ Brendan Jeon ◽ Daniel Ortiz-Barrientos

Keyword(s): Next Generation Sequencing ◽ Life Sciences ◽ Next Generation Sequencing Data ◽ Command Line ◽ Next Generation ◽ Web Based ◽ File Formats ◽ Wide Range ◽ NGS Data ◽ Generation Sequencing

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet, most methods for analysing NGS data require complex command-line tools in high-performance computing (HPC) or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that can be used to facilitate the rapid discovery of target sequences (or user’s interest) in NGS datasets for novice users (more accessible to biologists). It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm is not dependent on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.

A Comparison of Next Generation Sequencing Quality Metrics in Residual Cytology Material, Cell Blocks and Surgical Biopsy Specimens

Journal of the American Society of Cytopathology ◽ 10.1016/j.jasc.2018.06.010 ◽ 2018 ◽ Vol 7 (5) ◽ pp. S85

Author(s): Deepu Alex ◽ Sumit Middha ◽ Jason Hwee ◽ Oscar Lin

Keyword(s): Next Generation Sequencing ◽ Quality Metrics ◽ Next Generation ◽ Surgical Biopsy ◽ Biopsy Specimens ◽ Sequencing Quality ◽ Generation Sequencing

ForestQC: quality control on genetic variants from next-generation sequencing data using random forest

10.1101/444828 ◽ 2018 ◽ Cited By ~ 2

Author(s): Jiajin Li ◽ Brandon Jew ◽ Lingyu Zhan ◽ Sungoo Hwang ◽ Giovanni Coppola ◽ ...

Keyword(s): Machine Learning ◽ Quality Control ◽ Next Generation Sequencing ◽ Genetic Variants ◽ Next Generation ◽ Sequencing Data ◽ Machine Learning Approach ◽ Sequencing Quality ◽ Generation Sequencing ◽ Filtering Approach

Abstract: Next-generation sequencing technology (NGS) enables discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in sequencing technology or in variant calling algorithms. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. In this paper, we present a statistical approach for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our method uses information on sequencing quality such as sequencing depth, genotyping quality, and GC contents to predict whether a certain variant is likely to contain errors. To evaluate our method, we applied it to two whole-genome sequencing datasets where one dataset consists of related individuals from families while the other consists of unrelated individuals. Results indicate that our method outperforms widely used methods for performing quality control on variants such as VQSR of GATK by considerably improving the quality of variants to be included in the analysis. Our approach is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is an effective approach to perform quality control on genetic variants from sequencing data.

Author Summary: Genetic disorders can be caused by many types of genetic mutations, including common and rare single nucleotide variants, structural variants, insertions and deletions. Nowadays, next generation sequencing (NGS) technology allows us to identify various genetic variants that are associated with diseases. However, variants detected by NGS might have poor sequencing quality due to biases and errors in sequencing technologies and analysis tools. Therefore, it is critical to remove variants with low quality, which could cause spurious findings in follow-up analyses. Previously, people applied either hard filters or machine learning models for variant quality control (QC), which failed to filter out those variants accurately. Here, we developed a statistical tool, ForestQC, for variant QC by combining a filtering approach and a machine learning approach. We applied ForestQC to one family-based whole genome sequencing (WGS) dataset and one general case-control WGS dataset, to evaluate our method. Results show that ForestQC outperforms widely used methods for variant QC by considerably improving the quality of variants. Also, ForestQC is very efficient and scalable to large-scale sequencing datasets. Our study indicates that combining filtering approaches and machine learning approaches enables effective variant QC.
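
The combination of a hard filter with a learned classifier described in the ForestQC abstract can be sketched roughly as follows. This is an illustration only, not ForestQC's code: the three-way split into good, bad and undetermined variants, the threshold values, and every function name are assumptions, and a nearest-centroid rule stands in for the random forest so that the example needs no third-party library. The features (sequencing depth, genotyping quality, GC content) are the ones named in the abstract.

```python
FEATURES = ("depth", "gq", "gc")  # sequencing depth, genotyping quality, GC content


def hard_filter(variant):
    """Label a variant with hypothetical thresholds (not taken from the paper)."""
    if variant["depth"] >= 30 and variant["gq"] >= 60:
        return "good"
    if variant["depth"] < 10 or variant["gq"] < 20:
        return "bad"
    return "undetermined"


def centroid(rows):
    """Mean feature vector of a list of variants."""
    return [sum(r[f] for r in rows) / len(rows) for f in FEATURES]


def squared_distance(variant, center):
    """Squared Euclidean distance; features are left unscaled for simplicity."""
    return sum((variant[f] - c) ** 2 for f, c in zip(FEATURES, center))


def classify_variants(variants):
    """Hard-filter first, then let a classifier decide the undetermined ones."""
    labels = [hard_filter(v) for v in variants]
    good = [v for v, lab in zip(variants, labels) if lab == "good"]
    bad = [v for v, lab in zip(variants, labels) if lab == "bad"]
    if not good or not bad:
        raise ValueError("need at least one good and one bad variant to train on")
    # Stand-in for the random forest: one centroid per filter-labelled class.
    centers = {"good": centroid(good), "bad": centroid(bad)}
    final = []
    for v, lab in zip(variants, labels):
        if lab == "undetermined":
            lab = min(centers, key=lambda name: squared_distance(v, centers[name]))
        final.append(lab)
    return final
```

In practice the stand-in classifier would be replaced by a random forest trained on the filter-labelled variants, and the features would be scaled before any distance-based rule is applied.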

nQuire: A Statistical Framework For Ploidy Estimation Using Next Generation Sequencing

10.1101/143537 ◽ 2017 ◽ Cited By ~ 1

Author(s): Clemens L. Weiß ◽ Marina Pais ◽ Liliana M. Cano ◽ Sophien Kamoun ◽ Hernán A. Burbano

Keyword(s): Next Generation Sequencing ◽ Intraspecific Variation ◽ Ploidy Level ◽ Three Dimensions ◽ Command Line ◽ Next Generation ◽ Statistical Framework ◽ Wide Range ◽ Command Line Tool ◽ Generation Sequencing

Abstract: Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. We demonstrate the utility of nQuire analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the Baker’s yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at github.com/clwgg/nQuire under the MIT license.
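
As a rough illustration of the model-selection idea in the nQuire abstract, the Python sketch below scores base frequencies at variable sites against fixed Gaussian mixtures for diploid, triploid and tetraploid genomes and keeps the model with the highest log-likelihood. It is not nQuire's implementation (which is written in C): the component means, the equal mixture weights, the shared standard deviation of 0.05, and all function names are assumptions chosen for illustration.

```python
import math

# Assumed frequency peaks at biallelic sites for each ploidy level.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, sd=0.05):
    """Log-likelihood of the frequencies under an equal-weight Gaussian mixture."""
    weight = 1.0 / len(means)
    total = 0.0
    for f in freqs:
        density = sum(weight * normal_pdf(f, m, sd) for m in means)
        total += math.log(density)
    return total


def most_plausible_ploidy(freqs, sd=0.05):
    """Return the best-scoring ploidy name and the score of every model."""
    scores = {
        name: log_likelihood(freqs, means, sd)
        for name, means in PLOIDY_MEANS.items()
    }
    return max(scores, key=scores.get), scores


# Example: frequencies clustered near 1/3 and 2/3 favour the triploid model.
best, scores = most_plausible_ploidy([0.31, 0.35, 0.66, 0.69, 0.33, 0.64])
print(best)
```

Fixing the means and the variance keeps the comparison to a single likelihood evaluation per model. A fuller treatment would estimate the mixture parameters from the data and normalize the log-likelihoods, as the abstract describes for population-level clustering.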

An efficient method for extracting next‐generation sequencing quality RNA from liver tissue of recalcitrant animal species

Journal of Cellular Physiology ◽ 10.1002/jcp.28226 ◽ 2019 ◽ Vol 234 (9) ◽ pp. 14405-14412 ◽ Cited By ~ 1

Author(s): Davinder Sharma ◽ Naresh Golla ◽ Sudhakar Singh ◽ Pankaj Kumar Singh ◽ Dheer Singh ◽ ...

Keyword(s): Next Generation Sequencing ◽ Efficient Method ◽ Liver Tissue ◽ Animal Species ◽ Next Generation ◽ Sequencing Quality ◽ Generation Sequencing
