Communicating Regulatory High Throughput Sequencing Data Using BioCompute Objects

2020 ◽  
Author(s):  
Charles Hadley S. King ◽  
Jonathon Keeney ◽  
Nuria Guimera ◽  
Souvik Das ◽  
Brian Fochtman ◽  
...  

Abstract For regulatory submissions of next-generation sequencing (NGS) data, it is vital for the analysis workflow to be robust, reproducible, and understandable. This project demonstrates that use of the IEEE 2791-2020 standard (BioCompute Objects [BCOs]) enables complete and concise communication of NGS data analysis results. One arm of a clinical trial was replicated using synthetically generated data made to resemble real biological data. Two separate, independent analyses were then carried out using BCOs as the tool for communicating the analysis: one to simulate a pharmaceutical regulatory submission to the FDA, and another to simulate the FDA review. The two results were compared and tabulated for concordance analysis: of the 118 simulated patient samples generated, the final results of 117 (99.15%) were in agreement. This high concordance rate demonstrates the ability of a BCO, when a verification kit is included, to effectively capture and clearly communicate NGS analyses within regulatory submissions. BCOs promote transparency and reproducibility, thereby reinforcing trust in the regulatory submission process.
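To make the standard concrete, the sketch below builds a minimal BCO-like record as JSON. It shows only a subset of the IEEE 2791-2020 domains, and every field value (identifiers, script names, tool versions, filenames) is an invented placeholder, not content from the study described above.

```python
import json

# A minimal sketch of a BioCompute Object in the spirit of IEEE 2791-2020.
# Only a subset of the standard's domains is shown; all values are placeholders.
bco = {
    "object_id": "https://example.org/BCO_000001",  # hypothetical identifier
    "spec_version": "https://w3id.org/ieee/ieee-2791-schema/2791object.json",
    "provenance_domain": {
        "name": "Simulated regulatory NGS analysis",
        "version": "1.0",
        "contributors": [{"name": "Analyst A", "contribution": ["createdBy"]}],
    },
    "description_domain": {
        # Pipeline steps describe the analysis so a reviewer can reproduce it
        "pipeline_steps": [
            {"step_number": 1, "name": "read trimming"},
            {"step_number": 2, "name": "alignment"},
            {"step_number": 3, "name": "variant calling"},
        ]
    },
    "execution_domain": {
        "script": ["run_pipeline.sh"],  # hypothetical entry point
        "software_prerequisites": [{"name": "bwa", "version": "0.7.17"}],
    },
    "io_domain": {
        "input_subdomain": [{"uri": {"filename": "sample_R1.fastq.gz"}}],
        "output_subdomain": [{"uri": {"filename": "variants.vcf"}}],
    },
}

# Serializing to JSON is what makes the object portable between submitter
# and reviewer; a round-trip should preserve every domain.
serialized = json.dumps(bco, indent=2)
restored = json.loads(serialized)
```

The point of the structure is that the analysis description travels as data: a reviewer can diff two BCOs field by field rather than re-reading free-text methods sections.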

Cancer Informatics ◽  
2015 ◽  
Vol 14 (Suppl. 5) ◽  
pp. CIN.S30793 ◽  
Author(s):  
Jian Li ◽  
Aarif Mohamed Nazeer Batcha ◽  
Björn Grüning ◽  
Ulrich R. Mansmann

Next-generation sequencing (NGS) technologies, which have advanced rapidly in the past few years, possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decisions on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map of the genetic basis of disease and improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS faces several critical challenges, including reducing sequencing cost, enhancing sequencing quality, improving technical simplicity and reliability, and developing a semiautomated and integrated analysis workflow. To address these challenges, we conducted a literature review and summarized a four-stage NGS workflow, providing a systematic review of NGS-based analysis, explaining the strengths and weaknesses of diverse NGS-based software tools, and elucidating the workflow's potential connection to individualized medicine. By presenting this four-stage workflow, we aim to provide the minimal structural layout required for NGS data storage and reproducibility.
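A staged workflow of this kind can be sketched as a chain of functions. The abstract does not name the four stages, so the decomposition below (quality control, alignment, variant calling, annotation) is one common choice, and the tools named in comments (Trimmomatic, BWA-MEM, GATK, VEP) are illustrative stand-ins; the filtering logic is mock logic, not a real pipeline.

```python
# Toy four-stage NGS workflow: each stage consumes the previous stage's
# output, which is the minimal structural layout the review argues for.

def quality_control(reads):
    # Stage 1: discard short reads (real pipelines use e.g. Trimmomatic).
    return [r for r in reads if len(r) >= 30]

def align(reads, reference="GRCh38"):
    # Stage 2: map each read to a reference (e.g. BWA-MEM); mocked here
    # by pairing each read with a fake position.
    return [{"read": r, "ref": reference, "pos": i} for i, r in enumerate(reads)]

def call_variants(alignments):
    # Stage 3: identify variant-supporting reads (e.g. GATK); mocked by
    # flagging reads that contain an ambiguous base 'N'.
    return [a for a in alignments if "N" in a["read"]]

def annotate(variants):
    # Stage 4: attach functional annotation (e.g. VEP); mocked label.
    return [dict(v, effect="unknown") for v in variants]

reads = ["ACGT" * 10, "ACGTN" * 8, "ACG"]  # third read fails QC
result = annotate(call_variants(align(quality_control(reads))))
```

Keeping each stage's inputs and outputs explicit is what makes the layout storable and reproducible: any stage can be rerun in isolation from its predecessor's files.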


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3406 ◽  
Author(s):  
Koji Ishiya ◽  
Shintaroh Ueda

Recent rapid advances in high-throughput, next-generation sequencing (NGS) technologies have promoted mitochondrial genome studies in the fields of human evolution, medical genetics, and forensic casework. However, scientists unfamiliar with computer programming often find it difficult to handle the massive volumes of data generated by NGS. To address this limitation, we developed MitoSuite, a user-friendly graphical tool for analyzing high-throughput sequencing data for the human mitochondrial genome. MitoSuite generates a visual report on NGS data with simple mouse operations. Moreover, it analyzes high-coverage sequencing data while running on a stand-alone computer, without the need for file upload. MitoSuite therefore offers outstanding usability for handling massive NGS datasets and is ideal for evolutionary, clinical, and forensic studies of human mitochondrial genome variation. It is freely available for download from https://mitosuite.com.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taylor Reiter ◽  
Phillip T Brooks† ◽  
Luiz Irber† ◽  
Shannon E K Joslin† ◽  
Charles M Reid† ◽  
...  

Abstract As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
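The core idea behind the data-centric workflow systems described above (e.g. Snakemake, Nextflow) can be shown in a few lines: each step declares the file it produces, and a step is re-run only when that file does not yet exist, so an interrupted analysis resumes rather than restarts. The runner below is a deliberately naive sketch of that behavior, not any real engine's API.

```python
import os
import tempfile

def run_step(output_path, action, ran):
    # Skip the step if its declared output already exists, exactly the
    # incremental behavior workflow engines provide (real engines also
    # check input timestamps, parameters, and code versions).
    if os.path.exists(output_path):
        return
    action(output_path)
    ran.append(output_path)

def write(path):
    # Stand-in for a real analysis tool producing an output file.
    with open(path, "w") as fh:
        fh.write("data")

with tempfile.TemporaryDirectory() as d:
    trimmed = os.path.join(d, "trimmed.fq")
    aligned = os.path.join(d, "aligned.bam")
    ran = []
    run_step(trimmed, write, ran)   # first pass: both steps execute
    run_step(aligned, write, ran)
    first_pass = list(ran)
    run_step(trimmed, write, ran)   # second pass: output exists, so no work
    second_pass_new = len(ran) - len(first_pass)
```

Production systems layer conditional execution, software environments, and cluster dispatch on top of this same output-driven skeleton, which is why they scale to the hundreds or thousands of intermediate files the abstract mentions.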


2013 ◽  
Vol 61 (4) ◽  
pp. 989-992 ◽  
Author(s):  
W. Frohmberg ◽  
M. Kierzynka ◽  
J. Blazewicz ◽  
P. Gawron ◽  
P. Wojciechowski

Abstract DNA/RNA sequencing has recently become a primary way for researchers to generate biological data for further analysis. Assembly algorithms are an integral part of this process, and some of them require pairwise alignment to be applied to a large number of reads. Although several efficient alignment tools have been released over the past few years, including some taking advantage of GPUs (graphics processing units), none of them directly targets high-throughput sequencing data. As a result, a need arose for software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (giga cell updates per second) on a single GPU, making it the fastest tool in its class. Moreover, it scales well on multi-GPU systems, including MPI-based computational clusters, where its performance is measured in TCUPS (tera CUPS).
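To make the GCUPS metric concrete, the sketch below is a plain-Python Smith-Waterman local alignment: filling one entry of the dynamic-programming matrix is one "cell update", and GCUPS counts billions of such updates per second. The scoring values (match +2, mismatch −1, gap −2) are illustrative, not G-DNA's parameters, and a GPU aligner would fill these cells massively in parallel rather than in nested loops.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/col = 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):  # each iteration fills one cell: 1 cell update
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment can restart
                          H[i - 1][j - 1] + s,    # diagonal: (mis)match
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best

score = smith_waterman_score("ACACACTA", "AGCACACA")
cells = len("ACACACTA") * len("AGCACACA")  # 64 cell updates for this pair
```

Aligning two 100-base reads costs 10,000 cell updates, so at 89 GCUPS a single GPU sustains on the order of 8.9 million such pairwise alignments per second, which is why the per-cell metric is the standard benchmark for this tool class.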


2021 ◽  
Vol 99 (2) ◽  
Author(s):  
Yuhua Fu ◽  
Pengyu Fan ◽  
Lu Wang ◽  
Ziqiang Shu ◽  
Shilin Zhu ◽  
...  

Abstract Despite the broad variety of available microRNA (miRNA) research tools and methods, their application to the identification, annotation, and target prediction of miRNAs in nonmodel organisms is still limited. In this study, we collected nearly all public sRNA-seq data to improve the annotation of known miRNAs and to identify novel miRNAs that have not yet been annotated in pigs (Sus scrofa). We newly annotated 210 mature sequences in known miRNAs and found that 43 of the known miRNA precursors were problematic owing to redundant or missing annotations or incorrect sequences. We also predicted 811 novel miRNAs with high confidence, twice the current number of known miRNAs for pigs in miRBase. In addition, we proposed a correlation-based strategy to predict target genes for miRNAs using a large amount of sRNA-seq and RNA-seq data. We found that the correlation-based strategy provided additional expression-level evidence compared with traditional target prediction methods. It also identified regulatory pairs controlled by nonbinding sites with a particular pattern, providing rich material for studying the mechanisms by which miRNAs regulate gene expression. In summary, our study improved the annotation of known miRNAs, identified a large number of novel miRNAs, and predicted target genes for all pig miRNAs using massive public data. This large-scale, data-driven strategy is also applicable to other nonmodel organisms with incomplete annotation information.
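The correlation-based strategy rests on a simple expectation: across many matched sRNA-seq and RNA-seq samples, a miRNA that represses a target should be negatively correlated with that target's expression. The sketch below illustrates this with invented expression values and an arbitrary −0.5 cutoff; the paper's actual thresholds and statistics are not given in the abstract.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented expression profiles over six matched samples (log-scale units).
mirna = [8.1, 6.3, 9.0, 5.2, 7.7, 4.9]         # miRNA expression per sample
genes = {
    "GENE_A": [2.0, 4.1, 1.2, 5.0, 2.5, 5.6],  # moves opposite the miRNA
    "GENE_B": [3.3, 2.9, 3.6, 3.1, 3.4, 3.0],  # tracks the miRNA: excluded
}

# Candidate targets: genes strongly anti-correlated with the miRNA.
candidates = [g for g, expr in genes.items() if pearson(mirna, expr) < -0.5]
```

In practice this expression evidence is combined with sequence-based target prediction, since anti-correlation alone cannot distinguish direct binding from downstream regulatory effects.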


2020 ◽  
Vol 49 (D1) ◽  
pp. D877-D883
Author(s):  
Fangzhou Xie ◽  
Shurong Liu ◽  
Junhao Wang ◽  
Jiajia Xuan ◽  
Xiaoqin Zhang ◽  
...  

Abstract Eukaryotic genomes encode thousands of small and large non-coding RNAs (ncRNAs). However, the expression, functions and evolution of these ncRNAs are still largely unknown. In this study, we have updated deepBase to version 3.0 (deepBase v3.0, http://rna.sysu.edu.cn/deepbase3/index.html), an increasingly popular and openly licensed resource that facilitates integrative and interactive display and analysis of the expression, evolution and functions of various ncRNAs by deeply mining thousands of high-throughput sequencing datasets from tissue, tumor and exosome samples. deepBase v3.0 provides the most comprehensive expression atlas of small RNAs and lncRNAs by integrating ∼67,620 datasets from 80 normal tissues and ∼50 cancer tissues. The extracellular patterns of various ncRNAs were profiled to explore their potential as noninvasive biomarkers. Moreover, we constructed survival maps of tRNA-derived RNA fragments (tRFs), miRNAs, snoRNAs and lncRNAs by analyzing data from >45,000 cancer samples and the corresponding clinical information. We also developed interactive web interfaces to analyze the differential expression and biological functions of various ncRNAs in ∼50 cancer types. This update provides a variety of new modules and graphic visualizations to facilitate analyses and exploration of the functions and mechanisms of various types of ncRNAs.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Carlos G. Urzúa-Traslaviña ◽  
Vincent C. Leeuwenburgh ◽  
Arkajyoti Bhattacharya ◽  
Stefan Loipfinger ◽  
Marcel A. T. M. van Vugt ◽  
...  

Abstract The interpretation of high-throughput sequencing data is limited by our incomplete functional understanding of coding and non-coding transcripts. Reliably predicting the function of such transcripts could overcome this limitation. Here we report the use of consensus independent component analysis and a guilt-by-association approach to predict over 23,000 functional groups comprising over 55,000 coding and non-coding transcripts, using publicly available transcriptomic profiles. We show that, compared with principal component analysis, transcriptional components derived from independent component analysis enable more confident functionality predictions, improve predictions when new members are added to gene sets, and are less affected by gene multi-functionality. Predictions generated from human or mouse transcriptomic data are available for exploration in a publicly available web portal.
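The guilt-by-association step can be illustrated with a toy example: each transcript carries a vector of weights over transcriptional components, and an unannotated transcript is linked to a functional gene set when its weight profile resembles the set's average profile. All numbers, gene names, and the 0.9 similarity cutoff below are invented for illustration; the paper's actual component matrices and scoring are far larger and more sophisticated.

```python
def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Invented weights of transcripts over three transcriptional components.
weights = {
    "TP53":     [0.9, 0.1, -0.2],
    "BRCA1":    [0.8, 0.0, -0.1],
    "lncRNA_X": [0.85, 0.05, -0.15],  # unannotated transcript (hypothetical)
    "ALB":      [-0.3, 0.9, 0.2],     # unrelated profile
}
known_set = ["TP53", "BRCA1"]  # transcripts with a known shared function

# Centroid of the known set's component weights.
centroid = [sum(weights[g][k] for g in known_set) / len(known_set)
            for k in range(3)]

# Guilt by association: score every transcript outside the known set.
scores = {g: cosine(weights[g], centroid)
          for g in weights if g not in known_set}
predicted = [g for g, s in scores.items() if s > 0.9]
```

The abstract's observation that independent components work better than principal components maps onto this sketch directly: the prediction is only as good as the component axes along which the profiles are compared.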

