The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy

Abstract
Background Population genetic studies of humans make increasing use of high-throughput sequencing in order to capture diversity in an unbiased way. Sequencing technologies and bioinformatic tools are abundant, and the number of available genomes is increasing. Studies have evaluated and compared some of these technologies and tools, such as the Genome Analysis Toolkit (GATK) and its “Best Practices” bioinformatic pipelines. However, such studies often focus on a few genomes of Eurasian origin in order to detect technical issues. We instead surveyed the use of the GATK tools and established a pipeline for processing high-coverage full genomes from a diverse set of populations, including Sub-Saharan African groups, in order to reveal challenges arising from human diversity and stratification.
Results We surveyed 29 studies using high-throughput sequencing data and compared their strategies for data pre-processing and variant calling. We found that data processing varies widely across studies and that the GATK “Best Practices” are seldom followed strictly. We then compared three versions of a GATK pipeline, which differed in the inclusion of an indel realignment step and in a modification of the base quality score recalibration step. We applied the pipelines to a diverse set of 28 individuals and compared them in terms of the count of called variants and the overlap of the callsets. We found that the pipelines resulted in similar callsets, in particular after callset filtering. We also ran one of the pipelines on a larger dataset of 179 individuals. We noted that including more individuals at the joint genotyping step resulted in different counts of variants. At the individual level, we observed that the average genome coverage was correlated with the number of variants called.
Conclusions We conclude that applying the GATK “Best Practices” pipeline, including its recommended reference datasets, to underrepresented populations does not lead to a decrease in the number of called variants compared to alternative pipelines. We recommend aiming for coverage above 30X when identifying most variants is important, and working with large sample sizes at the variant calling stage, including for underrepresented individuals and populations.

Decision letter for "High throughput sequencing reveals high specificity of TNFAIP3 mutations in ocular adnexal marginal zone B-cell lymphomas"

10.1002/hon.2718/v2/decision1 ◽ 2020

Keyword(s): B Cell ◽ High Throughput ◽ High Throughput Sequencing ◽ Marginal Zone ◽ High Specificity ◽ B Cell Lymphomas

Review for "High throughput sequencing reveals high specificity of TNFAIP3 mutations in ocular adnexal marginal zone B-cell lymphomas"

10.1002/hon.2718/v1/review1 ◽ 2019

Keyword(s): B Cell ◽ High Throughput ◽ High Throughput Sequencing ◽ Marginal Zone ◽ High Specificity ◽ B Cell Lymphomas

Reuse Recipe Document for: Directional high-throughput sequencing of RNAs without gene-specific primers

10.23942/biotechniques.1559152589000 ◽ 2019

Keyword(s): High Throughput ◽ High Throughput Sequencing ◽ Specific Primers

Clinical Application of High-throughput Sequencing Technology for the Diagnosis of Patients With Severe Infection

Case Medical Research ◽ 10.31525/ct1-nct04217252 ◽ 2020

Keyword(s): Clinical Application ◽ High Throughput ◽ High Throughput Sequencing ◽ Severe Infection ◽ Sequencing Technology

Faculty Opinions recommendation of High-throughput sequencing of B- and T-lymphocyte antigen receptors in hematology.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.718037227.793480385 ◽ 2013

Author(s): Jan Starý ◽ Michaela Kotrová

Keyword(s): T Lymphocyte ◽ High Throughput ◽ High Throughput Sequencing ◽ Antigen Receptors ◽ Lymphocyte Antigen

Faculty Opinions recommendation of Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.718061751.793481778 ◽ 2013

Author(s): Hans Lehrach

Keyword(s): Direct Measurement ◽ High Throughput ◽ High Throughput Sequencing ◽ Sequencing Instrument

Faculty Opinions recommendation of Genetic complexity in hypertrophic cardiomyopathy revealed by high-throughput sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.718240415.793489685 ◽ 2014

Author(s): WH Wilson Tang

Keyword(s): Hypertrophic Cardiomyopathy ◽ High Throughput ◽ High Throughput Sequencing ◽ Genetic Complexity

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.726132071.793531014 ◽ 2017

Author(s): Sarah Rowland-Jones ◽ Sophie Andrews

Keyword(s): HIV Infection ◽ High Throughput ◽ High Throughput Sequencing ◽ Sequencing Data ◽ High Throughput Sequencing Data

DETECTION OF ALFACORONAVIRUSES, BETACORONAVIRUSES AND ASTROVIRUSES IN BAT FECAL SAMPLES FROM MOSCOW REGION

Molecular Diagnostics and Biosafety – 2020. Russian national scientific and practical conference with international participation (October 6–8, 2020): Conference Proceedings ◽ 10.36233/978-5-9900432-9-9-42 ◽ 2020

Author(s): E.V. Korneenko ◽ A.E. Samoilov ◽ I.V. Artyushin ◽ M.V. Safonova ◽ ...

Keyword(s): High Throughput ◽ Viral Genome ◽ High Throughput Sequencing ◽ PCR Amplification ◽ Moscow Region ◽ Fecal Samples ◽ Genus Level ◽ Zoonotic Viruses ◽ Reservoir Hosts ◽ BLASTN Search

In our study, we analyzed viral RNA in bat fecal samples collected in 2015 in the Moscow region (Zvenigorod district). To detect various virus families and genera in these samples, we used PCR amplification of viral genome fragments followed by high-throughput sequencing. A BLASTN search of unassembled reads revealed the presence of viruses from the families Astroviridae, Coronaviridae and Herpesviridae. Assembly with SPAdes 3.14 yielded contigs of 460–530 bp that correspond to genome fragments of Coronaviridae and Astroviridae. The coronaviruses were classified to the genus level. We also showed that a single bat can be a reservoir of several virus genera. Thus, bats in the Moscow region were confirmed as reservoir hosts for potentially zoonotic viruses.
