On the optimal trimming of high-throughput mRNA sequence data

bioRxiv

10.1101/000422

2013

Cited By ~ 1

Author(s):

Matthew D. MacManes

Keyword(s):

High Throughput

High Throughput Sequencing

Sequence Data

Deep Understanding

Sequencing Technologies

Software Packages

Sequencing Quality

Functional Biology

Genome Level

Optimal Strength

Abstract

The widespread and rapid adoption of high-throughput sequencing technologies has given researchers the opportunity to gain a deep understanding of the genome-level processes that underlie evolutionary change and, perhaps more importantly, of the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence the mRNA transcriptomes of specific tissues, which are then often compared with other tissues or with other individuals that have different phenotypes. While these techniques are extremely powerful, they require careful attention to data quality. Because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in every data processing pipeline. Although several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of nucleotides whose Phred score is below 2 or below 5, is optimal for most studies across a wide variety of metrics.
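
The abstract recommends removing only nucleotides whose Phred score is below 2 or 5. A minimal sketch of threshold-based end trimming follows, assuming Phred+33 encoded quality strings and trimming only from the 5' and 3' ends until a base at or above the threshold is reached. The helper names `phred_scores` and `trim_ends` are hypothetical; this is an illustration of what a threshold does, not the software or settings used in the study.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_ends(seq: str, qual: str, threshold: int, offset: int = 33) -> Tuple[str, str]:
    """Trim bases with Phred score < threshold from both read ends.

    Trimming stops at the first base (from each end) that meets the threshold,
    so low-quality bases in the interior of the read are kept.
    """
    scores = phred_scores(qual, offset)
    start, end = 0, len(scores)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    # Phred+33: '#' = 2, '(' = 7, 'I' = 40, '%' = 4
    qual = "#(IIIIII%#"
    for t in (2, 5, 20):
        print(t, trim_ends(seq, qual, t))
```

With this toy read, a threshold of 2 keeps all ten bases, 5 removes one base at the 5' end and two at the 3' end, and 20 also removes the Q7 base, which shows how quickly aggressive thresholds shorten reads.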

GrigoraSNPs: Optimized HTS DNA Forensic SNP Analysis

10.1101/173716

2017

Cited By ~ 3

Author(s):

Darrell O. Ricke

Anna Shcherbina

Adam Michaleas

Philip Fremont-Smith

Keyword(s):

High Throughput

Tandem Repeats

High Throughput Sequencing

DNA Analysis

Sequence Data

SNP Analysis

Analysis Pipeline

Sequencing Technologies

High Throughput DNA Sequencing

Abstract

High-throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, yielding greater insight into the DNA contributor(s). Current DNA forensics techniques rely on allele sizing of short tandem repeats by capillary electrophoresis, whereas high-throughput sequencing enables forensic samples to be characterized at large numbers of single nucleotide polymorphism (SNP) loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.

Advances in high throughput DNA sequence data compression

Journal of Bioinformatics and Computational Biology

10.1142/s0219720016300021

2016

Vol 14 (03)

pp. 1630002

Cited By ~ 13

Author(s):

Muhammad Sardaraz

Muhammad Tahir

Ataul Aziz Ikram

Keyword(s):

Data Compression

DNA Sequence

High Throughput

High Throughput Sequencing

Sequence Data

Sequencing Data

Research Directions

Compression Algorithms

DNA Sequence Data

Sequencing Technologies

Advances in high-throughput sequencing technologies and the falling cost of sequencing have led to exponential growth in high-throughput DNA sequence data. This growth poses challenges for the storage, retrieval, and transmission of sequencing data, and data compression is used to cope with them. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genomes and sequencing reads. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of various data compression methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
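
The review sorts algorithms into referential and reference-free methods. As a minimal illustration of a reference-free building block (not a method taken from the review), the sketch below packs A/C/G/T bases at 2 bits each, four times smaller than one ASCII byte per base. It assumes sequences contain only A, C, G and T; N and other IUPAC codes, quality scores, and read names are left out. The names `pack` and `unpack` are hypothetical.

```python
# Hypothetical 2-bit nucleotide packing: A=0, C=1, G=2, T=3.
_ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_DECODE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base, 4 bases per byte.

    The first base of each group occupies the lowest two bits. The final byte
    is zero-padded, so the original length must be stored separately.
    Raises KeyError for any character other than A, C, G or T.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= _ENCODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from data produced by pack()."""
    return "".join(
        _DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


if __name__ == "__main__":
    read = "ACGTTGCAAC"
    packed = pack(read)
    print(len(read), "bases ->", len(packed), "bytes")
    assert unpack(packed, len(read)) == read
```

Because the last byte is padded with zeros (which decode as A), the sequence length has to travel with the packed bytes.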

Assessing genotyping errors in mammalian museum study skins using high-throughput genotyping-by-sequencing

Conservation Genetics Resources

10.1007/s12686-021-01213-8

2021

Author(s):

Stella C. Yuan

Eric Malekos

Melissa T. R. Hawkins

Keyword(s):

High Throughput

High Throughput Sequencing

Massively Parallel Sequencing

Massively Parallel

Museum Specimens

Museum Specimen

Genotyping Errors

Allelic Dropout

Parallel Sequencing

Sequencing Technologies

Abstract

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short tandem repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues, including length homoplasy, high costs, low throughput, and poor reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from extensive fragmentation and exogenous DNA contamination. Combating these issues requires extra stringency in the lab and during data analysis, yet no high-throughput sequencing studies have evaluated microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we evaluate genotyping errors in DNA extracts from mammalian museum skins for previously characterized microsatellites across PCR replicates, using high-throughput sequencing. We found it useful to classify samples by DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci depended on sample quantity, with high-concentration museum specimens performing as well as, and recovering quality metrics nearly as high as, the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and for incorporating reliable genotypes from museum specimens.
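
Allelic dropout across PCR replicates is commonly scored as the fraction of replicate genotypes at a heterozygous locus in which only one of the two true alleles is recovered. The sketch below follows that definition under stated assumptions: the reference genotype (for example from high-quality tissue or a replicate consensus) is known, failed amplifications were already excluded, and a replicate carrying a false allele is not counted as dropout. The function name and allele sizes are hypothetical, not the study's scoring procedure.

```python
from typing import Sequence, Tuple

# A diploid microsatellite genotype as two allele sizes in bp, e.g. (152, 158).
Genotype = Tuple[int, int]


def allelic_dropout_rate(reference: Genotype, replicates: Sequence[Genotype]) -> float:
    """Fraction of replicate genotypes at a heterozygous locus that show only one
    of the two reference alleles (i.e. the other allele dropped out).

    Assumes failed amplifications were already removed from `replicates`.
    Replicates carrying a false allele are not counted as dropout here.
    """
    a, b = reference
    if a == b:
        raise ValueError("dropout is only scored at heterozygous reference genotypes")
    if not replicates:
        raise ValueError("at least one successful replicate is required")
    # A replicate shows dropout if it is homozygous for one reference allele.
    dropouts = sum(1 for g in replicates if set(g) in ({a}, {b}))
    return dropouts / len(replicates)


if __name__ == "__main__":
    ref = (152, 158)
    reps = [(152, 158), (152, 152), (158, 158), (152, 158), (152, 160)]
    print(allelic_dropout_rate(ref, reps))
```

Here two of five replicates are homozygous for one reference allele, while the (152, 160) replicate is a false-allele error rather than dropout, so it is left out of the numerator but still counts as a successful amplification.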

Detection of novel allelic variations in soybean mutant population using Tilling by Sequencing

10.1101/711440

2019

Author(s):

Reneth Millas

Mary Espina

CM Sabbir Ahmed

Angelina Bernardini

Ekundayo Adeleke

...

Keyword(s):

Fatty Acid

High Throughput

Reverse Genetics

High Throughput Sequencing

Fatty Acid Biosynthesis

Induced Mutations

Mutant Population

Sequencing Technologies

Allelic Variations

Tilling By Sequencing

Abstract

Mutagenesis is one of the most important tools in genetic improvement, as it induces the genetic and phenotypic variation needed for trait improvement and the discovery of novel genes. A mutant population of JTN-5203 (MG V) was generated by ethyl methanesulfonate (EMS) mutagenesis and used to detect induced mutations in the FAD2-1A and FAD2-1B genes with a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820 individuals. DNA was extracted from these individuals, normalized, and pooled. Specific primers were designed for the FAD2-1A and FAD2-1B genes, which are involved in the fatty acid biosynthesis pathway, for further analysis using next-generation sequencing. A high-throughput mutation discovery approach, TILLING-by-Sequencing, was used to detect novel allelic variations in this population. Several mutations and allelic variations with high impact were detected in FAD2-1A and FAD2-1B, including GC-to-AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). The mutation density for this population is estimated at about 1/136 kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acids will be identified, and these mutants can then be used to breed soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.
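
Mutation density in TILLING studies is commonly expressed as one mutation per N kilobases, where N is the total number of bases screened (individuals × screened amplicon length) divided by the number of mutations found. The sketch below shows only that arithmetic with hypothetical inputs; it does not reproduce the 1/136 kb estimate, whose underlying counts are not given in the abstract. Some studies also discount primer regions or poorly covered amplicon ends before computing the total, which this sketch ignores.

```python
def mutation_density_kb(individuals: int, amplicon_bp: int, mutations: int) -> float:
    """Kilobases screened per detected mutation, reported as '1 mutation / N kb'.

    Total screened length = individuals x screened amplicon length (bp).
    """
    if mutations <= 0:
        raise ValueError("at least one detected mutation is required")
    return individuals * amplicon_bp / mutations / 1000


if __name__ == "__main__":
    # Hypothetical inputs, not the study's data.
    n_kb = mutation_density_kb(individuals=1000, amplicon_bp=1500, mutations=10)
    print(f"1 mutation per {n_kb:g} kb")
```

For example, 1,000 individuals screened over a 1,500 bp amplicon with 10 mutations found gives 1.5 Mb screened, or one mutation per 150 kb; a smaller N means a denser mutant population.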

Adenosine-to-inosine RNA editing may be implicated in human pathogenesis

Bulletin of Russian State Medical University

10.24075/brsmu.2019.028

2019

pp. 22-25

Author(s):

AA Kliuchnikova

SA Moshkovskii

Keyword(s):

Immune Responses

RNA Editing

High Throughput

High Throughput Sequencing

Common Mechanism

Adenosine Deaminases

Human Transcriptome

Sequencing Technologies

Adenosine-to-inosine (A-to-I) RNA editing is a common mechanism of post-transcriptional modification in many metazoans, including vertebrates; the process is catalyzed by adenosine deaminases acting on RNA (ADARs). High-throughput sequencing technologies have revealed thousands of RNA editing sites throughout the human transcriptome; however, their functions are still poorly understood. The aim of this brief review is to draw the attention of clinicians and biomedical researchers to ADAR-mediated RNA editing and its possible involvement in the development of neuropathologies, antiviral immune responses, and cancer.

Overview of High Throughput Sequencing Technologies to Elucidate Molecular Pathways in Cardiovascular Diseases

Circulation Research

10.1161/circresaha.113.300939

2013

Vol 112 (12)

pp. 1613-1623

Cited By ~ 49

Author(s):

Jared M. Churko

Gary L. Mantalas

Michael P. Snyder

Joseph C. Wu

Keyword(s):

Cardiovascular Diseases

High Throughput

High Throughput Sequencing

Molecular Pathways

Sequencing Technologies

Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers

Bioinformatics

10.1093/bioinformatics/btaa915

2020

Author(s):

Yuansheng Liu

Xiaocai Zhang

Quan Zou

Xiangxiang Zeng

Keyword(s):

High Throughput

High Throughput Sequencing

De Novo

Supplementary Information

Supplementary Data

Complementary Strand

Short Reads

Sequencing Technologies

Computational Resources

Abstract

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.

Availability and implementation: https://github.com/yuansliu/minirmd.

Supplementary information: Supplementary data are available at Bioinformatics online.
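
The minirmd abstract describes clustering reads by minimizers of several lengths and treating reverse-complement reads as duplicates. The sketch below is a heavily simplified illustration of that general idea, not minirmd's algorithm: it assumes equal-length reads, uses a single canonical (strand-independent) minimizer per read, buckets reads by it, and drops a read if it differs from an already kept read, or that read's reverse complement, in at most `max_mismatch` positions. All function names and the default k values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (other characters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_minimizer(read: str, k: int) -> str:
    """Lexicographically smallest canonical k-mer of the read (requires len(read) >= k).

    A k-mer and its reverse complement map to the same canonical k-mer, so a
    read and its reverse complement always get the same minimizer.
    """
    return min(min(read[i:i + k], revcomp(read[i:i + k]))
               for i in range(len(read) - k + 1))


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(read: str, rep: str, max_mismatch: int) -> bool:
    """Same length and <= max_mismatch differences on either strand."""
    if len(read) != len(rep):
        return False
    return (hamming(read, rep) <= max_mismatch
            or hamming(revcomp(read), rep) <= max_mismatch)


def dedup_round(reads: Sequence[str], k: int, max_mismatch: int) -> List[str]:
    buckets: Dict[str, List[str]] = defaultdict(list)  # minimizer -> kept reads
    kept: List[str] = []
    for read in reads:
        if len(read) < k:  # too short to have a k-mer: keep as-is
            kept.append(read)
            continue
        bucket = buckets[canonical_minimizer(read, k)]
        if any(is_near_duplicate(read, rep, max_mismatch) for rep in bucket):
            continue  # drop: near-duplicate of a read already kept
        bucket.append(read)
        kept.append(read)
    return kept


def remove_near_duplicates(reads: Iterable[str], ks: Sequence[int] = (21, 15, 11),
                           max_mismatch: int = 1) -> List[str]:
    """Run one bucketing round per minimizer length, keeping first occurrences."""
    kept = list(reads)
    for k in ks:
        kept = dedup_round(kept, k, max_mismatch)
    return kept
```

Because a mismatch that falls inside the chosen minimizer can send two near-duplicates to different buckets, repeating the pass with other k values gives such pairs another chance to meet. Comparisons within a bucket are quadratic, which a production tool would need to avoid.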

Comprehensive pathogen detection in sera of Kawasaki disease patients by high-throughput sequencing: a retrospective exploratory study

BMC Pediatrics

10.1186/s12887-020-02380-7

2020

Vol 20 (1)

Author(s):

Yuka Torii

Kazuhiro Horiba

Satoshi Hayano

Taichi Kato

Takako Suzuki

...

Keyword(s):

Kawasaki Disease

High Throughput

High Throughput Sequencing

Sequence Data

Early Stage

RNA Virus

Systemic Vasculitis

Human Herpesvirus

Acute Stage

Serum Samples

Abstract

Background: Kawasaki disease (KD) is an idiopathic systemic vasculitis that predominantly damages the coronary arteries in children. Various pathogens have been investigated as triggers for KD, but no definitive causative pathogen has been identified. Because KD is diagnosed by its symptoms, diagnosis takes several days, so by the time KD is diagnosed the triggering pathogen may already have diminished. The aim of this study was to comprehensively explore pathogens in the sera of patients in the acute stage of KD using high-throughput sequencing (HTS).

Methods: Sera from 12 patients at an extremely early stage of KD and 12 controls were investigated. DNA and RNA sequences were read separately using HTS. Sequence data were imported into PATHDET, an in-house metagenomic analysis pipeline, to identify pathogen sequences.

Results: No RNA virus reads were detected in any KD case except for equine infectious anemia virus, which is a known contaminant of commercial reverse transcriptase. Among DNA viruses, human herpesvirus 6B (HHV-6B, two cases) and Anelloviridae (eight cases) were detected in KD cases as well as in controls. Multiple bacterial reads were obtained from both KD cases and controls. Bacteria of the genera Acinetobacter, Pseudomonas, Delftia, and Roseomonas, and of the family Rhodocyclaceae, appeared to be more common in KD sera than in controls.

Conclusion: No single pathogen was identified in serum samples from patients in the acute phase of KD. Because multiple bacteria were detected in the serum samples, it is difficult to exclude the possibility of contamination; however, these bacteria might stimulate the immune system and induce KD.

Grafting-responsive miRNAs in cucumber and pumpkin seedlings identified by high-throughput sequencing at whole genome level

Physiologia Plantarum

10.1111/ppl.12122

2013

Vol 151 (4)

pp. 406-422

Cited By ~ 25

Author(s):

Chaohan Li

Yansu Li

Longqiang Bai

Tieyao Zhang

Chaoxing He

...

Keyword(s):

High Throughput

High Throughput Sequencing

Whole Genome

Genome Level

Deciphering transcriptional control mechanisms in hematopoiesis—The impact of high-throughput sequencing technologies

Experimental Hematology

10.1016/j.exphem.2011.07.005

2011

Vol 39 (10)

pp. 961-968

Cited By ~ 5

Author(s):

Nicola K. Wilson

Marloes R. Tijssen

Berthold Göttgens

Keyword(s):

High Throughput

Transcriptional Control

High Throughput Sequencing

Control Mechanisms

Sequencing Technologies

The Impact