GrigoraSNPs: Optimized HTS DNA Forensic SNP Analysis

High-throughput sequencing (HTS) of DNA enables improved characterization of forensic DNA samples and greater insight into the DNA contributor(s). Current DNA forensics techniques rely upon allele sizing of short tandem repeats by capillary electrophoresis. HTS enables forensic samples to be characterized at large numbers of single nucleotide polymorphism (SNP) loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.
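The abstract does not describe how GrigoraSNPs achieves linear scaling, so the following is only a generic sketch of one way a single pass over reads can count SNP alleles. It assumes each locus is identified by a fixed-length flank immediately 5' of the SNP, uses exact forward-strand matching only, and the names `count_alleles`, `rs_demo`, and the flank length `k` are hypothetical.

```python
from collections import Counter, defaultdict


def count_alleles(reads, flanks, k):
    """Count the base observed right after each known flank.

    reads  -- iterable of read sequences (strings)
    flanks -- dict mapping a hypothetical locus name to its k-base 5' flank
    k      -- flank length; every flank must have exactly this length
    """
    # Invert the mapping so each k-mer lookup is a constant-time dict access.
    index = {flank: name for name, flank in flanks.items()}
    if any(len(flank) != k for flank in index):
        raise ValueError("all flanks must have length k")

    counts = defaultdict(Counter)
    for read in reads:
        # Slide a k-base window; stop one base early so read[i + k] exists.
        for i in range(len(read) - k):
            name = index.get(read[i:i + k])
            if name is not None:
                counts[name][read[i + k]] += 1
    return counts


if __name__ == "__main__":
    demo = count_alleles(
        ["TTACGTAGCC", "GGACGTATCC", "ACGTAG"],
        {"rs_demo": "ACGTA"},
        k=5,
    )
    print(dict(demo["rs_demo"]))
```

Because each read is scanned once and every window costs one dictionary lookup, the total work grows in proportion to the number of reads for a bounded read length. A production caller would also need to handle reverse-complement reads, sequencing errors in the flank, and base qualities.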

Recent Advances on Detection and Characterization of Fruit Tree Viruses Using High-Throughput Sequencing Technologies

Viruses ◽

10.3390/v10080436 ◽

2018 ◽

Vol 10 (8) ◽

pp. 436 ◽

Cited By ~ 30

Author(s):

Varvara Maliogka ◽

Angelantonio Minafra ◽

Pasquale Saldarelli ◽

Ana Ruiz-García ◽

Miroslav Glasa ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Infected Plant ◽

Fruit Trees ◽

Plant Viruses ◽

Fruit Tree ◽

Advantages And Disadvantages ◽

Sequencing Technologies ◽

Recent Advances

Perennial crops, such as fruit trees, are infected by many viruses, which are transmitted through vegetative propagation and grafting of infected plant material. Some of these pathogens cause severe crop losses and often reduce the productive life of orchards. Detecting and characterizing these agents in fruit trees is challenging; in recent years, however, the wide application of high-throughput sequencing (HTS) technologies has significantly facilitated this task. In this review, we present recent advances in the discovery, detection, and characterization of fruit tree viruses and virus-like agents accomplished by HTS approaches. A large number of new viruses have been described in the last 5 years, some of them exhibiting novel genomic features that have led to proposals for new genera and for revision of the current virus taxonomy. Interestingly, several of the newly identified viruses belong to genera previously unknown to infect fruit tree species (e.g., Fabavirus, Luteovirus), a fact that challenges our perspective of plant viruses in general. Finally, the methodologies applied, including the use of different molecules as templates, as well as the advantages, disadvantages, and future directions of HTS in fruit tree virology, are discussed.

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

10.1101/246108 ◽

2018 ◽

Cited By ~ 1

Author(s):

Devika Ganesamoorthy ◽

Minh Duc Cao ◽

Tania Duarte ◽

Wenhan Chen ◽

Lachlan Coin

Keyword(s):

High Throughput ◽

Tandem Repeat ◽

Copy Number ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Sequence Data ◽

Complex Diseases ◽

Sequencing Analysis ◽

Reference Dataset ◽

Long Read

Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. Methods: We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with, on average, 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% highest posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X capture and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
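To make the interval figures above concrete, here is a minimal sketch of how a 95% HPD interval can be estimated from posterior samples, and how its width can be expressed relative to its midpoint. It is not GtTR's implementation; it assumes the posterior is available as a list of sampled copy numbers, and the function names are illustrative.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of samples the interval must cover; the small epsilon guards
    # against floating-point round-up in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Among all windows of k consecutive sorted samples, pick the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def relative_half_width(low, high):
    """Half-width of the interval as a fraction of its midpoint."""
    midpoint = (low + high) / 2
    return (high - low) / 2 / midpoint


if __name__ == "__main__":
    posterior = [10] * 19 + [100]  # toy samples with one outlier
    low, high = hpd_interval(posterior)
    print(low, high, relative_half_width(8, 12))
```

Under this reading, an interval of 8 to 12 copies has midpoint 10 and lies within 20% of it, which is the kind of quantity the abstract reports as shrinking with sequencing depth.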

Advances in high throughput DNA sequence data compression

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016300021 ◽

2016 ◽

Vol 14 (03) ◽

pp. 1630002 ◽

Cited By ~ 13

Author(s):

Muhammad Sardaraz ◽

Muhammad Tahir ◽

Ataul Aziz Ikram

Keyword(s):

Data Compression ◽

DNA Sequence ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequence Data ◽

Sequencing Data ◽

Research Directions ◽

Compression Algorithms ◽

DNA Sequence Data ◽

Sequencing Technologies

Advances in high throughput sequencing technologies and reductions in the cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth poses challenges for the storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges, and various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of methods for compressing genomes and sequencing reads. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of the various methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
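As a small illustration of the reference-free category, the sketch below packs each base into two bits, which is a common baseline rather than any specific algorithm from the review. It assumes the sequence contains only A, C, G and T; the function names are illustrative.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack(seq):
    """Pack an ACGT string into bytes, four bases per byte.

    Returns (data, length); the length is needed to ignore padding bits
    in the last byte. Any other symbol (for example N) raises KeyError.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data, length):
    """Invert pack(): recover the original ACGT string."""
    return "".join(
        _BASE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


if __name__ == "__main__":
    data, n = pack("ACGTAC")
    print(len(data), unpack(data, n))
```

This stores a base in a quarter of the space of one-byte-per-character text. Referential methods go further by storing only differences from a reference genome, and practical read compressors must also handle quality scores, read identifiers, and ambiguous bases.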

On the optimal trimming of high-throughput mRNA sequence data

10.1101/000422 ◽

2013 ◽

Cited By ~ 1

Author(s):

Matthew D. MacManes

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequence Data ◽

Deep Understanding ◽

Sequencing Technologies ◽

Software Packages ◽

Sequencing Quality ◽

Functional Biology ◽

Genome Level ◽

Optimal Strength

The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of the genome-level processes that underlie evolutionary change and, perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or to other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of those nucleotides whose Phred score is <2 or <5, is optimal for most studies across a wide variety of metrics.
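The effect of the trimming threshold can be seen in a minimal sketch such as the one below. It is not the procedure of any particular trimming package: it assumes Phred+33 encoded FASTQ quality strings and simply removes trailing bases whose score falls below the threshold; the function names are illustrative.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, threshold=5, offset=33):
    """Drop trailing bases whose Phred score is below `threshold`.

    Returns the trimmed (sequence, quality) pair.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have equal length")
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk in from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    read, qual = "ACGTACGT", "IIIIII##"  # '#' is Phred 2, 'I' is Phred 40
    print(trim_3prime(read, qual, threshold=5))
    print(trim_3prime(read, qual, threshold=2))
```

With a threshold of 5 the two Phred 2 bases are removed, while a threshold of 2 keeps the whole read, since only scores strictly below the threshold are trimmed. Widely used trimmers typically apply sliding-window or running-sum variants of this idea.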

Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

RT-PCR ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription PCR ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.

Assessing genotyping errors in mammalian museum study skins using high-throughput genotyping-by-sequencing

Conservation Genetics Resources ◽

10.1007/s12686-021-01213-8 ◽

2021 ◽

Author(s):

Stella C. Yuan ◽

Eric Malekos ◽

Melissa T. R. Hawkins

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Museum Specimens ◽

Museum Specimen ◽

Genotyping Errors ◽

Allelic Dropout ◽

Parallel Sequencing ◽

Sequencing Technologies

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues, including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra measures of stringency in the lab and during data analysis, yet there have not been any high-throughput sequencing studies evaluating microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we evaluate genotyping errors derived from mammalian museum skin DNA extracts for previously characterized microsatellites across PCR replicates utilizing high-throughput sequencing. We found it useful to classify samples based on DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci were dependent on sample quantity, with high-concentration museum specimens performing as well as, and recovering quality metrics nearly as high as, the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and incorporation of reliable genotypes from museum specimens.

Characterization of oral bacterial diversity of irradiated patients by high-throughput sequencing

International Journal of Oral Science ◽

10.1038/ijos.2013.15 ◽

2013 ◽

Vol 5 (1) ◽

pp. 21-25 ◽

Cited By ~ 15

Author(s):

Yue-Jian Hu ◽

Qian Wang ◽

Yun-Tao Jiang ◽

Rui Ma ◽

Wen-Wei Xia ◽

...

Keyword(s):

Bacterial Diversity ◽

High Throughput ◽

High Throughput Sequencing

Characterization of Bacteria in Biopsies of Colon and Stools by High Throughput Sequencing of the V2 Region of Bacterial 16S rRNA Gene in Human

PLoS ONE ◽

10.1371/journal.pone.0016952 ◽

2011 ◽

Vol 6 (2) ◽

pp. e16952 ◽

Cited By ~ 76

Author(s):

Yukihide Momozawa ◽

Valérie Deffontaine ◽

Edouard Louis ◽

Juan F. Medrano

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

High Throughput Sequencing ◽

rRNA Gene ◽

Bacterial 16S rRNA Gene

Sequencing and Computational Approaches to Identification and Characterization of Microbial Organisms

Biomedical Engineering and Computational Biology ◽

10.4137/becb.s10886 ◽

2013 ◽

Vol 5 ◽

pp. BECB.S10886 ◽

Cited By ~ 2

Author(s):

Brijesh Singh Yadav ◽

Venkateswarlu Ronda ◽

Dinesh P. Vashista ◽

Bhaskar Sharma

Keyword(s):

Sequence Data ◽

Microbial Interactions ◽

Microbial Pathogens ◽

Nucleotide Sequence Data ◽

Computational Approaches ◽

Microbial Detection ◽

Sequencing Technologies ◽

Sequencing Platforms ◽

Identification And Characterization

Recent advances in sequencing technologies and computational approaches are propelling scientists ever closer to a complete understanding of human-microbial interactions. Powerful sequencing platforms are rapidly producing huge amounts of nucleotide sequence data, which are compiled into large databases. These sequence data can be retrieved, assembled, and analyzed to identify microbial pathogens and diagnose diseases. In this article, we present a commentary on how metagenomics, combined with microarray and new sequencing techniques, is helping microbial detection and characterization.
