KinLinks: Software Toolkit for Kinship Analysis and Pedigree Generation from HTS Datasets

The ability to predict familial relationships from source DNA in multiple samples has a number of forensic and medical applications. Kinship testing of suspect DNA profiles against relatives in a law enforcement database can provide valuable investigative leads, determination of familial relationships can inform immigration decisions, and remains identification can provide closure to families of missing individuals. The proliferation of High-Throughput Sequencing technologies allows for enhanced capabilities to accurately predict familial relationships to the third degree and beyond. KinLinks, developed by MIT Lincoln Laboratory, is a software tool that predicts pairwise relationships and reconstructs kinship pedigrees for multiple input samples using single-nucleotide polymorphism (SNP) profiles. The software has been trained and evaluated on a set of 175 subjects (30,450 pairwise relationships), consisting of three multi-generational families and 52 geographically diverse subjects. Though a panel of 5396 SNPs was selected for kinship prediction, KinLinks is highly modular, allowing for the substitution of expanded SNP panels and additional training models as sequencing capabilities continue to progress. KinLinks builds on the SNP-calling capabilities of Sherlocks Toolkit, and is fully integrated with the Sherlocks Toolkit pipeline.
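
The figure of 30,450 pairwise relationships for 175 subjects matches a count of ordered pairs (175 × 174), which is twice the number of unordered pairs (15,225). The abstract does not say which convention was used, so the ordered-pair reading is an inference from the arithmetic. A minimal check, with function names chosen only for illustration:

```python
def ordered_pair_count(n: int) -> int:
    """Number of (a, b) pairs with a != b, counting (a, b) and (b, a) separately."""
    return n * (n - 1)


def unordered_pair_count(n: int) -> int:
    """Number of distinct two-subject combinations, ignoring order."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    subjects = 175  # subject count stated in the abstract
    print(ordered_pair_count(subjects))    # count if each direction is tallied
    print(unordered_pair_count(subjects))  # count if each pair is tallied once
```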

Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

RT-PCR ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription PCR ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.

Application of Oxford Nanopore Technology to Plant Virus Detection

Viruses ◽

10.3390/v13081424 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1424

Author(s):

Lia W. Liefting ◽

David W. Waite ◽

Jeremy R. Thompson

Keyword(s):

Plant Virus ◽

High Throughput Sequencing ◽

Virus Detection ◽

Diagnostic Methods ◽

Plant Virus Detection ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Virus Diagnostics ◽

Post Entry ◽

Read Accuracy

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.

Assessing genotyping errors in mammalian museum study skins using high-throughput genotyping-by-sequencing

Conservation Genetics Resources ◽

10.1007/s12686-021-01213-8 ◽

2021 ◽

Author(s):

Stella C. Yuan ◽

Eric Malekos ◽

Melissa T. R. Hawkins

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Museum Specimens ◽

Museum Specimen ◽

Genotyping Errors ◽

Allelic Dropout ◽

Parallel Sequencing ◽

Sequencing Technologies

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but the incorporation of museum specimen-derived DNA suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra measures of stringency in the lab and during data analysis, yet there have not been any high-throughput sequencing studies evaluating microsatellite allelic dropout from museum specimen-extracted DNA. In this study, we evaluate genotyping errors derived from mammalian museum skin DNA extracts for previously characterized microsatellites across PCR replicates utilizing high-throughput sequencing. We found it useful to classify samples based on DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci were dependent on sample quantity, with high-concentration museum specimens performing as well and recovering quality metrics nearly as high as the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and incorporation of reliable genotypes from museum specimens.

Multi-omics approach to precision medicine for immune-mediated diseases

Inflammation and Regeneration ◽

10.1186/s41232-021-00173-8 ◽

2021 ◽

Vol 41 (1) ◽

Author(s):

Mineto Ota ◽

Keishi Fujio

Keyword(s):

Treatment Response ◽

High Throughput Sequencing ◽

Disease Risk ◽

Clinical Information ◽

Clinical Settings ◽

Social Significance ◽

Sequencing Technologies ◽

Immune Mediated ◽

Recent Innovation ◽

Future Direction

Recent innovation in high-throughput sequencing technologies has drastically empowered scientific research. Consequently, it is now possible to capture comprehensive profiles of samples at multiple levels, including the genome, epigenome, and transcriptome, at the same time. Applying this kind of rich information to clinical settings is of great social significance. For some traits such as cardiovascular diseases, attempts to apply omics datasets in clinical practice for the prediction of disease risk have already shown promising results, although such efforts are still under way for immune-mediated diseases. Multiple studies have tried to predict treatment response in immune-mediated diseases using genomic, transcriptomic, or clinical information, showing various possible indicators. For better prediction of treatment response or disease outcome in immune-mediated diseases, combining multi-layer information may increase the power. In addition, in order to efficiently pick up meaningful information from the massive data, high-quality annotation of genomic functions is also crucial. In this review, we discuss the achievements so far and the future direction of the multi-omics approach to immune-mediated diseases.

Profiling DNA Methylation Based on Next-Generation Sequencing Approaches: New Insights and Clinical Applications

Genes ◽

10.3390/genes9090429 ◽

2018 ◽

Vol 9 (9) ◽

pp. 429 ◽

Cited By ~ 38

Author(s):

Daniela Barros-Silva ◽

C. Marques ◽

Rui Henrique ◽

Carmen Jerónimo

Keyword(s):

DNA Methylation ◽

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Epigenetic Modification ◽

Response To Therapy ◽

Next Generation ◽

Sequencing Technologies ◽

Prognosis And Prediction ◽

Novel Biomarkers ◽

Generation Sequencing

DNA methylation is an epigenetic modification that plays a pivotal role in regulating gene expression and, consequently, influences a wide variety of biological processes and diseases. The advances in next-generation sequencing technologies allow for genome-wide profiling of methyl marks at both single-nucleotide and single-cell resolution. These profiling approaches vary in many aspects, such as DNA input, resolution, coverage, and bioinformatics analysis. Thus, the selection of the most feasible method according to the project’s purpose requires in-depth knowledge of those techniques. Currently, high-throughput sequencing techniques are intensively used in epigenomics profiling, which ultimately aims to find novel biomarkers for detection, diagnosis, prognosis, and prediction of response to therapy, as well as to discover new targets for personalized treatments. Here, we present, in brief, a portrayal of the evolution of next-generation sequencing methodologies for profiling DNA methylation, highlighting their potential for translational medicine and presenting significant findings in several diseases.

Detection of novel allelic variations in soybean mutant population using Tilling by Sequencing

10.1101/711440 ◽

2019 ◽

Author(s):

Reneth Millas ◽

Mary Espina ◽

CM Sabbir Ahmed ◽

Angelina Bernardini ◽

Ekundayo Adeleke ◽

...

Keyword(s):

Fatty Acid ◽

High Throughput ◽

Reverse Genetics ◽

High Throughput Sequencing ◽

Fatty Acid Biosynthesis ◽

Induced Mutations ◽

Mutant Population ◽

Sequencing Technologies ◽

Allelic Variations ◽

Tilling By Sequencing

One of the most important tools in genetic improvement is mutagenesis, which is used to induce genetic and phenotypic variation for trait improvement and discovery of novel genes. A JTN-5203 (MG V) mutant population was generated using ethyl methane sulfonate (EMS)-induced mutagenesis and was used for detection of induced mutations in the FAD2-1A and FAD2-1B genes using a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820 individuals. DNA was extracted, normalized, and pooled from these individuals. Specific primers were designed from the FAD2-1A and FAD2-1B genes, which are involved in the fatty acid biosynthesis pathway, for further analysis using next-generation sequencing. High-throughput mutation discovery through a TILLING-by-Sequencing approach was used to detect novel allelic variations in this population. Several mutations and allelic variations with high impacts were detected for FAD2-1A and FAD2-1B. These include GC to AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). Mutation density for this population is estimated to be about 1/136 kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acids will be identified, and these mutants can be further used in breeding soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.

A New Paralog Removal Pipeline Resolves Conflict between RAD-seq and Enrichment

10.1101/2020.10.26.355248 ◽

2020 ◽

Author(s):

Wenbin Zhou ◽

John Soghigian ◽

Qiu-yun (Jenny) Xiang

Keyword(s):

High Throughput Sequencing ◽

Sequence Similarity ◽

Phylogenetic Analyses ◽

Disjunct Distribution ◽

Divergence Times ◽

Target Enrichment ◽

Sequencing Technologies ◽

Duplication Events ◽

The Witch ◽

Phylogenomic Analyses

Target enrichment and RAD-seq are well-established high-throughput sequencing technologies that have been increasingly used for phylogenomic studies, and the choice between methods is a practical issue for plant systematists studying the evolutionary histories of biodiversity of relatively recent origins. However, few studies have compared the congruence and conflict between results from the two methods within the same group of organisms, especially in plants, where extensive genome duplication events may complicate phylogenomic analyses. Unfortunately, currently widely used pipelines for target enrichment data analysis do not have a vigorous procedure for removing paralogs in Hyb-Seq data. In this study, we employed RAD-seq and Hyb-Seq of Angiosperm 353 genes in phylogenomic and biogeographic studies of Hamamelis (the witch-hazels) and Castanea (chestnuts), two classic examples exhibiting the well-known eastern Asian-eastern North American disjunct distribution. We compared these two methods side by side and developed a new pipeline (PPD) with a more vigorous removal of putative paralogs from Hyb-Seq data. The new pipeline considers both sequence similarity and heterozygous sites at each locus in the identification of paralogs. We used our pipeline to construct robust datasets for comparison between methods and downstream analyses on the two genera. Our results demonstrated that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed significant differences between data from HybPiper and data from our new PPD pipeline, likely due to the error signals from the paralogous genes undetected by HybPiper, but trimmed by PPD. We found that phylogenies and divergence times estimated from our RAD-seq and Hyb-Seq-PPD were largely congruent. We highlight the importance of removing paralogs in enrichment data, and discuss the merits of RAD-seq and Hyb-Seq.
Finally, phylogenetic analyses of RAD-seq and Hyb-Seq resulted in well-resolved species relationships, and revealed ancient introgression in both genera. Biogeographic analyses including fossil data revealed a complicated history of each genus involving multiple intercontinental dispersals and local extinctions in areas outside of the taxa’s modern ranges in both the Paleogene and Neogene. Our study demonstrates the value of additional steps for filtering paralogous gene content from Angiosperm 353 data, such as our new PPD pipeline described in this study. [RAD-seq, Hyb-Seq, paralogs, Castanea, Hamamelis, eastern Asia-eastern North America disjunction, biogeography, ancient introgression]
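
The abstract says only that PPD weighs sequence similarity and heterozygous sites at each locus; it gives no algorithmic detail. The sketch below is therefore a hypothetical illustration of that general idea, not the PPD implementation. The function names, the use of IUPAC two-base ambiguity codes to mark heterozygous sites, and both thresholds are assumptions made for the example.

```python
# Hypothetical illustration only; this is not the PPD implementation.
HET_CODES = set("RYSWKM")  # IUPAC two-base ambiguity codes (assumed het markers)
MISSING = set("-N?")       # gap and missing-data symbols, skipped in all counts


def het_fraction(seq: str) -> float:
    """Fraction of called bases in one sequence that are two-base ambiguity codes."""
    called = [b for b in seq.upper() if b not in MISSING]
    if not called:
        return 0.0
    return sum(b in HET_CODES for b in called) / len(called)


def identity(a: str, b: str) -> float:
    """Fraction of aligned columns, called in both sequences, with identical symbols."""
    cols = [(x, y) for x, y in zip(a.upper(), b.upper())
            if x not in MISSING and y not in MISSING]
    if not cols:
        return 1.0
    return sum(x == y for x, y in cols) / len(cols)


def flag_putative_paralog(alignment: dict, reference: str,
                          max_het: float = 0.05,
                          min_identity: float = 0.90) -> bool:
    """Flag a locus if any sample is too heterozygous or too unlike the reference.

    alignment maps sample name -> aligned sequence; reference is the aligned
    target sequence. Both thresholds are arbitrary example values.
    """
    for seq in alignment.values():
        if het_fraction(seq) > max_het or identity(seq, reference) < min_identity:
            return True
    return False
```

A locus whose samples all match the reference with no ambiguity codes is kept, while a locus containing a sample such as `ACRTAYGTAC` (two ambiguity codes in ten bases) is flagged under these example thresholds.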

Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples

Viruses ◽

10.3390/v13102006 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2006

Author(s):

Anna Y Budkina ◽

Elena V Korneenko ◽

Ivan A Kotov ◽

Daniil A Kiselev ◽

Ilya V Artyushin ◽

...

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Metagenomic Data ◽

Sequencing Data ◽

Viral Pathogens ◽

Genomic Databases ◽

Bioinformatic Pipeline ◽

Viral Genomes ◽

Sequencing Technologies ◽

Viral Screening

According to various estimates, only a small percentage of existing viruses have been discovered, and naturally even fewer are represented in genomic databases. High-throughput sequencing technologies are developing rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.

Assessing pollution of aquatic environments with diatoms’ DNA metabarcoding: experience and developments from France water framework directive networks

Metabarcoding and Metagenomics ◽

10.3897/mbmg.3.39646 ◽

2019 ◽

Vol 3 ◽

Cited By ~ 3

Author(s):

Vasselon Valentin ◽

Rimet Frédéric ◽

Domaizon Isabelle ◽

Monnier Olivier ◽

Reyjol Yorick ◽

...

Keyword(s):

Water Framework Directive ◽

High Throughput Sequencing ◽

Ecological Status ◽

Aquatic Environments ◽

Morphological Identification ◽

The Past ◽

Sequencing Technologies ◽

DNA Metabarcoding ◽

Morphological Approach ◽

Status Assessment

Ecological status assessment of watercourses is based on the calculation of quality indices using pollution sensitivity of targeted biological groups, including diatoms. The determination and quantification of diatom species is generally based on microscopic morphological identification, which requires expertise and is time-consuming and costly. In Europe, this morphological approach is legally imposed by standards and regulatory decrees under the Water Framework Directive (WFD). Over the past decade, a DNA-based molecular biology approach (i.e. DNA metabarcoding) has been developed to identify species based on genetic criteria rather than morphological ones. In combination with high-throughput sequencing technologies, metabarcoding makes it possible both to identify all species present in an environmental sample and to process several hundred samples in parallel. This article presents the results of two recent studies carried out on the WFD networks of rivers of Mayotte (2013–2018) and metropolitan France (2016–2018). These studies aimed at testing the potential application of metabarcoding for biomonitoring in the context of the WFD. We discuss the various methodological developments and optimisations that have been made to make the taxonomic inventories of diatoms produced by metabarcoding more reliable, particularly in terms of species quantification. We present the results of the application of this DNA approach on more than 500 river sites, comparing them with those obtained using the standardised morphological method. Finally, we discuss the potential of metabarcoding for routine application and its limits of application, and propose some recommendations for future implementation in the WFD.

Adenosine-to-inosine RNA editing may be implicated in human pathogenesis

Bulletin of Russian State Medical University ◽

10.24075/brsmu.2019.028 ◽

2019 ◽

pp. 22-25

Author(s):

AA Kliuchnikova ◽

SA Moshkovskii

Keyword(s):

Immune Responses ◽

RNA Editing ◽

High Throughput ◽

High Throughput Sequencing ◽

Common Mechanism ◽

Adenosine Deaminases ◽

Human Transcriptome ◽

Sequencing Technologies

Adenosine-to-inosine (A-to-I) RNA editing is a common mechanism of post-transcriptional modification in many metazoans, including vertebrates; the process is catalyzed by adenosine deaminases acting on RNA (ADARs). The use of high-throughput sequencing technologies has revealed thousands of RNA editing sites throughout the human transcriptome; however, their functions are still poorly understood. The aim of this brief review is to draw the attention of clinicians and biomedical researchers to the phenomenon of ADAR-mediated RNA editing and its possible implication in the development of neuropathologies, antiviral immune responses, and cancer.
