A high-throughput sequencing determination method for upstream genetic structure (UGS) of ISEcp1-blaCTX-M transposition unit and application of the UGS to classification of bacterial isolates possessing blaCTX-M

Knowledge of population diversity and structure is of fundamental importance for conifer breeding programs. In this study, we developed and applied high-density single nucleotide polymorphism (SNP) markers using a high-throughput sequencing technique, specific-locus amplified fragment sequencing (SLAF-seq), for the economically important conifer Chinese fir (Cunninghamia lanceolata). Using SLAF-seq, we established a high-density panel of 108,753 genomic SNPs from Chinese fir. This panel gave us insight into the genetic base of the advanced Chinese fir breeding population of 221 genotypes: its genetic variation, relationships, diversity, and population structure. Overall, the present population appears to have considerable genetic variability. Most of the variability (94.15%) was attributed to genetic differentiation among genotypes; very little (5.85%) occurred at the population (sub-origin set) level. Correspondingly, FST values among the sub-origin sets were low (0.0285–0.0990). When the genetic structure of the population was examined without regard to sub-origin set, the SNP data revealed a new picture in which the advanced Chinese fir breeding population could be divided into four genetic sets, as supported by both the phylogenetic tree and the population structure analysis, albeit with some differences in set membership between the two (cluster vs. group). The results also suggested that all the genetic sets were admixed clades, revealing a complex relationship among the genotypes of this population. With a stepwise pruning procedure, we captured a core collection (core 0.650) of 143 genotypes that maintains all the alleles, diversity, and specific genetic structure of the whole population. This generalist core is valuable for the Chinese fir advanced breeding program and for further genetic and genomic studies.
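For readers unfamiliar with FST, the following is a minimal sketch of how a low value such as those reported above can arise. It uses Wright's textbook definition, FST = (HT - HS) / HT, for a single biallelic locus with equally weighted subpopulations. The function names and the example allele frequencies are illustrative assumptions; the abstract does not say which estimator or software the authors used.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def wright_fst(freqs: Sequence[float]) -> float:
    """Wright's F_ST for one biallelic locus.

    freqs holds the frequency of the same allele in each subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("at least one subpopulation is required")
    # H_S: mean within-subpopulation expected heterozygosity
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # H_T: expected heterozygosity of the pooled population
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        # Locus is fixed everywhere, so there is no variation to partition
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies in two sub-origin sets
    print(wright_fst([0.4, 0.6]))
```

Allele frequencies that differ only modestly between sets give a small FST, which matches the abstract's observation that most variation lies among genotypes and little lies between sub-origin sets.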

Microsatellites obtained using high throughput sequencing and a novel microsatellite genotyping method reveals population genetic structure in Norway Lobster, Nephrops norvegicus

Journal of Sea Research ◽

10.1016/j.seares.2021.102139 ◽

2021 ◽

pp. 102139

Author(s):

Jeanne Gallagher ◽

Colm Lordan ◽

Graham M. Hughes ◽

Jónas P. Jonasson ◽

Jens Carlsson

Keyword(s):

Genetic Structure ◽

High Throughput ◽

Population Genetic Structure ◽

Population Genetic ◽

High Throughput Sequencing ◽

Microsatellite Genotyping ◽

Nephrops Norvegicus ◽

Norway Lobster ◽

Genotyping Method

Use of Targeted High-throughput Sequencing for Genetic Classification of Patients with Bleeding Diathesis and Suspected Platelet Disorder

10.1055/s-0039-1680086 ◽

2019 ◽

Author(s):

O. Andres ◽

E.-M. König ◽

E. Klopocki ◽

H. Schulze ◽

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Bleeding Diathesis ◽

Genetic Classification ◽

Platelet Disorder

Use of Targeted High-throughput Sequencing for Genetic Classification of Patients with Bleeding Diathesis and Suspected Platelet Disorder

10.1055/s-0039-1680247 ◽

2019 ◽

Author(s):

O. Andres ◽

E.-M. König ◽

E. Klopocki ◽

H. Schulze ◽

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Bleeding Diathesis ◽

Genetic Classification ◽

Platelet Disorder

The interpretation of somatic genetic variants identified with high-throughput sequencing of DNA from paediatric solid tumors

Nauchno-prakticheskii zhurnal «Medicinskaia genetika» ◽

10.25557/2073-7998.2021.03.3-25 ◽

2021 ◽

pp. 3-25

Author(s):

М.А. Спектор ◽

Л.А. Ясько ◽

А.Е. Друй

Keyword(s):

Solid Tumors ◽

High Throughput ◽

Genetic Variants ◽

High Throughput Sequencing ◽

Genetic Research ◽

The United States ◽

Sequencing Analysis ◽

Clinical Implementation ◽

American Society

Active clinical implementation of high-throughput DNA sequencing requires a common approach to the interpretation of detected genetic variants, including variants with somatic status. In 2017, the Association for Molecular Pathology (AMP), the American College of Medical Genetics and Genomics (ACMG), the American Society of Clinical Oncology (ASCO), and the College of American Pathologists (CAP), all based in the United States, published guidelines for interpreting and reporting somatic genetic variants in cancer identified by high-throughput sequencing analysis. This review focuses on the specific application of the AMP/ACMG/ASCO/CAP guidelines to genetic research on paediatric solid tumors. In particular, it presents the criteria for classifying somatic genetic variants, discusses the problems of evaluating the clinical significance of genetic findings in paediatric tumors, and gives examples of the classification of genetic variants found in particular types of childhood solid malignancies.

Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection

mSystems ◽

10.1128/msystems.00230-19 ◽

2020 ◽

Vol 5 (2) ◽

Cited By ~ 2

Author(s):

Thomas P. Quinn ◽

Ionas Erb

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Simple Procedure ◽

Feature Space ◽

Cost Effective ◽

Molecular Profile ◽

Experimental Conditions ◽

Effective Manner ◽

Balance Analysis

ABSTRACT: Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few approaches explicitly model the relative nature of these data; most rely instead on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE: High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves interpretability without sacrificing classifier performance.
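To make the idea of a balance concrete, here is a minimal sketch of the standard isometric log-ratio balance between two groups of components: a scaled log of the ratio of their geometric means. The function names and example abundances are illustrative assumptions, and the selection step that the abstract calls discriminative balance analysis is not reproduced here.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator: Sequence[float], denominator: Sequence[float]) -> float:
    """Balance (a normalized log contrast) between two groups of parts.

    b = sqrt(r * s / (r + s)) * ln(gmean(numerator) / gmean(denominator)),
    where r and s are the group sizes. Zero abundances are rejected here;
    in practice they would need replacing first (for example with a pseudocount).
    """
    r, s = len(numerator), len(denominator)
    if r == 0 or s == 0:
        raise ValueError("both groups must be non-empty")
    if any(v <= 0 for v in list(numerator) + list(denominator)):
        raise ValueError("all abundances must be strictly positive")
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


if __name__ == "__main__":
    # Hypothetical abundances of three taxa in one sample: two versus one
    print(balance([30.0, 60.0], [10.0]))
    # Rescaling the whole sample leaves the balance unchanged
    print(balance([3.0, 6.0], [1.0]))
```

Because the balance depends only on ratios, multiplying every abundance in a sample by the same factor does not change it. That property is why such features can be computed without a prior normalization step.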

Genetic structure of the American ginseng (Panax quinquefolius L.) in Eastern Canada using reduced-representation high-throughput sequencing

Botany ◽

10.1139/cjb-2016-0144 ◽

2017 ◽

Vol 95 (4) ◽

pp. 429-434 ◽

Cited By ~ 3

Author(s):

Simon Joly ◽

Annie Archambault ◽

Stéphanie Pellerin ◽

Andrée Nault

Keyword(s):

Genetic Variation ◽

Genetic Structure ◽

High Throughput ◽

High Throughput Sequencing ◽

Natural Populations ◽

American Ginseng ◽

Panax Quinquefolius ◽

Eastern Canada ◽

Reduced Representation ◽

Wide Range

The American ginseng (Panax quinquefolius L.) has been used for a wide range of medicinal purposes for more than 300 years and is at risk in most of its range because of harvesting in natural populations, herbivory, and habitat loss. Its genetic structure is largely unknown in the previously glaciated areas of Eastern Canada, although such information could inform restoration strategies. We generated data with a reduced-representation high-throughput sequencing approach and analysed them with a BAMOVA population model to partition the genetic variation within and among six natural populations of American ginseng in Eastern Canada. We found that a substantial and significant fraction of the genetic variation was structured among populations ([Formula: see text] = 42%; FST = 34%) at the geographical scale of the study (<250 km). No clear evidence of isolation-by-distance was observed. This strong genetic structure among American ginseng populations from a region that was covered by ice during the last glaciations is similar to what was found in previous studies of southern populations and of populations throughout the species range.

taxMaps - Ultra-comprehensive and highly accurate taxonomic classification of short-read data in reasonable time

10.1101/134023 ◽

2017 ◽

Cited By ~ 2

Author(s):

André Corvelo ◽

Wayne E. Clarke ◽

Nicolas Robine ◽

Michael C. Zody

Keyword(s):

High Throughput ◽

Classification Accuracy ◽

High Throughput Sequencing ◽

Computational Cost ◽

Taxonomic Classification ◽

Short Read ◽

Reasonable Time ◽

Classification Tool ◽

Commercial Research

Abstract: High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive and fully scalable taxonomic classification tool, capable of delivering classification accuracy comparable to that of BLASTn, but at up to 3 orders of magnitude less computational cost. taxMaps is freely available for academic and non-commercial research purposes at https://github.com/nygenome/taxmaps.

High-throughput sequencing reveals distinct regional genetic structure among remaining populations of an endangered salt marsh plant in California

Conservation Genetics ◽

10.1007/s10592-020-01269-3 ◽

2020 ◽

Vol 21 (3) ◽

pp. 547-559

Author(s):

Elizabeth R. Milano ◽

Margaret R. Mulligan ◽

Jon P. Rebman ◽

Amy G. Vandergast

Keyword(s):

Salt Marsh ◽

Genetic Structure ◽

High Throughput ◽

High Throughput Sequencing ◽

Marsh Plant ◽

Salt Marsh Plant

Short k-mer Abundance Profiles Yield Robust Machine Learning Features and Accurate Classifiers for RNA Viruses

10.1101/2020.06.25.170779 ◽

2020 ◽

Author(s):

Md. Nafis Ul Alam ◽

Umar Faruq Chowdhury

Keyword(s):

Machine Learning ◽

Sequence Alignment ◽

High Throughput ◽

High Throughput Sequencing ◽

Rna Viruses ◽

Sequence Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Automated Annotation

Abstract: High-throughput sequencing technologies have greatly enabled the study of genomics, transcriptomics and metagenomics. Automated annotation and classification of the vast amounts of generated sequence data have become paramount for the biological sciences. Genomes of viruses can be radically different from those of all other life, in terms of both molecular structure and primary sequence. Alignment-based and profile-based searches are commonly employed to characterize assembled viral contigs from high-throughput sequencing data. Recent attempts have highlighted the use of machine learning models for the task, but these models rely entirely on DNA genomes and, owing to the intrinsic genomic complexity of viruses, RNA viruses have been completely overlooked. Here, we present a novel short k-mer based sequence scoring method that generates robust sequence information for training machine learning classifiers. We trained 18 classifiers for the task of distinguishing viral RNA from human transcripts. We challenged our models with very stringent testing protocols across different species and evaluated performance against BLASTn, BLASTx and HMMER3 searches. For clean sequence data retrieved from curated databases, our models display near-perfect accuracy, outperforming all similar attempts previously reported. On de novo assemblies of raw RNA-Seq data from cells subjected to Ebola virus, the area under the ROC curve varied from 0.6 to 0.86 depending on the software used for assembly. Our classifier was able to properly classify the majority of the false hits generated by BLAST and HMMER3 searches on the same data. The outstanding performance metrics of our model lay the groundwork for robust machine learning methods for the automated annotation of sequence data.

Author Summary: In this age of high-throughput sequencing, proper classification of copious amounts of sequence data remains a daunting challenge. At present, sequence alignment methods are the default choice for the task. Owing to the selection forces of nature, there is considerable homology even between the sequences of different species, which introduces ambiguity into the results of alignment-based searches. Machine learning methods are becoming more reliable for characterizing sequence data, but virus genomes are more variable than those of all other forms of life, and viruses with RNA-based genomes have been overlooked in previous machine learning attempts. We designed a novel short k-mer based scoring criterion whereby a large number of highly robust numerical feature sets can be derived from sequence data. These features were able to accurately distinguish virus RNA from human transcripts with performance scores better than all previous reports. Our models were able to generalize well to distant species of viruses and mouse transcripts. The model correctly classifies the majority of false hits generated by current standard alignment tools. These findings strongly imply that this k-mer score based computational pipeline forges a highly informative, rich set of numerical machine learning features and that similar pipelines can greatly advance the field of computational biology.
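As an illustration of what a short k-mer abundance profile looks like as a feature vector, the sketch below computes normalized k-mer frequencies for a single sequence. This is a generic frequency profile and not the authors' scoring method, which the abstract does not specify; the function name, the default k, and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_profile(seq: str, k: int = 3, alphabet: str = "ACGU") -> Dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet.

    DNA input is accepted by mapping T to U. Windows that contain any
    character outside the alphabet (for example N) are skipped.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper().replace("T", "U")
    allowed = set(alphabet)
    counts: Counter = Counter()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in allowed for c in kmer):
            counts[kmer] += 1
            total += 1
    # Enumerate all k-mers so every sequence yields a vector of the same length
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    if total == 0:
        return {km: 0.0 for km in all_kmers}
    return {km: counts[km] / total for km in all_kmers}


if __name__ == "__main__":
    profile = kmer_profile("ACGUACGU", k=2)
    # Show only the k-mers that actually occur in this toy sequence
    print({km: round(f, 3) for km, f in profile.items() if f > 0})
```

Each sequence maps to a fixed-length vector of 4^k values regardless of its length, so profiles from different contigs or transcripts can be stacked into a matrix and passed to any standard classifier.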
