A high-throughput sequencing determination method for upstream genetic structure (UGS) of ISEcp1-blaCTX-M transposition unit and application of the UGS to classification of bacterial isolates possessing blaCTX-M

Author(s):  
Nobuyoshi Yagi ◽  
Kouta Hamamoto ◽  
Kim Ngan Thi Bui ◽  
Shuhei Ueda ◽  
Saki Tawata ◽  
...  
Forests ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 681 ◽  
Author(s):  
Huiquan Zheng ◽  
Dehuo Hu ◽  
Ruping Wei ◽  
Shu Yan ◽  
Runhui Wang

Knowledge of population diversity and structure is of fundamental importance for conifer breeding programs. In this study, we concentrated on developing and applying high-density single nucleotide polymorphism (SNP) markers for the economically important conifer species Chinese fir (Cunninghamia lanceolata). We used a high-throughput sequencing technique termed specific-locus amplified fragment sequencing (SLAF-seq). Using SLAF-seq, we established a high-density panel of 108,753 genomic SNPs from Chinese fir. This panel allowed us to characterize the genetic variation, relationships, diversity, and population structure of the advanced Chinese fir breeding population, which comprises 221 genotypes. Overall, the population appears to have considerable genetic variability. Most (94.15%) of the variation was attributable to differences among genotypes, whereas very little (5.85%) occurred at the population (sub-origin set) level. Correspondingly, low FST values (0.0285–0.0990) were observed among the sub-origin sets. We also examined the genetic structure of the population without regard to sub-origin set. In that view, the SNP data revealed a new picture: the advanced breeding population could be divided into four genetic sets. Both phylogenetic tree and population structure analyses supported this division, although the membership of corresponding sets differed somewhat (cluster vs. group). The results also suggested that all genetic sets were admixed clades, revealing complex relationships among the genotypes of this population. With a stepwise pruning procedure, we captured a core collection (core 0.650) of 143 genotypes. This core retains all the alleles, diversity, and specific genetic structure of the whole population. This generalist core is valuable for the Chinese fir advanced breeding program and for further genetic and genomic studies.
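The idea of stepwise pruning toward a core collection can be illustrated with a small Python sketch. The abstract does not give the pruning rule or explain the "core 0.650" setting, so the rule below is an assumption. It repeatedly drops one genotype whose alleles are all still carried by other genotypes, preferring the genotype that carries the fewest alleles. The function name `prune_core` and the allele encoding are hypothetical. The sketch guarantees only that every allele is retained; it does not check diversity indices or population structure.

```python
from collections import Counter
from typing import Dict, Set


def prune_core(genotypes: Dict[str, Set[str]]) -> Set[str]:
    """Hypothetical stepwise pruning that keeps every allele in the core.

    genotypes maps a genotype ID to the set of alleles it carries,
    e.g. {"G1": {"snp1:A", "snp2:T"}}.
    """
    core = dict(genotypes)
    while True:
        # How many genotypes still in the core carry each allele.
        counts = Counter(a for alleles in core.values() for a in alleles)
        # A genotype is redundant if every allele it carries is also carried
        # by at least one other genotype, so dropping it loses no allele.
        redundant = [
            g for g, alleles in core.items() if all(counts[a] >= 2 for a in alleles)
        ]
        if not redundant:
            return set(core)
        # Drop the redundant genotype carrying the fewest alleles (ties broken by ID).
        drop = min(redundant, key=lambda g: (len(core[g]), g))
        del core[drop]


if __name__ == "__main__":
    panel = {
        "G1": {"a", "b"},
        "G2": {"b", "c"},
        "G3": {"a", "b", "c"},
        "G4": {"d"},
    }
    print(sorted(prune_core(panel)))  # ['G3', 'G4']
```

Removing one genotype at a time and recounting after each step is what makes the procedure safe. Two genotypes that each look redundant on their own may jointly hold the last copies of an allele.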


Author(s):  
М.А. Спектор ◽  
Л.А. Ясько ◽  
А.Е. Друй

Active clinical implementation of high-throughput DNA sequencing requires a common approach to interpreting detected genetic variants, including variants with somatic status. In 2017, the Association for Molecular Pathology (AMP), the American College of Medical Genetics and Genomics (ACMG), the American Society of Clinical Oncology (ASCO), and the College of American Pathologists (CAP) published guidelines for interpreting and reporting somatic variants in cancer detected by high-throughput sequencing. This review focuses on applying the AMP/ACMG/ASCO/CAP guidelines to genetic testing of paediatric solid tumors.
In particular, the review provides the criteria for classification of somatic genetic variants, discusses the problems of evaluating the clinical significance of genetic findings in paediatric tumors, and provides examples of classification of genetic variants specific for certain types of childhood solid malignancies.
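The Python sketch below is a simplified illustration of the guideline's tier structure, which has four tiers supported by evidence levels A to D. It maps an evidence level to a tier. The helper name `amp_tier` is hypothetical, and so is the reduction of classification to a lookup. Real classification depends on curated evidence and expert judgment. The sketch also does not model the paediatric-specific considerations that the review discusses.

```python
from typing import Optional

# Evidence levels as summarized here (simplified):
#   Level A: FDA-approved therapy or professional guidelines
#   Level B: well-powered studies with expert consensus
#   Level C: FDA-approved therapy for a different tumor type, investigational
#            therapies, or multiple small published studies
#   Level D: preclinical trials or a few case reports without consensus


def amp_tier(evidence_level: Optional[str], benign: bool = False) -> str:
    """Return the tier for a somatic variant (hypothetical, simplified helper)."""
    if benign:
        return "Tier IV"   # benign or likely benign
    if evidence_level in ("A", "B"):
        return "Tier I"    # strong clinical significance
    if evidence_level in ("C", "D"):
        return "Tier II"   # potential clinical significance
    return "Tier III"      # unknown clinical significance


if __name__ == "__main__":
    print(amp_tier("B"))  # Tier I
```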


mSystems ◽  
2020 ◽  
Vol 5 (2) ◽  
Author(s):  
Thomas P. Quinn ◽  
Ionas Erb

ABSTRACT Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue cost-effectively. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. These data are most often collected to gain new insights into biological systems, but they can also serve as biomarkers for building clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few methods explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.
IMPORTANCE High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations.
This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves the interpretability, without sacrificing classifier performance.
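A balance between r numerator parts and s denominator parts is commonly defined as sqrt(rs/(r+s)) · ln(g(num)/g(den)), where g is the geometric mean. The balance depends only on ratios. Multiplying a whole sample by a constant, such as a different sequencing depth, therefore leaves it unchanged, which is why no normalization is needed. The Python sketch below computes balances and scores every two-part balance by a standardized difference in class means. Several elements are assumptions for illustration: the scoring rule, the pseudocount requirement, and the function names `balance` and `rank_pairs`. Discriminative balance analysis as described by the authors may rank candidates differently.

```python
import itertools
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def balance(sample: Sequence[float], num: Sequence[int], den: Sequence[int]) -> float:
    """Log-contrast balance between two disjoint groups of parts.

    All values must be strictly positive; replace zero counts with a
    pseudocount before calling (an assumption of this sketch).
    """
    r, s = len(num), len(den)
    log_gm_num = mean(math.log(sample[i]) for i in num)  # log of geometric mean
    log_gm_den = mean(math.log(sample[j]) for j in den)
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


def rank_pairs(
    samples: List[Sequence[float]], labels: List[int]
) -> List[Tuple[float, int, int]]:
    """Score every two-part balance by |mean difference| / pooled within-class SD."""
    results = []
    for i, j in itertools.combinations(range(len(samples[0])), 2):
        b = [balance(x, [i], [j]) for x in samples]
        b1 = [v for v, y in zip(b, labels) if y == 1]
        b0 = [v for v, y in zip(b, labels) if y == 0]
        # Guard against zero within-class spread.
        pooled = math.sqrt((pstdev(b1) ** 2 + pstdev(b0) ** 2) / 2) or 1e-12
        results.append((abs(mean(b1) - mean(b0)) / pooled, i, j))
    return sorted(results, reverse=True)  # most discriminative pair first


if __name__ == "__main__":
    data = [[8, 1, 3], [9, 1, 7], [1, 8, 5], [1, 9, 2]]
    print(rank_pairs(data, [1, 1, 0, 0])[0][1:])  # (0, 1)
```

In the toy data, the ratio of part 0 to part 1 flips between classes, while part 2 is noise. The pair (0, 1) therefore ranks first. Extending the search to trios means calling `balance` with one part in `num` and two in `den`, or the reverse.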


Botany ◽  
2017 ◽  
Vol 95 (4) ◽  
pp. 429-434 ◽  
Author(s):  
Simon Joly ◽  
Annie Archambault ◽  
Stéphanie Pellerin ◽  
Andrée Nault

American ginseng (Panax quinquefolius L.) has been used for a wide range of medicinal purposes for more than 300 years. It is at risk in most of its range because of harvesting from natural populations, herbivory, and habitat loss. Its genetic structure is largely unknown in the previously glaciated areas of Eastern Canada, although such information could guide restoration strategies. We generated and analysed data from a reduced-representation high-throughput sequencing approach. We used a BAMOVA population model to partition the genetic variation within and among six natural populations of American ginseng in Eastern Canada. A large and significant fraction of the genetic variation was structured among populations (ΦST = 42%; FST = 34%) at the geographical scale of the study (<250 km). No clear evidence of isolation by distance was observed. This region was covered by ice during the last glaciations. The strong genetic structure among its American ginseng populations is similar to that found in previous studies of southern populations or across the species' range.
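The idea of partitioning variation among populations can be illustrated with a simple moment estimator, FST = (HT − HS)/HT, computed from allele frequencies at biallelic SNPs. HT is the expected heterozygosity of the pooled population. HS is the mean within-population heterozygosity. This is not the BAMOVA model used in the study, which is a Bayesian approach. The Python sketch below is only an illustration, and it assumes equally weighted populations. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def heterozygosities(freqs: Sequence[float]) -> Tuple[float, float]:
    """Return (H_T, H_S) for one biallelic locus.

    freqs holds the frequency of one allele in each population; populations
    are weighted equally (an assumption of this sketch).
    """
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)               # expected heterozygosity, pooled
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    return h_t, h_s


def fst(freq_table: Sequence[Sequence[float]]) -> float:
    """Multilocus F_ST as a ratio of sums: (sum H_T - sum H_S) / sum H_T."""
    h_t_total = h_s_total = 0.0
    for freqs in freq_table:
        h_t, h_s = heterozygosities(freqs)
        h_t_total += h_t
        h_s_total += h_s
    return 0.0 if h_t_total == 0 else (h_t_total - h_s_total) / h_t_total


if __name__ == "__main__":
    print(round(fst([[0.2, 0.8]]), 2))  # 0.36
```

Summing HT and HS across loci before taking the ratio keeps nearly monomorphic SNPs from dominating the estimate.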


2017 ◽  
Author(s):  
André Corvelo ◽  
Wayne E. Clarke ◽  
Nicolas Robine ◽  
Michael C. Zody

Abstract: High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive and fully scalable taxonomic classification tool, capable of delivering classification accuracy comparable to that of BLASTn, but at up to 3 orders of magnitude less computational cost. taxMaps is freely available for academic and non-commercial research purposes at https://github.com/nygenome/taxmaps.
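When a read matches several reference sequences equally well, taxonomic classifiers commonly report the lowest common ancestor (LCA) of the matched taxa. The abstract does not describe how taxMaps resolves multiple hits. The Python sketch below therefore illustrates only the general LCA idea, using a toy child-to-parent map. The taxonomy and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from a taxon up to the root (the root has no parent entry)."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest taxon shared by the lineages of all hits (single rooted tree assumed)."""
    taxa = list(taxa)
    common = lineage(taxa[0], parent)  # ordered leaf -> root
    for t in taxa[1:]:
        ancestors = set(lineage(t, parent))
        common = [a for a in common if a in ancestors]
    return common[0]


if __name__ == "__main__":
    toy_parent = {
        "Escherichia coli": "Escherichia",
        "Escherichia": "Enterobacteriaceae",
        "Salmonella enterica": "Salmonella",
        "Salmonella": "Enterobacteriaceae",
        "Enterobacteriaceae": "Bacteria",
        "Bacteria": "root",
    }
    print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], toy_parent))
    # Enterobacteriaceae
```

Reporting the LCA trades resolution for reliability. An ambiguous read is assigned to a higher rank rather than being forced onto one of several equally plausible species.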


2020 ◽  
Author(s):  
Md. Nafis Ul Alam ◽  
Umar Faruq Chowdhury

Abstract: High-throughput sequencing technologies have greatly advanced the study of genomics, transcriptomics, and metagenomics. Automated annotation and classification of the vast amounts of sequence data generated have become essential to the biological sciences. Viral genomes can differ radically from those of all other life, in both molecular structure and primary sequence. Alignment-based and profile-based searches are commonly used to characterize viral contigs assembled from high-throughput sequencing data. Recent work has explored machine learning models for this task. However, these models rely entirely on DNA genomes, and, owing to the intrinsic genomic complexity of viruses, RNA viruses have been completely overlooked. Here, we present a novel short k-mer-based sequence scoring method that generates robust sequence information for training machine learning classifiers. We trained 18 classifiers to distinguish viral RNA from human transcripts. We challenged our models with very stringent testing protocols across different species. We evaluated their performance against BLASTn, BLASTx, and HMMER3 searches. On clean sequence data retrieved from curated databases, our models display near-perfect accuracy, outperforming all similar attempts previously reported. We also tested the models on de novo assemblies of raw RNA-Seq data from cells infected with Ebola virus. There, the area under the ROC curve varied from 0.6 to 0.86, depending on the software used for assembly. Our classifier correctly classified the majority of the false hits generated by BLAST and HMMER3 searches on the same data. The outstanding performance metrics of our model lay the groundwork for robust machine learning methods for the automated annotation of sequence data.
Author Summary: In this age of high-throughput sequencing, proper classification of copious amounts of sequence data remains a daunting challenge. At present, sequence alignment methods are the default tool for this task. Because of natural selection, there is considerable homology even between sequences of different species, which makes the results of alignment-based searches ambiguous. Machine learning methods are becoming more reliable for characterizing sequence data. However, viral genomes are more variable than those of any other form of life, and viruses with RNA-based genomes have been overlooked in previous machine learning attempts. We designed a novel short k-mer-based scoring criterion from which a large number of highly robust numerical feature sets can be derived from sequence data. These features accurately distinguished viral RNA from human transcripts, with performance scores better than all previous reports. Our models generalized well to distantly related virus species and to mouse transcripts. The model correctly classifies the majority of false hits generated by current standard alignment tools. These findings strongly suggest that this k-mer score-based computational pipeline yields a highly informative, rich set of numerical machine learning features. They also suggest that similar pipelines can greatly advance the field of computational biology.
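The abstract does not define the k-mer score itself. As a point of reference only, the Python sketch below computes plain normalized k-mer frequencies. These are a common baseline feature vector for sequence classifiers, used here as a stand-in; they are not the authors' scoring method. The function name and the default k = 3 are assumptions.

```python
from itertools import product
from typing import List


def kmer_features(seq: str, k: int = 3) -> List[float]:
    """Relative frequency of each of the 4**k nucleotide k-mers in seq.

    U is mapped to T so RNA and DNA share one alphabet; windows containing
    N or other ambiguity codes are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # lexicographic order
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper().replace("U", "T")
    total = 0
    for start in range(len(seq) - k + 1):
        idx = index.get(seq[start:start + k])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(kmers)


if __name__ == "__main__":
    features = kmer_features("ACGU", k=2)
    print(len(features), round(features[1], 3))  # 16 0.333  ("AC" is k-mer index 1)
```

Each sequence becomes a fixed-length vector of 4**k numbers. Any standard classifier can take this vector as input regardless of sequence length.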

