The power of next-generation sequencing and machine learning for causal gene finding and prediction of phenotypes.

2021 ◽  
pp. 401-410
Author(s):  
Anna S. Sowa ◽  
Lisa Dussling ◽  
Jörg Hagmann ◽  
Sebastian J. Schultheiss

Abstract The wide application of next-generation sequencing (NGS) has facilitated and accelerated causal gene finding and breeding in the field of plant sciences. A wide variety of techniques and computational strategies is available, and these need to be tailored appropriately to the species, the genetic architecture of the trait of interest, the breeding system and the available resources. Utilizing these NGS methods, the typical computational steps of marker discovery, genetic mapping and identification of causal mutations can be achieved in a single step in a cost- and time-efficient manner. Rather than focusing on a few high-impact genetic variants that explain phenotypes, increased computational power allows modelling of phenotypes based on genome-wide molecular markers, known as genomic selection (GS). Based solely on this genotype information, modern GS approaches can accurately predict breeding values for a given trait (the average effects of alleles over all loci that are anticipated to be transferred from the parent to the progeny) after being trained on a large population of genotyped and phenotyped individuals (Crossa et al., 2017). Once trained, the model can greatly reduce breeding time and costs. We advocate for improving conventional GS methods by applying advanced techniques based on machine learning (ML) and outline how this approach can also be used for causal gene finding. In addition to the genetic causes of agronomically important traits, epigenetic mechanisms such as DNA methylation play a crucial role in shaping phenotypes and can become interesting targets in breeding pipelines. We highlight an ML approach shown to detect functional methylation changes sensitively from NGS data. We give an overview of commonly applied strategies and provide practical considerations in choosing and performing NGS-based gene finding and NGS-assisted breeding.
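To make the idea of genomic selection concrete, the following is a minimal sketch of predicting breeding values from genome-wide markers with ridge regression, a common baseline for GS. It is not the authors' method: the simulated data, the function names (`simulate_population`, `fit_marker_effects`, `predict_breeding_values`) and the `shrinkage` value are all assumptions made for illustration.

```python
import numpy as np


def simulate_population(n_individuals=500, n_markers=200, heritability=0.5, seed=0):
    """Simulate 0/1/2 marker genotypes and phenotypes (illustrative data only)."""
    rng = np.random.default_rng(seed)
    allele_freqs = rng.uniform(0.1, 0.9, size=n_markers)
    genotypes = rng.binomial(2, allele_freqs, size=(n_individuals, n_markers)).astype(float)
    true_effects = rng.normal(0.0, 1.0, size=n_markers)
    # True breeding value: sum of marker effects over centered genotypes.
    breeding_values = (genotypes - genotypes.mean(axis=0)) @ true_effects
    # Scale the environmental noise so the trait has the requested heritability.
    noise_sd = np.sqrt(breeding_values.var() * (1.0 - heritability) / heritability)
    phenotypes = breeding_values + rng.normal(0.0, noise_sd, size=n_individuals)
    return genotypes, phenotypes, breeding_values


def fit_marker_effects(genotypes, phenotypes, shrinkage):
    """Estimate marker effects by ridge regression (closed-form solution)."""
    marker_means = genotypes.mean(axis=0)
    centered = genotypes - marker_means
    n_markers = centered.shape[1]
    lhs = centered.T @ centered + shrinkage * np.eye(n_markers)
    rhs = centered.T @ (phenotypes - phenotypes.mean())
    return marker_means, np.linalg.solve(lhs, rhs)


def predict_breeding_values(genotypes, marker_means, effects):
    """Predict breeding values from genotypes alone."""
    return (genotypes - marker_means) @ effects


genotypes, phenotypes, breeding_values = simulate_population()
train, test = slice(0, 400), slice(400, 500)

# shrinkage is a hypothetical tuning value; in practice it would be estimated
# or chosen by cross-validation on the training population.
marker_means, effects = fit_marker_effects(genotypes[train], phenotypes[train], shrinkage=80.0)
predicted = predict_breeding_values(genotypes[test], marker_means, effects)

# Prediction accuracy: correlation between predicted and true breeding values
# for individuals that were genotyped but not used in training.
accuracy = np.corrcoef(predicted, breeding_values[test])[0, 1]
print(f"prediction accuracy (r): {accuracy:.2f}")
```

The held-out individuals stand in for selection candidates that have been genotyped but not phenotyped, which is the situation in which a trained model would save time and cost.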

Molecules ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 399 ◽  
Author(s):  
Sima Taheri ◽  
Thohirah Lee Abdullah ◽  
Mohd Yusop ◽  
Mohamed Hanafi ◽  
Mahbod Sahebi ◽  
...  

F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 50 ◽  
Author(s):  
Michael T. Wolfinger ◽  
Jörg Fallmann ◽  
Florian Eggenhofer ◽  
Fabian Amman

Recent achievements in next-generation sequencing (NGS) technologies have led to a high demand for reusable software components to easily compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, as well as normalization of RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools.
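As an illustration of what normalization of RNA abundance can involve, the sketch below computes transcripts per million (TPM), one widely used measure. It is written in Python rather than Perl and does not use or reproduce the ViennaNGS API; the function name `transcripts_per_million` and the example counts and lengths are assumptions.

```python
def transcripts_per_million(read_counts, lengths_bp):
    """Convert raw read counts to TPM.

    read_counts: dict mapping feature name -> number of mapped reads
    lengths_bp:  dict mapping feature name -> feature length in base pairs
    """
    # Reads per base: corrects for longer features attracting more reads.
    rates = {name: read_counts[name] / lengths_bp[name] for name in read_counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        return {name: 0.0 for name in rates}
    # Rescale so that the values of one sample sum to one million.
    return {name: rate / total_rate * 1e6 for name, rate in rates.items()}


# Hypothetical example: geneB has three times the reads of geneA but is
# three times as long, so both end up with the same TPM value.
counts = {"geneA": 100, "geneB": 300, "geneC": 200}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}
tpm = transcripts_per_million(counts, lengths)
for name, value in tpm.items():
    print(f"{name}: {value:.1f}")
```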


2019 ◽  
Vol 24 (2) ◽  
Author(s):  
Anja Berger ◽  
Alexandra Dangel ◽  
Tilmann Schober ◽  
Birgit Schmidbauer ◽  
Regina Konrad ◽  
...  

In September 2018, a child who had returned from Somalia to Germany presented with cutaneous diphtheria caused by toxigenic Corynebacterium diphtheriae biovar mitis. The child’s sibling had superinfected insect bites that also harboured toxigenic C. diphtheriae. Next generation sequencing (NGS) revealed the same strain in both patients, suggesting very recent human-to-human transmission. Epidemiological and NGS data suggest that the two cutaneous diphtheria cases constitute the first outbreak by toxigenic C. diphtheriae in Germany since the 1980s.


2020 ◽  
Vol 20 (22) ◽  
pp. 1968-1980
Author(s):  
Nidhi Shukla ◽  
Narmadhaa Siva ◽  
Babita Malik ◽  
Prashanth Suravajhala

In the recent past, next-generation sequencing (NGS) approaches have heralded the omics era. With NGS data burgeoning, a need arose to disseminate omics data better. Proteogenomics has been widely used for characterising the functions of candidate genes and is applied in ascertaining various disease phenotypes, including cancers. However, not much is known about the role and application of proteogenomics, especially in prostate cancer (PCa). In this review, we outline the need for proteogenomic approaches, their applications and their role in PCa.


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e23528-e23528
Author(s):  
Gang Zhao ◽  
Lu Xie ◽  
Wei Guo ◽  
Yanfeng Xi ◽  
Yanzhi Cui ◽  
...  

e23528 Background: The rarity and heterogeneity of sarcoma have complicated its diagnosis for years. Even expert sarcoma pathologists can make mistakes in diagnosing this disease. The availability of Next Generation Sequencing (NGS) data has enabled more accurate diagnosis of sarcoma. In this study, we systematically describe the application of NGS to the diagnosis of sarcoma and its contribution to diagnostic accuracy. Methods: In this multi-center, retrospective study, tumor and paired normal samples from 235 sarcoma patients were sent from 56 hospitals to a College of American Pathologists (CAP) accredited and Clinical Laboratory Improvement Amendments (CLIA) certified laboratory in Shanghai, China, for NGS. The samples were sequenced with the NGS-based YS panel, which consists of 450 genes, and the NGS data were analyzed. The initial diagnosis, made without NGS information, was then reconsidered by expert pathologists. Results: Taking into consideration both the initial diagnosis and the NGS results, the final diagnoses of these 235 sarcoma cases included 8 low grade malignant fibromyxoid tumors, 11 dermatofibrosarcoma protuberans (DFSP), 38 myxoliposarcomas, 22 alveolar rhabdomyosarcomas, 11 alveolar soft tissue sarcomas, 2 desmoplastic small round cell tumors, 37 NTRK rearrangement spindle cell tumors, 40 Ewing’s sarcomas and 66 synoviosarcomas. In total, 29% of the initial diagnoses were changed according to NGS-identified fusions, including 13% of low grade malignant fibromyxoid tumors (1 FUS-CREB3L2 fusion), 27% of DFSPs (3 COL1A1-PDGFB fusions), 11% of myxoliposarcomas (3 FUS-DDIT3 fusions and 1 EWSR1-DDIT3 fusion), 14% of alveolar rhabdomyosarcomas (2 PAX7-FOXO1 fusions and 1 FOXO1-LINC00598 fusion), 18% of alveolar soft tissue sarcomas (2 ASPSCR1-TFE3 fusions), 50% of desmoplastic small round cell tumors (1 EWSR1-WT1 fusion), 95% of NTRK rearrangement spindle cell tumors, 13% of Ewing’s sarcomas (3 EWSR1-FLI1 fusions and 2 EWSR1-ERG fusions) and 21% of synoviosarcomas (9 SS18-SSX1 fusions and 5 SS18-SSX2 fusions). Conclusions: NGS is highly recommended for the accurate diagnosis of sarcoma, especially for NTRK rearrangement spindle cell tumors, the majority of which were confirmed according to NGS-identified fusions.
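The per-subtype percentages in the Results can be reproduced from the case and fusion counts given in the abstract. The short check below assumes that each listed fusion corresponds to exactly one changed diagnosis, which the abstract implies but does not state. The NTRK rearrangement spindle cell tumors are left out because only a percentage, not a fusion count, is reported for them. The helper `percent_half_up` is a hypothetical name introduced here.

```python
# (subtype, cases in the final diagnosis, fusions reported for that subtype)
subtypes = [
    ("low grade malignant fibromyxoid tumor", 8, 1),
    ("dermatofibrosarcoma protuberans", 11, 3),
    ("myxoliposarcoma", 38, 3 + 1),
    ("alveolar rhabdomyosarcoma", 22, 2 + 1),
    ("alveolar soft tissue sarcoma", 11, 2),
    ("desmoplastic small round cell tumor", 2, 1),
    ("Ewing's sarcoma", 40, 3 + 2),
    ("synoviosarcoma", 66, 9 + 5),
]


def percent_half_up(part, whole):
    """Percentage rounded half up (Python's round() would round 12.5 down to 12)."""
    return int(100 * part / whole + 0.5)


percentages = [percent_half_up(changed, cases) for _, cases, changed in subtypes]
for (name, cases, changed), pct in zip(subtypes, percentages):
    print(f"{name}: {changed}/{cases} = {pct}%")
```

Under that assumption, the computed values agree with the percentages reported for these eight subtypes.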

