Making Sense of Genetic Information: The Promising Evolution of Clinical Stratification and Precision Oncology Using Machine Learning

Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 722
Author(s):  
Mahaly Baptiste ◽  
Sarah Shireen Moinuddeen ◽  
Courtney Lace Soliz ◽  
Hashimul Ehsan ◽  
Gen Kaneko

Precision medicine is a medical approach that tailors treatment to each patient by taking into consideration a person’s variability in genes, environment, and lifestyle. The accumulation of big omics sequence data has led to the development of various genetic databases on which clinical stratification of high-risk populations may be conducted. In addition, because cancers are generally caused by tumor-specific mutations, large-scale systematic identification of single nucleotide polymorphisms (SNPs) in various tumors has propelled significant progress in tailored treatments of tumors (i.e., precision oncology). Machine learning (ML), a subfield of artificial intelligence in which computers learn through experience, has great potential in precision oncology, chiefly to help physicians make diagnostic decisions based on tumor images. A promising avenue for ML in precision oncology is the integration of all available data, from images to multi-omics big data, for the holistic care of patients and high-risk healthy subjects. In this review, we provide a focused overview of precision oncology and ML with attention to breast cancer and glioma, as well as Bayesian networks, which have the flexibility and the ability to work with incomplete information. We also introduce some state-of-the-art attempts to use and incorporate ML and genetic information in precision oncology.
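The Bayesian networks mentioned above handle incomplete information by marginalizing over unobserved variables. A minimal sketch of that idea, using inference by enumeration in a toy three-node network (all variable names and probabilities are hypothetical, not from the study):

```python
# Toy Bayesian network: variant G -> disease D -> test T.
# All probabilities are illustrative placeholders.
P_G = {True: 0.01, False: 0.99}                  # P(variant present)
P_D_given_G = {True: 0.30, False: 0.02}          # P(disease | variant)
P_T_given_D = {True: 0.90, False: 0.05}          # P(test positive | disease)

def posterior_disease(test_positive):
    """P(disease | test result), marginalizing over the unobserved variant G."""
    joint = {}
    for d in (True, False):
        # Marginalize the missing variable G out of P(D).
        p_d = sum(P_G[g] * (P_D_given_G[g] if d else 1 - P_D_given_G[g])
                  for g in (True, False))
        p_t = P_T_given_D[d] if test_positive else 1 - P_T_given_D[d]
        joint[d] = p_d * p_t
    return joint[True] / (joint[True] + joint[False])

print(round(posterior_disease(True), 3))  # → 0.296
```

Even with the genotype G unobserved, the network still yields a coherent posterior for the disease given only the test result, which is the flexibility the review highlights.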

2020 ◽  
Vol 79 (2) ◽  
pp. 105-113
Author(s):  
Abdul Bari Muneera Parveen ◽  
Divya Lakshmanan ◽  
Modhumita Ghosh Dasgupta

The advent of next-generation sequencing (NGS) has facilitated large-scale discovery and mapping of genomic variants for high-throughput genotyping. Several research groups working on tree species are presently employing NGS platforms for marker discovery, since it is a cost-effective and time-saving strategy. However, most trees lack a chromosome-level genome map, and validation of variants for downstream application becomes obligatory. The cost associated with identifying potential variants from the enormous amount of sequence data is a major limitation. In the present study, high resolution melting (HRM) analysis was optimized for rapid validation of single nucleotide polymorphisms (SNPs), insertions or deletions (InDels) and simple sequence repeats (SSRs) predicted from exome sequencing of parents and hybrids of Eucalyptus tereticornis Sm. × Eucalyptus grandis Hill ex Maiden generated from controlled hybridization. The cost per data point was less than 0.5 USD, providing great flexibility in terms of cost and sensitivity when compared to other validation methods. The sensitivity of this technology in variant detection can be extended to other applications, including Bar-HRM for species authentication and TILLING for detection of mutants.


2020 ◽  
Author(s):  
Wail Ba-Alawi ◽  
Sisira Kadambat Nair ◽  
Bo Li ◽  
Anthony Mammoliti ◽  
Petr Smirnov ◽  
...  

Abstract Identifying biomarkers predictive of cancer cells’ response to drug treatment constitutes one of the main challenges in precision oncology. Recent large-scale cancer pharmacogenomic studies have boosted the search for predictive biomarkers by profiling thousands of human cancer cell lines at the molecular level and screening them with hundreds of approved drugs and experimental chemical compounds. Many studies have leveraged these data to build predictive models of response using various statistical and machine learning methods. However, a common challenge in these methods is the lack of interpretability as to how they make their predictions and which features are most associated with response, hindering the clinical translation of these models. To alleviate this issue, we develop a new machine learning pipeline based on the recent LOBICO approach that explores the space of bimodally expressed genes in multiple large in vitro pharmacogenomic studies and builds multivariate, nonlinear, yet interpretable logic-based models predictive of drug response. Applying our method to a compendium of three of the largest pharmacogenomic data sets, we built robust and interpretable models for 101 drugs spanning 17 drug classes, with a high validation rate in independent datasets.
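The interpretability of a LOBICO-style logic model comes from its form: an OR-of-ANDs formula over binarized gene features. A minimal sketch of that form (gene names, threshold, and the formula itself are hypothetical, not the models from the study):

```python
# Sketch of an interpretable logic-based response model: binarize
# bimodally expressed genes, then evaluate a small OR-of-ANDs rule.
# GENE_A/B/C, the 0.5 threshold, and the rule are illustrative only.

def binarize(expression, threshold=0.5):
    """Binarize bimodally expressed genes: 1 = high expression."""
    return {gene: int(value >= threshold) for gene, value in expression.items()}

def predict_sensitive(features):
    """Rule: (GENE_A high AND GENE_B low) OR GENE_C high -> sensitive."""
    return bool((features["GENE_A"] and not features["GENE_B"])
                or features["GENE_C"])

cell_line = binarize({"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.2})
print(predict_sensitive(cell_line))  # → True
```

Unlike a black-box regressor, the prediction can be read directly off the rule, which is what makes such models attractive for clinical translation.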


2008 ◽  
Vol 16 (2-3) ◽  
pp. 255-270 ◽  
Author(s):  
Michael Ott ◽  
Jaroslaw Zola ◽  
Srinivas Aluru ◽  
Andrew D. Johnson ◽  
Daniel Janies ◽  
...  

Phylogenetic inference is considered a grand challenge in Bioinformatics due to its immense computational requirements. The increasing popularity and availability of large multi-gene alignments as well as comprehensive datasets of single nucleotide polymorphisms (SNPs) in current biological studies, coupled with rapid accumulation of sequence data in general, pose new challenges for high performance computing. Using the example of RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the Maximum Likelihood (ML) criterion, we demonstrate how the phylogenetic ML function can be efficiently scaled to current supercomputer architectures like the IBM BlueGene/L (BG/L) and SGI Altix. This is achieved by simultaneous exploitation of coarse- and fine-grained parallelism, which is inherent to every ML-based biological analysis. Performance is assessed using datasets consisting of 270 sequences and 566,470 base pairs (haplotype map dataset), and 2,182 sequences and 51,089 base pairs, respectively. To the best of our knowledge, these are the largest datasets analyzed under ML to date. Experimental results indicate that the fine-grained parallelization scales well up to 1,024 processors. Moreover, a larger number of processors can be efficiently exploited by a combination of coarse- and fine-grained parallelism. We also demonstrate that our parallelization scales equally well on an AMD Opteron cluster with a less favorable network latency to processor speed ratio. Finally, we underline the practical relevance of our approach by including a biological discussion of the results from the haplotype map dataset analysis, which revealed novel biological insights via phylogenetic inference.
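The fine-grained parallelism described above rests on the fact that the phylogenetic log-likelihood is a sum of independent per-site terms, so alignment columns can be partitioned across workers and the partial sums combined. A minimal sketch of that decomposition (the per-site function is a stand-in stub, not RAxML’s actual likelihood kernel, and real deployments use MPI or Pthreads rather than a thread pool):

```python
# Fine-grained parallelism sketch: distribute alignment columns across
# workers, sum partial log-likelihoods. The per-site term is a stub.
import math
from concurrent.futures import ThreadPoolExecutor

def site_log_likelihood(site_pattern):
    # Stub: a real implementation runs Felsenstein's pruning algorithm
    # over the tree for this alignment column.
    return math.log(0.25) * len(site_pattern)

def chunk_log_likelihood(sites):
    return sum(site_log_likelihood(s) for s in sites)

def parallel_log_likelihood(alignment_columns, workers=4):
    # Strided partition of columns; each worker sums its own chunk.
    chunks = [alignment_columns[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_log_likelihood, chunks))
```

Because the per-site terms are independent, the parallel sum equals the serial one, and the same decomposition underlies the scaling to 1,024 processors reported in the paper.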


2020 ◽  
Author(s):  
George Hindy ◽  
Peter Dornbos ◽  
Mark D. Chaffin ◽  
Dajiang J. Liu ◽  
Minxian Wang ◽  
...  

Summary Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels. Ten of these (ALB, SRSF2, JAK2, CREB3L3, TMEM136, VARS, NR1H3, PLA2G12A, PPARG and STAB1) have not been implicated for lipid levels using rare coding variation in population-based samples. We prioritize 32 genes identified in array-based genome-wide association study (GWAS) loci based on gene-based associations, of which three (EVI5, SH2B3, and PLIN1) had no prior evidence of rare coding variant associations. Most of the associated genes showed evidence of association in multiple ancestries. Also, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes, and for genes closest to GWAS index single nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.


2008 ◽  
Vol 216 (3) ◽  
pp. 135-146 ◽  
Author(s):  
Samantha Johnson ◽  
Dieter Wolke ◽  
Neil Marlow

Routine neurodevelopmental follow-up is crucial in high-risk populations, such as those born very preterm. Even in the absence of severe neurosensory impairment, very preterm children are at risk for a range of long-term cognitive, motor, and learning deficits. Infant developmental assessments are typically carried out at 2 years of age for both clinical and research purposes, and they are crucial for outcome monitoring. We review psychometric tests of infant developmental functioning most widely used as outcome measures for very preterm infants and other high-risk populations. We also consider parent-based assessments and methodological issues pertaining to the use of these tools in large-scale research studies and in outcome monitoring in this population.


2019 ◽  
Vol 74 (Supplement_5) ◽  
pp. v39-v46 ◽  
Author(s):  
Suzanne Barror ◽  
Gordana Avramovic ◽  
Cristiana Oprea ◽  
Julian Surey ◽  
Alistair Story ◽  
...  

Abstract
Objectives: Hepatitis C is one of the main causes of chronic liver diseases worldwide. One of the major barriers to effecting EU- and WHO-mandated HCV elimination by 2030 is underdiagnosis. Community-based screening strategies have been identified as important components of HCV models of care. HepCheck Europe is a large-scale intensified screening initiative aimed at enhancing identification of HCV infection among vulnerable populations and linkage to care.
Methods: Research teams across four European countries were engaged in the study and rolled out screening to high-risk populations in community addiction, homeless and prison services. Screening was offered to 2822 individuals and included a self-administered questionnaire, HCV antibody and RNA testing, liver fibrosis assessment and referral to specialist services.
Results: There was a 74% (n=2079) uptake of screening. The majority (85.8%, n=1783) were male. In total, 44.6% (n=927) of the sample reported ever injecting drugs, 38.4% (n=799) reported ever being homeless and 27.9% (n=581) were prisoners. In total, 397 (19%) active HCV infections were identified and 136 (7% of the total sample and 34% of identified active infections) were new cases. Of those identified with active HCV infection, 80% were linked to care, which included liver fibrosis assessment and referral to specialist services.
Conclusions: HepCheck’s screening and linkage to care is a clear strategy for reaching high-risk populations, including those at highest risk of transmission who are not accessing any type of care in the community. Elimination of HCV in the EU will only be achieved by such innovative, patient-centred approaches.


2007 ◽  
Vol 2007 ◽  
pp. 1-7 ◽  
Author(s):  
B. Jayashree ◽  
Manindra S. Hanspal ◽  
Rajgopal Srinivasan ◽  
R. Vigneshwaran ◽  
Rajeev K. Varshney ◽  
...  

The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.
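The final step the pipeline describes, selecting SNPs assayable by CAPS, reduces to checking whether one allele creates a restriction site that the other allele lacks. A minimal sketch of that check (the enzyme table and flanking sequences are illustrative examples, not output of the ICRISAT pipeline):

```python
# CAPS candidate check: a SNP is assayable if an enzyme cuts one
# allele's sequence but not the other's. Enzyme sites shown are the
# standard EcoRI/HindIII recognition sequences; flanks are made up.

ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

def caps_candidate(flank_5, allele_a, allele_b, flank_3):
    """Return enzymes whose site is present in exactly one allele."""
    seq_a = flank_5 + allele_a + flank_3
    seq_b = flank_5 + allele_b + flank_3
    return [name for name, site in ENZYMES.items()
            if (site in seq_a) != (site in seq_b)]

# An A/G SNP where the A allele completes an EcoRI site (GAATTC):
print(caps_candidate("CCGA", "A", "G", "TTCGG"))  # → ['EcoRI']
```

A SNP passing this test can be genotyped by a simple restriction digest of a PCR product, which is what makes CAPS assays the cost-effective genotyping endpoint of the pipeline.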


Nutrients ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 411 ◽  
Author(s):  
Hermann Brenner

The COVID-19 pandemic poses an unprecedented threat to human health, health care systems, public life, and economy around the globe. The repertoire of effective therapies for severe courses of the disease has remained limited. A large proportion of the world population suffers from vitamin D insufficiency or deficiency, with prevalence being particularly high among the COVID-19 high-risk populations. Vitamin D supplementation has been suggested as a potential option to prevent COVID-19 infections, severe courses, and deaths from the disease, but is not widely practiced. This article provides an up-to-date summary of recent epidemiological and intervention studies on a possible role of vitamin D supplementation for preventing severe COVID-19 cases and deaths. Despite limitations and remaining uncertainties, accumulating evidence strongly supports widespread vitamin D supplementation, in particular of high-risk populations, as well as high-dose supplementation of those infected. Given the dynamics of the COVID-19 pandemic, the benefit–risk ratio of such supplementation calls for immediate action even before results of ongoing large-scale randomized trials become available.


10.2196/20545 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e20545
Author(s):  
Paul J Barr ◽  
James Ryan ◽  
Nicholas C Jacobson

COVID-19 cases are increasing exponentially worldwide; however, its clinical phenotype remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at a high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, albeit at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by these gaps in data from electronic health records (EHRs). A comprehensive record of clinic visits is required—audio recordings may be the answer. A recording of clinic visits represents a more comprehensive record of patient-reported symptoms. If done at scale, a combination of data from the EHR and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose the generation of a pipeline extending from audio or video recordings of clinic visits to establish a model that factors in clinical symptoms and predicts COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at a high risk of COVID-19 and to identify patient characteristics that predict a greater risk of a more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not the panacea for this pandemic, they are a low-cost option with many potential benefits, which have recently begun to be explored.

