Computational Methods for the Discovery of Metabolic Markers of Complex Traits

Metabolites ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. 66 ◽  
Author(s):  
Michael Lee ◽  
Ting Hu

Metabolomics uses quantitative analyses of metabolites from tissues or bodily fluids to acquire a functional readout of the physiological state. Complex diseases arise from the influence of multiple factors, such as genetics, environment and lifestyle. Since genes, RNAs and proteins converge onto the terminal downstream metabolome, metabolomics datasets offer a rich source of information in a complex and convoluted presentation. Thus, powerful computational methods capable of deciphering the effects of many upstream influences have become increasingly necessary. In this review, the workflow of metabolic marker discovery is outlined from metabolite extraction to model interpretation and validation. Additionally, current metabolomics research in various complex disease areas is examined to identify gaps and trends in the use of several statistical and computational algorithms. We then highlight and discuss three advanced machine-learning algorithms, specifically ensemble learning, artificial neural networks, and genetic programming, that are currently less visible but show high potential for utility in metabolomics research. With an upward trend in the use of highly accurate, multivariate models in the metabolomics literature, diagnostic biomarker panels for complex diseases have recently achieved accuracies approaching or exceeding those of traditional diagnostic procedures. This review aims to provide an overview of computational methods in metabolomics and promote the use of up-to-date machine-learning and computational methods by metabolomics researchers.

2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized by the features they use and by the machine learning algorithms they apply, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, mainly sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms used to build binary prediction models for classifying virus-host protein pairs, and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although the prediction of virus-host PPIs has progressed significantly in recent years, there remains considerable room for improvement.


2019 ◽  
Author(s):  
Kushal K. Dey ◽  
Bryce Van de Geijn ◽  
Samuel Sungil Kim ◽  
Farhad Hormozdiari ◽  
David R. Kelley ◽  
...  

Abstract: Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. We apply stratified LD score regression (S-LDSC) to 41 independent diseases and complex traits (average N=320K) to evaluate each annotation’s informativeness for disease heritability conditional on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model and other sources; as a secondary metric, we also evaluate the accuracy of models that incorporate deep learning annotations in predicting disease-associated or fine-mapped SNPs. We aggregated annotations across all tissues (resp. blood cell types or brain tissues) in meta-analyses across all 41 traits (resp. 11 blood-related traits or 8 brain-related traits). These allelic-effect annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: only Basenji-H3K4me3 in meta-analyses across all 41 traits and brain-specific Basenji-H3K4me3 in meta-analyses across 8 brain-related traits. We conclude that deep learning models have yet to achieve their full potential to provide a considerable amount of unique information for complex disease, and that the informativeness of deep learning models for disease beyond established functional annotations cannot be inferred from metrics based on their accuracy in predicting regulatory annotations.
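The allelic-effect annotation defined in this abstract, the absolute value of the predicted difference between reference and variant alleles, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `allelic_effect` and the toy prediction values are hypothetical, and real DeepSEA or Basenji outputs would take the place of the hard-coded lists.

```python
from typing import List, Sequence


def allelic_effect(ref_pred: Sequence[float], alt_pred: Sequence[float]) -> List[float]:
    """Return |ref - alt| per SNP for one predicted regulatory track."""
    if len(ref_pred) != len(alt_pred):
        raise ValueError("ref_pred and alt_pred must have the same length")
    return [abs(r - a) for r, a in zip(ref_pred, alt_pred)]


# Toy (made-up) model predictions for three SNPs on one track.
ref = [0.80, 0.10, 0.45]  # prediction with the reference allele
alt = [0.55, 0.10, 0.70]  # prediction with the variant allele
effects = allelic_effect(ref, alt)
```

Taking the absolute value discards the direction of the effect, so a variant that increases the predicted signal and one that decreases it by the same amount receive the same annotation value.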


2019 ◽  
Vol 20 (3) ◽  
pp. 209-216 ◽  
Author(s):  
Yang Hu ◽  
Tianyi Zhao ◽  
Ningyi Zhang ◽  
Ying Zhang ◽  
Liang Cheng

Background: From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular research topic. Methods: We systematically review recent work on identifying drug targets from the perspectives of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into subclasses and described in detail. Results: Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. Conclusion: The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.


Author(s):  
Dr. K. Suresh

The current way of checking answer scripts is laborious for colleges: staff must manually check the answers and allocate marks to the students. Our proposed system uses Machine Learning and Natural Language Processing (NLP) techniques to address this. Machine learning algorithms use computational methods to learn directly from data without relying on predetermined rules. NLP algorithms identify specific entities within the text, search for key elements in a document, run a contextual search for synonyms, detect misspelled words or similar entries, and more. Our algorithm performs similarity checking and also counts the question-related words that match exactly between two documents. It also checks whether grammar is used correctly in the student's answer. Our proposed system performs text extraction and evaluation of marks by applying Machine Learning and Natural Language Processing techniques.
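The two checks this abstract describes, similarity between documents and a count of exactly matched question-related words, might look roughly like the sketch below. The abstract does not say which similarity measure the system uses, so bag-of-words cosine similarity is an assumption here, and the function names and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matched_keywords(answer: str, keywords: set) -> int:
    """Count how many expected keywords appear verbatim in the answer."""
    return len(set(tokenize(answer)) & {k.lower() for k in keywords})


# Hypothetical reference answer, student answer, and question keywords.
reference = "Photosynthesis converts light energy into chemical energy"
student = "Plants use photosynthesis to turn light into chemical energy"
similarity = cosine_similarity(reference, student)
hits = matched_keywords(student, {"photosynthesis", "light", "chlorophyll"})
```

A marking scheme could then combine `similarity` and `hits` into a score, but how the original system weights them is not stated in the abstract.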


Author(s):  
Vanessa Aguiar ◽  
Jose A. Seoane ◽  
Ana Freire ◽  
Ling Guo

A new algorithm is presented for finding genotype-phenotype association rules from data related to complex diseases. The algorithm is based on genetic algorithms, a technique of evolutionary computation. It was compared with several traditional data mining techniques and was shown to obtain better classification scores and to find more rules from artificially generated data. It also obtained similar results on some UCI Machine Learning datasets. This chapter assumes that several groups of Single Nucleotide Polymorphisms (SNPs) have an impact on the predisposition to develop a complex disease such as schizophrenia. The approach is expected to be validated on real data in the near future.


2020 ◽  
Vol 39 (4) ◽  
pp. 5687-5698
Author(s):  
Chunfeng Guo

There are currently few studies on the stress of athletes, which makes it difficult to provide effective stadium guidance for athletes. This study therefore applies machine learning algorithms to identify athletes’ pre-game emotions. Research data are obtained through a survey, the physiological parameters of athletes under stress are measured experimentally, and these parameters are then processed with a machine learning algorithm. To improve the efficiency of data processing, this study modifies the traditional machine learning algorithm by combining the particle optimization algorithm with the support vector machine to achieve effective recognition of the athlete’s physiological state. In addition, using experimental and comparative methods, this paper compares the performance of the improved algorithm with that of the traditional algorithm and analyzes the test results. Finally, this study evaluates the effectiveness of the proposed algorithm through an example analysis. The research shows that the proposed algorithm performs better than the traditional algorithm, has practical significance, and can provide a theoretical reference for subsequent related research.


1995 ◽  
Vol 9 (3) ◽  
pp. 169-174
Author(s):  
Craig H Warden ◽  
Jerome I Rotter

Identification of genes underlying complex traits has been difficult, but combined application of novel methods and mouse models provides new hope. Rare monogenic syndromes, and candidate gene and biochemical approaches are sometimes useful, but each of these approaches also has limitations. Some problems that prevent identification and isolation of genes underlying complex disease can be avoided by the use of whole genome mapping of mouse crosses or of human families. Mice have many advantages for the study of complex disease, including an extensive genetic map. A generic method has recently been developed and applied for detection of quantitative trait loci (QTLs) using whole genome maps of mouse crosses. Availability of more than 200 congenic strains provides another incentive for studies in mice. Congenic strains provide a rich, but previously unexploited, resource for the rapid identification of genes causing complex diseases. A congenic mouse strain is genetically identical to a background strain, except for a small chromosomal region derived from a donor strain. Thus, comparison of a phenotype in a congenic strain with the phenotype in its background strain allows study of the effects of single genes derived from the donor strain, isolated from the effects of other donor strain genes. Application of all or several techniques to complex disease studies in mice and in humans may lead to the identification and understanding of complex diseases whose etiology is currently unknown.


Heart ◽  
2018 ◽  
Vol 104 (14) ◽  
pp. 1156-1164 ◽  
Author(s):  
Khader Shameer ◽  
Kipp W Johnson ◽  
Benjamin S Glicksberg ◽  
Joel T Dudley ◽  
Partho P Sengupta

Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strengths and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine.


2019 ◽  
Author(s):  
Ameya C. Nanivadekar ◽  
Derek M. Miller ◽  
Stephanie Fulton ◽  
Liane Wong ◽  
John Ogren ◽  
...  

Abstract: Although electrogastrography (EGG) could be a critical tool in the diagnosis and treatment of patients with gastrointestinal (GI) disease, it remains under-utilized. The lack of spatial and temporal resolution using current EGG methods presents a significant roadblock to more widespread usage. Human and preclinical studies have shown that GI myoelectric electrodes can record signals containing significantly more information than can be derived from abdominal surface electrodes. The current study sought to assess the efficacy of multi-electrode arrays, surgically implanted on the serosal surface of the GI tract, from gastric fundus to duodenum, in recording myoelectric signals. It also examined the potential for machine learning algorithms to predict functional states, such as retching and emesis, from GI signal features. Studies were performed using ferrets, a gold standard model for emesis testing. Our results include simultaneous recordings from up to six GI recording sites in both anesthetized and chronically implanted free-moving ferrets. Testing conditions to produce different gastric states included gastric distension, intragastric infusion of emetine (a prototypical emetic agent), and feeding. Despite the observed variability in GI signals, machine learning algorithms, including k nearest neighbors and support vector machines, were able to detect the state of the stomach with high overall accuracy (>80%). The present study is the first demonstration of machine learning algorithms to detect the physiological state of the stomach and onset of retching and could provide insight into methodologies to treat GI diseases and control symptoms such as nausea and vomiting.
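As a rough illustration of how a k nearest neighbors classifier can assign a gastric state to a feature vector, the sketch below labels a query point by majority vote among its closest training examples. It is not the study's implementation: the two features (dominant frequency and signal power), the state labels, and all numeric values are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (dominant frequency in Hz, signal power) feature pairs.
train_X = [
    (0.15, 1.0), (0.16, 1.2), (0.14, 0.9),  # made-up "baseline" recordings
    (0.30, 3.0), (0.32, 3.4), (0.29, 2.8),  # made-up "retching" recordings
]
train_y = ["baseline"] * 3 + ["retching"] * 3

state = knn_predict(train_X, train_y, (0.31, 3.1))
```

In practice features on different scales would normally be standardized first, since Euclidean distance is otherwise dominated by the feature with the largest numeric range.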


Diagnostics ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. 958
Author(s):  
Alex Novaes Santana ◽  
Charles Novaes de Santana ◽  
Pedro Montoya

In the last decade, machine learning has been widely used in different fields, especially because of its capacity to work with complex data. With the support of machine learning techniques, different studies have used data-driven approaches to better understand syndromes such as mild cognitive impairment, Alzheimer’s disease, schizophrenia, and chronic pain. Chronic pain is a complex disease that is frequently misdiagnosed because of its comorbidities with other syndromes with which it shares symptoms. Within that context, several studies have suggested different machine learning algorithms to classify or predict chronic pain conditions. Those algorithms were fed with a diversity of data types, from self-report data based on questionnaires to the most advanced brain imaging techniques. In this study, we assessed the sensitivity of different algorithms and datasets in classifying chronic pain syndromes. Together with this assessment, we highlighted important methodological steps that should be taken into account when an experiment using machine learning is conducted. The best results were obtained by ensemble-based algorithms and the dataset containing the greatest diversity of information, resulting in area under the receiver operating characteristic curve (AUC) values of around 0.85. In addition, the performance of the algorithms is strongly related to the hyper-parameters. Thus, a good strategy for hyper-parameter optimization should be used to extract the most from the algorithm. These findings support the notion that machine learning can be a powerful tool to better understand chronic pain conditions.
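For readers unfamiliar with the AUC metric reported here, the sketch below computes it directly from its pairwise interpretation: the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The function name `roc_auc`, the labels, and the scores are hypothetical toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = chronic pain, 0 = control (made-up classifier scores).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of around 0.85 should be read.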

