Machine learning approach to gene essentiality prediction: a review

Author(s):  
Olufemi Aromolaran ◽  
Damilare Aromolaran ◽  
Itunuoluwa Isewon ◽  
Jelili Oyelade

Abstract   Essential genes are critical for the growth and survival of any organism. The machine learning approach complements experimental methods to minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes, since the essentiality status of a gene can change under specific conditions of the organism. This review examines the various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology, and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, largely due to its high correlation with the genes' biological functions. However, the network topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data for the conditions of interest on which a classifier could be trained. Therefore, cooperative machine learning could be further exploited to build models that perform well on conditional essentiality prediction.
Short abstract   Identification of essential genes is imperative because it provides an understanding of an organism's core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review presents the standard procedure and resources available for predicting essential genes in organisms, and highlights the factors responsible for the current limitations in using machine learning for conditional gene essentiality prediction. The choice of features and of the ML technique were identified as important factors for effective prediction of essential genes.
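The comparative analysis the review describes — training one classifier per feature category and comparing discriminatory power — can be sketched in outline. Everything below is illustrative: the feature matrices are random synthetic stand-ins (with signal injected into one category so the comparison is non-trivial), not the C. elegans features used in the review, and the random forest is one reasonable classifier choice, not necessarily the review's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_genes = 500
# Essentiality labels: roughly 15% essential, as is typical of genome-wide screens.
y = (rng.random(n_genes) < 0.15).astype(int)

# Synthetic stand-ins for the five feature categories discussed in the review.
categories = {
    "gene_sequence": rng.normal(size=(n_genes, 10)),
    "protein_sequence": rng.normal(size=(n_genes, 10)),
    "network_topology": rng.normal(size=(n_genes, 6)),
    "homology": rng.normal(size=(n_genes, 4)),
    "gene_ontology": rng.normal(size=(n_genes, 8)),
}
# Make one category informative so the ranking below is meaningful.
categories["network_topology"][:, 0] += 2.0 * y

# Train one classifier per category and compare cross-validated ROC AUC.
aucs = {}
for name, X in categories.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    aucs[name] = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {aucs[name]:.2f}")
```

Categories carrying real signal separate cleanly from pure-noise categories under this protocol, which is the essence of the per-category comparison.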

2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower at subsequent follow-ups (68%, 66%, and 61% correctly classified at the 3-, 12-, and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in predicting outcomes for patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
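The random-forest-versus-logistic-regression comparison can be sketched as follows. This is a minimal illustration on simulated data, not the trial dataset: the sample size matches the study (n = 88), but the predictor values, the non-linear outcome relationship, and the hyperparameters are all invented for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 88  # matches the study's sample size; the data here are simulated
feature_names = ["depressive_symptoms", "treatment_credibility",
                 "working_alliance", "initial_bdd_severity"]
X = rng.normal(size=(n, len(feature_names)))
# Simulated non-linear relationship between predictors and remission,
# the kind of structure random forests can capture but plain logistic
# regression (no interaction terms) cannot.
logits = 0.8 * X[:, 0] - 0.6 * np.abs(X[:, 3]) + 0.5 * X[:, 1] * X[:, 2]
y = (logits + rng.normal(scale=0.5, size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
lr = LogisticRegression()
rf_acc = cross_val_score(rf, X, y, cv=5).mean()
lr_acc = cross_val_score(lr, X, y, cv=5).mean()

# Variable importance, the study's route to identifying key predictors.
rf.fit(X, y)
importances = dict(zip(feature_names, rf.feature_importances_))
print(f"random forest accuracy: {rf_acc:.2f}")
print(f"logistic regression accuracy: {lr_acc:.2f}")
```

The impurity-based importances sum to one, giving a relative ranking of predictors analogous to the one reported in the study.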


2019 ◽  
Author(s):  
Anton Levitan ◽  
Andrew N. Gale ◽  
Emma K. Dallon ◽  
Darby W. Kozan ◽  
Kyle W. Cunningham ◽  
...  

ABSTRACT   In vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion-site preference because of jackpot events and preferences for specific insertion sequences and for short-distance over long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine-learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haploinsufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.


2020 ◽  
Author(s):  
Amir Mosavi

Several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the COVID-19 outbreak. Advancing accurate prediction models is of utmost importance for taking proper actions. Due to a high level of uncertainty, or even a lack of essential data, the standard epidemiological models have been challenged regarding their ability to deliver higher accuracy for long-term prediction. As an alternative to susceptible-infected-resistant (SIR)-based models, this study proposes a hybrid machine learning approach to predict the COVID-19 outbreak, and we exemplify its potential using data from Hungary. The hybrid machine learning methods of adaptive network-based fuzzy inference system (ANFIS) and multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) are used to predict the time series of infected individuals and the mortality rate. The models predict that by late May the outbreak and the total mortality will drop substantially. The validation is performed for nine days with promising results, which confirms the model accuracy. The model is expected to maintain its accuracy as long as no significant interruption occurs. Based on the results reported here, and due to the complex nature of the COVID-19 outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research.
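The general workflow — framing outbreak forecasting as supervised learning on a sliding window, then validating on a held-out nine-day horizon — can be sketched with a plain multilayer perceptron. This is a simplified stand-in, not ANFIS or MLP-ICA (neither is available in standard libraries), and the logistic outbreak curve below is simulated, not the Hungarian data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Simulated cumulative case counts following a noisy logistic outbreak curve.
t = np.arange(60, dtype=float)
cases = 10_000 / (1 + np.exp(-(t - 30) / 5)) + rng.normal(scale=50, size=60)
scaled = cases / cases.max()  # scale inputs for stable neural-net training

# Frame forecasting as supervised learning: predict the next value
# from a sliding window of the previous seven days.
window = 7
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                     random_state=0).fit(X[:-9], y[:-9])
# Hold out the last nine days, mirroring the study's validation horizon.
pred = model.predict(X[-9:])
mape = float(np.mean(np.abs(pred - y[-9:]) / y[-9:]) * 100)
print(f"nine-day validation MAPE: {mape:.1f}%")
```

The hybrid methods in the paper replace this plain MLP with fuzzy-inference components (ANFIS) or an evolutionary trainer (the imperialist competitive algorithm), but the windowing and short-horizon validation structure is the same.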


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Moon-Jong Kim ◽  
Pil-Jong Kim ◽  
Hong-Gee Kim ◽  
Hong-Seop Kho

Abstract   The purpose of this study was to apply a machine learning approach to predict whether patients with burning mouth syndrome (BMS) respond to the initial approach and to clonazepam therapy, based on clinical data. Among the patients with the primary type of BMS who visited the clinic from 2006 to 2015, those treated with the initial approach of a detailed explanation regarding home care instructions and the use of oral topical lubricants, or who were prescribed clonazepam for a minimum of 1 month, were included in this study. The clinical data and treatment outcomes were collected from medical records. Extreme gradient-boosted decision trees were used as the machine learning algorithm to construct the prediction models. The accuracy of the prediction models was evaluated and feature importances were calculated. The accuracy of the prediction models for the initial approach and for clonazepam therapy was 67.6% and 67.4%, respectively. Aggravating factors and psychological distress were important features in the prediction model for the initial approach, and the intensity of symptoms before administration was the important feature in the prediction model for clonazepam therapy. In conclusion, the analysis of treatment outcomes in patients with BMS using a machine learning approach yielded meaningful, clinically applicable results.
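The pattern of fitting a gradient-boosted tree model to clinical records and reading off feature importances can be sketched as below. This uses scikit-learn's `GradientBoostingClassifier` as a generic stand-in for extreme gradient-boosted decision trees, and every feature name and data value is simulated, loosely echoing the predictors named in the abstract rather than the actual patient records.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 300
# Hypothetical clinical predictors, echoing those mentioned in the abstract.
feature_names = ["symptom_intensity", "psychological_distress",
                 "aggravating_factors", "age", "symptom_duration_months"]
X = rng.normal(size=(n, len(feature_names)))
# Simulated outcome: response driven mainly by pre-treatment symptom intensity.
y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))

# Feature importances give the relative ranking the study reports.
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)
print(f"accuracy: {acc:.2f}, most important feature: {top_feature}")
```

Held-out accuracy plus an importance ranking is exactly the shape of result the study reports (67.6% / 67.4% accuracy, with symptom intensity and psychological factors as leading predictors).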


Author(s):  
Gergo Pinter ◽  
Imre Felde ◽  
Amir Mosavi ◽  
Pedram Ghamisi ◽  
Richard Gloaguen



Author(s):  
Yazeed Zoabi ◽  
Noam Shomron

Abstract   Effective screening of SARS-CoV-2 enables quick and efficient diagnosis of COVID-19 and can mitigate the burden on healthcare systems. Prediction models that combine several features to estimate the risk of infection have been developed in hopes of assisting medical staff worldwide in triaging patients when allocating limited healthcare resources. We established a machine learning approach that trained on records from 51,831 tested individuals (of whom 4,769 were confirmed COVID-19 cases) while the test set contained data from the following week (47,401 tested individuals of whom 3,624 were confirmed COVID-19 cases). Our model predicts COVID-19 test results with high accuracy using only 8 features: gender, whether the age is above 60, information about close contact with an infected individual, and 5 initial clinical symptoms. Overall, using nationwide data representing the general population, we developed a model that enables screening suspected COVID-19 patients according to simple features accessed by asking them basic questions. Our model can be used, among other considerations, to prioritize testing for COVID-19 when allocating limited testing resources.
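A screening model of this shape — a handful of binary answers in, a ranked infection risk out — can be sketched as follows. The feature names, the simulated ground-truth relationship, and the gradient-boosting classifier are all illustrative assumptions, not the study's actual feature set or model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 2000
# Eight hypothetical binary screening questions, in the spirit of the abstract:
# demographics, contact history, and five initial symptoms.
feature_names = ["male_sex", "age_over_60", "known_contact",
                 "cough", "fever", "sore_throat",
                 "shortness_of_breath", "headache"]
X = rng.integers(0, 2, size=(n, 8))
# Simulated ground truth: contact and key symptoms raise infection probability.
p = 1 / (1 + np.exp(-(-2.5 + 2.0 * X[:, 2] + X[:, 3] + X[:, 4])))
y = (rng.random(n) < p).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Triage use: rank hypothetical patients by predicted infection risk.
patients = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],  # asymptomatic, no known contact
    [1, 1, 1, 1, 1, 0, 0, 0],  # known contact, with cough and fever
])
risk = model.predict_proba(patients)[:, 1]
print(f"predicted risks: {risk.round(2)}")
```

Ranking by `predict_proba` is what turns a classifier into a triage tool: limited tests go to the highest-risk answers first, which is the use case the abstract describes.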


2017 ◽  
Author(s):  
Coryandar M. Gilvary ◽  
Neel S. Madhukar ◽  
Kaitlyn M. Gayvert ◽  
David S. Rickman ◽  
Olivier Elemento
