Gene pathogenicity prediction of Mendelian diseases via the random forest algorithm

2019 ◽  
Vol 138 (6) ◽  
pp. 673-679 ◽  
Author(s):  
Sijie He ◽  
Weiwei Chen ◽  
Hankui Liu ◽  
Shengting Li ◽  
Dongzhu Lei ◽  
...  

Abstract: The study of Mendelian diseases and the identification of their causative genes are of great significance in genetics. Two important open questions are how to evaluate the pathogenicity of genes and how many Mendelian disease genes exist in total. Very few studies have addressed these questions to date, so we attempt to answer them here. We calculated a gene pathogenicity prediction (GPP) score using a machine learning approach (the random forest algorithm) to evaluate the pathogenicity of genes. Applied to the testing gene set, the GPP score achieved an accuracy of 80%, a recall of 93% and an area under the curve (AUC) of 0.87. Our results estimate that a total of 10,399 protein-coding genes are Mendelian disease genes. Furthermore, we found that the GPP score was positively correlated with disease severity. Our results indicate that the GPP score may provide a robust and reliable guideline for predicting the pathogenicity of protein-coding genes. To our knowledge, this is the first attempt to estimate the total number of Mendelian disease genes.
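The three metrics quoted in this abstract (accuracy, recall and AUC) can be illustrated with a minimal pure-Python sketch. The labels, scores and the 0.5 decision threshold below are made-up toy values, not data or settings from the study, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Fraction of true positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    # AUC as the probability that a random positive outscores a random
    # negative (ties count as half), computed over all pairs.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = disease gene, 0 = non-disease gene (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), recall(y_true, y_pred), auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is presumably why abstracts such as this one report them together.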


Author(s):  
A.E. Semenov

A method of pedestrian navigation in cities was investigated, using Saint Petersburg as an example. The factors that influence people when they choose a walking route were determined. Data corresponding to these factors were collected and used to develop a model that estimates the attractiveness of a city street using the Random Forest algorithm. The results show that routes produced by the method are 14% more attractive and only 6% longer than the shortest routes.


2020 ◽  
Vol 15 (S359) ◽  
pp. 40-41
Author(s):  
L. M. Izuti Nakazono ◽  
C. Mendes de Oliveira ◽  
N. S. T. Hirata ◽  
S. Jeram ◽  
A. Gonzalez ◽  
...  

Abstract: We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. For quasar classification, we achieved a precision of 95.49% and a recall of 95.26% using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
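As a rough illustration of k-Nearest Neighbour regression of the kind mentioned for photometric redshift estimation, the sketch below predicts a value as the mean of the k closest training targets. The two-dimensional "colour" features, the redshift values, the choice of k = 3 and the function name `knn_predict` are all assumptions for illustration; the abstract does not state the features or settings used.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by Euclidean distance to the query point
    # and average the targets of the k nearest ones.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical colour features and redshifts (not S-PLUS data).
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
train_y = [1.0, 1.2, 1.4, 3.0]

z_hat = knn_predict(train_X, train_y, query=(0.0, 0.0), k=3)
print(round(z_hat, 3))
```

The distant point at (5.0, 5.0) is excluded from the average, so the estimate reflects only objects with similar colours.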


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sofia Kapsiani ◽  
Brendan J. Howlin

Abstract: Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on data from the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts and to topological and partial charge properties. The model was applied to predict the class of compounds in an external database consisting of 1738 small molecules. The compounds in the screening database with a predicted probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
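Two ideas from this abstract can be sketched in a few lines of Python: the Gini impurity decrease of a single split, which is the quantity that Gini importance accumulates per feature across a forest, and the selection of compounds whose predicted probability is at least 0.80. The labels, compound names and probabilities below are hypothetical placeholders, not values from DrugAge or the screening database.

```python
def gini(labels):
    # Gini impurity of a set of binary labels (0 or 1).
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def gini_decrease(parent, left, right):
    # Impurity reduction achieved by splitting parent into left and right,
    # with each child weighted by its share of the samples.
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# A perfect split of a balanced node removes all impurity.
parent = [1, 1, 0, 0]
decrease = gini_decrease(parent, left=[1, 1], right=[0, 0])

# Hypothetical predicted probabilities of lifespan extension.
predicted = {"compound_A": 0.91, "compound_B": 0.80, "compound_C": 0.42}
hits = [name for name, p in predicted.items() if p >= 0.80]

print(decrease, hits)
```

In a full random forest, a feature's Gini importance is typically obtained by summing such decreases over every split that uses the feature and averaging over trees.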

