Random forest classification for predicting lifespan-extending chemical compounds

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sofia Kapsiani ◽  
Brendan J. Howlin

Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on the data of the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database, consisting of 1738 small molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
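As a rough illustration of the two evaluation steps mentioned in the abstract (an AUC score on a test set and a screening cut-off at a predicted probability of ≥ 0.80), the following is a minimal sketch in plain Python. The function names `roc_auc` and `screen`, the toy labels, scores and compound names are all hypothetical and are not taken from the study; the AUC is computed here by pairwise comparison, which is one standard definition but not necessarily how the authors computed it.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen(probabilities: Dict[str, float], threshold: float = 0.80) -> List[str]:
    """Return the names of compounds whose predicted probability meets the cut-off."""
    return sorted(name for name, p in probabilities.items() if p >= threshold)


# Hypothetical toy data: 1 = extends lifespan, 0 = does not.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical screening output for three made-up compounds.
toy_hits = screen({"compound_a": 0.85, "compound_b": 0.79, "compound_c": 0.80})
```

In this toy case eight of the nine positive/negative pairs are ordered correctly, so `toy_auc` is 8/9, and only `compound_a` and `compound_c` pass the 0.80 cut-off.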

2020 ◽  
Author(s):  
Sofia Kapsiani ◽  
Brendan James Howlin

Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on the data of the DrugAge database to predict whether a chemical compound will extend the lifespan of the worm species Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. Feature selection was achieved using variation and mutual information-based methods. The best performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 most important features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database, consisting of 1,738 small molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds.
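The feature selection step named in this version of the abstract could look roughly like the sketch below. It assumes that "variation" refers to a variance filter that drops (near-)constant features, and that mutual information is estimated from simple counts on discrete features such as fingerprint bits; neither detail is stated in the abstract. The function names, the thresholds and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def variance(column: Sequence[float]) -> float:
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Mutual information (in nats) between a discrete feature and the class label."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (feature_counts[x] * label_counts[y]))
    return mi


def select_features(columns: List[Sequence[int]], labels: Sequence[int],
                    min_variance: float = 0.0, top_k: int = 1) -> List[int]:
    """Drop low-variance columns, then keep the top_k columns by mutual information."""
    kept = [i for i, col in enumerate(columns) if variance(col) > min_variance]
    ranked = sorted(kept, key=lambda i: mutual_information(columns[i], labels),
                    reverse=True)
    return sorted(ranked[:top_k])


# Hypothetical toy data: three binary features for four compounds.
toy_labels = [1, 1, 0, 0]
toy_columns = [
    [1, 1, 1, 1],  # constant, removed by the variance filter
    [1, 1, 0, 0],  # identical to the label, maximally informative
    [1, 0, 1, 0],  # independent of the label
]
toy_selected = select_features(toy_columns, toy_labels)
```

Here the constant column is filtered out first, and of the two remaining columns only the second shares information with the label, so `toy_selected` contains index 1.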


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260195
Author(s):  
Marcelo Dantas Tavares de Melo ◽  
Jose de Arimatéia Batista Araujo-Filho ◽  
José Raimundo Barbosa ◽  
Camila Rocon ◽  
Carlos Danilo Miranda Regis ◽  
...  

Aims Noncompaction cardiomyopathy (NCC) is considered a genetic cardiomyopathy with unknown pathophysiological mechanisms. We propose to evaluate echocardiographic predictors for rigid body rotation (RBR) in NCC using a machine learning (ML) based model. Methods and results Forty-nine outpatients with NCC diagnosis by echocardiography and magnetic resonance imaging (21 men, 42.8±14.8 years) were included. A comprehensive echocardiogram was performed. The layer-specific strain was analyzed from the apical two-, three-, and four-chamber views, short axis, and focused right ventricle views using 2D echocardiography (2DE) software. RBR was present in 44.9% of patients, and this group presented increased indexed LV mass (118±43.4 vs. 94.1±27.1 g/m², P = 0.034), LV end-diastolic and end-systolic volumes (P < 0.001), E/e’ (12.2±8.68 vs. 7.69±3.13, P = 0.034), and decreased LV ejection fraction (40.7±8.71 vs. 58.9±8.76%, P < 0.001) when compared to patients without RBR. Also, patients with RBR presented a significant decrease of global longitudinal, radial, and circumferential strain. When an ML model based on a random forest algorithm and a neural network model was applied, it found that twist, NC/C, torsion, LV ejection fraction, and diastolic dysfunction are the strongest predictors of RBR, with accuracy, sensitivity, specificity, and area under the curve of 0.93, 0.99, 0.80, and 0.88, respectively. Conclusion In this study, a random forest algorithm was capable of selecting the best echocardiographic predictors of the RBR pattern in NCC patients, which was consistent with worse systolic, diastolic, and myocardium deformation indices. Prospective studies are warranted to evaluate the role of this tool for NCC risk stratification.
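For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity and specificity are conventionally derived from binary predictions. The function name `binary_metrics` and the toy labels are hypothetical; the example does not reproduce the study's data or its reported values.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for labels coded as 1 (positive) / 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives that were detected
        "specificity": tn / (tn + fp),  # share of true negatives that were detected
    }


# Hypothetical toy data: 1 = RBR present, 0 = RBR absent.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

With three of four positives and five of six negatives classified correctly, the toy example gives an accuracy of 0.8, a sensitivity of 0.75 and a specificity of 5/6.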


2019 ◽  
Author(s):  
Sijie He ◽  
Weiwei Chen ◽  
Hankui Liu ◽  
Shengting Li ◽  
Dongzhu Lei ◽  
...  

Abstract The study of Mendelian diseases and the identification of their causative genes are of great significance in the field of genetics. The evaluation of the pathogenicity of genes and the total number of Mendelian disease genes are both important questions worth studying. However, very few studies have addressed these issues to date, so we attempt to answer them in this study. We calculated a gene pathogenicity prediction (GPP) score using a machine learning approach (random forest algorithm) to evaluate the pathogenicity of genes. When we applied the GPP score to the testing gene set, we obtained an accuracy of 80%, a recall of 93% and an area under the curve (AUC) of 0.87. Our results estimated that a total of 10,399 protein-coding genes were Mendelian disease genes. Furthermore, we found that the GPP score was positively correlated with the severity of disease. Our results indicate that the GPP score may provide a robust and reliable guideline to predict the pathogenicity of protein-coding genes. To our knowledge, this is the first attempt to estimate the total number of Mendelian disease genes.


Author(s):  
A.E. Semenov

A method of pedestrian navigation in cities was investigated, using Saint Petersburg as an example. The factors that influence people when they choose a route for a walk were determined. Based on these factors, corresponding data were collected and used to develop a model that determines the attractiveness of a city street using the Random Forest algorithm. The results show that the routes provided by the method are 14% more attractive and just 6% longer than the shortest ones.
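The abstract does not say how routes are computed from the street attractiveness scores. One plausible reading, sketched below purely as an assumption, is a shortest-path search in which each street's length is penalised when its attractiveness is low. The cost formula, the `alpha` parameter, the function name `best_route` and the toy street graph are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbour, length in metres, attractiveness in (0, 1])
Graph = Dict[str, List[Tuple[str, float, float]]]


def best_route(graph: Graph, start: str, goal: str, alpha: float = 0.0) -> List[str]:
    """Dijkstra search over cost = length * (1 + alpha * (1 - attractiveness)).

    alpha = 0 ignores attractiveness and returns the shortest route;
    larger alpha values accept longer routes along more attractive streets.
    """
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, length, attractiveness in graph.get(node, []):
            if neighbour not in settled:
                penalty = 1.0 + alpha * (1.0 - attractiveness)
                heapq.heappush(queue, (cost + length * penalty, neighbour, path + [neighbour]))
    return []  # goal not reachable


# Hypothetical toy graph: a short dull route via B and a slightly longer pleasant one via C.
toy_graph: Graph = {
    "A": [("B", 100.0, 0.2), ("C", 110.0, 0.9)],
    "B": [("D", 100.0, 0.2)],
    "C": [("D", 110.0, 0.9)],
}
shortest = best_route(toy_graph, "A", "D", alpha=0.0)
scenic = best_route(toy_graph, "A", "D", alpha=1.0)
```

In the toy graph the shortest route runs through B, while with `alpha=1.0` the search prefers the route through C, which is 10% longer but runs along streets with higher attractiveness scores.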


Water ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1198
Author(s):  
Stuart McMichael ◽  
Pilar Fernández-Ibáñez ◽  
John Anthony Byrne

The photoexcitation of suitable semiconducting materials in aqueous environments can lead to the production of reactive oxygen species (ROS). ROS can inactivate microorganisms and degrade a range of chemical compounds. In the case of heterogeneous photocatalysis, semiconducting materials may suffer from fast recombination of electron–hole pairs and require post-treatment to separate the photocatalyst when a suspension system is used. To reduce recombination and improve the rate of degradation, an externally applied electrical bias can be used where the semiconducting material is immobilised onto an electrically conductive support and connected to a counter electrode. These electrochemically assisted photocatalytic systems have been termed “photoelectrocatalytic” (PEC). This review will explain the fundamental mechanism of PECs, photoelectrodes, the different types of PEC reactors reported in the literature, the (photo)electrodes used, the contaminants degraded, the key findings and prospects in the research area.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mercedes M. Pérez-Jiménez ◽  
José M. Monje-Moreno ◽  
Ana María Brokate-Llanos ◽  
Mónica Venegas-Calerón ◽  
Alicia Sánchez-García ◽  
...  

Abstract Aging and fertility are two interconnected processes. From invertebrates to mammals, absence of the germline increases longevity. Here we show that loss of function of sul-2, the Caenorhabditis elegans steroid sulfatase (STS), raises the pool of sulfated steroid hormones, increases longevity and ameliorates protein aggregation diseases. This increased longevity requires factors involved in germline-mediated longevity (daf-16, daf-12, kri-1, tcer-1 and daf-36 genes) although sul-2 mutations do not affect fertility. Interestingly, sul-2 is only expressed in sensory neurons, suggesting regulation of the sulfated hormone state by environmental cues. Treatment with the specific STS inhibitor STX64, as well as with testosterone-derived sulfated hormones, reproduces the longevity phenotype of sul-2 mutants. Remarkably, those treatments ameliorate protein aggregation diseases in C. elegans, and STX64 also ameliorates Alzheimer’s disease in a mammalian model. These results open the possibility of reallocating steroid sulfatase inhibitors or derivatives for the treatment of aging and aging-related diseases.

