Classification study of solvation free energies of organic molecules using machine learning techniques

RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.

2021 ◽  
Author(s):  
Roobaea Alroobaea ◽  
Seifeddine Mechti ◽  
Mariem Haoues ◽  
Saeed Rubaiee ◽  
Anas Ahmed ◽  
...  

Abstract Alzheimer's is the main reason for dementia, that affects frequently older adults. This disease is costly especially, in terms of treatment. In addition, Alzheimer's is one of the deaths causes in the old-age citizens. Early Alzheimer's detection helps medical staffs in this disease diagnosis, which will certainly decrease the risk of death. This made the early Alzheimer's disease detection a crucial problem in the healthcare industry. The objective of this research study is to introduce a computer-aided diagnosis system for Alzheimer's disease detection using machine learning techniques. We employed data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) brain datasets. Common supervised machine learning techniques have been applied for automatic Alzheimer’s disease detection such as: logistic regression, support vector machine, random forest, linear discriminant analysis, etc. The best accuracy values provided by the machine learning classifiers are 99.43% and 99.10% given by respectively, logistic regression and support vector machine using ADNI dataset, whereas for the OASIS dataset, we obtained 84.33% and 83.92% given by respectively logistic regression and random forest.


A computerized system can improve the disease identifying abilities of doctor and also reduce the time needed for the identification and decision-making in healthcare. Gliomas are the brain tumors that can be labeled as Benign (non- cancerous) or Malignant (cancerous) tumor. Hence, the different stages of the tumor are extremely important for identification of appropriate medication. In this paper, a system has been proposed to detect brain tumor of different stages by MR images. The proposed system uses Fuzzy C-Mean (FCM) as a clustering technique for better outcome. The main focus in this paper is to refine the required features in two steps with the help of Discrete Wavelet Transform (DWT) and Independent Component Analysis (ICA) using three machine learning techniques i.e. Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The final outcome of our experiment indicated that the proposed computerized system identifies the brain tumor using RF, ANN and SVM with 100%, 91.6% and 95.8%, accuracy respectively. We have also calculated Sensitivity, Specificity, Matthews’s Correlation Coefficient and AUC-ROC curve. Random forest shows the highest accuracy as compared to Support Vector Machine and Artificial Neural Networks.


2018 ◽  
Vol 77 (9) ◽  
pp. 2184-2189 ◽  
Author(s):  
Joshua Myrans ◽  
Zoran Kapelan ◽  
Richard Everson

Abstract This work presents a methodology for automatic detection of structural faults in sewers from CCTV footage, which has been improved by combining the outputs of different machine learning techniques. The predictions of support vector machine and random forest classifiers are combined using three distinct techniques: ‘both’, ‘most likely’ and ‘stacking’. Each technique is tested on CCTV data taken from real surveys covering a range of pipes at locations in the south-west of the UK. The best tested technique, stacking, offers a 5% increase in accuracy for minimal impact in efficiency, proving useful for future development and implementation of the fault detection methodology.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Maisa Cardoso Aniceto ◽  
Flavio Barboza ◽  
Herbert Kimura

AbstractCredit risk evaluation has a relevant role to financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities for managing credit risk. This study analyzes the adequacy of borrower’s classification models using a Brazilian bank’s loan database, and exploring machine learning techniques. We develop Support Vector Machine, Decision Trees, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are analyzed based on usual classification performance metrics. Our results show that Random Forest and Adaboost perform better when compared to other models. Moreover, Support Vector Machine models show poor performance using both linear and nonlinear kernels. Our findings suggest that there are value creating opportunities for banks to improve default prediction models by exploring machine learning techniques.


2020 ◽  
Vol 13 (1) ◽  
pp. 130-149
Author(s):  
Puneet Misra ◽  
Siddharth Chaurasia

Stock market movements are affected by numerous factors making it one of the most challenging problems for forecasting. This article attempts to predict the direction of movement of stock and stock indices. The study uses three classifiers - Artificial Neural Network, Random Forest and Support Vector Machine with four different representation of inputs. First representation uses raw data (open, high, low, close and volume), The second uses ten features in the form of technical indicators generated by use of technical analysis. The third and fourth portrayal presents two different ways of converting the indicator data into discrete trend data. Experimental results suggest that for raw data support vector machine provides the best results. For other representations, there is no clear winner regarding models applied, but portrayal of data by the proposed approach gave best overall results for all the models and financial series. Consistency of the results highlight the importance of feature generation and right representation of dataset to machine learning techniques.


Analysis of credit scoring is an effective credit risk assessment technique, which is one of the major research fields in the banking sector. Machine learning has a variety of applications in the banking sector and it has been widely used for data analysis. Modern techniques such as machine learning have provided a self-regulating process to analyze the data using classification techniques. The classification method is a supervised learning process in which the computer learns from the input data provided and makes use of this information to classify the new dataset. This research paper presents a comparison of various machine learning techniques used to evaluate the credit risk. A credit transaction that needs to be accepted or rejected is trained and implemented on the dataset using different machine learning algorithms. The techniques are implemented on the German credit dataset taken from UCI repository which has 1000 instances and 21 attributes, depending on which the transactions are either accepted or rejected. This paper compares algorithms such as Support Vector Network, Neural Network, Logistic Regression, Naive Bayes, Random Forest, and Classification and Regression Trees (CART) algorithm and the results obtained show that Random Forest algorithm was able to predict credit risk with higher accuracy


The advancement in cyber-attack technologies have ushered in various new attacks which are difficult to detect using traditional intrusion detection systems (IDS).Existing IDS are trained to detect known patterns because of which newer attacks bypass the current IDS and go undetected. In this paper, a two level framework is proposed which can be used to detect unknown new attacks using machine learning techniques. In the first level the known types of classes for attacks are determined using supervised machine learning algorithms such as Support Vector Machine (SVM) and Neural networks (NN). The second level uses unsupervised machine learning algorithms such as K-means. The experimentation is carried out with four models with NSL- KDD dataset in Openstack cloud environment. The Model with Support Vector Machine for supervised machine learning, Gradual Feature Reduction (GFR) for feature selection and K-means for unsupervised algorithm provided the optimum efficiency of 94.56 %.


Author(s):  
Marcos Ruiz-Álvarez ◽  
Francisco Alonso-Sarría ◽  
Francisco Gomariz-Castillo

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machine and Random Forest, are compared with Multivariate Linear Regression, TVX and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using four different statistics on a daily basis allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest with residual kriging produces the best results (R$^2$=0.612 $\pm$ 0.019, NSE=0.578 $\pm$ 0.025, RMSE=1.068 $\pm$ 0.027, PBIAS=-0.172 $\pm$ 0.046), whereas TVX produces the least accurate results. The environmental conditions in the study area are not really suited to TVX, moreover this method only takes into account satellite data. On the other hand, regression methods (Support Vector Machine, Random Forest and Multivariate Linear Regression) use several parameters that are easily calculated from a Digital Elevation Model, adding very little difficulty to the use of satellite data alone. The most important variables in the Random Forest Model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the julian day.


2021 ◽  
Author(s):  
Rakesh Kumar Saroj ◽  
Pawan Kumar Yadav ◽  
Rajneesh Singh ◽  
Obvious Nchimunya Chilyabanyama

Abstract Background: The death rate of under-five children in India declined last few decades, but few bigger states have poor performance. This is a matter of serious concern for the child's health as well as social development. Nowadays, machine learning techniques play a crucial role in the smart health care system to capture the hidden factors and patterns of outcomes. In this paper, we used machine learning techniques to predict the important factors of under-five mortality.This study aims to explore the importance of machine learning techniques to predict under-five mortality and to find the important factors that cause under-five mortality.The data was taken from the National Family Health Survey-IV of Uttar Pradesh. We used four machine learning techniques like decision tree, support vector machine, random forest, and logistic regression to predict under-five mortality factors and model accuracy of each model. We have also used information gain to rank to know the important variables for accurate predictions in under-five mortality data.Result: Random Forest (RF) predicts the child mortality factors with the highest accuracy of 97.5 %, and the number of living children, births in the last five years, educational level, birth order, total children ever born, currently breastfeeding, and size of child at birth that identifying as essential factors for under-five mortality.Conclusion: The study focuses on machine learning techniques to predict and identify important factors for under-five mortality. The random forest model provides an excellent predictive result for estimating the risk factors of under-five mortality. Based on the resulting outcome, policymakers can make policies and plans to reduce under-five mortality.


Sign in / Sign up

Export Citation Format

Share Document