Diagnosis of Subclinical Keratoconus Based on Machine Learning Techniques

2021 ◽  
Vol 10 (18) ◽  
pp. 4281
Author(s):  
Gracia Castro-Luna ◽  
Diana Jiménez-Rodríguez ◽  
Ana Belén Castaño-Fernández ◽  
Antonio Pérez-Rueda

(1) Background: Keratoconus is a non-inflammatory corneal disease characterized by gradual thinning of the stroma, resulting in an irreversible decline in visual quality and quantity. Early detection of keratoconus and subsequent prevention of possible risks are crucial factors in its progression. Random forest is a machine learning technique for classification based on the construction of thousands of decision trees. The aim of this study was to use the random forest technique in the classification and prediction of subclinical keratoconus, considering the metrics proposed by Pentacam and Corvis. (2) Methods: The design was a retrospective cross-sectional study. A total of 81 eyes of 81 patients were enrolled: sixty-one eyes with healthy corneas and twenty eyes with subclinical keratoconus (SCKC). This initial stage includes patients with the following conditions: (1) minor topographic signs of keratoconus and suspicious topographic findings (mild asymmetric bow tie, with or without deviation); (2) average K (mean corneal curvature) < 46.5 D; (3) minimum corneal thickness (ECM) > 490 μm; (4) no slit-lamp findings; and (5) clinical keratoconus in the contralateral eye. Pentacam topographic and Corvis biomechanical variables were collected. Decision tree and random forest were used as machine learning techniques for classification. Random forest produced a ranking of the most important variables for classification. (3) Results: The most important variable was SP A1 (stiffness parameter A1), followed by A2 time, posterior coma 0°, A2 velocity and peak distance. The model efficiently predicted all patients with subclinical keratoconus (Sp = 93%) and was also a good model for classifying healthy cases (Sen = 86%). The overall accuracy of the model was 89%.
(4) Conclusions: The random forest model was a good model for classifying subclinical keratoconus. The SP A1 variable was the most important determinant in classifying and identifying subclinical keratoconus, followed by A2 time.
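The sensitivity, specificity and accuracy figures reported above all derive from the model's confusion matrix. A minimal sketch of that computation; the counts in the example are hypothetical, not the study's data:

```python
# Confusion-matrix metrics of a binary classifier: sensitivity,
# specificity, and overall accuracy from raw counts.

def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return sensitivity, specificity, and accuracy from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Example: 61 healthy eyes and 20 subclinical keratoconus eyes, with a
# handful of misclassifications in each group (hypothetical counts).
m = confusion_metrics(tp=53, tn=18, fp=2, fn=8)
print({k: round(v, 2) for k, v in m.items()})
```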

2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Han Ma ◽  
Cheng-fu Xu ◽  
Zhe Shen ◽  
Chao-hui Yu ◽  
You-ming Li

Background. Nonalcoholic fatty liver disease (NAFLD) is one of the most common chronic liver diseases. Machine learning techniques were introduced to evaluate the optimal predictive clinical model of NAFLD. Methods. A cross-sectional study was performed with subjects who attended a health examination at the First Affiliated Hospital, Zhejiang University. Questionnaires, laboratory tests, physical examinations, and liver ultrasonography were employed. Machine learning techniques were then implemented using the open-source software Weka. The tasks included feature selection and classification. Feature selection techniques built a screening model by removing the redundant features. Classification was used to build a prediction model, which was evaluated by the F-measure. Eleven state-of-the-art machine learning techniques were investigated. Results. Among the 10,508 enrolled subjects, 2,522 (24%) met the diagnostic criteria of NAFLD. By leveraging a set of statistical testing techniques, BMI, triglycerides, gamma-glutamyl transpeptidase (γGT), serum alanine aminotransferase (ALT), and uric acid were found to be the top 5 features contributing to NAFLD. A 10-fold cross-validation was used in the classification. According to the results, the Bayesian network model demonstrated the best performance among the 11 techniques. It achieved accuracy, specificity, sensitivity, and F-measure scores of up to 83%, 0.878, 0.675, and 0.655, respectively. Compared with logistic regression, the Bayesian network model improved the F-measure score by 9.17%. Conclusion. Novel machine learning techniques may have screening and predictive value for NAFLD.
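The classifiers above are ranked by F-measure, the harmonic mean of precision and recall. A minimal sketch of the metric; the counts in the example are illustrative, not the study's confusion matrix:

```python
# F-measure (F1): harmonic mean of precision and recall, computed
# directly from true positives, false positives and false negatives.

def f_measure(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. a screen that finds 135 of 200 NAFLD cases with 70 false positives
print(round(f_measure(tp=135, fp=70, fn=65), 3))
```

The harmonic mean penalises a model that trades one of precision or recall away for the other, which is why the study prefers it over raw accuracy on a 24%-prevalence dataset.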


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 183-195
Author(s):  
Thingbaijam Lenin ◽  
N. Chandrasekaran

A student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of a course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as at the time of admission. The task, however, is challenging, as the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and boosting algorithms, adaptive boosting (adaboost), stochastic gradient boosting (gbm) and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting student performance at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbour (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with a 10-fold cross-validation technique and are evaluated using precision, specificity, recall and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with the single classifier C4.5. The best result is provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
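The knn imputation step fills each missing entry from the rows that are closest in the observed features. A from-scratch sketch of the idea on toy numbers (the study itself would have used an existing knn imputation implementation):

```python
# k-nearest-neighbour imputation: a missing value is replaced by the
# mean of that feature over the k rows nearest in the other features.
import math

def knn_impute(rows, target_idx, missing_row, k=3):
    """Fill rows[missing_row][target_idx] (None) from the k nearest rows.
    Assumes at least k rows have the target feature observed."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for i, (x, y) in enumerate(zip(a, b))
                             if i != target_idx))
    query = rows[missing_row]
    neighbours = sorted(
        (r for j, r in enumerate(rows)
         if j != missing_row and r[target_idx] is not None),
        key=lambda r: dist(r, query))[:k]
    rows[missing_row][target_idx] = sum(r[target_idx] for r in neighbours) / k
    return rows[missing_row][target_idx]

data = [[1.0, 10.0], [1.1, 11.0], [5.0, 50.0], [1.2, None]]
print(knn_impute(data, target_idx=1, missing_row=3, k=2))  # mean of 11.0, 10.0
```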


Author(s):  
Ramesh Ponnala ◽  
K. Sai Sowjanya

Prediction of cardiovascular disease is an important task in the area of clinical data analysis. Machine learning has been shown to be effective in supporting decision-making and prediction from the large amount of data produced by the healthcare industry. In this paper, we propose a novel technique that aims to find significant features by applying machine learning strategies, resulting in improved accuracy in the prediction of heart disease. The severity of the heart disease is classified based on various methods such as KNN, decision trees and so on. The prediction model is introduced with different combinations of features and several known classification techniques. We produce an enhanced performance level, with an accuracy of 100%, through the prediction model for heart disease with the hybrid random forest with a linear model (HRFLM).
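KNN, one of the classifiers mentioned above, labels a patient by majority vote of the k most similar training cases. A minimal sketch on made-up feature vectors (not the study's data, and not its HRFLM hybrid):

```python
# k-nearest-neighbours classification: predict the majority label among
# the k training points closest to the query in feature space.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label). Majority vote of k nearest points."""
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical (systolic BP, cholesterol) pairs with labels.
train = [([130, 240], "disease"), ([120, 200], "healthy"),
         ([150, 280], "disease"), ([118, 190], "healthy"),
         ([145, 260], "disease"), ([125, 210], "healthy")]
print(knn_predict(train, query=[122, 205]))
```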


RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.


2020 ◽  
Author(s):  
Sonam Wangchuk ◽  
Tobias Bolch

An accurate detection and mapping of glacial lakes in Alpine regions such as the Himalayas, the Alps and the Andes is challenged by many factors. These factors include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity amongst glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach, which overcomes most of the above-mentioned challenges, to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier algorithm. The multi-source data are from the Sentinel-1 Synthetic Aperture Radar data (radar backscatter), the Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for the rule-based segmentation of potential glacial lakes, where decision rules are implemented from the expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by the trained and tested random forest classifier algorithm. The performance of the method was assessed in eight test sites located across the Alpine regions of the world (e.g. the Boshula mountain range and Koshi basin in the Himalayas, the Tajik Pamirs, the Swiss Alps and the Peruvian Andes). We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
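The rule-based segmentation combines per-pixel tests on NDWI (from the Sentinel-2 green and NIR bands), terrain slope (SRTM) and Sentinel-1 backscatter. A sketch of such decision rules; the thresholds below are illustrative placeholders, not the values used in the study:

```python
# Per-pixel decision rules for potential glacial lakes: water-like NDWI,
# flat terrain, and low (smooth-surface) radar backscatter.

def ndwi(green: float, nir: float) -> float:
    """Normalised Difference Water Index from green and NIR reflectance."""
    return (green - nir) / (green + nir)

def is_potential_lake(green, nir, slope_deg, backscatter_db,
                      ndwi_min=0.3, slope_max=10.0, backscatter_max=-14.0):
    """Flag a pixel as a potential glacial lake only if all rules agree.
    Threshold defaults are hypothetical, for illustration."""
    return (ndwi(green, nir) > ndwi_min
            and slope_deg < slope_max
            and backscatter_db < backscatter_max)

print(is_potential_lake(green=0.30, nir=0.05, slope_deg=2.0, backscatter_db=-18.0))
print(is_potential_lake(green=0.30, nir=0.25, slope_deg=2.0, backscatter_db=-18.0))
```

Pixels passing all three rules are grouped into polygons, which the trained random forest then accepts or rejects as glacial lakes.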


Author(s):  
Nabilah Alias ◽  
Cik Feresa Mohd Foozy ◽  
Sofia Najwa Ramli ◽  
Naqliyah Zainuddin

Nowadays, social media (e.g., YouTube and Facebook) provides connection and interaction between people by posting comments or videos. In fact, comments are a part of the content of a website that can attract spammers to spread phishing, malware or advertising. Because malicious users can spread malware or phishing in the comments, this work proposes a feature-detection technique for video-sharing spam comments. The first phase of the methodology used in this work is dataset collection. For this experiment, a dataset from the UCI Machine Learning repository is used. The next phase is the development of the framework and experimentation. The dataset is pre-processed using tokenization and lemmatization. After that, the features for detecting spam are selected, and the classification experiments were performed using six classifiers: Random Tree, Random Forest, Naïve Bayes, KStar, Decision Table, and Decision Stump. The results show that the highest accuracy was 90.57% and the lowest was 58.86%.
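The pre-processing step turns each raw comment into classifier-ready features. A sketch of tokenization followed by normalisation and bag-of-words counting; the crude suffix-stripper here merely stands in for the proper lemmatiser the study used:

```python
# Comment pre-processing: lowercase tokenisation, a crude suffix-stripping
# normaliser (placeholder for real lemmatisation), then word counts that a
# classifier such as Naive Bayes could consume.
import re
from collections import Counter

def tokenize(comment: str) -> list:
    return re.findall(r"[a-z0-9']+", comment.lower())

def crude_lemma(token: str) -> str:
    # Strip a few common suffixes; a real lemmatiser is dictionary-based.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def bag_of_words(comment: str) -> Counter:
    return Counter(crude_lemma(t) for t in tokenize(comment))

print(bag_of_words("Check out my channel!! Subscribe and win prizes"))
```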


2018 ◽  
Vol 13 (2) ◽  
pp. 235-250 ◽  
Author(s):  
Yixuan Ma ◽  
Zhenji Zhang ◽  
Alexander Ihler ◽  
Baoxiang Pan

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competition and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real-world warehouse dataset using data collected from classified advertisement websites. Based on the dataset, we applied machine learning techniques to relate warehouse price to its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to single-factor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving a correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.
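The 0.57 skill score above is a Pearson correlation coefficient between predicted and observed rents. A from-scratch sketch of that statistic on illustrative numbers (not the study's data):

```python
# Pearson correlation coefficient: covariance of two series divided by
# the product of their standard deviations.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

predicted = [30.0, 42.0, 55.0, 28.0, 60.0]   # hypothetical rents
observed  = [33.0, 40.0, 50.0, 30.0, 65.0]
print(round(pearson(predicted, observed), 2))
```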


Analysis of credit scoring is an effective credit risk assessment technique, which is one of the major research fields in the banking sector. Machine learning has a variety of applications in the banking sector and has been widely used for data analysis. Modern techniques such as machine learning have provided a self-regulating process to analyze the data using classification techniques. The classification method is a supervised learning process in which the computer learns from the input data provided and makes use of this information to classify new data. This research paper presents a comparison of various machine learning techniques used to evaluate credit risk. A credit transaction that needs to be accepted or rejected is trained and implemented on the dataset using different machine learning algorithms. The techniques are implemented on the German credit dataset taken from the UCI repository, which has 1000 instances and 21 attributes, depending on which the transactions are either accepted or rejected. This paper compares algorithms such as Support Vector Network, Neural Network, Logistic Regression, Naive Bayes, Random Forest, and the Classification and Regression Trees (CART) algorithm, and the results obtained show that the Random Forest algorithm was able to predict credit risk with higher accuracy.
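The winning random forest is, at heart, a bootstrap ensemble of decision trees voting on accept/reject. The sketch below shrinks each tree to a one-level stump on a single numeric feature purely to illustrate bagging and majority voting; the real German credit dataset has 21 mixed attributes:

```python
# Miniature random forest: each "tree" is a decision stump fitted on a
# bootstrap resample; the forest predicts by majority vote.
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold on the feature that best separates the labels."""
    best = None
    for thr, _ in sample:
        correct = sum((x > thr) == (y == "reject") for x, y in sample)
        if best is None or correct > best[1]:
            best = (thr, correct)
    return best[0]

def forest_predict(data, x, n_trees=25, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # bootstrap resample
        thr = fit_stump(sample)
        votes.append("reject" if x > thr else "accept")
    return Counter(votes).most_common(1)[0][0]

# Hypothetical single feature: ratio of loan amount to income.
data = [(0.2, "accept"), (0.3, "accept"), (0.4, "accept"),
        (1.5, "reject"), (1.8, "reject"), (2.2, "reject")]
print(forest_predict(data, x=1.9))
```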


2021 ◽  
Vol 309 ◽  
pp. 01163
Author(s):  
K. Anuradha ◽  
Deekshitha Erlapally ◽  
G. Karuna ◽  
V. Srilakshmi ◽  
K. Adilakshmi

Solar power is generated using photovoltaic (PV) systems all over the world. Because the output power of PV systems fluctuates and is highly dependent on environmental circumstances, solar power sources are unpredictable in nature. Irradiance, humidity, PV surface temperature, and wind speed are only a few of these variables. Because of the unpredictability in photovoltaic generation, it is crucial to plan ahead for solar power generation, as solar power forecasting is required for the electric grid. Since solar power generation is weather-dependent and unpredictable, this forecast is complex and difficult. The impacts of various environmental conditions on the output of a PV system are discussed. Machine learning (ML) algorithms have shown great results in time series forecasting and so can be used to anticipate power with weather conditions as model inputs. Multiple machine learning, deep learning and artificial neural network techniques are used to perform solar power forecasting. Here, regression models from machine learning techniques, namely a support vector machine regressor, a random forest regressor and a linear regression model, are compared, and the random forest regressor beat the other two regression models by a wide margin in accuracy.
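The linear regression baseline in the comparison above can be sketched as an ordinary least-squares fit of PV output against a single weather input. The irradiance and power numbers below are illustrative, not real plant data:

```python
# Ordinary least squares for y = a*x + b, fit to a single weather feature.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]   # W/m2
power      = [0.9, 2.1, 3.0, 4.2, 4.8]              # kW, hypothetical
a, b = fit_line(irradiance, power)
print(a * 700.0 + b)   # predicted output at 700 W/m2
```

The random forest and support vector regressors improve on this baseline because they also capture nonlinear interactions with humidity, temperature and wind speed.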

