Wheat Yellow Rust Disease Infection Type Classification Using Texture Features

Wheat is a staple crop of Pakistan that covers almost 40% of the cultivated land and contributes almost 3% in the overall Gross Domestic Product (GDP) of Pakistan. However, due to increasing seasonal variation, it was observed that wheat is majorly affected by rust disease, particularly in rain-fed areas. Rust is considered the most harmful fungal disease for wheat, which can cause reductions of 20–30% in wheat yield. Its capability to spread rapidly over time has made its management most challenging, becoming a major threat to food security. In order to counter this threat, precise detection of wheat rust and its infection types is important for minimizing yield losses. For this purpose, we have proposed a framework for classifying wheat yellow rust infection types using machine learning techniques. First, an image dataset of different yellow rust infections was collected using mobile cameras. Six Gray Level Co-occurrence Matrix (GLCM) texture features and four Local Binary Patterns (LBP) texture features were extracted from grayscale images of the collected dataset. In order to classify wheat yellow rust disease into its three classes (healthy, resistant, and susceptible), Decision Tree, Random Forest, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and CatBoost were used with (i) GLCM, (ii) LBP, and (iii) combined GLCM-LBP texture features. The results indicate that CatBoost outperformed on GLCM texture features with an accuracy of 92.30%. This accuracy can be further improved by scaling up the dataset and applying deep learning models. The development of the proposed study could be useful for the agricultural community for the early detection of wheat yellow rust infection and assist in taking remedial measures to contain crop yield.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background Predicting difficult airway is challengeable in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficulty laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficulty laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.

Download Full-text

Establishing a Credit Risk Evaluation System for SMEs Using the Soft Voting Fusion Model

Risks ◽

10.3390/risks9110202 ◽

2021 ◽

Vol 9 (11) ◽

pp. 202

Author(s):

Ge Gao ◽

Hongxin Wang ◽

Pengbin Gao

Keyword(s):

Credit Risk ◽

Evaluation System ◽

Predictive Accuracy ◽

Assessment System ◽

Gradient Boosting ◽

Support Vector ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

The Government

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.

Download Full-text

Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

10.21203/rs.3.rs-446890/v1 ◽

2021 ◽

Author(s):

Seong Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi O ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778, 95% CI, 0.726 - 0.830). The feature importance analysis revealed that fasting glucose level and the National Institute of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.

Download Full-text

Performance Analysis of Boosting Classifiers in Recognizing Activities of Daily Living

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17031082 ◽

2020 ◽

Vol 17 (3) ◽

pp. 1082 ◽

Cited By ~ 7

Author(s):

Saifur Rahman ◽

Muhammad Irfan ◽

Mohsin Raza ◽

Khawaja Moyeezullah Ghori ◽

Shumayla Yaqoob ◽

...

Keyword(s):

Physical Activity ◽

Performance Analysis ◽

Activities Of Daily Living ◽

Daily Living ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Method Performance ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

Boosting Algorithms

Physical activity is essential for physical and mental health, and its absence is highly associated with severe health conditions and disorders. Therefore, tracking activities of daily living can help promote quality of life. Wearable sensors in this regard can provide a reliable and economical means of tracking such activities, and such sensors are readily available in smartphones and watches. This study is the first of its kind to develop a wearable sensor-based physical activity classification system using a special class of supervised machine learning approaches called boosting algorithms. The study presents the performance analysis of several boosting algorithms (extreme gradient boosting—XGB, light gradient boosting machine—LGBM, gradient boosting—GB, cat boosting—CB and AdaBoost) in a fair and unbiased performance way using uniform dataset, feature set, feature selection method, performance metric and cross-validation techniques. The study utilizes the Smartphone-based dataset of thirty individuals. The results showed that the proposed method could accurately classify the activities of daily living with very high performance (above 90%). These findings suggest the strength of the proposed system in classifying activity of daily living using only the smartphone sensor’s data and can assist in reducing the physical inactivity patterns to promote a healthier lifestyle and wellbeing.

Download Full-text

Understanding Multi-Vehicle Collision Patterns on Freeways—A Machine Learning Approach

Infrastructures ◽

10.3390/infrastructures5080062 ◽

2020 ◽

Vol 5 (8) ◽

pp. 62

Author(s):

Clint Morris ◽

Jidong J. Yang

Keyword(s):

Machine Learning ◽

Statistical Methods ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Approaches ◽

Crash Analysis ◽

Suitable Alternative ◽

Crash Data ◽

Extreme Gradient Boosting ◽

Modern Machine

Generating meaningful inferences from crash data is vital to improving highway safety. Classic statistical methods are fundamental to crash data analysis and often regarded for their interpretability. However, given the complexity of crash mechanisms and associated heterogeneity, classic statistical methods, which lack versatility, might not be sufficient for granular crash analysis because of the high dimensional features involved in crash-related data. In contrast, machine learning approaches, which are more flexible in structure and capable of harnessing richer data sources available today, emerges as a suitable alternative. With the aid of new methods for model interpretation, the complex machine learning models, previously considered enigmatic, can be properly interpreted. In this study, two modern machine learning techniques, Linear Discriminate Analysis and eXtreme Gradient Boosting, were explored to classify three major types of multi-vehicle crashes (i.e., rear-end, same-direction sideswipe, and angle) occurred on Interstate 285 in Georgia. The study demonstrated the utility and versatility of modern machine learning methods in the context of crash analysis, particularly in understanding the potential features underlying different crash patterns on freeways.

Download Full-text

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Scientific Reports ◽

10.1038/s41598-021-03643-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Ali Madani ◽

Mohammad-Reza Mohammadi ◽

Saeid Atashrouz ◽

Ali Abedi ◽

Abdolhossein Hemmati-Sarapardeh ◽

...

Keyword(s):

Machine Learning ◽

Molecular Weight ◽

Oil Recovery ◽

Equations Of State ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Operating Pressure ◽

Normal Alkanes ◽

Light Gradient ◽

Extreme Gradient Boosting

AbstractAccurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection as well as separation, and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes as the major constituents of crude oil was modeled using five representative machine learning (ML) models namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight range of normal alkanes was from 16 to 507 g/mol. Also, five equations of state (EOSs) including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT) were used comparatively with the ML models to estimate N2 solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work with a root mean square error of 0.0147 and coefficient of determination of 0.9943. ZJ EOS also provided the best estimates for the N2 solubility in normal alkanes among the EOSs. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes and the N2 solubility increases with increasing the molecular weight of normal alkanes.

Download Full-text

Artificial Intelligence-Based Prediction of Key Textural Properties from LUCAS and ICRAF Spectral Libraries

Agronomy ◽

10.3390/agronomy11081550 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1550

Author(s):

Mohamed Zakaria Gouda ◽

El Mehdi Nagihi ◽

Lotfi Khiari ◽

Jacques Gallichand ◽

Mahmoud Ismail

Keyword(s):

Soil Texture ◽

High Performance ◽

Management Practices ◽

Textural Properties ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Accurate Estimation ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

Spectral Libraries

Soil texture is a key soil property influencing many agronomic practices including fertilization and liming. Therefore, an accurate estimation of soil texture is essential for adopting sustainable soil management practices. In this study, we used different machine learning algorithms trained on vis–NIR spectra from existing soil spectral libraries (ICRAF and LUCAS) to predict soil textural fractions (sand–silt–clay %). In addition, we predicted the soil textural groups (G1: Fine, G2: Medium, and G3: Coarse) using routine chemical characteristics as auxiliary. With the ICRAF dataset, multilayer perceptron resulted in good predictions for sand and clay (R2 = 0.78 and 0.85, respectively) and categorical boosting outperformed the other algorithms (random forest, extreme gradient boosting, linear regression) for silt prediction (R2 = 0.81). For the LUCAS dataset, categorical boosting consistently showed a high performance for sand, silt, and clay predictions (R2 = 0.79, 0.76, and 0.85, respectively). Furthermore, the soil texture groups (G1, G2, and G3) were classified using the light gradient boosted machine algorithm with a high accuracy (83% and 84% for ICRAF and LUCAS, respectively). These results, using spectral data, are very promising for rapid diagnosis of soil texture and group in order to adjust agricultural practices.

Download Full-text

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.

Download Full-text

HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization

10.36227/techrxiv.14714508.v3 ◽

2021 ◽

Author(s):

Leila Zahedi ◽

Farid Ghareh Mohammadi ◽

M. Hadi Amini

Keyword(s):

Real World ◽

Large Scale ◽

Convergence Rates ◽

Parameter Tuning ◽

Population Based ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Wide Range ◽

Extreme Gradient Boosting

<p>Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application working at its best, its hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance. However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally expensive. Many of the automated hyper-parameter tuning techniques suffer from low convergence rates and high experimental time complexities. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms: random forest, extreme gradient boosting, and support vector machine. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values and is tested using a real-world educational dataset. Experimental results show that HyP-ABC is competitive with state-of-the-art techniques. Also, it has fewer hyper-parameters to be tuned than other population-based algorithms, making it worthwhile for real-world HPO problems.</p>

Download Full-text