Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R2 of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R2 of 0.882. However, its ability to predict spatial variability was weaker, with an R2 of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
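
As a rough illustration of the evaluation described above, the sketch below averages the predictions of several learners and scores the result with R2. The abstract does not say how the three learners were combined or which R2 definition was used, so the unweighted mean and the 1 - SS_res / SS_tot formula are assumptions, and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_lists):
    """Unweighted mean of several learners' predictions (an assumed combiner)."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


# Hypothetical daily PM2.5 values and predictions from three learners.
observed = [10.0, 14.0, 9.0, 20.0]
rf_pred = [11.0, 13.0, 10.0, 18.0]
gbm_pred = [9.0, 15.0, 8.0, 21.0]
knn_pred = [12.0, 12.0, 11.0, 17.0]

ensemble_pred = average_ensemble(rf_pred, gbm_pred, knn_pred)
ensemble_r2 = r_squared(observed, ensemble_pred)
```

A weighted or stacked combiner could replace `average_ensemble` without changing how the score is computed.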


Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽ 10.3390/rs13051021 ◽ 2021 ◽ Vol 13 (5) ◽ pp. 1021

Author(s): Hu Ding ◽ Jiaming Na ◽ Shangjing Jiang ◽ Jie Zhu ◽ Kai Liu ◽ ...

Keyword(s): Machine Learning ◽ Random Forest ◽ Loess Plateau ◽ Water Conservation ◽ Nearest Neighbor ◽ Gradient Boosting ◽ K Nearest Neighbor ◽ The Loess Plateau ◽ Object Based ◽ Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and for soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Ability of a Machine Learning Algorithm to Predict the Need for Perioperative Red Blood Cells Transfusion in Pelvic Fracture Patients: A Multicenter Cohort Study in China

Frontiers in Medicine ◽ 10.3389/fmed.2021.694733 ◽ 2021 ◽ Vol 8

Author(s): Xueyuan Huang ◽ Yongjun Wang ◽ Bingyu Chen ◽ Yuanshuai Huang ◽ Xinhua Wang ◽ ...

Keyword(s): Machine Learning ◽ Cohort Study ◽ Random Forest ◽ Blood Cells ◽ Nearest Neighbor ◽ Learning Algorithm ◽ Kappa Coefficient ◽ Gradient Boosting ◽ Machine Learning Algorithm ◽ K Nearest Neighbor

Background: Predicting the perioperative requirement for red blood cells (RBCs) transfusion in patients with pelvic fractures may be challenging. In this study, we constructed a perioperative RBCs transfusion predictive model (ternary classification) based on a machine learning algorithm. Materials and Methods: This study included perioperative adult patients with pelvic trauma hospitalized across six Chinese centers between September 2012 and June 2019. An extreme gradient boosting (XGBoost) algorithm was used to predict the need for perioperative RBCs transfusion, with the data split into a training set (80%), which was subjected to 5-fold cross-validation, and a test set (20%). The ability of the predictive transfusion model was compared with blood preparation based on surgeons' experience and with other predictive models, including random forest, gradient boosting decision tree, K-nearest neighbor, logistic regression, and Gaussian naïve Bayes classifier models. Data of 33 patients from one of the hospitals were prospectively collected for model validation. Results: Among 510 patients, 192 (37.65%) did not receive any perioperative RBCs transfusion, 127 (24.90%) received less-transfusion (RBCs < 4U), and 191 (37.45%) received more-transfusion (RBCs ≥ 4U). The machine learning-based transfusion predictive model produced the best performance, with an accuracy of 83.34% and a Kappa coefficient of 0.7967, compared with other methods (blood preparation based on surgeons' experience with an accuracy of 65.94% and a Kappa coefficient of 0.5704; the random forest method with an accuracy of 82.35% and a Kappa coefficient of 0.7858; the gradient boosting decision tree with an accuracy of 79.41% and a Kappa coefficient of 0.7742; the K-nearest neighbor with an accuracy of 53.92% and a Kappa coefficient of 0.3341). In the prospective dataset, it also had a good performance, with an accuracy of 81.82%. Conclusion: This multicenter retrospective cohort study described the construction of an accurate model that could predict perioperative RBCs transfusion in patients with pelvic fractures.
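
The Kappa coefficient reported alongside accuracy corrects observed agreement for the agreement expected by chance. A minimal sketch of that calculation follows, assuming the standard Cohen's kappa definition and a confusion matrix with true classes in rows and predicted classes in columns; the matrix values are hypothetical and not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total ** 2)
    return (observed - expected) / (1.0 - expected)


# Hypothetical three-class matrix (none, less-transfusion, more-transfusion).
example_matrix = [
    [30, 5, 3],
    [4, 18, 3],
    [2, 4, 33],
]
example_kappa = cohen_kappa(example_matrix)
```

Kappa equals 1 for perfect agreement and 0 when the classifier agrees with the truth no more often than chance would.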


Studi Komparasi Metode Machine Learning untuk Klasifikasi Citra Huruf Vokal Hiragana

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽ 10.30865/mib.v5i3.3083 ◽ 2021 ◽ Vol 5 (3) ◽ pp. 905

Author(s): Muhammad Afrizal Amrustian ◽ Vika Febri Muliati ◽ Elsa Elvira Awal

Keyword(s): Machine Learning ◽ Comparative Study ◽ Image Classification ◽ Nearest Neighbor ◽ Support Vector ◽ K Nearest Neighbor ◽ Learning Methods ◽ Machine Learning Methods ◽ The Comparative Study

Japanese is one of the most difficult languages to understand and read. The difficulty stems from the Japanese writing system, which does not use the alphabet. There are three types of Japanese script, namely kanji, katakana, and hiragana. Hiragana letters are the most commonly used type of writing. In addition, hiragana has a cursive nature, so each person's writing will be different. Machine learning methods can be used to read Japanese letters by recognizing the image of the letters. The Japanese letters used in this study are hiragana vowels. This study focuses on conducting a comparative study of machine learning methods for the image classification of Japanese letters. The machine learning methods compared are Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbor. The results of the comparative study show that the K-Nearest Neighbor method is the best method for image classification of hiragana vowels. K-Nearest Neighbor achieves an accuracy of 89.4% with a low error rate.
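
For readers unfamiliar with K-Nearest Neighbor, the following minimal sketch shows the idea: a query is assigned the majority label among its k closest training points. Euclidean distance, k = 3, and the two-dimensional toy features are assumptions for illustration; the abstract does not describe the features or settings the study used.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two hiragana vowels.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "o", "o", "o"]

predicted = knn_predict(points, labels, (0.2, 0.1))
```

In an image setting each point would be a flattened or feature-extracted image rather than a pair of numbers.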


Ensemble machine learning methods for spatio-temporal data analysis of plant and ratoon sugarcane

Intelligent Data Analysis ◽ 10.3233/ida-205302 ◽ 2021 ◽ Vol 25 (5) ◽ pp. 1291-1322

Author(s): Sandeep Kumar Singla ◽ Rahul Dev Garg ◽ Om Prakash Dubey

Keyword(s): Machine Learning ◽ Random Forest ◽ Binary Classification ◽ Temporal Variations ◽ Classification Model ◽ Gradient Boosting ◽ Remotely Sensed Data ◽ Learning Methods ◽ Machine Learning Methods ◽ Classification And Regression

Recent enhancements in information technology and statistical techniques have enabled sophisticated and reliable analysis based on machine learning methods. A number of machine learning data analysis tools may be exploited for classification and regression problems. These tools and techniques can be used effectively for highly data-intensive applications such as agriculture, meteorology, bioinformatics, and stock market analysis based on daily market prices. Machine learning methods such as Decision Tree (C5.0), Classification and Regression Trees (CART), Gradient Boosting Machine (GBM), and Random Forest (RF) have been investigated in the proposed work. The proposed work demonstrates that temporal variations in the spectral data and the computational efficiency of machine learning methods may be effectively used to discriminate between types of sugarcane. The discrimination has been treated as a binary classification problem to segregate ratoon from plantation sugarcane. Variable importance selection based on Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG) has been used to create the appropriate dataset for the classification. The binary classification model based on RF performs best across all possible combinations of input images. Feature selection based on the MDA and MDG measures of RF is also important for dimensionality reduction. The RF model performed best, with 97% accuracy, whereas the GBM method performed worst. Binary classification based on remotely sensed data can be handled effectively using the random forest method.
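
Mean Decrease in Gini builds on the Gini impurity of a node and the drop in impurity produced by a split. The sketch below shows only that single-split building block, under the usual definition G = 1 - sum(p_k^2); the averaging over all splits and trees of a random forest is omitted, and the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with two ratoon and two plant samples, split perfectly.
parent_node = ["ratoon", "ratoon", "plant", "plant"]
decrease = gini_decrease(parent_node, ["ratoon", "ratoon"], ["plant", "plant"])
```

A variable whose splits produce large decreases across many trees would rank high under MDG.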


Evaluating Machine Learning Methods for Predicting Diabetes among Female Patients in Bangladesh

Information ◽ 10.3390/info11080374 ◽ 2020 ◽ Vol 11 (8) ◽ pp. 374

Author(s): Badiuzzaman Pranto ◽ Sk. Maliha Mehnaz ◽ Esha Bintee Mahid ◽ Imran Mahmud Sadman ◽ Ahsanur Rahman ◽ ...

Keyword(s): Machine Learning ◽ Random Forest ◽ Nearest Neighbor ◽ Naive Bayes ◽ Naïve Bayes ◽ Machine Learning Techniques ◽ Bayes Classifier ◽ K Nearest Neighbor ◽ Machine Learning Methods ◽ Learning Techniques

Machine Learning has a significant impact on different aspects of science and technology, including medical research and the life sciences. Diabetes Mellitus, more commonly known as diabetes, is a chronic disease that involves abnormally high levels of glucose sugar in blood cells and the usage of insulin in the human body. This article focuses on analyzing diabetes patients as well as detecting diabetes using different Machine Learning techniques to build a model with few dependencies based on the PIMA dataset. The model has been tested on an unseen portion of PIMA and also on a dataset collected from Kurmitola General Hospital, Dhaka, Bangladesh. The research is conducted to demonstrate the performance of several classifiers trained on a particular country’s diabetes dataset and tested on patients from a different country. We have evaluated decision tree, K-nearest neighbor, random forest, and Naïve Bayes in this research, and the results show that both the random forest and Naïve Bayes classifiers performed well on both datasets.


Metabolic Syndrome Prediction Models Using Machine Learning and Sasang Constitution Type

Evidence-based Complementary and Alternative Medicine ◽ 10.1155/2021/8315047 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-7

Author(s): Ji-Eun Park ◽ Sujeong Mun ◽ Siwoo Lee

Keyword(s): Machine Learning ◽ Nearest Neighbor ◽ Prediction Models ◽ Support Vector ◽ K Nearest Neighbor ◽ Learning Methods ◽ Machine Learning Methods ◽ Sasang Constitution ◽ Constitution Type ◽ Conventional Regression

Background. Machine learning may be a useful tool for predicting metabolic syndrome (MetS), and previous studies also suggest that the risk of MetS differs according to Sasang constitution type. The present study investigated the development of MetS prediction models utilizing machine learning methods and whether the incorporation of Sasang constitution type could improve the performance of those prediction models. Methods. Participants visiting a medical center for a health check-up were recruited in 2005 and 2006. Six kinds of machine learning were utilized (K-nearest neighbor, naive Bayes, random forest, decision tree, multilayer perceptron, and support vector machine), as was conventional logistic regression. Machine learning-derived MetS prediction models with and without the incorporation of Sasang constitution type were compared to investigate whether the former would predict MetS with higher sensitivity. Age, sex, education level, marital status, body mass index, stress, physical activity, alcohol consumption, and smoking were included as potentially predictive factors. Results. A total of 750/2,871 participants had MetS. Among the six types of machine learning methods investigated, multilayer perceptron and support vector machine exhibited the same performance as the conventional regression method, based on the areas under the receiver operating characteristic curves. The naive Bayes method exhibited the highest sensitivity (0.49), which was higher than that of the conventional regression method (0.39). The incorporation of Sasang constitution type improved the sensitivity of all of the machine learning methods investigated except for the K-nearest neighbor method. Conclusion. Machine learning-derived models may be useful for MetS prediction, and the incorporation of Sasang constitution type may increase the sensitivity of such models.


COMPARISON OF MACHINE LEARNING METHODS IN CLASSIFYING POVERTY IN INDONESIA IN 2018

Jurnal Teknik Informatika (Jutif) ◽ 10.20884/1.jutif.2021.2.1.52 ◽ 2021 ◽ Vol 2 (1) ◽ pp. 51-56

Author(s): Pardomuan Robinson Sihombing ◽ Ade Marsinta Arsani

Keyword(s): Machine Learning ◽ Sampling Method ◽ Nearest Neighbor ◽ Choice Model ◽ Imbalanced Data ◽ K Nearest Neighbor ◽ Learning Methods ◽ Rotation Forest ◽ Machine Learning Classification ◽ Machine Learning Methods

Poverty is still one of the main problems in economic development, besides inequality, unemployment, and economic growth. This study aims to model poverty directly using a discrete choice model, namely the machine learning classification method. The data used are imbalanced, with one of the categories being quite small, so resampling with the "both" sampling method is used. In this study, several machine learning methods were applied, including the Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that resampling with the "both" sampling method provides optimal results for the four machine learning methods. Judged by accuracy, specificity, sensitivity, AUC, and the highest Kappa coefficient produced, the best method is KNN. The KNN model has an accuracy of 73 percent, sensitivity of 68 percent, specificity of 78 percent, and AUC of 0.73.
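
The indicators named above can all be read off a binary confusion matrix. A minimal sketch follows, assuming the usual definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts are hypothetical and chosen only to show the arithmetic, not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
    }


# Hypothetical balanced test set: 100 poor and 100 non-poor households.
metrics = binary_metrics(tp=68, fn=32, tn=78, fp=22)
```

When the two classes are the same size, as in this toy example, accuracy is simply the mean of sensitivity and specificity.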


STATISTICAL PREDICTION OF EMOTIONAL STATES BY PHYSIOLOGICAL SIGNALS WITH MANOVA AND MACHINE LEARNING

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001412500085 ◽ 2012 ◽ Vol 26 (04) ◽ pp. 1250008 ◽ Cited By ~ 7

Author(s): TUNG-HUNG CHUEH ◽ TAI-BEEN CHEN ◽ HENRY HORNG-SHING LU ◽ SHAN-SHAN JU ◽ TEH-HO TAO ◽ ...

Keyword(s): Machine Learning ◽ Logistic Model ◽ Nearest Neighbor ◽ Statistical Technique ◽ Physiological Signals ◽ Statistical Prediction ◽ Emotional States ◽ K Nearest Neighbor ◽ Learning Methods ◽ Machine Learning Methods

Because communication across the human-machine interface is important, it would be valuable to develop a tool that can recognize emotional states. In this paper, we propose an approach that can deal with the daily dependence and personal dependence in data from multiple subjects and samples. Thirty features were extracted from the physiological signals of each subject for three emotional states. The physiological signals measured were electrocardiogram (ECG), skin temperature (SKT), and galvanic skin response (GSR). After removing the daily dependence and personal dependence with the statistical technique of MANOVA, six machine learning methods, namely Bayesian network learning, naive Bayesian classification, SVM, the C4.5 decision tree, the Logistic model, and K-nearest-neighbor (KNN), were implemented to differentiate the emotional states. The results showed that the Logistic model gives the best classification accuracy and that the statistical technique of MANOVA can significantly improve the performance of all six machine learning methods in an emotion recognition system.


The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽ 10.28926/jdr.v4i1.93 ◽ 2020 ◽ Vol 4 (1) ◽ pp. 1-6

Author(s): Irzal Ahmad Sabilla ◽ Chastine Fatichah

Keyword(s): Machine Learning ◽ Nearest Neighbor ◽ Support Vector ◽ Staple Food ◽ K Nearest Neighbor ◽ Learning Methods ◽ Linear Discriminant ◽ Machine Learning Methods

Vegetables such as tomatoes and chilies are used as flavoring ingredients. Both of these ingredients are processed into sauce and seasoning to accompany people's staple food. In supermarkets, these vegetables can be found easily, but many people do not understand how to choose the type and quality of chilies and tomatoes. This study discusses the classification of cayenne, curly, green, and red chilies and of tomatoes in good and bad condition using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The best method is determined by accuracy in testing. In addition to accuracy, this study also measures computation speed to check that the methods used are efficient.


Integrating Machine/Deep Learning Methods and Filtering Techniques for Reliable Mineral Phase Segmentation of 3D X-ray Computed Tomography Images

Energies ◽ 10.3390/en14154595 ◽ 2021 ◽ Vol 14 (15) ◽ pp. 4595

Author(s): Parisa Asadi ◽ Lauren E. Beckingham

Keyword(s): Machine Learning ◽ Deep Learning ◽ Random Forest ◽ Ct Images ◽ Ct Imaging ◽ Learning Method ◽ Learning Methods ◽ X Ray ◽ Machine Learning Methods ◽ Filtering Techniques

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is highly necessary but, like any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful method for reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared and contrasted to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering higher-dimensional feature sets is promising and that all classification algorithms achieve high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well, with an accuracy of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
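
The "linear combination of focal and dice loss" mentioned above can be sketched for the binary, per-pixel case as follows. The mixing weight, the focusing parameter gamma, and the restriction to two classes are all assumptions made for illustration; the abstract does not give the weights or the multi-class formulation the study used.

```python
import math


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of targets)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy down-weighted for well-classified pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p      # probability given to the true class
        p_t = min(max(p_t, eps), 1.0)       # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, weight=0.5, gamma=2.0):
    """Linear combination of the two losses; `weight` is an assumed mixing factor."""
    return weight * focal_loss(probs, targets, gamma) + (1.0 - weight) * dice_loss(probs, targets)


# Hypothetical foreground probabilities for three pixels and their true labels.
uncertain = combined_loss([0.5, 0.5, 0.5], [1, 0, 1])
perfect = combined_loss([1.0, 0.0, 1.0], [1, 0, 1])
```

The Dice term responds to region overlap while the focal term emphasizes hard pixels, which is the usual motivation for mixing the two.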
