Machine Learning-Based Front Detection in Central Europe

Extreme weather phenomena such as wind gusts, heavy precipitation, hail, thunderstorms and tornadoes usually occur when the air mass changes as a weather front passes over a region. Compiling a climatology of weather fronts is difficult because fronts are usually drawn onto maps manually by forecasters; consequently, the available data are limited and the process itself is highly subjective. In this article, we propose an objective method for determining the position of weather fronts based on the random forest machine learning technique, digitized fronts from the DWD (Deutscher Wetterdienst) database, and ERA5 meteorological reanalysis. We also present several ways of improving the scores, such as adding new fields or dates to the training database or using the gradients of fields.
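Using the gradients of fields as predictors can be illustrated with a short sketch. It assumes a 2-D field (for example, temperature at one pressure level) stored as a list of rows on a grid with uniform spacing `dx` and `dy`; real ERA5 grids are regular in degrees, so the metric spacing actually varies with latitude. The helper `gradient_magnitude` is hypothetical and is not taken from the paper.

```python
import math

def gradient_magnitude(field, dx, dy):
    """Return |grad f| for a 2-D field given as a list of rows.

    Hypothetical helper: assumes uniform spacing and at least 2x2 points.
    dx is the spacing along a row (x), dy the spacing between rows (y).
    Uses central differences inside the grid, one-sided ones at the edges.
    """
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            # Derivative along x.
            if 0 < i < nx - 1:
                dfdx = (field[j][i + 1] - field[j][i - 1]) / (2 * dx)
            elif i == 0:
                dfdx = (field[j][1] - field[j][0]) / dx
            else:
                dfdx = (field[j][-1] - field[j][-2]) / dx
            # Derivative along y.
            if 0 < j < ny - 1:
                dfdy = (field[j + 1][i] - field[j - 1][i]) / (2 * dy)
            elif j == 0:
                dfdy = (field[1][i] - field[0][i]) / dy
            else:
                dfdy = (field[-1][i] - field[-2][i]) / dy
            out[j][i] = math.hypot(dfdx, dfdy)
    return out

# Toy field: temperature rising by 2 K per grid cell in x, constant in y.
temps = [[280.0 + 2.0 * i for i in range(4)] for _ in range(3)]
print(gradient_magnitude(temps, 1.0, 1.0)[0])
```

Each grid point can then be described by the raw field value together with its gradient magnitude; per-point feature vectors of this kind are what a random forest classifier could be trained on, with the digitized fronts serving as labels.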

Machine Learning Technique to Prognosis Diabetes Disease: Random Forest Classifier Approach

Advanced Computing and Intelligent Technologies - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-2164-2_19 ◽

2021 ◽

pp. 219-244

Author(s):

Prajyot Palimkar ◽

Rabindra Nath Shaw ◽

Ankush Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Technique ◽

Learning Technique

Assessing the soil quality of Bansloi river basin, eastern India using soil-quality indices (SQIs) and Random Forest machine learning technique

Ecological Indicators ◽

10.1016/j.ecolind.2020.106804 ◽

2020 ◽

Vol 118 ◽

pp. 106804

Author(s):

Gopal Chandra Paul ◽

Sunil Saha ◽

Krishna Gopal Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Soil Quality ◽

River Basin ◽

Eastern India ◽

Quality Indices ◽

Machine Learning Technique ◽

Learning Technique

Mode Choice Prediction using Machine Learning Technique for A Door-to-Door Journey in Kuantan City

Mekatronika ◽

10.15282/mekatronika.v2i1.6745 ◽

2020 ◽

Vol 2 (1) ◽

pp. 73-78

Author(s):

Nur Fahriza Mohd Ali ◽

Ahmad Farhan Mohd Sadullah ◽

Anwar P.P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Rabiu Muazu Musa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mode Choice ◽

Learning Models ◽

Machine Learning Technique ◽

Travel Mode Choice ◽

Testing Data ◽

Learning Technique ◽

The City ◽

Machine Learning Models

A door-to-door journey in a public transportation system is a notable concept that is being actively promoted to encourage users to consider public transport as an important alternative. A door-to-door journey integrates the travel segments from home to destination, including all visible amenities, so users' preferences regarding travel time on these key segments need to be understood. Machine learning has been regarded as a robust computational tool for forecasting travel mode choice; however, which model is the best predictor remains an open question. To address this issue, we employed several pre-eminent machine learning models, namely Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN) and Support Vector Machine (SVM), and compared their performance in predicting the travel mode choice of users in the city of Kuantan. The data were collected in Kuantan City through a Revealed/Stated Preferences (RPSP) survey conducted from 8:00 AM to 5:00 PM on weekdays, and were split 80:20 into training and testing sets before the models were evaluated. The results showed that Random Forest provided satisfactory classification accuracies of up to 68.3% on the training data and 61.3% on the testing data, outperforming the other evaluated machine learning models. In summary, Random Forest gave good results on both the training and testing data and is considered the best predictor in this research for forecasting users' mode choice in the city of Kuantan.
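The 80:20 hold-out protocol described above can be sketched in plain Python, assuming the survey records are already encoded as (features, label) pairs. The names `split_80_20` and `accuracy` are hypothetical, and shuffling with a fixed seed is our assumption, since the abstract does not say how the split was drawn.

```python
import random

def split_80_20(records, seed=0):
    """Hypothetical helper: shuffle a copy of records, split it 80:20."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy records (illustrative only): one numeric feature and a mode label.
data = [((i,), "bus" if i % 3 else "car") for i in range(100)]
train, test = split_80_20(data)
print(len(train), len(test))
```

Applying the same `accuracy` function to a model's predictions on `train` and on `test` yields the two kinds of figures the abstract reports (training and testing accuracy).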

Validity of the EQ-5D-5L questionnaire among the general population of Poland

Quality of Life Research ◽

10.1007/s11136-020-02667-3 ◽

2020 ◽

Author(s):

Katarzyna Młyńczak ◽

Dominik Golicki

Keyword(s):

Machine Learning ◽

Random Forest ◽

General Population ◽

Construct Validity ◽

Psychometric Properties ◽

Machine Learning Technique ◽

Validity Assessment ◽

Novel Approach ◽

Learning Technique ◽

Eq Vas

Purpose: We aim to compare the psychometric properties of the EQ-5D-5L questionnaire with those of the EQ-5D-3L version and EQ VAS, based on a survey conducted in a sample representing the general adult population of Poland.

Methods: The survey comprised health-related quality of life (HRQoL) questionnaires (EQ-5D-5L, EQ VAS, SF-12 and EQ-5D-3L), together with items on demographic and socio-economic characteristics. The EQ-5D index values were estimated based on a directly measured value set for Poland. The following psychometric properties were analysed: feasibility, distribution of responses, redistribution from EQ-5D-3L to EQ-5D-5L, inconsistencies, ceiling effects, informative power and construct validity. We proposed a novel approach to assessing construct validity, based on the use of a machine learning technique known as the random forest algorithm.

Results: From March to June 2014, 3978 subjects (aged 18–87, 53.2% female) were surveyed. The EQ-5D-5L questionnaire had a lower ceiling effect than EQ-5D-3L (38.0% vs 46.6%). Redistribution from EQ-5D-3L to EQ-5D-5L was similar for each dimension, and the mean inconsistency did not exceed 5%. The results of known-groups validation confirmed the hypothesized relationships between the EQ-5D index values and age, sex and the occurrence of diabetes.

Conclusions: The EQ-5D-5L, in comparison with its EQ-5D-3L equivalent, showed similar or better psychometric properties within the general population of a country. We assessed the construct validity of the questionnaire with a novel approach based on a machine learning technique known as the random forest algorithm.
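The ceiling effect compared above is commonly defined for EQ-5D as the share of respondents who report no problems on every dimension, that is, the profile 11111. A minimal sketch under that definition, assuming responses are stored as five-character strings; the function name is hypothetical and the toy data are not from the study.

```python
def ceiling_effect(profiles):
    """Percentage of EQ-5D profiles equal to full health ('11111').

    Hypothetical helper; assumes each profile is a five-character string
    with one response level per dimension.
    """
    if not profiles:
        raise ValueError("no profiles given")
    full_health = sum(1 for p in profiles if p == "11111")
    return 100.0 * full_health / len(profiles)

# Toy example (illustrative data only): 2 of 5 respondents report 11111.
sample = ["11111", "21111", "11111", "11211", "32121"]
print(ceiling_effect(sample))  # 40.0
```

Computed separately on the EQ-5D-3L and EQ-5D-5L responses of the same respondents, this gives the pair of percentages that the comparison above is based on.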

Rainrate Estimation from FY-4A Cloud Top Temperature for Mesoscale Convective Systems by Using Machine Learning Algorithm

Remote Sensing ◽

10.3390/rs13163273 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3273

Author(s):

Ping Lao ◽

Qi Liu ◽

Yuhao Ding ◽

Yu Wang ◽

Yuan Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mesoscale Convective Systems ◽

Support Vector ◽

Positive Contribution ◽

Machine Learning Technique ◽

Convective Systems ◽

Learning Technique ◽

Mesoscale Convective ◽

Cloud Top Temperature

Satellite rainrate estimation is a great challenge, especially in mesoscale convective systems (MCSs), mainly because there is no direct physical connection between observable cloud parameters and surface rainrate. In this study, machine learning was employed to estimate rainrate in the MCS domain using cloud top temperature (CTT) derived from a geostationary satellite. Five kinds of machine learning models were investigated, namely polynomial regression, support vector machine, decision tree, random forest and multilayer perceptron, with precipitation from the Climate Prediction Center morphing technique (CMORPH) used as the reference. A total of 31 CTT-related features were designed as potential inputs for training an algorithm, and all of them were shown to contribute positively to the algorithm. Random forest (RF) showed the best performance among the five kinds of models. By combining the classification and regression schemes of the RF model, an RF-based hybrid algorithm was proposed that first discriminates rainy pixels and then estimates their rainrate. For the MCS samples considered in this study, this algorithm produces the best estimates, and its accuracy is clearly higher than that of the operational FY-4A precipitation product. These results demonstrate the promising feasibility of applying machine learning to the satellite precipitation retrieval problem.
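The RF-based hybrid algorithm first decides whether a pixel is rainy and only then estimates a rainrate for it. The control flow of such a two-stage scheme can be sketched as below. The stand-in models (a cloud-top-temperature threshold and a linear rainrate formula with made-up coefficients) are hypothetical placeholders for the paper's random forest classifier and regressor, not its actual models or features.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]

def hybrid_estimate(
    pixels: Sequence[Features],
    is_rainy: Callable[[Features], bool],
    rainrate: Callable[[Features], float],
) -> List[float]:
    """Two-stage estimate: classify each pixel, regress only rainy ones.

    Pixels classified as non-rainy are assigned 0.0 mm/h.
    """
    return [rainrate(x) if is_rainy(x) else 0.0 for x in pixels]

# Hypothetical stand-ins (NOT the paper's models); feature 0 is CTT in K.
def toy_classifier(x: Features) -> bool:
    return x[0] < 235.0  # colder cloud tops flagged as rainy (made-up threshold)

def toy_regressor(x: Features) -> float:
    return max(0.0, 0.2 * (235.0 - x[0]))  # colder tops -> heavier rain (made-up)

print(hybrid_estimate([(250.0,), (225.0,), (205.0,)], toy_classifier, toy_regressor))
```

One motivation for this split is that the regression stage is trained only on rainy samples, so it does not have to reproduce the many zero-rain pixels; the classifier handles that decision instead.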

The effects of applying filters on EEG signals for classifying developers’ code comprehension

Journal of Applied Research and Technology ◽

10.22201/icat.24486736e.2021.19.6.1299 ◽

2021 ◽

Vol 19 (6) ◽

pp. 584-602

Author(s):

Lucian Jose Gonçales ◽

Kleinner Farias ◽

Lucas Kupssinskü ◽

Matheus Segalotto

Keyword(s):

Machine Learning ◽

Software Engineering ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Eeg Signals ◽

Machine Learning Technique ◽

Learning Techniques ◽

Learning Technique ◽

F Measure

EEG signals are a relevant indicator for measuring aspects related to human factors in Software Engineering. In software engineering, EEG is used to train machine learning techniques for a wide range of applications, including classifying task difficulty and developers' level of experience. EEG signals contain noise, such as abnormal readings, electrical interference and eye movements, which is usually not of interest to the analysis and therefore reduces the precision of the machine learning techniques. Filters can remove this noise; however, research in software engineering has not demonstrated the effectiveness of applying these filters to EEG signals. The objective of this work is to analyze the effectiveness of filters on EEG signals in the software engineering context. As the literature has not focused on classifying developers' code comprehension, this study analyzes the effectiveness of applying EEG filters when training a machine learning technique to classify developers' code comprehension. A Random Forest (RF) classifier was trained with filtered EEG signals to classify developers' code comprehension, and a second random forest classifier was trained with unfiltered EEG data. Both models were trained using 10-fold cross-validation, and their effectiveness was measured with the F-measure. The t-test, the Wilcoxon test and the Mann-Whitney U test were used to analyze the difference in F-measure between the classifier trained with filtered EEG and the classifier trained with unfiltered EEG. The tests indicated a significant difference after applying EEG filters to classify developers' code comprehension with the random forest classifier. The conclusion is that EEG filters significantly improve the effectiveness of classifying code comprehension with the random forest technique.
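Both classifiers were evaluated with 10-fold cross-validation and the F-measure. A standard-library sketch of both pieces follows: the hypothetical `k_fold_indices` partitions sample indices into k disjoint test folds after a seeded shuffle, and `f_measure` computes F1 = 2PR/(P+R) for one positive class. Using plain, unstratified folds is our assumption; the abstract does not say how the folds were formed.

```python
import random
from typing import List, Tuple

def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs; each index is tested exactly once.

    Hypothetical helper using unstratified folds after a seeded shuffle.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds

def f_measure(y_true, y_pred, positive=1) -> float:
    """F1 score for the given positive class (0.0 when there are no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In each fold, a model would be fitted on the `train` indices and scored with `f_measure` on the `test` indices; the resulting per-fold scores for the filtered and unfiltered classifiers are the samples that statistical tests such as those named above compare.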

Genre e-sport gaming tournament classification using machine learning technique based on decision tree, Naïve Bayes, and random forest algorithm

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012037 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012037

Author(s):

Arif Rinaldi Dikananda ◽

Irfan Ali ◽

Fathurrohman ◽

Rizki Ade Rinaldi ◽

Iin

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm ◽

Machine Learning Technique ◽

Learning Technique

Predictive model of cardiac arrest in smokers using machine learning technique based on Heart Rate Variability parameter

Applied Computing and Informatics ◽

10.1016/j.aci.2019.06.002 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Cited By ~ 5

Author(s):

R. Shashikant ◽

P. Chetankumar

Keyword(s):

Machine Learning ◽

Heart Rate ◽

Heart Rate Variability ◽

Logistic Regression ◽

Cardiac Arrest ◽

Random Forest ◽

Decision Tree ◽

Machine Learning Technique ◽

Forest Model ◽

Learning Technique

Cardiac arrest is a severe heart anomaly that causes a large number of deaths every year. Smoking is a specific hazard factor for cardiovascular pathology, including coronary heart disease, but data on smoking and cardiac death have not previously been reviewed. In this paper, Heart Rate Variability (HRV) parameters are used to predict cardiac arrest in smokers with a machine learning technique. Machine learning is a computational approach in which models learn automatically from experience and improve their performance, which can support better prognosis. This study compares the performance of logistic regression, decision tree and random forest models in predicting cardiac arrest in smokers. The machine learning technique was applied to a dataset obtained from the data science research group MITU Skillogies, Pune, India. To determine whether a patient is at risk of cardiac arrest, three predictive models were developed with 19 HRV indices as input features and two output classes. The models were evaluated on accuracy, precision, sensitivity, specificity, F1 score and area under the curve (AUC). The logistic regression model achieved an accuracy of 88.50%, precision of 83.11%, sensitivity of 91.79%, specificity of 86.03%, F1 score of 0.87 and AUC of 0.88. The decision tree model achieved an accuracy of 92.59%, precision of 97.29%, sensitivity of 90.11%, specificity of 97.38%, F1 score of 0.93 and AUC of 0.94. The random forest model achieved an accuracy of 93.61%, precision of 94.59%, sensitivity of 92.11%, specificity of 95.03%, F1 score of 0.93 and AUC of 0.95. The random forest model achieved the best classification accuracy, followed by the decision tree, while logistic regression showed the lowest classification accuracy.
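All of the metrics reported above except AUC follow from the four cells of a binary confusion matrix. A minimal sketch (function names hypothetical) is shown below; because sensitivity is the same quantity as recall, the F1 score can also be recomputed from a reported precision and sensitivity as a quick consistency check.

```python
from typing import Dict

def f1_from(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, specificity and F1 from a confusion matrix."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall / true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": f1_from(precision, sensitivity),
    }

# Consistency check using the random forest figures quoted above.
print(round(f1_from(0.9459, 0.9211), 2))  # 0.93
```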

Random Forest Machine Learning Technique for Automatic Vegetation Detection and Modelling in LiDAR Data

International Journal of Environmental Sciences & Natural Resources ◽

10.19080/ijesnr.2021.28.556234 ◽

2021 ◽

Vol 28 (2) ◽

Author(s):

Fayez Tarsha Kurdi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Lidar Data ◽

Machine Learning Technique ◽

Learning Technique

Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique

10.20944/preprints202004.0342.v1 ◽

2020 ◽

Author(s):

Mohammad Hafez Ahmed

Keyword(s):

Machine Learning ◽

Water Quality ◽

Random Forest ◽

Dissolved Oxygen ◽

Water Temperature ◽

Quality Data ◽

Linear Regression Method ◽

Machine Learning Technique ◽

Learning Technique ◽

Input Variables

Dissolved oxygen (DO) is a key indicator in the study of the ecological health of rivers. Modeling DO is a major challenge because of the complex interactions among its various process components. Given its vital importance in water bodies, accurate prediction of DO is a critical issue in ecosystem management. Because current process-based water quality models are intricate, a data-driven model could be an effective alternative tool. In this study, a random forest machine learning technique is employed to predict the DO level by identifying its major drivers. Half-hourly water quality time series spanning 2007 to 2019 for the South Branch Potomac River near Springfield, WV, were obtained from the United States Geological Survey database. Key drivers are identified, and models are formulated for different scenarios of input variables. The model is calibrated for each input scenario using 80% of the data. Water temperature and pH are found to be the most influential predictors of DO; however, satisfactory model performance is achieved by using water temperature, pH and specific conductance as input variables. The model is validated by predicting DO concentrations for the remaining 20% of the data. A comparison with the traditional multiple linear regression method shows that the random forest model performs significantly better. The study insights are therefore expected to be useful for estimating stream and river DO levels at various sites with a minimal number of predictors, and to help build a sturdy framework for ecosystem health management across an environmental gradient.
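To make the random forest idea concrete, the following is a deliberately tiny regression forest in plain Python: each tree is a single split (a stump) fitted to a bootstrap sample using a randomly chosen subset of the input variables, and the forest prediction is the mean over all trees. It only illustrates bagging plus random feature selection; the depth-1 trees, the number of trees, the synthetic data and all names are our assumptions, not the study's configuration, and a real analysis would use full-depth trees from an established library.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left mean, right mean)

def fit_stump(X: List[Row], y: List[float], features: List[int]) -> Stump:
    """Best single split over the given features, by sum of squared errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no valid split: predict the sample mean everywhere
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def fit_forest(X: List[Row], y: List[float], n_trees=25, max_features=1, seed=0) -> List[Stump]:
    """Hypothetical toy forest: bootstrap rows and random feature subsets per tree."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict(forest: List[Stump], row: Row) -> float:
    """Average the predictions of all stumps."""
    return mean(lm if row[f] <= thr else rm for f, thr, lm, rm in forest)

# Synthetic (made-up) data: (water temperature, pH) -> DO, not USGS data.
X = [(5.0 + i, 7.0 + 0.05 * i) for i in range(20)]
y = [14.0 - 0.3 * t + 0.5 * (ph - 7.0) for t, ph in X]
forest = fit_forest(X, y)
print(predict(forest, X[0]) > predict(forest, X[-1]))  # True
```

In this toy data DO falls as the row index rises, so every split sends the coldest row to the higher-DO side and the warmest row to the lower-DO side, which is why the final comparison prints `True`.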
