Research on the usability of different machine learning methods in visibility forecasting

Haze pollution, mainly characterized by low visibility, is one of the main environmental problems currently faced by China. Accurate haze forecasts facilitate the implementation of preventive measures to control the emission of air pollutants and thereby mitigate haze pollution. However, it is not easy to accurately predict the low-visibility events induced by haze, which requires not only accurate prediction of weather elements but also a refined, real-time updated source emission inventory. In order to obtain reliable forecasting tools, this paper studies the usability of several popular machine learning methods, such as support vector machine, k-nearest neighbor, and random forest, as well as several deep learning methods, for visibility forecasting. Starting from the main factors related to visibility, the relationships between wind speed, wind direction, temperature, humidity, and visibility are discussed. Training and forecasting were performed using the machine learning methods. The accuracy of these methods in visibility forecasting was evaluated using several error measures (i.e., root-mean-square error, mean absolute error, and mean absolute percentage error). The results show that: (1) among all meteorological parameters, wind speed was the best at reflecting the visibility change patterns; (2) the RNN, LSTM, and GRU methods perform almost equally well on short-term visibility forecasts (i.e., 1 h, 3 h, and 6 h); (3) a classical machine learning method (i.e., the SVM) performs well in mid- and long-term visibility forecasts; (4) the machine learning methods also have a certain degree of forecast accuracy even for long time periods (e.g., 72 h).
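
The three error measures named in this abstract can be illustrated with a minimal Python sketch. The function names and the sample visibility values (in km) are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions, which may differ in detail from the paper's own.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error: square root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every observed value is nonzero.
    """
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n

# Hypothetical visibility observations and forecasts (km), for illustration only.
observed = [10.0, 5.0, 2.0, 8.0]
predicted = [9.0, 6.0, 2.5, 8.0]

print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

Because MAPE divides by the observed value, the same absolute error weighs more heavily at low visibility, which is one reason to report it alongside RMSE and MAE.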

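Studi Komparasi Metode Machine Learning untuk Klasifikasi Citra Huruf Vokal Hiragana (Comparative Study of Machine Learning Methods for Image Classification of Hiragana Vowel Characters)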

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3083 ◽

2021 ◽

Vol 5 (3) ◽

pp. 905

Author(s):

Muhammad Afrizal Amrustian ◽

Vika Febri Muliati ◽

Elsa Elvira Awal

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Image Classification ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

The Comparative Study

Japanese is one of the most difficult languages to understand and read. The difficulty of reading Japanese stems from its writing, which does not use the alphabet. There are three types of Japanese script, namely kanji, katakana, and hiragana. Hiragana is the most commonly used type of writing. In addition, hiragana has a cursive nature, so each person's handwriting will be different. Machine learning methods can be used to read Japanese letters by recognizing the image of the letters. The Japanese letters used in this study are hiragana vowels. This study focuses on conducting a comparative study of machine learning methods for the image classification of Japanese letters. The machine learning methods compared are Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbor. The results of the comparative study show that K-Nearest Neighbor is the best method for image classification of hiragana vowels, achieving an accuracy of 89.4% with a low error rate.

The prediction of the wind speed at different heights by machine learning methods

An International Journal of Optimization and Control Theories & Applications (IJOCTA) ◽

10.11121/ijocta.01.2016.00315 ◽

2016 ◽

Vol 6 (2) ◽

pp. 179-187 ◽

Cited By ~ 6

Author(s):

Yusuf S. Türkan ◽

Hacer Yumurtacı Aydoğmuş ◽

Hamit Erdal

Keyword(s):

Machine Learning ◽

Decision Making ◽

Renewable Energy ◽

Wind Speed ◽

Wind Farms ◽

Support Vector ◽

Learning Methods ◽

Wind Speeds ◽

Machine Learning Methods ◽

Successful Technique

In Turkey, many entrepreneurs began investing in renewable energy systems after new legal regulations and stimulus packages for renewable energy production were introduced. Among the many alternatives, electricity production via wind farms is one of the leading systems. For these systems, the wind speed values measured prior to the establishment of the farms are extremely important both in decision making and in the projection of the investment. However, measuring the wind speed at different heights is a time-consuming and expensive process. For this reason, the success of techniques that predict wind speeds is important for fast and reliable decision making on wind farm investments. In this study, annual wind speed values measured at two different heights in Kutahya, one of the regions in Turkey with potential for wind energy, were used: with the help of speed values at 10 m, wind speed values at a height of 30 m were predicted by seven different machine learning methods. The results of the analysis were compared with each other. The results show that support vector machines are a successful technique for predicting the wind speed at different heights.

Prediction of Neutralization Depth of R.C. Bridges Using Machine Learning Methods

Crystals ◽

10.3390/cryst11020210 ◽

2021 ◽

Vol 11 (2) ◽

pp. 210

Author(s):

Kangkang Duan ◽

Shuangyin Cao ◽

Jinbao Li ◽

Chongfa Xu

Keyword(s):

Machine Learning ◽

Recall Rate ◽

Kernel Functions ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Radial Basis ◽

Prediction Problems

Machine learning techniques have become a popular solution to prediction problems. These approaches show excellent performance without being explicitly programmed. In this paper, 448 sets of data were collected to predict the neutralization depth of concrete bridges in China. Random forest was used for parameter selection. In addition, four machine learning methods, including support vector machine (SVM), k-nearest neighbor (KNN), and XGBoost, were adopted to develop models. The results show that machine learning models obtain a high accuracy (>80%) and an acceptable macro recall rate (>80%) even with only four parameters. For SVM models, the radial basis function performs better than other kernel functions. The radial basis kernel SVM method has the highest verification accuracy (91%) and the highest macro recall rate (86%). The study also reveals the preferences of the different methods.
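
To show how accuracy and the macro recall rate reported in this abstract can diverge, here is a small Python sketch. The class labels and predictions are invented for illustration, and macro recall is taken to be the unweighted mean of per-class recall, which is the usual definition but is an assumption about the paper's usage.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (each class counts equally)."""
    support = Counter(y_true)                                   # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)

# Hypothetical, imbalanced depth grades: six "low", two "mid", two "high".
y_true = ["low"] * 6 + ["mid"] * 2 + ["high"] * 2
y_pred = ["low"] * 6 + ["mid", "low"] + ["high", "low"]

print(accuracy(y_true, y_pred), macro_recall(y_true, y_pred))
```

In this toy case the majority class is predicted perfectly, so accuracy stays high while macro recall is pulled down by the two smaller classes.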

Metabolic Syndrome Prediction Models Using Machine Learning and Sasang Constitution Type

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2021/8315047 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Ji-Eun Park ◽

Sujeong Mun ◽

Siwoo Lee

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Prediction Models ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Sasang Constitution ◽

Constitution Type ◽

Conventional Regression

Background. Machine learning may be a useful tool for predicting metabolic syndrome (MetS), and previous studies also suggest that the risk of MetS differs according to Sasang constitution type. The present study investigated the development of MetS prediction models utilizing machine learning methods and whether the incorporation of Sasang constitution type could improve the performance of those prediction models. Methods. Participants visiting a medical center for a health check-up were recruited in 2005 and 2006. Six kinds of machine learning were utilized (K-nearest neighbor, naive Bayes, random forest, decision tree, multilayer perceptron, and support vector machine), as was conventional logistic regression. Machine learning-derived MetS prediction models with and without the incorporation of Sasang constitution type were compared to investigate whether the former would predict MetS with higher sensitivity. Age, sex, education level, marital status, body mass index, stress, physical activity, alcohol consumption, and smoking were included as potentially predictive factors. Results. A total of 750/2,871 participants had MetS. Among the six types of machine learning methods investigated, multilayer perceptron and support vector machine exhibited the same performance as the conventional regression method, based on the areas under the receiver operating characteristic curves. The naive-Bayes method exhibited the highest sensitivity (0.49), which was higher than that of the conventional regression method (0.39). The incorporation of Sasang constitution type improved the sensitivity of all of the machine learning methods investigated except for the K-nearest neighbor method. Conclusion. Machine learning-derived models may be useful for MetS prediction, and the incorporation of Sasang constitution type may increase the sensitivity of such models.

A Comparison of Machine Learning Methods to Forecast Tropospheric Ozone Levels in Delhi

Atmosphere ◽

10.3390/atmos13010046 ◽

2021 ◽

Vol 13 (1) ◽

pp. 46

Author(s):

Eliana Kai Juarez ◽

Mark R. Petersen

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Ground Level ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Ground Level Ozone ◽

Machine Learning Methods ◽

Hourly Data

Ground-level ozone is a pollutant that is harmful to urban populations, particularly in developing countries where it is present in significant quantities. It greatly increases the risk of heart and lung diseases and harms agricultural crops. This study hypothesized that, as a secondary pollutant, ground-level ozone is amenable to 24 h forecasting based on measurements of weather conditions and primary pollutants such as nitrogen oxides and volatile organic compounds. We developed software to analyze hourly records of 12 air pollutants and 5 weather variables over the course of one year in Delhi, India. To determine the best predictive model, eight machine learning algorithms were tuned, trained, tested, and compared using cross-validation with hourly data for a full year. The algorithms, ranked by R² values, were XGBoost (0.61), Random Forest (0.61), K-Nearest Neighbor Regression (0.55), Support Vector Regression (0.48), Decision Trees (0.43), AdaBoost (0.39), and linear regression (0.39). When trained by separate seasons across five years, the predictive capabilities of all models increased, with a maximum R² of 0.75 during winter. Bidirectional Long Short-Term Memory was the least accurate model for annual training, but had some of the best predictions for seasonal training. Out of five air quality index categories, the XGBoost model was able to predict the correct category 24 h in advance 90% of the time when trained with full-year data. Separated by season, winter is considerably more predictable (97.3%), followed by post-monsoon (92.8%), monsoon (90.3%), and summer (88.9%). These results show the importance of training machine learning methods with season-specific data sets and comparing a large number of methods for specific applications.
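
The ranking in this abstract relies on the coefficient of determination. A minimal Python sketch of one common definition follows: one minus the ratio of the residual sum of squares to the total sum of squares. Whether the paper uses exactly this definition is an assumption, and the ozone values are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical (SS_tot > 0).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)               # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly ozone concentrations and forecasts, for illustration only.
observed = [40.0, 60.0, 80.0, 100.0]
predicted = [50.0, 60.0, 70.0, 100.0]

print(r_squared(observed, predicted))
```

Under this definition, a model that always predicts the observed mean scores 0 and a perfect model scores 1, which gives a sense of scale for the reported values between 0.39 and 0.75.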

The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽

10.28926/jdr.v4i1.93 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-6

Author(s):

Irzal Ahmad Sabilla ◽

Chastine Fatichah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Support Vector ◽

Staple Food ◽

K Nearest Neighbor ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

Vegetables such as tomatoes and chilies are used as flavoring ingredients. Both of these ingredients are processed to accompany people's staple food in the form of sauce and seasoning. In supermarkets, these vegetables can be found easily, but many people do not understand how to choose the type and quality of chilies and tomatoes. This study discusses the classification of types of cayenne, curly, green, red chilies, and tomatoes in good and bad condition using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The best method is determined from the test results based on accuracy. In addition to accuracy, this study also measures computation speed so that the methods used are efficient.

A Novel Integration of Hodrick-Prescott Filter and Harmonic Analysis with Machine Learning Methods to Enhance Time Series Prediction Accuracy of Daily and Monthly Wind Speeds

10.21203/rs.3.rs-794022/v1 ◽

2021 ◽

Author(s):

Chigbogu Godwin Ozoegwu

Keyword(s):

Machine Learning ◽

Time Series ◽

Wind Speed ◽

Harmonic Analysis ◽

Hybrid Algorithm ◽

Support Vector ◽

Learning Methods ◽

Wind Speeds ◽

Machine Learning Methods ◽

Tropical Conditions

In this work, a new hybrid algorithm for modelling time series of daily and monthly wind speed is proposed. The method utilizes the Hodrick-Prescott filter (HPF) to decompose raw wind speed data into trend and cyclic components, and harmonic analysis (HA) is thereafter used to decompose the cyclic component into periodic and stochastic sub-components. Machine learning (ML) methods are then used to model the time series of both the trend and stochastic components. The predicted wind speeds are finally summed from the individual predictions of the ML methods and harmonic analyses. To highlight the considerably higher predictive accuracy that results from the introduced data pre-treatments with HPF and HA, the proposed hybrid algorithm is compared against the traditional ML methods that are not subjected to the pre-treatments. The proposed hybrid algorithms are highly accurate relative to the traditional ML methods, reflecting much higher coefficients of determination and correlation coefficients, and much lower error indices. Artificial neural networks (ANNs), linear regression with interactions (LRI), support vector machine (SVM), rational quadratic Gaussian process regression (RQGPR), fine regression trees (FRTs) and boosted ensembles of trees (BETs) are used as the illustrative machine learning methods. To guarantee both versatility and robustness, the methods are tested on example data drawn from both temperate and tropical conditions.
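
The first pre-treatment step described in this abstract, splitting a series into trend and cyclic components with the Hodrick-Prescott filter, can be sketched in Python with NumPy as below. This is the textbook dense-matrix form of the filter, not the author's implementation; the function name `hp_filter`, the smoothing parameter value, and the synthetic series are all assumptions made for illustration. The subsequent harmonic analysis and ML modelling steps are not shown.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Split a series into (trend, cycle) with the Hodrick-Prescott filter.

    The trend minimizes sum((y - trend)**2) + lam * sum(second_diff(trend)**2),
    whose closed-form solution is (I + lam * D.T @ D) trend = y.
    lam=1600 is only a conventional default, not a value from the paper.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # D is the (n-2) x n second-difference operator with rows [1, -2, 1].
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend, y - trend

# Synthetic "wind speed" series: slow drift plus a 12-step oscillation (hypothetical).
t = np.arange(120, dtype=float)
series = 5.0 + 0.01 * t + np.sin(2.0 * np.pi * t / 12.0)
trend, cycle = hp_filter(series)
```

The dense solve is adequate for short series; for long daily records a sparse or banded solver would be the more practical choice.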

Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666191016155543 ◽

2019 ◽

Vol 19 (25) ◽

pp. 2301-2317 ◽

Cited By ~ 2

Author(s):

Ruirui Liang ◽

Jiayang Xie ◽

Chi Zhang ◽

Mengying Zhang ◽

Hai Huang ◽

...

Keyword(s):

Machine Learning ◽

Growth Rate ◽

Big Data ◽

Human Genome Project ◽

Genome Project ◽

Support Vector ◽

Successful Implementation ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental, and lifestyle factors should be combined to study cancer, owing to the complexity and various forms of the disease. The increasing availability and growth rate of ‘big data’ derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning, and naïve Bayes classifiers.

Integration of transcriptomic data identifies key hallmark genes in hypertrophic cardiomyopathy

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-02147-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jing Xu ◽

Xiangdong Liu ◽

Qiming Dai

Keyword(s):

Machine Learning ◽

Hypertrophic Cardiomyopathy ◽

Heart Diseases ◽

Expression Patterns ◽

Support Vector ◽

Rna Seq ◽

Ppi Network ◽

Learning Methods ◽

Transcriptomic Data ◽

Machine Learning Methods

Background. Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of the heart tissue samples in HCM patients from multiple microarray and RNA-seq platforms were investigated. Methods. The significant genes were obtained through the intersection of two gene sets, corresponding to the identified differentially expressed genes (DEGs) within the microarray data and within the RNA-seq data. Those genes were further ranked using the minimum-Redundancy Maximum-Relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification, including support vector machines, random forest, and k-Nearest Neighbor. Results. Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since the eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through the protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions. This study highlights the role of transcriptomic data integration, in combination with machine learning methods, in providing insight into the key hallmark genes in the genetic etiology of HCM.

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
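
The trade-off this abstract describes between sensitivity, specificity, and false alarms on unbalanced data can be made concrete with a short Python sketch. The labels below are invented (1 = bloom, 0 = no bloom), and the false alarm ratio is assumed to be 1 minus precision, following the abstract's pairing of "false alarms" with "lower precision"; the paper may define it differently.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels.

    Assumes both classes occur in y_true and at least one positive is predicted,
    so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # blooms correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-blooms correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed blooms
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "false_alarm_ratio": fp / (tp + fp),
    }

# Hypothetical unbalanced sample: 2 bloom events out of 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here every bloom is caught, yet half of the alarms are false, which mirrors the pattern the abstract attributes to the classical approaches.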
