Variation-Oriented Data Filtering for Improvement in Model Complexity of Air Pollutant Prediction Model

Accurate prediction models for air pollutants are crucial for forecasting and for issuing health alerts to local inhabitants. In recent literature, the discrete wavelet transform (DWT) was employed to decompose a series of air pollutant levels, followed by modeling using a support vector machine (SVM). This combination of DWT and SVM was reported to produce a more accurate prediction model for air pollutants by investigating different levels of frequency bands. However, DWT significantly increases model complexity, namely the training time and the size of the prediction model. In this paper, a new method called variation-oriented filtering (VF) is proposed to remove data with low variation, which can be considered as noise to a prediction model. VF reduces both the noise and the size of the air pollutant series, and hence also the training time and the model size. The SO2 (sulfur dioxide) level in Macau was selected as a test case. Experimental results show that VF can effectively and efficiently reduce model complexity while improving predictive accuracy.
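The abstract does not define how "variation" is measured. As a minimal sketch, assuming a sample's variation is its absolute change from the previous sample and that samples below a chosen threshold are dropped, the filtering step could look like this. `variation_filter`, `threshold` and the SO2 values are hypothetical and illustrative, not the authors' definition or data.

```python
from typing import List, Tuple


def variation_filter(series: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Hypothetical variation-oriented filter (assumed definition, not the paper's).

    A sample is kept when its absolute change from the previous raw sample is at
    least `threshold`; low-variation samples are treated as noise and dropped.
    The first sample is always kept. Returns (index, value) pairs so that the
    original time positions remain available after filtering.
    """
    if not series:
        return []
    kept = [(0, series[0])]
    for i in range(1, len(series)):
        if abs(series[i] - series[i - 1]) >= threshold:
            kept.append((i, series[i]))
    return kept


# Illustrative SO2 readings (made-up values, not data from the study).
so2 = [10.0, 10.1, 10.2, 14.0, 14.1, 9.0, 9.05, 12.0]
print(variation_filter(so2, threshold=1.0))
# [(0, 10.0), (3, 14.0), (5, 9.0), (7, 12.0)]
```

Fewer retained samples give the SVM a smaller training set. The abstract credits this mechanism for the shorter training time and smaller model. The threshold would need tuning for each pollutant.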

Detection of moisture and carotenoid content in carrot slices during hot air drying based on multispectral imaging equipment with selected wavelengths

International Journal of Food Engineering ◽

10.1515/ijfe-2021-0127 ◽

2021 ◽

Vol 17 (9) ◽

pp. 727-735

Author(s):

Jiamei Long ◽

Jia Yang ◽

Jing Peng ◽

Leiqing Pan ◽

Kang Tu

Keyword(s):

Moisture Content ◽

Prediction Model ◽

Multispectral Imaging ◽

Prediction Models ◽

Projection Algorithm ◽

Carotenoid Content ◽

Support Vector ◽

Hot Air Drying ◽

Air Drying ◽

Hot Air

Abstract Moisture content and carotenoid content are important indicators for evaluating the drying process of carrot slices. There is growing interest in developing non-destructive methods as effective analytical tools for quality assurance of carrot slices during drying. In this study, the characteristic wavelengths of moisture and carotenoid content in carrot slices during hot air drying were extracted using hyperspectral imaging technology. A multispectral imaging device was then built, with the filter wavelengths chosen according to these characteristic wavelengths. Using the successive projection algorithm (SPA), the optimal wavelengths for moisture and carotenoid content were further determined, and prediction models for both were established with the system. Twelve filters were selected in this study. The results showed that a support vector machine (SVM) prediction model for moisture content, built on seven optimal wavelengths, achieved a coefficient of determination for the prediction set (R²p) of 0.991 and a residual predictive deviation (RPD) of 10.318. An SVM prediction model for carotenoid content, built on eight optimal wavelengths, achieved an R²p of 0.968 and an RPD of 5.337. This prediction performance is close to, or even better than, that obtained with hyperspectral imaging. The study confirmed the feasibility of using the multispectral imaging equipment to measure the moisture and carotenoid content of carrot slices during drying based on selected wavelengths, laying a foundation for the future development of a portable multispectral detector for the quality of dried products.
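The two reported figures of merit can be computed as follows. This sketch assumes the common definitions: R²p is 1 minus the ratio of the residual sum of squares to the total sum of squares on the prediction set. RPD is the sample standard deviation of the reference values divided by the root mean square error of prediction (RMSEP). Conventions vary between papers (for example, population versus sample standard deviation), so the study's exact definitions may differ. The data are toy values.

```python
import math
import statistics
from typing import Sequence


def r2_prediction(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination on the prediction set (R²p)."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Residual predictive deviation: sample SD of reference values / RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


# Toy reference and predicted values (illustrative only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r2_prediction(measured, predicted), 4), round(rpd(measured, predicted), 3))
# 0.98 8.165
```

A higher RPD means the prediction error is small relative to the spread of the reference data.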

Practical health indicator construction methodology for bearing ensemble remaining useful life prediction with ISOMAP-DE and ELM-WPHM

Measurement Science and Technology ◽

10.1088/1361-6501/ac3855 ◽

2021 ◽

Author(s):

Yingkui Gu ◽

Qingpeng Bi ◽

Guangqi Qiu

Keyword(s):

Prediction Model ◽

Weight Coefficient ◽

Remaining Useful Life ◽

Support Vector ◽

Health Indicator ◽

Training Time ◽

Optimal Weight ◽

Useful Life ◽

Learning Machine ◽

Health Features

Abstract To improve the accuracy of our previous bearing ensemble Remaining Useful Life (RUL) prediction model using the Genetic Algorithm (GA), Support Vector Regression (SVR), and the Weibull Proportional Hazard Model (WPHM) (see reference [1]), we proposed a more practical Health Indicator (HI) construction methodology for bearing ensemble RUL prediction. A weighted coefficient determination method based on the Analytic Hierarchy Process (AHP) was proposed for four prognostic metrics (monotonicity, robustness, trendability, and consistency) to select sensitive health features accurately. The selected sensitive health features were fused through isometric feature mapping (ISOMAP), and Differential Evolution (DE) replaced GA for computing the optimal weight coefficient of each input fused feature. A one-dimensional HI was constructed by multiplying each input fused feature by its corresponding optimal weight coefficient, and RUL prediction was implemented through an extreme learning machine (ELM) and WPHM. The accuracy and effectiveness of the proposed method were validated by a bearing experiment. The results show that HI construction with ISOMAP-DE achieved the best performance. The proposed ELM-WPHM model was compared with BP-WPHM, SVM-WPHM, LSTM-WPHM, and DLSTM-WPHM in terms of RMSE. ELM-WPHM yielded both the minimum error and the shortest training time, indicating the superiority of the proposed bearing ensemble RUL prediction model.
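The one-dimensional HI is read here as a weighted sum of the fused features. This is an assumption: the abstract only says that each feature is multiplied by its weight. A minimal sketch follows, assuming the weights have already been found (by DE in the paper). It uses one common definition of monotonicity: the absolute difference between the numbers of increasing and decreasing steps, divided by the number of steps. The abstract does not give the paper's exact metric definitions. The feature series and weights are made up.

```python
from typing import List, Sequence


def monotonicity(series: Sequence[float]) -> float:
    """|#increasing steps - #decreasing steps| / (number of steps); 1.0 = fully monotonic."""
    if len(series) < 2:
        raise ValueError("need at least two points")
    diffs = [b - a for a, b in zip(series, series[1:])]
    increasing = sum(1 for d in diffs if d > 0)
    decreasing = sum(1 for d in diffs if d < 0)
    return abs(increasing - decreasing) / len(diffs)


def weighted_health_indicator(
    features: List[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """One-dimensional HI: HI(t) = sum_i w_i * f_i(t) over the fused features."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    length = len(features[0])
    return [sum(w * f[t] for w, f in zip(weights, features)) for t in range(length)]


# Two made-up fused features over four inspection times, with assumed weights.
fused_1 = [0.0, 0.2, 0.5, 0.9]
fused_2 = [0.1, 0.1, 0.3, 0.6]
hi = weighted_health_indicator([fused_1, fused_2], weights=[0.7, 0.3])
print([round(v, 2) for v in hi], monotonicity(hi))  # [0.03, 0.17, 0.44, 0.81] 1.0
```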

An intelligent prediction model of the tool wear based on machine learning in turning high strength steel

Proceedings of the Institution of Mechanical Engineers Part B Journal of Engineering Manufacture ◽

10.1177/0954405420935787 ◽

2020 ◽

Vol 234 (13) ◽

pp. 1580-1597

Author(s):

Minghui Cheng ◽

Li Jiao ◽

Xuechun Shi ◽

Xibin Wang ◽

Pei Yan ◽

...

Keyword(s):

Tool Wear ◽

Prediction Model ◽

Cutting Force ◽

Surface Quality ◽

Surface Texture ◽

High Strength Steel ◽

Prediction Models ◽

High Strength ◽

Support Vector ◽

Time Frequency

In high strength steel turning, tool wear reduces the surface quality of the workpiece and increases the cutting force and cutting temperature. To obtain a fine surface quality and avoid unnecessary loss, the state of tool wear must be monitored during dry turning. In this article, the cutting force, the vibration signal and the surface texture of the machined surface were collected by a tool condition monitoring system. Signal processing techniques were then used to extract time-domain, frequency-domain and time-frequency features of the cutting force and vibration. Gray level processing was used to extract features of the gray level co-occurrence matrix of the surface texture, and these features were found to change simultaneously when the cutting tool broke. An intelligent prediction model of tool wear was then built using support vector regression (SVR). Its kernel function parameters were optimized by the grid search algorithm (GS), the genetic algorithm (GA) and the particle swarm optimization algorithm, respectively. The features extracted from the signals and the surface texture were used to train the prediction model in MATLAB. When the surface texture features were fused with the cutting force and vibration features in the intelligent prediction model, the prediction accuracy of the proposed method was 97.32% and 96.72% under the GA-SVR and GS-SVR prediction models, respectively. Moreover, the intelligent prediction model can predict not only the tool wear under different cutting conditions but also the different wear stages within a single wear cycle. The absolute error between the predicted and actual values is less than 10 μm, and the confidence coefficient of the prediction curve is around 0.99.
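The abstract does not list which co-occurrence features were used. As an illustration, assume an image already quantised to a few gray levels and a horizontal offset of one pixel. A symmetric, normalised GLCM and three standard texture features (contrast, angular second moment and homogeneity) can then be computed as shown below. The function names and the toy patch are hypothetical.

```python
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalised gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, angular second moment (ASM) and homogeneity of a normalised GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# A tiny 4-level texture patch (made-up pixel values).
patch = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(glcm_features(glcm(patch, levels=4)))
```

Tool breakage would appear as a shift in these values between successive surface images. The abstract reports such a simultaneous change.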

Maximum a posteriori Threshold Genomic Prediction Model for Ordinal Traits

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401733 ◽

2020 ◽

Vol 10 (11) ◽

pp. 4083-4102

Author(s):

Abelardo Montesinos-López ◽

Humberto Gutierrez-Pulido ◽

Osval Antonio Montesinos-López ◽

José Crossa

Keyword(s):

Support Vector Machine ◽

Prediction Model ◽

Ridge Regression ◽

Genomic Prediction ◽

Prediction Models ◽

Prediction Performance ◽

Support Vector ◽

Successful Implementation ◽

A Posteriori ◽

Ordinal Traits

Due to the ever-increasing data collected in genomic breeding programs, there is a need for genomic prediction models that can deal better with big data. For this reason, here we propose a Maximum a posteriori Threshold Genomic Prediction (MAPT) model for ordinal traits that is more efficient than the conventional Bayesian Threshold Genomic Prediction model for ordinal traits. The MAPT performs the predictions of the Threshold Genomic Prediction model by using the maximum a posteriori estimation of the parameters, that is, the values of the parameters that maximize the joint posterior density. We compared the prediction performance of the proposed MAPT to the conventional Bayesian Threshold Genomic Prediction model, the multinomial Ridge regression and support vector machine on 8 real data sets. We found that the proposed MAPT was competitive with regard to the multinomial and support vector machine models in terms of prediction performance, and slightly better than the conventional Bayesian Threshold Genomic Prediction model. With regard to the implementation time, we found that in general the MAPT and the support vector machine were the best, while the slowest was the multinomial Ridge regression model. However, it is important to point out that the successful implementation of the proposed MAPT model depends on the informative priors used to avoid underestimation of variance components.
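In a threshold (ordinal probit) model, a continuous liability eta, here the linear genomic predictor x'beta, is cut by increasing thresholds tau_1 < ... < tau_(K-1). Then P(Y = k) = Phi(tau_k - eta) - Phi(tau_(k-1) - eta), with tau_0 = -inf and tau_K = +inf. Once MAP estimates of beta and the thresholds are available, prediction reduces to the sketch below. The probit link and the parameter values are assumptions for illustration, and the MAP estimation step itself is not shown.

```python
import math
from typing import List, Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(eta: float, thresholds: Sequence[float]) -> List[float]:
    """P(Y = k), k = 1..K, for a probit threshold model with K - 1 thresholds."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative P(Y <= k), padded with 0 (tau_0 = -inf) and 1 (tau_K = +inf).
    cumulative = [0.0] + [std_normal_cdf(t - eta) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


def predict_category(eta: float, thresholds: Sequence[float]) -> int:
    """Most probable ordinal category (1-based) given the linear predictor eta."""
    probs = category_probabilities(eta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical MAP estimates: eta for one line and thresholds for 3 categories.
print(predict_category(0.4, thresholds=[-1.0, 1.0]))  # 2
```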

Prognosis of Chronic Renal Syndrome by Classification and Progression Using Temporal Abstraction

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9927.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 53-58

Keyword(s):

Genetic Algorithm ◽

Time Series ◽

Renal Disease ◽

Prediction Model ◽

Prediction Models ◽

Series Data ◽

Support Vector ◽

Temporal Abstraction ◽

Graphic Data ◽

Lab Test

Chronic renal syndrome is defined as a progressive loss of renal function over time. Researchers have attempted to identify the risk factors that may affect the progression of chronic renal syndrome. The aim of this project is to develop a prediction model for stage 4 chronic kidney disease (CKD) patients. The model detects whether a patient's estimated Glomerular Filtration Rate (eGFR) falls below 15 ml/min/1.73 m² (end-stage renal disease) within six months of their final lab test observation, by assessing time-related aspects. Data mining algorithms are combined with Temporal Abstraction (TA) to strengthen prognostic models of CKD progression. In this work, records of a total of 112 chronic renal disease patients, collected from April 1952 to September 2011, were extracted from the patients’ Electronic Medical Records (EMR). The information on chronic renal patients is collected as large spatial info-graphic data. Analysing these data to detect the factors affecting CKD deterioration is therefore a challenging task. To address this challenge, time series graphs were generated from creatinine and albumin lab test values and reports over the time period. CKD diagnostic codes are transformed into the standard seven-character format of the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). Feature selection is performed with a wrapper method based on a genetic algorithm, which helps to find the most relevant variables for a predictive model. The High Utility Sequential Rule Miner (HUSRM) is used to discover CKD sequential rules based on sequence patterns. Temporal Abstraction techniques, namely basic TA and complex TA, are used to analyse the status of chronic renal syndrome patients.
Classification and Regression Trees (CART), along with Adaptive Boosting (AdaBoost) and Support Vector Machine Boosting (SVMBoost), are applied to develop CKD progression prediction models, which exhibit the most accurate predictions. The results of this work reveal that incorporating temporal observations leading up to the prognostic instances improves the efficacy of the predictions. Finally, the performance of the prediction models designed and implemented in this project is evaluated with accuracy, sensitivity, specificity, positive likelihood, negative likelihood and Area Under the Curve (AUC). Key Words: CKD, progression, time series data, genetic algorithm, sequential rules, TA classification and prediction model.
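Basic temporal abstraction converts time-stamped lab values into labelled intervals. Below is a minimal sketch of state and trend abstraction on eGFR. The state labels use the KDIGO GFR categories, where G5 is below 15 ml/min/1.73 m², the end-stage cutoff used above. The trend tolerance, the function names and the sample series are assumptions, not details from the paper. Complex TA, which combines these intervals, is not shown.

```python
from typing import List, Tuple

Observation = Tuple[int, float]  # (day, eGFR in ml/min/1.73 m²)
Interval = Tuple[int, int, str]  # (start_day, end_day, label)


def egfr_category(egfr: float) -> str:
    """KDIGO GFR category; G5 (< 15 ml/min/1.73 m²) is kidney failure."""
    for limit, label in ((15, "G5"), (30, "G4"), (45, "G3b"), (60, "G3a"), (90, "G2")):
        if egfr < limit:
            return label
    return "G1"


def merge_runs(segments: List[Interval]) -> List[Interval]:
    """Join consecutive segments that carry the same label into one interval."""
    merged: List[Interval] = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


def state_abstraction(obs: List[Observation]) -> List[Interval]:
    """Basic TA (state): map each value to a category, then merge equal neighbours."""
    return merge_runs([(day, day, egfr_category(value)) for day, value in obs])


def trend_abstraction(obs: List[Observation], tolerance: float = 1.0) -> List[Interval]:
    """Basic TA (trend): label each step as increasing, decreasing or steady."""
    steps: List[Interval] = []
    for (d0, v0), (d1, v1) in zip(obs, obs[1:]):
        change = v1 - v0
        if abs(change) <= tolerance:
            label = "steady"
        else:
            label = "increasing" if change > 0 else "decreasing"
        steps.append((d0, d1, label))
    return merge_runs(steps)


# Made-up eGFR series for one stage 4 patient (day, value).
series = [(0, 28.0), (90, 26.5), (180, 22.0), (270, 17.0), (360, 14.0)]
print(state_abstraction(series))  # [(0, 270, 'G4'), (360, 360, 'G5')]
print(trend_abstraction(series))  # [(0, 360, 'decreasing')]
```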

Evaluating the performance of support vector machines based on different kernel methods for forecasting air pollutants

Вестник ВГУ. Серия: Системный анализ и информационные технологии (Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies) ◽

10.17308/sait.2020.3/3035 ◽

2020 ◽

pp. 5-14

Author(s):

Adven Masih ◽

Alexander N. Medvedev

Keyword(s):

Air Pollution ◽

Support Vector Machines ◽

Correlation Coefficient ◽

Air Pollutants ◽

Prediction Models ◽

Pollution Monitoring ◽

Polynomial Kernel ◽

Support Vector ◽

Squared Error ◽

Vector Machines

The alarming level of air pollution in urban centres is an urgent threat to human health. Its consequences can be measured in terms of health issues experienced by children, increasing numbers of heart and lung diseases and, most importantly, the number of pollution-related deaths. For this reason, much attention has recently been paid to air pollution monitoring and prediction modelling. To develop prediction models, the study uses Support Vector Machines (SVM) with linear, polynomial, radial basis function, normalised polynomial and Pearson VII function kernels to predict the hourly concentration of pollutants in the air. The paper analyses a monitoring dataset of air pollutants and meteorological parameters, which serve as input variables for predicting the concentrations of various air pollutants. The prediction performance of the models was assessed using the correlation coefficient, root mean squared error, relative absolute error and root relative squared error. To validate the models, the accuracy of the predictive algorithm was compared with that of two widely applied regression approaches, the multilayer perceptron and linear regression. A back-check prediction test was also performed to examine the consistency of the models. According to the results, the Pearson VII function and normalised polynomial kernels give the most accurate predictions of atmospheric pollutant concentrations. They score best on both the correlation coefficient and the error values, outperforming the other SVM kernels and the traditional prediction models.
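The two best-performing kernels can be written in a few lines. This sketch uses a commonly cited form of the Pearson VII universal kernel (PUK), where sigma sets the peak width and omega the tailing. It also uses a polynomial kernel normalised so that k(x, x) = 1. The offset c and the default parameter values are assumptions: toolkits differ, for example in whether a +1 term is included in the polynomial kernel.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2, c: float = 1.0) -> float:
    """(x . y + c) ** degree."""
    return (dot(x, y) + c) ** degree


def normalised_polynomial_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2, c: float = 1.0) -> float:
    """Polynomial kernel scaled so that k(x, x) = 1 for every x."""
    return polynomial_kernel(x, y, degree, c) / math.sqrt(
        polynomial_kernel(x, x, degree, c) * polynomial_kernel(y, y, degree, c)
    )


def pearson_vii_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0, omega: float = 1.0) -> float:
    """Pearson VII universal kernel; sigma is the full width at half maximum."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


print(normalised_polynomial_kernel([1.0, 2.0], [3.0, 4.0]))
print(pearson_vii_kernel([0.0], [0.5], sigma=1.0, omega=2.0))
```

With omega = 1 the PUK has a Lorentzian shape, and as omega grows it approaches a Gaussian. A single kernel can therefore span both behaviours, which may explain its strong showing.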

Prediction model of pH value in mitten crab culture

Indian Journal of Fisheries ◽

10.21077/ijf.2017.64.3.57740-06 ◽

2017 ◽

Vol 64 (3) ◽

Author(s):

Chengyun Zhu ◽

Xingqiao Liu ◽

Hailei Chen ◽

Xiang Tian

Keyword(s):

Prediction Model ◽

Prediction Models ◽

pH Value ◽

PSO Algorithm ◽

Eriocheir Sinensis ◽

The Other ◽

Support Vector ◽

Mitten Crab ◽

Optimisation Methods ◽

Two Parameters

The pH of water directly affects the growth of mitten crab (Eriocheir sinensis H. Milne-Edwards, 1853) in aquaculture. A prediction model was set up to determine the changing trend of the pH value during the culture of mitten crabs. When the predicted pH moves beyond safe levels, the model helps farmers take measures in advance to keep the cultured crabs safe. The pH prediction model is based on least squares support vector regression (LSSVR). An estimation of distribution algorithm improved by chaotic mutation (CMEDA) is used to find the optimal LSSVR parameters (γ and σ). Because these two parameters can significantly affect the performance of LSSVR, three other parameter optimisation methods were compared with the CMEDA algorithm: the particle swarm optimisation (PSO) algorithm, the genetic algorithm (GA) and the grid search (GS) algorithm. The mean absolute percentage errors of the four prediction models were 0.4059, 0.6332, 0.9385 and 1.2499%, respectively. The CMEDA-LSSVR model has higher prediction accuracy and more reliable performance than the other models. The prediction model was used in Xinhua, Jiangsu Province, China, where it performed well and helped farmers make decisions and reduce aquaculture risks.
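LSSVR replaces the inequality constraints of SVR with equalities, so training reduces to solving one linear system: [0, 1ᵀ; 1, Ω + I/γ] [b; α] = [0; y], where Ω holds kernel values between training points. The sketch below solves that system directly for an RBF kernel of width σ. It also computes MAPE, the error measure reported above. Several choices here are assumptions: the RBF form exp(-‖x - z‖²/(2σ²)), the plain Gaussian-elimination solver and the toy data. γ and σ are simply fixed rather than tuned by CMEDA, PSO, GA or GS.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)) (assumed parameterisation)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(a, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LSSVR:
    """Least squares support vector regression with an RBF kernel."""

    def __init__(self, gamma: float, sigma: float) -> None:
        self.gamma = gamma  # regularisation parameter
        self.sigma = sigma  # kernel width

    def fit(self, xs: List[Sequence[float]], ys: Sequence[float]) -> "LSSVR":
        n = len(xs)
        system = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
        for i in range(n):
            row = [1.0] + [rbf(xs[i], xs[j], self.sigma) for j in range(n)]
            row[i + 1] += 1.0 / self.gamma  # Omega + I / gamma
            system.append(row)
        solution = solve(system, [0.0] + list(ys))
        self.b, self.alpha = solution[0], solution[1:]
        self.xs = [list(x) for x in xs]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.b + sum(a * rbf(x, xi, self.sigma) for a, xi in zip(self.alpha, self.xs))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy pH readings indexed by time step (made-up values).
times = [[0.0], [1.0], [2.0], [3.0]]
ph = [7.0, 7.2, 7.1, 7.4]
model = LSSVR(gamma=1e6, sigma=1.0).fit(times, ph)
print(round(model.predict([1.5]), 3), mape(ph, [model.predict(t) for t in times]))
```

Unlike standard SVR, LSSVR is generally not sparse: every training point usually receives a nonzero α, so the model size grows with the training set.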

Prediction Model for Dry Eye Syndrome Incidence Rate Using Air Pollutants and Meteorological Factors in South Korea: Analysis of Sub-Region Deviations

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17144969 ◽

2020 ◽

Vol 17 (14) ◽

pp. 4969

Author(s):

Jong-Sang Youn ◽

Jeong-Won Seo ◽

Wonjun Park ◽

SeJoon Park ◽

Ki-Joon Jeon

Keyword(s):

South Korea ◽

Prediction Model ◽

Incidence Rate ◽

Dry Eye ◽

Air Pollutants ◽

Metropolitan Areas ◽

Meteorological Factors ◽

Air Pollutant ◽

Dry Eye Syndrome ◽

Population Rate

Here, we develop a dry eye syndrome (DES) incidence rate prediction model for South Korea using air pollutants (PM10, NO2, SO2, O3, and CO), meteorological factors (temperature, humidity, and wind speed), population rate, and clinical data. The prediction model fits the incidence rate well (R² = 0.9443 and 0.9388, p < 2.2 × 10⁻¹⁶). To analyze regional deviations, we classify outpatient data, air pollutants, and meteorological factors into 16 administrative districts (seven metropolitan areas and nine states). Our results confirm that NO2 and relative humidity are the factors driving regional deviations in the prediction model.

A Novel Approach of Weighted Support Vector Machine with Applied Chance Theory for Forecasting Air Pollution Phenomenon in Egypt

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026818500013 ◽

2018 ◽

Vol 17 (01) ◽

pp. 1850001 ◽

Cited By ~ 4

Author(s):

Nabil Mohamed Eldakhly ◽

Magdy Aboul-Ela ◽

Areeg Abdalla

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Prediction Models ◽

Learning Algorithms ◽

Management Control ◽

Air Pollutant ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Chance Theory

Particulate matter with a diameter of less than 10 micrometers (PM10), a category of pollutants that includes solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissue and the throat, aggravate asthma and increase respiratory illness. Accurate prediction models of PM10 concentrations are essential for proper management and control and for public warning strategies. Machine learning techniques can be used to develop methods or tools that discover unseen patterns in given data to solve a particular task or problem. Chance theory provides advanced concepts for treating cases in which randomness and fuzziness play roles simultaneously. The main objective is to study a modification of a single machine learning algorithm, the support vector machine (SVM). The modification applies the chance weight of the target variable, derived from chance theory, to the corresponding dataset point, with the aim of outperforming ensemble machine learning algorithms. The results of this study show that the SVM algorithm, when modified and combined with the right theory or technique, especially chance theory, outperforms other modern ensemble learning algorithms.
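The abstract describes attaching a chance-theory weight to each data point before SVM training but does not give the weighting formula. The sketch below shows only the generic mechanism: a sample-weighted linear SVM in which each point's hinge loss is scaled by its weight, trained by full-batch subgradient descent. It is not the authors' chance-weight computation. The weights are assumed to be precomputed, and the data, weights and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical. The abstract also does not say whether a classification or regression formulation was used; this is the binary classification case.

```python
from typing import List, Sequence, Tuple


def train_weighted_linear_svm(
    xs: List[Sequence[float]],
    ys: Sequence[int],
    sample_weights: Sequence[float],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + (1/S) * sum_i s_i * max(0, 1 - y_i (w.x_i + b)),
    with S = sum_i s_i, by full-batch subgradient descent. Labels are -1 or +1."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    total = sum(sample_weights)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y, s in zip(xs, ys, sample_weights):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # hinge loss is active; its pull is scaled by the weight s
                for j in range(dim):
                    grad_w[j] -= s * y * x[j] / total
                grad_b -= s * y / total
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical 1-D data with precomputed per-sample weights.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [-1, -1, 1, 1]
weights = [1.0, 0.5, 0.5, 1.0]
w, b = train_weighted_linear_svm(xs, ys, weights)
print([predict(w, b, x) for x in xs])  # [-1, -1, 1, 1]
```

Points with larger weights pull the decision boundary harder when they fall inside the margin. Weighting by the target variable is meant to exploit this effect.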

Prediction of Type 2 Diabetes Based on Machine Learning Algorithm

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18063317 ◽

2021 ◽

Vol 18 (6) ◽

pp. 3317

Author(s):

Henock M. Deberneh ◽

Intaek Kim

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Prediction Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Recursive Feature Elimination ◽

Medical Institute ◽

Support Vector

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.
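The abstract reports that ensemble models outperformed single models but does not say how their outputs were combined. This sketch assumes soft voting, one common option: average the class probabilities from each model and pick the most probable class among normal, prediabetes and diabetes. The model names and probability vectors are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

CLASSES = ("normal", "prediabetes", "diabetes")


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    model_weights: Optional[Sequence[float]] = None,
) -> Tuple[str, List[float]]:
    """Weighted average of per-model class probabilities; returns (label, averaged probs)."""
    if model_weights is None:
        model_weights = [1.0] * len(model_probs)
    total = sum(model_weights)
    averaged = [
        sum(w * probs[k] for w, probs in zip(model_weights, model_probs)) / total
        for k in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=averaged.__getitem__)
    return CLASSES[best], averaged


# Hypothetical outputs of three fitted models for one patient.
logistic = [0.6, 0.3, 0.1]
forest = [0.2, 0.5, 0.3]
boosted = [0.3, 0.4, 0.3]
label, probs = soft_vote([logistic, forest, boosted])
print(label)  # prediabetes
```

Here no single model's top class decides the outcome: the logistic model favours "normal", but the averaged probabilities favour "prediabetes".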
