Variation-Oriented Data Filtering for Improvement in Model Complexity of Air Pollutant Prediction Model

2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Chi Man Vong ◽  
Weng Fai Ip ◽  
Pak Kin Wong

Accurate prediction models for air pollutants are crucial for forecasting and for issuing health alerts to local inhabitants. In recent literature, the discrete wavelet transform (DWT) was employed to decompose a series of air pollutant levels, followed by modeling using a support vector machine (SVM). This combination of DWT and SVM was reported to produce a more accurate prediction model for air pollutants by investigating different levels of frequency bands. However, DWT places a significant demand on model complexity, namely the training time and the model size of the prediction model. In this paper, a new method called variation-oriented filtering (VF) is proposed to remove the data with low variation, which can be considered as noise to a prediction model. With VF, the noise and the size of the series of air pollutant levels can be reduced simultaneously, and hence so can the training time and model size. The SO2 (sulfur dioxide) level in Macau was selected as a test case. Experimental results show that VF can effectively and efficiently reduce the model complexity while improving predictive accuracy.
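The abstract does not spell out how "low variation" is measured, so the following is only a minimal sketch of the general idea under an assumed rule: a reading is dropped when it differs from the last retained reading by less than a threshold. The function name `variation_filter`, the `threshold` parameter, and the sample readings are all hypothetical and are not taken from the paper.

```python
def variation_filter(series, threshold):
    """Return (index, value) pairs whose change from the last kept value
    is at least `threshold`.

    Assumed rule for illustration only: the abstract does not define
    "low variation" precisely.
    """
    if not series:
        return []
    kept = [(0, series[0])]  # always keep the first observation
    for i, value in enumerate(series[1:], start=1):
        if abs(value - kept[-1][1]) >= threshold:
            kept.append((i, value))
    return kept


# Made-up SO2 readings, for illustration only.
so2 = [10.0, 10.1, 10.05, 12.0, 12.1, 15.0, 14.9]
filtered = variation_filter(so2, threshold=1.0)
print(filtered)  # [(0, 10.0), (3, 12.0), (5, 15.0)]
```

Keeping the original indices alongside the values preserves the time position of each retained reading, which a downstream model would need once the series is no longer evenly spaced.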

2021 ◽  
Vol 17 (9) ◽  
pp. 727-735
Author(s):  
Jiamei Long ◽  
Jia Yang ◽  
Jing Peng ◽  
Leiqing Pan ◽  
Kang Tu

Moisture content and carotenoid content are important indicators for evaluating the drying process of carrot slices. There is growing attention to developing non-destructive methods as effective analytical tools for quality assurance when drying carrot slices. In this study, the characteristic wavelengths of moisture and carotenoid content in carrot slices during hot air drying were extracted using hyperspectral imaging technology. A multispectral imaging system was then built, and the wavelengths of its filters were determined according to the characteristic wavelengths. Twelve filters were selected in this study. Based on the successive projection algorithm (SPA), the optimal wavelengths for moisture and carotenoid content were further determined, and prediction models of both were established based on the system. The results showed that a support vector machine (SVM) prediction model for moisture content, established on seven optimal wavelengths, achieved a coefficient of determination of the prediction set (R²p) of 0.991 and a residual predictive deviation (RPD) of 10.318. Based on eight optimal wavelengths, an SVM prediction model for carotenoid content was also established, with R²p of 0.968 and RPD of 5.337. The prediction performance is close to or even better than that based on hyperspectral imaging. The study confirmed the feasibility of using the multispectral imaging equipment to measure the moisture and carotenoid content of carrot slices during drying based on selected wavelengths, laying a foundation for the further development of a portable multispectral detector for the quality of dried products.
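For readers unfamiliar with the two figures of merit quoted above, the sketch below shows one common way to compute them. It assumes the coefficient of determination is 1 − SS_res/SS_tot and that RPD is the sample standard deviation of the reference values divided by the root mean squared error of prediction; definitions of RPD vary between authors, and the abstract does not say which one was used. The data are invented.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of reference values / RMSE of prediction."""
    rmse = math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
    return statistics.stdev(y_true) / rmse


# Invented reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2_value = r_squared(reference, predicted)
rpd_value = rpd(reference, predicted)
print(round(r2_value, 3), round(rpd_value, 2))
```

Under these definitions a higher RPD means the prediction error is small relative to the natural spread of the reference values.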


Author(s):  
Yingkui Gu ◽  
Qingpeng Bi ◽  
Guangqi Qiu

To improve the accuracy of our previous bearing ensemble Remaining Useful Life (RUL) prediction model using the Genetic Algorithm (GA), Support Vector Regression (SVR), and the Weibull Proportional Hazard Model (WPHM) (see reference [1]), we proposed a more practical Health Indicator (HI) construction methodology for bearing ensemble RUL prediction. A weighted coefficient determination method for four prognostic metrics (monotonicity, robustness, trendability, and consistency) was proposed to select sensitive health features accurately using the Analytic Hierarchy Process (AHP). The selected sensitive health features were fused through isometric feature mapping (ISOMAP), and Differential Evolution (DE) was employed in place of GA to compute the optimal weight coefficient of each input fused feature. A one-dimensional HI was constructed by multiplying each input fused feature by the corresponding optimal weight coefficient, and RUL prediction was implemented through an extreme learning machine (ELM) and WPHM. The accuracy and effectiveness of the proposed method were validated by a bearing experiment. The results show that HI construction with ISOMAP-DE achieved the best performance. The proposed ELM-WPHM model was compared with BP-WPHM, SVM-WPHM, LSTM-WPHM, and DLSTM-WPHM in terms of RMSE; ELM-WPHM had the minimum error and training time, indicating the superiority of the proposed bearing ensemble RUL prediction model.
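The HI construction step can be pictured with a small sketch. It assumes the weighted features are summed into a single value per time step (the abstract says only that each feature is multiplied by its weight and that the result is one-dimensional), and it uses one commonly cited definition of monotonicity, the absolute difference between the counts of positive and negative increments divided by the number of increments. The weights and feature values are invented, and the functions are hypothetical helpers rather than the authors' implementation.

```python
def health_indicator(features, weights):
    """Weighted sum of fused features at each time step (assumed combination)."""
    hi = []
    for row in features:
        if len(row) != len(weights):
            raise ValueError("each time step needs one value per weight")
        hi.append(sum(w * x for w, x in zip(weights, row)))
    return hi


def monotonicity(hi):
    """|#positive increments - #negative increments| / #increments.

    Requires at least two points.
    """
    diffs = [b - a for a, b in zip(hi, hi[1:])]
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    return abs(positive - negative) / len(diffs)


# Invented fused features: four time steps, two features each.
fused = [[0.1, 1.0], [0.2, 1.0], [0.4, 2.0], [0.7, 3.0]]
example_weights = [0.5, 0.1]  # stand-ins for the DE-optimised coefficients
hi_series = health_indicator(fused, example_weights)
mono = monotonicity(hi_series)
print(hi_series, mono)
```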


Author(s):  
Minghui Cheng ◽  
Li Jiao ◽  
Xuechun Shi ◽  
Xibin Wang ◽  
Pei Yan ◽  
...  

In the turning of high-strength steel, tool wear reduces the surface quality of the workpiece and increases cutting force and cutting temperature. To obtain fine surface quality and avoid unnecessary loss, it is necessary to monitor the state of tool wear in dry turning. In this article, the cutting force, vibration signal and surface texture of the machined surface were collected by a tool condition monitoring system, and signal processing techniques were used to extract the time-domain, frequency-domain and time-frequency features of cutting force and vibration. A gray-level processing technique was used to extract the features of the gray-level co-occurrence matrix of the surface texture, and these features were found to change simultaneously when the cutting tool broke. An intelligent prediction model of tool wear was then built using support vector regression (SVR), whose kernel function parameters were optimized by the grid search algorithm (GS), the genetic algorithm (GA) and the particle swarm optimization algorithm, respectively. The features extracted from the signals and surface texture were used to train the prediction model in MATLAB. After the surface texture features were fused with the cutting force and vibration features, the prediction accuracy of the proposed method was 97.32% and 96.72% under the GA-SVR and GS-SVR prediction models, respectively. Moreover, the intelligent prediction model can predict not only the tool wear under different cutting conditions but also the different wear stages in a single wear cycle; the absolute error between the predicted value and the actual value is less than 10 μm, and the confidence coefficient of the prediction curve is around 0.99.


2020 ◽  
Vol 10 (11) ◽  
pp. 4083-4102
Author(s):  
Abelardo Montesinos-López ◽  
Humberto Gutierrez-Pulido ◽  
Osval Antonio Montesinos-López ◽  
José Crossa

Due to the ever-increasing data collected in genomic breeding programs, there is a need for genomic prediction models that can deal better with big data. For this reason, here we propose a Maximum a posteriori Threshold Genomic Prediction (MAPT) model for ordinal traits that is more efficient than the conventional Bayesian Threshold Genomic Prediction model for ordinal traits. The MAPT performs the predictions of the Threshold Genomic Prediction model by using the maximum a posteriori estimation of the parameters, that is, the values of the parameters that maximize the joint posterior density. We compared the prediction performance of the proposed MAPT to the conventional Bayesian Threshold Genomic Prediction model, the multinomial Ridge regression and support vector machine on 8 real data sets. We found that the proposed MAPT was competitive with regard to the multinomial and support vector machine models in terms of prediction performance, and slightly better than the conventional Bayesian Threshold Genomic Prediction model. With regard to the implementation time, we found that in general the MAPT and the support vector machine were the best, while the slowest was the multinomial Ridge regression model. However, it is important to point out that the successful implementation of the proposed MAPT model depends on the informative priors used to avoid underestimation of variance components.


Chronic renal syndrome is defined as a progressive loss of renal function over time. Researchers have made efforts to identify the risk factors that may affect the deterioration of chronic renal syndrome. The aim of this project is to develop a prediction model for stage 4 CKD patients that detects whether their estimated glomerular filtration rate (eGFR) will fall below 15 ml/min/1.73 m², that is, end-stage renal disease, within six months of their last laboratory test, by assessing time-related features. Data mining algorithms are combined with Temporal Abstraction (TA) to strengthen the CKD progression prediction models. In this work, data for a total of 112 chronic renal disease patients, covering April 1952 to September 2011, were extracted from the patients' Electronic Medical Records (EMR). The information on chronic renal patients is collected as a large spatial info-graphic data set. Analysing these data requires detecting the factors that affect CKD deterioration, which is a challenging task. To address this challenge, time series graphs were generated in this project from creatinine and albumin lab test values and reports over the time period. The CKD diagnostic codes are transformed into the default seven-digit format of the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). Feature selection is performed with a wrapper method using a genetic algorithm, which helps to find the most relevant variables for a predictive model. High Utility Sequential Rule Miner (HUSRM) is used to discover CKD sequential rules based on sequence patterns. TA techniques, namely basic TA and complex TA, are used to analyse the status of chronic renal syndrome patients. Classification and Regression Tree (CART), together with Adaptive Boosting (AdaBoost) and Support Vector Machine Boosting (SVMBoost), is applied to develop the CKD progression prediction models and to identify which gives the most accurate prediction. The results show that incorporating temporal observations into the prediction models increased their efficacy. Finally, the evaluation metrics accuracy, sensitivity, specificity, positive likelihood, negative likelihood and Area Under the Curve (AUC) are used to evaluate the performance of the prediction models designed and implemented in this project.
Key Words: CKD, progression, time series data, genetic algorithm, sequential rules, TA classification and prediction model.


Author(s):  
Adven Masih ◽  
Alexander N. Medvedev

The alarming level of air pollution in urban centres is an urgent threat to human health. Its consequences can be measured in terms of health issues experienced by children, an increasing number of heart and lung diseases, and, most importantly, the number of pollution-related deaths. That is why a lot of attention has recently been paid to air pollution monitoring and prediction modelling. To develop prediction models, the study uses Support Vector Machines (SVM) with linear, polynomial, radial basis function, normalised polynomial, and Pearson VII function kernels to predict the hourly concentration of pollutants in the air. The paper analyses the monitoring dataset of air pollutants and meteorological parameters as input variables to predict the concentrations of various air pollutants. The prediction performance of the models was assessed using the correlation coefficient, root mean squared error, relative absolute error, and relative root squared error. To validate the model, the accuracy of the predictive algorithm was tested against two widely applied regression approaches, multilayer perceptron and linear regression. Furthermore, a back check prediction test was performed to examine the consistency of the models. According to the results, the Pearson VII function and normalised polynomial kernels yield the most accurate predictions of atmospheric pollutant concentrations, in terms of the correlation coefficient and error values, compared with the other SVM kernels and the traditional prediction models.
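Three of the metrics named above can be sketched as follows. The formulas are the commonly used definitions (relative errors are measured against a baseline that always predicts the mean of the observed values); the abstract does not state the exact formulas the study used, so treat them as assumptions. The data are invented.

```python
import math
import statistics


def relative_absolute_error(y_true, y_pred):
    """Sum of absolute errors relative to a predict-the-mean baseline."""
    mean_true = statistics.fmean(y_true)
    numerator = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    denominator = sum(abs(t - mean_true) for t in y_true)
    return numerator / denominator


def relative_root_squared_error(y_true, y_pred):
    """Square root of squared error relative to a predict-the-mean baseline."""
    mean_true = statistics.fmean(y_true)
    numerator = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    denominator = sum((t - mean_true) ** 2 for t in y_true)
    return math.sqrt(numerator / denominator)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_t = statistics.fmean(y_true)
    mean_p = statistics.fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Invented hourly concentrations, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
forecast = [3.0, 4.0, 5.0, 8.0]
rae = relative_absolute_error(observed, forecast)
rrse = relative_root_squared_error(observed, forecast)
corr = pearson_r(observed, forecast)
print(rae, round(rrse, 4), round(corr, 4))
```

Values of the two relative errors below 1 indicate that the model does better than always predicting the observed mean.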


2017 ◽  
Vol 64 (3) ◽  
Author(s):  
Chengyun Zhu ◽  
Xingqiao Liu ◽  
Hailei Chen ◽  
Xiang Tian

The pH of water directly affects the growth of mitten crab (Eriocheir sinensis H. Milne-Edwards, 1853) in aquaculture. A prediction model was set up to determine the changing trend of pH during the culture of mitten crabs. The model would help farmers take measures in advance to keep cultured crabs safe when the predicted pH is found to move beyond safe levels. The prediction model of pH is based on the least squares support vector regression (LSSVR) model, with a chaotic-mutation-improved estimation of distribution algorithm (CMEDA) used to find the optimal parameters (γ and σ) of LSSVR. Because these two parameters can significantly affect the performance of the LSSVR, three other parameter optimisation methods, viz. the particle swarm optimisation (PSO) algorithm, the genetic algorithm (GA) and the grid search (GS) algorithm, were compared with the CMEDA algorithm. The mean absolute percentage errors of the four prediction models were 0.4059, 0.6332, 0.9385 and 1.2499%, respectively. The CMEDA-LSSVR model has higher prediction accuracy and more reliable performance than the other models. The prediction model was used in Xinhua, Jiangsu Province, China, where it performed well and helped farmers make decisions and reduce aquaculture risks.
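The comparison above is reported as mean absolute percentage error (MAPE). As a reminder of what that figure measures, here is a minimal sketch using the usual definition, the mean of |actual − predicted| / |actual| expressed as a percentage. The pH readings are invented and the function is not taken from the paper.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes no actual value is zero."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Invented pH readings, for illustration only.
actual_ph = [8.0, 7.5, 8.0, 10.0]
predicted_ph = [8.4, 7.5, 7.6, 10.0]
mape = mean_absolute_percentage_error(actual_ph, predicted_ph)
print(round(mape, 4))  # 2.5
```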


Author(s):  
Jong-Sang Youn ◽  
Jeong-Won Seo ◽  
Wonjun Park ◽  
SeJoon Park ◽  
Ki-Joon Jeon

Here, we develop a dry eye syndrome (DES) incidence rate prediction model using air pollutants (PM10, NO2, SO2, O3, and CO), meteorological factors (temperature, humidity, and wind speed), population rate, and clinical data for South Korea. The prediction model is well fitted to the incidence rate (R² = 0.9443 and 0.9388, p < 2.2 × 10⁻¹⁶). To analyze regional deviations, we classify outpatient data, air pollutants, and meteorological factors in 16 administrative districts (seven metropolitan areas and nine states). Our results confirm that NO2 and relative humidity are the factors impacting regional deviations in the prediction model.


Author(s):  
Nabil Mohamed Eldakhly ◽  
Magdy Aboul-Ela ◽  
Areeg Abdalla

Particulate matter of diameter less than 10 micrometers (PM10), a category of air pollutants including solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissue and the throat, aggravate asthma and increase respiratory illness. Accurate prediction models of PM10 concentrations are essential for proper management, control, and public warning strategies. Machine learning techniques can be used to develop methods or tools that discover unseen patterns in given data to solve a particular task or problem. Chance theory offers advanced concepts for treating cases where randomness and fuzziness act simultaneously. The main objective is to study a modification of a single machine learning algorithm, the support vector machine (SVM), in which the chance weight of the target variable, based on chance theory, is applied to the corresponding dataset point so that the modified algorithm becomes superior to ensemble machine learning algorithms. The results of this study show that SVM algorithms, when modified and combined with the right theory or technique, especially chance theory, outperform other modern ensemble learning algorithms.


Author(s):  
Henock M. Deberneh ◽  
Intaek Kim

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.

