Error Prediction of Air Quality at Monitoring Stations Using Random Forest in a Total Error Framework

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2160
Author(s):  
Jean-Marie Lepioufle ◽  
Leif Marsteen ◽  
Mona Johnsrud

Instead of the valid/non-valid flag usually proposed in the quality control (QC) processes of air quality (AQ), we proposed a method that predicts the p-value of each observation as a value between 0 and 1. We based our error predictions on three approaches: the one proposed by the Working Group on Guidance for the Demonstration of Equivalence (European Commission (2010)), the one proposed by Wager (Journal of Machine Learning Research, 15, 1625–1651 (2014)) and the one proposed by Lu (Journal of Machine Learning Research, 22, 1–41 (2021)). The Total Error framework makes it possible to differentiate the types of error: input, output, structural modeling and remnant. We thus theoretically described a one-site AQ prediction based on a multi-site network using Random Forest for regression in a Total Error framework. We demonstrated the methodology with a dataset of hourly nitrogen dioxide measured by a network of monitoring stations located in Oslo, Norway and implemented the error predictions for the three approaches. The results indicate that a simple one-site AQ prediction based on a multi-site network using Random Forest for regression provides moderate metrics for fixed stations. According to the diagnostic based on the predictive qq-plot, and among the three approaches used in this study, the approach proposed by Lu provides better error predictions than the other two. Furthermore, ensuring a high precision of the error prediction requires effort to obtain accurate inputs, outputs and prediction models and to limit our lack of knowledge about the “true” AQ phenomena. We put effort into quantifying each type of error involved in the error prediction in order to assess the error prediction model and further improve its performance and precision.
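
The predictive qq-plot diagnostic mentioned in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each observation's p-value is the predictive cumulative distribution evaluated at the observation and that the predictive distribution is Gaussian; the abstract states neither, and the function names and numbers below are made up for illustration. If the predicted errors are well calibrated, the sorted p-values should lie close to the quantiles of a uniform distribution.

```python
import math


def predictive_p_value(observation, predicted_mean, predicted_std):
    """Gaussian predictive CDF evaluated at the observation (assumed form)."""
    z = (observation - predicted_mean) / predicted_std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def qq_points(p_values):
    """Pair each sorted p-value with its theoretical uniform quantile."""
    n = len(p_values)
    return [((rank + 1) / (n + 1), p) for rank, p in enumerate(sorted(p_values))]


# Hypothetical hourly NO2 observations with made-up predictive means and
# standard deviations; these are not values from the Oslo dataset.
observations = [21.0, 35.5, 18.2, 40.1, 27.3]
predicted_means = [20.0, 33.0, 22.0, 38.0, 27.0]
predicted_stds = [4.0, 5.0, 4.5, 6.0, 3.5]

p_values = [
    predictive_p_value(o, m, s)
    for o, m, s in zip(observations, predicted_means, predicted_stds)
]

# Largest vertical distance between the qq points and the 1:1 line.
max_gap = max(abs(theoretical - empirical) for theoretical, empirical in qq_points(p_values))
print(f"largest deviation from the diagonal: {max_gap:.3f}")
```

A small deviation from the diagonal suggests that the predicted error distribution is consistent with the observations; systematic curvature would point to over- or under-estimated uncertainty.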

2010 ◽  
Vol 455 ◽  
pp. 565-570
Author(s):  
Zi Hua Hu ◽  
X. Yi ◽  
Ju Long Yuan

Based on a detailed analysis of the one-side milling theory of the spatial cam, this paper proposes a comprehensive error prediction model of the contour normal deviation by applying the spatial meshing theory, the rotation transformation of tensors and Newton’s iteration. A computer simulation example is presented, and the results show that the error prediction model can effectively reveal and predict how tool and installation inaccuracies produce the contour normal error during the one-side processing of the spatial cam. Consequently, a scientific basis has been provided for improving the precision and quality of the one-side milling of the globoidal indexing cam.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Kejia Zhang ◽  
Xu Zhang ◽  
Hongtao Song ◽  
Haiwei Pan ◽  
Bangju Wang

With the continuous improvement of people’s quality of life, air quality has become a topic of daily concern. Achieving accurate predictions of air quality in a variety of complex situations is the key to the rapid response of local governments. This paper studies two problems: (1) how to predict the air quality of any monitoring station based on the existing weather and environmental data while considering the spatiotemporal correlation among monitoring stations and (2) how to maintain the accuracy and stability of the forecast even when the available data are severely insufficient. A prediction model combining Long Short-Term Memory networks (LSTM) and the Graph Attention (GAT) mechanism is proposed to solve the first problem. A metalearning algorithm for the prediction model is proposed to solve the second problem. LSTM is used to characterize the temporal correlation of historical data and GAT is used to characterize the spatial correlation among all the monitoring stations in the target city. In the case of insufficient training data, the proposed metalearning algorithm can be used to transfer knowledge from other cities with abundant training data. Tests on public data sets show that the proposed model has clear advantages in accuracy over baseline models. Combined with the metalearning algorithm, it performs much better when training data are insufficient.


2019 ◽  
Vol 10 (2) ◽  
pp. 44-50
Author(s):  
Rinaldi Daswito ◽  
Rima Folentia ◽  
M Yusuf MF

One of the diseases that can be transmitted by flies is diarrhea. Green betel leaf contains essential oils, chavicol, arecoline, phenol, and tannins, which function as plant-based insecticides. This study aimed to determine the effectiveness of green betel leaf extract (Piper betel) as a plant-based insecticide on the mortality of house flies (Musca domestica). The research was an experimental study with an After Only Design, analyzed using the One Way Anova test at a 95% confidence level. The samples used were 360 house flies. Each treatment used 30 house flies with 4 repetitions, at three concentrations of green betel leaf extract (25%, 50%, 75%). The study was conducted at the Chemistry and Microbiology Laboratory of Health Polytechnic Tanjungpinang, while the flies were collected at the Tokojo Garbage Collection Station in Bintan Regency. The number of dead house flies was 81 (67.5%) at a concentration of 25%, 93 (77.5%) at 50%, and 103 (85.83%) at 75%. There was an effect of green betel leaf extract on the mortality of house flies (p-value 0.0001 < 0.05), with 75% being the most effective concentration. Further research is needed to obtain a finished product utilizing green betel leaf extract as a vegetable insecticide, especially in controlling the fly vector, taking into account the amount of spraying and the age of the fly. Keywords: Green betel leaf extract, organic insecticide, houseflies
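
The reported mortality percentages can be reproduced with a short arithmetic check. This sketch assumes that each concentration was applied to 30 flies in each of 4 repetitions, that is 120 flies per concentration, which is consistent with the total of 360 flies over three concentrations; the variable and function names are illustrative only.

```python
# Flies exposed per concentration: 30 flies x 4 repetitions (assumed from the abstract).
FLIES_PER_CONCENTRATION = 30 * 4

# Dead flies per extract concentration (%), as reported in the abstract.
dead_by_concentration = {25: 81, 50: 93, 75: 103}


def mortality_percent(dead, exposed=FLIES_PER_CONCENTRATION):
    """Return mortality as a percentage rounded to two decimals."""
    return round(100.0 * dead / exposed, 2)


mortality = {conc: mortality_percent(dead) for conc, dead in dead_by_concentration.items()}

for concentration, percent in mortality.items():
    print(f"{concentration}% extract: {percent}% mortality")
```

Under that assumption the computed values match the reported 67.5%, 77.5% and 85.83%.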


2021 ◽  
Vol 20 ◽  
pp. 153303382110246
Author(s):  
Jihwan Park ◽  
Mi Jung Rho ◽  
Hyong Woo Moon ◽  
Jaewon Kim ◽  
Chanjung Lee ◽  
...  

Objectives: To develop a model to predict biochemical recurrence (BCR) after radical prostatectomy (RP), using artificial intelligence (AI) techniques. Patients and Methods: This study collected data from 7,128 patients with prostate cancer (PCa) who received RP at 3 tertiary hospitals. After preprocessing, we used the data of 6,755 cases to generate the BCR prediction model. There were 16 input variables with BCR as the outcome variable. We used a random forest to develop the model. Several sampling techniques were used to address class imbalances. Results: We achieved good performance using a random forest with synthetic minority oversampling technique (SMOTE) using Tomek links, edited nearest neighbors (ENN), and random oversampling: accuracy = 96.59%, recall = 95.49%, precision = 97.66%, F1 score = 96.59%, and ROC AUC = 98.83%. Conclusion: We developed a BCR prediction model for RP. The Dr. Answer AI project, which was developed based on our BCR prediction model, helps physicians and patients to make treatment decisions in the clinical follow-up process as a clinical decision support system.


Viruses ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 18
Author(s):  
Michèle Bergmann ◽  
Mike Holzheu ◽  
Yury Zablotski ◽  
Stephanie Speck ◽  
Uwe Truyen ◽  
...  

Measuring antibodies to evaluate dogs' immunity against canine parvovirus (CPV) is useful to avoid unnecessary re-vaccinations. The study aimed to evaluate the quality and practicability of four point-of-care (POC) tests for detection of anti-CPV antibodies. The sera of 198 client-owned and 43 specific pathogen-free (SPF) dogs were included; virus neutralization was the reference method. Specificity, sensitivity, positive and negative predictive value (PPV and NPV), and overall accuracy (OA) were calculated. Specificity was considered to be the most important indicator for POC test performance. Differences between specificity and sensitivity of POC tests in the sera of all dogs were determined by McNemar's test, agreement by Cohen's kappa. Prevalence of anti-CPV antibodies in all dogs was 80% (192/241); in the subgroup of client-owned dogs, it was 97% (192/198); and in the subgroup of SPF dogs, it was 0% (0/43). FASTest® and CanTiCheck® were easiest to perform. Specificity was highest in the CanTiCheck® (overall dogs, 98%; client-owned dogs, 83%; SPF dogs, 100%) and the TiterCHEK® (overall dogs, 96%; client-owned dogs, 67%; SPF dogs, 100%); no significant differences in specificity were observed between the ImmunoComb®, the TiterCHEK®, and the CanTiCheck®. Sensitivity was highest in the FASTest® (overall dogs, 95%; client-owned dogs, 95%) and the CanTiCheck® (overall dogs, 80%; client-owned dogs, 80%); sensitivity of the FASTest® was significantly higher than that of the other three tests (McNemar's p-value in each comparison: <0.001). CanTiCheck® would be the POC test of choice when considering specificity and practicability. However, differences in the number of false positive results between CanTiCheck®, TiterCHEK®, and ImmunoComb® were minimal.
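
The indicators named in this abstract follow from a 2x2 table of test results against the reference method. The sketch below uses the standard definitions; the confusion-matrix counts are made up for illustration and do not correspond to any of the four POC tests. Only the prevalence figures at the end use counts reported in the abstract.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic indicators from true/false positive/negative counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),          # positives correctly detected
        "specificity": tn / (tn + fp),          # negatives correctly detected
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "overall_accuracy": (tp + tn) / total,  # all correct results
    }


# Hypothetical counts, not taken from the study.
metrics = diagnostic_metrics(tp=80, fp=5, tn=45, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Prevalence figures reported in the abstract, recomputed from the stated counts.
prevalence_all = round(100 * 192 / 241)     # all dogs
prevalence_client = round(100 * 192 / 198)  # client-owned dogs
prevalence_spf = round(100 * 0 / 43)        # SPF dogs
print(prevalence_all, prevalence_client, prevalence_spf)
```

Because PPV and NPV depend on prevalence, a subgroup with 97% prevalence contains very few true negatives, which is one reason specificity estimates in such a subgroup rest on small counts.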


2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides the best results for MAE.
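
One model can rank best on RMSE and R² while another ranks best on MAE, because RMSE and R² penalize large errors more heavily than MAE does. The sketch below shows this with the standard metric definitions and two sets of toy predictions; the values and the names `model_a` and `model_b` are made up and do not represent the models in the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy AQI values and predictions (hypothetical).
y_true = [50.0, 60.0, 70.0, 80.0]
model_a = [51.0, 61.0, 71.0, 85.0]  # small errors plus one large miss
model_b = [52.5, 62.5, 72.5, 82.5]  # uniformly moderate errors

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, f"RMSE={rmse(y_true, pred):.3f}",
          f"MAE={mae(y_true, pred):.3f}", f"R2={r2(y_true, pred):.3f}")
```

Here `model_a` has the lower MAE, while `model_b` has the lower RMSE and the higher R², so the choice of metric changes which model looks best.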


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and, thus, can be considered appropriate for analyzing air quality data.
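
The evaluation procedure described here, removing known values at a chosen missing rate and scoring the imputed values against the originals, can be sketched as follows. This is an assumption-laden illustration: it covers only the MCAR case, and it uses simple mean imputation as a stand-in, not missForest. The data values and function names are hypothetical.

```python
import math
import random


def mask_mcar(values, missing_rate, seed=0):
    """Hide a fixed fraction of values completely at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(len(values) * missing_rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def impute_mean(masked):
    """Stand-in imputer: replace missing entries with the observed mean."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def imputation_errors(original, imputed, missing_idx):
    """RMSE and MAE computed only over the entries that were hidden."""
    diffs = [original[i] - imputed[i] for i in missing_idx]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae


# Hypothetical daily pollutant values (already log-transformed in the study's setup).
daily_values = [2.1, 2.4, 2.0, 2.8, 3.1, 2.6, 2.2, 2.9, 3.0, 2.5,
                2.3, 2.7, 3.2, 2.4, 2.1, 2.6, 2.8, 3.0, 2.2, 2.5]

masked, missing_idx = mask_mcar(daily_values, missing_rate=0.20)
imputed = impute_mean(masked)
rmse, mae = imputation_errors(daily_values, imputed, missing_idx)
print(f"hidden entries: {len(missing_idx)}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

Repeating the same loop for each missing rate (5% to 40%) and each imputer gives the kind of RMSE and MAE comparison the abstract reports.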

