A Prediction Model for Regional Carbon Emissions Based on GRU Networks

Author(s):  
Jun Meng ◽  
Gangyi Ding ◽  
Laiyang Liu ◽  
Zheng Guan

Abstract In this study, a data-driven regional carbon emissions prediction model is proposed. The Grubbs criterion is used to eliminate gross-error data from the carbon emissions sensor readings. Then, based on nearby valid data, the exponential smoothing method is used to interpolate missing values and generate a continuous sequence for model training. Finally, a GRU network, a deep learning method, is applied to these standardized sequential data to obtain the prediction model. The prediction model was trained and evaluated on a wireless carbon sensor network monitoring data set covering August 2012 to April 2014 and compared with a prediction model based on a BP network. The experimental results demonstrate the feasibility of the research method and related technical approaches, as well as the accuracy of the prediction model, providing a methodological basis for nowcasting carbon emissions and other greenhouse gas environmental data.
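The preprocessing-plus-GRU pipeline summarized in the abstract can be sketched roughly as follows; the significance level, smoothing factor, window length, and layer sizes are illustrative assumptions rather than the authors' published configuration, and a synthetic series stands in for the sensor readings.

```python
# Minimal sketch of the preprocessing + GRU pipeline, assuming hourly readings;
# thresholds, window length, and layer sizes are illustrative, not the authors'.
import numpy as np
from scipy import stats
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, GRU, Dense

def grubbs_mask(x, alpha=0.05):
    """Flag the single most extreme value if it fails the two-sided Grubbs test."""
    n = len(x)
    mean, std = np.mean(x), np.std(x, ddof=1)
    g = np.max(np.abs(x - mean)) / std
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    mask = np.zeros(n, dtype=bool)
    if g > g_crit:
        mask[np.argmax(np.abs(x - mean))] = True
    return mask

def exp_smooth_fill(x, alpha=0.3):
    """Replace NaNs with an exponentially smoothed estimate of nearby valid data."""
    x = x.copy()
    level = None
    for i, v in enumerate(x):
        if np.isnan(v):
            x[i] = level if level is not None else np.nanmean(x)
        level = x[i] if level is None else alpha * x[i] + (1 - alpha) * level
    return x

def make_windows(series, lookback=24):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., None], np.array(y)

series = np.random.rand(2000).astype("float32")      # placeholder CO2 sensor series
series[grubbs_mask(series)] = np.nan                 # drop gross-error points
series = exp_smooth_fill(series)                     # interpolate missing values
series = (series - series.mean()) / series.std()     # standardize

X, y = make_windows(series)
model = Sequential([Input(shape=(X.shape[1], 1)), GRU(32), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
```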

Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses advanced techniques for dealing with missing values in an air quality data set using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missingness mechanisms are applied to the data set, and five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is missForest, an iterative imputation method related to the random forest approach. Air quality data were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%). Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach achieves a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the compared imputation methods and can thus be considered appropriate for analyzing air quality data.
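missForest itself is an R package; a rough Python analogue of the iterative random-forest imputation described above is scikit-learn's IterativeImputer wrapped around a random-forest estimator. The column names, simulated 20% missingness, and forest size below are assumptions for this sketch only.

```python
# Rough analogue of missForest-style imputation using scikit-learn's
# IterativeImputer with a random-forest estimator (illustrative only).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "NO2": rng.lognormal(1.0, 0.5, 500),
    "CO": rng.lognormal(0.2, 0.4, 500),
    "PM10": rng.lognormal(3.5, 0.6, 500),
    "temp": rng.normal(30, 8, 500),        # climatological control variables
    "rh": rng.uniform(10, 90, 500),
})
pollutants = ["NO2", "CO", "PM10"]
df[pollutants] = np.log(df[pollutants])    # log-transform to reduce skewness

holes = rng.random((500, len(pollutants))) < 0.20   # simulate 20% MCAR missingness
truth = df[pollutants].to_numpy().copy()
df[pollutants] = df[pollutants].mask(holes)

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0),
    max_iter=5, random_state=0,
)
imputed = imputer.fit_transform(df)        # climate columns serve as covariates

err = imputed[:, :len(pollutants)][holes] - truth[holes]
print(f"RMSE={np.sqrt(np.mean(err**2)):.3f}  MAE={np.mean(np.abs(err)):.3f}")
```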


2020 ◽  
Author(s):  
Dongyan Ding ◽  
Tingyuan Lang ◽  
Dongling Zou ◽  
Jiawei Tan ◽  
Jia Chen ◽  
...  

Abstract Background: Accurately forecasting the prognosis could improve the therapeutic management of cancer patients; however, the currently used clinical features provide insufficient information. The purpose of this study is to develop a survival prediction model for cervical cancer patients using big data and machine learning algorithms. Results: The Cancer Genome Atlas cervical cancer data, including the expression of 1046 microRNAs and the clinical information for 309 cervical and endocervical cancer samples and 3 control samples, were downloaded. Imputation of missing values and outliers, sample normalization, log transformation, and feature scaling were performed for preprocessing; 3 control samples, 2 metastatic samples, and 707 microRNAs with ≥ 20% missing values were excluded. By Cox proportional-hazards analysis, 55 prognosis-related microRNAs (20 positively and 35 negatively correlated with survival) were identified. K-means clustering showed that, using the top 20 identified survival-related microRNAs, the cervical cancer samples were best stratified into two and three subgroups. Using the Support Vector Machine algorithm, two prediction models were developed that segment the patients into two and three groups with different survival rates, respectively. The models exhibit high performance: for two classes, the area under the curve (AUC) = 0.976 (training set), 0.972 (test set), and 0.974 (whole data set); for three classes, AUC = 0.983, 0.996, and 0.991 (groups 1, 2, and 3 in the training set), 0.955, 0.989, and 0.991 (test set), and 0.974, 0.993, and 0.991 (whole data set). Conclusion: Survival prediction models for cervical cancer were developed. Patients with a very low survival rate (≤ 40%) can first be separated by the three-class prediction model. The remaining patients can then be classified by the two-class prediction model into high survival rate (≈ 75%) and low survival rate (≈ 50%) groups.
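A minimal sketch of the screen-then-classify workflow described above (univariate Cox screening of microRNA expression followed by an SVM) might look like the following; the synthetic cohort, p-value cut-off, and binary survival label are illustrative assumptions and not the paper's actual pipeline, which derived risk groups from clustering.

```python
# Illustrative sketch only: univariate Cox screening, then an SVM classifier.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n, p = 300, 40
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"miR_{i}" for i in range(p)])
time = rng.exponential(scale=np.exp(X["miR_0"] - X["miR_1"]) * 12)   # built-in signal
event = rng.integers(0, 2, n)

# Univariate Cox screening: keep features with p < 0.05
selected = []
for col in X.columns:
    surv = pd.DataFrame({"T": time, "E": event, col: X[col]})
    cph = CoxPHFitter().fit(surv, duration_col="T", event_col="E")
    if cph.summary.loc[col, "p"] < 0.05:
        selected.append(col)
if len(selected) < 2:                      # fallback for an unlucky random draw
    selected = ["miR_0", "miR_1"]

# Stand-in binary label (long vs. short survival); the paper instead used
# cluster-derived risk groups.
y = (time > np.median(time)).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X[selected], y, test_size=0.3, random_state=0)
clf = SVC(probability=True).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```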


Test ◽  
2021 ◽  
Author(s):  
Giovanni Saraceno ◽  
Claudio Agostinelli

Abstract In the classical contamination models, such as the gross-error model (the Huber and Tukey contamination model, or case-wise contamination), observations are the units to be identified as outliers or not. This model is very useful when the number of considered variables is moderately small. Alqallaf et al. (Ann Stat 37(1):311–331, 2009) showed the limits of this approach for a larger number of variables and introduced the independent contamination model (cell-wise contamination), in which the cells are the units to be identified as outliers or not. One approach for dealing with both types of contamination at the same time is to filter out the contaminated cells from the data set and then apply a robust procedure able to handle case-wise outliers and missing values. Here, we develop a general framework for building filters in any dimension based on statistical data depth functions. We show that previous approaches, e.g., Agostinelli et al. (TEST 24(3):441–461, 2015b) and Leung et al. (Comput Stat Data Anal 111:59–76, 2017), are special cases. We illustrate our method using the half-space depth.
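As a toy illustration of depth-based outlier flagging (not the authors' filter), the half-space depth can be approximated with random one-dimensional projections and low-depth observations flagged; the projection count and cut-off quantile are assumptions.

```python
# Toy stand-in for depth-based flagging: approximate half-space (Tukey) depth
# via random 1-D projections, then flag rows whose depth is in the lowest 5%.
import numpy as np

def approx_halfspace_depth(X, n_proj=500, seed=0):
    """Approximate the Tukey depth of each row of X using random projections."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.normal(size=(n_proj, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    depth = np.full(n, 1.0)
    for u in U:
        ranks = np.argsort(np.argsort(X @ u))        # 0..n-1 ranks along direction u
        tail = np.minimum(ranks + 1, n - ranks) / n  # smaller one-sided tail fraction
        depth = np.minimum(depth, tail)
    return depth

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:5] += 8                                           # inject a few case-wise outliers
depth = approx_halfspace_depth(X)
flagged = depth <= np.quantile(depth, 0.05)
print("flagged rows:", np.where(flagged)[0])
```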


2014 ◽  
Vol 803 ◽  
pp. 278-281 ◽  
Author(s):  
Norazian Mohamed Noor ◽  
Mohd Mustafa Al Bakri Abdullah ◽  
Ahmad Shukri Yahaya ◽  
Nor Azam Ramli

Data collected in air pollution monitoring, such as PM10, sulphur dioxide, ozone, and carbon monoxide concentrations, are obtained from automated monitoring stations. These data usually contain missing values due to machine failure, routine maintenance, changes in the siting of monitors, and human error. Incomplete data sets can cause bias due to systematic differences between observed and unobserved data. Therefore, it is important to find the best way to estimate these missing values so that the data analysed are of high quality. Incomplete data matrices are problematic: incomplete data sets may lead to results that differ from those that would have been obtained from a complete data set (Hawthorne and Elliott, 2004). Three major problems may arise when dealing with incomplete data. First, there is a loss of information and, as a consequence, a loss of efficiency. Second, there are several complications related to data handling, computation, and analysis, due to the irregularities in data structure and the impossibility of using standard software. Third, and most importantly, there may be bias due to systematic differences between observed and unobserved data. One approach to solving incomplete data problems is the adoption of imputation techniques (Junninen et al., 2004). Thus, this study compared the performance of the linear interpolation method (an imputation technique) with substitution of the mean value for replacing missing values in an environmental data set.
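A small sketch of the comparison described above, contrasting linear interpolation with mean substitution on a synthetic PM10 series; the gap pattern and error metric are assumptions for illustration.

```python
# Linear interpolation vs. mean substitution on a synthetic PM10 series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pm10 = pd.Series(50 + 10 * np.sin(np.arange(500) / 24) + rng.normal(0, 3, 500))
truth = pm10.copy()
missing = rng.random(500) < 0.15                     # ~15% values missing at random
pm10[missing] = np.nan

interp = pm10.interpolate(method="linear", limit_direction="both")
mean_fill = pm10.fillna(pm10.mean())

for name, est in [("linear interpolation", interp), ("mean substitution", mean_fill)]:
    mae = (est[missing] - truth[missing]).abs().mean()
    print(f"{name}: MAE = {mae:.2f}")
```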


2019 ◽  
Vol 23 (6) ◽  
pp. 670-679
Author(s):  
Krista Greenan ◽  
Sandra L. Taylor ◽  
Daniel Fulkerson ◽  
Kiarash Shahlaie ◽  
Clayton Gerndt ◽  
...  

OBJECTIVE: A recent retrospective study of severe traumatic brain injury (TBI) in pediatric patients showed similar outcomes in those with a Glasgow Coma Scale (GCS) score of 3 and those with a score of 4 and reported a favorable long-term outcome in 11.9% of patients. Using decision tree analysis, the authors of that study provided criteria to identify patients with a potentially favorable outcome. The authors of the present study sought to validate the previously described decision tree and further inform understanding of the outcomes of children with a GCS score of 3 or 4 by using data from multiple institutions and machine learning methods to identify important predictors of outcome. METHODS: Clinical, radiographic, and outcome data on pediatric TBI patients (age < 18 years) were prospectively collected as part of an institutional TBI registry. Patients with a GCS score of 3 or 4 were selected, and the previously published prediction model was evaluated using this data set. Next, a combined data set that included data from two institutions was used to create a new, more statistically robust model using binomial recursive partitioning to create a decision tree. RESULTS: Forty-five patients from the institutional TBI registry were included in the present study, as were 67 patients from the previously published data set, for a total of 112 patients in the combined analysis. The previously published prediction model for survival was externally validated and performed only modestly (AUC 0.68, 95% CI 0.47, 0.89). In the combined data set, pupillary response and age were the only predictors retained in the decision tree. Ninety-six percent of patients with bilaterally nonreactive pupils had a poor outcome. If the pupillary response was normal in at least one eye, the outcome subsequently depended on age: 72% of children between 5 months and 6 years old had a favorable outcome, whereas 100% of children younger than 5 months old and 77% of those older than 6 years had poor outcomes. The overall accuracy of the combined prediction model was 90.2%, with a sensitivity of 68.4% and specificity of 93.6%. CONCLUSIONS: A previously published survival model for severe TBI in children with a low GCS score was externally validated. With a larger data set, however, a simplified and more robust model was developed, and the variables most predictive of outcome were age and pupillary response.
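A hedged sketch of a recursive-partitioning model restricted to the two retained predictors (pupillary response and age) is shown below; the synthetic cohort only mimics the reported pattern and is not the registry data.

```python
# Recursive partitioning on pupillary response and age (illustrative data only).
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 112
df = pd.DataFrame({
    "pupils_reactive": rng.integers(0, 2, n),        # 1 = at least one reactive pupil
    "age_months": rng.uniform(1, 216, n),
})
# Outcome probabilities loosely following the pattern described above
p_fav = np.where(df["pupils_reactive"] == 0, 0.04,
         np.where(df["age_months"].between(5, 72), 0.72, 0.23))
df["favorable"] = rng.random(n) < p_fav

tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=10, random_state=0)
tree.fit(df[["pupils_reactive", "age_months"]], df["favorable"])
print(export_text(tree, feature_names=["pupils_reactive", "age_months"]))
```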


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing the 1D input signal into 2D patterns, motivated by the Fourier transform. The decomposition is aided by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of this transformation is that we can exploit recent advances in deep learning models for image classification, such as Convolutional Neural Networks (CNNs). Results: The proposed model is therefore a combination of an LSTM and a CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or at least matches them. For the second data set, we devise schemes to generate training and testing data by varying the window size, the sliding step, and the labeling scheme. Conclusion: The evaluation results show that the accuracy exceeds 95% in some cases. We also analyze the effect of the parameters on performance.
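The LSTM-encode-then-CNN-classify idea can be sketched roughly as follows; the layer sizes, window length, and number of activity classes are assumptions, not the paper's configuration.

```python
# Rough sketch: LSTM encodes the 1D signal, the encodings are reshaped into a
# 2D "fingerprint", and a small CNN classifies it.
import numpy as np
from tensorflow.keras import layers, models

timesteps, n_classes = 128, 6
model = models.Sequential([
    layers.Input(shape=(timesteps, 1)),
    layers.LSTM(32, return_sequences=True),          # capture temporal dependency
    layers.Reshape((timesteps, 32, 1)),              # arrange encodings as a 2D pattern
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(64, timesteps, 1).astype("float32")   # placeholder sensor windows
y = np.random.randint(0, n_classes, 64)
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```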


2021 ◽  
pp. 158-166
Author(s):  
Noah Balestra ◽  
Gaurav Sharma ◽  
Linda M. Riek ◽  
Ania Busza

Background: Prior studies suggest that participation in rehabilitation exercises improves motor function poststroke; however, studies on optimal exercise dose and timing have been limited by the technical challenge of quantifying exercise activities over multiple days. Objectives: The objectives of this study were to assess the feasibility of using body-worn sensors to track rehabilitation exercises in the inpatient setting and to investigate which recording parameters and data analysis strategies are sufficient for accurately identifying and counting exercise repetitions. Methods: MC10 BioStampRC® sensors were used to measure accelerometer and gyroscope data from the upper extremities of healthy controls (n = 13) and individuals with upper extremity weakness due to recent stroke (n = 13) while the subjects performed 3 preselected arm exercises. Sensor data were then labeled by exercise type, and this labeled data set was used to train a machine learning classification algorithm for identifying exercise type. The machine learning algorithm and a peak-finding algorithm were used to count exercise repetitions in non-labeled data sets. Results: We achieved a repetition counting accuracy of 95.6% overall, and 95.0% in patients with upper extremity weakness due to stroke, when using both accelerometer and gyroscope data. Accuracy decreased when using fewer sensors or accelerometer data alone. Conclusions: Our exploratory study suggests that body-worn sensor systems are technically feasible, well tolerated in subjects with recent stroke, and may ultimately be useful for developing a system to measure total exercise “dose” in poststroke patients during clinical rehabilitation or clinical trials.
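The repetition-counting step can be illustrated with a simple peak-finding pass over a smoothed accelerometer signal; the sampling rate, smoothing window, and peak parameters below are assumptions rather than the study's tuned values.

```python
# Counting exercise repetitions via peak finding on a smoothed signal.
import numpy as np
from scipy.signal import find_peaks

fs = 50                                              # assumed sampling rate (Hz)
t = np.arange(0, 30, 1 / fs)
acc = np.sin(2 * np.pi * 0.5 * t) + 0.2 * np.random.randn(t.size)   # ~15 repetitions

kernel = np.ones(fs // 2) / (fs // 2)                # 0.5 s moving-average smoothing
smoothed = np.convolve(acc, kernel, mode="same")

peaks, _ = find_peaks(smoothed, height=0.3, distance=fs)   # >= 1 s between repetitions
print("estimated repetitions:", len(peaks))
```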


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2532
Author(s):  
Encarna Quesada ◽  
Juan J. Cuadrado-Gallego ◽  
Miguel Ángel Patricio ◽  
Luis Usero

Anomaly detection research focuses on the development and application of methods that identify data that differ sufficiently from the rest of the data set being analyzed to be considered anomalies (or, as they are more commonly called, outliers). These values mainly originate from two sources: they may be errors introduced during the collection or handling of the data, or they may be correct but very different from the rest of the values. It is essential to identify each type correctly since, in the first case, the values must be removed from the data set, whereas in the second case they must be carefully analyzed and taken into account. The correct selection and use of the model applied to a specific problem is fundamental to the success of an anomaly detection study; in many cases, a single model cannot provide sufficient results, which can only be achieved with a mixture model obtained by integrating existing and/or ad hoc-developed models. This is the kind of model developed and applied to the problem presented in this paper. This study defines and applies an anomaly detection model that combines statistical models with a new method defined by the authors, the Local Transilience Outlier Identification Method, in order to improve the identification of outliers in the sensor-obtained values of variables that affect the operation of wind tunnels. Correct detection of outliers for the variables involved in wind tunnel operations is very important for the industrial ventilation systems industry, especially for vertical wind tunnels, which are used as training facilities for indoor skydiving, because the incorrect performance of such devices may put human lives at risk. Consequently, the use of the presented model for outlier detection may have a high impact in this industrial sector. In this research work, a proof of concept is carried out using data from a real installation in order to test the proposed anomaly analysis method and its application to monitoring the correct performance of wind tunnels.
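As a generic illustration of combining two detectors on a single sensor variable (this is not the authors' Local Transilience Outlier Identification Method), a robust z-score rule can be paired with a local-density method, flagging a point if either detector fires; the data and thresholds are assumptions.

```python
# Combining a statistical rule (MAD-based z-score) with a local-density detector.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
wind_speed = np.concatenate([rng.normal(12, 1.5, 980), rng.normal(25, 0.5, 20)])

# Detector 1: robust z-score based on the median absolute deviation
med = np.median(wind_speed)
mad = np.median(np.abs(wind_speed - med))
stat_flag = np.abs(0.6745 * (wind_speed - med) / mad) > 3.5

# Detector 2: local outlier factor on the same variable
lof_flag = LocalOutlierFactor(n_neighbors=20).fit_predict(wind_speed.reshape(-1, 1)) == -1

combined = stat_flag | lof_flag
print("flagged points:", int(combined.sum()))
```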

