Regression-Based Prediction Model Using Contextual and Personal Information for Filling Missing Values in Personal Informatics Systems: A Case Study Using a Fitbit Data Set

2021 ◽  
Author(s):  
Nannan Wen
2020 ◽  
Author(s):  
Dongyan Ding ◽  
Tingyuan Lang ◽  
Dongling Zou ◽  
Jiawei Tan ◽  
Jia Chen ◽  
...  

Abstract Background: Accurately forecasting prognosis could improve the therapeutic management of cancer patients; however, the clinical features currently in use provide too little information. The purpose of this study is to develop a survival prediction model for cervical cancer patients using big data and machine learning algorithms. Results: The Cancer Genome Atlas cervical cancer data, comprising the expression of 1046 microRNAs and the clinical information of 309 cervical and endocervical cancer samples and 3 control samples, were downloaded. Imputation of missing values and outliers, sample normalization, log transformation, and feature scaling were performed for preprocessing, and the 3 control samples, 2 metastatic samples, and 707 microRNAs with ≥ 20% missing values were excluded. Cox proportional-hazards analysis identified 55 prognosis-related microRNAs (20 positively and 35 negatively correlated with survival). K-means clustering analysis showed that the cervical cancer samples could be separated into two or three subgroups, with the top 20 identified survival-related microRNAs giving the best stratification. Using the support vector machine (SVM) algorithm, two prediction models were developed that segment the patients into two and three groups, respectively, with different survival rates. The models exhibit high performance: for two classes, area under the curve (AUC) = 0.976 (training set), 0.972 (test set), and 0.974 (whole data set); for three classes, AUC = 0.983, 0.996, and 0.991 (groups 1, 2, and 3 in the training set), 0.955, 0.989, and 0.991 (test set), and 0.974, 0.993, and 0.991 (whole data set). Conclusion: Survival prediction models for cervical cancer were developed. Patients with a very low survival rate (≤ 40%) can first be separated by the three-class prediction model; the remaining patients can then be classified by the two-class prediction model into groups with a high survival rate (≈ 75%) and a low survival rate (≈ 50%).
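A minimal sketch of the pipeline described above (univariate Cox screening of microRNAs, K-means stratification of samples, then an SVM classifier) is given below. This is an illustration under assumptions, not the authors' code: the column names ("time", "event"), the p-value threshold, and the train/test split are hypothetical.

```python
# Sketch only: Cox screening -> K-means subgroups -> SVM classifier.
# Column names ("time", "event") and thresholds are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def screen_mirnas(expr: pd.DataFrame, surv: pd.DataFrame, alpha: float = 0.05):
    """Fit a univariate Cox model per microRNA; keep those with p < alpha."""
    selected = []
    for mirna in expr.columns:
        df = pd.concat([expr[[mirna]], surv[["time", "event"]]], axis=1)
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        if cph.summary.loc[mirna, "p"] < alpha:
            selected.append(mirna)
    return selected

def stratify_and_classify(expr: pd.DataFrame, top_mirnas, n_groups: int = 2, seed: int = 0):
    """Cluster samples on the selected microRNAs, then train an SVM to predict the subgroup."""
    X = expr[top_mirnas].to_numpy()
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit_predict(X)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, random_state=seed, stratify=labels)
    clf = SVC(kernel="rbf", probability=True, random_state=seed).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]) if n_groups == 2 else None
    return clf, labels, auc
```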


2021 ◽  
Author(s):  
Jun Meng ◽  
Gangyi Ding ◽  
Laiyang Liu ◽  
Zheng Guan

Abstract In this study, a data-driven regional carbon emissions prediction model is proposed. The Grubbs criterion is used to eliminate gross-error data from the carbon emissions sensor readings. Then, based on nearby valid data, exponential smoothing is used to interpolate missing values and generate continuous sequences for model training. Finally, a GRU network, a deep learning method, is applied to these sequential standardized data to obtain the prediction model. In this paper, a wireless carbon sensor network monitoring data set covering August 2012 to April 2014 was used to train and evaluate the prediction model, which was compared with a prediction model based on a BP network. The experimental results demonstrate the feasibility of the research method and related technical approaches, as well as the accuracy of the prediction model, providing a methodological basis for the nowcasting of carbon emissions and other greenhouse gas environmental data.
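A rough sketch of the three steps named above (Grubbs-based gross-error removal, exponential-smoothing gap filling, and a GRU forecaster) might look as follows. This is a hedged illustration, not the paper's implementation: the significance level, smoothing span, layer sizes, and the exact interpolation scheme are assumptions.

```python
# Hedged sketch of the described pipeline; thresholds and layer sizes are assumptions.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from scipy import stats

def grubbs_outlier_mask(x: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Flag the single most extreme value if it fails the Grubbs test;
    apply repeatedly to strip gross-error readings."""
    n = int(np.sum(~np.isnan(x)))
    mean, std = np.nanmean(x), np.nanstd(x, ddof=1)
    g = np.nanmax(np.abs(x - mean)) / std
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    mask = np.zeros_like(x, dtype=bool)
    if g > g_crit:
        mask[np.nanargmax(np.abs(x - mean))] = True
    return mask

def fill_missing(series: pd.Series, span: int = 6) -> pd.Series:
    """Fill gaps using an exponentially weighted average of nearby valid data
    (one plausible reading of 'exponential smoothing interpolation')."""
    smoothed = series.ewm(span=span, ignore_na=True).mean()
    return series.fillna(smoothed).interpolate()

class GRUForecaster(nn.Module):
    """Single-layer GRU that predicts the next carbon-emission value."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len, n_features)
        out, _ = self.gru(x)
        return self.head(out[:, -1])
```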


2019 ◽  
Vol 23 (6) ◽  
pp. 670-679
Author(s):  
Krista Greenan ◽  
Sandra L. Taylor ◽  
Daniel Fulkerson ◽  
Kiarash Shahlaie ◽  
Clayton Gerndt ◽  
...  

OBJECTIVE: A recent retrospective study of severe traumatic brain injury (TBI) in pediatric patients showed similar outcomes in those with a Glasgow Coma Scale (GCS) score of 3 and those with a score of 4 and reported a favorable long-term outcome in 11.9% of patients. Using decision tree analysis, authors of that study provided criteria to identify patients with a potentially favorable outcome. The authors of the present study sought to validate the previously described decision tree and further inform understanding of the outcomes of children with a GCS score of 3 or 4 by using data from multiple institutions and machine learning methods to identify important predictors of outcome.
METHODS: Clinical, radiographic, and outcome data on pediatric TBI patients (age < 18 years) were prospectively collected as part of an institutional TBI registry. Patients with a GCS score of 3 or 4 were selected, and the previously published prediction model was evaluated using this data set. Next, a combined data set that included data from two institutions was used to create a new, more statistically robust model using binomial recursive partitioning to create a decision tree.
RESULTS: Forty-five patients from the institutional TBI registry were included in the present study, as were 67 patients from the previously published data set, for a total of 112 patients in the combined analysis. The previously published prediction model for survival was externally validated and performed only modestly (AUC 0.68, 95% CI 0.47-0.89). In the combined data set, pupillary response and age were the only predictors retained in the decision tree. Ninety-six percent of patients with bilaterally nonreactive pupils had a poor outcome. If the pupillary response was normal in at least one eye, the outcome subsequently depended on age: 72% of children between 5 months and 6 years old had a favorable outcome, whereas 100% of children younger than 5 months old and 77% of those older than 6 years had poor outcomes. The overall accuracy of the combined prediction model was 90.2%, with a sensitivity of 68.4% and a specificity of 93.6%.
CONCLUSIONS: A previously published survival model for severe TBI in children with a low GCS score was externally validated. With a larger data set, however, a simplified and more robust model was developed, and the variables most predictive of outcome were age and pupillary response.
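The combined model retains only pupillary response and age; a toy illustration of such a shallow recursive-partitioning tree is sketched below with scikit-learn. The feature encoding and the handful of synthetic rows are hypothetical and are included only to show the shape of the analysis, not to reproduce the study's data or thresholds.

```python
# Toy sketch of a shallow decision tree on the two retained predictors.
# The data rows and encoding below are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding: pupils_reactive = 1 if at least one pupil reacts, else 0;
# age in years; outcome = 1 for favorable, 0 for poor.
X = np.array([[0, 2.0], [0, 10.0], [1, 0.3], [1, 3.0], [1, 5.0],
              [1, 9.0], [1, 1.5], [0, 6.0], [1, 0.2], [1, 4.0]])
y = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])

tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=2, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["pupils_reactive", "age_years"]))
```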


Author(s):  
Michael W. Pratt ◽  
M. Kyle Matsuba

Chapter 7 begins with an overview of Erikson’s ideas about intimacy and its place in the life cycle, followed by a summary of Bowlby and Ainsworth’s attachment theory framework and its relation to family development. The authors review existing longitudinal research on the development of family relationships in adolescence and emerging adulthood, focusing on evidence with regard to links to McAdams and Pals’ personality model. They discuss the evidence, both questionnaire and narrative, from the Futures Study data set on family relationships, including emerging adults’ relations with parents and, separately, with grandparents, as well as their anticipations of their own parenthood. To illustrate the key personality concepts from this family chapter, the authors end with a case study of Jane Fonda in her youth and her father, Henry Fonda, viewing these issues through the lives of a 20th-century Hollywood dynasty of actors.


Author(s):  
Michael W. Pratt ◽  
M. Kyle Matsuba

Chapter 6 reviews research on the topic of vocational/occupational development in relation to the McAdams and Pals tripartite personality framework of traits, goals, and life stories. Distinctions between types of motivation for the work role (as a job, career, or calling) are particularly highlighted. The authors then turn to research from the Futures Study on work motivations and their links to personality traits, identity, generativity, and the life story, drawing on analyses and quotes from the data set. To illustrate the key concepts from this vocation chapter, the authors end with a case study of Charles Darwin’s pivotal turning point, his round-the-world voyage as naturalist for the HMS Beagle. Darwin was an emerging adult in his 20s at the time, and the authors highlight the role of this journey as a turning point in his adult vocational development.


2003 ◽  
Vol 42 (05) ◽  
pp. 564-571 ◽  
Author(s):  
M. Schumacher ◽  
E. Graf ◽  
T. Gerds

Summary Objectives: There is a lack of generally applicable tools for assessing predictions for survival data. Prediction error curves based on the Brier score, which have been suggested as a sensible approach, are illustrated by means of a case study. Methods: The concept of predictions made in terms of conditional survival probabilities given the patient’s covariates is introduced. Such predictions are derived from various statistical models for survival data, including artificial neural networks. The idea of how the prediction error of a prognostic classification scheme can be followed over time is illustrated with the data of two studies on the prognosis of node-positive breast cancer patients, one of them serving as an independent test data set. Results and Conclusions: The Brier score as a function of time is shown to be a valuable tool for assessing the predictive performance of prognostic classification schemes for survival data incorporating censored observations. Comparison with the prediction based on the pooled Kaplan-Meier estimator yields a benchmark value for any classification scheme incorporating the patient’s covariate measurements. The problem of an overoptimistic assessment of prediction error caused by data-driven modelling, as is done, for example, with artificial neural nets, can be circumvented by an assessment in an independent test data set.
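For reference, the time-dependent Brier score for right-censored data is commonly written with inverse-probability-of-censoring weights; a sketch of the usual form, where $\hat{S}(t \mid x_i)$ is the predicted survival probability for patient $i$, $(t_i, \delta_i)$ the observed time and event indicator, and $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution:

\[
\widehat{\mathrm{BS}}(t) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\hat{S}(t \mid x_i)^{2}\,\mathbf{1}\{t_i \le t,\ \delta_i = 1\}}{\hat{G}(t_i)} \;+\; \frac{\bigl(1-\hat{S}(t \mid x_i)\bigr)^{2}\,\mathbf{1}\{t_i > t\}}{\hat{G}(t)}\right]
\]

Plotting $\widehat{\mathrm{BS}}(t)$ over time gives the prediction error curve; the pooled Kaplan-Meier estimator, which ignores covariates, supplies the benchmark curve mentioned above.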


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses some advanced techniques for dealing with missing values in a data set measuring air quality, using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missingness mechanisms are considered for the data set, with five missing data levels: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is based on the random forest approach. Air quality data were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%). Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR setting had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values: missForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and can therefore be considered appropriate for analyzing air quality data.
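A missForest-style imputation can be approximated in Python with scikit-learn's IterativeImputer wrapping a random forest; the sketch below is an assumption-laden stand-in (the study used the missForest algorithm itself), and the column names are hypothetical placeholders.

```python
# Hedged sketch: missForest-style iterative imputation via scikit-learn.
# Column names are hypothetical placeholders for the Kuwait monitoring data.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

POLLUTANTS = ["NO2", "CO", "PM10", "SO2", "O3"]
CLIMATE = ["air_temp", "rel_humidity", "wind_dir", "wind_speed"]

def impute_air_quality(df: pd.DataFrame) -> pd.DataFrame:
    data = df[POLLUTANTS + CLIMATE].copy()
    data[POLLUTANTS] = np.log1p(data[POLLUTANTS])        # log-transform pollutants
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=0),
        max_iter=10, random_state=0)
    imputed = pd.DataFrame(imputer.fit_transform(data),
                           columns=data.columns, index=data.index)
    imputed[POLLUTANTS] = np.expm1(imputed[POLLUTANTS])  # back-transform
    return imputed

def rmse_mae(true_vals: np.ndarray, est_vals: np.ndarray):
    """Error metrics used to compare imputations on artificially masked entries."""
    err = est_vals - true_vals
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))
```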


1997 ◽  
Vol 08 (03) ◽  
pp. 301-315 ◽  
Author(s):  
Marcel J. Nijman ◽  
Hilbert J. Kappen

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, yielding comparable performance without the need to store all the data. We show that the RBBM has good classification performance compared to the multilayer perceptron (MLP). The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained that can be used for learning with missing values. We derive learning rules for the case of incomplete data and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.


Geophysics ◽  
2007 ◽  
Vol 72 (1) ◽  
pp. F25-F34 ◽  
Author(s):  
Benoit Tournerie ◽  
Michel Chouteau ◽  
Denis Marcotte

We present and test a new method to correct for the static shift affecting magnetotelluric (MT) apparent resistivity sounding curves. We use geostatistical analysis of apparent resistivity and phase data for selected periods. For each period, we first estimate and model the experimental variograms and the cross-variogram between phase and apparent resistivity. We then use the geostatistical model to estimate, by cokriging, the corrected apparent resistivities from the measured phases and apparent resistivities. The static shift factor is obtained as the difference between the logarithms of the corrected and measured apparent resistivities. As final static shift estimates, we retain those for the period displaying the best correlation with the estimates at all periods. We present a 3D synthetic case study showing that the static shift is retrieved quite precisely when the static shift factors are uniformly distributed around zero. If the static shift distribution has a nonzero mean, the best results are obtained when an apparent resistivity data subset can be identified a priori as unaffected by static shift and cokriging is done using only this subset. The method has been successfully tested on the synthetic COPROD-2S2 2D MT data set and on a 3D-survey data set from Las Cañadas Caldera (Tenerife, Canary Islands) that is severely affected by static shift.
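In the notation implied by the abstract (and assuming base-10 logarithms, as is conventional for apparent resistivity), the static shift estimate at a site for the retained period is simply

\[
\hat{s} \;=\; \log_{10}\hat{\rho}_a^{\mathrm{corr}} \;-\; \log_{10}\rho_a^{\mathrm{meas}},
\]

where $\hat{\rho}_a^{\mathrm{corr}}$ is the cokriged apparent resistivity and $\rho_a^{\mathrm{meas}}$ the measured one, so the corrected sounding curve is recovered as $\rho_a^{\mathrm{meas}} \cdot 10^{\hat{s}}$.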

