Copy Mean: A New Method to Impute Intermittent Missing Values in Longitudinal Studies

2013 ◽  
Vol 03 (04) ◽  
pp. 26-40 ◽  
Author(s):  
Christophe Genolini ◽  
René Écochard ◽  
Hélène Jacqmin-Gadda

2016 ◽  
Vol 132 ◽  
pp. 29-44 ◽  
Author(s):  
Christophe Genolini ◽  
Amandine Lacombe ◽  
René Écochard ◽  
Fabien Subtil

2021 ◽  
Author(s):  
Panagiotis Anagnostou ◽  
Sotiris Tasoulis ◽  
Aristidis G. Vrahatis ◽  
Spiros Georgakopoulos ◽  
Matthew Prina ◽  
...  

Abstract Preventive healthcare is a crucial pillar of health, as it contributes to staying healthy and receiving immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to contribute significantly to the improvement of preventive healthcare. Unfortunately, data originating from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine learning, data mining, and data imputation models are utilized to address these respective challenges. In this direction, we focus on the development of a complete methodology for the ATHLOS (Ageing Trajectories of Health: Longitudinal Opportunities and Synergies) Project, funded by the European Union's Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we focus in particular on the HealthStatus (HS) score, an index that estimates human health status, aiming to examine the effect of various data imputation models on the predictive power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine's crucial role.
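The effect of an imputation model on downstream predictive power, as studied in the abstract above, can be illustrated with a small sketch. This is not the ATHLOS pipeline: the data are synthetic, and the imputers and regressor (scikit-learn's SimpleImputer, IterativeImputer, and a random forest) are illustrative stand-ins for whatever models a study actually compares.

```python
# Sketch (not the ATHLOS methodology): comparing how two common imputation
# strategies affect a downstream regression model's cross-validated R^2.
# All data here are synthetic and missing completely at random.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=500)

# Introduce roughly 20% missing values completely at random.
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_missing)
    score = cross_val_score(RandomForestRegressor(random_state=0),
                            X_imp, y, cv=3, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

Swapping the imputer while holding the predictor fixed, as the loop does, isolates the imputation model's contribution to predictive power.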


2021 ◽  
Author(s):  
Boli Yang ◽  
Yan Feng ◽  
Ruyin Cao

Cloud contamination is a serious obstacle to the application of Landsat data. Thick clouds can completely block land-surface information and lead to missing values. The reconstruction of missing values in a Landsat cloud image requires a cloud and cloud-shadow mask. In this study, we raise the issue that the quality of the quality assessment (QA) band in current Landsat products cannot meet the requirements of thick-cloud removal. To address this issue, we developed a new method (called Auto-PCP) to preprocess the original QA band, with the ultimate objective of improving the performance of cloud removal on Landsat cloud images. We tested the new method at four test sites and compared cloud-removed images generated using three different QA bands: the original QA band, the QA band modified by a dilation of two pixels around cloud and cloud-shadow edges, and the QA band processed by Auto-PCP ("QA_Auto-PCP"). Experimental results, from both actual and simulated Landsat cloud images, show that QA_Auto-PCP achieved the best visual assessment of the cloud-removed images, the smallest RMSE values, and the largest Structural SIMilarity index (SSIM) values. QA_Auto-PCP improves cloud removal because the new method substantially decreases omission errors of clouds and shadows in the original QA band while not increasing commission errors. Moreover, Auto-PCP is easy to implement and uses the same data as cloud removal, without additional image collections. We expect that Auto-PCP can further popularize cloud removal and advance the application of Landsat data.

Keywords: Cloud detection, Cloud shadows, Cloud simulation, Cloud removal, MODTRAN
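One of the baselines the abstract compares against, dilating the QA mask by two pixels around cloud and shadow edges, is a standard morphological operation. The sketch below shows that baseline only (it is not the Auto-PCP algorithm), on a toy binary mask, using scipy's binary_dilation.

```python
# Illustrative sketch of the dilation baseline from the comparison above
# (NOT Auto-PCP itself): grow a binary cloud/shadow mask by two pixels
# to reduce omission errors at cloud borders.
import numpy as np
from scipy.ndimage import binary_dilation

mask = np.zeros((8, 8), dtype=bool)
mask[3:5, 3:5] = True  # a small synthetic cloud patch

# Two iterations with the default 3x3 cross-shaped structuring element
# expand the mask boundary by two pixels in each direction.
dilated = binary_dilation(mask, iterations=2)
print(mask.sum(), "->", dilated.sum())  # the dilated mask covers more pixels
```

The trade-off the paper measures is visible even here: dilation catches thin cloud edges missed by the original mask (fewer omission errors) at the price of flagging some clear pixels (more commission errors).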


1987 ◽  
Vol 40 (5) ◽  
pp. 373-383 ◽  
Author(s):  
Mary E. Charlson ◽  
Peter Pompei ◽  
Kathy L. Ales ◽  
C. Ronald MacKenzie

2020 ◽  
Author(s):  
Zeus Gracia-Tabuenca ◽  
Sarael Alcauter

Abstract Network neuroscience models the brain as a set of interacting elements. However, a large number of elements implies a vast number of interactions, making it difficult to assess which connections are relevant and which are spurious. Zalesky et al. (2010) proposed Network-Based Statistics (NBS), which identifies clusters of connections and tests their likelihood via permutation tests. This framework shows a better trade-off between Type I and Type II errors than conventional multiple-comparison corrections. NBS uses General Linear Hypothesis Testing (GLHT), which may underestimate the within-subject variance structure when dealing with longitudinal samples with a varying number of observations (unbalanced samples). We implemented NBR, an R package that extends the NBS framework by adding (non)linear mixed-effects (LME) models. LME models the within-subject variance in more detail and deals with missing values more flexibly. To illustrate its advantages, we used a public dataset of 333 human participants (188/145 females/males; age range: 17.0-28.4 y.o.) with two (n=212) or three (n=121) sessions each. Sessions include a resting-state fMRI scan and psychometric data. State anxiety scores and connectivity matrices between brain lobes were extracted. We tested their relationship using GLHT and LME models for the balanced and unbalanced datasets, respectively. Only the LME approach found a significant association between state anxiety and a subnetwork that includes the cingulum, frontal, parietal, occipital, and cerebellar regions. Given that missing data are very common in longitudinal studies, we expect that NBR will be very useful for exploring unbalanced samples.

Significance Statement Longitudinal studies are increasing in neuroscience, providing new insights into the brain under treatment, development, or aging. Nevertheless, missing data are highly frequent in those studies, and conventional designs may discard incomplete observations or underestimate the within-subject variance. We developed publicly available software (R package: NBR) that fits mixed-effects models to every possible connection in a sample of networks and can find significant subsets of connections using non-parametric permutation tests. We demonstrate that using NBR on larger unbalanced samples yields higher statistical power than exploring the balanced subsamples. Although this method is applicable to general network analysis, we anticipate it being particularly useful in systems neuroscience, given the increase of longitudinal samples in the field.
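The NBS-style permutation logic that NBR builds on can be sketched in a few lines. This is a deliberately minimal Python toy (not the NBR R package, and using edge-wise correlations rather than mixed-effects models): score every connection against a behavioral variable, threshold, take the largest connected cluster of suprathreshold connections, and compare its size to a permutation null.

```python
# Minimal sketch of NBS-style cluster inference on connectivity matrices
# (NOT the NBR package; edge statistics and threshold are illustrative).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(1)
n_sub, n_nodes = 40, 10
conn = rng.normal(size=(n_sub, n_nodes, n_nodes))
conn = (conn + conn.transpose(0, 2, 1)) / 2       # symmetric matrices
score = rng.normal(size=n_sub)                    # e.g. a state-anxiety score

def largest_component(scores):
    # Edge-wise correlation with the behavioral score, thresholded.
    r = np.array([[np.corrcoef(conn[:, i, j], scores)[0, 1]
                   for j in range(n_nodes)] for i in range(n_nodes)])
    supra = (np.abs(r) > 0.3) & ~np.eye(n_nodes, dtype=bool)
    n_comp, labels = connected_components(csr_matrix(supra), directed=False)
    # Size (edge count) of the largest cluster of suprathreshold connections.
    return max(supra[np.ix_(labels == c, labels == c)].sum() // 2
               for c in range(n_comp))

observed = largest_component(score)
null = [largest_component(rng.permutation(score)) for _ in range(99)]
p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
print(p_value)  # permutation p-value for the observed largest cluster
```

Permuting the behavioral score breaks its link to connectivity while preserving the network structure, which is what makes the cluster-size null distribution valid.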


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Matúš Medo ◽  
Daniel M. Aebersold ◽  
Michaela Medová

Abstract Background: Data from discovery proteomic and phosphoproteomic experiments typically include missing values that correspond to proteins not identified in the analyzed sample. Replacing the missing values with random numbers, a process known as "imputation", avoids apparent infinite fold-change values. However, the procedure comes at a cost: imputing a large number of missing values has the potential to significantly impact the results of the subsequent differential expression analysis. Results: We propose a method that identifies differentially expressed proteins by ranking their observed changes with respect to the changes observed for other proteins. Missing values are taken into account by this method directly, without the need to impute them. We illustrate the performance of the new method on two distinct datasets and show that it is robust to missing values and, at the same time, provides results that are otherwise similar to those obtained with edgeR, a state-of-the-art differential expression analysis method. Conclusions: The new method for the differential expression analysis of proteomic data is available as an easy-to-use Python package.
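The general idea of ranking observed changes while skipping, rather than imputing, missing values can be shown with a toy sketch. This is not the authors' published method or package; the scoring rule (fraction of other observed changes that are smaller) and the tiny dataset are illustrative only.

```python
# Toy sketch of a rank-based scoring idea (NOT the authors' exact method):
# each protein's change is ranked against the other observed changes,
# and proteins with a missing value are simply left unscored.
import numpy as np

before = np.array([5.0, 2.0, np.nan, 7.0, 1.0])  # log-intensities, run 1
after  = np.array([9.0, 2.1, 4.0, np.nan, 1.2])  # log-intensities, run 2

change = after - before               # NaN wherever either value is missing
observed = ~np.isnan(change)
ranks = np.full(change.shape, np.nan)
# Fraction of other observed changes that are smaller than this one.
vals = change[observed]
ranks[observed] = [np.mean(vals < v) for v in vals]
print(ranks)                          # NaN entries stay unranked, not imputed
```

Because unobserved proteins never enter the ranking, no random fill-in values can distort the ordering of the proteins that were actually measured.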

