scholarly journals A Novel Approach for Missing Value Imputation and Classification of Microarray Dataset

2012 ◽  
Vol 38 ◽  
pp. 1067-1071 ◽  
Author(s):  
Rajashree Senapti ◽  
Kailash Shaw ◽  
Sashikala Mishra ◽  
Debahuti Mishra
2018 ◽  
Author(s):  
Stefan Bischof ◽  
Andreas Harth ◽  
Benedikt KKmpgen ◽  
Axel Polleres ◽  
Patrik Schneider

Author(s):  
David Lewis-Smith ◽  
Shiva Ganesan ◽  
Peter D. Galer ◽  
Katherine L. Helbig ◽  
Sarah E. McKeown ◽  
...  

AbstractWhile genetic studies of epilepsies can be performed in thousands of individuals, phenotyping remains a manual, non-scalable task. A particular challenge is capturing the evolution of complex phenotypes with age. Here, we present a novel approach, applying phenotypic similarity analysis to a total of 3251 patient-years of longitudinal electronic medical record data from a previously reported cohort of 658 individuals with genetic epilepsies. After mapping clinical data to the Human Phenotype Ontology, we determined the phenotypic similarity of individuals sharing each genetic etiology within each 3-month age interval from birth up to a maximum age of 25 years. 140 of 600 (23%) of all 27 genes and 3-month age intervals with sufficient data for calculation of phenotypic similarity were significantly higher than expect by chance. 11 of 27 genetic etiologies had significant overall phenotypic similarity trajectories. These do not simply reflect strong statistical associations with single phenotypic features but appear to emerge from complex clinical constellations of features that may not be strongly associated individually. As an attempt to reconstruct the cognitive framework of syndrome recognition in clinical practice, longitudinal phenotypic similarity analysis extends the traditional phenotyping approach by utilizing data from electronic medical records at a scale that is far beyond the capabilities of manual phenotyping. Delineation of how the phenotypic homogeneity of genetic epilepsies varies with age could improve the phenotypic classification of these disorders, the accuracy of prognostic counseling, and by providing historical control data, the design and interpretation of precision clinical trials in rare diseases.


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

AbstractLongitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.


Sign in / Sign up

Export Citation Format

Share Document