Clustering of Variables with Missing Data: Application to Preference Studies

Author(s):  
Karin Sahmer ◽  
Evelyne Vigneau ◽  
Mostafa El Qannari ◽  
Joachim Kunert
2021 ◽  
Author(s):  
Amanda E. Gentry ◽  
Robert M. Kirkpatrick ◽  
Roseann E. Peterson ◽  
Bradley T. Webb

Abstract: The availability of large-scale biobanks linking rich phenotypes and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive non-random missing data. Machine learning methods are able to predict missing data, but performance is significantly impaired by the block-wise missingness inherent to many biobanks. To address this, we developed Missingness Adapted Group-wise Informed Clustered LASSO (MAGIC-LASSO), which performs hierarchical clustering of variables based on missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n>500k) to predict the unmeasured Alcohol Use Disorders Identification Test (AUDIT). The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were >0.86, demonstrating that the method has significant accuracy and utility.
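
The first step named in the abstract, hierarchical clustering of variables based on missingness, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cluster_by_missingness`, the Hamming distance between missingness indicators, the average linkage, and the toy data are all assumptions chosen for illustration, and the Group LASSO step is omitted entirely.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_by_missingness(X, n_clusters):
    """Group the columns of X by the similarity of their missingness patterns.

    Hypothetical helper: the distance metric and linkage method are
    illustrative choices, not taken from the abstract.
    """
    # One row per variable, one column per subject; True where a value is missing.
    missing = np.isnan(X).T
    # Hamming distance: fraction of subjects on which two variables
    # disagree about being missing.
    dist = pdist(missing, metric="hamming")
    # Agglomerative (hierarchical) clustering on the condensed distance matrix.
    tree = linkage(dist, method="average")
    # Cut the tree into at most n_clusters groups of variables.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data with block-wise missingness: variables 0-1 are missing for one
# block of subjects, variables 2-3 for another.
X = np.array([
    [1.0, 2.0, np.nan, np.nan],
    [1.5, 2.5, np.nan, np.nan],
    [np.nan, np.nan, 3.0, 4.0],
    [np.nan, np.nan, 3.5, 4.5],
    [2.0, 3.0, 5.0, 6.0],
])
labels = cluster_by_missingness(X, n_clusters=2)
```

On this toy matrix, variables that share a missingness block receive the same cluster label, so a separate model could then be fitted within each cluster on the subjects for whom those variables were observed.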


2021 ◽  
Vol 14 (2) ◽  
pp. 365-382
Author(s):  
Francis Erebholo ◽  
Victor Apprey ◽  
Paul Bezandry ◽  
John Kwagyan

PLoS Genetics ◽  
2018 ◽  
Vol 14 (7) ◽  
pp. e1007452 ◽  
Author(s):  
Yu Jiang ◽  
Sai Chen ◽  
Daniel McGuire ◽  
Fang Chen ◽  
Mengzhen Liu ◽  
...  
