AN HYBRID APPROACH TO FEATURE SELECTION FOR MIXED CATEGORICAL AND CONTINUOUS DATA

2022 ◽  
Vol 191 ◽  
pp. 116302
Author(s):  
Akshata K. Naik ◽  
Venkatanareshbabu Kuppili

2014 ◽  
Vol 42 (15) ◽  
pp. e122-e122 ◽  
Author(s):  
Amit Kumar Srivastava ◽  
Rupali Chopra ◽  
Shafat Ali ◽  
Shweta Aggarwal ◽  
Lovekesh Vig ◽  
...  

Abstract: The flood of evolutionary markers generated by the Human Genome Project and the 1000 Genomes Consortium has made it necessary to prune redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods, such as feature selection and feature extraction, have been proposed to escape the curse of dimensionality in large datasets. Evolutionary studies, which rely primarily on sequentially evolved variations, have so far not benefited from these advances. Here, we present a novel approach that applies recursive feature selection to hierarchical clustering of Y-chromosomal SNPs/haplogroups in order to select a minimal set of independent markers, sufficient to infer population structure as precisely as a larger number of evolutionary markers does. To validate the applicability of our approach, we designed a MALDI-TOF mass spectrometry-based multiplex that accommodates the independent Y-chromosomal markers in a single assay, and genotyped two geographically distinct Indian populations. An analysis of 105 worldwide populations showed that 15 independent variations/markers were optimal for defining population structure parameters such as FST, molecular variance and correlation-based relationship. Subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10⁻³) on these parameters. The approach proves efficient for tracing complex population structures and deriving relationships among worldwide populations in a cost-effective and expedient manner.
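
The idea of reducing a marker panel to a small set of mutually independent markers can be illustrated with a minimal sketch. This is not the authors' recursive feature selection procedure, which the abstract does not specify in detail; it swaps in a much simpler greedy correlation-threshold pruning. The function names, the `threshold` value, the dictionary layout and the toy frequencies are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_independent_markers(freqs, threshold=0.9):
    """Greedy pruning (illustrative stand-in, not the published method).

    freqs maps a marker name to its per-population frequencies
    (hypothetical layout). A marker is kept only if its absolute
    correlation with every already-kept marker is below `threshold`.
    """
    kept = []
    for name in sorted(freqs):  # fixed order so the result is deterministic
        if all(abs(pearson(freqs[name], freqs[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, invented for this example: four markers across four populations.
toy = {
    "M1": [0.9, 0.1, 0.5, 0.3],
    "M2": [0.8, 0.2, 0.5, 0.35],    # tracks M1 closely, so it is redundant
    "M3": [0.1, 0.1, 0.9, 0.2],
    "M4": [0.05, 0.05, 0.45, 0.1],  # proportional to M3, so it is redundant
}

selected = select_independent_markers(toy)
print(selected)
```

In a fuller treatment one would presumably recompute the clustering or the population-structure statistics on the reduced panel and check that they stay close to those obtained from the full marker set, which is the kind of comparison the abstract reports.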


2019 ◽  
pp. 389
Author(s):  
زينب عبدالأمير ◽  
علياء كريم عبدالحسن
