Machine learning methods with feature selection approach to estimate software services development effort

Abstract

Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called “short fat data” problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach.

Results: Through our simulation study we propose a collective feature selection approach to select features that are in the “union” of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~44,000 samples obtained from Geisinger’s MyCode Community Health Initiative (on behalf of DiscovEHR collaboration).

Conclusions: In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
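The "union of the top percentage from each method" idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`top_fraction`, `collective_selection`), the feature names, and all importance scores are hypothetical, and the sketch assumes each method has already produced one importance score per feature, with higher meaning more important.

```python
import math


def top_fraction(scores, fraction):
    """Return the highest-scoring feature names for a single method.

    `scores` maps feature name -> importance (higher is better).
    At least one feature is always kept; ties are broken by name.
    """
    keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:keep])


def collective_selection(scores_by_method, fraction):
    """Union of every method's top `fraction` of features."""
    selected = set()
    for scores in scores_by_method.values():
        selected |= top_fraction(scores, fraction)
    return selected


# Hypothetical importance scores; the values are invented for illustration.
scores_by_method = {
    "mdr": {"snp1": 0.90, "snp2": 0.10, "snp3": 0.80, "snp4": 0.05},
    "ranger": {"snp1": 0.20, "snp2": 0.70, "snp3": 0.60, "snp4": 0.10},
    "gradient_boosting": {"snp1": 0.30, "snp2": 0.65, "snp3": 0.10, "snp4": 0.05},
}

chosen = collective_selection(scores_by_method, fraction=0.25)
print(sorted(chosen))
```

Taking the union rather than the intersection means a feature ranked highly by only one method still reaches downstream analysis, which appears to be the motivation when no single method works best in all scenarios.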

Feature Selection and Machine Learning Methods for Optimal Identification and Prediction of Subtypes in Parkinson's Disease

Computer Methods and Programs in Biomedicine ◽

10.1016/j.cmpb.2021.106131 ◽

2021 ◽

pp. 106131

Author(s):

Mohammad R. Salmanpour ◽

Mojtaba Shamsaei ◽

Arman Rahmim

Keyword(s):

Machine Learning ◽

Parkinson's Disease ◽

Feature Selection ◽

Learning Methods ◽

Machine Learning Methods

Improved Permeability Prediction of Porous Media by Feature Selection and Machine Learning Methods Comparison

Journal of Computing in Civil Engineering ◽

10.1061/(asce)cp.1943-5487.0000983 ◽

2022 ◽

Vol 36 (2) ◽

Author(s):

J. W. Tian ◽

Chongchong Qi ◽

Kang Peng ◽

Yingfeng Sun ◽

Zaher Mundher Yaseen

Keyword(s):

Machine Learning ◽

Porous Media ◽

Feature Selection ◽

Learning Methods ◽

Methods Comparison ◽

Machine Learning Methods ◽

Permeability Prediction

Experiments on the Use of Feature Selection and Machine Learning Methods in Automatic Malay Text Categorization

Procedia Technology ◽

10.1016/j.protcy.2013.12.254 ◽

2013 ◽

Vol 11 ◽

pp. 748-754 ◽

Cited By ~ 6

Author(s):

Hamood Alshalabi ◽

Sabrina Tiun ◽

Nazlia Omar ◽

Mohammed Albared

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Text Categorization ◽

Learning Methods ◽

Machine Learning Methods

Laser-induced breakdown spectroscopy for the classification of wood materials using machine learning methods combined with feature selection

Plasma Science and Technology ◽

10.1088/2058-6272/abf1ac ◽

2021 ◽

Author(s):

Xutai Cui ◽

Qianqian Wang ◽

Kai Wei ◽

Geer Teng ◽

Xiangjun Xu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Laser Induced Breakdown Spectroscopy ◽

Learning Methods ◽

Breakdown Spectroscopy ◽

Machine Learning Methods ◽

Laser Induced Breakdown

Early Detection of the Alzheimer’s Disease: A Novel Cognitive Feature Selection Approach Using Machine Learning

Advances in Information, Communication and Cybersecurity - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-91738-8_35 ◽

2022 ◽

pp. 383-392

Author(s):

Muhammad Irfan ◽

Seyed Shahrestani ◽

Mahmoud Elkhodr

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Feature Selection ◽

Early Detection ◽

Selection Approach ◽

Feature Selection Approach ◽

Cognitive Feature

Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

BMC Bioinformatics ◽

10.1186/1471-2105-14-170 ◽

2013 ◽

Vol 14 (1) ◽

Cited By ~ 39

Author(s):

Siow-Wee Chang ◽

Sameem Abdul-Kareem ◽

Amir Feisal Merican ◽

Rosnah Binti Zain

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Oral Cancer ◽

Cancer Prognosis ◽

Learning Methods ◽

Machine Learning Methods ◽

Genomic Markers

Development of machine learning‐based real time scheduling systems: using ensemble based on wrapper feature selection approach

International Journal of Production Research ◽

10.1080/00207543.2011.636389 ◽

2012 ◽

Vol 50 (20) ◽

pp. 5887-5905 ◽

Cited By ~ 6

Author(s):

Yeou-Ren Shiue ◽

Ruey‐Shiang Guh ◽

Ken‐Chun Lee

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Real Time ◽

Real Time Scheduling ◽

Selection Approach ◽

Time Scheduling ◽

Feature Selection Approach ◽

Wrapper Feature Selection

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Iranian Journal of Pediatric Hematology & Oncology ◽

10.18502/ijpho.v11i2.5838 ◽

2021 ◽

Author(s):

Razieh Sheikhpour ◽

Roohallah Fazli ◽

Sanaz Mehrabani

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Lymphoblastic Leukemia ◽

Feature Selection Method ◽

Selection Method ◽

Learning Methods ◽

Machine Learning Methods ◽

Acute Myeloid ◽

Sparse Feature Selection

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identifying candidate genes from microarray data is important for cancer diagnosis. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: This descriptive study used the expression levels of 7129 genes, obtained by microarray technology, from 25 patients with acute myeloid leukemia (AML) and 47 patients with acute lymphoblastic leukemia (ALL). The important genes were identified using a sparse feature selection method, and AML and ALL tissues were then diagnosed with machine learning methods: support vector machine (SVM), Gaussian kernel density estimation based classifier (GKDEC), k-nearest neighbor (KNN), and linear discriminant classifier (LDC). Results: Using 8 genes selected by the sparse feature selection method, the GKDEC and LDC classifiers diagnosed ALL and AML with an accuracy of 100%. The KNN classifier using 6 genes and the SVM classifier using 7 genes diagnosed AML and ALL with accuracies of 91.18% and 94.12%, respectively. The gene with the description “Paired-box protein PAX2 (PAX2) gene, exon 11 and complete CDs” was determined to be the most important gene in the diagnosis of ALL and AML. Conclusion: The experimental results showed that AML and ALL can be diagnosed with high accuracy using sparse feature selection and machine learning methods. Investigating the expression of the genes selected in this study may be helpful in the diagnosis of ALL and AML.
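The abstract does not say which sparse feature selection method was used, so the following Python sketch substitutes a common stand-in: L1-penalised least squares (lasso) solved by proximal gradient descent (ISTA). Features whose weights are driven exactly to zero are discarded and the rest are "selected". The function names, the toy data, and the `penalty` and `step` values are all assumptions made for illustration, not details from the study.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(rows, targets, penalty, step, iterations=200):
    """Minimise (1/(2n)) * ||Xw - y||^2 + penalty * ||w||_1 by proximal gradient."""
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(iterations):
        # Residual of the current linear fit for every sample.
        residuals = [
            sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(rows, targets)
        ]
        # Gradient of the smooth least-squares term.
        gradient = [
            sum(r * row[j] for r, row in zip(residuals, rows)) / n_samples
            for j in range(n_features)
        ]
        # Gradient step followed by soft-thresholding, which creates exact zeros.
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, gradient)
        ]
    return weights


def selected_features(weights, tolerance=1e-8):
    """Indices of features with a non-zero weight."""
    return [j for j, w in enumerate(weights) if abs(w) > tolerance]


# Toy data (invented): column 0 tracks the response closely,
# columns 1 and 2 are only weakly related to it.
rows = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
targets = [1.0, 1.0, -1.0, -0.8]

weights = lasso_ista(rows, targets, penalty=0.1, step=1.0)
kept = selected_features(weights)
print(kept, weights)
```

In a pipeline like the one the abstract describes, the retained columns would then be passed to a classifier such as KNN or SVM. A larger `penalty` keeps fewer features, so it would normally be tuned by cross-validation.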
