An efficient feature selection method for classification in health care systems using machine learning techniques

Author(s):  
K Selvakuberan ◽  
D Kayathiri ◽  
B Harini ◽  
M Indra Devi
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Miseon Shim ◽  
Seung-Hwan Lee ◽  
Han-Jeong Hwang

Abstract In recent years, machine learning techniques have been frequently applied to uncovering neuropsychiatric biomarkers with the aim of accurately diagnosing neuropsychiatric diseases and predicting treatment prognosis. However, many studies did not perform cross validation (CV) when using machine learning techniques, and others performed CV incorrectly, leading to significantly biased results due to overfitting. The aim of this study is to investigate the impact of CV on the prediction performance of neuropsychiatric biomarkers, in particular for feature selection performed with high-dimensional features. To this end, we evaluated prediction performance using both simulation data and actual electroencephalography (EEG) data. The overall prediction accuracies obtained when feature selection was performed outside of CV were considerably higher than those obtained when feature selection was performed within CV, for both the simulation and actual EEG data. The difference between the prediction accuracies of the two feature selection approaches can be thought of as the amount of overfitting due to selection bias. Our results indicate the importance of using CV correctly to avoid biased estimates of the prediction performance of neuropsychiatric biomarkers.
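The selection bias described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the pure-noise data, the mean-difference feature ranking, the nearest-centroid classifier, and all names and sizes below are assumptions chosen only to make the effect visible with the Python standard library.

```python
import random
import statistics

def make_noise_data(n_samples, n_features, rng):
    """Pure-noise features with balanced random labels: no feature is informative."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y

def select_features(X, y, rows, k):
    """Rank features by |difference of class means| on `rows` only; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(X[i][j] for i in rows if y[i] == 0)
        m1 = statistics.fmean(X[i][j] for i in rows if y[i] == 1)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def nearest_centroid_predict(X, y, train, test, feats):
    """Fit class centroids on the training rows, then label each test row."""
    cents = {
        c: [statistics.fmean(X[i][j] for i in train if y[i] == c) for j in feats]
        for c in (0, 1)
    }
    preds = []
    for i in test:
        dist = {
            c: sum((X[i][j] - cents[c][pos]) ** 2 for pos, j in enumerate(feats))
            for c in (0, 1)
        }
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds

def cv_accuracy(X, y, k_feats, n_folds, select_inside, rng):
    """k-fold CV accuracy with feature selection inside or outside the folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Selecting once on all rows lets the test labels leak into the selection.
    global_feats = None if select_inside else select_features(X, y, idx, k_feats)
    correct = 0
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        feats = select_features(X, y, train, k_feats) if select_inside else global_feats
        preds = nearest_centroid_predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)

X, y = make_noise_data(n_samples=40, n_features=2000, rng=random.Random(0))
# The same fold seed is used for both runs so only the selection step differs.
acc_outside = cv_accuracy(X, y, 10, 5, select_inside=False, rng=random.Random(1))
acc_inside = cv_accuracy(X, y, 10, 5, select_inside=True, rng=random.Random(1))
print(f"selection outside CV: {acc_outside:.2f}")
print(f"selection inside CV:  {acc_inside:.2f}")
```

Because the labels are random, an honest estimate should sit near chance (0.5). The gap between the two printed numbers is the optimism introduced by selecting features before splitting the data.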


2014 ◽  
Vol 23 (05) ◽  
pp. 1450014 ◽  
Author(s):  
Theodoros Iliou ◽  
Christos-Nikolaos Anagnostopoulos ◽  
George Anastassopoulos

Osteoporosis is a bone disease that leads to an increased risk of fracture and is characterized by low bone mineral density and micro-architectural deterioration of bone tissue. In this article, the dataset consists of 3426 subjects (1083 pathological and 2343 healthy cases) whose diagnosis was based on laboratory and osteal bone densitometry examination. In all cases, four diagnostic factors for osteoporosis risk prediction, namely age, sex, height and weight, were stored for later evaluation with the selected classifiers. In order to categorize subjects into two classes (osteoporosis, non-osteoporosis), twenty machine learning techniques were assessed, chosen for their popularity and frequency of use in biomedical engineering problems. All classifiers were evaluated using the well-known 10-fold cross validation method, and the results are reported analytically. In addition, a feature selection method identified that similar performance could be achieved using only two diagnostic factors (age and weight). The scope of the proposed exhaustive methodology is to assist therapists in osteoporosis prediction, avoiding unnecessary further testing with bone densitometry.
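As a rough illustration of the 10-fold protocol mentioned above, the following sketch splits 3426 labels (1083 positive, 2343 negative, as in the abstract) into ten folds. Stratifying by class is an assumption here, since the abstract only says "10-fold cross validation", and the function name `stratified_kfold` is hypothetical.

```python
import random

def stratified_kfold(labels, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, c in enumerate(labels):
        by_class.setdefault(c, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin across the folds.
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train, test

# Class counts taken from the abstract: 1083 pathological, 2343 healthy.
labels = [1] * 1083 + [0] * 2343
splits = list(stratified_kfold(labels))
for fold_no, (train, test) in enumerate(splits):
    positives = sum(labels[i] for i in test)
    print(fold_no, len(train), len(test), positives)
```

Each subject appears in exactly one test fold, so every classifier is scored once on every case without ever being trained on it.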


Author(s):  
Azhar M. A. ◽  
Princy Ann Thomas

Heart failure is a common disease that can lead to dangerous situations. Considerable data are available within healthcare systems. However, successful analysis methods for finding connections and patterns in health care data have been lacking. Machine learning methods can help remedy this situation and give better insight into the classification problem. In many classification problems, it is difficult to learn good classifiers before removing unwanted features, due to the huge size of the data. In our work, we used an artificial neural network-based autoencoder for effective feature selection. The aim of feature selection is to improve prediction performance and provide a better understanding of the process data. A hybrid classification method with a dynamic integration algorithm aims at finding optimal features by applying machine learning techniques, thereby improving performance in the prediction of cardiovascular disease.


2021 ◽  
Vol 15 (1) ◽  
pp. 127-139
Author(s):  
Kanksha ◽  
Aman Bhaskar ◽  
Sagar Pande ◽  
Rahul Malik ◽  
Aditya Khamparia

Healthcare is an essential part of people’s lives, particularly for the elderly population, and should also be economical. Medicare is one particular healthcare plan. Claims fraud is a significant contributor to increased healthcare expenses, though its effect could be lessened by fraud detection. In this paper, various machine learning techniques were analyzed for identifying Medicare fraud. The isolated forest, an unsupervised machine learning algorithm, improves overall performance while detecting fraud based on outliers. The goal of this paper is to identify probable dishonest providers on the grounds of their allegations. The results obtained were more promising than those of existing techniques. An accuracy of around 98.76% is obtained using the isolated forest algorithm.


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1226
Author(s):  
Saeed Najafi-Zangeneh ◽  
Naser Shams-Gharneh ◽  
Ali Arjomandi-Nezhad ◽  
Sarfaraz Hashemkhani Zolfani

Companies always seek ways to make their professional employees stay with them to reduce extra recruiting and training costs. Predicting whether a particular employee may leave or not will help the company to make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction. An IBM HR dataset is chosen as the case study. Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage. This method is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of the feature in attrition prediction. The results show improvement in the F1-score performance measure due to the “max-out” feature selection method. Finally, the validity of parameters is checked by training the model for multiple bootstrap datasets. Then, the average and standard deviation of parameters are analyzed to check the confidence value of the model’s parameters and their stability. The small standard deviation of parameters indicates that the model is stable and is more likely to generalize well.
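The bootstrap stability check described at the end of this abstract can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (not the IBM HR dataset), the logistic regression is fitted by plain gradient descent, and every function name and hyperparameter is hypothetical.

```python
import math
import random
import statistics

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, n_iter=100):
    """Batch gradient descent on the log-loss; returns [intercept, w1, w2, ...]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def bootstrap_coefficients(X, y, n_boot, rng):
    """Refit on resampled-with-replacement datasets; return per-parameter mean and std."""
    n = len(X)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    columns = list(zip(*fits))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds

# Synthetic data (an assumption): only the first feature influences the label.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(150)]
y = [1 if rng.random() < sigmoid(2.0 * x1) else 0 for x1, _ in X]

means, stds = bootstrap_coefficients(X, y, n_boot=20, rng=rng)
for name, m, s in zip(["intercept", "w1", "w2"], means, stds):
    print(f"{name}: mean={m:.3f} std={s:.3f}")
```

A coefficient whose standard deviation is small relative to its mean is one whose sign and rough magnitude survive resampling, which is the sense of "stable" used in the abstract.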


2021 ◽  
pp. 002073142110174
Author(s):  
Md Mijanur Rahman ◽  
Fatema Khatun ◽  
Ashik Uzzaman ◽  
Sadia Islam Sami ◽  
Md Al-Amin Bhuiyan ◽  
...  

The novel coronavirus disease (COVID-19) has spread over 219 countries of the globe as a pandemic, creating alarming impacts on health care, socioeconomic environments, and international relationships. The principal objective of the study is to provide the current technological aspects of artificial intelligence (AI) and other relevant technologies and their implications for confronting COVID-19 and preventing the pandemic’s dreadful effects. This article presents AI approaches that have significant contributions in the fields of health care, then highlights and categorizes their applications in confronting COVID-19, such as detection and diagnosis, data analysis and treatment procedures, research and drug development, social control and services, and the prediction of outbreaks. The study addresses the link between the technologies and the epidemics as well as the potential impacts of technology in health care with the introduction of machine learning and natural language processing tools. It is expected that this comprehensive study will support researchers in modeling health care systems and drive further studies in advanced technologies. Finally, we propose future directions in research and conclude that persuasive AI strategies, probabilistic models, and supervised learning are required to tackle future pandemic challenges.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii158-ii158
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
James Fink ◽  
David Haynor ◽  
Eric Holland ◽  
...  

Abstract BACKGROUND Previously, we have shown that combined whole-exome sequencing (WES) and genome-wide somatic copy number alteration (SCNA) information can separate IDH1/2-wildtype glioblastoma into two prognostic molecular subtypes (Group 1 and Group 2) and that these subtypes cannot be distinguished by epigenetic or clinical features. However, the potential for radiographic features to discriminate between these molecular subtypes has not been established. METHODS Radiogenomic features (n=35,400) were extracted from 46 multiparametric, pre-operative magnetic resonance imaging (MRI) of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive, all of whom have corresponding WES and SCNA data in The Cancer Genome Atlas. We developed a novel feature selection method that leverages the structure of extracted radiogenomic MRI features to mitigate the dimensionality challenge posed by the disparity between the number of features and patients in our cohort. Seven traditional machine learning classifiers were trained to distinguish Group 1 versus Group 2 using our feature selection method. Our feature selection was compared to lasso feature selection, recursive feature elimination, and variance thresholding. RESULTS We are able to classify Group 1 versus Group 2 glioblastomas with a cross-validated area under the curve (AUC) score of 0.82 using ridge logistic regression and our proposed feature selection method, which reduces the size of our feature set from 35,400 to 288. An interrogation of the selected features suggests that features describing contours in the T2 abnormality region on the FLAIR MRI modality may best distinguish these two groups from one another. CONCLUSIONS We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups. This algorithm may be applied to future prospective studies to assess the utility of MRI as a surrogate for costly prognostic genomic studies.
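Of the baselines named in this abstract, variance thresholding is the simplest to show. The sketch below is a generic illustration, not the authors' code: the toy matrix, the threshold value, and the function name `variance_threshold` are assumptions.

```python
import statistics

def variance_threshold(X, threshold=0.0):
    """Return indices of columns whose population variance exceeds `threshold`."""
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if statistics.pvariance(column) > threshold:
            keep.append(j)
    return keep

# Toy feature matrix (rows are patients, columns are features).
X = [
    [1.0, 0.0, 3.1],
    [1.0, 0.1, 2.9],
    [1.0, 0.0, 3.0],
    [1.0, 0.1, 8.0],
]
kept = variance_threshold(X, threshold=0.01)
X_reduced = [[row[j] for j in kept] for row in X]
print(kept)
print(X_reduced)
```

The filter ignores the labels entirely, so it removes constant or near-constant columns but cannot tell an informative feature from a merely noisy one. That limitation is one reason it serves as a baseline against label-aware methods such as lasso or recursive feature elimination.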


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.

