Origin–Destination Matrix Estimation and Prediction from Socioeconomic Variables Using Automatic Feature Selection Procedure-Based Machine Learning Model

2021 ◽  
Vol 147 (4) ◽  
pp. 04021056
Author(s):  
P. J. Rodríguez-Rueda ◽  
J. J. Ruiz-Aguilar ◽  
J. González-Enrique ◽  
I. Turias
Author(s):  
Hwayoung Park ◽  
Sungtae Shin ◽  
Changhong Youm ◽  
Sang-Myung Cheon ◽  
Myeounggon Lee ◽  
...  

Abstract Background: Freezing of gait (FOG) is a sensitive problem, which is caused by motor control deficits and requires greater attention during postural transitions such as turning in people with Parkinson’s disease (PD). However, the turning characteristics have not yet been extensively investigated to distinguish between people with PD with and without FOG (freezers and non-freezers) based on full-body kinematic analysis during the turning task. The objectives of this study were to identify the machine learning model that best classifies people with PD and freezers and reveal the associations between clinical characteristics and turning features based on feature selection through stepwise regression. Methods: The study recruited 77 people with PD (31 freezers and 46 non-freezers) and 34 age-matched older adults. The 360° turning task was performed at the preferred speed for the inner step of the more affected limb. All experiments on the people with PD were performed in the “Off” state of medication. The full-body kinematic features during the turning task were extracted using the three-dimensional motion capture system. These features were selected via stepwise regression. Results: In feature selection through stepwise regression, five and six features were identified to distinguish between people with PD and controls and between freezers and non-freezers (PD and FOG classification problem), respectively. The machine learning model accuracies revealed that the random forest (RF) model had 98.1% accuracy when using all turning features and 98.0% accuracy when using the five features selected for PD classification. In addition, RF and logistic regression showed accuracies of 79.4% when using all turning features and 72.9% when using the six selected features for FOG classification. Conclusion: We suggest that our study leads to understanding of the turning characteristics of people with PD and freezers during the 360° turning task for the inner step of the more affected limb and may help improve the objective classification and clinical assessment by disease progression using turning features.


Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in the volume of clinical data, prediction and data analysis have become very difficult. Machine learning models make it easier to work with such large datasets. However, a machine learning model faces many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Fengyi Zhang ◽  
Xinyuan Cui ◽  
Renrong Gong ◽  
Chuan Zhang ◽  
Zhigao Liao

This study aimed to provide effective methods for identifying surgeries with high cancellation risk based on machine learning models and to analyze the key factors that affect identification performance. The data, which focus on elective urologic surgeries, covered the period from January 1, 2013, to December 31, 2014, at West China Hospital in China. All surgeries were scheduled one day in advance, and all cancellations were of institutional resource- and capacity-related types. Feature selection strategies, machine learning models, and sampling methods are among the most discussed topics in general machine learning research and have a direct impact on the performance of machine learning models. Hence, they were combined systematically to generate complete schemes for machine learning-based identification of surgery cancellations. The results showed the feasibility and robustness of identifying surgeries with high cancellation risk, with a maximum area under the curve (AUC) of 0.7199 for the random forest model with original sampling using the backward selection strategy. In addition, a one-sided DeLong test and a sum of squared errors analysis were conducted to measure the effects of feature selection strategy, machine learning model, and sampling method on the identification of surgeries with high cancellation risk, and the selection of the machine learning model was identified as the key factor affecting identification. This study offers methodology and insights for identifying the key experimental factors in identifying surgery cancellations, and it is helpful for further research on machine learning-based identification of surgeries with high cancellation risk.
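The evaluation loop described in this abstract (backward feature selection scored by AUC) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's pipeline: the feature names `f0`, `f1`, `f2`, the toy data, and the scoring rule (AUC of the sum of the selected columns) are all hypothetical stand-ins for the trained random forest and the hospital data.

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def backward_selection(features, score_fn):
    """Greedy backward elimination: start from all features and keep dropping
    the feature whose removal gives the best score, as long as the score
    does not get worse."""
    current = list(features)
    best = score_fn(current)
    while len(current) > 1:
        candidates = [(score_fn([f for f in current if f != drop]), drop)
                      for drop in current]
        cand_score, drop = max(candidates)
        if cand_score < best:
            break
        best = cand_score
        current.remove(drop)
    return current, best


# Hypothetical toy data: 1 = cancelled surgery, 0 = performed surgery.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "f0": [9, 7, 8, 3, 4, 2],   # informative feature
    "f1": [1, 5, 2, 4, 6, 3],   # noisy feature
    "f2": [5, 5, 5, 5, 5, 5],   # constant feature
}


def toy_score(subset):
    # Stand-in for "train a model on this subset and measure its AUC":
    # the risk score is simply the sum of the selected columns.
    risk = [sum(columns[f][i] for f in subset) for i in range(len(labels))]
    return auc(labels, risk)


selected, best_auc = backward_selection(list(columns), toy_score)
print("selected features:", selected)
print("AUC of selected subset:", best_auc)
```

In a real experiment, `toy_score` would be replaced by a cross-validated AUC of the chosen classifier, and the comparison between schemes would rely on a statistical test such as the DeLong test rather than on raw AUC differences.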


2020 ◽  
Author(s):  
Ivan Alejandro Garcia Ramirez ◽  
Arturo Calderon ◽  
Andrés Méndez ◽  
Susana Ortega

Abstract Motivation: Datasets with high dimensionality represent a challenge to existing learning methods. The presence of irrelevant and redundant features in a dataset can degrade the performance of the models inferred from it. In large datasets, manual management of features tends to be impractical. Therefore, the development of automatic discovery techniques to remove useless features has attracted increasing interest. In this paper, we propose a novel framework to select relevant features in supervised datasets. Availability: This tool can be downloaded from https://github.com/ivangarcia88/selection Results: This tool identifies relevant features and removes redundant ones, reducing the computation time for training a machine learning model while improving its performance.


2020 ◽  
Vol 10 (15) ◽  
pp. 5046
Author(s):  
Andreas Nicolaou ◽  
Stavros Shiaeles ◽  
Nick Savage

Insider threats have become a considerable information security issue that governments and organizations must face. The implementation of security policies and procedures may not be enough to protect organizational assets. Even with the evolution of information and network security technology, the threat from insiders is increasing. Many researchers are approaching this issue with various methods in order to develop a model that will help organizations to reduce their exposure to the threat and prevent damage to their assets. In this paper, we approach the insider threat problem and attempt to mitigate it by developing a machine learning model based on Bio-inspired computing. The model was developed by using an existing unsupervised learning algorithm for anomaly detection and we fitted the model to a synthetic dataset to detect outliers. We explore swarm intelligence algorithms and their performance on feature selection optimization for improving the performance of the machine learning model. The results show that swarm intelligence algorithms perform well on feature selection optimization and the generated, near-optimal, subset of features has a similar performance to the original one.


2021 ◽  
Vol 11 ◽  
Author(s):  
Cheng Chang ◽  
Xiaoyan Sun ◽  
Gang Wang ◽  
Hong Yu ◽  
Wenlu Zhao ◽  
...  

Objectives: Anaplastic lymphoma kinase (ALK) rearrangement status examination has been widely used in the clinic for non-small cell lung cancer (NSCLC) patients in order to find patients who can be treated with targeted ALK inhibitors. This study aimed to non-invasively predict the ALK rearrangement status in lung adenocarcinomas by developing a machine learning model that combines PET/CT radiomic features and clinical characteristics. Methods: Five hundred twenty-six patients with lung adenocarcinoma and a PET/CT scan examination were enrolled from February 2016 to March 2019, including 109 positive and 417 negative patients for ALK rearrangements. The Artificial Intelligence Kit software was used to extract radiomic features from the PET/CT images. The maximum relevance minimum redundancy (mRMR) method and least absolute shrinkage and selection operator (LASSO) logistic regression were further employed to select the most distinguishable radiomic features to construct predictive models. The mRMR is a feature selection method that selects the features with high correlation to the pathological results (maximum correlation) while retaining the features with minimum correlation among themselves (minimum redundancy). LASSO is a statistical method whose main purpose is feature selection and regularization of the data model. The LASSO method regularizes model parameters by shrinking the regression coefficients, reducing some of them to zero. The feature selection phase occurs after the shrinkage, where every feature with a non-zero coefficient is selected for use in the model. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the models, and the performance of different models was compared by the DeLong test. Results: A total of 22 radiomic features were extracted from PET/CT images for constructing the PET/CT radiomic model, and the majority of these features were CT-based (20 out of 22); only 2 PET features were included (PET percentile 10 and PET difference entropy). Moreover, three clinical features associated with ALK mutation (age, burr and pleural effusion) were also employed to construct a combined PET/CT-clinical model. We found that this combined PET/CT-clinical model has a significant advantage in predicting the ALK mutation status in the training group (AUC = 0.87) and the testing group (AUC = 0.88) compared with the clinical model alone in the training group (AUC = 0.76) and the testing group (AUC = 0.74), respectively. However, there was no significant difference between the combined model and the PET/CT radiomic model. Conclusions: This study demonstrated that a PET/CT radiomics-based machine learning model has the potential to be used as a non-invasive diagnostic method to help diagnose ALK mutation status for lung adenocarcinoma patients in the clinic.
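The shrink-then-select behaviour of LASSO described in this abstract can be illustrated with a small sketch. Note the assumptions: this is a plain least-squares LASSO solved by cyclic coordinate descent, not the LASSO logistic regression used in the study; there is no intercept term; and the data, the penalty `alpha=0.5`, and all function names are hypothetical choices made for the example.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1
    by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic data (an assumption for illustration): six features,
# of which only features 0 and 2 influence the target.
rng = random.Random(0)
n, p = 300, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)

# Selection step: keep every feature whose coefficient survived the shrinkage.
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("coefficients:", [round(w, 3) for w in weights])
print("selected feature indices:", selected)
```

The surviving coefficients are smaller in magnitude than the true effects (3 and -2) because the L1 penalty shrinks every coefficient, which is why LASSO is often used only to pick features before a final model is refitted.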

