Threshold benchmarking for feature ranking techniques

In prediction modeling, the subset of features selected from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by their importance, but there is no consensus on how many top-ranked features to retain. It is therefore important to identify a threshold value or range for removing redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popular ranker techniques and six machine learning techniques to deduce a relationship between the total number of input features (N) and the threshold range. The area under the curve analysis shows that ≃ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
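
As a rough illustration of the reported relationship between N and the threshold range, the sketch below computes the log2(N) lower limit and the 33% and 50% feature counts. The function name and the choice to round up to whole features are assumptions made here for illustration; the abstract does not state how fractional counts are handled.

```python
import math


def ranker_threshold_range(n_features):
    """Return (log2 lower limit, ~33% count, 50% count) for N input features.

    Rounding up with math.ceil is an assumption of this sketch,
    not something stated in the abstract.
    """
    if n_features < 1:
        raise ValueError("n_features must be a positive integer")
    lower_limit = max(1, math.ceil(math.log2(n_features)))
    one_third = math.ceil(n_features / 3)
    one_half = math.ceil(n_features / 2)
    return lower_limit, one_third, one_half


if __name__ == "__main__":
    # Hypothetical feature-set size, chosen only for illustration.
    print(ranker_threshold_range(20))
```

For a hypothetical set of 20 features, this would suggest keeping at least 5 top-ranked features and typically between 7 and 10.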

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

Assessment of defect prediction models using machine learning techniques for object-oriented systems

2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) ◽

10.1109/icrito.2016.7785021 ◽

2016 ◽

Cited By ~ 1

Author(s):

Ruchika Malhotra ◽

Shivani Shukla ◽

Geet Sawhney

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Object Oriented ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Learning Techniques ◽

Defect Prediction Models ◽

Object Oriented Systems

A Review of Statistical and Machine Learning Techniques for Microvascular Complications in Type 2 Diabetes

Current Diabetes Reviews ◽

10.2174/1573399816666200511003357 ◽

2020 ◽

Vol 16 ◽

Author(s):

Nitigya Sambyal ◽

Poonam Saini ◽

Rupali Syal

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Clinical Medicine ◽

Microvascular Complications ◽

Descriptive Analysis ◽

Machine Learning Techniques ◽

World Health ◽

Public Health Issue ◽

Learning Techniques ◽

Health Organization

Background and Introduction: Diabetes mellitus (DM) is a metabolic disorder that has emerged as a serious public health issue worldwide. According to the World Health Organization (WHO), without interventions, the number of diabetes cases is expected to be at least 629 million by 2045. Uncontrolled diabetes gradually leads to progressive damage to the eyes, heart, kidneys, blood vessels and nerves. Method: The paper presents a critical review of existing statistical and Artificial Intelligence (AI) based machine learning techniques with respect to DM complications, namely retinopathy, neuropathy and nephropathy. The statistical and machine learning analytic techniques are used to structure the subsequent content review. Result: It has been inferred that statistical analysis can help only in inferential and descriptive analysis, whereas AI based machine learning models can also provide actionable prediction models for faster and more accurate diagnosis of complications associated with DM. Conclusion: The integration of AI based analytics techniques like machine learning and deep learning in clinical medicine will result in improved disease management through faster disease detection and reduced treatment cost.

Sustainability Performance Assessment Using Self-Organizing Maps (SOM) and Classification and Ensembles of Regression Trees (CART)

Sustainability ◽

10.3390/su13073870 ◽

2021 ◽

Vol 13 (7) ◽

pp. 3870

Author(s):

Mehrbakhsh Nilashi ◽

Shahla Asadi ◽

Rabab Ali Abumalloh ◽

Sarminah Samad ◽

Fahad Ghabban ◽

...

Keyword(s):

Performance Assessment ◽

Prediction Accuracy ◽

Prediction Models ◽

Sustainability Assessment ◽

Regression Trees ◽

Machine Learning Techniques ◽

Coefficient Of Determination ◽

Sustainability Performance ◽

Learning Techniques ◽

Self Organizing

This study aims to develop a new approach based on machine learning techniques to assess sustainability performance. Two main dimensions of sustainability, ecological sustainability and human sustainability, were considered in this study. A set of sustainability indicators was used, and the research method was developed using cluster analysis and prediction learning techniques. A Self-Organizing Map (SOM) was applied for data clustering, while Classification and Regression Trees (CART) were applied to assess sustainability performance. The proposed method was evaluated on the Sustainability Assessment by Fuzzy Evaluation (SAFE) dataset, which comprises various indicators of sustainability performance in 128 countries. Eight clusters were found in the data through the SOM clustering technique. A prediction model was built in each cluster through the CART technique. In addition, an ensemble of CART was constructed in each SOM cluster to increase the prediction accuracy of CART. All prediction models were assessed through the adjusted coefficient of determination. The results demonstrated that the prediction accuracy values were high in all CART models. The results also indicated that the method combining ensembles of CART with clustering provides higher prediction accuracy than individual CART models. The main advantage of the proposed method is its ability to automate the extraction of decision rules from big data for prediction models. The method proposed in this study could be implemented as an effective tool for sustainability performance assessment.
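
For readers unfamiliar with the adjusted coefficient of determination mentioned above, the following minimal sketch shows the standard textbook formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), for n samples and p predictors. The function names are hypothetical, and this is not taken from the study's own implementation.

```python
def r_squared(y_true, y_pred):
    """Plain coefficient of determination R^2."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2: penalises R^2 for the number of predictors used."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(adjusted_r_squared(0.9, n_samples=11, n_predictors=2))
```

Because the penalty grows with the number of predictors, the adjusted value is a fairer basis than plain R^2 for comparing models built on clusters of different sizes.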

Autism Spectrum Disorder Identification Using Polynomial Distribution based Convolutional Neural Network

NeuroQuantology ◽

10.14704/nq.2021.19.2.nq21013 ◽

2021 ◽

Vol 19 (2) ◽

pp. 19-30

Author(s):

G. Nagarajan ◽

Dr.A. Mahabub Basha ◽

R. Poornima

Keyword(s):

Neural Network ◽

Prediction Models ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Machine Learning Techniques ◽

Hybrid Technique ◽

Learning Techniques ◽

Chicken Swarm Optimization ◽

Sensitivity Specificity ◽

Polynomial Distribution

ASD (Autism Spectrum Disorder) is one of the main psychiatric disorders found in humans. It manifests as a mental disorder that restricts a person's communication, language and speech abilities. Even though a cure is complex and practically impossible, early detection is required to mitigate its intensity. ASD has no pre-defined age of onset. This work proposes a system for effectively predicting ASD based on MLTs (Machine Learning Techniques). Hybrid APMs (Autism Prediction Models) that combine multiple techniques such as RF (Random Forest), CART (Classification and Regression Trees) and RF-ID3 (RF-Iterative Dichotomiser 3) perform well, but face issues with memory usage, execution time and inadequate feature selection. To overcome these hurdles, this work proposes a hybrid technique that combines the MCSO (Modified Chicken Swarm Optimization) and PDCNN (Polynomial Distribution based Convolution Neural Network) algorithms. Experimental results show that the proposed scheme achieves improved accuracy, precision, sensitivity, specificity and FPRs (False Positive Rates), along with lower time complexity, when compared to other methods.

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

10.21203/rs.3.rs-22670/v3 ◽

2020 ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E Braat ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Survival Prediction ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.
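
To give a sense of what the Brier score measures, here is a minimal sketch of its simplest form: the mean squared difference between predicted probabilities and observed binary outcomes at one fixed time horizon. This is a simplification under stated assumptions. It ignores censoring, which survival analyses typically handle with inverse-probability-of-censoring weights, and it does not integrate over time as the Integrated Brier Score does. The function name and the data are hypothetical.

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared error between binary outcomes (0/1) and predicted probabilities.

    Simplified, uncensored form at a single time horizon; lower is better.
    """
    if len(outcomes) != len(predicted_probs) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, predicted_probs)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical event indicators and predicted event probabilities.
    events = [1, 0, 1, 0]
    probs = [0.9, 0.1, 0.8, 0.3]
    print(brier_score(events, probs))
```

A score of 0 corresponds to perfect probabilistic predictions, while always predicting 0.5 gives 0.25, so the measure rewards calibration as well as discrimination.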

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and prognosis of cancer types have become essential in cancer research, since they may help to improve the clinical treatment of patients. The significance of categorizing cancer patients into higher- or lower-risk categories has prompted numerous research teams from the bioscience and genomics fields to investigate the use of machine learning (ML) algorithms in cancer diagnosis and treatment. These methods have therefore been used with the goal of modeling the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. They include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been used extensively in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. This paper presents an overview of current machine learning approaches used in modeling cancer development. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we present the most recent papers that have used these approaches to predict cancer risk or patient outcomes in order to better understand cancer.

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

BMC Medical Research Methodology ◽

10.1186/s12874-020-01153-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E. Braat ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables. Trial registration: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.

Computer-aided prediction and design of IL-6 inducing peptides: IL-6 plays a crucial role in COVID-19

Briefings in Bioinformatics ◽

10.1093/bib/bbaa259 ◽

2020 ◽

Cited By ~ 2

Author(s):

Anjali Dhall ◽

Sumeet Patiyal ◽

Neelam Sharma ◽

Salman Sadullah Usmani ◽

Gajendra P S Raghava

Keyword(s):

Scientific Community ◽

Prediction Models ◽

Vital Role ◽

Machine Learning Techniques ◽

Validation Dataset ◽

Independent Validation ◽

Immune Epitope ◽

Learning Techniques ◽

Wide Range ◽

Immune Epitope Database

Abstract Interleukin 6 (IL-6) is a pro-inflammatory cytokine that stimulates acute phase responses, hematopoiesis and specific immune reactions. Recently, it was found that IL-6 plays a vital role in the progression of COVID-19, which is responsible for the high mortality rate. To help the scientific community fight COVID-19, we have developed a method for predicting IL-6 inducing peptides/epitopes. The models were trained and tested on 365 experimentally validated IL-6 inducing peptides and 2991 non-inducing peptides extracted from the immune epitope database. Initially, 9149 features of each peptide were computed using Pfeature, which were reduced to 186 features using the SVC-L1 technique. These features were ranked based on their classification ability, and the top 10 features were used for developing prediction models. A wide range of machine learning techniques has been deployed to develop models. The Random Forest-based model achieves a maximum AUROC of 0.84 and 0.83 on the training and independent validation datasets, respectively. We have also identified IL-6 inducing peptides in different proteins of SARS-CoV-2, using our best models to design a vaccine against COVID-19. A web server named IL-6Pred and a standalone package have been developed for predicting, designing and screening IL-6 inducing peptides (https://webs.iiitd.edu.in/raghava/il6pred/).
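
The two generic steps described above, keeping the top-ranked features and scoring a model by AUROC, can be sketched in plain Python as follows. This is an illustrative sketch only: the feature names, scores and predictions are hypothetical, the helper names are assumptions, and nothing here reproduces Pfeature, SVC-L1 or the authors' pipeline.

```python
def top_k_features(feature_scores, k):
    """Return the names of the k highest-scoring features, best first."""
    ranked = sorted(feature_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical per-feature ranking scores.
    print(top_k_features({"f1": 0.2, "f2": 0.9, "f3": 0.5}, k=2))
    # Hypothetical labels (1 = inducing) and model scores.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a demonstration but would normally be replaced by a rank-based computation on datasets of the size described in the abstract.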

Machine Learning for Emergency Department Management

International Journal of Information Systems in the Service Sector ◽

10.4018/ijisss.2019070102 ◽

2019 ◽

Vol 11 (3) ◽

pp. 19-36 ◽

Cited By ~ 1

Author(s):

Sofia Benbelkacem ◽

Farid Kadri ◽

Baghdad Atmani ◽

Sondès Chaabane

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Length Of Stay ◽

Prediction Models ◽

Pediatric Emergency ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Emergency Department Management ◽

Increasing Demand ◽

Set Up

Nowadays, emergency department services are confronted with increasing demand. This situation causes emergency department overcrowding, which often increases the length of stay of patients and leads to strain situations. To overcome this issue, emergency department managers must predict the length of stay. In this work, the researchers propose to use machine learning techniques to set up a methodology that supports the management of emergency departments (EDs). The target of this work is to predict the length of stay of patients in the ED in order to prevent strain situations. The experiments were carried out on a real database collected from the pediatric emergency department (PED) in Lille regional hospital center, France. Different machine learning techniques have been used to build the best prediction models. The results seem better with the Naive Bayes, C4.5 and SVM methods. In addition, the models based on a subset of attributes proved to be more efficient than models based on the full set of attributes.
