Verifying Robustness of Gradient Boosted Models

Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VERIGB, a tool for quantifying the robustness of gradient boosted models. VERIGB encodes the model and the robustness property as an SMT formula, which enables state-of-the-art verification tools to prove the model’s robustness. We extensively evaluate VERIGB on publicly available datasets and demonstrate that it can verify large models. Finally, we show that some model configurations tend to be inherently more robust than others.
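
To make "encode the model and the robustness property as an SMT formula" concrete, here is a minimal sketch of one possible encoding. It is not VERIGB's actual encoding. It assumes a hypothetical tree format, a binary classifier whose score is the plain sum of leaf values (no base score) with class decided by the sign of that score, and an L-infinity perturbation ball of radius `eps`. Each tree becomes a nested `ite` term. The query asks for a point inside the ball whose predicted class differs from the prediction at `x`, so a solver answer of `unsat` means the model is robust at `x`.

```python
from fractions import Fraction

# Hypothetical tree format (an assumption, not VERIGB's): ("leaf", value) or
# ("split", feature, threshold, left, right); inputs with x[feature] < threshold go left.


def smt_real(v):
    """Render a number as an exact SMT-LIB real literal, e.g. 0.5 -> (/ 1.0 2.0)."""
    fr = Fraction(v)
    lit = f"(/ {abs(fr.numerator)}.0 {fr.denominator}.0)"
    return f"(- {lit})" if fr < 0 else lit


def tree_term(node):
    """Encode one regression tree as a nested if-then-else term over x0, x1, ..."""
    if node[0] == "leaf":
        return smt_real(node[1])
    _, feat, thr, left, right = node
    return f"(ite (< x{feat} {smt_real(thr)}) {tree_term(left)} {tree_term(right)})"


def predict(node, x):
    """Evaluate one tree concretely, using the same split semantics as tree_term."""
    while node[0] == "split":
        _, feat, thr, left, right = node
        node = left if x[feat] < thr else right
    return node[1]


def robustness_query(trees, x, eps):
    """SMT-LIB query that is satisfiable iff some x' with |x'_i - x_i| <= eps flips
    the sign of the summed tree scores; an `unsat` answer means robust at x."""
    lines = ["(set-logic QF_LRA)"]
    for i, xi in enumerate(x):
        lo = Fraction(xi) - Fraction(eps)
        hi = Fraction(xi) + Fraction(eps)
        lines.append(f"(declare-const x{i} Real)")
        lines.append(f"(assert (<= {smt_real(lo)} x{i} {smt_real(hi)}))")
    terms = [tree_term(t) for t in trees]
    score = terms[0] if len(terms) == 1 else f"(+ {' '.join(terms)})"
    positive = sum(predict(t, x) for t in trees) >= 0
    # Ask the solver for a nearby input whose class differs from the one predicted at x.
    flipped = f"(< {score} 0.0)" if positive else f"(>= {score} 0.0)"
    lines.append(f"(assert {flipped})")
    lines.append("(check-sat)")
    return "\n".join(lines)
```

The returned text can be passed to any SMT-LIB 2 solver, such as Z3. Literals are emitted as exact rationals so that floating-point thresholds are not rounded when the formula is built.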

Mode Choice Prediction using Machine Learning Technique for A Door-to-Door Journey in Kuantan City

Mekatronika ◽

10.15282/mekatronika.v2i1.6745 ◽

2020 ◽

Vol 2 (1) ◽

pp. 73-78

Author(s):

Nur Fahriza Mohd Ali ◽

Ahmad Farhan Mohd Sadullah ◽

Anwar P.P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Rabiu Muazu Musa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mode Choice ◽

Learning Models ◽

Machine Learning Technique ◽

Travel Mode Choice ◽

Testing Data ◽

Learning Technique ◽

The City ◽

Machine Learning Models

A door-to-door journey in a public transportation system is a notable concept that is being actively promoted to encourage users to consider public transport as an important alternative. The door-to-door journey integrates the travel segments from home to destination, including all visible amenities, so users’ travel-time preferences for these key segments need to be understood. Machine learning techniques have been regarded as a robust computational approach for forecasting travel mode choice. However, which model is the best predictor remains an open question. To address this issue, we compared the travel mode choice prediction performance of several prominent machine learning models, namely Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN) and Support Vector Machine (SVM), for users in the city of Kuantan. Data were collected in Kuantan City through a Revealed/Stated Preferences (RPSP) survey conducted between 8:00 AM and 5:00 PM on weekdays. The collected data were split 80:20 into training and testing sets, on which the models were evaluated. The results showed that Random Forest outperformed the other evaluated models, with classification accuracies of up to 68.3% on the training data and 61.3% on the testing data. In summary, Random Forest performs well on both the training and testing data and is considered the best predictor in this research for forecasting users’ mode choice in the city of Kuantan.
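
The evaluation protocol described above (a single 80:20 hold-out split, with accuracy reported on both the training and testing portions) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions. Only k-nearest neighbours stands in for the five compared models. The numeric feature vectors (for example, per-segment travel times) and mode labels are hypothetical, and the shuffling seed is arbitrary.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle once, then hold out `test_ratio` of the samples (80:20 by default)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, eval_x, eval_y, k=3):
    """Fraction of evaluation samples that the kNN model labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)
```

Usage: `tr_x, te_x, tr_y, te_y = train_test_split(X, y)`, then compare `accuracy(tr_x, tr_y, tr_x, tr_y)` (training accuracy) with `accuracy(tr_x, tr_y, te_x, te_y)` (testing accuracy). A gap between the two, like the 68.3% versus 61.3% reported for Random Forest, is what a single hold-out split exposes.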

Breast Cancer Recurrence Prediction Model Using Machine Learning Technique: State of the Art, Challenges and Future Direction

10.1109/icrito51393.2021.9596179 ◽

2021 ◽

Author(s):

Mohan Kumar ◽

Sunil Kumar Khatri ◽

Masoud Mohammadian

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Prediction Model ◽

State Of The Art ◽

Cancer Recurrence ◽

Breast Cancer Recurrence ◽

Machine Learning Technique ◽

Recurrence Prediction ◽

Learning Technique ◽

Future Direction

Effectiveness, Explainability and Reliability of Machine Meta-Learning Methods for Predicting Mortality in Patients with COVID-19: Results of the Brazilian COVID-19 Registry

10.1101/2021.11.01.21265527 ◽

2021 ◽

Author(s):

Bruno Barbosa Miranda de Paiva ◽

Polianna Delfino Pereira ◽

Claudio Moises Valiense de Andrade ◽

Virginia Mara Reis Gomes ◽

Maria Clara Pontello Barbosa Lima ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

State Of The Art ◽

Laboratory Data ◽

Machine Learning Algorithms ◽

Training Data ◽

Learning Models ◽

Learning Methods ◽

Meta Learning ◽

Machine Learning Models

Objective: To provide a thorough comparative study of state-of-the-art machine learning methods and statistical methods for determining in-hospital mortality in COVID-19 patients using data available upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome with the accuracy of the methods; and to investigate how explainable the predictions produced by the most effective methods are. Materials and Methods: De-identified data were obtained from COVID-19-positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistical models were trained on this prediction task using a k-fold cross-validation procedure, from which we assessed performance and interpretability metrics. Results: Stacking of machine learning models improved on the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be highly interpretable and reliable, yielding more accurate predictions while providing a good explanation of why. Conclusion: The best results were obtained using the Stacking meta-learning ensemble model. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistical models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning
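
The core of Stacking is that the meta-learner is trained on predictions the base models made for samples they never saw during training. A minimal stdlib-only sketch of that out-of-fold step follows. For brevity it uses contiguous folds without shuffling or stratification, and the registry study's exact fold scheme is not reproduced. `fit_prior` is a hypothetical base learner included only for illustration. Any function that takes `(X, y)` and returns a predictor could be used in its place.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold(fit, X, y, k=5):
    """Predict every sample with a base model trained on the other k-1 folds."""
    preds = [None] * len(X)
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(X[i])
    return preds


def meta_features(fits, X, y, k=5):
    """One column of out-of-fold predictions per base learner; a stacking
    meta-learner is then trained on these rows against y."""
    columns = [out_of_fold(fit, X, y, k) for fit in fits]
    return [list(row) for row in zip(*columns)]


def fit_prior(X, y):
    """Hypothetical base learner: always predicts the training-fold death rate."""
    rate = sum(y) / len(y)
    return lambda x: rate
```

Because no sample's meta-feature comes from a model that saw its label, the meta-learner cannot simply learn to trust an overfit base model.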

The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations

Computer Graphics Forum ◽

10.1111/cgf.14034 ◽

2020 ◽

Vol 39 (3) ◽

pp. 713-756 ◽

Cited By ~ 1

Author(s):

A. Chatzimparmpas ◽

R. M. Martins ◽

I. Jusufi ◽

K. Kucher ◽

F. Rossi ◽

...

Keyword(s):

Machine Learning ◽

State Of The Art ◽

The State ◽

Learning Models ◽

Machine Learning Models

A Systematised State-of-the-Art Review of Machine Learning Models to Aid Clinical Decision-Making in Epilepsy

SSRN Electronic Journal ◽

10.2139/ssrn.3541140 ◽

2020 ◽

Author(s):

Edward Jonathan Han-Burgess ◽

Richard J. Stevens

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

State Of The Art ◽

Clinical Decision ◽

Learning Models ◽

Machine Learning Models

Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model

Information Sciences ◽

10.1016/j.ins.2019.01.076 ◽

2019 ◽

Vol 484 ◽

pp. 302-337 ◽

Cited By ~ 17

Author(s):

Antonio Rafael Sabino Parmezan ◽

Vinicius M.A. Souza ◽

Gustavo E.A.P.A. Batista

Keyword(s):

Machine Learning ◽

Time Series ◽

State Of The Art ◽

Time Series Prediction ◽

The State ◽

Learning Models ◽

Machine Learning Models

Decoding defect statistics from diffractograms via machine learning

npj Computational Materials ◽

10.1038/s41524-021-00539-z ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Cody Kunka ◽

Apaar Shanker ◽

Elton Y. Chen ◽

Surya R. Kalidindi ◽

Rémi Dingreville

Keyword(s):

Machine Learning ◽

Spatial Distribution ◽

State Of The Art ◽

Structural Information ◽

Atomistic Simulations ◽

Learning Models ◽

Machine Learning Model ◽

Electron Diffractograms ◽

Material Information ◽

Machine Learning Models

Diffraction techniques can probe materials powerfully and nondestructively while maintaining high resolution in both space and time. Unfortunately, these characterizations have been limited, and sometimes even erroneous, because of the difficulty of decoding the desired material information from features of the diffractograms. Currently, these features are identified non-comprehensively via human intuition, so the resulting models can predict only a subset of the available structural information. In the present work we show (i) how to compute machine-identified features that fully summarize a diffractogram and (ii) how to employ machine learning to reliably connect these features to an expanded set of structural statistics. To exemplify this framework, we assessed virtual electron diffractograms generated from atomistic simulations of irradiated copper. When based on machine-identified features rather than human-identified features, our machine learning model predicted not only a one-point statistic (i.e., density) but also a two-point statistic (i.e., spatial distribution) of the defect population. Hence, this work demonstrates that machine learning models that take machine-identified features as input significantly advance the state of the art for accurately and robustly decoding diffractograms.
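
The difference between a one-point statistic (defect density) and a two-point statistic (spatial distribution) can be illustrated on a toy lattice. This is a generic illustration, not the authors' feature pipeline or their copper simulations. It assumes a hypothetical periodic 1-D binary map `m`, where 1 marks a defect site. The two-point autocorrelation gives the probability that two sites a distance `r` apart are both defects, and its value at `r = 0` equals the defect fraction.

```python
def one_point(m):
    """Fraction of lattice sites occupied by defects in a binary map."""
    return sum(m) / len(m)


def two_point(m):
    """Periodic two-point autocorrelation f[r] = (1/N) * sum_s m[s] * m[(s + r) % N]."""
    n = len(m)
    return [sum(m[s] * m[(s + r) % n] for s in range(n)) / n for r in range(n)]
```

Two maps with the same `one_point` value can have very different `two_point` curves. Clustered defects keep the correlation high at small `r`, whereas evenly spaced defects produce a periodic pattern. This is why the two-point statistic carries spatial-distribution information that density alone does not.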

Integrating a Low-Cost Electronic Nose and Machine Learning Modelling to Assess Coffee Aroma Profile and Intensity

Sensors ◽

10.3390/s21062016 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2016

Author(s):

Claudia Gonzalez Viejo ◽

Eden Tongson ◽

Sigfredo Fuentes

Keyword(s):

Machine Learning ◽

Electronic Nose ◽

Low Cost ◽

High Accuracy ◽

Quality Trait ◽

Learning Models ◽

High Quality ◽

Aroma Profile ◽

Important Quality ◽

Machine Learning Models

Aroma is one of the main attributes that consumers consider when appreciating and selecting a coffee; hence, it is considered an important quality trait. However, the most common methods to assess aroma rely on expensive equipment or on human senses through sensory evaluation, which is time-consuming and requires highly trained assessors to avoid subjectivity. Therefore, this study aimed to estimate coffee intensity and aromas using a low-cost, portable electronic nose (e-nose) and machine learning modeling. For this purpose, triplicates of six commercial coffee samples with different intensity levels were used. Two machine learning models were developed based on artificial neural networks, using the e-nose data as inputs to (i) classify the samples into low, medium and high intensity (Model 1) and (ii) predict the relative abundance of 45 different aromas (Model 2). Results showed that it is possible to estimate the intensity of coffees with high accuracy (98%; Model 1) and to predict the specific aromas with a high correlation coefficient (R = 0.99; Model 2), and no under- or overfitting of the models was detected. The proposed contactless, nondestructive, rapid, reliable and low-cost method proved effective in evaluating volatile compounds in coffee and could potentially be applied at all stages of the production process to detect undesirable characteristics in time and ensure high-quality products.
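
The regression result for Model 2 is reported as a correlation coefficient R. A minimal sketch of the standard Pearson R between observed and predicted values follows. The abstract does not state whether the 45 aromas were pooled into one computation or scored separately, so the inputs in any usage are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # One square root of the product keeps perfectly linear data at exactly +/-1.
    return cov / math.sqrt(var_o * var_p)
```

R measures linear agreement, not absolute error. Predictions that are consistently offset or scaled can still reach R close to 1, so R is usually read alongside an error measure.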

Application of Machine Learning in Rhinology: A State of the Art Review

Korean Journal of Otorhinolaryngology - Head and Neck Surgery ◽

10.3342/kjorl-hns.2020.00633 ◽

2020 ◽

Vol 63 (8) ◽

pp. 341-349

Author(s):

Myeong Sang Yu

Keyword(s):

Machine Learning ◽

Big Data ◽

Deep Learning ◽

Medical Records ◽

State Of The Art ◽

Machine Learning Techniques ◽

Machine Learning Technique ◽

Learning Techniques ◽

Learning Technique ◽

Machine Learning Applications

The revolutionary development of artificial intelligence (AI), such as machine learning and deep learning, has made it one of the most important technologies in many industries, and it is also driving major changes in health care. The big data obtained from electronic medical records and digitized images have accelerated the application of AI technologies in medical fields. Machine learning techniques can handle the complexity of big data, to which traditional statistics are difficult to apply. Recently, deep learning techniques, including convolutional neural networks, have been considered promising machine learning techniques for medical imaging applications. In the era of precision medicine, otolaryngologists need to understand the potential, pitfalls and limitations of AI technology, and to seek opportunities to collaborate with data scientists. This article briefly introduces the basic concepts of machine learning and its techniques, and reviews current work on machine learning applications in the fields of otolaryngology and rhinology.

A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5626 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2451-2458

Author(s):

Akansha Bhardwaj ◽

Jie Yang ◽

Philippe Cudré-Mauroux

Keyword(s):

Machine Learning ◽

Real World ◽

Event Detection ◽

State Of The Art ◽

Regularization Parameter ◽

Learning Models ◽

Training Process ◽

Model Training ◽

Real World Datasets ◽

Machine Learning Models

Microblogging platforms such as Twitter are increasingly being used for event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions about the distribution of the relevant microposts containing the keyword (referred to as the expectation of the distribution) and use it as a posterior regularization parameter during model training. Such approaches are, however, limited, as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach that jointly discovers informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both the keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, in terms of both accuracy and interpretability, on multiple real-world datasets and show that it improves on the state of the art by 24.3%.
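
The idea of estimating a keyword's expectation from the crowd and comparing it with the model can be sketched in a simplified form. This is an illustrative assumption, not the paper's estimator or its posterior-regularization training. It reads "expectation" as the fraction of keyword-containing microposts that are event-related, and it estimates that fraction once from crowd labels and once from model probabilities. Candidate keywords are then ranked by the gap between the two estimates. All function names and the whitespace tokenization are hypothetical.

```python
def containing(posts, keyword):
    """Indices of microposts whose lowercase whitespace tokens include the keyword."""
    return [i for i, p in enumerate(posts) if keyword in p.lower().split()]


def crowd_expectation(posts, keyword, crowd_labels):
    """Share of crowd-labelled posts containing the keyword judged event-related (1)."""
    labelled = [crowd_labels[i] for i in containing(posts, keyword) if i in crowd_labels]
    return sum(labelled) / len(labelled) if labelled else None


def model_expectation(posts, keyword, model_probs):
    """Mean event probability the model assigns to posts containing the keyword."""
    ids = containing(posts, keyword)
    return sum(model_probs[i] for i in ids) / len(ids) if ids else None


def rank_by_disagreement(posts, keywords, crowd_labels, model_probs):
    """Keywords ordered by |crowd - model| expectation, largest disagreement first."""
    scored = []
    for kw in keywords:
        c = crowd_expectation(posts, kw, crowd_labels)
        m = model_expectation(posts, kw, model_probs)
        if c is not None and m is not None:
            scored.append((abs(c - m), kw))
    return [kw for _, kw in sorted(scored, reverse=True)]
```

Keywords at the top of this ranking are the ones whose model-side expectation is least trustworthy. In the loop described above, those are the candidates where further crowd input is likely to be most useful.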