An Explainable Machine Learning Model for Material Backorder Prediction in Inventory Management

Global competition pushes firms toward more effective, lower-cost supply chains that deliver products at the desired quality, quantity, and time, with lower production costs. These costs include holding cost, ordering cost, and backorder cost. A backorder occurs when a product is temporarily unavailable or out of stock and the customer places an order for future production and shipment. Stock unavailability and prolonged delivery delays therefore lead to additional production costs and unsatisfied customers, respectively. It is thus highly important to develop models that effectively predict the backorder rate in an inventory system, with the aim of improving the effectiveness of the supply chain and, consequently, the performance of the company. However, traditional approaches in the literature are based on stochastic approximation and do not incorporate information from historical data. Machine learning models can instead extract knowledge from large historical datasets to build predictive models. To cover this need, this study addresses the backorder prediction problem. Specifically, various machine learning models were compared on the binary classification problem of backorder prediction, followed by model calibration and a post-hoc explainability analysis based on SHAP to identify and interpret the most important features that contribute to material backorder. The results showed that the RF, XGB, LGBM, and BB models reached an AUC score of 0.95, while the best-performing model was the LGBM model after calibration with the Isotonic Regression method. The explainability analysis showed that the inventory stock of a product, the volume of products that can be delivered, the imminent demand (sales), and the accurate prediction of future demand can significantly contribute to the correct prediction of backorders.
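The abstract names Isotonic Regression as the calibration method but gives no implementation details. As a rough, dependency-free sketch of the idea (not the authors' code), the pool-adjacent-violators algorithm below fits a non-decreasing map from raw classifier scores to observed positive-class frequencies. The function names and the toy scores and labels are hypothetical.

```python
def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function from score to positive-class frequency.

    Uses the pool-adjacent-violators algorithm (PAVA). Returns a list of
    (upper_score, calibrated_probability) steps sorted by score.
    """
    # Pool identical scores first so every distinct score maps to one value.
    grouped = {}
    for s, y in zip(scores, labels):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)

    blocks = []  # each block: [sum of labels, number of samples, largest score]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block has a higher mean,
        # i.e. while the monotonicity constraint is violated.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            last_total, last_count, last_upper = blocks.pop()
            blocks[-1][0] += last_total
            blocks[-1][1] += last_count
            blocks[-1][2] = last_upper
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(model, score):
    """Return the calibrated probability of the first step covering `score`."""
    for upper, prob in model:
        if score <= upper:
            return prob
    return model[-1][1]  # scores above the training range use the last step


# Hypothetical raw classifier scores and true backorder labels (1 = backorder).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 1, 0, 0, 1, 1]
toy_model = isotonic_fit(toy_scores, toy_labels)
```

In practice a library implementation would normally be fitted on a held-out calibration set rather than on the data used to train the classifier.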


Benchmarking machine learning models for the analysis of genetic data using FRESA.CAD Binary Classification Benchmarking

10.1101/733675, 2019

Author(s): Javier de Velasco Oriol, Antonio Martinez-Torteya, Victor Trevino, Israel Alanis, Edgar E. Vallejo, ...

Keyword(s): Machine Learning, Model Selection, Binary Classification, Genetic Data, R Package, Learning Models, Classification Problems, Machine Learning Methods, Computational Perspective, Machine Learning Models

Background: Machine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, from both the human and the computational perspective. Results: We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets. Conclusions: FRESA.CAD Binary Benchmarking proves to be a useful tool over a variety of binary classification problems involving the analysis of genetic data, showing both quantitative and qualitative advantages over similar packages.


Predicting Anesthetic Infusion Events Using Machine Learning

10.21203/rs.3.rs-783161/v1, 2021

Author(s): Naoki Miyaguchi, Koh Takeuchi, Hisashi Kashima, Mizuki Morita, Hiroshi Morimatsu

Keyword(s): Machine Learning, Flow Rate, Short Term Memory, Binary Classification, Classification Problem, Clinical Findings, Support Vector, Learning Models, Continuous Administration, Machine Learning Models

Recently, research has been conducted on automatically controlling anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point in the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate the prediction performance of six machine learning models (logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM)) using data from 210 cases collected during actual surgeries. The results demonstrated that, when predicting an increase in the flow rate of remifentanil 1 min ahead, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations, a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.
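For readers less familiar with the two threshold-based metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from binary predictions. The helper names and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of actual positives (e.g. flow-rate increases) that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of actual negatives (no increase) that were predicted as negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical labels: 1 = flow rate increased, 0 = not increased.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
toy_sensitivity = sensitivity(tp, fn)
toy_specificity = specificity(tn, fp)
```

Unlike ROC-AUC, both values depend on the decision threshold used to turn model scores into 0/1 predictions.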


Detecting Arsenic Contamination Using Satellite Imagery and Machine Learning

Toxics, 10.3390/toxics9120333, 2021, Vol 9 (12), pp. 333

Author(s): Ayush Agrawal, Mark R. Petersen

Keyword(s): Machine Learning, Data Augmentation, Mean Squared Error, Binary Classification, Arsenic Concentration, Arsenic Contamination, Hyperspectral Data, Detection Methods, Learning Models, Machine Learning Models

Arsenic, a potent carcinogen and neurotoxin, affects over 200 million people globally. Current detection methods are laborious, expensive, and unscalable, being difficult to implement in developing regions and during crises such as COVID-19. This study attempts to determine if a relationship exists between soil’s hyperspectral data and arsenic concentration using NASA’s Hyperion satellite. It is the first arsenic study to use satellite-based hyperspectral data and apply a classification approach. Four regression machine learning models are tested to determine this correlation in soil with bare land cover. Raw data are converted to reflectance, problematic atmospheric influences are removed, characteristic wavelengths are selected, and four noise reduction algorithms are tested. The combination of data augmentation, Genetic Algorithm, Second Derivative Transformation, and Random Forest regression (R² = 0.840; normalized root mean squared error, re-scaled to [0,1], = 0.122) shows strong correlation, performing better than past models despite using noisier satellite data (versus lab-processed samples). Three binary classification machine learning models are then applied to identify high-risk shrub-covered regions in ten U.S. states, achieving strong accuracy (0.693) and F1-score (0.728). Overall, these results suggest that such a methodology is practical and can provide a sustainable alternative to arsenic contamination detection.


Machine-Learning Models for Sales Time Series Forecasting

Data, 10.3390/data4010015, 2019, Vol 4 (1), pp. 15, Cited By ~ 14

Author(s): Bohdan Pavlyshenko

Keyword(s): Machine Learning, Time Series, Predictive Models, Historical Data, Predictive Analytics, New Product, Time Series Forecasting, Learning Models, Learning Generalization, Machine Learning Models

In this paper, we study the use of machine-learning models for sales predictive analytics. The main goal is to review the main approaches and case studies of using machine learning for sales forecasting. The effect of machine-learning generalization is considered; this effect can be used to make sales predictions when only a small amount of historical data is available for a specific sales time series, as when a new product or store is launched. A stacking approach for building a regression ensemble of single models is also studied. The results show that stacking techniques can improve the performance of predictive models for sales time series forecasting.


Comparison of Multi-class and Binary Classification Machine Learning Models in Identifying Strong Gravitational Lenses

Publications of the Astronomical Society of the Pacific, 10.1088/1538-3873/ab747b, 2020, Vol 132 (1010), pp. 044501, Cited By ~ 1

Author(s): Hossen Teimoorinia, Robert D. Toyonaga, Sebastien Fabbro, Connor Bottrell

Keyword(s): Machine Learning, Binary Classification, Gravitational Lenses, Learning Models, Machine Learning Models


Quantitative Interpretation Explains Machine Learning Models for Chemical Reaction Prediction and Uncovers Bias

10.26434/chemrxiv.13061402, 2020

Author(s): David Peter Kovacs, William McCorkindale, Alpha Lee

Keyword(s): Machine Learning, Prediction Models, Model Performance, Training Data, Correct Prediction, Learning Models, Reaction Prediction, Wrong Reason, Realistic Assessment, Machine Learning Models

Organic synthesis remains a stumbling block in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify “Clever Hans” predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.


Comparative Study of Real Time Machine Learning Models for Stock Prediction through Streaming Data

JUCS - Journal of Universal Computer Science, 10.3897/jucs.2020.059, 2020, Vol 26 (9), pp. 1128-1147

Author(s): Ranjan Behera, Sushree Das, Santanu Rath, Sanjay Misra, Robertas Damasevicius

Keyword(s): Machine Learning, Real Time, Historical Data, Streaming Data, Support Vector, Learning Models, Stock Prediction, The Real, Lambda Architecture, Machine Learning Models

Stock prediction is one of the emerging applications of data science, helping companies devise better decision strategies. Machine learning models play a vital role in prediction. In this paper, we propose various machine learning models that predict the stock price from real-time streaming data. Streaming data is a potential source for real-time prediction: it is a continuous flow of data carrying information from various sources such as social networking websites, server logs, mobile phone applications, and trading floors. We adopted the distributed platform Spark to analyze streaming data collected from two different sources, represented as two case studies in this paper. The first case study is based on stock prediction from historical data collected from Google finance websites through NodeJs, and the second is based on sentiment analysis of Twitter data collected through the Twitter API available in the Stanford NLP package. Much research has gone into developing models for stock prediction based on static data. In this work, an effort has been made to develop scalable, fault-tolerant models for stock prediction from real-time streaming data. The proposed model is based on a distributed architecture known as Lambda architecture. An extensive comparison is made between actual and predicted output for different machine learning models. Support vector regression is found to have better accuracy than the other models. The historical data is considered as ground truth for validation.


Goods and Activities Tracking Through Supply Chain Network Using Machine Learning Models

10.1007/978-3-030-85874-2_1, 2021, pp. 3-12

Author(s): Lahcen Tamym, Ahmed Nait Sidi Moh, Lyes Benyoucef, Moulay Driss El Ouadghiri

Keyword(s): Machine Learning, Supply Chain, Supply Chain Network, Learning Models, Machine Learning Models


Prediction of the chemical context for Buchwald-Hartwig coupling reactions

10.33774/chemrxiv-2021-87hqt, 2021

Author(s): Samuel Genheden, Agnes Mårdh, Gustav Lahti, Ola Engkvist, Simon Olsson, ...

Keyword(s): Machine Learning, Historical Data, Temporal Characteristic, Coupling Reactions, Learning Models, Careful Planning, Label Data, Reaction Data, The Individual, Machine Learning Models

We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy around 90%, which suggests strong predictivity. There seems to be an advantage of including multi-label data because the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models are performant, we also show that such models need to be re-trained periodically. There is a strong temporal characteristic to the usage of different contexts. Therefore, a model trained on historical data will decrease in usefulness with time as newer and better contexts emerge and replace older ones. We hypothesize that these significant transitions in the context-use will likely affect any model predicting chemical contexts trained on historical data. Consequently, training such models warrants careful planning of what data is used for training and how often the model needs to be re-trained.
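The top-3 accuracy quoted above counts a prediction as correct when the recorded context appears among the model's three highest-ranked suggestions. A minimal sketch of that metric follows; the function name and the placeholder context labels are hypothetical and do not come from the paper's data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of samples whose true label is among the first k ranked predictions.

    ranked_predictions: one list per sample, ordered from most to least likely.
    true_labels: the single recorded label for each sample.
    """
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Placeholder context labels "A".."D" stand in for reaction contexts.
ranked = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["D", "C", "B", "A"],
]
truth = ["C", "C", "A"]
toy_top3 = top_k_accuracy(ranked, truth, k=3)
toy_top4 = top_k_accuracy(ranked, truth, k=4)
```

For the multi-label setting mentioned in the abstract, one plausible adaptation is to count a hit when any of the recorded labels appears in the top k, though the paper's exact definition is not given here.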


The use of machine learning methods in the development of nasal dosage forms with cerebroprotective action

Current issues in pharmacy and medicine science and practice, 10.14739/2409-2932.2021.2.232053, 2021, Vol 14 (2), pp. 232-238

Author(s): B. S. Burlaka, I. F. Bielenichev

Keyword(s): Machine Learning, In Silico, High Reliability, Binary Classification, Dosage Forms, Training Dataset, Learning Models, Pharmaceutical Ingredients, Machine Learning Models, Rational Composition

To conserve active pharmaceutical ingredients and excipients in the early stages of research, it is advisable, when planning an experiment, to use the predicted and experimental physicochemical property data stored in various aggregation databases. The information found reduces the time needed for composition development and technology processing. However, the variety of characteristics of active compounds and excipients is not always reflected in these services. Recently, machine learning models have been widely used in various scientific fields; they make it possible to obtain predictions with high reliability. Given the above, it is relevant and promising to develop machine learning models to predict the presence of pharmaceutical incompatibilities in the formulation of nasal dosage forms. The aim of the study is to develop machine learning models for in silico prediction of the rational composition of nasal dosage forms with cerebroprotective action. Materials and methods. A dataset containing data on compounds (active and auxiliary) and on the presence or absence of interaction (pharmaceutical incompatibility) was used as material. Training datasets were filled manually by content analysis of PubMed library data (pubmed.ncbi.nlm.nih.gov) for the last 10 years (keywords: “pharmaceutical incompatibilities”, “physico-chemical compatibility”, “incompatible excipients”). The resulting dataset comprises 1185 rows. The methods employed were a set of machine learning methods for binary classification (pycaret.org), using the programming language Python 3.8 (python.org) in the package management environment Miniconda (conda.io). Pipeline programming was performed using the Jupyter notebook package (jupyter.org). MACCS (Molecular ACCess System) keys for the training dataset were generated using the RDKit package (rdkit.org). Simplified molecular-input line-entry system (SMILES) representations were retrieved automatically using the PubChem service (pubchem.ncbi.nlm.nih.gov). Results. The obtained data made it possible to choose two promising binary classification machine learning models, whose quality was checked on a verification dataset. Statistical evaluations of the selected models indicate a high probability of correct in silico prediction of the presence or absence of pharmaceutical incompatibilities in the development of nasal formulations of cerebroprotective dosage forms. The models are posted on the web server of the expert system ExpSys Nasalia (nasalia.zsmu.zp.ua) in the calculations section. Conclusions. As a result of our research, we have developed machine learning models for in silico prediction of the rational composition of nasal dosage forms with cerebroprotective action. The quality of the pharmaceutical incompatibility predictions made by the developed models was confirmed on a verification dataset. The following statistical indicators were obtained: tree_blender (AUC 0.9521, F1 0.9747, MCC 0.9094) and boost_blender (AUC 0.9593, F1 0.9821, MCC 0.9352). The use of machine learning models in pharmaceutical development will contribute to resource conservation and optimization of the composition of the formulation.
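The F1 and MCC values reported for the two blended models are both derived from a binary confusion matrix. The short sketch below shows the standard formulas; the function names and the counts passed in are hypothetical and unrelated to the study's verification dataset.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical counts: 3 true positives, 1 false positive,
# 3 true negatives, 1 false negative.
toy_f1 = f1_score(tp=3, fp=1, fn=1)
toy_mcc = matthews_corrcoef(tp=3, fp=1, tn=3, fn=1)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when classes may be imbalanced.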
