Predicting Out-of-Stock Using Machine Learning: An Application in a Retail Packaged Foods Manufacturing Company

For decades, Out-of-Stock (OOS) events have been a problem for retailers and manufacturers. In grocery retailing, an OOS event is the condition in which customers cannot find a product they are trying to buy. This paper addresses the problem from a manufacturer’s perspective through a case study in a retail packaged foods manufacturing company located in Latin America. We developed two machine-learning-based systems to detect OOS events automatically. The first is based on a single Random Forest classifier trained on balanced data, and the second is an ensemble of six different classification algorithms. We used transactional data from the manufacturer’s information system and physical audits. The novelty of this work is our use of new predictor variables of OOS events. The system was successfully implemented and tested in a retail packaged foods manufacturing company. By incorporating the new predictor variables in our Random Forest and ensemble classifiers, we were able to improve the systems’ predictive power. In particular, the Random Forest classifier presented the best performance in a real-world setting, achieving a detection precision of 72% and identifying 68% of the total OOS events. Finally, the incorporation of our new predictor variables allowed us to improve the performance of the Random Forest by 0.24 points in the F-measure.
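The abstract reports precision and recall separately. As a minimal sketch, assuming the F-measure referred to is the standard weighted harmonic mean of the two (the abstract does not say which variant was used), the relationship can be written as follows. The function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom


# Figures taken from the abstract: precision 0.72, recall 0.68.
f1 = f_measure(0.72, 0.68)
print(round(f1, 3))  # prints 0.699
```

Under that assumption, the reported precision and recall correspond to an F1 of roughly 0.70.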

Amide proton transfer weighted (APTw) imaging based radiomics allows for the differentiation of gliomas from metastases

Scientific Reports ◽

10.1038/s41598-021-85168-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Elisabeth Sartoretti ◽

Thomas Sartoretti ◽

Michael Wyss ◽

Carolin Reischauer ◽

Luuk van Smoorenburg ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brain Tumors ◽

Proton Transfer ◽

Multilayer Perceptron ◽

Random Forest Classifier ◽

Amide Proton ◽

Low Grade ◽

WHO Grade ◽

Amide Proton Transfer

Abstract: We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, 10 WHO grade 4 primary glial brain tumors and 27 metastases) with either primary glial brain tumors or metastases who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (a multilayer perceptron and a random forest classifier) were trained on the features with stratified tenfold cross validation and used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and the differentiation of primary glial brain tumors from metastases is achievable with a high degree of accuracy.
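Stratified tenfold cross validation keeps the class ratio roughly constant in every fold, which matters for a cohort this small. The following is a rough sketch of how such folds could be built using only the standard library. The function `stratified_folds` and its round-robin dealing scheme are assumptions for illustration, not the procedure the authors used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped so that overall fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


# Class sizes from the abstract: 21 primary glial tumors, 27 metastases.
labels = ["glioma"] * 21 + ["metastasis"] * 27
folds = stratified_folds(labels, k=10)
print([len(fold) for fold in folds])
```

Each fold would then serve once as the test set while a classifier is fitted on the remaining nine.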

Diagnosing asthma and chronic obstructive pulmonary disease with machine learning

Health Informatics Journal ◽

10.1177/1460458217723169 ◽

2017 ◽

Vol 25 (3) ◽

pp. 811-827 ◽

Cited By ~ 15

Author(s):

Dimitris Spathis ◽

Panayiotis Vlamos

Keyword(s):

Machine Learning ◽

Chronic Obstructive Pulmonary Disease ◽

Random Forest ◽

Pulmonary Disease ◽

Clinical Decision Support Systems ◽

Clinical Decision ◽

Forced Expiratory Volume ◽

Random Forest Classifier ◽

Chronic Obstructive ◽

Obstructive Pulmonary Disease

This study examines clinical decision support systems in healthcare, in particular for the prevention, diagnosis and treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease. The empirical pulmonology study of a representative sample (n = 132) attempts to identify the major factors that contribute to the diagnosis of these diseases. Machine learning results show that for chronic obstructive pulmonary disease the Random Forest classifier outperforms other techniques with 97.7 per cent precision, while the most prominent attributes for diagnosis are smoking, forced expiratory volume 1, age and forced vital capacity. For asthma, the best precision, 80.3 per cent, is again achieved with the Random Forest classifier, while the most prominent attribute is MEF2575.

Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study

Expert Systems with Applications ◽

10.1016/j.eswa.2018.05.024 ◽

2018 ◽

Vol 109 ◽

pp. 195-205 ◽

Cited By ~ 17

Author(s):

Gustavo A. Lujan-Moreno ◽

Phillip R. Howard ◽

Omar G. Rojas ◽

Douglas C. Montgomery

Keyword(s):

Machine Learning ◽

Response Surface Methodology ◽

Random Forest ◽

Design Of Experiments ◽

Response Surface

A Machine Learning Approach to Detect Student Dropout at University

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/041062021 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3101-3107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Dropout Rate ◽

Random Forest Classifier ◽

Drop Out ◽

Dropout Rates ◽

Learning Technology ◽

Student Dropout ◽

High Dropout Rate ◽

Academic Plan

In universities, student dropout is a major concern that reflects on the university's quality. Several characteristics cause students to drop out of university. A high dropout rate affects the university's reputation and the students' future careers. Therefore, student dropout analysis is needed to improve academic planning and management, to reduce student dropout, and to raise the standard of the higher education system. Machine learning provides powerful methods for analyzing and predicting dropout. This study uses a dataset from a university representative to develop a model for predicting student dropout. In this work, machine learning models were used to detect dropout. Machine learning is increasingly used in the field of knowledge mining diagnostics. After examining earlier studies, we observed that dropout detection can be done using several methods. We used five dropout detection models: Decision Tree, Naïve Bayes, Random Forest Classifier, SVM and KNN. We found that the Random Forest classifier is highly promising for predicting dropout, with a training accuracy of 94% and a testing accuracy of 86%.

Machine Learning Technique to Prognosis Diabetes Disease: Random Forest Classifier Approach

Advanced Computing and Intelligent Technologies - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-2164-2_19 ◽

2021 ◽

pp. 219-244

Author(s):

Prajyot Palimkar ◽

Rabindra Nath Shaw ◽

Ankush Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Technique ◽

Learning Technique

Bidders Recommender for Public Procurement Auctions Using Machine Learning: Data Analysis, Algorithm, and Case Study with Tenders from Spain

Complexity ◽

10.1155/2020/8858258 ◽

2020 ◽

Vol 2020 ◽

pp. 1-20

Author(s):

Manuel J. García Rodríguez ◽

Vicente Rodríguez Montequín ◽

Francisco Ortega Fernández ◽

Joaquín M. Villanueva Balsera

Keyword(s):

Machine Learning ◽

Public Procurement ◽

Public Information ◽

Random Forest Classifier ◽

Procurement Auctions ◽

Machine Learning Method ◽

Test Conditions ◽

A Company ◽

Learning Data

Recommending the identity of bidders in public procurement auctions (tenders) has a significant impact in many areas of public procurement, but it has not yet been studied in depth. A bidders recommender would be a very beneficial tool because a supplier (company) can search for appropriate tenders and, vice versa, a public procurement agency can automatically discover unknown companies which are suitable for its tender. This paper develops a pioneering algorithm to recommend potential bidders using a machine learning method, particularly a random forest classifier. The bidders recommender is described theoretically, so it can be implemented or adapted to any particular situation. It has been successfully validated with a case study: an actual Spanish tender dataset (free public information) which has 102,087 tenders from 2014 to 2020 and a company dataset (nonfree public information) which has 1,353,213 Spanish companies. Quantitative, graphical, and statistical descriptions of both datasets are presented. The results of the case study were satisfactory: the winning bidding company is within the recommended companies group in 24% to 38% of the tenders, according to different test conditions and scenarios.
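The headline result is a hit rate: the share of tenders whose actual winner appears in the recommended group. A minimal sketch of that evaluation is given below. The function `hit_rate`, the tender identifiers and the company names are all hypothetical and not taken from the study's datasets.

```python
def hit_rate(recommended, winners):
    """Fraction of tenders whose winner appears in the recommended group."""
    if not winners:
        return 0.0
    hits = sum(
        1
        for tender_id, winner in winners.items()
        if winner in recommended.get(tender_id, ())
    )
    return hits / len(winners)


# Hypothetical toy data, for illustration only.
recommended = {
    "T1": {"A", "B"},
    "T2": {"C"},
    "T3": {"A", "D"},
    "T4": {"B"},
}
winners = {"T1": "B", "T2": "E", "T3": "F", "T4": "A"}

print(hit_rate(recommended, winners))  # prints 0.25
```

In this toy case only tender T1 has its winner among the recommended companies, so the hit rate is one in four.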

Cholera Risk: A Machine Learning Approach Applied to Essential Climate Variables

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17249378 ◽

2020 ◽

Vol 17 (24) ◽

pp. 9378

Author(s):

Amy Marie Campbell ◽

Marie-Fanny Racault ◽

Stephen Goult ◽

Angus Laurenson

Keyword(s):

Machine Learning ◽

Random Forest ◽

Land Surface ◽

Environmental Changes ◽

Random Forest Classifier ◽

Sea Surface Salinity ◽

Learning Approach ◽

Climate Variables ◽

Surface Salinity ◽

Machine Learning Approach

Oceanic and coastal ecosystems have undergone complex environmental changes in recent years, amid a context of climate change. These changes are also reflected in the dynamics of water-borne diseases as some of the causative agents of these illnesses are ubiquitous in the aquatic environment and their survival rates are impacted by changes in climatic conditions. Previous studies have established strong relationships between essential climate variables and the coastal distribution and seasonal dynamics of the bacteria Vibrio cholerae, pathogenic types of which are responsible for human cholera disease. In this study we provide a novel exploration of the potential of a machine learning approach to forecast environmental cholera risk in coastal India, home to more than 200 million inhabitants, utilising atmospheric, terrestrial and oceanic satellite-derived essential climate variables. A Random Forest classifier model is developed, trained and tested on a cholera outbreak dataset over the period 2010–2018 for districts along coastal India. The random forest classifier model has an Accuracy of 0.99, an F1 Score of 0.942 and a Sensitivity score of 0.895, meaning that 89.5% of outbreaks are correctly identified. Spatio-temporal patterns emerged in terms of the model’s performance based on seasons and coastal locations. Further analysis of the specific contribution of each Essential Climate Variable to the model outputs shows that chlorophyll-a concentration, sea surface salinity and land surface temperature are the strongest predictors of the cholera outbreaks in the dataset used. The study reveals promising potential of the use of random forest classifiers and remotely-sensed essential climate variables for the development of environmental cholera-risk applications. 
Further exploration of the present random forest model and associated essential climate variables is encouraged on cholera surveillance datasets in other coastal areas affected by the disease to determine the model’s transferability potential and applicative value for cholera forecasting systems.
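Accuracy, F1 and sensitivity can diverge considerably when outbreaks are rare relative to non-outbreak cases. The sketch below computes the three metrics from confusion-matrix counts. The counts used are hypothetical, chosen only to show how accuracy can stay high while sensitivity is lower, and are not the study's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity (recall), precision and F1 from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


# Hypothetical counts (not from the study): 19 outbreaks among 200 cases.
metrics = classification_metrics(tp=17, fp=1, tn=180, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, two missed outbreaks lower sensitivity to about 0.895 while accuracy remains 0.985, because the many true negatives dominate the accuracy figure.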

Prediction of novel mouse TLR9 agonists using a random forest approach

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0241-0 ◽

2019 ◽

Vol 20 (S2) ◽

Author(s):

Varun Khanna ◽

Lei Li ◽

Johnson Fung ◽

Shoba Ranganathan ◽

Nikolai Petrovsky

Keyword(s):

Machine Learning ◽

Random Forest ◽

Correlation Coefficient ◽

Matthews Correlation Coefficient ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Innate Immune ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm

Abstract

Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs, to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling.

Results: Using in-house experimental TLR9 activity data we found that random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples was 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.

Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
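Repeated random down-sampling can be sketched as drawing several class-balanced subsets and training one model per subset, with predictions combined by vote. The example below shows only the sampling and voting steps. The helper names, the 60/336 class split and the tie-breaking rule are assumptions for illustration, since the abstract gives the total of 396 ODNs but not the class sizes or the combination rule.

```python
import random


def downsampled_subsets(labels, n_subsets=20, seed=0):
    """Draw n_subsets index lists, each with equal counts of every class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    minority_size = min(len(indices) for indices in by_class.values())

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for indices in by_class.values():
            # Sample without replacement down to the minority class size.
            subset.extend(rng.sample(indices, minority_size))
        subsets.append(sorted(subset))
    return subsets


def majority_vote(predictions):
    """Combine per-model 0/1 predictions for one sample; ties count as 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Hypothetical class split of the 396 ODNs (the real split is not stated).
labels = [1] * 60 + [0] * 336
subsets = downsampled_subsets(labels, n_subsets=20)
print(len(subsets), len(subsets[0]))
```

Every subset keeps all the information a balanced learner needs from the minority class, while the majority class is covered across subsets rather than within any single one.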

Enhancing alpine glacial lakes detection and mapping using multi-source data and machine learning techniques

10.5194/egusphere-egu2020-21811 ◽

2020 ◽

Author(s):

Sonam Wangchuk ◽

Tobias Bolch

Keyword(s):

Machine Learning ◽

Random Forest ◽

Satellite Images ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Glacial Lake ◽

Glacial Lakes ◽

Alpine Regions ◽

Learning Techniques ◽

Source Data

Accurate detection and mapping of glacial lakes in Alpine regions such as the Himalayas, the Alps and the Andes is challenged by many factors. These factors include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity amongst glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach that overcomes most of the above-mentioned challenges to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier algorithm. The multi-source data are from the Sentinel-1 Synthetic Aperture Radar data (radar backscatter), the Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for the rule-based segmentation of potential glacial lakes, where decision rules are implemented from the expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by the trained and tested random forest classifier algorithm. The performance of the method was assessed in eight test sites located across the Alpine regions (e.g. the Boshula mountain range and Koshi basin in the Himalayas, the Tajiks Pamirs, the Swiss Alps and the Peruvian Andes) of the world. We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
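The rule-based segmentation step combines three inputs: NDWI, radar backscatter and slope. The sketch below uses the common green/near-infrared definition of NDWI and a simple conjunction of thresholds. The function names and all three threshold values are hypothetical placeholders, not the decision rules used in the study.

```python
def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index: (green - NIR) / (green + NIR)."""
    total = green + nir
    if total == 0:
        return 0.0
    return (green - nir) / total


def is_lake_candidate(
    green: float,
    nir: float,
    backscatter_db: float,
    slope_deg: float,
    ndwi_min: float = 0.3,             # hypothetical threshold
    backscatter_max_db: float = -18.0, # hypothetical threshold
    slope_max_deg: float = 10.0,       # hypothetical threshold
) -> bool:
    """Flag a pixel as potential water: high NDWI, low backscatter, flat terrain."""
    return (
        ndwi(green, nir) >= ndwi_min
        and backscatter_db <= backscatter_max_db
        and slope_deg <= slope_max_deg
    )


# Illustrative reflectance and radar values, not real measurements.
print(is_lake_candidate(green=0.30, nir=0.10, backscatter_db=-22.0, slope_deg=3.0))
print(is_lake_candidate(green=0.30, nir=0.10, backscatter_db=-22.0, slope_deg=25.0))
```

Candidates that pass rules of this kind would then be handed to the trained classifier, which makes the final lake or non-lake decision.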

A Comparative Study using Feature Selection to Predict the Behaviour of Bank Customers

E3S Web of Conferences ◽

10.1051/e3sconf/202018401011 ◽

2020 ◽

Vol 184 ◽

pp. 01011

Author(s):

Sreethi Musunuru ◽

Mahaalakshmi Mukkamala ◽

Latha Kunaparaju ◽

N V Ganapathi Raju

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Random Forest Classifier ◽

Customer Behavior ◽

Machine Learning Algorithms ◽

The Status ◽

Personal Level ◽

Near Future ◽

Structure Communication

Although banks hold an abundance of data on their customers in general, it is not unusual for them to track the actions of the creditors regularly to improve the services they offer and to understand why many of them choose to leave for other banks. Analyzing customer behavior can be highly beneficial to banks, as they can reach out to their customers on a personal level and develop a business model that improves the pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, the customer's annual salary, and the customer's gender are used to classify customers with machine learning algorithms such as the K Neighbors Classifier and the Random Forest Classifier. Once customers are classified, banks can get an idea of who will stay with them and who will leave in the near future. Our study aims to remove features that are independent but have little influence on determining the customers' future status, without loss of accuracy, and to check whether this also improves the accuracy of the results.
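One generic way to remove features without losing accuracy is backward elimination: drop a feature, re-score the model, and keep the removal only if the score holds. The abstract does not name the selection method used, so the sketch below is an assumed stand-in, with a toy scoring function and placeholder feature names in place of a real classifier and dataset.

```python
def backward_elimination(features, score, tolerance=0.0):
    """Drop features one at a time while the score stays within tolerance."""
    selected = list(features)
    best = score(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for feature in list(selected):
            candidate = [f for f in selected if f != feature]
            candidate_score = score(candidate)
            if candidate_score >= best - tolerance:
                # Removing this feature costs nothing, so keep it removed.
                selected = candidate
                best = max(best, candidate_score)
                improved = True
                break
    return selected, best


def toy_score(selected):
    """Hypothetical accuracy: only two placeholder features carry signal."""
    informative = {"informative_1": 0.15, "informative_2": 0.10}
    return 0.60 + sum(v for k, v in informative.items() if k in selected)


features = ["informative_1", "informative_2", "noise_1", "noise_2"]
kept, accuracy = backward_elimination(features, toy_score)
print(kept, round(accuracy, 2))
```

In practice `score` would be a cross-validated accuracy from a classifier such as a random forest, which makes each elimination step far more expensive than in this toy setting.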
