Predicting Out-of-Stock Using Machine Learning: An Application in a Retail Packaged Foods Manufacturing Company

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2787
Author(s):  
Juan Manuel Rozas Andaur ◽  
Gonzalo A. Ruz ◽  
Marcos Goycoolea

For decades, Out-of-Stock (OOS) events have been a problem for retailers and manufacturers. In grocery retailing, an OOS event is the condition in which customers do not find a certain commodity while attempting to buy it. This paper addresses this problem from a manufacturer’s perspective through a case study in a retail packaged foods manufacturing company located in Latin America. We developed two machine learning based systems to detect OOS events automatically. The first is based on a single Random Forest classifier with balanced data, and the second is an ensemble of six different classification algorithms. We used transactional data from the manufacturer’s information system and physical audits. The novelty of this work is our use of new predictor variables of OOS events. The system was successfully implemented and tested in a retail packaged foods manufacturing company. By incorporating the new predictor variables in our Random Forest and ensemble classifiers, we were able to improve their predictive power. In particular, the Random Forest classifier presented the best performance in a real-world setting, achieving a detection precision of 72% and identifying 68% of the total OOS events. Finally, the incorporation of our new predictor variables improved the performance of the Random Forest by 0.24 points in the F-measure.
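As a rough illustration of how the two reported figures combine, the sketch below computes the balanced F-measure (F1), the harmonic mean of precision and recall. It assumes that "identifying 68% of the total OOS events" is recall and that the abstract's F-measure is F1; the abstract states neither explicitly, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention: no correct positives at all
    return (1 + beta ** 2) * precision * recall / denominator


# Precision 0.72 and recall 0.68 as quoted in the abstract (recall is an assumption).
f1 = f_measure(0.72, 0.68)
print(round(f1, 3))  # 0.699
```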

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Elisabeth Sartoretti ◽  
Thomas Sartoretti ◽  
Michael Wyss ◽  
Carolin Reischauer ◽  
Luuk van Smoorenburg ◽  
...  

Abstract We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, 10 WHO grade 4 primary glial brain tumors and 27 metastases) with either primary glial brain tumors or metastases who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (a multilayer perceptron and a random forest classifier) were trained on the features with stratified tenfold cross validation and used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and the differentiation of primary glial brain tumors from metastases is achievable with a high degree of accuracy.
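The stratified tenfold cross validation mentioned above can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: it only builds the folds, using the class counts from the abstract (21 primary glial tumors, 27 metastases). The function name `stratified_kfold` and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) with class proportions kept roughly equal per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # running counter so leftover samples spread across folds
    for indices in by_class.values():
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, test


# Class counts from the abstract: 10 + 1 + 10 glial tumors and 27 metastases.
labels = ["glial"] * 21 + ["metastasis"] * 27
for train, test in stratified_kfold(labels, k=10):
    glial_in_test = sum(1 for i in test if labels[i] == "glial")
    print(len(train), len(test), glial_in_test)
```

With only 48 patients, each held-out fold contains four or five cases, which is one reason stratification matters here: an unstratified split could leave a fold with no cases of one class.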


2017 ◽  
Vol 25 (3) ◽  
pp. 811-827 ◽  
Author(s):  
Dimitris Spathis ◽  
Panayiotis Vlamos

This study examines clinical decision support systems in healthcare, in particular for the prevention, diagnosis and treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease. The empirical pulmonology study of a representative sample (n = 132) attempts to identify the major factors that contribute to the diagnosis of these diseases. Machine learning results show that for chronic obstructive pulmonary disease, the Random Forest classifier outperforms other techniques with 97.7 per cent precision, while the most prominent attributes for diagnosis are smoking, forced expiratory volume 1, age and forced vital capacity. For asthma, the best precision, 80.3 per cent, is again achieved with the Random Forest classifier, while the most prominent attribute is MEF2575.


In universities, student dropout is a major concern that reflects on the university's quality. Certain characteristics lead students to drop out of university. A high dropout rate affects the university's reputation and the students' future careers. Student dropout analysis is therefore needed to improve academic planning and management, to reduce dropout, and to raise the standard of the higher education system. Machine learning provides powerful methods for analyzing and predicting dropout. This study uses a dataset from a university representative to develop a model for predicting student dropout. Machine learning is increasingly used in the field of knowledge mining diagnostics, and a review of earlier studies showed that dropout detection can be done with several methods. We used five dropout detection models: Decision Tree, Naïve Bayes, Random Forest classifier, SVM and KNN. Our analysis found the Random Forest classifier highly promising for predicting dropout, with a training accuracy of 94% and a testing accuracy of 86%.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-20
Author(s):  
Manuel J. García Rodríguez ◽  
Vicente Rodríguez Montequín ◽  
Francisco Ortega Fernández ◽  
Joaquín M. Villanueva Balsera

Recommending the identity of bidders in public procurement auctions (tenders) has a significant impact in many areas of public procurement, but it has not yet been studied in depth. A bidders recommender would be a very beneficial tool because a supplier (company) can search appropriate tenders and, vice versa, a public procurement agency can discover automatically unknown companies which are suitable for its tender. This paper develops a pioneering algorithm to recommend potential bidders using a machine learning method, particularly a random forest classifier. The bidders recommender is described theoretically, so it can be implemented or adapted to any particular situation. It has been successfully validated with a case study: an actual Spanish tender dataset (free public information) which has 102,087 tenders from 2014 to 2020 and a company dataset (nonfree public information) which has 1,353,213 Spanish companies. Quantitative, graphical, and statistical descriptions of both datasets are presented. The results of the case study were satisfactory: the winning bidding company is within the recommended companies group, from 24% to 38% of the tenders, according to different test conditions and scenarios.
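The evaluation criterion quoted above (the winning bidder falls within the recommended group for 24% to 38% of tenders) is a hit rate. The sketch below shows one way to compute such a figure; the function name `hit_rate` and the company identifiers are hypothetical and not taken from the paper or its datasets.

```python
def hit_rate(recommendations, winners):
    """Fraction of tenders whose actual winner appears in the recommended group."""
    if len(recommendations) != len(winners):
        raise ValueError("one recommendation group is required per tender")
    if not winners:
        return 0.0
    hits = sum(1 for group, winner in zip(recommendations, winners) if winner in group)
    return hits / len(winners)


# Hypothetical example: four tenders, each with a recommended group and a true winner.
recommended_groups = [{"A", "B"}, {"C"}, {"D", "E"}, {"F"}]
actual_winners = ["A", "X", "E", "Y"]
print(hit_rate(recommended_groups, actual_winners))  # 0.5
```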


Author(s):  
Amy Marie Campbell ◽  
Marie-Fanny Racault ◽  
Stephen Goult ◽  
Angus Laurenson

Oceanic and coastal ecosystems have undergone complex environmental changes in recent years, amid a context of climate change. These changes are also reflected in the dynamics of water-borne diseases as some of the causative agents of these illnesses are ubiquitous in the aquatic environment and their survival rates are impacted by changes in climatic conditions. Previous studies have established strong relationships between essential climate variables and the coastal distribution and seasonal dynamics of the bacteria Vibrio cholerae, pathogenic types of which are responsible for human cholera disease. In this study we provide a novel exploration of the potential of a machine learning approach to forecast environmental cholera risk in coastal India, home to more than 200 million inhabitants, utilising atmospheric, terrestrial and oceanic satellite-derived essential climate variables. A Random Forest classifier model is developed, trained and tested on a cholera outbreak dataset over the period 2010–2018 for districts along coastal India. The random forest classifier model has an Accuracy of 0.99, an F1 Score of 0.942 and a Sensitivity score of 0.895, meaning that 89.5% of outbreaks are correctly identified. Spatio-temporal patterns emerged in terms of the model’s performance based on seasons and coastal locations. Further analysis of the specific contribution of each Essential Climate Variable to the model outputs shows that chlorophyll-a concentration, sea surface salinity and land surface temperature are the strongest predictors of the cholera outbreaks in the dataset used. The study reveals promising potential of the use of random forest classifiers and remotely-sensed essential climate variables for the development of environmental cholera-risk applications. 
Further exploration of the present random forest model and associated essential climate variables is encouraged on cholera surveillance datasets in other coastal areas affected by the disease to determine the model’s transferability potential and applicative value for cholera forecasting systems.
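For readers less familiar with the three scores quoted above, the sketch below shows how accuracy, sensitivity and F1 are derived from a binary confusion matrix. The counts in the example are hypothetical and are not taken from the study; they only illustrate that accuracy can sit well above sensitivity when outbreaks are a small share of the records.

```python
def classification_scores(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, sensitivity (recall), precision and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 outbreak records, 900 non-outbreak records.
scores = classification_scores(tp=90, fp=5, fn=10, tn=895)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```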


2019 ◽  
Vol 20 (S2) ◽  
Author(s):  
Varun Khanna ◽  
Lei Li ◽  
Johnson Fung ◽  
Shoba Ranganathan ◽  
Nikolai Petrovsky

Abstract
Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling.
Results: Using in-house experimental TLR9 activity data we found that the random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples were 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
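Repeated random down-sampling, as named in the abstract, can be sketched as below: the majority class is repeatedly sampled down to the size of the minority class, giving one balanced training subset per ensemble member. This is a standard-library illustration under stated assumptions, not the authors' code. The abstract gives the dataset size (396 ODNs) and the ensemble size (20) but not the class split, so the 60/336 split used here is made up, and `balanced_subsets` is a hypothetical name.

```python
import random


def balanced_subsets(labels, n_subsets=20, seed=0):
    """Return n_subsets index lists, each with every class down-sampled to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    minority_size = min(len(indices) for indices in by_class.values())

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for indices in by_class.values():
            # Sampling without replacement within a subset; subsets may overlap each other.
            subset.extend(rng.sample(indices, minority_size))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


# Made-up class split for 396 samples (the real split is not given in the abstract).
labels = [1] * 60 + [0] * 336
subsets = balanced_subsets(labels, n_subsets=20)
print(len(subsets), len(subsets[0]))  # 20 120
```

Each subset would then train one model, and the models' predictions would be combined; how the combination was done is not specified in the abstract.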


2020 ◽  
Author(s):  
Sonam Wangchuk ◽  
Tobias Bolch

Accurate detection and mapping of glacial lakes in Alpine regions such as the Himalayas, the Alps and the Andes are challenged by many factors. These factors include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity amongst glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach that overcomes most of the above-mentioned challenges to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier algorithm. The multi-source data are from the Sentinel-1 Synthetic Aperture Radar data (radar backscatter), the Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for the rule-based segmentation of potential glacial lakes, where decision rules are implemented from the expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by the trained and tested random forest classifier algorithm. The performance of the method was assessed in eight test sites located across the Alpine regions of the world (e.g. the Boshula mountain range and Koshi basin in the Himalayas, the Tajik Pamirs, the Swiss Alps and the Peruvian Andes). We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
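The rule-based step described above combines three inputs: NDWI, radar backscatter and slope. A minimal per-pixel sketch is given below. It assumes the green/near-infrared form of NDWI, (green - NIR) / (green + NIR), and it assumes that water shows high NDWI, low backscatter and gentle slope. The threshold values and function names are hypothetical placeholders, not the decision rules used in the study.

```python
def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index from green and near-infrared reflectance."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def is_lake_candidate(
    green: float,
    nir: float,
    backscatter_db: float,
    slope_deg: float,
    ndwi_min: float = 0.3,              # hypothetical threshold
    backscatter_max_db: float = -18.0,  # hypothetical threshold
    slope_max_deg: float = 10.0,        # hypothetical threshold
) -> bool:
    """Flag a pixel as potential glacial lake when all three rules agree."""
    return (
        ndwi(green, nir) > ndwi_min
        and backscatter_db < backscatter_max_db
        and slope_deg < slope_max_deg
    )


# A bright-green, dark-NIR, radar-dark, flat pixel passes all three rules.
print(is_lake_candidate(green=0.30, nir=0.10, backscatter_db=-22.0, slope_deg=2.0))
# A steep slope fails the slope rule even with a water-like spectral signal.
print(is_lake_candidate(green=0.30, nir=0.10, backscatter_db=-22.0, slope_deg=35.0))
```

In the approach described, pixels or segments passing such rules are only candidates; the random forest classifier then decides which candidate polygons are glacial lakes.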


2020 ◽  
Vol 184 ◽  
pp. 01011
Author(s):  
Sreethi Musunuru ◽  
Mahaalakshmi Mukkamala ◽  
Latha Kunaparaju ◽  
N V Ganapathi Raju

Though banks hold an abundance of data on their customers in general, it is not unusual for them to track the actions of the creditors regularly to improve the services they offer and to understand why many of them choose to exit and shift to other banks. Analyzing customer behavior can be highly beneficial to the banks, as they can reach out to their customers on a personal level and develop a business model that will improve the pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, the customer's annual salary and the customer's gender are used to classify customers with machine learning algorithms like K Neighbors Classifier and Random Forest Classifier. By classifying the customers, banks can get an idea of who will continue with them and who will leave in the near future. Our study aims to remove features that are independent but have little influence on determining the customers' future status, without loss of accuracy, and to examine whether this also increases the accuracy of the results.

