Prediction of probable backorder scenarios in the supply chain using Distributed Random Forest and Gradient Boosting Machine learning techniques

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Samiul Islam ◽  
Saman Hassanzadeh Amin
Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 183-195
Author(s):  
Thingbaijam Lenin ◽  
N. Chandrasekaran

Students’ academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of the course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task is challenging, however, because the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and boosting algorithms such as adaptive boosting (adaboost), stochastic gradient boosting (gbm), and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting student performance at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbor (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall, and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with a single classifier, C4.5. The best result is provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
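To illustrate why accuracy can mislead on imbalanced data, the following minimal Python sketch computes accuracy, balanced accuracy, and F-score from a binary confusion matrix. The counts used (3 true positives, 3 false negatives, 0 false positives, 92 true negatives) are hypothetical: they were chosen only because they happen to reproduce the three percentages quoted above, and they are not taken from the paper. The function name `binary_metrics` is likewise an assumption made for this example.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity: share of positives found
    specificity = tn / (tn + fp)       # share of negatives correctly rejected
    precision = tp / (tp + fp)         # share of predicted positives that are right
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    balanced_accuracy = (recall + specificity) / 2
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f_score": f_score,
    }


# Hypothetical counts (not from the paper): 6 at-risk students out of 98.
metrics = binary_metrics(tp=3, fn=3, fp=0, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With these assumed counts, half of the minority class is missed, yet accuracy stays near 97% because the majority class dominates. Balanced accuracy and F-score expose the gap.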


2018 ◽  
Vol 13 (2) ◽  
pp. 235-250 ◽  
Author(s):  
Yixuan Ma ◽  
Zhenji Zhang ◽  
Alexander Ihler ◽  
Baoxiang Pan

Boosted by the growing logistics industry and digital transformation, the shared warehouse market is undergoing rapid development. Both the supply and demand sides of the warehouse rental business face market perturbations brought about by unprecedented peer competition and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real-world warehouse dataset using data collected from classified advertisement websites. Based on the dataset, we applied machine learning techniques to relate warehouse price to its relevant features, such as warehouse size, location, and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression, and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors estimate warehouse rent better than single-factor estimation. Additionally, tree models perform better than the linear model, with the best model (Random Forest) achieving a correlation coefficient of 0.57 on the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.
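The correlation coefficient used to score the models can be sketched as follows. This assumes the Pearson correlation between observed and predicted rents, which the abstract does not state explicitly. The rent values and the name `pearson_r` are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances without the 1/n factor (it cancels in the ratio).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical test-set rents (arbitrary units), not data from the study.
observed_rent = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_rent = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(observed_rent, predicted_rent)
print(f"correlation coefficient: {r:.2f}")
```

A value of 1 would mean the predictions rise and fall exactly in step with the observed rents. A value such as the reported 0.57 indicates a moderate linear association.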


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 403
Author(s):  
Muhammad Waleed ◽  
Tai-Won Um ◽  
Tariq Kamal ◽  
Syed Muhammad Usman

In this paper, we apply multi-class supervised machine learning techniques to classify agricultural farm machinery. The classification of farm machinery is important when performing automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, fraudulent activity may well take place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB). To train the models, we use the vibration and tilt of the machinery, recorded using accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator, and cultivator. The preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but had similar means. Additionally, the vibration-based and tilt-based classifications of farm machinery each achieve good accuracy when used alone (with vibration performing slightly better than tilt). However, the accuracies improve further when tilt and vibration are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, with random forest performing best. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.


2021 ◽  
Vol 11 (5) ◽  
pp. 343
Author(s):  
Fabiana Tezza ◽  
Giulia Lorenzoni ◽  
Danila Azzolina ◽  
Sofia Barbar ◽  
Lucia Anna Carmela Leone ◽  
...  

The present work aims to identify the predictors of COVID-19 in-hospital mortality by testing a set of Machine Learning Techniques (MLTs) and comparing their ability to predict the outcome of interest. The model with the best performance will be used to identify in-hospital mortality predictors and to build an in-hospital mortality prediction tool. The study involved patients with COVID-19, confirmed by PCR test, admitted to the “Ospedali Riuniti Padova Sud” COVID-19 referral center in the Veneto region, Italy. The algorithms considered were the Recursive Partition Tree (RPART), the Support Vector Machine (SVM), the Gradient Boosting Machine (GBM), and Random Forest. The resampled performances were reported for each MLT, considering the sensitivity, specificity, and Receiver Operating Characteristic (ROC) curve measures. The study enrolled 341 patients. The median age was 74 years, and the majority of patients were male. The Random Forest algorithm outperformed the other MLTs in predicting in-hospital mortality, with an ROC of 0.84 (95% C.I. 0.78–0.9). Age, together with vital signs (oxygen saturation and the quick SOFA) and lab parameters (creatinine, AST, lymphocytes, platelets, and hemoglobin), were found to be the strongest predictors of in-hospital mortality. The present work provides insights for the prediction of in-hospital mortality of COVID-19 patients using a machine-learning algorithm.


Author(s):  
Zulqarnain Khokhar ◽  
Murtaza Ahmed Siddiqi ◽  

Wi-Fi based indoor positioning with the help of access points and smart devices has become an integral part of finding the location of a device or a person. Wi-Fi based indoor localization technology has been among the most attractive fields for researchers for a number of years. In this paper, we present Wi-Fi based indoor localization using three different machine learning techniques. The three machine learning algorithms implemented and compared are the Decision Tree, Random Forest, and Gradient Boosting classifiers. After a fingerprint of the floor was built from Wi-Fi signals, these algorithms were used to identify the device location at thirty different positions on the floor. The Random Forest and Gradient Boosting classifiers were able to identify the location of the device with an accuracy higher than 90%, while the Decision Tree was able to identify the location with an accuracy slightly higher than 80%.


2021 ◽  
Author(s):  
Viviane Costa Silva ◽  
Mateus Silva Rocha ◽  
Glaucia Amorim Faria ◽  
Silvio Fernando Alves Xavier Junior ◽  
Tiago Almeida de Oliveira ◽  
...  

The agriculture sector has created and collected large amounts of data. These data can be gathered, stored, and analyzed to assist decision making and generate competitive value, and Machine Learning techniques have been very effective for this market. In this work, a Machine Learning study was carried out using supervised classification models based on boosting to predict disease in a crop, identifying the model with the best area under the curve. The Light Gradient Boosting Machine, CatBoost Classifier, Extreme Gradient Boosting, Gradient Boosting Classifier, and Adaboost models were used to classify the crop as healthy or sick. The LightGBM algorithm provided the best fit to the data, with an area under the curve of 0.76 when BORUTA variable selection was used.


Author(s):  
Vikash Chandra Sharma ◽  
David Frankenfield ◽  
Anupam Gupta ◽  
Rama Krishna Singh

More than two-thirds of the emerging infectious diseases of recent decades are zoonotic in origin. Timely prediction of these diseases, which migrate from animals to humans, and preventive measures to limit the loss in terms of morbidity and mortality are requirements of the healthcare industry. Avian Influenza is one of the zoonotic diseases that have created havoc in the recent past, especially in the Asian subcontinent. In the past, attempts have been made to predict influenza using traditional time-series techniques (AR, MA, ARMA, ARIMA, etc.) as well as machine learning techniques to capture the cyclicity and seasonality of these virus strains. In the current research, an effort has been made to use Empirical Mode Decomposition (EMD) to extract the Intrinsic Mode Functions (IMFs) and then apply state-of-the-art Machine Learning (ML) techniques to predict the series. Several machine learning techniques, namely Random Forest (RF), Gradient Boosting Machine (GBM), and Support Vector Regression (SVR), have been applied to the decomposed series. Exogenous variables such as temperature, humidity, and precipitation have been incorporated to improve the forecast. An ensemble approach of ML models showed significant improvement over the traditional models in terms of long-term forecast accuracy.


2019 ◽  
Vol 8 (3) ◽  
pp. 1268-1271

On 15 April 1912, the Titanic sank on her maiden voyage in the North Atlantic, with the loss of many of her passengers. Although a long time has passed since this maritime disaster, the question of what influenced each individual’s survival still attracts researchers’ attention. The approach taken in this paper is to use the publicly available data set from the website Kaggle. Kaggle is a popular data science website that has compiled information about the people aboard the Titanic into a data set for the data mining competition “Titanic: Machine Learning from Disaster”. This paper uses and compares a few machine learning techniques and algorithms to analyse the data for classification and prediction of survivors. The predictive performance and efficiency of these algorithms depend greatly on the data analysis and the model. The techniques used are Random Forest, Support Vector Machine, and Gradient Boosting Machine.


2021 ◽  
Author(s):  
Randa Natras ◽  
Michael Schmidt

The accuracy and reliability of Global Navigation Satellite System (GNSS) applications are affected by the state of the Earth's ionosphere, especially when using single-frequency observations, which are employed mostly in mass-market GNSS receivers. In addition, space weather can cause strong sudden disturbances in the ionosphere, representing a major risk for GNSS performance and reliability. Accurate corrections of ionospheric effects and early warning information in the presence of space weather are therefore crucial for GNSS applications. This correction information can be obtained by employing a model that describes the complex relation of space weather processes with the non-linear spatial and temporal variability of the Vertical Total Electron Content (VTEC) within the ionosphere and includes a forecast component considering space weather events to provide an early warning system. Developing such a model is a challenging but important task and of high interest for the GNSS community.

To model the impact of space weather, a complex chain of physical dynamical processes between the Sun, the interplanetary magnetic field, the Earth's magnetic field, and the ionosphere needs to be taken into account. Machine learning techniques are suitable for finding patterns and relationships in historical data to solve problems that are too complex for a traditional approach requiring an extensive set of rules (equations) or for which there is no acceptable solution available yet.

The main objective of this study is to develop a model for forecasting the ionospheric VTEC, taking into account physical processes and using state-of-the-art machine learning techniques to learn complex non-linear relationships from the data. In this work, supervised learning is applied to forecast VTEC. This means that the model is provided with a set of (input) variables that have some influence on the VTEC forecast (output). To be more specific, data on solar activity, solar wind, the interplanetary and geomagnetic fields, and other information connected to the VTEC variability are used as input to predict future VTEC values. Different machine learning algorithms are applied, such as decision tree regression, random forest regression, and gradient boosting. Decision trees are the simplest and easiest to interpret of these machine learning algorithms, but the forecasted VTEC lacks smoothness. Random forest and gradient boosting, on the other hand, use a combination of multiple regression trees, which leads to improvements in prediction accuracy and smoothness. However, the results show that the overall performance of the algorithms, measured by the root mean square error, does not differ much between them and improves when the data are well prepared, i.e. cleaned and transformed to remove trends. Preliminary results of this study will be presented, including the methodology, goals, challenges, and perspectives of developing the machine learning model.
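As a rough illustration of the two ingredients named above, the root mean square error and trend removal, the sketch below computes the RMSE of a forecast and subtracts a least-squares straight line from a series. The abstract does not say which detrending transform was used, so the linear detrend here is an assumption. The sample values and the names `rmse` and `remove_linear_trend` are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def remove_linear_trend(series):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]


# Hypothetical VTEC-like values (arbitrary units), not data from the study.
vtec = [10.0, 12.5, 14.0, 16.5, 18.0]
detrended = remove_linear_trend(vtec)
error = rmse([10.0, 12.5, 14.0], [11.0, 12.5, 13.0])
print("detrended:", [round(v, 3) for v in detrended])
print("rmse:", round(error, 3))
```

The residuals left after detrending sum to approximately zero, so a model trained on them only has to learn the variation around the trend.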


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures produced with different anodization processes. We produced a total of 100 samples and assessed changes in the thicknesses of the TiO2 nanostructures resulting from anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltages ranging from 10 V to 100 V and at various anodization durations. We found that the thicknesses of the TiO2 nanostructures depend on the anodization voltage and duration. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed with the experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to both binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high predictive performance in estimating the presence of pores. In contrast, the decision tree and three ensemble methods had relatively higher performance for multiclass classification, with an accuracy greater than 0.79. The weakest algorithm was k-nearest neighbors for both binary and multiclass classification. We believe these results show that machine learning techniques can be applied to predict surface quality improvement, leading to smart manufacturing technology that better controls color appearance, super-hydrophobicity, super-hydrophilicity, or battery efficiency.

