scholarly journals Boosting algorithms for prediction in agriculture: an application of feature importance and feature selection boosting algorithms for prediction crop damage.

Author(s):  
Viviane Costa Silva ◽  
Mateus Silva Rocha ◽  
Glaucia Amorim Faria ◽  
Silvio Fernando Alves Xavier Junior ◽  
Tiago Almeida de Oliveira ◽  
...  

Abstract The Agriculture sector has created and collected large amounts of data. It can be gathered, stored, and analyzed to assist in decision making generating competitive value, and the use of Machine Learning techniques has been very effective for this market. In this work, a Machine Learning study was carried out using supervised classification models based on boosting to predict disease in a crop, thus identifying the model with the best areas under curve metrics. Light Gradient Boosting Machine, CatBoost Classifier, Extreme Gradient, Gradient Boosting Classifier, Adaboost models were used to qualify the crop as healthy or sick. One can see that the LightGBM algorithm provided a better fit to the data with an area under the curve of 0.76 under the use of BORUTA variable selection.

Author(s):  
Vikash Chandra Sharma ◽  
David Frankenfield ◽  
Anupam Gupta ◽  
Rama Krishna Singh

More than two-third of emerging infectious diseases in recent decades are zoonotic in origin. Timely prediction of these diseases which migrate from animals to humans and preventive measures to stop the loss in terms of morbidity and mortality is the requirement of healthcare industry. Avian Influenza is one of the zoonotic diseases that have created havoc in recent past especially in Asian subcontinent. In past, attempts have been made to predict influenza using traditional time-series techniques (AR, MA, ARMA, ARIMA etc.) as well as machine learning techniques to capture the cyclicity and seasonality of these virus strains. In current research an effort has been made to utilize the Empirical Mode Decomposition (EMD) to extract the Intrinsic Mode function (IMF) and then apply state of art Machine Learning (ML) techniques to predict the series. Several machine learning techniques like Random Forest (RF) along with Gradient Boosting Machine (GBM) and Support Vector Regression (SVR)have been applied on the decomposed series. Exogenous models showed variables like temperature, humidity and precipitation have been incorporated to improve upon the forecast. An ensemble approach of ML models showed significant improvement over the traditional models in terms of long term forecast accuracy.


2019 ◽  
Vol 8 (3) ◽  
pp. 1268-1271

On the 15th of April, 1912 the titanic witnessed a disaster resulting in the sinking of her passengers on the maiden voyage near North Atlantic. Even though it is a very long time since this maritime disaster took place, the idea behind what impacts each individual survival is still a great research attracting researcher’s attention. The approach taken in this paper is to utilize the publically available data set from website called Kaggle. Kaggle is a popular data science webpage that put together information of people in the titanic into a data set for the data mining competition: “Titanic: Machine Learning from Disaster”. The research and comparisons in this paper uses a few machine learning techniques and algorithms to analyse the data for classification and prediction of survivors. The prediction and efficiency of these algorithms depend greatly on data analysis and model. The techniques used to do so are Random Forest, Support Vector Machine, Gradient Boosting Machine.


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, random forest and gradient boosting algorithm had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction on estimating the presence of pore. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or batter efficiency.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

AbstractOutpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813‬) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 183-195
Author(s):  
Thingbaijam Lenin ◽  
N. Chandrasekaran

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
B. A Omodunbi

Diabetes mellitus is a health disorder that occurs when the blood sugar level becomes extremely high due to body resistance in producing the required amount of insulin. The aliment happens to be among the major causes of death in Nigeria and the world at large. This study was carried out to detect diabetes mellitus by developing a hybrid model that comprises of two machine learning model namely Light Gradient Boosting Machine (LGBM) and K-Nearest Neighbor (KNN). This research is aimed at developing a machine learning model for detecting the occurrence of diabetes in patients. The performance metrics employed in evaluating the finding for this study are Receiver Operating Characteristics (ROC) Curve, Five-fold Cross-validation, precision, and accuracy score. The proposed system had an accuracy of 91% and the area under the Receiver Operating Characteristic Curve was 93%. The experimental result shows that the prediction accuracy of the hybrid model is better than traditional machine learning


2020 ◽  
Vol 29 (4) ◽  
pp. e70-e80
Author(s):  
Mireia Ladios-Martin ◽  
José Fernández-de-Maya ◽  
Francisco-Javier Ballesta-López ◽  
Adrián Belso-Garzas ◽  
Manuel Mas-Asencio ◽  
...  

Background Pressure injuries are an important problem in hospital care. Detecting the population at risk for pressure injuries is the first step in any preventive strategy. Available tools such as the Norton and Braden scales do not take into account all of the relevant risk factors. Data mining and machine learning techniques have the potential to overcome this limitation. Objectives To build a model to detect pressure injury risk in intensive care unit patients and to put the model into production in a real environment. Methods The sample comprised adult patients admitted to an intensive care unit (N = 6694) at University Hospital of Torrevieja and University Hospital of Vinalopó. A retrospective design was used to train (n = 2508) and test (n = 1769) the model and then a prospective design was used to test the model in a real environment (n = 2417). Data mining was used to extract variables from electronic medical records and a predictive model was built with machine learning techniques. The sensitivity, specificity, area under the curve, and accuracy of the model were evaluated. Results The final model used logistic regression and incorporated 23 variables. The model had sensitivity of 0.90, specificity of 0.74, and area under the curve of 0.89 during the initial test, and thus it outperformed the Norton scale. The model performed well 1 year later in a real environment. Conclusions The model effectively predicts risk of pressure injury. This allows nurses to focus on patients at high risk for pressure injury without increasing workload.


2018 ◽  
Vol 13 (2) ◽  
pp. 235-250 ◽  
Author(s):  
Yixuan Ma ◽  
Zhenji Zhang ◽  
Alexander Ihler ◽  
Baoxiang Pan

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing a rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competitions and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real world warehouse dataset using data collected from the classified advertisements websites. Based on the dataset, we applied machine learning techniques to relate warehouse price with its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to singlefactor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Chalachew Muluken Liyew ◽  
Haileyesus Amsaya Melese

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.


Sign in / Sign up

Export Citation Format

Share Document