Assessing the Sensitivity of Global Maize Price to Regional Productions Using Statistical and Machine Learning Methods

2021 ◽  
Vol 5 ◽  
Author(s):  
Rotem Zelingher ◽  
David Makowski ◽  
Thierry Brunelle

Agricultural price shocks strongly affect farmers' income and food security. It is therefore important to understand and anticipate their origins and occurrence, particularly for the world's main agricultural commodities. In this study, we assess the impacts of yearly variations in regional maize production and yields on global maize prices using several statistical and machine-learning (ML) methods. Our results show that, of all regions considered, Northern America is by far the most influential. More specifically, our models reveal that a yearly yield gain of +8% in Northern America lowers the global maize price by about 7%, while a yield loss of 0.1% is expected to increase the global maize price by more than 7%. Our classification models show that even a small decrease in Northern American maize yield can substantially inflate the probability of a global maize price increase. Maize production in the other regions has a much lower influence on the global price. Among the tested methods, random forest and gradient boosting perform better than linear models. Our results highlight the value of ML for analyzing global prices of major commodities and reveal the strong sensitivity of maize prices to small variations in Northern American maize production.
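
As a concrete illustration of the comparison described above, the following minimal Python sketch contrasts a linear model with random forest and gradient boosting regressors. The synthetic data stand in for regional production features and global price changes; none of this is the authors' code.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for yearly regional production/yield variations (X)
# and the corresponding change in the global maize price (y).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=500, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    # Cross-validated R^2 as a rough measure of predictive skill
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")
```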

2021 ◽  
pp. 1-29
Author(s):  
Fikrewold H. Bitew ◽  
Corey S. Sparks ◽  
Samuel H. Nyarko

Abstract Objective: Child undernutrition is a global public health problem with serious implications. In this study, we estimate predictive algorithms for the determinants of childhood stunting using various machine learning (ML) algorithms. Design: This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five ML algorithms, including eXtreme gradient boosting (xgbTree), k-nearest neighbors (K-NN), random forest (RF), neural network (NNet), and generalized linear models (GLM), were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. Setting: Households in Ethiopia. Participants: A total of 9,471 children below five years of age. Results: The descriptive results show substantial regional variations in child stunting, wasting, and underweight in Ethiopia. Among the five ML algorithms, the xgbTree algorithm shows better predictive ability than the generalized linear model. The best-performing algorithm (xgbTree) identifies diverse important predictors of undernutrition across the three outcomes, including time to a water source, anemia history, child age greater than 30 months, small birth size, and maternal underweight, among others. Conclusions: The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared with the other ML algorithms considered in this study. The findings support improvements in access to water supply, food security, and fertility regulation, among other measures, in the quest to considerably improve childhood nutrition in Ethiopia.
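
The study names its models using R's caret identifiers; a hypothetical Python analogue of the five-algorithm comparison might look like the sketch below, with synthetic data in place of the DHS survey features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Synthetic stand-in for survey features and a binary stunting indicator
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

models = {
    "xgbTree": XGBClassifier(eval_metric="logloss"),
    "K-NN": KNeighborsClassifier(n_neighbors=15),
    "RF": RandomForestClassifier(n_estimators=500, random_state=1),
    "NNet": MLPClassifier(max_iter=1000, random_state=1),
    "GLM": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUC = {auc:.3f}")
```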


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yeonhee Lee ◽  
Jiwon Ryu ◽  
Min Woo Kang ◽  
Kyung Ha Seo ◽  
Jayoun Kim ◽  
...  

Abstract The precise prediction of acute kidney injury (AKI) after nephrectomy for renal cell carcinoma (RCC) is an important issue because of its relationship with subsequent kidney dysfunction and high mortality. Herein we addressed whether machine learning (ML) algorithms could predict postoperative AKI risk better than conventional logistic regression (LR) models. A total of 4,104 RCC patients who had undergone unilateral nephrectomy from January 2003 to December 2017 were reviewed. ML models such as support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LightGBM) were developed, and their performance, based on the area under the receiver operating characteristic curve (AUC), accuracy, and F1 score, was compared with that of the LR-based scoring model. Postoperative AKI developed in 1,167 patients (28.4%). All the ML models had higher performance index values than the LR-based scoring model. Among them, the LightGBM model had the highest AUC of 0.810 (0.783–0.837). Decision curve analysis demonstrated a greater net benefit of the ML models than of the LR-based scoring model over the full range of threshold probabilities. The application of ML algorithms improves the predictability of AKI after nephrectomy for RCC, and these models perform better than conventional LR-based models.
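
A minimal sketch of this kind of head-to-head evaluation (AUC, accuracy, F1) is shown below; it uses synthetic data and only two of the models for brevity, and is not the study's code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier

# Synthetic stand-in for perioperative features and an AKI label (1 = AKI),
# with roughly the 28% event rate reported in the abstract.
X, y = make_classification(n_samples=4000, n_features=15, weights=[0.72],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in {
    "LR": LogisticRegression(max_iter=1000),
    "LightGBM": LGBMClassifier(random_state=0),
}.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(f"{name}: AUC={roc_auc_score(y_test, proba):.3f} "
          f"accuracy={accuracy_score(y_test, pred):.3f} "
          f"F1={f1_score(y_test, pred):.3f}")
```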


2021 ◽  
pp. 187-198
Author(s):  
Shima Zahmatkesh ◽  
Alessio Bernardo ◽  
Emanuele Falzone ◽  
Edgardo Di Nicola Carena ◽  
Emanuele Della Valle

Industries that sell products with short-term or seasonal life cycles must regularly introduce new products. Forecasting the demand for a New Product Introduction (NPI) can be challenging due to fluctuations in many factors such as trend, seasonality, or other external and unpredictable phenomena (e.g., the COVID-19 pandemic). Traditionally, NPI is an expert-centric process. This paper presents a study on automating the forecast of NPI demand using statistical Machine Learning (namely, Gradient Boosting and XGBoost). We show how to overcome the shortcomings of the traditional data preparation that underpins the manual process. Moreover, we illustrate the role of cross-validation techniques for hyper-parameter tuning and model validation. Finally, we provide empirical evidence that statistical Machine Learning can forecast NPI demand better than experts.
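
To make the tuning step concrete, here is a hedged sketch of cross-validated hyper-parameter search with XGBoost; the grid values and the time-ordered splitting strategy are illustrative assumptions, not the paper's settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

# Synthetic stand-in for product/seasonality features and observed demand
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}
# TimeSeriesSplit keeps each validation fold after its training folds in
# time, which matters when forecasting demand.
search = GridSearchCV(XGBRegressor(random_state=0), param_grid,
                      cv=TimeSeriesSplit(n_splits=5),
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```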


2021 ◽  
Author(s):  
Yuki Kataoka ◽  
Yuya Kimura ◽  
Tatsuyoshi Ikenoue ◽  
Yoshinori Matsuoka ◽  
Junji Kumasawa ◽  
...  

Abstract Background We developed and validated a machine learning diagnostic model for novel coronavirus disease (COVID-19), integrating artificial-intelligence-based computed tomography (CT) imaging and clinical features. Methods We conducted a retrospective cohort study in 11 Japanese tertiary care facilities that treated COVID-19 patients. Participants were tested using both real-time reverse transcription polymerase chain reaction (RT-PCR) and chest CT between January 1 and May 30, 2020. We chronologically split the dataset in each hospital into training and test sets containing patients in a 7:3 ratio. A Light Gradient Boosting Machine (LightGBM) model was used for the analysis. Results A total of 703 patients were included, and two models (the full model and the A-blood model) were developed for their diagnosis. The A-blood model included eight variables (the Ali-M3 confidence, along with seven clinical features of blood counts and biochemistry markers). The areas under the receiver operating characteristic curve of both models (0.91, 95% confidence interval (CI) 0.86 to 0.95 for the full model and 0.90, 95% CI 0.86 to 0.94 for the A-blood model) were better than that of the Ali-M3 confidence alone (0.78, 95% CI 0.71 to 0.83) in the test set. Conclusions The A-blood model, a COVID-19 diagnostic model developed in this study, combines machine learning and CT evaluation with blood test data and outperforms the existing Ali-M3 framework. This could significantly aid physicians in making a quicker diagnosis of COVID-19.
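
The chronological 7:3 split and LightGBM fit could be sketched as follows; the column names (ali_m3, blood_*, rt_pcr) and the random data are placeholders for illustration only.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: one row per patient, already sorted by admission
# date, with an Ali-M3 confidence score, blood-test features, and the
# RT-PCR label.
rng = np.random.default_rng(0)
n = 703
df = pd.DataFrame(rng.normal(size=(n, 8)),
                  columns=["ali_m3"] + [f"blood_{i}" for i in range(7)])
df["rt_pcr"] = rng.integers(0, 2, size=n)

cut = int(n * 0.7)                    # earliest 70% of admissions train
train, test = df.iloc[:cut], df.iloc[cut:]
features = [c for c in df.columns if c != "rt_pcr"]

model = LGBMClassifier(random_state=0)
model.fit(train[features], train["rt_pcr"])
proba = model.predict_proba(test[features])[:, 1]
print(f"test AUC = {roc_auc_score(test['rt_pcr'], proba):.2f}")
```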


Author(s):  
Chao Shen ◽  
Ye Hu ◽  
Zhe Wang ◽  
Xujun Zhang ◽  
Haiyang Zhong ◽  
...  

Abstract How to accurately estimate protein–ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) do not correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using state-of-the-art ML methods in place of the original multiple linear regression method to refit the individual energy terms. The results show that the newly developed ML-based SFs consistently performed better than the classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed only when the training set contained a sufficient number of similar targets. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, the best-performing combination proved target-specific, and no generic SF or SF combination could be derived.
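
The core refitting idea, replacing multiple linear regression over a classical SF's energy terms with a tree-based learner, can be illustrated with the following sketch on synthetic energy terms; it is an assumption-laden toy, not the assessment pipeline used in the study.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for per-complex energy terms and measured affinities
rng = np.random.default_rng(0)
terms = rng.normal(size=(500, 6))              # individual energy terms
affinity = (terms @ rng.normal(size=6)
            + 0.5 * np.sin(terms[:, 0])        # mild non-linearity
            + rng.normal(scale=0.3, size=500))
train, test = slice(0, 400), slice(400, 500)

for name, reg in {
    "MLR refit": LinearRegression(),
    "GBDT refit": GradientBoostingRegressor(random_state=0),
}.items():
    reg.fit(terms[train], affinity[train])
    pred = reg.predict(terms[test])
    # Scoring power is commonly summarized as Pearson r on held-out data
    print(f"{name}: Rp = {pearsonr(affinity[test], pred)[0]:.3f}")
```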


2021 ◽  
Author(s):  
Ada Y. Chen ◽  
Juyong Lee ◽  
Ana Damjanovic ◽  
Bernard R. Brooks

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69, with surface and buried RMSE values of 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
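
For orientation, a minimal sketch of comparing the four tree-based regressors by cross-validated RMSE might look as follows, with synthetic features standing in for the residue descriptors; this is not the authors' training code.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Synthetic stand-in for residue descriptors and experimental pKa values
X, y = make_regression(n_samples=500, n_features=12, noise=0.5, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "Extra Trees": ExtraTreesRegressor(random_state=0),
    "XGBoost": XGBRegressor(random_state=0),
    "LightGBM": LGBMRegressor(random_state=0),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=5)
    rmse = np.sqrt(np.mean((y - pred) ** 2))
    print(f"{name}: cross-validated RMSE = {rmse:.2f}")
```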


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Sejoong Kim ◽  
Yeonhee Lee ◽  
Seung Seok Han

Abstract Background and Aims The precise prediction of acute kidney injury (AKI) after nephrectomy for renal cell carcinoma (RCC) is an important issue because of its relationship with subsequent kidney dysfunction and high mortality. Herein we addressed whether machine learning algorithms could predict postoperative AKI risk better than conventional logistic regression (LR) models. Method A total of 4,104 RCC patients who had undergone unilateral nephrectomy from January 2003 to December 2017 were reviewed. Machine learning models such as support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LightGBM) were developed, and their performance, based on the area under the receiver operating characteristic curve (AUC), accuracy, and F1 score, was compared with that of the LR-based scoring model. Results Postoperative AKI developed in 1,167 patients (28.4%). All the machine learning models had higher performance index values than the LR-based scoring model. Among them, the LightGBM model had the highest AUC of 0.810 (0.783–0.837). Decision curve analysis demonstrated a greater net benefit of the machine learning models than of the LR-based scoring model over the full range of threshold probabilities. The LightGBM and random forest models, but not the others, were well calibrated. Conclusion The application of machine learning algorithms improves the predictability of AKI after nephrectomy for RCC, and these models perform better than conventional LR-based models.
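
Decision curve analysis rests on the net-benefit quantity; a small self-contained sketch of its computation is given below, using simulated labels and risks rather than the study's data.

```python
import numpy as np

def net_benefit(y_true, proba, threshold):
    """Net benefit at a risk threshold, as used in decision curve analysis."""
    treat = proba >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    n = len(y_true)
    # True positives credited in full; false positives penalized by the
    # odds implied by the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

# Synthetic stand-in for AKI labels and model-predicted risks
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
proba = np.clip(y_true * 0.3 + rng.uniform(size=1000) * 0.7, 0, 1)

for t in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"threshold {t:.1f}: net benefit = {net_benefit(y_true, proba, t):.3f}")
```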


2021 ◽  
Vol 14 (3) ◽  
pp. 120
Author(s):  
Susanna Levantesi ◽  
Giulia Zacchia

In recent years, machine learning techniques have assumed an increasingly central role in many areas of research, from computer science to medicine, including finance. In the current study, we applied these techniques to financial literacy to test their accuracy, compared to a standard parametric model, in estimating the main determinants of financial knowledge. Using recent data on financial literacy and inclusion among Italian adults, we empirically tested how tree-based machine learning methods, such as decision trees, random forest, and gradient boosting, can be a valuable complement to standard models (generalized linear models) for identifying the groups in the population most in need of improving their financial knowledge.
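
A hypothetical sketch of how tree-based importances can complement a parametric benchmark is shown below; the covariate names and synthetic data are illustrative placeholders, not the Italian survey variables.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for socio-economic covariates and a binary indicator
# of adequate financial knowledge.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X = pd.DataFrame(X, columns=[f"covariate_{i}" for i in range(10)])

glm = LogisticRegression(max_iter=1000).fit(X, y)   # parametric benchmark
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)

# Tree-based importances give a complementary ranking of determinants
importance = pd.Series(gbm.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(5))
```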


2019 ◽  
Author(s):  
Jan Wolff ◽  
Alexander Gary ◽  
Daniela Jung ◽  
Claus Normann ◽  
Klaus Kaier ◽  
...  

Abstract Background A common problem in machine learning applications is the availability of data at the point of decision making. The aim of the present study was to use routine data readily available at admission to predict aspects relevant to the organization of psychiatric hospital care. A further aim was to compare the results of machine learning with those obtained through traditional methods and a naive baseline classifier. Methods The study included patients consecutively discharged between 1 January 2017 and 31 December 2018 from nine psychiatric hospitals in Hesse, Germany. We compared the predictive performance achieved by stochastic gradient boosting (GBM) with multiple logistic regression and a naive baseline classifier. We tested the performance of our final models on unseen patients from another calendar year and from different hospitals. Results The study included 45,388 inpatient episodes. The models' performance, as measured by the area under the receiver operating characteristic curve, varied strongly between the predicted outcomes, with relatively high performance in the prediction of coercive treatment (area under the curve: 0.83) and 1:1 observations (0.80) and relatively poor performance in the prediction of short length of stay (0.69) and non-response to treatment (0.65). The GBM performed slightly better than logistic regression. Both approaches were substantially better than a naive prediction based solely on basic diagnostic grouping. Conclusion The present study has shown that administrative routine data can be used to predict aspects relevant to the organization of psychiatric hospital care. Future research should investigate the predictive performance that is necessary to provide effective assistance in clinical practice for the benefit of both staff and patients.
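
The naive baseline mentioned here, predicting from basic diagnostic grouping alone, can be sketched as follows; the group labels and outcomes are simulated placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: admission episodes with a basic diagnostic group
# and a binary outcome (e.g., coercive treatment).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "dx_group": rng.choice(["F1", "F2", "F3", "F4"], size=2000),
    "outcome": rng.integers(0, 2, size=2000),
})
train, test = df.iloc[:1500], df.iloc[1500:]

# Naive baseline: predict for each patient the outcome rate observed for
# their diagnostic group in the training data.
group_rates = train.groupby("dx_group")["outcome"].mean()
baseline = test["dx_group"].map(group_rates).fillna(train["outcome"].mean())
print(f"naive baseline AUC = {roc_auc_score(test['outcome'], baseline):.2f}")
```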


Agronomy ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 2344
Author(s):  
Mansoor Maitah ◽  
Karel Malec ◽  
Ying Ge ◽  
Zdeňka Gebeltová ◽  
Luboš Smutka ◽  
...  

Machine learning algorithms have been applied in agriculture to forecast crop productivity. Previous studies mainly focused on the whole crop growth period, while the predictive value of different time windows within the season remained unknown. We therefore separated the entire growth period into individual months and assessed their corresponding predictive ability, taking maize production (silage and grain) in Czechia as a case study. We present a thorough assessment of county-level maize yield prediction in Czechia using a machine learning algorithm, the extreme learning machine (ELM), and an extensive set of weather data and maize yields from 2002 to 2018. Results show that sunshine in June and water deficit in July were highly influential factors for silage maize yield. The two primary climate parameters for grain maize yield are the minimum temperature in September and the water deficit in May. The average absolute relative deviation (AARD), root mean square error (RMSE), and coefficient of determination (R2) of the proposed models are 6.565–32.148%, 1.006–1.071%, and 0.641–0.716, respectively. Based on the results, silage yield will decrease by 1.367 t/ha (a 3.826% loss), and grain yield will increase by 0.337 t/ha (a 5.394% gain) when the maximum temperature in May increases by 2 °C. In conclusion, ELM models show great potential for predicting maize yield.
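
An ELM reduces training to a linear least-squares problem over random hidden-layer features; the following bare-bones sketch shows the idea, with the hidden-layer size and the synthetic data standing in as assumptions for the monthly weather features and county-level yields.

```python
import numpy as np

# Synthetic stand-in for monthly weather features and county-level yields
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=0.5, size=300)

n_hidden = 100
W = rng.normal(size=(X.shape[1], n_hidden))   # fixed random input weights
b = rng.normal(size=n_hidden)

H = np.tanh(X @ W + b)                        # random hidden-layer features
# Only the readout weights are trained, by ordinary least squares
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
y_pred = H @ beta
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"training RMSE = {rmse:.3f}")
```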

