Estimating Avocado Sales Using Machine Learning Algorithms and Weather Data

Persea americana, commonly known as avocado, is becoming increasingly important in global agriculture. There are dozens of avocado varieties, but more than 85% of the avocados harvested and sold in the world are of the Hass one. Furthermore, information on the market of agricultural products is valuable for decision-making; this has made researchers try to determine the behavior of the avocado market, based on data that might affect it one way or another. In this paper, a machine learning approach for estimating the number of units sold monthly and the total sales of Hass avocados in several cities in the United States, using weather data and historical sales records, is presented. For that purpose, four algorithms were evaluated: Linear Regression, Multilayer Perceptron, Support Vector Machine for Regression and Multivariate Regression Prediction Model. The last two showed the best accuracy, with a correlation coefficient of 0.995 and 0.996, and a Relative Absolute Error of 7.971 and 7.812, respectively. Using the Multivariate Regression Prediction Model, an application that allows avocado producers and sellers to plan sales through the estimation of the profits in dollars and the number of avocados that could be sold in the United States was created.

Download Full-text

Predicting Student’s Performance Using Machine Learning Algorithm

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1209 ◽

2021 ◽

pp. 53-58

Author(s):

Sheela Rani P ◽

Dhivya S ◽

Dharshini Priya M ◽

Dharmila Chowdary A

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Prediction Model ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbors

Machine learning is a new analysis discipline that uses knowledge to boost learning, optimizing the training method and developing the atmosphere within which learning happens. There square measure 2 sorts of machine learning approaches like supervised and unsupervised approach that square measure accustomed extract the knowledge that helps the decision-makers in future to require correct intervention. This paper introduces an issue that influences students' tutorial performance prediction model that uses a supervised variety of machine learning algorithms like support vector machine , KNN(k-nearest neighbors), Naïve Bayes and supplying regression and logistic regression. The results supported by various algorithms are compared and it is shown that the support vector machine and Naïve Bayes performs well by achieving improved accuracy as compared to other algorithms. The final prediction model during this paper may have fairly high prediction accuracy .The objective is not just to predict future performance of students but also provide the best technique for finding the most impactful features that influence student’s while studying.

Download Full-text

Cereal yield forecasting combining satellite drought-based indices, regional climate and weather data using machine learning approaches in Morocco

10.5194/egusphere-egu21-14590 ◽

2021 ◽

Author(s):

El houssaine Bouras ◽

Lionel Jarlan ◽

Salah Er-Raki ◽

Riad Balaghi ◽

Abdelhakim Amazirh ◽

...

Keyword(s):

Machine Learning ◽

Regional Climate ◽

Model Development ◽

Machine Learning Algorithms ◽

Weather Data ◽

Drought Indices ◽

Support Vector ◽

Learning Approaches ◽

Climate Data ◽

Yield Forecasting

Cereals are the main crop in Morocco. Its production exhibits a high inter-annual due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is thus important for decision makers to have reliable forecasts of the annual cereal production in order to pre-empt importation needs. In this study, we assessed the joint use of satellite-based drought indices, weather (precipitation and temperature) and climate data (pseudo-oscillation indices including NAO and the leading modes of sea surface temperature -SST- in the mid-latitude and in the tropical area) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine -SVM-, Random forest -FR- and eXtreme Gradient Boost -XGBoost-) in addition to Multiple Linear Regression (MLR). Also, we evaluate the models for different lead times along the growing season from January (about 5 months before harvest) to March (2 months before harvest). The results show the combination of data from the different sources outperformed the use of a single dataset; the highest accuracy being obtained when the three data sources were all considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvesting) with an R&#178; = 0.90 and RMSE about 3.4 Qt.ha-1. &#160;When comparing the model&#8217;s performance, XGBoost represents the best one for predicting yields. Also, considering specific models for each province separately improves the statistical metrics by approximately 10-50% depending on the province with regards to one global model applied to all the provinces. The results of this study pointed out that machine learning is a promising tool for cereal yield forecasting. Also, the proposed methodology can be extended to different crops and different regions for crop yield forecasting.

Download Full-text

Characteristics of Twitter Use by State Medicaid Programs in the United States: Machine Learning Approach

Journal of Medical Internet Research ◽

10.2196/18401 ◽

2020 ◽

Vol 22 (8) ◽

pp. e18401

Author(s):

Jane M Zhu ◽

Abeed Sarker ◽

Sarah Gollust ◽

Raina Merchant ◽

David Grande

Keyword(s):

Public Health ◽

United States ◽

Machine Learning ◽

Public Health Education ◽

The United States ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Care Organization ◽

The Public ◽

The Mean

Background Twitter is a potentially valuable tool for public health officials and state Medicaid programs in the United States, which provide public health insurance to 72 million Americans. Objective We aim to characterize how Medicaid agencies and managed care organization (MCO) health plans are using Twitter to communicate with the public. Methods Using Twitter’s public application programming interface, we collected 158,714 public posts (“tweets”) from active Twitter profiles of state Medicaid agencies and MCOs, spanning March 2014 through June 2019. Manual content analyses identified 5 broad categories of content, and these coded tweets were used to train supervised machine learning algorithms to classify all collected posts. Results We identified 15 state Medicaid agencies and 81 Medicaid MCOs on Twitter. The mean number of followers was 1784, the mean number of those followed was 542, and the mean number of posts was 2476. Approximately 39% of tweets came from just 10 accounts. Of all posts, 39.8% (63,168/158,714) were classified as general public health education and outreach; 23.5% (n=37,298) were about specific Medicaid policies, programs, services, or events; 18.4% (n=29,203) were organizational promotion of staff and activities; and 11.6% (n=18,411) contained general news and news links. Only 4.5% (n=7142) of posts were responses to specific questions, concerns, or complaints from the public. Conclusions Twitter has the potential to enhance community building, beneficiary engagement, and public health outreach, but appears to be underutilized by the Medicaid program.

Download Full-text

A Machine Learning Analysis of COVID-19 Mental Health Data

10.21203/rs.3.rs-1129807/v1 ◽

2022 ◽

Author(s):

Mostafa Rezapour ◽

Lucas Hansen

Keyword(s):

Mental Health ◽

United States ◽

Machine Learning ◽

Survey Data ◽

Social Research ◽

Multinomial Logistic Regression ◽

The United States ◽

Support Vector ◽

K Nearest Neighbors ◽

The Individual

Abstract In late December 2019, the novel coronavirus (Sars-Cov-2) and the resulting disease COVID-19 were first identified in Wuhan China. The disease slipped through containment measures, with the first known case in the United States being identified on January 20th, 2020. In this paper, we utilize survey data from the Inter-university Consortium for Political and Social Research and apply several statistical and machine learning models and techniques such as Decision Trees, Multinomial Logistic Regression, Naive Bayes, k-Nearest Neighbors, Support Vector Machines, Neural Networks, Random Forests, Gradient Tree Boosting, XGBoost, CatBoost, LightGBM, Synthetic Minority Oversampling, and Chi-Squared Test to analyze the impacts the COVID-19 pandemic has had on the mental health of frontline workers in the United States. Through the interpretation of the many models applied to the mental health survey data, we have concluded that the most important factor in predicting the mental health decline of a frontline worker is the healthcare role the individual is in (Nurse, Emergency Room Staff, Surgeon, etc.), followed by the amount of sleep the individual has had in the last week, the amount of COVID-19 related news an individual has consumed on average in a day, the age of the worker, and the usage of alcohol and cannabis.

Download Full-text

Prediction of Type 2 Diabetes Based on Machine Learning Algorithm

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18063317 ◽

2021 ◽

Vol 18 (6) ◽

pp. 3317

Author(s):

Henock M. Deberneh ◽

Intaek Kim

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Prediction Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Recursive Feature Elimination ◽

Medical Institute ◽

Support Vector

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.

Download Full-text

How to Guarantee Food Safety via Grain Storage? An Approach to Improve Management Effectiveness by Machine Learning Algorithms

Journal of Biomedical Research & Environmental Sciences ◽

10.37871/jbres1296 ◽

2021 ◽

Vol 2 (8) ◽

pp. 675-684

Author(s):

Jin Wang ◽

Youjun Jiang ◽

Li Li ◽

Chao Yang ◽

Ke Li ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Prediction Model ◽

Bp Neural Network ◽

Machine Learning Algorithms ◽

Support Vector ◽

Grain Storage ◽

Management Effectiveness

The purpose of grain storage management is to dynamically analyze the quality change of the reserved grains, adopt scientific and effective management methods to delay the speed of the quality deterioration, and reduce the loss rate during storage. At present, the supervision of the grain quality in the reserve mainly depends on the periodic measurements of the quality of the grains and the milled products. The data obtained by the above approach is accurate and reliable, but the workload is too large while the frequency is high. The obtained conclusions are also limited to the studied area and not applicable to be extended into other scenarios. Therefore, there is an urgent need of a general method that can quickly predict the quality of grains given different species, regions and storage periods based on historical data. In this study, we introduced Back-Propagation (BP) neural network algorithm and support vector machine algorithm into the quality prediction of the reserved grains. We used quality index, temperature and humidity data to build both an intertemporal prediction model and a synchronous prediction model. The results show that the BP neural network based on the storage characters from the first three periods can accurately predict the key storage characters intertemporally. The support vector machine can provide precise predictions of the key storage characters synchronously. The average predictive error for each of wheat, rice and corn is less than 15%, while the one for soybean is about 20%, all of which can meet the practical demands. In conclusion, the machine learning algorithms are helpful to improve the management effectiveness of grain storage.

Download Full-text

Predicting speech discrimination scores from pure-tone thresholds—A machine learning-based approach using data from 12,697 subjects

PLoS ONE ◽

10.1371/journal.pone.0261433 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261433

Author(s):

Hantai Kim ◽

JaeYeon Park ◽

Yun-Hoon Choung ◽

Jeong Hun Jang ◽

JeongGil Ko

Keyword(s):

Machine Learning ◽

Hearing Loss ◽

Random Forest ◽

Prediction Model ◽

Pure Tone ◽

Large Scale ◽

Pure Tone Audiometry ◽

Machine Learning Algorithms ◽

Support Vector ◽

Speech Discrimination

Diagnostic tests for hearing impairment not only determines the presence (or absence) of hearing loss, but also evaluates its degree and type, and provides physicians with essential data for future treatment and rehabilitation. Therefore, accurately measuring hearing loss conditions is very important for proper patient understanding and treatment. In current-day practice, to quantify the level of hearing loss, physicians exploit specialized test scores such as the pure-tone audiometry (PTA) thresholds and speech discrimination scores (SDS) as quantitative metrics in examining a patient’s auditory function. However, given that these metrics can be easily affected by various human factors, which includes intentional (or accidental) patient intervention, there are needs to cross validate the accuracy of each metric. By understanding a “normal” relationship between the SDS and PTA, physicians can reveal the need for re-testing, additional testing in different dimensions, and also potential malingering cases. For this purpose, in this work, we propose a prediction model for estimating the SDS of a patient by using PTA thresholds via a Random Forest-based machine learning approach to overcome the limitations of the conventional statistical (or even manual) methods. For designing and evaluating the Random Forest-based prediction model, we collected a large-scale dataset from 12,697 subjects, and report a SDS level prediction accuracy of 95.05% and 96.64% for the left and right ears, respectively. We also present comparisons with other widely-used machine learning algorithms (e.g., Support Vector Machine, Multi-layer Perceptron) to show the effectiveness of our proposed Random Forest-based approach. Results obtained from this study provides implications and potential feasibility in providing a practically-applicable screening tool for identifying patient-intended malingering in hearing loss-related tests.

Download Full-text

Malaria Prediction Model Using Machine Learning Algorithms

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i10.5655 ◽

2021 ◽

Vol 12 (10) ◽

pp. 7488-7496

Author(s):

Yusuf Aliyu Adamu, Et. al.

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Model ◽

Public Awareness ◽

Health Sector ◽

Weather Condition ◽

Machine Learning Algorithms ◽

Support Vector ◽

African Countries ◽

Data Set

Measures have been taking to ensure the safety of individuals from the burden of vector-borne disease but it remains the causative agent of death than any other diseases in Africa. Many human lives are lost particularly of children below five years regardless of the efforts made. The effect of malaria is much more challenging mostly in developing countries. In 2019, 51% of malaria fatality happen in Africa which it increased by 20% in 2020 due to the covid-19 pandemic. The majority of African countries lack a proper or a sound health care system, proper environmental settlement, economic hardship, limited funding in the health sector, and absence of good policies to ensure the safety of individuals. Information has to become available to the peoples on the effect of malaria by making public awareness program to make sure people become acquainted with the disease so that certain measure can be maintained. The prediction model can help the policymakers to know more about the expected time of the malaria occurrence based on the existing features so that people will get to know the information regarding the disease on time, health equipment and medication to be made available by government through it policy. In this research weather condition, non-climatic features, and malaria cases are considered in designing the model for prediction purposes and also the performance of six different machine learning classifiers for instance Support Vector Machine, K-Nearest Neighbour, Random Forest, Decision Tree, Logistic Regression, and Naïve Bayes is identified and found that Random Forest is the best with accuracy (97.72%), AUC (98%) AUC, and (100%) precision based on the data set used in the analysis.

Download Full-text

ESTIMATING CORN YIELD IN THE UNITED STATES WITH MODIS EVI AND MACHINE LEARNING METHODS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-iii-8-131-2016 ◽

2016 ◽

Vol III-8 ◽

pp. 131-136 ◽

Cited By ~ 8

Author(s):

K. Kuwata ◽

R. Shibasaki

Keyword(s):

Neural Network ◽

United States ◽

Machine Learning ◽

Crop Yield ◽

The United States ◽

Machine Learning Algorithms ◽

Corn Yield ◽

County Level ◽

Entire Area ◽

Modis Evi

Satellite remote sensing is commonly used to monitor crop yield in wide areas. Because many parameters are necessary for crop yield estimation, modelling the relationships between parameters and crop yield is generally complicated. Several methodologies using machine learning have been proposed to solve this issue, but the accuracy of county-level estimation remains to be improved. In addition, estimating county-level crop yield across an entire country has not yet been achieved. In this study, we applied a deep neural network (DNN) to estimate corn yield. We evaluated the estimation accuracy of the DNN model by comparing it with other models trained by different machine learning algorithms. We also prepared two time-series datasets differing in duration and confirmed the feature extraction performance of models by inputting each dataset. As a result, the DNN estimated county-level corn yield for the entire area of the United States with a determination coefficient (R2) of 0.780 and a root mean square error (RMSE) of 18.2 bushels/acre. In addition, our results showed that estimation models that were trained by a neural network extracted features from the input data better than an existing machine learning algorithm.

Download Full-text

Collusive Algorithms as Mere Tools, Super-tools or Legal Persons

Journal of Competition Law & Economics ◽

10.1093/joclec/nhz010 ◽

2019 ◽

Author(s):

Guan Zheng ◽

Hong Wu

Keyword(s):

United States ◽

Machine Learning ◽

Learning Algorithms ◽

The United States ◽

Machine Learning Algorithms ◽

Tacit Collusion ◽

Antitrust Law ◽

Strong Argument ◽

Market Pricing ◽

Distinct Features

Abstract The widespread use of algorithmic technologies makes rules on tacit collusion, which are already controversial in antitrust law, more complicated. These rules have obvious limitations in effectively regulating algorithmic collusion. Although some scholars and practitioners within antitrust circles in the United States, Europe and beyond have taken notice of this problem, they have failed to a large extent to make clear its specific manifestations, root causes, and effective legal solutions. In this article, the authors make a strong argument that it is no longer appropriate to regard algorithms as mere tools of firms, and that the distinct features of machine learning algorithms as super-tools and as legal persons may inevitably bring about two new cracks in antitrust law. This article clarifies the root causes why these rules are inapplicable to a large extent to algorithmic collusion particularly in the case of machine learning algorithms, classifies the new legal cracks, and provides sound legal criteria for the courts and competition authorities to assess the legality of algorithmic collusion much more accurately. More importantly, this article proposes an efficacious solution to revive the market pricing mechanism for the purposes of resolving the two new cracks identified in antitrust law.

Download Full-text