ESTIMATING CORN YIELD IN THE UNITED STATES WITH MODIS EVI AND MACHINE LEARNING METHODS

Author(s):  
K. Kuwata ◽  
R. Shibasaki

Satellite remote sensing is commonly used to monitor crop yield over wide areas. Because many parameters feed into crop yield estimation, modelling the relationships between those parameters and yield is generally complicated. Several machine learning methodologies have been proposed to address this, but county-level estimation accuracy still needs improvement, and county-level crop yield has not yet been estimated across an entire country. In this study, we applied a deep neural network (DNN) to estimate corn yield and evaluated its accuracy against models trained with other machine learning algorithms. We also prepared two time-series datasets of differing duration and used each as input to compare the models' feature extraction performance. As a result, the DNN estimated county-level corn yield across the entire United States with a coefficient of determination (R²) of 0.780 and a root mean square error (RMSE) of 18.2 bushels/acre. In addition, our results showed that estimation models trained as neural networks extracted features from the input data better than an existing machine learning algorithm.
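The abstract leaves the DNN's architecture and inputs unspecified, so the following is only a hedged sketch of the general idea: a tiny one-hidden-layer feed-forward network trained by plain gradient descent to regress a per-county yield value from a season-long series of EVI composites. All data, layer sizes, and hyperparameters below are invented for illustration.

```python
import numpy as np

# Hedged sketch on synthetic data: regress county yield from a season of
# EVI composites with a one-hidden-layer ReLU network and manual backprop.
# The paper's actual DNN and MODIS-derived inputs are not specified here.
rng = np.random.default_rng(0)
n_counties, n_steps = 200, 16                       # assumed 16 composites/season
X = rng.uniform(0.2, 0.8, (n_counties, n_steps))    # fake EVI time series
true_w = rng.normal(size=n_steps)
y = X @ true_w + 0.1 * rng.normal(size=n_counties)  # fake yield signal

W1 = rng.normal(scale=0.1, size=(n_steps, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1));       b2 = np.zeros(1)

losses, lr = [], 0.05
for _ in range(500):
    h = np.maximum(X @ W1 + b1, 0.0)                # ReLU hidden layer
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    losses.append(float(np.mean(err ** 2)))         # mean squared error
    g = 2.0 * err[:, None] / n_counties             # dLoss/dPred
    gh = (g @ W2.T) * (h > 0.0)                     # backprop through ReLU
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)

print(f"train MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With a fixed seed the training loss falls steadily, which is all the sketch is meant to show; a real yield model would need held-out counties and tuned architecture.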


10.2196/18401 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e18401
Author(s):  
Jane M Zhu ◽  
Abeed Sarker ◽  
Sarah Gollust ◽  
Raina Merchant ◽  
David Grande

Background Twitter is a potentially valuable tool for public health officials and state Medicaid programs in the United States, which provide public health insurance to 72 million Americans. Objective We aim to characterize how Medicaid agencies and managed care organization (MCO) health plans are using Twitter to communicate with the public. Methods Using Twitter’s public application programming interface, we collected 158,714 public posts (“tweets”) from active Twitter profiles of state Medicaid agencies and MCOs, spanning March 2014 through June 2019. Manual content analyses identified 5 broad categories of content, and these coded tweets were used to train supervised machine learning algorithms to classify all collected posts. Results We identified 15 state Medicaid agencies and 81 Medicaid MCOs on Twitter. The mean number of followers was 1784, the mean number of those followed was 542, and the mean number of posts was 2476. Approximately 39% of tweets came from just 10 accounts. Of all posts, 39.8% (63,168/158,714) were classified as general public health education and outreach; 23.5% (n=37,298) were about specific Medicaid policies, programs, services, or events; 18.4% (n=29,203) were organizational promotion of staff and activities; and 11.6% (n=18,411) contained general news and news links. Only 4.5% (n=7142) of posts were responses to specific questions, concerns, or complaints from the public. Conclusions Twitter has the potential to enhance community building, beneficiary engagement, and public health outreach, but appears to be underutilized by the Medicaid program.
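As a rough illustration of the supervised-classification step described above, the sketch below trains a multinomial naive Bayes classifier on a handful of invented tweets. The study's actual algorithms, coding scheme, and category labels are not specified in this abstract; the categories and examples here are stand-ins.

```python
from collections import Counter, defaultdict
import math

# Hedged sketch: manually coded tweets train a classifier over content
# categories. The tweets and labels below are invented examples.
train = [
    ("get your flu shot today at a clinic near you", "health_education"),
    ("wash your hands to stop the spread of germs", "health_education"),
    ("medicaid open enrollment starts monday", "medicaid_program"),
    ("new dental benefits added for medicaid members", "medicaid_program"),
    ("congrats to our staff on the community award", "org_promotion"),
]

word_counts = defaultdict(Counter)   # label -> word frequencies
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    # argmax over labels of log P(label) + sum log P(word | label),
    # with Laplace (add-one) smoothing for unseen words
    def score(label):
        total = sum(word_counts[label].values())
        s = math.log(label_counts[label] / len(train))
        for w in text.split():
            s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return s
    return max(label_counts, key=score)

print(predict("enrollment for medicaid benefits opens soon"))
```

On this toy data the classifier routes Medicaid-specific wording to the program category and hygiene wording to the education category, mirroring the kind of split the coded categories describe.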


2021 ◽  
Author(s):  
Kunal Menda ◽  
Lucas Laird ◽  
Mykel J. Kochenderfer ◽  
Rajmonda S. Caceres

Abstract COVID-19 epidemics have varied dramatically in nature across the United States: some counties have clear peaks in infections, while others have had a multitude of unpredictable and non-distinct peaks. In this work, we seek to explain this diversity in epidemic progressions by considering an extension to the compartmental SEIRD model. The model we propose uses a neural network to predict the infection rate as a function of time and of the prevalence of the disease. We provide a methodology for fitting this model to available county-level data describing aggregate cases and deaths. Our method uses Expectation-Maximization to overcome the challenge of partial observability: the system's state is only partially reflected in available data. We fit a single model to data from multiple counties in the United States exhibiting different behavior. By simulating the model, we show that it is capable of exhibiting both single-peak and multi-peak behavior, reproducing behavior observed in counties both in and out of the training set. We also numerically compare the error of simulations from our model with that of a standard SEIRD model, showing that the proposed extensions are necessary to explain the spread of COVID-19.


Author(s):  
Abolfazl Mollalo ◽  
Kiara M. Rivera ◽  
Behzad Vahedi

Prediction of the COVID-19 incidence rate is a matter of global importance, particularly in the United States. As of 4 June 2020, more than 1.8 million confirmed cases and over 108 thousand deaths had been reported in this country. Few studies have examined nationwide modeling of COVID-19 incidence in the United States, particularly with machine learning algorithms. We therefore collected and prepared a database of 57 candidate explanatory variables to examine the performance of a multilayer perceptron (MLP) neural network in predicting cumulative COVID-19 incidence rates across the continental United States. Our results indicated that a single-hidden-layer MLP could explain almost 65% of the correlation with ground truth for the holdout samples. Sensitivity analysis on this model showed that the age-adjusted mortality rates of ischemic heart disease, pancreatic cancer, and leukemia, together with two socioeconomic and environmental factors (median household income and total precipitation), are among the most substantial predictors of COVID-19 incidence rates. Moreover, results of a logistic regression model indicated that these variables could explain the presence or absence of the disease-incidence hotspots identified by Getis-Ord Gi* (p < 0.05) in a geographic information system environment. The findings may provide useful insights for public health decision makers regarding the influence of potential risk factors associated with county-level COVID-19 incidence.
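The hotspot step above relies on the Getis-Ord Gi* statistic. As a rough, self-contained illustration (the study computed it over county polygons in a GIS environment; here the "counties" are cells on a 1-D line with invented incidence values), a minimal implementation of the standard Gi* z-score is:

```python
import math

# Illustrative sketch of the Getis-Ord Gi* hotspot statistic. Weights are
# binary adjacency including the focal cell itself (the "star" variant);
# real analyses use county polygons and richer spatial weight matrices.
def gi_star(x, i, neighbors):
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - xbar * xbar)
    w = [1.0 if (j in neighbors or j == i) else 0.0 for j in range(n)]
    wsum = sum(w)
    lag = sum(wj * xj for wj, xj in zip(w, x))        # weighted local sum
    denom = s * math.sqrt((n * sum(wj * wj for wj in w) - wsum ** 2) / (n - 1))
    return (lag - xbar * wsum) / denom                # z-score

rates = [1.0] * 20
rates[9] = rates[10] = rates[11] = 10.0               # invented incidence hotspot

z_hot = gi_star(rates, 10, {9, 11})                   # center of the cluster
z_cold = gi_star(rates, 0, {1})                       # far from the cluster
print(f"Gi* hot={z_hot:.2f}, cold={z_cold:.2f}")
```

Cells inside the invented cluster score well above the 1.96 threshold the p < 0.05 cutoff implies, while distant cells do not, which is the binary hotspot/non-hotspot outcome the logistic regression in the study then tries to explain.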


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S759-S759
Author(s):  
Stephanie Kujawski ◽  
Boshu Ru ◽  
Amar K Das ◽  
Nelson L Afanador ◽  
Richard Baumgartner ◽  
...  

Abstract Background Although measles is still rare in the United States (U.S.), there have been recent resurgent outbreaks. To improve prediction accuracy given the rarity of measles events, we used machine learning (ML) algorithms to model measles case predictions at the U.S. county level. Methods The main outcome was occurrence of ≥1 measles case at the U.S. county level. Two ML prediction models were developed (HDBSCAN, a clustering algorithm, and XGBoost, a gradient boosting algorithm) and compared with traditional logistic regression. We included 28 predictors in the following categories: sociodemographics, population statistics, measles vaccination coverage, healthcare access, and exposure to measles via international air travel. The models were trained on 2014 case data and validated on 2018 case data. Models were compared using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and F2 score (a combined measure of sensitivity and PPV). Results There were 667 measles cases in 2014 and 375 in 2018 in the U.S. We identified U.S. counties for 635 (95.2%) cases in 2014 and 366 (97.6%) cases in 2018 through published sources, corresponding to 81/3143 (2.6%) counties in 2014 and 64/3143 (2.0%) counties in 2018 with ≥1 measles case. HDBSCAN had the highest sensitivity (0.92) but the lowest AUC (0.68) and PPV (0.04) (Table). XGBoost had the highest F2 score (0.49), the best balance of sensitivity (0.72) and specificity (0.94), and an AUC of 0.92. Logistic regression had a high AUC (0.91) and specificity (1.00) but the lowest sensitivity (0.16). Conclusion Machine learning approaches outperformed logistic regression by maximizing sensitivity to predict counties with measles cases, an important criterion for preventing or preparing for future outbreaks. XGBoost or logistic regression could be considered to maximize specificity.
Prioritizing sensitivity versus specificity may depend on county resources, priorities, and measles risk. Different modeling approaches could be considered to optimize surveillance efforts and develop effective interventions for timely response. Disclosures Stephanie Kujawski, PhD MPH, Merck & Co., Inc. (Employee, Shareholder) Boshu Ru, Ph.D., Merck & Co., Kenilworth, NJ (NYSE: MRK) (Employee, Shareholder) Amar K. Das, MD, PhD, Merck (Employee) Richard Baumgartner, PhD, Merck (Employee) Shuang Lu, MBA, MS, Merck (Employee) Matthew Pillsbury, PhD, Merck & Co. (Employee, Shareholder) Joseph Lewnard, PhD, Merck (Consultant, Grant/Research Support) James H. Conway, MD, FAAP, GSK (Advisor or Review Panel member) Merck (Advisor or Review Panel member) Moderna (Advisor or Review Panel member) Pfizer (Advisor or Review Panel member) Sanofi Pasteur (Research Grant or Support) Manjiri D. Pawaskar, PhD, Merck & Co., Inc. (Employee, Shareholder)
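The evaluation metrics named in this abstract (sensitivity, specificity, PPV, and the F2 score, which weights sensitivity more heavily than PPV) all follow from a confusion matrix. A minimal sketch, using invented counts rather than the study's results:

```python
# Hedged sketch of the comparison metrics from the abstract. The
# confusion-matrix counts below are invented, not the study's numbers.
def classification_metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)                 # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                         # precision
    # F-beta with beta=2: sensitivity counts four times as much as PPV
    f2 = 5 * ppv * sensitivity / (4 * ppv + sensitivity)
    return sensitivity, specificity, ppv, f2

sens, spec, ppv, f2 = classification_metrics(tp=8, fp=10, fn=2, tn=80)
print(f"sens={sens:.2f} spec={spec:.2f} ppv={ppv:.2f} F2={f2:.2f}")
```

Because measles counties are so rare (about 2% of counties), a model can post a high AUC and specificity while missing most true cases, which is why the authors lean on sensitivity and F2 rather than accuracy.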


Author(s):  
Guan Zheng ◽  
Hong Wu

Abstract The widespread use of algorithmic technologies makes rules on tacit collusion, which are already controversial in antitrust law, more complicated. These rules have obvious limitations in effectively regulating algorithmic collusion. Although some scholars and practitioners within antitrust circles in the United States, Europe and beyond have taken notice of this problem, they have failed to a large extent to make clear its specific manifestations, root causes, and effective legal solutions. In this article, the authors make a strong argument that it is no longer appropriate to regard algorithms as mere tools of firms, and that the distinct features of machine learning algorithms as super-tools and as legal persons may inevitably bring about two new cracks in antitrust law. This article clarifies the root causes why these rules are inapplicable to a large extent to algorithmic collusion particularly in the case of machine learning algorithms, classifies the new legal cracks, and provides sound legal criteria for the courts and competition authorities to assess the legality of algorithmic collusion much more accurately. More importantly, this article proposes an efficacious solution to revive the market pricing mechanism for the purposes of resolving the two new cracks identified in antitrust law.


2019 ◽  
Author(s):  
Sing-Chun Wang ◽  
Yuxuan Wang

Abstract. Occurrences of devastating wildfires have been on the rise in the United States for the past decades. While the environmental controls, including weather, climate, and fuels, are known to play important roles in controlling wildfires, the interrelationships between fires and the environmental controls are highly complex and may not be well represented by traditional parametric regressions. Here we develop a model integrating multiple machine learning algorithms to predict gridded monthly wildfire burned area during 2002–2015 over the South Central United States, and we identify the relative importance of the environmental drivers of burned area for both the winter-spring and summer fire seasons of that region. The developed model is able to alleviate the issue of unevenly distributed burned-area data and achieves cross-validation (CV) R² values of 0.42 and 0.40 for the two fire seasons. For the total burned area over the study domain, the model can explain 50% and 79% of interannual total burned area for the winter-spring and summer fire seasons, respectively. The prediction model ranks relative humidity (RH) anomalies and the preceding months' drought severity as the top two most important predictors of gridded burned area for both fire seasons. Sensitivity experiments with the model show that the effect of climate change, represented by a group of climate-anomaly variables, contributes the most to the burned area for both fire seasons. Antecedent fuel amount and conditions are found to outweigh weather effects on burned area in the winter-spring fire season, while current-month fire weather is more important in the summer fire season, likely due to the controlling effect of weather on fuel moisture in that season. The developed model allows us to predict gridded burned area and to assess specific fire management strategies for the different fire mechanisms of the two seasons.
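The cross-validation R² reported above is a generic skill score, independent of the authors' particular algorithms. A minimal sketch of how it is computed, using an invented one-predictor least-squares model as a stand-in for the multi-algorithm burned-area model:

```python
# Hedged sketch of cross-validated R². The model and data are invented
# stand-ins, not the paper's burned-area model or inputs.
def r2(y_true, y_pred):
    ybar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - ybar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_ols(xs, ys):
    # slope and intercept by ordinary least squares
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return slope, ybar - slope * xbar

# toy data: burned area roughly doubling with an invented dryness index
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9]

# 4-fold CV: hold out two points at a time, predict them from the rest
preds = [None] * len(x)
for fold in range(4):
    test_idx = {2 * fold, 2 * fold + 1}
    train = [(xi, yi) for k, (xi, yi) in enumerate(zip(x, y))
             if k not in test_idx]
    m, b = fit_ols([t[0] for t in train], [t[1] for t in train])
    for k in test_idx:
        preds[k] = m * x[k] + b

print(f"CV R2 = {r2(y, preds):.3f}")
```

Each point is scored only by a model that never saw it, which is what makes the 0.42 and 0.40 values above out-of-sample skill rather than in-sample fit.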


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kunal Menda ◽  
Lucas Laird ◽  
Mykel J. Kochenderfer ◽  
Rajmonda S. Caceres

Abstract COVID-19 epidemics have varied dramatically in nature across the United States: some counties have clear peaks in infections, while others have had a multitude of unpredictable and non-distinct peaks. Our lack of understanding of how the pandemic has evolved leads to increasing errors in our ability to predict the spread of the disease. This work seeks to explain this diversity in epidemic progressions by considering an extension to the compartmental SEIRD model. The model we propose uses a neural network to predict the infection rate as a function of both time and the disease's prevalence. We provide a methodology for fitting this model to available county-level data describing aggregate cases and deaths. Our method uses Expectation-Maximization to overcome the challenge of partial observability: the system's state is only partially reflected in available data. We fit a single model to data from multiple counties in the United States exhibiting different behavior. By simulating the model, we show that it can exhibit both single-peak and multi-peak behavior, reproducing behavior observed in counties both in and out of the training set. We then compare the error of simulations from our model with that of a standard SEIRD model and show that ours substantially reduces errors. We also use simulated data to compare our methodology for handling partial observability with a standard approach, showing that ours estimates the values of unobserved quantities significantly better.
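The paper's key move is replacing the SEIRD model's constant infection rate with a neural network over time and prevalence. The sketch below shows only the SEIRD backbone that extension plugs into, with a hand-picked piecewise infection rate as a crude stand-in for the learned one; all rates and population numbers are invented.

```python
# Hedged sketch of a discrete-time (Euler) SEIRD model. beta here is a
# hand-picked function of time, standing in for the paper's fitted neural
# network beta(t, prevalence); sigma/gamma/mu values are invented.
def seird_step(s, e, i, r, d, beta, sigma=1/5.2, gamma=1/10, mu=0.01, dt=1.0):
    n = s + e + i + r + d
    new_exposed = beta * s * i / n      # S -> E
    new_infectious = sigma * e          # E -> I
    new_removed = gamma * i             # I -> R or D
    ds = -new_exposed
    de = new_exposed - new_infectious
    di = new_infectious - new_removed
    dr = (1 - mu) * new_removed         # recoveries
    dd = mu * new_removed               # deaths
    return (s + dt*ds, e + dt*de, i + dt*di, r + dt*dr, d + dt*dd)

state = (99_990.0, 0.0, 10.0, 0.0, 0.0)   # S, E, I, R, D
history = [state]
for t in range(200):
    beta = 0.4 if t < 60 else 0.15        # e.g. a distancing policy kicks in
    state = seird_step(*state, beta=beta)
    history.append(state)

print(f"population: {sum(history[-1]):.1f}, deaths: {history[-1][4]:.0f}")
```

Even this crude time-varying beta changes the epidemic's shape, which hints at why letting beta depend flexibly on time and prevalence can reproduce the multi-peak county trajectories a constant-beta SEIRD cannot.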

