Comparison of Machine Learning Techniques in Cotton Yield Prediction Using Satellite Remote Sensing

Author(s):  
Francielle Morelli-Ferreira ◽  
Nayane Jaqueline Costa Maia ◽  
Danilo Tedesco ◽  
Elizabeth Haruna Kazama ◽  
Franciele Morlin Carneiro ◽  
...  

The use of machine learning to predict yield from remote sensing is a path of no return, and on-farm studies aim to help rural producers in decision-making. Thus, commercial fields equipped with these technologies in Mato Grosso, Brazil, were monitored with satellite images to predict cotton yield using supervised learning techniques. The objective of this research was to identify how early in the growing season, with which vegetation indices, and with which machine learning algorithms cotton yield can best be predicted at the farm level. We followed three steps: 1) yield was observed over 398 ha (3 fields), and eight vegetation indices (VIs) were calculated on five dates during the growing season; 2) two scenarios were created to facilitate the analysis and interpretation of results: Scenario 1, all data (8 indices on 5 dates = 40 inputs), and Scenario 2, the best variable selected by stepwise regression (1 input); 3) in the search for the best algorithm, hyperparameter tuning, calibration, and testing were performed to predict yield, and performances were evaluated. Scenario 1 produced the best metrics in all fields of study, with the Multilayer Perceptron (MLP) and Random Forest (RF) algorithms performing best (adjusted R2 of 47% and RMSE of only 0.24 t ha-1). However, this scenario requires all predictive inputs generated throughout the growing season (approx. 180 days), so we optimized the prediction by testing only the best VI in each field. Among the eight VIs, the Simple Ratio (SR), driven by the K-Nearest Neighbor (KNN) algorithm, predicted yield with RMSE of 0.26 and 0.28 t ha-1 and MAPE of 5.20%, anticipating cotton yield with low error by ±143 days, while demanding less computation than MLP and RF. This makes it a practical technique for predicting cotton yield, saving time for planning in marketing and crop management strategies.
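The core of the optimized approach described above, KNN regression of yield on a single vegetation index, can be sketched as follows. The SR values and yields here are synthetic illustrations, not the study's measurements, so the reported RMSE is only indicative of the workflow.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
sr = rng.uniform(2.0, 12.0, size=300)                          # hypothetical SR values
yield_t_ha = 1.5 + 0.25 * sr + rng.normal(0, 0.25, size=300)   # synthetic yield (t/ha)

X_train, X_test, y_train, y_test = train_test_split(
    sr.reshape(-1, 1), yield_t_ha, test_size=0.3, random_state=0)

# KNN regression: predict a field cell's yield as the mean of its 5 nearest
# neighbours in SR space
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)

rmse = mean_squared_error(y_test, knn.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:.2f} t/ha")
```

Because KNN stores the training set and averages neighbours at prediction time, it avoids the iterative training that MLP and RF require, which is the computational advantage the abstract notes.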



2021 ◽  
Author(s):  
Nisha Agnihotri

<i>Bipolar disorder, a complex brain disorder, has affected many millions of people around the world. It is identified by oscillations in the patient’s mood, which swings between two states: depression and mania. This results from different psychological and physical features. A set of psycholinguistic features, such as behavioral changes, mood swings, and mental illness, are observed to provide feedback on health and wellness. The study is an objective measure for identifying the stress level of the human brain, which could considerably reduce the harmful effects associated with it. In this paper, we present a study of the prediction of symptoms and behavior of a commonly known mental illness, bipolar disorder, using machine learning techniques. Data extracted from articles and research papers were studied and analyzed using statistical analysis tools and machine learning (ML) techniques. The data are visualized to extract and communicate meaningful information from complex datasets for predicting and optimizing various day-to-day analyses. The study also reviews research papers in which machine learning algorithms and classifiers such as Decision Trees, Random Forest, Support Vector Machine, Naïve Bayes, Logistic Regression, and K-Nearest Neighbor are applied to identify the mental state of a target group. The purpose of the paper is mainly to explore the challenges, adequacy, and limitations of detecting mental health conditions using machine learning techniques.</i>
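The classifier comparison the survey describes can be sketched as below. Synthetic features stand in for the (unavailable) psycholinguistic data, so the printed accuracies only illustrate the workflow, not any clinical result.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a psycholinguistic feature set with a binary
# mental-state label
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# 5-fold cross-validated accuracy for each of the six classifiers named above
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```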




2020 ◽  
Vol 12 (7) ◽  
pp. 1200 ◽  
Author(s):  
Sunmin Lee ◽  
Yunjung Hyun ◽  
Saro Lee ◽  
Moung-Jin Lee

Adequate groundwater development for the rural population is essential because groundwater is an important source of drinking water and agricultural water. In this study, ensemble models of decision tree-based machine learning algorithms were used with a geographic information system (GIS) to map and test groundwater yield potential in Yangpyeong-gun, South Korea. Groundwater control factors derived from remote sensing data were used for mapping, including nine topographic factors, two hydrological factors, forest type, soil material, land use, and two geological factors. A total of 53 well locations with both specific capacity (SPC) data and transmissivity (T) data were selected and randomly divided into two subsets for model training (70%) and testing (30%). First, the frequency ratio (FR) was calculated for SPC and T, and then the boosted classification tree (BCT) machine learning method was applied. In addition, an ensemble model, FR-BCT, was applied to generate and compare groundwater potential maps. Model performance was evaluated using the receiver operating characteristic (ROC) method. To test the models, the area under the ROC curve was calculated; for the predicted SPC dataset, it reached 80.48% and 87.75% for the BCT and FR-BCT models, respectively. The accuracy rates from T were 72.27% and 81.49% for the BCT and FR-BCT models, respectively. Both the BCT and FR-BCT models measured the contributions of individual groundwater control factors, which showed that soil was the most influential factor. The machine learning techniques used in this study modeled groundwater potential effectively in areas where data are relatively scarce. The results of this study may be used for sustainable development of groundwater resources by identifying areas of high groundwater potential.
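The evaluation loop described above, fitting a boosted classification tree and scoring it by the area under the ROC curve, can be sketched as follows. The features are synthetic stand-ins for the topographic, hydrological, and geological factors, and scikit-learn's `GradientBoostingClassifier` is used as a generic boosted-tree substitute for the study's BCT implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for 16 groundwater control factors; labels mark
# high- vs. low-yield wells
X, y = make_classification(n_samples=212, n_features=16, random_state=1)
# 70/30 train/test split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

bct = GradientBoostingClassifier(random_state=1)   # boosted classification tree
bct.fit(X_tr, y_tr)

# Area under the ROC curve on the held-out wells
auc = roc_auc_score(y_te, bct.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")

# feature_importances_ mirrors the study's factor-contribution analysis
most_influential = int(bct.feature_importances_.argmax())
```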


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 528
Author(s):  
David Opeoluwa Oyewola ◽  
Emmanuel Gbenga Dada ◽  
Sanjay Misra ◽  
Robertas Damaševičius

The application of machine learning techniques to the epidemiology of COVID-19 is a necessary measure that can be exploited to curtail the further spread of this pandemic. Conventional techniques used to determine the epidemiology of COVID-19 are slow and costly, and data are scarce. We investigate the effects of noise filters on the performance of machine learning algorithms on a COVID-19 epidemiology dataset. Noise filter algorithms are used to remove noise from the datasets utilized in this study. We applied nine machine learning techniques to classify the epidemiology of COVID-19: bagging, boosting, support vector machine, bidirectional long short-term memory, decision tree, naïve Bayes, k-nearest neighbor, random forest, and multinomial logistic regression. Data from patients who contracted coronavirus disease were collected from the Kaggle database between 23 January 2020 and 24 June 2020. Both noisy and filtered data were used in our experiments. As a result of denoising, the machine learning models produced high results for the prediction of COVID-19 cases in South Korea. For isolated cases, after performing noise filtering operations, the machine learning techniques achieved an accuracy of 98–100%. The results indicate that filtering noise from the dataset can improve the accuracy of COVID-19 case prediction algorithms.
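The denoise-then-classify pipeline described above can be sketched in miniature. The data are synthetic, and the noise filter shown is a simple edited-nearest-neighbours-style rule (one common family of noise filters), not necessarily the specific filters the study used.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic epidemiology-style data with ~10% label noise injected
X, y = make_classification(n_samples=500, n_features=8, flip_y=0.1, random_state=2)

# Noise filter: drop samples whose 5 nearest neighbours vote against their
# recorded label (an edited-nearest-neighbours-style rule)
suspect = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X) != y
X_f, y_f = X[~suspect], y[~suspect]

# Compare one of the nine classifiers (random forest) on noisy vs. filtered data
rf = RandomForestClassifier(random_state=2)
acc_noisy = cross_val_score(rf, X, y, cv=5).mean()
acc_filtered = cross_val_score(rf, X_f, y_f, cv=5).mean()
print(f"noisy: {acc_noisy:.3f}  filtered: {acc_filtered:.3f}")
```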


Author(s):  
Mehreen Ahmed ◽  
Rafia Mumtaz ◽  
Syed Mohammad

The Water Quality Index (WQI) is a unique and effective rating technique for assessing the quality of water. Nevertheless, most indices are not applicable to all water types, as they depend on core physico-chemical water parameters, which can make them biased and sensitive to specific attributes, including (i) the time, location, and frequency of data sampling and (ii) the number, variety, and weight allocation of parameters. Hence, these indices need to be evaluated to eliminate the uncertainties that make them unpredictable and may lead to manipulation of the water quality classes. The present study calculated five WQIs for two temporal periods: (i) June to December 2019, obtained in real time (using Internet of Things (IoT) nodes) at the inlet and outlet streams of Rawal Dam; and (ii) 2012–2019, obtained from the Rawal Dam Water Filtration Plant through GIS-based grab sampling. The computed WQIs categorized the collected datasets as ‘Very Poor’, primarily owing to the uneven distribution of the water samples, which led to class imbalance in the data. Additionally, this study investigates the classification of water quality using machine learning algorithms, namely Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression (LogR), Multilayer Perceptron (MLP), and Naive Bayes (NB), based on parameters including pH, dissolved oxygen, conductivity, turbidity, fecal coliform, and temperature. The classification results showed that the DT algorithm outperformed the other models with a classification accuracy of 99%. Although the WQI is a popular method for assessing water quality, the uncertainties and biases introduced by the limitations of data acquisition (such as specific location/area, type and number of parameters, or water type), which lead to class imbalance, need to be addressed. This can be achieved by developing a more refined index that considers other factors, such as topographical and hydrological parameters with spatio-temporal variations, combined with machine learning techniques, to effectively contribute to the estimation of water quality for all regions.
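The decision-tree classification step described above can be sketched as follows. The six feature columns mirror the listed parameters, but the values and the labelling rule are synthetic assumptions made for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([
    rng.normal(7.2, 0.6, n),   # pH
    rng.normal(6.0, 1.5, n),   # dissolved oxygen (mg/L)
    rng.normal(300, 80, n),    # conductivity (uS/cm)
    rng.normal(5, 3, n),       # turbidity (NTU)
    rng.poisson(40, n),        # fecal coliform (CFU/100 mL)
    rng.normal(18, 4, n),      # temperature (deg C)
])
# Invented labelling rule: low dissolved oxygen or high fecal coliform
# marks a sample as poor quality (class 1)
y = ((X[:, 1] < 5.0) | (X[:, 4] > 60)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
clf = DecisionTreeClassifier(random_state=3).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"accuracy: {acc:.2f}")
```

A decision tree suits this task because the quality classes are defined by thresholds on individual parameters, which axis-aligned splits capture directly; this is consistent with DT outperforming the other models in the study.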


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.
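The SVR variant of the setup described above can be sketched as below: the six listed predictors forecast the next day's close. The price series is a synthetic random walk, not Yahoo Finance data, so the reported error only illustrates the workflow.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(4)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))       # synthetic random-walk close
features = np.column_stack([
    close,                                         # close price
    close + rng.normal(0, 0.5, n),                 # open price
    close - np.abs(rng.normal(1, 0.5, n)),         # daily low
    close + np.abs(rng.normal(1, 0.5, n)),         # daily high
    close,                                         # adjusted close price
    rng.uniform(1e5, 1e6, n),                      # volume of trades
])

X, y = features[:-1], close[1:]                    # today's features -> tomorrow's close
split = 250                                        # chronological train/test split

# Scaling matters for SVR, since its RBF kernel is distance-based
model = make_pipeline(StandardScaler(), SVR(C=10.0))
model.fit(X[:split], y[:split])

mae = mean_absolute_error(y[split:], model.predict(X[split:]))
print(f"test MAE: {mae:.2f}")
```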


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical, as it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems due to their accurate prediction performance. Different techniques may provide different accuracies, so it is imperative to use the most suitable method, i.e. the one providing the best results. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.
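The comparison described above can be sketched on scikit-learn's bundled breast-cancer dataset (a stand-in for the study's datasets, which are not specified here); J48 is approximated by scikit-learn's CART decision tree, since J48 itself is a Weka implementation of C4.5.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Built-in diagnostic dataset: 569 tumours, 30 features, benign/malignant labels
X, y = load_breast_cancer(return_X_y=True)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "Decision Tree (CART, in place of J48)": DecisionTreeClassifier(random_state=0),
    "Neural Network": make_pipeline(StandardScaler(),
                                    MLPClassifier(max_iter=1000, random_state=0)),
}

# 5-fold cross-validated accuracy for each classifier
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```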


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets with machine-learning techniques. Previous approaches to detecting infected tweets rely on human effort or text analysis, so they are limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform by using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, with their followers, who share posts of similar interest within a hackers’ community on Twitter. The list is built from a set of suggested keywords that hackers commonly use in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process feeds a machine learning stage in which different algorithms are applied. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
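The centrality step described above can be sketched in miniature. Only degree centrality is shown (the study also used closeness and betweenness), and the follower edges are invented for illustration; real input would be the collected Twitter follower data.

```python
from collections import Counter

# Hypothetical undirected follower edges within a suspected community
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
         ("dave", "alice"), ("eve", "alice"), ("eve", "bob")]

# Count each user's degree (number of connections)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree centrality: degree divided by the maximum possible degree (n - 1)
n = len(degree)
centrality = {u: d / (n - 1) for u, d in degree.items()}

# The highest-centrality user would enter the "most influential" dataset
top = max(centrality, key=centrality.get)
print(top, round(centrality[top], 2))
```

Tweets from the users ranked highest by such scores are what the methodology then gathers and classifies into positive and negative classes.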

