Potato Yield Prediction Using Machine Learning Techniques and Sentinel 2 Data

2019
Vol 11 (15)
pp. 1745
Author(s):
Gómez
Salvador
Sanz
Casanova

Traditional potato growth models exhibit certain limitations, such as the cost of obtaining the input data required to run the models, the lack of spatial information in some instances, or the actual quality of the input data. To address these issues, we developed a model to predict potato yield using satellite remote sensing. In an effort to offer a good predictive model that improves the state of the art in potato precision agriculture, we used images from the twin Sentinel 2 satellites (European Space Agency—Copernicus Programme) over three growing seasons, applying different machine learning models. First, we fitted nine machine learning algorithms with various pre-processing scenarios using variables from July, August and September based on the red, red-edge and infrared bands of the spectrum. Second, we selected the best performing models and evaluated them against independent test data. Finally, we repeated the previous two steps using only variables corresponding to July and August. Our results showed that the feature selection step proved vital during data pre-processing in order to reduce multicollinearity among predictors. The Regression Quantile Lasso model (11.67% Root Mean Square Error, RMSE; R2 = 0.88 and 9.18% Mean Absolute Error, MAE) and the Leap Backwards model (10.94% RMSE, R2 = 0.89 and 8.95% MAE) performed better when predictors with a correlation coefficient > 0.5 were removed from the dataset. In contrast, the Support Vector Machine Radial (svmRadial) model performed better when no feature selection method was applied (11.7% RMSE, R2 = 0.93 and 8.64% MAE). In addition, we used a random forest model to predict potato yields in Castilla y León (Spain) 1–2 months prior to harvest, and obtained satisfactory results (11.16% RMSE, R2 = 0.89 and 8.71% MAE). These results demonstrate the suitability of our models for predicting potato yields in the region studied.
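A minimal sketch of the pre-processing and modelling pattern described above: drop one predictor from each highly correlated pair (|r| > 0.5), then fit an L1-penalised quantile regression as a scikit-learn stand-in for the Regression Quantile Lasso model. The band columns and synthetic yield values are illustrative assumptions, not the authors' data.

```python
# Correlation-based feature filtering followed by an L1-penalised quantile
# regression (sketch only; synthetic data stands in for the Sentinel 2 variables).
import numpy as np
import pandas as pd
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in predictors: monthly red / red-edge / NIR band statistics (hypothetical columns).
X = pd.DataFrame(rng.normal(size=(300, 12)), columns=[f"band_{i}" for i in range(12)])
y = X["band_0"] * 2.0 + rng.normal(scale=0.5, size=300)  # synthetic yield proxy

# Remove one predictor from every pair with |r| > 0.5 to reduce multicollinearity.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.5).any()]
X_filtered = X.drop(columns=to_drop)

X_tr, X_te, y_tr, y_te = train_test_split(X_filtered, y, test_size=0.25, random_state=0)
model = QuantileRegressor(quantile=0.5, alpha=0.01, solver="highs").fit(X_tr, y_tr)

pred = model.predict(X_te)
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("MAE:", mean_absolute_error(y_te, pred))
print("R2:", r2_score(y_te, pred))
```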

2021
Vol 13 (9)
pp. 4728
Author(s):
Zinhle Mashaba-Munghemezulu
George Johannes Chirima
Cilence Munghemezulu

Rural communities rely on smallholder maize farms for subsistence agriculture, the main driver of local economic activity and food security. However, their planted area estimates are unknown in most developing countries. This study explores the use of Sentinel-1 and Sentinel-2 data to map smallholder maize farms. The random forest (RF) and support vector machine (SVM) algorithms, together with model stacking (ST), were applied. Results show that classifying the combined Sentinel-1 and Sentinel-2 data improved the accuracies of the RF, SVM and ST algorithms by 24.2%, 8.7%, and 9.1%, respectively, compared to classifying the Sentinel-1 data alone. Similarities in the estimated areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST) show that machine learning can estimate smallholder maize areas with high accuracy. The study concludes that single-date Sentinel-1 data were insufficient to map smallholder maize farms. However, single-date Sentinel-1 data combined with Sentinel-2 data were sufficient for mapping smallholder farms. These results can be used to support the generation and validation of national crop statistics, thus contributing to food security.
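A minimal sketch of the model-stacking (ST) idea, assuming per-pixel feature vectors built from stacked Sentinel-1 and Sentinel-2 bands; the feature layout and labels are synthetic stand-ins, not the study's training samples.

```python
# Stacked random forest + SVM base learners with a logistic regression meta-learner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Hypothetical per-pixel features: 2 Sentinel-1 backscatter bands + 10 Sentinel-2 bands.
X = rng.normal(size=(500, 12))
y = (X[:, 2] + X[:, 7] > 0).astype(int)  # synthetic maize / non-maize labels

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
print("Stacked accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```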


2021
Vol 2021
pp. 1-9
Author(s):
Fei Tan
Xiaoqing Xie

Human motion recognition based on inertial sensors is a new research direction in the field of pattern recognition. It carries out preprocessing, feature extraction, and feature selection on signals recorded by inertial sensors placed on the surface of the human body, and finally classifies and recognizes human actions from the extracted features. There are many kinds of swing movements in table tennis, and accurately identifying these movement modes is of great significance for swing analysis. With the development of artificial intelligence technology, human movement recognition has made many breakthroughs in recent years, from machine learning to deep learning, and from wearable sensors to visual sensors. However, there is not much work on movement recognition for table tennis, and the existing methods still mainly belong to the traditional field of machine learning. Therefore, this paper uses an acceleration sensor as a motion recording device for table tennis swings and explores the three-axis acceleration data of four common swing motions. Traditional machine learning algorithms (decision tree, random forest, and support vector machine) are used to classify the swing motions, and a classification algorithm based on the idea of ensembling is designed. Experimental results show that the ensemble learning algorithm developed in this paper outperforms the traditional machine learning algorithms, with an average recognition accuracy of 91%.
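A minimal sketch of an ensemble over the three base learners named above (decision tree, random forest, SVM) applied to simple per-window statistics of tri-axial accelerometer data; the windowing and feature choices are assumptions, not the authors' pipeline.

```python
# Soft-voting ensemble of DT, RF, and SVM on per-axis statistics of acceleration windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n_windows, window_len = 400, 128
acc = rng.normal(size=(n_windows, window_len, 3))   # x, y, z acceleration windows (synthetic)
labels = rng.integers(0, 4, size=n_windows)         # four swing types (synthetic)

# Simple per-axis statistics as features: mean, std, min, max -> 12 features per window.
feats = np.concatenate(
    [acc.mean(axis=1), acc.std(axis=1), acc.min(axis=1), acc.max(axis=1)], axis=1
)

ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=8)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(kernel="rbf", probability=True)),
    ],
    voting="soft",
)
print("Ensemble accuracy:", cross_val_score(ensemble, feats, labels, cv=5).mean())
```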


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses, and other such metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems.

Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
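A minimal sketch of the workflow described above, assuming a generic flow-metadata feature matrix: univariate feature selection, a train/test split, and random forest plus a gradient-boosting classifier evaluated by ROC AUC. scikit-learn's HistGradientBoostingClassifier stands in for the XGBoost library, and the column semantics are hypothetical.

```python
# Feature selection on flow metadata, then two classifiers scored by ROC AUC.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 40))                # 40 flow-level metadata features (synthetic)
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # synthetic malicious / benign labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# Dimensionality reduction via univariate feature selection, fitted on the training set only.
selector = SelectKBest(mutual_info_classif, k=10).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

for name, clf in [
    ("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
    ("gradient boosting", HistGradientBoostingClassifier(random_state=0)),
]:
    clf.fit(X_tr_sel, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te_sel)[:, 1])
    print(f"{name}: AUC = {auc:.4f}")
```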


2020
Vol 9 (9)
pp. 507
Author(s):
Sanjiwana Arjasakusuma
Sandiaga Swahyu Kusuma
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data of high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA), in combination with machine learning algorithms such as multivariate adaptive regression splines (MARS), extra trees (ET), support vector regression (SVR) with a radial basis function, and extreme gradient boosting (XGB) with tree (XGBtree and XGBdart) and linear (XGBlin) learners, were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% nRMSE and 0.046 m bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% nRMSE and −0.244 m bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection: it reduced the data by roughly 95%, selecting the 29 most important variables from the initial 516 lidar metrics and hyperspectral bands.
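A minimal sketch of the select-then-regress pattern evaluated above. The paper uses Boruta (BO); this sketch swaps in a simpler tree-importance filter (scikit-learn's SelectFromModel) so it stays self-contained, then fits an RBF support vector regressor. The predictor matrix and canopy-height values are synthetic.

```python
# Tree-importance feature selection followed by RBF SVR, scored by cross-validated RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 60))                                   # stand-in for 516 predictors
y = 1.5 * X[:, 5] - X[:, 20] + rng.normal(scale=0.3, size=200)   # synthetic canopy height (m)

selector = SelectFromModel(
    RandomForestRegressor(n_estimators=300, random_state=0), threshold="median"
)
X_selected = selector.fit_transform(X, y)
print("Variables kept:", X_selected.shape[1])

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
scores = cross_val_score(svr, X_selected, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())
```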


2020
Vol 12 (24)
pp. 4086
Author(s):
Danielle Elis Garcia Furuya
João Alex Floriano Aguiar
Nayara V. Estrabis
Mayara Maezano Faita Pinheiro
Michelle Taís Garcia Furuya
...  

Riparian zones are important environmental regions, specifically for maintaining the quality of water resources. Accurately mapping forest vegetation in riparian zones is an important issue, since it may provide information about the numerous surface processes that occur in these areas. Recently, machine learning algorithms have gained attention as an innovative approach to extracting information from remote sensing imagery, including support for the task of mapping vegetation areas. Nonetheless, studies on the application of machine learning exclusively for forest vegetation mapping in riparian zones are still limited. Therefore, this paper presents a framework for forest vegetation mapping in riparian zones based on machine learning models using orbital multispectral images. A total of 14 Sentinel-2 images registered throughout the year, covering a large riparian zone along a portion of a wide river in the Pontal do Paranapanema region, São Paulo state, Brazil, were adopted as the dataset. This area is mainly composed of Atlantic Biome vegetation and is near the last primary fragment of this biome, making it an important region from an environmental planning point of view. We compared the performance of multiple machine learning algorithms: decision tree (DT), random forest (RF), support vector machine (SVM), and normal Bayes (NB). We evaluated different dates and locations with all models. Our results demonstrated that the DT learner achieved, overall, the highest accuracy in this task. The DT algorithm also showed high accuracy when applied on different dates and in the riparian zone of another river. We conclude that the proposed approach is appropriate for accurately mapping forest vegetation in riparian zones, including in a temporal context.
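A minimal sketch of the four-model comparison described above (DT, RF, SVM, NB), with Gaussian naive Bayes standing in for the normal Bayes classifier; the per-pixel band values and labels are synthetic stand-ins.

```python
# Cross-validated comparison of DT, RF, SVM, and NB classifiers on per-pixel band values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))          # 10 Sentinel-2 bands per pixel (synthetic)
y = (X[:, 3] - X[:, 7] > 0).astype(int)  # synthetic forest / non-forest labels

models = {
    "DT": DecisionTreeClassifier(max_depth=10, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "NB": GaussianNB(),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```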


2021
Vol 11
Author(s):
Qi Wan
Jiaxuan Zhou
Xiaoying Xia
Jianfeng Hu
Peng Wang
...  

Objective To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI). Material and Methods A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test datasets (n = 40). A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), the precision-recall plot, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the machine learning approaches. Results The 3D features were significantly superior to the 2D features, yielding many more machine learning combinations with AUC greater than 0.7 in both the validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE), and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and Gaussian Process (GP), had relatively better performance. The best performance of the 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of the 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed similar results to the 3D features. Incorporating clinical features with the 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively. Conclusions After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because of the availability of more machine learning algorithmic combinations with better performance. The feature selection methods ANOVA and RFE, and the classifiers LR, LDA, SVM and GP, are more likely to demonstrate better diagnostic performance for 3D features in the current study.
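A minimal sketch of the combinatorial search described above: a pipeline of normalization, ANOVA-style feature selection, and a classifier, searched over with cross-validation and scored by AUC, then summarized on held-out data with AUC and the Matthews correlation coefficient. The radiomics feature matrix, feature counts, and parameter grid are illustrative assumptions.

```python
# Grid search over normalization, feature count, and classifier, scored by ROC AUC;
# the held-out test set is then summarized with AUC and MCC.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(132, 300))                # stand-in radiomics features
y = (X[:, 0] + 0.8 * X[:, 1] > 0).astype(int)  # synthetic benign / malignant labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(
    pipe,
    param_grid={
        "scale": [StandardScaler(), MinMaxScaler()],
        "select__k": [3, 5, 7, 9],
        "clf": [LogisticRegression(max_iter=1000),
                LinearDiscriminantAnalysis(),
                SVC(probability=True)],
    },
    scoring="roc_auc",
    cv=10,
)
grid.fit(X_tr, y_tr)
proba = grid.predict_proba(X_te)[:, 1]
print("Test AUC:", roc_auc_score(y_te, proba))
print("Test MCC:", matthews_corrcoef(y_te, grid.predict(X_te)))
```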


Author(s):  
Maria Mohammad Yousef

Medical dataset classification has become one of the biggest problems in data mining research. Every dataset has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high-dimensionality problem. Dimensionality reduction in data preprocessing is critical for increasing the performance of machine learning algorithms. In addition, feature subset selection contributes to dimensionality reduction and gives a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach based on a genetic algorithm (GA) assisted by the k-nearest neighbor (kNN) classifier to deal with the issue of high dimensionality in biomedical data classification. The proposed method first applies the combination of GA and kNN for feature selection to find the optimal subset of features, where the classification accuracy of the kNN method is used as the fitness function for the GA. After selecting the best-suggested subset of features, a support vector machine (SVM) is used as the classifier. The proposed method was evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs admirably on these datasets, achieving higher classification accuracy while using fewer features.
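A minimal sketch of the hybrid idea, assuming a small genetic algorithm over binary feature masks with cross-validated kNN accuracy as the fitness, followed by an SVM on the selected subset. The GA settings and the scikit-learn breast cancer dataset are stand-ins, not the paper's configuration or its five UCI datasets.

```python
# Tiny GA for feature-mask search (kNN accuracy as fitness), then SVM on the best subset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X, y = load_breast_cancer(return_X_y=True)   # stand-in for a UCI medical dataset
n_features = X.shape[1]

def fitness(mask):
    # Cross-validated kNN accuracy on the masked feature set.
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

pop_size, n_gen, mut_rate = 20, 15, 0.05
population = rng.random((pop_size, n_features)) < 0.5   # random binary masks

for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in population])
    order = np.argsort(scores)[::-1]
    parents = population[order[: pop_size // 2]]          # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(n_features) < mut_rate        # bit-flip mutation
        children.append(child)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("Selected features:", int(best.sum()), "of", n_features)
print("SVM accuracy on subset:",
      cross_val_score(SVC(kernel="rbf"), X[:, best], y, cv=5).mean())
```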


Sensors
2021
Vol 22 (1)
pp. 94
Author(s):
Alvaro Murguia-Cozar
Antonia Macedo-Cruz
Demetrio Salvador Fernandez-Reynoso
Jorge Arturo Salgado Transito

The scarcity of water for agricultural use is a serious problem that has increased due to intense droughts, poor management, and deficiencies in the distribution and application of the resource. Monitoring crops through satellite image processing and applying machine learning algorithms are technological strategies with which developed countries tend to implement better public policies regarding the efficient use of water. The purpose of this research was to determine the main indicators and characteristics that allow us to discriminate the phenological stages of maize crops (Zea mays L.) in Sentinel 2 satellite images through supervised classification models. The training data were obtained by monitoring cultivated plots during an agricultural cycle. Indicators and characteristics were extracted from 41 Sentinel 2 images acquired during the monitoring dates. With these images, indicators of texture, vegetation, and colour were calculated to train three supervised classifiers: linear discriminant (LD), support vector machine (SVM), and k-nearest neighbours (kNN) models. It was found that 45 of the 86 characteristics extracted contributed to maximizing the accuracy by stage of development and the overall accuracy of the trained classification models. The characteristics of the Moran's I local indicator of spatial association (LISA) improved the accuracy of the classifiers when applied to the L*a*b* colour model and to the near-infrared (NIR) band. The local binary pattern (LBP) increased the accuracy of the classification when applied to the red, green, blue (RGB) and NIR bands. The colour ratios, leaf area index (LAI), RGB colour model, L*a*b* colour space, LISA, and LBP extracted the most important intrinsic characteristics of maize crops for classifying the phenological stages of maize cultivation. The quadratic SVM model was the best classifier of maize crop phenology, with an overall accuracy of 82.3%.
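A minimal sketch of one feature family named above: local binary pattern (LBP) histograms computed on single-band patches (e.g. NIR) and fed to a quadratic SVM (polynomial kernel of degree 2). It assumes scikit-image is installed; the patch size, LBP parameters, and synthetic patches are illustrative.

```python
# LBP histogram features per patch, classified with a degree-2 polynomial SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(8)
n_patches, size = 300, 32
patches = (rng.random((n_patches, size, size)) * 255).astype(np.uint8)  # stand-in NIR patches
labels = rng.integers(0, 4, size=n_patches)                             # four phenological stages (synthetic)

P, R = 8, 1  # LBP neighbours and radius

def lbp_histogram(img):
    # Uniform LBP codes take values 0..P+1, so P+2 histogram bins cover them all.
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

features = np.array([lbp_histogram(p) for p in patches])

quad_svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, coef0=1.0))
print("Quadratic SVM accuracy:", cross_val_score(quad_svm, features, labels, cv=5).mean())
```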


2020
Vol 9 (6)
pp. 343
Author(s):
Pablo Salvador
Diego Gómez
Julia Sanz
José Luis Casanova

Crop growth modeling and yield forecasting are essential to improving food security policies worldwide. To estimate potato (Solanum tuberosum L.) yield over Mexico at the municipal level, we used meteorological data provided by the ERA5 (ECMWF Re-Analysis) dataset developed by the Copernicus Climate Change Service, satellite imagery from the TERRA platform, and field information. Five different machine learning algorithms were used to build the models: random forest (rf), support vector machine linear (svmL), support vector machine polynomial (svmP), support vector machine radial (svmR), and general linear model (glm). The optimized models were tested using independent data (2017 and 2018) not used in the training and optimization phase (2004–2016). In terms of percent root mean squared error (%RMSE), the best results were obtained by the rf algorithm in the winter cycle using variables from the first three months of the cycle (R2 = 0.757 and %RMSE = 18.9). For the summer cycle, the best performing model was the svmP, which used the first five months of the cycle as variables (R2 = 0.858 and %RMSE = 14.9). Our results indicated that adding predictor variables from the last two months before the harvest did not significantly improve model performance. These results demonstrate that our models can predict potato yield by analyzing the yield of the previous year, the general NDVI conditions, meteorology, and information related to the irrigation system at the municipal level.
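A minimal sketch of the five-model comparison named above (rf, svmL, svmP, svmR, glm) using scikit-learn equivalents and percent RMSE from cross-validation; the municipal-level predictor matrix and yields are synthetic stand-ins.

```python
# Cross-validated %RMSE comparison of rf, three SVM kernels, and a linear model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 15))   # stand-in NDVI / meteorological / irrigation predictors
y = 30 + 3 * X[:, 1] - 2 * X[:, 4] + rng.normal(scale=1.0, size=400)  # synthetic yield (t/ha)

models = {
    "rf": RandomForestRegressor(n_estimators=300, random_state=0),
    "svmL": make_pipeline(StandardScaler(), SVR(kernel="linear")),
    "svmP": make_pipeline(StandardScaler(), SVR(kernel="poly", degree=2)),
    "svmR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "glm": LinearRegression(),
}
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: %RMSE = {100 * rmse / y.mean():.1f}")
```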


Cephalalgia
2019
Vol 39 (9)
pp. 1143-1155
Author(s):
Bingzhao Zhu
Gianluca Coppola
Mahsa Shoaran

Objective The automatic detection of migraine states using electrophysiological recordings may play a key role in migraine diagnosis and early treatment. Migraineurs are characterized by a deficit of habituation in cortical information processing, causing abnormal changes in somatosensory evoked potentials. Here, we propose a machine learning approach that utilizes somatosensory evoked potential-based biomarkers for migraine classification in a noninvasive setting. Methods Forty-two migraine patients, including 29 interictal and 13 ictal, were recruited and compared with 15 healthy volunteers of similar age and gender distribution. Right median nerve somatosensory evoked potentials were collected from all subjects. State-of-the-art machine learning algorithms, including random forest, extreme gradient-boosting trees, support vector machines, K-nearest neighbors, multilayer perceptron, linear discriminant analysis, and logistic regression, were used for classification and were built upon somatosensory evoked potential features in the time and frequency domains. A feature selection method was employed to assess the contribution of features, compare it with previous clinical findings, and build an optimal feature set by removing redundant features. Results Using a set of relevant features and different machine learning models, accuracies ranging from 51.2% to 72.4% were achieved for the healthy volunteers-ictal-interictal classification task. Following model and feature selection, we successfully separated the subject groups with accuracies of 89.7% for the healthy volunteers-ictal, 88.7% for the healthy volunteers-interictal, 80.2% for the ictal-interictal, and 73.3% for the healthy volunteers-ictal-interictal classification tasks, respectively. Conclusion Our proposed model suggests the potential use of somatosensory evoked potentials as a prominent and reliable signal in migraine classification. This non-invasive somatosensory evoked potential-based classification system offers the potential to reliably separate migraine patients in ictal and interictal states from healthy controls.
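A minimal sketch of the classification setup described above: a univariate feature-selection step followed by several of the named classifiers, compared by cross-validated accuracy on a three-class problem with the reported group sizes; the SEP feature matrix is synthetic, and GradientBoostingClassifier stands in for extreme gradient-boosting trees.

```python
# Feature selection + several classifiers on a three-class (HV / interictal / ictal) problem.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(10)
X = rng.normal(size=(57, 40))           # time/frequency SEP features (synthetic)
y = np.repeat([0, 1, 2], [15, 29, 13])  # HV / interictal / ictal group sizes from the abstract

classifiers = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "XGB-like": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), clf)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```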

