Cereal yield forecasting combining satellite drought-based indices, regional climate and weather data using machine learning approaches in Morocco

Author(s): El houssaine Bouras, Lionel Jarlan, Salah Er-Raki, Riad Balaghi, Abdelhakim Amazirh, ...

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Given the importance of this resource to the country's economy, it is important for decision makers to have reliable forecasts of the annual cereal production in order to anticipate importation needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature) and climate data (pseudo-oscillation indices including the NAO and the leading modes of sea surface temperature -SST- in the mid-latitudes and in the tropics) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine -SVM-, Random Forest -RF- and eXtreme Gradient Boost -XGBoost-) in addition to Multiple Linear Regression (MLR). We also evaluated the models for different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that combining data from the different sources outperformed the use of a single dataset, the highest accuracy being obtained when the three data sources were all considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvest) with an R² = 0.90 and an RMSE of about 3.4 Qt ha−1. When comparing the models' performance, XGBoost was the best for predicting yields. Moreover, fitting a specific model for each province separately improved the statistical metrics by approximately 10-50% depending on the province, compared with one global model applied to all the provinces. The results of this study point out that machine learning is a promising tool for cereal yield forecasting, and the proposed methodology can be extended to different crops and different regions.

2021, Vol 13 (16), pp. 3101
Author(s): El houssaine Bouras, Lionel Jarlan, Salah Er-Raki, Riad Balaghi, Abdelhakim Amazirh, ...

Accurate seasonal forecasting of cereal yields is an important decision support tool for countries, such as Morocco, that are not self-sufficient, in order to predict importation needs as early as possible. This study aims to develop an early forecasting model of cereal yields (soft wheat, barley and durum wheat) at the scale of the agricultural province, considering the 15 most productive provinces over 2000–2017 (i.e., 15 × 18 = 270 yield values). To this end, we built on previous works that showed a tight linkage between cereal yields and various datasets, including weather data (rainfall and air temperature), regional climate indices (the North Atlantic Oscillation in particular), and drought indices derived from satellite observations at different wavelengths. The combination of these three datasets is assessed to predict cereal yields using linear (Multiple Linear Regression, MLR) and non-linear (Support Vector Machine, SVM; Random Forest, RF; and eXtreme Gradient Boost, XGBoost) machine learning algorithms. The calibration of the algorithmic parameters of the different approaches is carried out using a 5-fold cross-validation technique, and a leave-one-out method is implemented for model validation. The statistical metrics of the models are first analyzed as a function of the input datasets that are used, and as a function of the lead times, from 4 months to 2 months before harvest. The results show that combining data from multiple sources outperformed models based on one dataset only. In addition, the satellite drought indices are a major source of information for cereal prediction when the forecasting is carried out close to harvest (2 months before), while weather data and, to a lesser extent, climate indices are key variables for earlier predictions. The best models can accurately predict yield in January (4 months before harvest) with an R2 = 0.88 and an RMSE around 0.22 t ha−1. The XGBoost method exhibited the best metrics. Finally, training a specific model separately for each group of provinces, instead of one global model, improved the prediction performance by reducing the RMSE by 10% to 35% depending on the province. In conclusion, the results of this study point out that combining remote sensing drought indices with climate and weather variables using a machine learning technique is a promising approach for cereal yield forecasting.
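To make the workflow concrete, here is a minimal sketch of the approach described above: tuning an XGBoost regressor with 5-fold cross-validation and validating it with leave-one-out predictions. The data are synthetic, and the predictor names, value ranges and hyperparameter grid are assumptions for illustration only, not the authors' configuration.

```python
# Minimal sketch of the yield-forecasting workflow: synthetic predictors only;
# hyperparameter grid and feature definitions are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_predict
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 270  # 15 provinces x 18 seasons (2000-2017)

# Hypothetical predictors: weather, climate index, satellite drought index
X = np.column_stack([
    rng.normal(300, 80, n),   # cumulative rainfall (mm)
    rng.normal(16, 2, n),     # mean air temperature (degC)
    rng.normal(0, 1, n),      # NAO index
    rng.normal(0.4, 0.1, n),  # satellite drought index (VCI-like, invented)
])
y = 0.005 * X[:, 0] - 0.05 * X[:, 1] + 2.5 * X[:, 3] + rng.normal(0, 0.2, n)

# Calibrate XGBoost hyperparameters with 5-fold cross-validation
grid = GridSearchCV(
    XGBRegressor(objective="reg:squarederror", n_estimators=200),
    param_grid={"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1]},
    cv=5, scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)

# Validate the tuned model with leave-one-out predictions
y_hat = cross_val_predict(grid.best_estimator_, X, y, cv=LeaveOneOut())
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
r2 = float(np.corrcoef(y, y_hat)[0, 1] ** 2)
print(f"LOO RMSE = {rmse:.3f} t/ha, R2 = {r2:.2f}")
```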


Author(s): Sheela Rani P, Dhivya S, Dharshini Priya M, Dharmila Chowdary A

Machine learning is a new analysis discipline that uses data to improve learning, optimizing the training process and the environment in which learning happens. There are two types of machine learning approaches, supervised and unsupervised, which are used to extract the knowledge that helps decision-makers take correct interventions in the future. This paper introduces a model for predicting the factors that influence students' academic performance, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes and logistic regression. The results obtained with the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving improved accuracy compared with the other algorithms. The final prediction model in this paper achieves fairly high prediction accuracy. The objective is not just to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.
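As a hedged illustration of the comparison described above, the following sketch trains the four classifier families on synthetic data and reports cross-validated accuracy; the features are invented placeholders, not the study's actual student records.

```python
# Illustrative comparison of the four classifier families on synthetic data;
# the "student features" here are invented placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           random_state=0)  # stand-in for student records

models = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale features before fitting
    acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```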


2020, Vol 187, pp. 04001
Author(s): Ravipat Lapcharoensuk, Kitticheat Danupattanin, Chaowarin Kanjanapornprapa, Tawin Inkawee

This research aimed to study the combination of NIR spectroscopy and machine learning for monitoring chilli sauce adulterated with papaya smoothie. The chilli sauce was produced by a well-known community enterprise for chilli sauce processing in Thailand. The ingredients of the chilli sauce consisted of 45% chilli, 25% sugar, 20% garlic, 5% vinegar, and 5% salt. The chilli sauce samples were mixed with ripened papaya (Khaek Dam variety) smoothie at 9 levels from 10 to 90 %w/w. The NIR spectra of pure chilli sauce, papaya smoothie and the 9 adulterated chilli sauce samples were recorded using an FT-NIR spectrometer in the wavenumber range of 12500 to 4000 cm−1. Three machine learning algorithms were applied to develop a model for monitoring adulterated chilli sauce: partial least squares regression (PLS), support vector machine (SVM), and backpropagation neural network (BPNN). All models showed good prediction performance in the validation set, with an R² of 0.99, while the RMSEP of PLS, SVM and BPNN was 1.71, 2.18 and 3.27 %w/w, respectively. This finding indicates that NIR spectroscopy coupled with machine learning approaches is an alternative technique for monitoring papaya smoothie adulteration in chilli sauce in the global food industry.
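A minimal sketch of one of the chemometric models mentioned above, PLS regression, applied to synthetic spectra: the spectral grid, adulteration levels and number of latent variables are illustrative assumptions, not the study's calibration.

```python
# Sketch of PLS regression for predicting adulteration level from NIR spectra;
# spectra and adulteration levels are synthetic stand-ins.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
n_samples, n_points = 110, 500  # spectra resampled on a 500-point grid (assumed)
levels = rng.choice(np.arange(0, 100, 10), n_samples).astype(float)  # %w/w papaya

# Fake spectra: random baseline plus an absorption band scaling with adulteration
grid = np.linspace(0, 1, n_points)
band = np.exp(-((grid - 0.6) ** 2) / 0.002)
spectra = (rng.normal(0, 0.01, (n_samples, n_points))
           + np.outer(levels / 100.0, band))

X_cal, X_val, y_cal, y_val = train_test_split(spectra, levels,
                                              test_size=0.3, random_state=1)
pls = PLSRegression(n_components=5).fit(X_cal, y_cal)
y_pred = pls.predict(X_val).ravel()
rmsep = np.sqrt(mean_squared_error(y_val, y_pred))
print(f"R2 = {r2_score(y_val, y_pred):.3f}, RMSEP = {rmsep:.2f} %w/w")
```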


2021, Vol 11
Author(s): Qi Wan, Jiaxuan Zhou, Xiaoying Xia, Jianfeng Hu, Peng Wang, ...

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI).
Materials and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test (n = 40) datasets. A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), the precision-recall plot, and the Matthews Correlation Coefficient (MCC) were used to evaluate the performance of the machine learning approaches.
Results: The 3D features were significantly superior to the 2D features, showing many more machine learning combinations with AUC greater than 0.7 in both the validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE) and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and Gaussian Process (GP) had relatively better performance. The best performance of the 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of the 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed similar results to the 3D features. Incorporating clinical features with the 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively.
Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because more machine learning algorithmic combinations with better performance are available. The feature selection methods ANOVA and RFE, and the classifiers LR, LDA, SVM and GP, are more likely to demonstrate better diagnostic performance for 3D features in the current study.
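The model-building described above chains normalization, feature selection and a classifier, scored with ten-fold cross-validation. The sketch below shows one such combination (z-score scaling, ANOVA selection and logistic regression) on synthetic features; the component choices and feature counts are illustrative assumptions, not the study's best-performing pipeline.

```python
# One illustrative radiomics pipeline combination on synthetic features:
# z-score normalization -> ANOVA feature selection -> logistic regression.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Stand-in for radiomics features (the real study used 1692 3D features, 132 patients)
X, y = make_classification(n_samples=132, n_features=200, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=7)),   # ANOVA selection; 3-9 features in the study
    ("clf", LogisticRegression(max_iter=1000)),
])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
print(f"10-fold CV AUC = {auc:.3f}")
```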


2022, Vol 2161 (1), pp. 012072
Author(s): Konduri Praveen Mahesh, Shaik Ashar Afrouz, Anu Shaju Areeckal

Abstract: Every year, an increasing amount of money is lost to fraudulent credit card transactions. Recently there has been a focus on using machine learning algorithms to identify fraudulent transactions. The ratio of fraud cases to non-fraud transactions is very low, which creates skewed or unbalanced data and poses a challenge for training machine learning models. Public datasets for this research problem are scarce; the dataset used for this work was obtained from Kaggle. In this paper, we explore different sampling techniques, such as under-sampling, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE-Tomek, to work with the unbalanced data. Classification models, such as k-Nearest Neighbour (KNN), logistic regression, random forest and Support Vector Machine (SVM), are trained on the sampled data to detect fraudulent credit card transactions. The performance of the various machine learning approaches is evaluated in terms of precision, recall and F1-score. The classification results obtained are promising and can be used for credit card fraud detection.
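As a hedged sketch of the resampling-plus-classification workflow described above, the example below applies SMOTE (from the imbalanced-learn library) to synthetic skewed data and reports precision, recall and F1; the data and classifier settings are placeholders, not the Kaggle dataset or the paper's exact configuration.

```python
# Sketch: handle class imbalance with SMOTE, then train a classifier and
# report precision/recall/F1. Data are synthetic, not the Kaggle dataset.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Highly skewed synthetic data: roughly 1% "fraud" class
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.99],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Oversample only the training split so synthetic points never leak into the test set
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test), digits=3))
```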


2021
Author(s): Coralie Joucla, Damien Gabriel, Emmanuel Haffen, Juan-Pablo Ortega

Research in machine-learning classification of electroencephalography (EEG) data offers important perspectives for the diagnosis and prognosis of a wide variety of neurological and psychiatric conditions, but the clinical adoption of such systems remains low. We propose here that much of the difficulty in translating EEG machine-learning research to the clinic results from consistent inaccuracies in technical reporting, which severely impair the interpretability of the often high claims of performance. Taking as an example a major class of machine-learning algorithms used in EEG research, the support-vector machine (SVM), we highlight three important aspects of model development (normalization, hyperparameter optimization and cross-validation) and show that, while these three aspects can make or break the performance of the system, they are left entirely undocumented in the vast majority of the research literature. Providing a more systematic description of these aspects of model development constitutes three simple steps to improve the interpretability of EEG-SVM research and, ultimately, its clinical adoption.
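The three aspects highlighted above can be made explicit in a few lines. The sketch below keeps normalization inside a cross-validated pipeline and documents the hyperparameter grid within nested cross-validation; the data and parameter ranges are placeholders, not a recommendation from the paper.

```python
# Sketch: documentable EEG-SVM pipeline with scaling inside the CV loop and an
# explicit hyperparameter grid; features are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=64, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.01, 0.001]}

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Nested cross-validation: hyperparameters are tuned on the inner folds only
search = GridSearchCV(pipe, param_grid, cv=inner, scoring="accuracy")
scores = cross_val_score(search, X, y, cv=outer, scoring="accuracy")
print(f"Nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```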


Drones, 2020, Vol 4 (3), pp. 45
Author(s): Maria Angela Musci, Luigi Mazzara, Andrea Maria Lingua

Aircraft ground de-icing operations play a critical role in flight safety. However, to handle aircraft de-icing, a considerable quantity of de-icing fluid is commonly employed. Moreover, some pre-flight inspections are carried out with engines running; thus, a large amount of fuel is wasted and CO2 is emitted. This implies substantial economic and environmental impacts. In this context, the European project (reference call: MANUNET III 2018, project code: MNET18/ICT-3438) called SEI (Spectral Evidence of Ice) aims to provide innovative tools to identify ice on aircraft and improve the efficiency of the de-icing process. The project includes the design of a low-cost UAV (uncrewed aerial vehicle) platform and the development of a quasi-real-time ice detection methodology to ensure a faster, semi-automatic activity with a reduction in operating time and de-icing fluids. The purpose of this work, developed within the activities of the project, is to define and test the most suitable sensor using a radiometric approach and machine learning algorithms. The adopted methodology consists of classifying ice from spectral imagery collected by two different sensors: a multispectral and a hyperspectral camera. Since the UAV prototype is under construction, the experimental analysis was performed with a simulation dataset acquired on the ground. The comparison between the two approaches and their related algorithms (random forest and support vector machine) for image processing is presented: practical results show that it is possible to identify ice in both cases. Nonetheless, the hyperspectral camera guarantees a more reliable solution, reaching a higher classification accuracy for iced surfaces.
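As a rough illustration of the pixel-wise comparison described above, the sketch below trains random forest and SVM classifiers on synthetic per-pixel spectra labelled ice/no-ice; band counts, reflectance values and labels are invented for the example, not the project's imagery.

```python
# Sketch: per-pixel ice classification from spectral bands with RF and SVM;
# the band reflectances and labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
n_pixels, n_bands = 5000, 50           # e.g. a spectral cube reshaped to (pixels, bands)
labels = rng.integers(0, 2, n_pixels)  # 1 = ice, 0 = clean surface

# Fake reflectance: "ice" pixels absorb more in the upper half of the bands
spectra = rng.normal(0.5, 0.05, (n_pixels, n_bands))
spectra[labels == 1, n_bands // 2:] -= 0.1

X_tr, X_te, y_tr, y_te = train_test_split(spectra, labels, random_state=2)
for name, clf in [("Random forest", RandomForestClassifier(n_estimators=100, random_state=2)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```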


2019, Vol 27 (1), pp. 13-21
Author(s): Qiang Wei, Zongcheng Ji, Zhiheng Li, Jingcheng Du, Jingqi Wang, ...

Objective: This article presents our approaches to the extraction of medications and associated adverse drug events (ADEs) from clinical documents, the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.
Materials and Methods: The clinical corpus used in this study was from the MIMIC-III database; the organizers annotated 303 documents for training and 202 for testing. Our system consists of two components: a named entity recognition (NER) component and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (e.g., BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely conditional random fields for NER and support vector machines for RC. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in one step using a sequence labeling approach. To further improve performance, we also investigated different ensemble approaches to generate optimal performance by combining outputs from multiple approaches.
Results: Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, ranking #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform the traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.
Conclusion: In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their usefulness for broader NER and RC tasks in the medical domain.
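For the traditional machine-learning RC baseline mentioned above, a minimal sketch is shown below: a linear SVM over bag-of-words context features. The example sentences, labels and feature choices are invented placeholders, not the n2c2 data or the authors' feature set.

```python
# Sketch of a traditional-ML relation-classification baseline (linear SVM over
# TF-IDF context features); sentences and labels are invented toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy drug-ADE context snippets (candidate entity pair plus surrounding words)
contexts = [
    "patient developed rash after starting DRUG",
    "DRUG was continued with no adverse events",
    "nausea attributed to DRUG therapy",
    "DRUG given for hypertension",
]
labels = ["ADE-Drug", "None", "ADE-Drug", "None"]

rc_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
rc_model.fit(contexts, labels)
print(rc_model.predict(["dizziness noted after DRUG infusion"]))
```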


Author(s): D. Vito

Abstract. Natural disasters such as floods are caused by extreme weather conditions as well as changes in global and regional climate. Predicting incoming floods is a key factor in ensuring civil protection in case of emergency and in providing an effective early warning system. The risk of flooding is affected by several factors, such as land use, meteorological events, hydrology and the topography of the land. Predicting such a risk implies the use of data from different sources, such as satellite images, water basin levels, meteorological and GIS data, which nowadays are readily available thanks to new satellite portals such as SENTINEL and distributed sensor networks in the field. In order to obtain a comprehensive and accurate prediction of flood risk, it is essential to perform selective and multivariate analyses of the different types of inputs. Multivariate analysis refers to all statistical techniques that simultaneously analyse multiple variables. Among multivariate analyses, machine learning can provide increasing levels of accuracy, precision and efficiency by discovering patterns in large and heterogeneous input datasets. Basically, machine learning algorithms automatically acquire experience from data. This is done through the process of learning, by which the algorithm can generalize beyond the examples given in the training data. Machine learning is interesting for prediction because it adapts its resolution strategies to the features of the data. This peculiarity can be used to predict extremes from highly variable data, as in the case of floods. This work proposes strategies and case studies on the application of machine learning algorithms to flood event prediction. In particular, the study focuses on the application of Support Vector Machines and Artificial Neural Networks to a multivariate set of data related to the river Seveso, in order to propose a more general framework from the case study.
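A minimal sketch of the kind of multivariate model proposed above, training an SVM and a small neural network on synthetic hydro-meteorological predictors; the variable names, distributions and labelling rule are assumptions, not data from the Seveso case study.

```python
# Sketch: SVM and ANN classifiers for flood/no-flood prediction on synthetic
# multivariate inputs (rainfall, soil moisture, river level); values are invented.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 1000
rain = rng.gamma(2.0, 10.0, n)          # daily rainfall (mm)
soil = rng.uniform(0.1, 0.5, n)         # soil moisture (m3/m3)
level = rng.normal(1.5, 0.5, n)         # river stage (m)
X = np.column_stack([rain, soil, level])
flood = ((rain > 30) & (level > 1.8)).astype(int)   # toy labelling rule

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000))]:
    pipe = make_pipeline(StandardScaler(), clf)  # scale inputs before fitting
    f1 = cross_val_score(pipe, X, flood, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.3f}")
```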


2020, Vol 3 (2), pp. 196-206
Author(s): Mausumi Das Nath, Tapalina Bhattasali

Due to the enormous usage of the Internet, users share resources and exchange voluminous amounts of data. This increases the risk of data theft and other types of attacks. Network security plays a vital role in protecting the electronic exchange of data and in avoiding financial losses or service disruption caused by unknown intrusions into the network. Intrusion Detection Systems (IDS) are commonly used to detect such unknown attacks and unauthorized access in a network. Researchers have put forward many approaches that showed satisfactory results in intrusion detection, ranging from traditional statistical approaches to Artificial Intelligence (AI)-based approaches. AI-based techniques have gained an edge over other statistical techniques in the research community due to their considerable benefits: procedures can be designed to display behavior learned from previous experience. Machine learning algorithms are used to analyze abnormal instances in a particular network, and supervised learning is essential for training on and analyzing abnormal behavior in a network. In this paper, we propose a model based on Naïve Bayes and SVM (Support Vector Machine) to detect anomalies, together with an ensemble approach to overcome the weaknesses of the individual classifiers and to improve poor detection results.
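A hedged sketch of the proposed combination: Naïve Bayes and SVM base learners wrapped in a soft-voting ensemble, evaluated on synthetic network-flow features; the dataset and parameters are placeholders, not the paper's experimental setup.

```python
# Sketch: ensemble of Naive Bayes and SVM for anomaly/intrusion classification;
# the network-flow features here are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=0)   # 0 = normal traffic, 1 = attack

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
    ],
    voting="soft",   # average the predicted probabilities of the two base learners
)
f1 = cross_val_score(ensemble, X, y, cv=5, scoring="f1").mean()
print(f"Ensemble mean F1 = {f1:.3f}")
```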

