UAV-Based Hyperspectral and Ensemble Machine Learning for Predicting Yield in Winter Wheat

Agronomy ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 202
Author(s):  
Zhen Chen ◽  
Qian Cheng ◽  
Fuyi Duan ◽  
Xiuqiao Huang ◽  
Honggang Xu ◽  
...  

Winter wheat is a widely grown cereal crop worldwide. Using growth-stage information to estimate winter wheat yields in a timely manner is essential for accurate crop management and rapid decision-making in sustainable agriculture, and for increasing productivity while reducing environmental impact. UAV remote sensing is widely used in precision agriculture due to its flexibility and increased spatial and spectral resolution. Hyperspectral data are used to model crop traits because they provide continuous, rich spectral information and higher spectral fidelity. In this study, hyperspectral image data of the winter wheat crop canopy at the flowering and grain-filling stages were acquired by a low-altitude unmanned aerial vehicle (UAV), and machine learning was used to predict winter wheat yields. Specifically, a large number of spectral indices were extracted from the spectral data, and three feature selection methods, recursive feature elimination (RFE), Boruta feature selection, and the Pearson correlation coefficient (PCC), were used to filter the spectral indices in order to reduce the dimensionality of the data. Four base learner models, (1) support vector machine (SVM), (2) Gaussian process (GP), (3) linear ridge regression (LRR), and (4) random forest (RF), were also constructed, and an ensemble machine learning model was developed by combining the four base learner models. The results showed that the SVM yield prediction model, constructed on the basis of the preferred features, performed the best among the base learner models, with an R² between 0.62 and 0.73. The accuracy of the proposed ensemble learner model was higher than that of each base learner model; moreover, the R² (0.78) for the yield prediction model based on the features selected by Boruta was the highest at the grain-filling stage.
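
The correlation-based filtering and model-combination steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the index names, the toy values, the choice of k, and the use of a plain average to combine base learners are all assumptions made for illustration (the abstract does not say how the four models were combined).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / (norm_x * norm_y)


def select_by_pcc(features, target, k):
    """Names of the k features with the largest |r| against the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


def average_ensemble(predictions):
    """Combine base-learner predictions by a per-sample mean (an assumption)."""
    return [sum(column) / len(column) for column in zip(*predictions)]


def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: three made-up spectral indices and plot yields.
indices = {
    "idx_a": [1, 2, 3, 4, 5],
    "idx_b": [5, 1, 4, 2, 3],
    "idx_c": [10, 8, 6, 4, 2],
}
yields = [2.1, 3.9, 6.2, 8.0, 9.8]
selected = select_by_pcc(indices, yields, k=2)
```

Ranking by the absolute value of r keeps strongly negative indices as well as strongly positive ones, which is why `idx_c` survives the filter alongside `idx_a` in the toy data.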

2019 ◽  
Vol 26 (3) ◽  
pp. 1810-1826 ◽  
Author(s):  
Behnaz Raef ◽  
Masoud Maleki ◽  
Reza Ferdousi

The aim of this study is to develop a computational prediction model for implantation outcome after an embryo transfer cycle. Information on 500 patients and 1360 transferred embryos, including cleavage and blastocyst stages and fresh or frozen embryos, was collected from April 2016 to February 2018. A dataset containing 82 attributes and a target label (indicating positive and negative implantation outcomes) was constructed. Six dominant machine learning approaches were examined based on their performance in predicting embryo transfer outcomes. Feature selection procedures were also used to identify effective predictive factors and to determine the optimum number of features based on classifier performance. The results revealed that random forest was the best classifier (accuracy = 90.40% and area under the curve = 93.74%) with optimum features based on a 10-fold cross-validation test. According to the Support Vector Machine-Feature Selection algorithm, the ideal number of features is 78. Follicle stimulating hormone/human menopausal gonadotropin dosage for ovarian stimulation was the most important predictive factor across all examined embryo transfer features. The proposed machine learning-based prediction model could predict embryo transfer outcome and implantation of embryos with high accuracy, before the start of an embryo transfer cycle.
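
For readers unfamiliar with the evaluation terms used here, the following is a small sketch of 10-fold splitting, accuracy, and area under the ROC curve. It assumes contiguous, unshuffled folds and binary 0/1 labels; the abstract does not describe how the folds were formed, and the function names are illustrative.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


folds = list(k_fold_indices(25, k=10))
```

In practice the samples are usually shuffled (and often stratified by label) before splitting; that step is omitted here to keep the sketch deterministic.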


Inventions ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 57
Author(s):  
Attique Ur Rehman ◽  
Tek Tjing Lie ◽  
Brice Vallès ◽  
Shafiqur Rahman Tito

The recent advancement in computational capabilities and deployment of smart meters have caused non-intrusive load monitoring to revive itself as one of the promising techniques of energy monitoring. Toward effective energy monitoring, this paper presents a non-invasive load inference approach assisted by feature selection and ensemble machine learning techniques. For evaluation and validation purposes of the proposed approach, one of the major residential load elements having solid potential toward energy efficiency applications, i.e., water heating, is considered. Moreover, to realize the real-life deployment, digital simulations are carried out on low-sampling real-world load measurements: New Zealand GREEN Grid Database. For said purposes, MATLAB and Python (Scikit-Learn) are used as simulation tools. The employed learning models, i.e., standalone and ensemble, are trained on a single household’s load data and later tested rigorously on a set of diverse households’ load data, to validate the generalization capability of the employed models. This paper presents a comprehensive performance evaluation of the presented approach in the context of event detection, feature selection, and learning models. Based on the presented study and corresponding analysis of the results, it is concluded that the proposed approach generalizes well to the unseen testing data and yields promising results in terms of non-invasive load inference.


2020 ◽  
Vol 13 (10) ◽  
pp. 305
Author(s):  
Eugene Lin ◽  
Po-Hsiu Kuo ◽  
Yu-Li Liu ◽  
Younger W.-Y. Yu ◽  
Albert C. Yang ◽  
...  

In the wake of recent advances in machine learning research, the study of pharmacogenomics using predictive algorithms serves as a new paradigmatic application. In this work, our goal was to explore an ensemble machine learning approach which aims to predict probable antidepressant treatment response and remission in major depressive disorder (MDD). To discover the status of antidepressant treatments, we established an ensemble predictive model with a feature selection algorithm resulting from the analysis of genetic variants and clinical variables of 421 patients who were treated with selective serotonin reuptake inhibitors. We also compared our ensemble machine learning framework with other state-of-the-art models including multi-layer feedforward neural networks (MFNNs), logistic regression, support vector machine, C4.5 decision tree, naïve Bayes, and random forests. Our data revealed that the ensemble predictive algorithm with feature selection (using fewer biomarkers) performed comparably to other predictive algorithms (such as MFNNs and logistic regression) to derive the perplexing relationship between biomarkers and the status of antidepressant treatments. Our study demonstrates that the ensemble machine learning framework may present a useful technique to create bioinformatics tools for discriminating non-responders from responders prior to antidepressant treatments.


The Bank Marketing data set at Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical choice that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, different ensemble machine learning techniques, such as Bagging and Boosting, have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, which is approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that the bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in the predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vitally needed for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system while providing fair credit opportunities to their clients.
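
As a rough illustration of Bagging (bootstrap aggregation), the sketch below trains one-feature threshold classifiers on bootstrap resamples and combines them by majority vote. The base learner, the single made-up feature, and the toy labels are assumptions; the abstract does not state which base models or features were used.

```python
import random


def fit_stump(xs, ys):
    """Pick the threshold t (from observed values) minimizing errors of 'x > t -> 1'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def bagging_predict(stumps, x):
    """Majority vote over the stumps' individual predictions."""
    votes = sum(1 if x > t else 0 for t in stumps)
    return 1 if 2 * votes > len(stumps) else 0


# Hypothetical toy data: one numeric feature, label 1 = approved.
feature = [1, 2, 3, 4, 6, 7, 8, 9]
approved = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(feature, approved)
```

Boosting differs in that the learners are trained sequentially, each one concentrating on the cases its predecessors got wrong, rather than independently on resamples.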


2021 ◽  
Author(s):  
Amit Kumar Srivast ◽  
Nima Safaei ◽  
Saeed Khaki ◽  
Gina Lopez ◽  
Wenzhi Zeng ◽  
...  

Abstract Crop yield forecasting depends on many interactive factors including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using extensive datasets of weather, soil, and crop phenology. We propose a convolutional neural network (CNN) which uses the 1-dimensional convolution operation to capture the time dependencies of environmental variables. The proposed CNN, evaluated along with other machine learning models for winter wheat yield prediction in Germany, outperformed all other models tested. To address the seasonality, weekly features were used that explicitly take soil moisture and meteorological events into account. Our results indicated that nonlinear models such as deep learning models and XGBoost are more effective than linear models in finding the functional relationship between the crop yield and input data, and that deep neural networks had a higher prediction accuracy than XGBoost. One of the main limitations of machine learning models is their black box property. Therefore, we moved beyond prediction and performed feature selection, as it provides key results towards explaining yield prediction (variable importance by time). As such, our study indicates which variables have the most significant effect on winter wheat yield.
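
The 1-dimensional convolution operation mentioned above slides a small kernel along a time series and takes a weighted sum at each position. The sketch below shows that single operation only, with a made-up weekly series and kernel; it is not the proposed network, whose layers and weights the abstract does not describe.

```python
def conv1d(series, kernel):
    """Valid-mode 1-D convolution as used in CNN layers (no kernel flip, no padding)."""
    k = len(kernel)
    if k > len(series):
        raise ValueError("kernel is longer than the series")
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical weekly values of one environmental variable.
weekly = [1, 2, 3, 4, 5]
# A made-up kernel that responds to a decrease across a three-week window.
response = conv1d(weekly, [1, 0, -1])
activated = relu(response)
```

Each output value depends only on a short window of consecutive weeks, which is how a layer of this kind can pick up local temporal patterns in the inputs.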


2019 ◽  
Author(s):  
Wongeun Song ◽  
Se Young Jung ◽  
Hyunyoung Baek ◽  
Chang Won Choi ◽  
Young Hwa Jung ◽  
...  

BACKGROUND Neonatal sepsis is associated with most cases of mortalities and morbidities in the neonatal intensive care unit (NICU). Many studies have developed prediction models for the early diagnosis of bloodstream infections in newborns, but there are limitations to data collection and management because these models are based on high-resolution waveform data. OBJECTIVE The aim of this study was to examine the feasibility of a prediction model by using noninvasive vital sign data and machine learning technology. METHODS We used electronic medical record data in intensive care units published in the Medical Information Mart for Intensive Care III clinical database. The late-onset neonatal sepsis (LONS) prediction algorithm using our proposed forward feature selection technique was based on NICU inpatient data and was designed to detect clinical sepsis 48 hours before occurrence. The performance of this prediction model was evaluated using various feature selection algorithms and machine learning models. RESULTS The performance of the LONS prediction model was found to be comparable to that of the prediction models that use invasive data such as high-resolution vital sign data, blood gas estimations, blood cell counts, and pH levels. The area under the receiver operating characteristic curve of the 48-hour prediction model was 0.861 and that of the onset detection model was 0.868. The main features that could be vital candidate markers for clinical neonatal sepsis were blood pressure, oxygen saturation, and body temperature. Feature generation using kurtosis and skewness of the features showed the highest performance. CONCLUSIONS The findings of our study confirmed that the LONS prediction model based on machine learning can be developed using vital sign data that are regularly measured in clinical settings. Future studies should conduct external validation by using different types of data sets and actual clinical verification of the developed model.
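
Feature generation from skewness and kurtosis can be sketched as follows: a window of vital-sign readings is summarized by its higher-order moments. The window contents and the use of population moments with excess kurtosis are assumptions; the abstract does not specify the window length or the exact estimator.

```python
def moment_features(window):
    """Mean, skewness, and excess kurtosis of a window (population moments)."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    if m2 == 0:
        # A flat signal has no shape to describe; report zeros by convention.
        return {"mean": mean, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# Hypothetical windows of readings, for illustration only.
symmetric = moment_features([1, 2, 3, 4, 5])
spiky = moment_features([1, 1, 1, 10])
```

A positive skewness, as in the second window, indicates occasional high excursions above an otherwise stable baseline.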


2020 ◽  
Author(s):  
Yulan Liang ◽  
Amin Gharipour ◽  
Erik Kelemen ◽  
Arpad Kelemen

Abstract Background: The identification of important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse sets of computational tools were developed for omics data reductions and protein selections. However, standard statistical models with single feature selection involve the multi-testing burden of low power with the available limited samples. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models, and thereby simplify their interpretability. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics data sets: 1) binary putative homologous recombination deficiency positive or negative; and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We conducted and compared various machine learning approaches with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both binary and multiple class outcomes. Various performance criteria, including sensitivity, specificity, and kappa statistics, were used to assess the prediction consistency and accuracy. Results: With the proposed three-stage homogeneous ensemble feature selection approaches, prediction accuracy can be improved with the limited sample through continuously reducing errors and redundancy, e.g., Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for binary ovarian outcomes. For mRNA multi-classes classification, our approach provided even better accuracy with increased sample size.
Conclusions: Despite the different prediction accuracies from various models, the proposed homogeneous ensemble feature selection identified consistent sets of top ranked important markers out of 9606 proteins linked to the binary disease and multiple mRNA class outcomes.
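
The performance criteria listed in this abstract (sensitivity, specificity, accuracy, and the kappa statistic) all derive from a confusion matrix. The sketch below uses hypothetical counts chosen so that the percentages match the reported Treebag figures under an assumed 100/100 class balance; the actual counts and class balance are not given in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from confusion counts."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / n      # observed agreement
    # Agreement expected by chance from the row and column marginals.
    p_positive = ((tp + fn) / n) * ((tp + fp) / n)
    p_negative = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_positive + p_negative
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts (assumed 100 positives and 100 negatives).
metrics = binary_metrics(tp=85, fn=15, tn=81, fp=19)
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with 85% and 81% yielding 83%; with unbalanced classes the three figures would not line up this way.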


2021 ◽  
Vol 12 (1) ◽  
pp. 60
Author(s):  
Samuel Ndichu ◽  
Sangwook Kim ◽  
Seiichi Ozawa ◽  
Tao Ban ◽  
Takeshi Takahashi ◽  
...  

Attacks using Uniform Resource Locators (URLs) and their JavaScript (JS) code content to perpetrate malicious activities on the Internet are rampant and continuously evolving. Methods such as blocklisting, client honeypots, domain reputation inspection, and heuristic and signature-based systems are used to detect these malicious activities. Recently, machine learning approaches have been proposed; however, challenges still exist. First, blocklist systems are easily evaded by new URLs and JS code content, obfuscation, fast-flux, cloaking, and URL shortening. Second, heuristic and signature-based systems do not generalize well to zero-day attacks. Third, the Domain Name System allows cybercriminals to easily migrate their malicious servers to hide their Internet protocol addresses behind domain names. Finally, crafting fully representative features is challenging, even for domain experts. This study proposes a feature selection and classification approach for malicious JS code content using Shapley additive explanations and tree ensemble methods. The JS code features are obtained from the Abstract Syntax Tree form of the JS code, sample JS attack codes, and association rule mining. The malicious and benign JS code datasets obtained from Hynek Petrak and the Majestic Million Service were used for performance evaluation. We compared the performance of the proposed method to those of other feature selection methods in the task of malicious JS code content detection. With a recall of 0.9989, our experimental results show that the proposed approach is a better prediction model.
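
Shapley additive explanations build on Shapley values, which average a feature's marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values by enumeration for a toy additive scoring function and then ranks features by magnitude. It does not use the SHAP library or a tree ensemble, and the feature names and weights are hypothetical.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    `value` maps a frozenset of players to a number. The cost grows
    factorially, so this is only practical for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical code features and a toy additive score.
weights = {"uses_eval": 2.0, "long_strings": 1.0, "has_comments": 0.0}


def toy_score(present):
    """Made-up model output when only the features in `present` are available."""
    return sum(weights[name] for name in present)


phi = shapley_values(toy_score, list(weights))
# Feature selection: keep the two features with the largest |Shapley value|.
top_two = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)[:2]
```

For an additive score the Shapley values simply recover the weights, which makes the example easy to check by hand; for real models, library implementations use approximations or model-specific algorithms instead of full enumeration.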

