Detecting a keystone species European aspen in boreal forests with airborne hyperspectral, LiDAR and UAV data with machine learning methods

Author(s):  
Timo Kumpula ◽  
Janne Mäyrä ◽  
Anton Kuzmin ◽  
Arto Viinikka ◽  
Sonja Kivinen ◽  
...  

Sustainable forest management increasingly highlights the maintenance of biological diversity and requires up-to-date information on the occurrence and distribution of key ecological features in forest environments. Proxy variables indicating the species richness and quality of sites are essential for efficient detection and monitoring of forest biodiversity. European aspen (Populus tremula L.) is a minor deciduous tree species with a high importance in maintaining biodiversity in boreal forests. Large aspen trees host hundreds of species, many of them classified as threatened. However, accurate fine-scale spatial data on aspen occurrence remain scarce and far from comprehensive.

We studied detection of aspen using different remote sensing techniques in Evo, southern Finland. Our study area of 83 km² contains both managed and protected southern boreal forests characterized by Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst), and birch (Betula pendula and pubescens L.), whereas European aspen has a relatively sparse and scattered occurrence in the area. We collected high-resolution airborne hyperspectral and airborne laser scanning data covering the whole study area and ultra-high-resolution unmanned aerial vehicle (UAV) data with RGB and multispectral sensors from selected parts of the area. We tested the discrimination of aspen from other species at tree level using different machine learning methods (Support Vector Machines, Random Forest, Gradient Boosting Machine) and deep learning methods (3D convolutional neural networks).

Airborne hyperspectral and LiDAR data gave excellent results with machine learning and deep learning classification methods. The highest classification accuracies for aspen ranged from 91% to 92% (F1-score). The most important wavelengths for discriminating aspen from other species included reflectance bands in the red edge range (724–727 nm) and shortwave infrared (1520–1564 nm and 1684–1706 nm) (Viinikka et al. 2020; Mäyrä et al. 2021). Aspen detection using RGB and multispectral data also gave good results (highest F1-score for aspen = 87%) (Kuzmin et al. 2021). The different remote sensing data enabled production of a spatially explicit map of aspen occurrence in the study area. Information on aspen occurrence and abundance can significantly contribute to biodiversity management and conservation efforts in boreal forests. Our results can be further utilized in upscaling efforts aiming at aspen detection over larger geographical areas using satellite images.

2019 ◽  
Vol 11 (2) ◽  
pp. 196 ◽  
Author(s):  
Omid Ghorbanzadeh ◽  
Thomas Blaschke ◽  
Khalil Gholamnia ◽  
Sansar Meena ◽  
Dirk Tiede ◽  
...  

There is a growing demand for detailed and accurate landslide maps and inventories around the globe, but particularly in hazard-prone regions such as the Himalayas. Most standard mapping methods require expert knowledge, supervision and fieldwork. In this study, we use optical data from the Rapid Eye satellite and topographic factors to analyze the potential of machine learning methods, i.e., artificial neural network (ANN), support vector machines (SVM) and random forest (RF), and different deep-learning convolution neural networks (CNNs) for landslide detection. We use two training zones and one test zone to independently evaluate the performance of different methods in the highly landslide-prone Rasuwa district in Nepal. Twenty different maps are created using ANN, SVM and RF and different CNN instantiations and are compared against the results of extensive fieldwork through a mean intersection-over-union (mIOU) and other common metrics. This accuracy assessment yields the best result of 78.26% mIOU for a small window size CNN, which uses spectral information only. The additional information from a 5 m digital elevation model helps to discriminate between human settlements and landslides but does not improve the overall classification accuracy. CNNs do not automatically outperform ANN, SVM and RF, although this is sometimes claimed. Rather, the performance of CNNs strongly depends on their design, i.e., layer depth, input window sizes and training strategies. Here, we conclude that the CNN method is still in its infancy as most researchers will either use predefined parameters in solutions like Google TensorFlow or will apply different settings in a trial-and-error manner. Nevertheless, deep-learning can improve landslide mapping in the future if the effects of the different designs are better understood, enough training samples exist, and the effects of augmentation strategies to artificially increase the number of existing samples are better understood.
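The mean intersection-over-union (mIOU) metric mentioned above can be illustrated with a minimal sketch. The function names `iou` and `mean_iou`, the two-class labelling (0 = background, 1 = landslide) and the tiny grids are assumptions made for illustration; the abstract does not state exactly how the authors computed mIOU.

```python
def iou(pred, truth, label):
    """Intersection-over-union of one class over two equally sized label grids."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            in_pred, in_truth = (p == label), (t == label)
            inter += in_pred and in_truth   # cell has the label in both maps
            union += in_pred or in_truth    # cell has the label in either map
    return inter / union if union else float("nan")


def mean_iou(pred, truth, labels=(0, 1)):
    """Unweighted mean of the per-class IoU values."""
    scores = [iou(pred, truth, c) for c in labels]
    return sum(scores) / len(scores)


# Made-up 2 x 4 maps: 1 = landslide, 0 = background.
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
pred = [[0, 1, 1, 1],
        [0, 0, 0, 1]]
print(mean_iou(pred, truth))
```

Averaging over classes rather than pooling all pixels keeps a rare class such as landslides from being swamped by the background class.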


2017 ◽  
Author(s):  
Fadhl M Alakwaa ◽  
Kumardeep Chaudhary ◽  
Lana X Garmire

Metabolomics holds promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracy of an autoencoder, a deep learning (DL) framework, as well as six widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework has the highest area under the curve (AUC), 0.93, in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value < 0.05) that cannot be discovered by the other machine learning methods. Among them, the protein digestion & absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in an integrated analysis of metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of autoencoder based deep learning methods in the metabolomics research community for classification.
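As a reading aid for the AUC figures quoted above, the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties count as one half). The function name `roc_auc` and the scores are invented for illustration; this is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Made-up classifier scores; label 1 stands for ER+, 0 for ER-.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(round(roc_auc(scores, labels), 3))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of a few hundred tissues but would normally be replaced by a rank-based formula on larger data.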


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohanad Mohammed ◽  
Henry Mwambi ◽  
Innocent B. Mboya ◽  
Murtada K. Elbashir ◽  
Bernard Omolo

Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we propose a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform multi-class classification of the five common cancers among women based on RNASeq data. The RNASeq gene expression data were downloaded from the Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in the R software. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the results of the proposed model, with and without LASSO, with the results of the single 1D-CNN and of machine learning methods including support vector machines with radial basis function, linear, and polynomial kernels; artificial neural networks; k-nearest neighbors; and bagging trees. The results show that the proposed model, with and without LASSO, performs better than the other classifiers. The results also show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) perform better with under-sampling than with over-sampling techniques. This is supported by the statistical significance test of accuracy, where the p-values for the differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN are p = 0.003, p < 0.001, and p < 0.001, respectively. SVM-L also differed significantly from ANN (p = 0.009). Moreover, SVM-P and ANN, and SVM-P and KNN, were significantly different (p < 0.001 in both cases). In addition, ANN and bagging trees, and ANN and KNN, were significantly different, with p < 0.001 and p = 0.004, respectively. Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.


2020 ◽  
Author(s):  
Thomas R. Lane ◽  
Daniel H. Foil ◽  
Eni Minerali ◽  
Fabio Urbina ◽  
Kimberley M. Zorn ◽  
...  

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and for comparison of our proprietary software Assay Central™ with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.
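Three of the metrics named above (F1 score, Cohen's kappa and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below uses the standard textbook formulas; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (F1, Cohen's kappa, MCC) from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    f1 = 2 * tp / (2 * tp + fp + fn)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, kappa, mcc


# Hypothetical counts for one active/inactive model evaluated on 100 compounds.
f1, kappa, mcc = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(round(f1, 3), round(kappa, 3), round(mcc, 3))
```

Reporting kappa and MCC alongside F1 is useful on imbalanced datasets, because both account for true negatives, which F1 ignores.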


2020 ◽  
Author(s):  
Timo Kumpula ◽  
Arto Viinikka ◽  
Janne Mäyrä ◽  
Anton Kuzmin ◽  
Pekka Hurskainen ◽  
...  

The importance of biodiversity is increasingly highlighted as an essential part of sustainable forest management. As direct monitoring of biodiversity is not possible, proxy variables have been used to indicate a site's species richness and quality. In boreal forests, European aspen (Populus tremula L.) is one of the most significant proxies for biodiversity. Aspen is a keystone species that hosts a range of endangered species and is therefore highly important for maintaining forest biodiversity. Still, reliable and fine-scale spatial data on aspen occurrence remain scarce and far from comprehensive. Although remote sensing-based species classification has been used for decades for the needs of forestry, commercially less significant species (e.g., aspen) have typically been excluded from the studies. This creates a need for developing general methods for tree species classification that also cover ecologically significant species.

Our study area, located in Evo, Southern Finland, covers approximately 83 km² and contains both managed and protected southern boreal forests. The main tree species in the area are Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst), and birch (Betula pendula and pubescens L.), with relatively sparse and scattered occurrence of aspen. Along with thorough field data, airborne hyperspectral and LiDAR data have been acquired from the study area. We also collected ultra-high-resolution unmanned aerial vehicle (UAV) data with RGB and multispectral sensors.

Our aim is to gather fundamental data on hyperspectral and multispectral species classification that can be utilized to produce detailed aspen data at large scale. For this, we first analyze species detection at tree level. We test and compare different machine learning methods (Support Vector Machines, Random Forest, Gradient Boosting Machine) and deep learning methods (3D convolutional neural networks), with specific emphasis on accurate and feasible aspen detection. The results will show how accurately aspen can be detected from the forest canopy and which spectral bands are most important for aspen. This information can be utilized for aspen detection from satellite images at large scale.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Afan Hasan ◽  
Oya Kalıpsız ◽  
Selim Akyokuş

Although the vast majority of fundamental analysts believe that technical analysts’ estimates and technical indicators used in these analyses are unresponsive, recent research has revealed that both professionals and individual traders are using technical indicators. A correct estimate of the direction of the financial market is a very challenging activity, primarily due to the nonlinear nature of the financial time series. Deep learning and machine learning methods on the other hand have achieved very successful results in many different areas where human beings are challenged. In this study, technical indicators were integrated into the methods of deep learning and machine learning, and the behavior of the traders was modeled in order to increase the accuracy of forecasting of the financial market direction. A set of technical indicators has been examined based on their application in technical analysis as input features to predict the oncoming (one-period-ahead) direction of Istanbul Stock Exchange (BIST100) national index. To predict the direction of the index, Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classification techniques are used. The performance of these models is evaluated on the basis of various performance metrics such as confusion matrix, compound return, and max drawdown.
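Two of the performance measures listed above, compound return and max drawdown, can be sketched as follows. The definitions used here are the common ones (growth of one unit of capital, and the largest peak-to-trough decline of that equity curve); the abstract does not spell out the exact formulas the authors used, and the function names and return series are illustrative assumptions.

```python
def compound_return(returns):
    """Total return from reinvesting a sequence of per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns):
    """Largest fractional drop of the equity curve from its running peak."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Made-up per-period strategy returns.
returns = [0.10, -0.20, 0.05, 0.10]
print(round(compound_return(returns), 4), round(max_drawdown(returns), 4))
```

Unlike a confusion matrix, these two measures weight each prediction by the size of the market move that followed it, so a model with modest directional accuracy can still score well or badly on them.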


2021 ◽  
Vol 13 (9) ◽  
pp. 4607
Author(s):  
Shahram Rezapour ◽  
Erfan Jooyandeh ◽  
Mohsen Ramezanzade ◽  
Ali Mostafaeipour ◽  
Mehdi Jahangiri ◽  
...  

With the rising demand for food products and the direct impact of climate change on food production in many parts of the world, recent years have seen growing interest in the subject of food security and the role of rainfed farming in this area. Machine learning methods can be used to predict crop yield based on a combination of remote sensing data and data collected by ground weather stations. This paper argues that forecasting drylands farming yield can be reliable for management purpose under uncertain conditions using machine learning methods and remote sensing data and determines which indicators are most important in predicting the yield of chickpea. In this study, the yield of rainfed chickpea farms in 11 top chickpea producing counties in Kermanshah province, Iran, was predicted using three machine learning methods, namely support vector regression (SVR), random forest (RF), and K-nearest neighbors (KNN). To improve prediction accuracy, for each county, remote sensing data were overlaid by the satellite images of rainfed farms with a suitable slope and altitude for rainfed farming. An integrated database was created by combining weather data, remote sensing data, and chickpea yield statistics. The methods were evaluated using the leave-one-out cross-validation (LOOCV) technique and compared in terms of multiple measures. Given the sensitivity of rainfed chickpea yield to the time of data, the predictions were made in two scenarios: (1) using the averages of the data of all growing months, and (2) using the data of a combination of months. The results showed that RF provides more accurate yield predictions than other methods. The predictions of this method were 7–8% different from the statistics reported by the Statistical Center and the Ministry of Agriculture of Iran. 
It was found that for pre-harvest prediction of rainfed chickpea yield, using the data of the March–April period (the averages of two months) offers the best result in terms of the correlation coefficient for the relationship between the yield and the predictor indices.
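The leave-one-out cross-validation (LOOCV) procedure mentioned above can be sketched with a small k-nearest-neighbors regressor. Everything in this block is an illustrative assumption: the helper names, the single made-up predictor, the yield values and the choice of mean absolute error as the score are not taken from the paper.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Average the targets of the k training points closest to the query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loocv_mae(xs, ys, k=2):
    """Hold out each sample once, predict it from the rest, average the errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        prediction = knn_predict(train_x, train_y, xs[i], k)
        errors.append(abs(prediction - ys[i]))
    return sum(errors) / len(errors)


# Made-up data: one predictor index per sample and the observed yield.
xs = [[1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
print(loocv_mae(xs, ys, k=2))
```

LOOCV is a natural choice when the number of samples is small, since each model is trained on all but one observation.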


2021 ◽  
Vol 11 (10) ◽  
pp. 4499
Author(s):  
Mei-Ling Huang ◽  
Yun-Zhi Li

Major League Baseball (MLB) is the highest level of professional baseball in the world and accounts for some of the most popular international sporting events. Many scholars have conducted research on predicting the outcome of MLB matches. The accuracy in predicting the results of baseball games is low. Therefore, deep learning and machine learning methods were used to build models for predicting the outcomes (win/loss) of MLB matches and investigate the differences between the models in terms of their performance. The match data of 30 teams during the 2019 MLB season with only the starting pitcher or with all pitchers in the pitcher category were collected to compare the prediction accuracy. A one-dimensional convolutional neural network (1DCNN), a traditional machine learning artificial neural network (ANN), and a support vector machine (SVM) were used to predict match outcomes with fivefold cross-validation to evaluate model performance. The highest prediction accuracies were 93.4%, 93.91%, and 93.90% with the 1DCNN, ANN, SVM models, respectively, before feature selection; after feature selection, the highest accuracies obtained were 94.18% and 94.16% with the ANN and SVM models, respectively. The prediction results obtained with the three models were similar, and the prediction accuracies were much higher than those obtained in related studies. Moreover, a 1DCNN was used for the first time for predicting the outcome of MLB matches, and it achieved a prediction accuracy similar to that achieved by machine learning methods.


2021 ◽  
Vol 13 (8) ◽  
pp. 1507
Author(s):  
Haibo Wang ◽  
Jianchao Qi ◽  
Yufei Lei ◽  
Jun Wu ◽  
Bo Li ◽  
...  

Automatic detection of newly constructed building areas (NCBAs) plays an important role in ecological environment monitoring, urban management, and urban planning. Compared with low- and medium-resolution remote sensing images, high-resolution remote sensing images offer superior spatial resolution and finer spatial detail. Yet their spectral heterogeneity and complexity have impeded research on change detection in high-resolution remote sensing images. As generalized machine learning (including deep learning) technologies advance, the efficiency and accuracy of ground-object recognition in remote sensing have improved substantially, providing a new solution for change detection in high-resolution remote sensing images. To this end, this study proposes a refined NCBA detection method based on generalized machine learning, consisting of four parts: (1) pre-processing; (2) candidate NCBAs are obtained from bi-temporal building masks acquired by deep learning semantic segmentation, and then registered one by one; (3) rules and a support vector machine (SVM) are jointly adopted to classify NCBAs with high, medium and low confidence; and (4) the final vectors of NCBAs are obtained by post-processing. In addition, area-based and pixel-based methods are adopted for accuracy assessment. First, the proposed method is applied to three groups of GF1 images covering the urban fringe areas of Jinan, and the experimental results are divided into three categories: high, high-medium, and high-medium-low confidence. The results show that NCBAs of high confidence have the highest F1 score and the best overall effect. Therefore, only NCBAs of high confidence are taken as the final detection result of this method. Specifically, in NCBA detection for the three groups of GF1 images in Jinan, the mean Recall of the area-based and pixel-based assessment methods reaches around 77% and 91%, respectively, the mean Pixel Accuracy (PA) 88% and 92%, and the mean F1 82% and 91%, confirming the effectiveness of this method on GF1. Similarly, the proposed method is applied to two groups of ZY302 images in Xi’an and Kunming. The F1 scores for both groups of ZY302 images are also above 90%, confirming the effectiveness of this method on ZY302. It can be concluded that area registration improves registration efficiency, and that the joint use of prior rules and an SVM classifier with probability features can help avoid both over-detection and missed detection of NCBAs. In practical applications, this method contributes to automatic NCBA detection from high-resolution remote sensing images.

