Machine Learning Techniques for Tree Species Classification Using Co-Registered LiDAR and Hyperspectral Data

The use of light detection and ranging (LiDAR) techniques for recording and analyzing tree and forest structural variables shows strong promise for improving established hyperspectral-based tree species classifications; however, previous multi-sensor projects were often limited by error resulting from seasonal or flight path differences. The National Aeronautics and Space Administration (NASA) Goddard’s LiDAR, hyperspectral, and thermal imager (G-LiHT) is now providing co-registered data on experimental forests in the United States, which are associated with established ground truths from existing forest plots. Free, user-friendly machine learning applications like the Orange Data Mining Extension for Python have recently simplified the process of combining datasets, handling variable redundancy and noise, and reducing dimensionality in remotely sensed datasets. Neural networks, CN2 rules, and support vector machine methods are used here to achieve a final classification accuracy of 67% for dominant tree species in experimental plots of Howland Experimental Forest, a mixed coniferous–deciduous forest with ten dominant tree species, and 59% for plots in Penobscot Experimental Forest, a mixed coniferous–deciduous forest with 15 dominant tree species. These accuracies are higher than those produced using LiDAR or hyperspectral datasets separately, suggesting that combined spectral and structural data have a greater richness of complementary information than either dataset alone. Using greatly simplified datasets created by our dimensionality reduction methodology, machine learner performance remains comparable to or higher than that obtained with the full dataset. Across forests, the identification of shared structural and spectral variables suggests that this methodology can successfully identify parameters with high explanatory power for differentiating among tree species, and opens the possibility of addressing large-scale forestry questions using optimized remote sensing workflows.


Reliable photometric membership (RPM) of galaxies in clusters – I. A machine learning method and its performance in the local universe

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa486 ◽

2020 ◽

Vol 493 (3) ◽

pp. 3429-3441

Author(s):

Paulo A A Lopes ◽

André L B Ribeiro

Keyword(s):

Machine Learning ◽

Galaxy Evolution ◽

Large Scale ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Validation Data ◽

Membership Probability ◽

Cluster Membership ◽

Stochastic Gradient Boosting

ABSTRACT We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model averaged neural network and k-nearest neighbours), we found the support vector machine algorithm to perform best when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\mathrm{phot-cl}} \leq 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times 10^{14} \, \mathrm{M}_{\odot }$ (most above $3\times 10^{14} \, \mathrm{M}_{\odot }$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions on the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness, $C \sim 92$ per cent and purity, $P \sim 87$ per cent) within $R_{200}$, so that we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
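The completeness and purity figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: completeness is the fraction of true members that the classifier recovers, and purity is the fraction of selected galaxies that are true members. The function name and the toy labels are hypothetical and do not come from the paper or its code.

```python
def completeness_and_purity(true_member, predicted_member):
    """Return (completeness, purity) for two equal-length boolean label lists.

    Assumed definitions (not taken from the paper's code):
      completeness = true positives / number of true members
      purity       = true positives / number of predicted members
    """
    if len(true_member) != len(predicted_member):
        raise ValueError("label sequences must have the same length")
    true_pos = sum(1 for t, p in zip(true_member, predicted_member) if t and p)
    n_true = sum(1 for t in true_member if t)
    n_pred = sum(1 for p in predicted_member if p)
    completeness = true_pos / n_true if n_true else float("nan")
    purity = true_pos / n_pred if n_pred else float("nan")
    return completeness, purity


# Toy labels, made up for illustration only.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
c, p = completeness_and_purity(truth, predicted)
print(f"completeness = {c:.2f}, purity = {p:.2f}")
```

In classification terms these two quantities correspond to recall and precision for the "member" class.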


Feature Selection from Lyme Disease Patient Survey Using Machine Learning

Algorithms ◽

10.3390/a13120334 ◽

2020 ◽

Vol 13 (12) ◽

pp. 334

Author(s):

Joshua Vendrow ◽

Jamie Haddock ◽

Deanna Needell ◽

Lorraine Johnson

Keyword(s):

Machine Learning ◽

Lyme Disease ◽

Large Scale ◽

Disease Patient ◽

Patient Survey ◽

Machine Learning Techniques ◽

Medical Community ◽

Support Vector ◽

Global Rating ◽

K Nearest Neighbors

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large-scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants’ answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and k-nearest neighbors approaches. We first analyze the general performance of the models and then identify the most important features for predicting participant answers to GROC. After we identify the “key” features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.


Evaluation of Light Gradient Boosted Machine Learning Technique in Large Scale Land Use and Land Cover Classification

Environments ◽

10.3390/environments7100084 ◽

2020 ◽

Vol 7 (10) ◽

pp. 84

Author(s):

Dakota Aaron McCarty ◽

Hyun Woo Kim ◽

Hye Kyung Lee

Keyword(s):

Machine Learning ◽

Land Use ◽

Support Vector Machines ◽

Random Forest ◽

Land Cover ◽

Large Scale ◽

Machine Learning Techniques ◽

Support Vector ◽

Light Gradient ◽

Vector Machines

The ability to rapidly produce accurate land use and land cover maps regularly and consistently has been a growing initiative as they have increasingly become an important tool in the efforts to evaluate, monitor, and conserve Earth’s natural resources. Algorithms for supervised classification of satellite images constitute a necessary tool for the building of these maps and they have made it possible to establish remote sensing as the most reliable means of map generation. In this paper, we compare three machine learning techniques: Random Forest, Support Vector Machines, and Light Gradient Boosted Machine, using a 70/30 training/testing evaluation model. Our research evaluates the accuracy of Light Gradient Boosted Machine models against the more classic and trusted Random Forest and Support Vector Machines when it comes to classifying land use and land cover over large geographic areas. We found that the Light Gradient Boosted model is marginally more accurate, with increases of 0.01 and 0.059 in overall accuracy compared to Support Vector Machines and Random Forest, respectively, and that it also ran around 25% faster on average.
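As a rough illustration of the evaluation protocol named in this abstract, the sketch below performs a 70/30 train/test split and computes overall accuracy (the fraction of test samples labelled correctly). It uses only the Python standard library; the function names, the fixed seed, and the toy land-cover labels are assumptions made for illustration and are not the authors' pipeline.

```python
import random


def train_test_split_70_30(samples, seed=0):
    """Shuffle a copy of `samples` and split it 70% / 30%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("need at least one sample")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Toy samples: (pixel id, land-cover label), made up for illustration.
samples = [(i, "forest" if i % 2 else "urban") for i in range(10)]
train, test = train_test_split_70_30(samples)
print(len(train), "training samples;", len(test), "test samples")

# Placeholder predictions standing in for a trained classifier's output.
truth = ["forest", "urban", "forest", "forest"]
predicted = ["forest", "urban", "urban", "forest"]
print("overall accuracy:", overall_accuracy(truth, predicted))
```

Any classifier (Random Forest, SVM, or a gradient-boosted model) would be trained on `train` and scored on `test` with the same accuracy function, which keeps the comparison between models like-for-like.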


Machine Learning for Estimating Leaf Dust Retention Based on Hyperspectral Measurements

Journal of Sensors ◽

10.1155/2018/6026259 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Wenlong Jing ◽

Xia Zhou ◽

Chen Zhang ◽

Chongyang Wang ◽

Hao Jiang

Keyword(s):

Machine Learning ◽

Southern China ◽

Regional Scale ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Machine Learning Techniques ◽

Support Vector ◽

Plant Leaves ◽

Dust Retention

Hyperspectral sensors provide detailed information for dust retention content (DRC) estimation. However, rich hyperspectral data are not fully utilized by traditional image analysis techniques. We integrated several recently developed machine learning algorithms to estimate DRC on plant leaves using the spectra measured by the ASD FieldSpec 3. The experiments were carried out on three common green plants of southern China. The important hyperspectral variables were first identified by applying the random forest (RF) algorithm. Three estimation models were then developed using the support vector machine (SVM), classification and regression tree (CART), and RF algorithms. The results showed that the increase in dust retention contents on plant leaves enhanced their reflectance in the visible wavelengths but weakened their reflectance in the infrared wavelengths. Wavelengths in the ranges of 450–500 nm, 550–600 nm, 750–1000 nm, and 1100–1300 nm were identified as important variables using the RF algorithm and were used to estimate the DRC. The comparison of the three machine learning techniques for DRC estimation confirmed that the SVM and RF models performed well because their estimations were similar to the measured DRC. Specifically, the average $R^2$ values for the SVM and RF models are 0.85 and 0.88, respectively. The technical approach of this study proved to be a successful illustration of using hyperspectral measurements to estimate the DRC on plant leaves. The findings of this study can be applied to monitor the DRC on leaves of other plants and can also be integrated with other types of spectral data to measure the DRC at a regional scale.
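The $R^2$ values reported in this abstract compare estimated and measured DRC. A minimal sketch of one common definition, the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$, is given below. Whether the authors used this form or the squared Pearson correlation is not stated in the abstract, so the choice here is an assumption, and the sample values are made up.

```python
def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res is the sum of squared residuals, and SS_tot is the sum of
    squared deviations of the measurements from their mean.
    """
    if len(measured) != len(estimated):
        raise ValueError("sequences must have the same length")
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - f) ** 2 for y, f in zip(measured, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up measured and estimated values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
print(f"R^2 = {r_squared(measured, estimated):.3f}")
```

Under this definition a model that always predicts the mean of the measurements scores 0, and a perfect model scores 1.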


A Comparison of the Performance of Supervised Learning Algorithms for Solar Power Prediction

Energies ◽

10.3390/en14154424 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4424

Author(s):

Leidy Gutiérrez ◽

Julian Patiño ◽

Eduardo Duque-Grisales

Keyword(s):

Machine Learning ◽

Power Generation ◽

Large Scale ◽

Fossil Fuels ◽

Machine Learning Techniques ◽

Support Vector ◽

Power Prediction ◽

Electric Networks ◽

K Nearest Neighbors ◽

Supervised Learning Algorithms

Science seeks strategies to mitigate global warming and reduce the negative impacts of the long-term use of fossil fuels for power generation. In this sense, implementing and promoting renewable energy in different ways becomes one of the most effective solutions. The inaccuracy in the prediction of power generation from photovoltaic (PV) systems is a significant concern for the planning and operational stages of interconnected electric networks and the promotion of large-scale PV installations. This study proposes the use of Machine Learning techniques to model the photovoltaic power production for a system in Medellín, Colombia. Four forecasting models were generated from techniques compatible with Machine Learning and Artificial Intelligence methods: K-Nearest Neighbors (KNN), Linear Regression (LR), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). The results obtained indicate that the four methods produced adequate estimations of photovoltaic energy generation. However, the best estimate according to RMSE and MAE is the ANN forecasting model. The proposed Machine Learning-based models were demonstrated to be practical and effective solutions to forecast PV power generation in Medellín.
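The two error metrics used to rank the forecasting models, root mean squared error (RMSE) and mean absolute error (MAE), can be sketched as follows. The function names and the sample power values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean((actual - forecast)^2))."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mae(actual, forecast):
    """Mean absolute error: mean(|actual - forecast|)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up PV output values (arbitrary units), for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 2.0, 3.0, 2.0]
print(f"RMSE = {rmse(actual, forecast):.3f}, MAE = {mae(actual, forecast):.3f}")
```

Because RMSE squares the errors before averaging, it penalizes a few large misses more heavily than MAE does, which is why the two metrics are often reported together.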


“Big Data” in Educational Administration: An Application for Predicting School Dropout Risk

Educational Administration Quarterly ◽

10.1177/0013161x18799439 ◽

2018 ◽

Vol 55 (3) ◽

pp. 404-446 ◽

Cited By ~ 6

Author(s):

Lucy C. Sorensen

Keyword(s):

Machine Learning ◽

Educational Administration ◽

Large Scale ◽

Economic Recession ◽

Machine Learning Techniques ◽

Dropping Out ◽

Support Vector ◽

Full Potential ◽

North Carolina Department ◽

The North

Purpose: In an era of unprecedented student measurement and emphasis on data-driven educational decision making, the full potential for using data to target resources to students has yet to be realized. This study explores the utility of machine-learning techniques with large-scale administrative data to identify student dropout risk. Research Methods: Using longitudinal student records data from the North Carolina Department of Public Instruction, this article assesses modern prediction techniques, with a focus on tree-based classification methods and support vector machines. These methods incorporate 74 predictor measures from Grades 3 through 8, including academic achievement, behavioral indicators, and socioeconomic and demographic characteristics. Findings: Two of the assessed classification algorithms predict high school graduation and dropping out correctly for more than 90% of an out-of-sample student cohort. Findings reveal a shift toward lower dropout incidence in regions hit hardest by the economic recession of 2008, especially for male students. Implications for Research and Practice: Machine-learning procedures, as demonstrated in this study, offer promise for allowing administrators to reliably identify students at risk of dropping out of school so as to provide targeted, intensive programs at the lowest possible cost.


Identification of Significative LiDAR Metrics and Comparison of Machine Learning Approaches for Estimating Stand and Diversity Variables in Heterogeneous Brazilian Atlantic Forest

Remote Sensing ◽

10.3390/rs13132444 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2444

Author(s):

Rorai Pereira Martins-Neto ◽

Antonio Maria Garcia Tommaselli ◽

Nilton Nobuhiro Imai ◽

Hassan Camil David ◽

Milto Miltiadou ◽

...

Keyword(s):

Machine Learning ◽

Tropical Forests ◽

Atlantic Forest ◽

Tree Species ◽

Ordinary Least Squares ◽

Machine Learning Techniques ◽

Support Vector ◽

Brazilian Atlantic Forest ◽

Learning Techniques ◽

Mean Diameter

Data collection and estimation of variables that describe the structure of tropical forests, diversity, and richness of tree species are challenging tasks. Light detection and ranging (LiDAR) is a powerful technique due to its ability to penetrate small openings and cracks in the forest canopy, enabling the collection of structural information in complex forests. Our objective was to identify the most significant LiDAR metrics and machine learning techniques to estimate the stand and diversity variables in a disturbed heterogeneous tropical forest. Data were collected in a remnant of the Brazilian Atlantic Forest with different successional stages. LiDAR metrics were used in three types of transformation: (i) raw data (untransformed), (ii) correlation analysis, and (iii) principal component analysis (PCA). These transformations were tested with four machine learning techniques: (i) artificial neural network (ANN), (ii) ordinary least squares (OLS), (iii) random forests (RF), and (iv) support vector machine (SVM), with different configurations, resulting in 27 combinations. The best technique was determined based on the lowest RMSE (%) and corrected Akaike information criterion (AICc), and bias (%) values close to zero. The output forest variables were mean diameter at breast height (MDBH), quadratic mean diameter (QMD), basal area (BA), density (DEN), number of tree species (NTS), as well as Shannon–Weaver (H’) and Simpson’s diversity indices (D). The best input data were the new variables obtained from the PCA, and the best modeling method was ANN with two hidden layers for the variables MDBH, QMD, BA, and DEN, while for NTS, H’ and D, the ANN with three hidden layers was the best method. For MDBH, QMD, H’ and D, the RMSE was 5.2–10% with a bias between −1.7% and 3.6%. The BA, DEN, and NTS were the most difficult variables to estimate, due to their complexity in tropical forests; the RMSE was 16.2–27.6% and the bias between −12.4% and −0.24%. The results showed that it is possible to estimate the stand and diversity variables in heterogeneous forests with LiDAR data.
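The two diversity indices estimated in this study can be computed from per-species tree counts, as sketched below. The sketch assumes the natural-log Shannon index, $H' = -\sum_i p_i \ln p_i$, and the Gini–Simpson form $D = 1 - \sum_i p_i^2$. The abstract does not say which variant of Simpson's index the authors used, so that choice is an assumption, and the plot counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one individual")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson form D = 1 - sum(p_i^2); one of several variants in use."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one individual")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Invented stem counts per species for one plot, for illustration only.
plot_counts = [10, 10, 10, 10]
print(f"H' = {shannon_index(plot_counts):.3f}")
print(f"D  = {simpson_index(plot_counts):.3f}")
```

For a plot with S equally abundant species, H' equals ln(S) and this form of D equals 1 − 1/S, which gives a quick sanity check on any implementation.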


Using Machine Learning Algorithms on Prediction of Stock Price

Journal of Modeling and Optimization ◽

10.32732/jmo.2020.12.2.84 ◽

2020 ◽

Vol 12 (2) ◽

pp. 84-99

Author(s):

Li-Pang Chen

Keyword(s):

Machine Learning ◽

Stock Price ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Short Term ◽

Learning Techniques ◽

Historical Database ◽

Long Short Term Memory

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict the future stock price, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in machine learning methods, it can be shown that the prediction accuracy is improved.


A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0177 ◽

2017 ◽

Vol 7 (7) ◽

pp. 172

Author(s):

Anantvir Singh Romana

Keyword(s):

Machine Learning ◽

Subsequent Treatment ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Classification Problems ◽

Learning Techniques ◽

Neural Network Classifiers ◽

Diagnostic Detection

Accurate diagnostic detection of the disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method, which provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree and neural network classifiers on breast cancer and diabetes datasets.


Structure-Based Virtual Screening of Perfluoroalkyl and Polyfluoroalkyl Substances (PFASs) as Endocrine Disruptors of Androgen Receptor Activity Using Molecular Docking and Machine Learning

10.26434/chemrxiv.11886702.v1 ◽

2020 ◽

Author(s):

Azhagiya Singam Ettayapuram Ramaprasad ◽

Phum Tachachartvanich ◽

Denis Fourches ◽

Anatoly Soshilov ◽

Jennifer C.Y. Hsieh ◽

...

Keyword(s):

Machine Learning ◽

Molecular Docking ◽

Androgen Receptor ◽

Endocrine Disruptors ◽

Hormone Receptors ◽

Steroid Hormone Receptors ◽

Machine Learning Techniques ◽

Support Vector ◽

Polyfluoroalkyl Substances ◽

Perfluoroalkyl And Polyfluoroalkyl Substances

Perfluoroalkyl and Polyfluoroalkyl Substances (PFASs) pose a substantial threat as endocrine disruptors, and thus early identification of those that may interact with steroid hormone receptors, such as the androgen receptor (AR), is critical. In this study we screened 5,206 PFASs from the CompTox database against the different binding sites on the AR using both molecular docking and machine learning techniques. We developed support vector machine models trained on Tox21 data to classify the active and inactive PFASs for AR using different chemical fingerprints as features. The maximum accuracy was 95.01% and the Matthews correlation coefficient (MCC) was 0.76, both based on MACCS fingerprints (MACCSFP). The combination of docking-based screening and machine learning models identified 29 PFASs that have strong potential for activity against the AR and should be considered priority chemicals for biological toxicity testing.
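The abstract reports both accuracy and the Matthews correlation coefficient (MCC). A minimal sketch of how the two are computed from a binary confusion matrix is given below. The confusion-matrix counts are invented to show how accuracy and MCC can diverge when inactive compounds greatly outnumber active ones; they are not the study's data, and the function names are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented, imbalanced counts: few actives (positives), many inactives.
tp, fn, fp, tn = 8, 2, 3, 87
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

MCC uses all four cells of the confusion matrix, so it is generally considered more informative than accuracy alone on imbalanced screening sets such as this one.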
