Identifying key drivers of peat-fires across Kalimantan’s ex‑Mega Rice Project area using machine learning

Throughout Indonesia, ecological degradation, agricultural expansion, and the digging of drainage canals have compromised the integrity and functioning of large swathes of peatland forest, leaving behind a fragmented landscape of scrubland, successional forest, and newly established plantations. These landscapes are susceptible to extensive and intensive wildfires that rage out of control each year. One of the most affected regions is the ex-Mega Rice Project (EMRP) area in Central Kalimantan on the island of Borneo, where 1 million ha of peatland forest were cleared and 4000 km of canals were dug between 1996 and 1998 in an attempt to initiate large-scale industrial rice cultivation. This disturbed the underlying hydrology, the local ecology, and the ability of the local population to maintain a livelihood; their efforts are thwarted each year by the returning wildfires. Directing fire prevention and mitigation efforts requires a detailed understanding of the main drivers of fire distribution and the conditions of initiation. To this end, we have developed a fire susceptibility model using machine learning (XGBoost random forest) that characterises the relationships between key predictor variables and the distribution of historic fire locations. Using the model, we determine the relative importance of each predictor variable in controlling the initiation and spread of fires. We included land-cover classifications, a forest clearance index, vegetation indices, drought indices, distances to infrastructure, topography, and peat depth, as well as the Oceanic Niño Index (ONI). The model was trained to separate burnt areas from unburnt areas using point samples of predictor variables taken from both, and then tested by applying the model across the entire study area for all years.
The model consistently scores highly in both accuracy and precision across all years (>0.75 and >0.68, respectively), though recall is much lower (>0.25). Our results confirm the anthropogenic dependence of extreme fire events in the region, with distance to settlements and distance to canals consistently weighted among the most important driving factors within the model structure. In combination, the vegetation indices were the strongest indicators of fire prevalence. Ours is the first analysis in the region to encompass the full range of driving factors within a single model that captures the inter-annual variation as well as the spatial distribution of peatland fires. Our results can be used to target the root causes of fire initiation and propagation, and so to better design regulation and rehabilitation efforts that mitigate future wildfires.
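The gap between accuracy, precision, and recall reported above can be illustrated with a minimal sketch. It assumes a binary burnt (1) versus unburnt (0) labelling; the function names and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = burnt)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred):
    """Return accuracy, precision, and recall as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: share of predicted-burnt samples that really burnt.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of truly burnt samples that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: 4 burnt and 16 unburnt samples.
y_true = [1] * 4 + [0] * 16
# A cautious model that flags only one of the four burnt samples.
y_pred = [1] + [0] * 19

scores = classification_scores(y_true, y_pred)
print(scores)
```

With these toy labels the model is rarely wrong when it predicts fire and is correct on most samples overall, yet it misses most of the burnt samples, which is the pattern of high accuracy and precision with low recall.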

Machine Learning Models of COVID-19 Cases in the United States: A Study of Initial Lockdown and Reopen Regimes

Applied Sciences ◽

10.3390/app112311227 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11227

Author(s):

Arnold Kamis ◽

Yudan Ding ◽

Zhenzhen Qu ◽

Chenchen Zhang

Keyword(s):

United States ◽

Machine Learning ◽

Additive Model ◽

Regression Tree ◽

Predictor Variable ◽

The United States ◽

Predictor Variables ◽

Future Research ◽

Machine Learning Methods ◽

Variance Explained

The purpose of this paper is to model the cases of COVID-19 in the United States from 13 March 2020 to 31 May 2020. Our novel contribution is that we have obtained highly accurate models focused on two different regimes, lockdown and reopen, modeling each regime separately. The predictor variables include aggregated individual movement as well as state population density, health rank, climate temperature, and political color. We apply a variety of machine learning methods to each regime: Multiple Regression, Ridge Regression, Elastic Net Regression, Generalized Additive Model, Gradient Boosted Machine, Regression Tree, Neural Network, and Random Forest. We discover that Gradient Boosted Machines are the most accurate in both regimes. The best models achieve a variance explained of 95.2% in the lockdown regime and 99.2% in the reopen regime. We describe the influence of the predictor variables as they change from regime to regime. Notably, we identify individual person movement, as tracked by GPS data, to be an important predictor variable. We conclude that government lockdowns are an extremely important de-densification strategy. Implications and questions for future research are discussed.
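"Variance explained" is commonly computed as the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below assumes that definition (the abstract does not state which one the authors used) and runs on hypothetical case counts.

```python
def variance_explained(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Residual sum of squares: what the model fails to reproduce.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

r2 = variance_explained(observed, predicted)
print(f"variance explained: {r2:.3f}")
```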

Application of machine learning to map the global distribution of deep-sea sediments

10.5194/egusphere-egu2020-6755 ◽

2020 ◽

Author(s):

Markus Diesing

Keyword(s):

Machine Learning ◽

Deep Sea ◽

Learning Algorithm ◽

Predictor Variable ◽

Global Ocean ◽

Predictor Variables ◽

Maximum Temperature ◽

Sea Floor ◽

Deep Sea Sediments ◽

Calcareous Sediment

The deep-sea floor accounts for >90% of seafloor area and >70% of the Earth’s surface. It acts as a receptor of the particle flux from the surface layers of the global ocean, is a place of biogeochemical cycling, records environmental and climate conditions through time and provides habitat for benthic organisms. Maps of the spatial patterns of deep-sea sediments are therefore a major prerequisite for many studies addressing aspects of deep-sea sedimentation, biogeochemistry, ecology and related fields. A new digital map of deep-sea sediments of the global ocean is presented. The map was derived by applying the Random Forest machine-learning algorithm to published sample data of seafloor lithologies and environmental predictor variables. The selection of environmental predictors was initially based on the current understanding of the controls on the distribution of deep-sea sediments and the availability of data. A predictor variable selection process ensured that only important and uncorrelated variables were employed in the model. The three most important predictor variables were sea-surface maximum salinity, sea-floor maximum temperature and bathymetry. The occurrence probabilities of seven seafloor lithologies (Calcareous sediment, Clay, Diatom ooze, Lithogenous sediment, Mixed calcareous-siliceous ooze, Radiolarian ooze and Siliceous mud) were spatially predicted. The final map shows the most probable seafloor lithology and an associated probability value, which may be viewed as a spatially explicit measure of map confidence. An assessment of the accuracy of the map was based on a test set of observations not used for model training. Overall map accuracy was 69.5% (95% confidence interval: 67.9% - 71.1%). The sea-floor lithology map bears some resemblance to previously published hand-drawn maps in that the distributions of Calcareous sediment, Clay and Diatom ooze are very similar.
Clear differences were, however, also noted: most strikingly, the map presented here does not display a band of Radiolarian ooze in the equatorial Pacific. The probability surfaces of individual seafloor lithologies, the categorical map of the seven mapped lithologies and the associated map confidence will be made freely available. It is hoped that they form a useful basis for research pertaining to deep-sea sediments.
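The step from per-lithology occurrence probabilities to a categorical map with an attached confidence value can be sketched as follows. This is a minimal illustration for a single map cell: the lithology names come from the abstract, while the probability values and the function name are hypothetical.

```python
def most_probable_lithology(class_probabilities):
    """Return the (label, probability) pair with the highest probability."""
    return max(class_probabilities.items(), key=lambda item: item[1])


# Hypothetical class probabilities for one map cell (they sum to 1).
cell_probabilities = {
    "Calcareous sediment": 0.55,
    "Clay": 0.20,
    "Diatom ooze": 0.05,
    "Lithogenous sediment": 0.10,
    "Mixed calcareous-siliceous ooze": 0.04,
    "Radiolarian ooze": 0.03,
    "Siliceous mud": 0.03,
}

label, confidence = most_probable_lithology(cell_probabilities)
print(f"{label} (map confidence {confidence:.2f})")
```

Applied to every cell, the label gives the categorical map and the winning probability gives the spatially explicit confidence layer described above.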

Continuous Monitoring of Cotton Stem Water Potential using Sentinel-2 Imagery

Remote Sensing ◽

10.3390/rs12071176 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1176 ◽

Cited By ~ 1

Author(s):

Yukun Lin ◽

Zhe Zhu ◽

Wenxuan Guo ◽

Yazhou Sun ◽

Xiaoyuan Yang ◽

...

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Water Potential ◽

Large Scale ◽

Temporal Frequency ◽

Vegetation Indices ◽

Stem Water Potential ◽

Spectral Bands ◽

Red Edge ◽

Sentinel 2

Monitoring cotton status during the growing season is critical in increasing production efficiency. The water status in cotton is a key factor for yield and cotton quality. Stem water potential (SWP) is a precise indicator for assessing cotton water status. Satellite remote sensing is an effective approach for monitoring cotton growth at a large scale. The aim of this study is to estimate cotton water stress at a high temporal frequency and at a large scale. In this study, we measured midday SWP samples according to the acquisition dates of Sentinel-2 images and used them to build linear-regression-based and machine-learning-based models to estimate cotton water stress during the growing season (June to August 2018). For the linear-regression-based method, we estimated SWP based on different Sentinel-2 spectral bands and vegetation indices, where the normalized difference index 45 (NDI45) achieved the best performance (R² = 0.6269; RMSE = 3.6802 (-1*swp (bars))). For the machine-learning-based method, we used random forest regression to estimate SWP and obtained even better results (R² = 0.6709; RMSE = 3.3742 (-1*swp (bars))). To find the best selection of input variables for the machine-learning-based approach, we tried three different input datasets: (1) 9 original spectral bands (e.g., blue, green, red, red edge, near infrared (NIR), and shortwave infrared (SWIR)), (2) 21 vegetation indices, and (3) a combination of original Sentinel-2 spectral bands and vegetation indices. The highest accuracy was achieved when only the original spectral bands were used. We also found that the SWIR and red edge bands were the most important spectral bands, and that the vegetation indices based on red edge and NIR bands were particularly helpful. Finally, we applied the best approach for the linear-regression-based and the machine-learning-based methods to generate cotton water potential maps at a large scale and high temporal frequency.
The results suggest that the methods developed here have the potential for continuous monitoring of SWP at large scales and that the machine-learning-based method is preferred.
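For orientation, NDI45 is commonly defined from Sentinel-2 band 5 (red edge, about 705 nm) and band 4 (red, about 665 nm) as (B5 − B4) / (B5 + B4). The sketch below assumes that standard definition; the abstract does not spell it out, and the reflectance values are hypothetical.

```python
def ndi45(red_edge_b5, red_b4):
    """Normalized difference index 45: (B5 - B4) / (B5 + B4).

    Returns None when both reflectances are zero, where the ratio is undefined.
    """
    denominator = red_edge_b5 + red_b4
    if denominator == 0:
        return None
    return (red_edge_b5 - red_b4) / denominator


# Hypothetical surface reflectances for one cotton pixel.
b5_red_edge = 0.15
b4_red = 0.05

index_value = ndi45(b5_red_edge, b4_red)
print(f"NDI45 = {index_value:.3f}")
```

In a linear-regression workflow of the kind described, a value like this would serve as the single predictor fitted against the measured SWP samples.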

Large Scale Shrub Biomass Estimates for Multiple Purposes

Life ◽

10.3390/life10040033 ◽

2020 ◽

Vol 10 (4) ◽

pp. 33

Author(s):

Teresa Enes ◽

José Lousada ◽

Teresa Fonseca ◽

Hélder Viana ◽

Ana Calvão ◽

...

Keyword(s):

Forest Fires ◽

Large Scale ◽

Energy Use ◽

Fire Behavior ◽

Predictor Variable ◽

Fuel Load ◽

Soil Protection ◽

P Value ◽

Shrub Biomass ◽

Burnt Areas

With the increase of forest fires in Portugal in recent decades, a significant part of woodlands is being converted into shrubland areas. Background: From an ecological point of view, woodlands and shrublands play an essential role, as they not only prevent soil erosion and desertification, but also contribute to soil protection, habitat preservation and restoration, increased biodiversity, and carbon sequestration. Concerning the shrublands, the assessment of their biomass is essential for evaluating the fuel load and forest fire behavior and also beneficial for obtaining estimates of carbon and biomass for energy use. Methods: In this study, we collected data about the potential shrub biomass accumulation over fifteen years in former burnt areas within North Portugal. Results: The results showed that for a post-fire period ranging from one to 15 years, the accumulated shrub biomass ranged from 0.12 up to 28.88 Mg ha⁻¹. The model developed to estimate the shrub biomass using the time after a fire (age) as a predictor variable fitted the data well (p-value of the F statistic <0.01 and R² = 0.89), allowing shrub biomass regeneration within former burnt areas to be estimated with an RMSE of 3.31 Mg ha⁻¹. Conclusions: This paper provides practical information on the availability and assessment of shrub biomass in North Portugal, highlighting the suitability of shrubs as potential sources of biomass.

Estimation of Nitrogen in Rice Crops from UAV-Captured Images

Remote Sensing ◽

10.3390/rs12203396 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3396

Author(s):

Julian D. Colorado ◽

Natalia Cera-Bornacelli ◽

Juan S. Caldas ◽

Eliel Petro ◽

Maria C. Rebolledo ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Vegetation Indices ◽

Ground Truth ◽

Leaf Nitrogen ◽

Growth And Yield ◽

Support Vector ◽

Image Processing Algorithm ◽

Stabilization Control ◽

Leaf N Content

Leaf nitrogen (N) directly correlates to chlorophyll production, affecting crop growth and yield. Farmers use soil plant analysis development (SPAD) devices to calculate the amount of chlorophyll present in plants. However, monitoring large-scale crops using SPAD is prohibitively time-consuming and demanding. This paper presents an unmanned aerial vehicle (UAV) solution for estimating leaf N content in rice crops from multispectral imagery. Our contribution is twofold: (i) a novel trajectory control strategy to reduce the angular wind-induced perturbations that affect image sampling accuracy during UAV flight, and (ii) machine learning models to estimate the canopy N via vegetation indices (VIs) obtained from the aerial imagery. This approach integrates an image processing algorithm using the GrabCut segmentation method with a guided filtering refinement process, to calculate the VIs according to the plots of interest. Three machine learning methods based on multivariable linear regressions (MLR), support vector machines (SVM), and neural networks (NN) were applied and compared through the entire phenological cycle of the crop: vegetative (V), reproductive (R), and ripening (Ri). Correlations were obtained by comparing our methods against an assembled ground-truth of SPAD measurements. The highest N correlations were achieved with NN: 0.98 (V), 0.94 (R), and 0.89 (Ri). We claim that the proposed UAV stabilization control algorithm significantly improves on the N-to-SPAD correlations by minimizing wind perturbations in real-time and reducing the need for offline image corrections.
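A correlation between model estimates and SPAD ground truth, like those quoted above, can be computed as a Pearson coefficient. The sketch below assumes Pearson correlation (the abstract does not name the coefficient used), and the paired values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values: model N estimates and SPAD readings per plot.
estimated_n = [1, 2, 3, 4, 5]
spad_readings = [2, 1, 4, 3, 5]

r = pearson_correlation(estimated_n, spad_readings)
print(f"correlation with SPAD: {r:.2f}")
```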

Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.140.480 ◽

2020 ◽

Vol 140 (6) ◽

pp. 480-487

Author(s):

Minoru Kondo

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Scale Data

Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

10.1561/9781680837056 ◽

2020 ◽

Author(s):

Songze Li ◽

Salman Avestimehr

Keyword(s):

Machine Learning ◽

Distributed Computing ◽

Large Scale

Evolution of Metastable Structures in Bimetallic Catalysts from Microscopy and Machine-Learning Molecular Dynamics

10.26434/chemrxiv.11811660.v1 ◽

2020 ◽

Author(s):

Jin Soo Lim ◽

Jonathan Vandermause ◽

Matthijs A. van Spronsen ◽

Albert Musaelian ◽

Christopher R. O’Connor ◽

...

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Large Scale ◽

Materials Science ◽

Complete Characterization ◽

Layer By Layer ◽

Surface Restructuring ◽

Metastable Structures ◽

Mechanistic Investigation ◽

Underlying Mechanisms

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different composition and morphology at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
