Handling Missing Data in Large-Scale MODIS AOD Products Using a Two-Step Model

Aerosol optical depth (AOD) is a key parameter that reflects the characteristics of aerosols and is of great help in predicting the concentration of pollutants in the atmosphere. At present, remote sensing inversion has become an important method for obtaining AOD on a large scale. However, AOD data acquired by satellites are often missing, and recovering them has gradually become a popular research topic. In recent years, a large number of AOD recovery algorithms have been proposed, but many of them are not application-oriented: they focus mainly on the accuracy of AOD recovery and neglect the AOD recovery ratio. As a result, the AOD recovery accuracy and recovery ratio cannot be balanced. To solve these problems, a two-step model (TWS) that combines multisource AOD data and AOD spatiotemporal relationships is proposed. We used the light gradient boosting machine (LightGBM) model under the framework of the gradient boosting machine (GBM) to fit the multisource AOD data and fill in the missing AOD between data sources. Spatial interpolation and spatiotemporal interpolation methods are limited by buffer factors. We recovered the missing AOD in a moving window. We used TWS to recover AOD from the Terra satellite’s 2018 AOD product (MOD AOD). The results show that the MOD AOD, after a 3 × 3 moving window TWS recovery, was closely related to the AOD of the Aerosol Robotic Network (AERONET) (R = 0.87, RMSE = 0.23). In addition, the MOD AOD missing rate after a 3 × 3 window TWS recovery was greatly reduced (from 0.88 to 0.1). Moreover, the spatial distribution characteristics of the monthly and annual averages of the recovered MOD AOD were consistent with the original MOD AOD. The results show that TWS is reliable. This study provides a new method for the restoration of MOD AOD and is of great significance for studying the spatial distribution of atmospheric pollutants.
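The moving-window idea can be illustrated with a minimal sketch. The abstract does not specify how TWS combines values inside the window, so the plain mean of valid neighbours used here is an assumption, and the function names `fill_missing_window` and `missing_rate` are hypothetical. The first TWS step (the LightGBM fit across data sources) is not shown.

```python
import math

def fill_missing_window(grid, window=3):
    """Fill NaN cells with the mean of the valid cells in a window x window
    neighbourhood (assumed rule; the paper's exact rule is not given)."""
    half = window // 2
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]  # copy, so fills never feed other fills
    for r in range(rows):
        for c in range(cols):
            if not math.isnan(grid[r][c]):
                continue  # observed value: keep as-is
            neighbours = [
                grid[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
                if not math.isnan(grid[i][j])
            ]
            if neighbours:  # cells with no valid neighbour stay missing
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled

def missing_rate(grid):
    """Fraction of cells that are NaN."""
    cells = [v for row in grid for v in row]
    return sum(math.isnan(v) for v in cells) / len(cells)

nan = float("nan")
aod = [[0.2, nan, 0.4],
       [nan, nan, nan],
       [nan, nan, nan]]
recovered = fill_missing_window(aod, window=3)
print(missing_rate(aod), missing_rate(recovered))
```

A larger window fills more cells but draws on more distant values, which is one way to picture the trade-off between recovery ratio and recovery accuracy that the abstract describes.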


Monitoring of Urban Black-Odor Water Based on Nemerow Index and Gradient Boosting Decision Tree Regression Using UAV-Borne Hyperspectral Imagery

Remote Sensing ◽

10.3390/rs11202402 ◽

2019 ◽

Vol 11 (20) ◽

pp. 2402 ◽

Cited By ~ 6

Author(s):

Wei ◽

Huang ◽

Wang ◽

Zhou ◽

...

Keyword(s):

Spatial Distribution ◽

Decision Tree ◽

Regression Models ◽

Large Scale ◽

Hyperspectral Imagery ◽

Action Plan ◽

Training Dataset ◽

Gradient Boosting ◽

Pollution Level ◽

Urban Habitat

The formation of black-odor water in urban rivers has a long history. It not only seriously affects the image of the city, but also easily breeds germs and damages the urban habitat. The prevention and treatment of urban black-odor water have long been important topics nationwide. The “Action Plan for Prevention and Control of Water Pollution” issued by the State Council shows the Chinese government’s close attention to this issue. However, treatment and monitoring are inextricably linked, and there are few studies on the large-scale monitoring of black-odor water, especially ones that use unmanned aerial vehicles (UAVs) to efficiently and accurately monitor the spatial distribution of urban river pollution. Therefore, to overcome the limitations of traditional ground sampling in evaluating the point source pollution of rivers, UAV-borne hyperspectral imagery was applied in this paper. The aim is to grasp the pollution status of the entire river as soon as possible from the surface. However, the retrieval of multiple water quality parameters leads to cumulative errors, so the Nemerow comprehensive pollution index (NCPI) is introduced to characterize the pollution level of urban water. In the paper, the retrieval results of six regression models, including gradient boosting decision tree regression (GBDTR), were compared to find a suitable regression model for retrieving the NCPI in the current scenario. In the first study area, the retrieval accuracy of GBDTR on the training dataset (adjusted R² = 0.978) and the test dataset (adjusted R² = 0.974) was higher than that of the other regression models. Although the retrieval effect of random forest is similar to that of GBDTR in both training accuracy and image inversion, it is more computationally expensive. Finally, the spatial distribution graphs of NCPI and its technical feasibility in monitoring pollution sources were investigated in combination with field observations.
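For readers unfamiliar with the index, the following sketch computes the Nemerow comprehensive pollution index in its commonly cited form: each measured concentration is divided by its standard limit, and the index is the root mean square of the maximum and the average of those ratios. The abstract does not list the water quality parameters or limits used, so the numbers below are purely hypothetical.

```python
import math

def nemerow_index(concentrations, standards):
    """Nemerow comprehensive pollution index (common textbook form).

    concentrations: measured values of each water quality parameter
    standards:      the corresponding standard limits (same units)
    """
    ratios = [c / s for c, s in zip(concentrations, standards)]  # single-factor indices
    p_max = max(ratios)
    p_avg = sum(ratios) / len(ratios)
    return math.sqrt((p_max ** 2 + p_avg ** 2) / 2)

# Hypothetical sample: two parameters, one at twice its limit, one at its limit.
print(nemerow_index([2.0, 1.0], [1.0, 1.0]))
```

Because the maximum ratio enters the formula directly, a single badly exceeded parameter raises the index even when the average is moderate. That is why one index can summarize several parameters without retrieving each of them separately.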


Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms

Mathematics ◽

10.3390/math8050765 ◽

2020 ◽

Vol 8 (5) ◽

pp. 765 ◽

Cited By ~ 6

Author(s):

Weizhang Liang ◽

Suizhi Luo ◽

Guoyan Zhao ◽

Hao Wu

Keyword(s):

Large Scale ◽

Prediction Models ◽

Hard Rock ◽

Gradient Boosting ◽

Pillar Stability ◽

Rock Pillar ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Hard Rock Mines

Predicting pillar stability is a vital task in hard rock mines, as pillar instability can cause large-scale collapse hazards. However, it is challenging because pillar stability is affected by many factors. With the accumulation of pillar stability cases, machine learning (ML) has shown great potential to predict pillar stability. This study aims to predict hard rock pillar stability using gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms. First, 236 cases with five indicators were collected from seven hard rock mines. Afterwards, the hyperparameters of each model were tuned using a five-fold cross-validation (CV) approach. Based on the optimal hyperparameter configuration, prediction models were constructed using the training set (70% of the data). Finally, the test set (30% of the data) was adopted to evaluate the performance of each model. The precision, recall, and F1 indexes were utilized to analyze the prediction results of each level, and the accuracy and their macro average values were used to assess the overall prediction performance. Based on the sensitivity analysis of indicators, the relative importance of each indicator was obtained. In addition, the safety factor approach and other ML algorithms were adopted as comparisons. The results showed that the GBDT, XGBoost, and LightGBM algorithms achieved a better comprehensive performance, and their prediction accuracies were 0.8310, 0.8310, and 0.8169, respectively. The average pillar stress and the ratio of pillar width to pillar height had the most important influences on prediction results. The proposed methodology can provide a reliable reference for pillar design and stability risk management.
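The evaluation metrics named in the abstract can be sketched in plain Python. This is only an illustration of per-class precision, recall and F1 with macro averaging; the stability labels and predictions below are hypothetical and are not data from the study.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics

def macro_average(metrics):
    """Unweighted mean of precision, recall and F1 over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical stability levels, for illustration only.
y_true = ["stable", "stable", "unstable", "failed"]
y_pred = ["stable", "unstable", "unstable", "failed"]
print(macro_average(per_class_metrics(y_true, y_pred)), accuracy(y_true, y_pred))
```

As a rough consistency check, if the 30% test split of the 236 cases contained 71 cases (an assumption; the abstract does not state the count), accuracies of 0.8310 and 0.8169 would correspond to 59 and 58 correct predictions, respectively.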


Pressure of different gases injected into large-scale coal matrix: Analysis of time–space dependence and prediction using light gradient boosting machine

Fuel ◽

10.1016/j.fuel.2020.118448 ◽

2020 ◽

Vol 279 ◽

pp. 118448

Author(s):

Bin Zhou ◽

Jiang Xu ◽

Feng Han ◽

Fazhi Yan ◽

Shoujian Peng ◽

...

Keyword(s):

Large Scale ◽

Matrix Analysis ◽

Gradient Boosting ◽

Coal Matrix ◽

Light Gradient ◽

Time Space ◽

Gradient Boosting Machine ◽

Space Dependence ◽

Different Gases


A novel methodology using MODIS and CERES for assessing the daily radiative forcing of smoke aerosols in large scale over the Amazonia

Atmospheric Chemistry and Physics Discussions ◽

10.5194/acpd-14-31515-2014 ◽

2014 ◽

Vol 14 (22) ◽

pp. 31515-31550

Author(s):

E. T. Sena ◽

P. Artaxo

Keyword(s):

Remote Sensing ◽

Spatial Distribution ◽

Radiative Transfer ◽

Biomass Burning ◽

Satellite Remote Sensing ◽

Radiative Forcing ◽

Large Scale ◽

High Temporal Resolution ◽

Modis Aod ◽

The Impact

Abstract. A new methodology was developed for obtaining daily retrievals of the direct radiative forcing of aerosols (24h-DARF) at the top of the atmosphere (TOA) using satellite remote sensing. For that, simultaneous CERES (Clouds and the Earth's Radiant Energy System) shortwave flux at the TOA and MODIS (Moderate Resolution Imaging Spectroradiometer) aerosol optical depth (AOD) retrievals were used. This methodology is applied over a large region of Brazilian Amazonia. We focused our studies on the peak of the biomass burning season (August to September) from 2000 to 2009 to analyse the impact of forest smoke on the radiation balance. To assess the spatial distribution of the DARF, background scenes without biomass burning impacts were defined as scenes with MODIS AOD < 0.1. The fluxes at the TOA retrieved by CERES for those clean conditions ($F_{\mathrm{cl}}$) were estimated as a function of the illumination geometry ($\theta_0$) for each 0.5° × 0.5° grid cell. The instantaneous DARF was obtained as the difference between the clean flux $F_{\mathrm{cl}}(\theta_0)$ and the polluted mean flux at the TOA measured by CERES in each cell, $F_{\mathrm{pol}}(\theta_0)$. The radiative transfer code SBDART (Santa Barbara DISORT Radiative Transfer model) was used to expand instantaneous DARFs to 24 h averages. With this methodology it is possible to assess the DARF both at large scale and at high temporal resolution. This new methodology also proved to be more robust, because it considerably reduces statistical sources of uncertainty in the estimates of the DARF when compared to previous assessments of the DARF using satellite remote sensing. The spatial distribution of the 24h-DARF shows that, in some cases, the mean 24h-DARF presents local values as high as $-30$ W m$^{-2}$. The temporal variability of the 24h-DARF along the biomass burning season was also studied and showed large intraseasonal and interannual variability.
In an attempt to validate the radiative forcing obtained in this work using CERES and MODIS, those results were compared to coincident AERONET ground-based estimates of the DARF. This analysis showed that CERES-MODIS and AERONET 24h-DARF are related as $\mathrm{DARF}_{\mathrm{CERES\text{-}MODIS}}^{24\,\mathrm{h}} = (1.07 \pm 0.04)\,\mathrm{DARF}_{\mathrm{AERONET}}^{24\,\mathrm{h}} - (0.0 \pm 0.6)$. This is a significant result, considering that the 24h-DARF retrievals were obtained by applying completely different methodologies and using different instruments. The instantaneous CERES-MODIS DARF was also compared with radiative transfer evaluations of the forcing. To validate the aerosol and surface models used in the simulations, downward shortwave fluxes at the surface evaluated using SBDART and measured by pyranometers were compared. The simulated and measured downward fluxes are related through $F_{\mathrm{BOA}}^{\mathrm{PYRANOMETER}} = (1.00 \pm 0.04)\,F_{\mathrm{BOA}}^{\mathrm{SBDART}} - (20 \pm 27)$, indicating that the models and parameters used in the simulations were consistent. The relationship between CERES-MODIS instantaneous DARF and calculated SBDART forcing was satisfactory, with $\mathrm{DARF}_{\mathrm{CERES\text{-}MODIS}} = (0.86 \pm 0.06)\,\mathrm{DARF}_{\mathrm{SBDART}} - (6 \pm 2)$. These analyses showed good agreement between satellite remote sensing, ground-based, and radiative transfer evaluated DARF, demonstrating the robustness of the newly proposed methodology for calculating the radiative forcing of biomass burning aerosols. To our knowledge, this was the first time satellite remote sensing assessments of the DARF were compared with ground-based DARF estimates.
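The two basic operations in this abstract, differencing clean and polluted fluxes and relating two DARF estimates with a straight line, can be sketched as follows. The function names and all flux values are hypothetical, and ordinary least squares is an assumption: the abstract does not say how the reported slopes and intercepts (with their uncertainties) were fitted.

```python
def instantaneous_darf(flux_clean, flux_polluted):
    """Instantaneous DARF as clean-sky TOA flux minus polluted TOA flux (W m^-2)."""
    return flux_clean - flux_polluted

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical grid cell: smoke reflects more sunlight, so the polluted flux is larger
# and the forcing is negative.
print(instantaneous_darf(200.0, 230.0))

# Hypothetical paired 24 h estimates from two methods.
ground_based = [-10.0, -20.0, -30.0]
satellite = [-11.0, -21.0, -31.0]
print(fit_line(ground_based, satellite))
```

A slope near 1 and an intercept near 0, as reported for the CERES-MODIS versus AERONET comparison, indicate that the two estimates agree without a systematic scale or offset error.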


An Ensemble Learning-Based Method for Inferring Drug-Target Interactions Combining Protein Sequences and Drug Fingerprints

BioMed Research International ◽

10.1155/2021/9933873 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Zheng-Yang Zhao ◽

Wen-Zhun Huang ◽

Xin-Ke Zhan ◽

Jie Pan ◽

Yu-An Huang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Biological Evolution ◽

Gradient Boosting ◽

Support Vector ◽

Data Sets ◽

Standard Data ◽

Light Gradient ◽

Golden Standard ◽

Drug Reposition

Identifying drug-target interactions is central to related areas including drug discovery and drug repositioning. Although high-throughput biotechnologies have made tremendous progress, the indispensable clinical trials remain expensive, laborious, and intricate. Therefore, convenient and reliable computer-aided methods have become the focus for inferring drug-target interactions (DTIs). In this research, we propose a novel computational model integrating a pyramid histogram of oriented gradients (PHOG), a Position-Specific Scoring Matrix (PSSM), and a rotation forest (RF) classifier for identifying DTIs. Specifically, protein primary sequences are first converted into PSSMs to describe the potential biological evolution information. After that, PHOG is employed to mine highly representative features of the PSSM from multiple pyramid levels, and the complete descriptors of drug-target pairs are generated by combining the molecular substructure fingerprints and PHOG features. Finally, we feed the complete descriptors into the RF classifier for effective prediction. The 5-fold cross-validation (CV) experiments yield mean accuracies of 88.96%, 86.37%, 82.88%, and 76.92% on four gold standard data sets (enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor, respectively). Moreover, the paper also applies the state-of-the-art light gradient boosting machine (LGBM) and support vector machine (SVM) to further verify the performance of the proposed model. The experimental outcomes substantiate that the established model is feasible and reliable for predicting DTIs. There is an excellent prospect that our model can serve as an efficient tool for predicting DTIs on a large scale.


Large-scale mapping of the catenas vegetation in Subarctic tundra

Geobotanical mapping ◽

10.31111/geobotmap/1993.3 ◽

1995 ◽

pp. 3-21

Author(s):

S. S. Kholod

Keyword(s):

Spatial Distribution ◽

Vegetation Cover ◽

Large Scale ◽

Block Diagram ◽

Abiotic Factors ◽

Ecological Factors ◽

Spatial Arrangement ◽

Geological Time ◽

Ecological Barriers ◽

Functional Zones

One of the most difficult tasks in large-scale vegetation mapping is clarifying the mechanisms of internal integration of the territorial units of vegetation cover. The traditional way of searching for such mechanisms is to study the ecological factors controlling the spatial heterogeneity of vegetation cover. In essence, this is an autecological analysis of vegetation. We propose another way of searching for the mechanisms of territorial integration of vegetation. It is connected with intracoenotic interrelations, in particular with the changing role of the edificator synusium in a community along the altitudinal gradient. This approach is illustrated on a model plot in the subarctic tundra of Central Chukotka. Our further suggestion concerns the way of depicting these mechanisms on a large-scale vegetation map. As a model object we chose the catena, that is, the landscape formation that includes all geomorphic positions of a slope, joined by the process of material moving down the slope. The peneplanation of a mountain system over a long geological time favours the levelling of the lower (accumulative) parts of slopes. The colonization of these parts of the slope by the vegetation variants corresponding to the lowest part of the catena is the result of peneplanation. Vegetation of this part of the catena performs a certain biogeocoenotic work, namely the levelling of small infralandscape limits and of boundaries in the vegetation cover. We call this process continualization on the catena. In this process the variants of vegetation in the lower part of the catena are broken into separate synusiums. This is the process of decumbation of layers described by V. B. Sochava. Up the slope, the edificator power of the shrub synusiums decreases sharply. Moss and herb synusiums have "to seek" habitats similar to those under the shrub canopy. Competition between the synusiums arises, resulting in the arrangement of a certain spatial assemblage of vegetation cover elements. 
In such an assemblage the position of each element is determined by both biotic (interrelation with other coenotic elements) and abiotic (presence of appropriate habitats) factors. Taking into account the biogeocoenotic character of the process of continualization on the catena, we call such a spatial assemblage an evolutionary-biogeocoenotic series. The space within each evolutionary-biogeocoenotic series is divided by ecological barriers into several functional zones. In each of these zones the struggle between synusiums has its own expression and direction. In the start zone of the catena (extensive pediment) the interrelations of synusiums and layers control the mutual spatial arrangement of these elements to the greatest extent. Here, as a rule, edificator synusiums of low shrubs and dwarf shrubs predominate. In the first-order limit zone (the bend from the pediment to the upper part of the slope), single-species herb and moss synusiums, often substituting for each other in similar habitats, prevail. In the zone of active colonization of the slope (denudation slope) the coenotic factor plays the smallest role in the spatial distribution of the vegetation cover elements. In particular, phytocoenotic interactions take place only within separate microcoenoses of herbs, mosses and lichens. In the zone of attenuation of the continualization process (the uppermost parts of the slope, crests) phytocoenotic interactions are almost absent and the spatial distribution of vegetation cover elements depends exclusively on abiotic factors. The principal scheme of the distribution of vegetation cover elements and the disposition of functional zones on the catena are shown in a block diagram (fig. 1).


Modeling Spatiotemporal Population Changes by Integrating DMSP-OLS and NPP-VIIRS Nighttime Light Data in Chongqing, China

Remote Sensing ◽

10.3390/rs13020284 ◽

2021 ◽

Vol 13 (2) ◽

pp. 284

Author(s):

Dan Lu ◽

Yahui Wang ◽

Qingyuan Yang ◽

Kangchuan Su ◽

Haozhe Zhang ◽

...

Keyword(s):

Spatial Distribution ◽

Relative Error ◽

Urban Areas ◽

Large Scale ◽

Population Distribution ◽

Spatial Optimization ◽

Distribution Data ◽

Mountainous Areas ◽

Mean Relative Error ◽

Nighttime Light

The sustained growth of non-farm wages has led to large-scale migration of the rural population to cities in China, especially in mountainous areas. Studying the spatial and temporal pattern of this migration is of great significance for guiding population spatial optimization and the effective supply of public services in mountainous areas. Here, we determined the spatiotemporal evolution of the population in the Chongqing municipality of China from 2000 to 2018 by employing multi-period spatial distribution data, including nighttime light (NTL) data from the Defense Meteorological Satellite Program’s Operational Linescan System (DMSP-OLS) and the Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS). There was a power function relationship between the two datasets at the pixel scale, with a mean relative error of NTL integration of 8.19%, 4.78% less than achieved by a previous study at the provincial scale. The spatial simulations of population distribution achieved a mean relative error of 26.98%, improved the simulation accuracy for the mountainous population by nearly 20%, and confirmed the feasibility of this method in Chongqing. During the study period, Chongqing’s population increased in the west and decreased in the east, and increased in low-altitude areas while decreasing in medium-high altitude areas. Population agglomeration was common in all districts and counties, and the population density of central urban areas and their surrounding areas significantly increased, while that of non-urban areas such as northeast Chongqing significantly decreased.
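A power function relationship and a mean relative error of the kind reported here can be sketched as follows. Fitting by least squares in log-log space is an assumption (the abstract does not describe the fitting procedure), and the function names and sample values are hypothetical.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_lx)
    return a, b

def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, as a percentage."""
    errors = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100 * sum(errors) / len(errors)

# Hypothetical pixel values from two sensors that follow y = 2 * x**0.5 exactly.
sensor_a = [1.0, 2.0, 4.0, 8.0]
sensor_b = [2.0 * v ** 0.5 for v in sensor_a]
a, b = fit_power_law(sensor_a, sensor_b)
predicted = [a * v ** b for v in sensor_a]
print(a, b, mean_relative_error(sensor_b, predicted))
```

The log-log fit requires strictly positive inputs, so zero-valued pixels would have to be excluded or handled separately before fitting.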


A Multi-Class Automatic Sleep Staging Method Based on Photoplethysmography Signals

Entropy ◽

10.3390/e23010116 ◽

2021 ◽

Vol 23 (1) ◽

pp. 116

Author(s):

Xiangfa Zhao ◽

Guobing Sun

Keyword(s):

Time Domain ◽

Single Channel ◽

Kappa Statistic ◽

Gradient Boosting ◽

Sleep Staging ◽

Challenging Problem ◽

Sleep State ◽

Light Gradient ◽

Gradient Boosting Machine ◽

The Time Domain

Automatic sleep staging with only one channel is a challenging problem in sleep-related research. In this paper, a simple and efficient method named PPG-based multi-class automatic sleep staging (PMSS) is proposed using only a photoplethysmography (PPG) signal. Single-channel PPG data were obtained from four categories of subjects in the CAP sleep database. After preprocessing of the PPG data, features were extracted from the time domain, frequency domain, and nonlinear domain, for a total of 21 features. Finally, the Light Gradient Boosting Machine (LightGBM) classifier was used for multi-class sleep staging. The accuracy of the multi-class automatic sleep staging was over 70%, and the Cohen’s kappa statistic k was over 0.6. This showed that the PMSS method can also be applied to stage the sleep state of patients with sleep disorders.
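Cohen's kappa corrects the observed agreement for the agreement expected by chance, which is why it is reported alongside accuracy. A minimal sketch of the standard formula is given below; the stage labels ("W" and "N") and the predictions are hypothetical, not data from the study.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: share of epochs where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[label] * pred_counts[label] for label in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical epochs: "W" = wake, "N" = non-wake.
reference = ["W", "W", "N", "N"]
predicted = ["W", "N", "N", "N"]
print(cohen_kappa(reference, predicted))
```

In this toy case the accuracy is 0.75 but half of that agreement is expected by chance, so kappa is lower than accuracy. The same effect explains why a kappa above 0.6 can accompany an accuracy above 70%.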


A Review of Light Gradient Boosting Machine Method for Hate Speech Classification on Twitter

2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE) ◽

10.1109/icecie50279.2020.9309565 ◽

2020 ◽

Author(s):

Muhammad Hafizh Abdurrahman ◽

Budhi Irawan ◽

Casi Setianingsih

Keyword(s):

Hate Speech ◽

Gradient Boosting ◽

Machine Method ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Speech Classification


Spatial Agglomeration of China’s Forest Products Manufacturing Industry: Measurement, Characteristics and Determinants

Forests ◽

10.3390/f12081006 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1006

Author(s):

Zhenhuan Chen ◽

Hongge Zhu ◽

Wencheng Zhao ◽

Menghan Zhao ◽

Yutong Zhang

Keyword(s):

Spatial Distribution ◽

Spatial Data ◽

Large Scale ◽

Manufacturing Industry ◽

Economies Of Scale ◽

Forest Products ◽

Spatial Data Analysis ◽

Forest Protection ◽

Spatial Agglomeration ◽

Continuous Space

China’s forest products manufacturing industry is experiencing the dual pressure of forest protection policies and wood scarcity and, therefore, it is of great significance to reveal the spatial agglomeration characteristics and evolution drivers of this industry to enhance its sustainable development. Based on the perspective of large-scale agglomeration in a continuous space, in this study, we used the spatial Gini coefficient and standard deviation ellipse method to investigate the spatial agglomeration degree and location distribution characteristics of China’s forest products manufacturing industry, and we used exploratory spatial data analysis to investigate its spatial agglomeration pattern. The results show that: (1) From 1988 to 2018, the degree of spatial agglomeration of China’s forest products manufacturing industry was relatively low, and the industry was characterized by a very pronounced imbalance in its spatial distribution. (2) The industry has a very clear core–periphery structure, the spatial distribution exhibits a “northeast-southwest” pattern, and the barycenter of the industrial distribution has tended to move south. (3) The industry mainly has a high–high and low–low spatial agglomeration pattern. The provinces with high–high agglomeration are few and concentrated in the southeast coastal area. (4) The spatial agglomeration and evolution characteristics of China’s forest products manufacturing industry may be simultaneously affected by forest protection policies, sources of raw materials, international trade and the degree of marketization. In the future, China’s forest products manufacturing industry should further increase the level of spatial agglomeration to fully realize the economies of scale.
