Monitoring of Urban Black-Odor Water Based on Nemerow Index and Gradient Boosting Decision Tree Regression Using UAV-Borne Hyperspectral Imagery

The formation of black-odor water in urban rivers has a long history. Such water not only seriously damages the image of a city but also readily breeds germs and degrades the urban habitat. The prevention and treatment of urban black-odor water have long been important topics nationwide, and the “Action Plan for Prevention and Control of Water Pollution” issued by the State Council reflects the Chinese government’s close attention to this issue. Treatment and monitoring are inextricably linked, yet there are few studies on the large-scale monitoring of black-odor water, especially studies that use an unmanned aerial vehicle (UAV) to monitor the spatial distribution of urban river pollution efficiently and accurately. Therefore, to overcome the limitations of traditional ground sampling, which can only evaluate river pollution at individual sampling points, this paper applies UAV-borne hyperspectral imagery, with the aim of rapidly capturing the pollution status of the entire river surface. Because retrieving multiple water quality parameters individually leads to cumulative errors, the Nemerow comprehensive pollution index (NCPI) is introduced to characterize the pollution level of urban water. The retrieval results of six regression models, including gradient boosting decision tree regression (GBDTR), were compared in order to find a suitable regression model for retrieving NCPI in the current scenario. In the first study area, GBDTR achieved higher retrieval accuracy than the other regression models on both the training dataset (adjusted R² = 0.978) and the test dataset (adjusted R² = 0.974). Although random forest performs similarly to GBDTR in both training accuracy and image inversion, it is more computationally expensive. Finally, the spatial distribution maps of NCPI and their technical feasibility for monitoring pollution sources were investigated in combination with field observations.
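
As a rough illustration of the index introduced above: the Nemerow index is conventionally built from single-factor indices P_i = C_i / S_i (measured concentration over a standard limit) and combines their mean and maximum as sqrt((mean² + max²) / 2). The sketch below assumes this common form; the parameter names, concentrations, and limits are hypothetical, and the abstract does not say which parameters or standards the paper uses. It also shows the adjusted R² used to report retrieval accuracy.

```python
import math


def single_factor_indices(concentrations, standards):
    """P_i = C_i / S_i: measured concentration divided by its standard limit."""
    return {name: concentrations[name] / standards[name] for name in concentrations}


def nemerow_index(indices):
    """Nemerow comprehensive index (common form): sqrt((mean(P)^2 + max(P)^2) / 2)."""
    values = list(indices.values())
    p_mean = sum(values) / len(values)
    p_max = max(values)
    return math.sqrt((p_mean ** 2 + p_max ** 2) / 2)


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 penalises R^2 for the number of predictors used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical water sample and hypothetical standard limits (mg/L)
conc = {"COD": 30.0, "NH3-N": 2.0, "TP": 0.4}
limits = {"COD": 20.0, "NH3-N": 1.0, "TP": 0.2}
p = single_factor_indices(conc, limits)  # COD 1.5, NH3-N 2.0, TP 2.0
print(round(nemerow_index(p), 3))
```

Squaring the maximum alongside the mean lets a single severely exceeded parameter dominate the result, which is the usual reason for preferring the Nemerow index over a plain average.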

Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework

Journal of Diabetes Research ◽

10.1155/2020/6873891 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Mingyue Xue ◽

Yinxia Su ◽

Chen Li ◽

Shuxia Wang ◽

Hua Yao

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Decision Tree ◽

Type II Diabetes ◽

Large Scale ◽

Systolic Pressure ◽

Gradient Boosting ◽

Significant Feature ◽

Type II ◽

Extreme Gradient Boosting

Background. An estimated 425 million people worldwide have diabetes, which accounts for 12% of the world’s health expenditures, and the number continues to grow, placing a huge burden on healthcare systems, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who had participated in the national physical examination were enrolled in this study. Risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios using logistic regression (LR) on physical measurement and questionnaire variables. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential T2DM cases. The classifier can accurately screen for diabetes risk at an early stage, and the variable importance scores offer clues for preventing diabetes.
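
The reported metrics follow standard definitions. The sketch below computes them from binary confusion-matrix counts and converts a logistic-regression coefficient into an odds ratio; any counts passed in are illustrative, none come from the study. As a quick consistency check, the reported precision and recall reproduce the reported F1 score.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def odds_ratio(coef):
    """For a logistic-regression coefficient beta, exp(beta) is the odds ratio
    associated with a one-unit increase in that predictor."""
    return math.exp(coef)


# Consistency check on the reported XGBoost figures: F1 = 2PR / (P + R)
precision, recall = 0.910, 0.902
print(round(2 * precision * recall / (precision + recall), 3))  # 0.906
```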

Large-Scale Gastric Cancer Susceptibility Gene Identification Based on Gradient Boosting Decision Tree

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.815243 ◽

2022 ◽

Vol 8 ◽

Author(s):

Qing Chen ◽

Ji Zhang ◽

Banghe Bao ◽

Fan Zhang ◽

Jie Zhou

Keyword(s):

Gastric Cancer ◽

Decision Tree ◽

Large Scale ◽

Clinical Symptoms ◽

Cancer Susceptibility ◽

Susceptibility Gene ◽

Interaction Network ◽

Gene Interaction ◽

Gradient Boosting ◽

Gene Interaction Network

The early clinical symptoms of gastric cancer are not obvious, and metastasis may already have occurred by the time of treatment. Poor prognosis is one of the important reasons for the high mortality of gastric cancer. Identifying gastric cancer-related genes is therefore valuable, because such genes can serve as markers for diagnosis and treatment, improving diagnostic precision and guiding personalized treatment. To further reveal the pathogenesis of gastric cancer at the gene level, we proposed a method based on the Gradient Boosting Decision Tree (GBDT) to identify gastric cancer susceptibility genes through a gene interaction network. Starting from the known gastric cancer-related genes, we collected additional genes that interact with them and constructed a gene interaction network. A random walk was used to extract the network associations of each gene, and GBDT was used to identify gastric cancer-related genes. To evaluate the AUC and AUPR of our algorithm, we performed 10-fold cross-validation; GBDT achieved an AUC of 0.89 and an AUPR of 0.81. Compared with four other methods, GBDT performed best.
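
The abstract does not say which random-walk variant was used. A common choice for scoring genes by their network proximity to known disease genes is random walk with restart (RWR); the sketch below assumes RWR with a restart probability of 0.3 on a hypothetical toy network. Scores of this kind could serve as input features for a GBDT classifier, but how the paper builds its feature vectors is not described here.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 on an undirected graph.

    adj   : dict mapping each gene to a list of interacting genes (symmetric)
    seeds : known disease genes; the walk restarts uniformly on them
    Returns a dict of steady-state visiting probabilities (sums to 1).
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if not neighbours:  # isolated gene: keep its mass in place
                nxt[n] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy interaction network; G1 plays the role of a known gene
network = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4"],
}
scores = random_walk_with_restart(network, seeds={"G1"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

Genes close to the seed accumulate more probability mass, so the score decays with network distance from known disease genes.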

Retrieval of Chlorophyll-a Concentrations in the Coastal Waters of the Beibu Gulf in Guangxi Using a Gradient-Boosting Decision Tree Model

Applied Sciences ◽

10.3390/app11177855 ◽

2021 ◽

Vol 11 (17) ◽

pp. 7855

Author(s):

Huanmei Yao ◽

Yi Huang ◽

Yiming Wei ◽

Weiping Zhong ◽

Ke Wen

Keyword(s):

Remote Sensing ◽

Water Quality ◽

Spatial Distribution ◽

Decision Tree ◽

Chlorophyll-a ◽

Coastal Waters ◽

Gradient Boosting ◽

Beibu Gulf ◽

Landsat 8 ◽

Chl-a

Remote sensing of chlorophyll-a (Chl-a) is essential to compensate for the shortcomings of traditional water quality monitoring, strengthen red tide disaster monitoring and early warning, and reduce marine environmental risks. In this study, a machine learning approach, the Gradient-Boosting Decision Tree (GBDT), was employed to develop an algorithm for estimating Chl-a concentrations in the coastal waters of the Beibu Gulf in Guangxi, using Landsat 8 OLI images as the image source in combination with field measurements of Chl-a concentration. The GBDT model with B4, B3 + B4, B3, B1 − B4, B2 + B4, B1 + B4, and B2 − B4 as input features achieved higher accuracy (MAE = 0.998 μg/L, MAPE = 19.413%, and RMSE = 1.626 μg/L) than the physics-based models it was compared with, providing a new method for the remote sensing inversion of water quality parameters. The GBDT model was then used to study the spatial distribution and temporal variation of Chl-a concentrations at the coastal sea surface of the Beibu Gulf of Guangxi from 2013 to 2020. The results showed high concentrations in nearshore waters and low concentrations in offshore waters, as well as seasonal changes in Chl-a concentration (summer > autumn > spring ≈ winter).
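
Several studies in this list rely on GBDT. A minimal, from-scratch sketch of least-squares gradient boosting with depth-1 trees (stumps) is shown below, using the seven band combinations listed above as features and reporting MAE, MAPE, and RMSE. The reflectance and Chl-a values are hypothetical, the hyperparameters (50 stumps, learning rate 0.1) are arbitrary, and the scores are training-set scores; this is not the authors' model or tuning. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used.

```python
import math


def band_features(b1, b2, b3, b4):
    """The seven input features listed in the abstract, from Landsat 8 OLI bands 1-4."""
    return [b4, b3 + b4, b3, b1 - b4, b2 + b4, b1 + b4, b2 - b4]


def fit_stump(X, residuals):
    """Depth-1 regression tree: the (feature, threshold) split with the least squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbdt(X, y, n_estimators=50, learning_rate=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals,
    which are the negative gradient of the halved squared-error loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def gbdt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def regression_metrics(y_true, y_pred):
    """MAE, MAPE (%) and RMSE, the three scores quoted in the abstract."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return mae, mape, rmse


# Hypothetical surface reflectances (B1, B2, B3, B4) and Chl-a values (ug/L)
bands = [(0.05, 0.06, 0.07, 0.02), (0.05, 0.07, 0.08, 0.03), (0.06, 0.08, 0.09, 0.05),
         (0.06, 0.09, 0.10, 0.07), (0.07, 0.10, 0.11, 0.09), (0.07, 0.11, 0.12, 0.11)]
chl = [2.0, 3.1, 4.5, 6.2, 8.0, 9.9]
X = [band_features(*b) for b in bands]
model = fit_gbdt(X, chl)
train_pred = [gbdt_predict(model, row) for row in X]
print(regression_metrics(chl, train_pred))
```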

Handling Missing Data in Large-Scale MODIS AOD Products Using a Two-Step Model

Remote Sensing ◽

10.3390/rs12223786 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3786

Author(s):

Yufeng Chi ◽

Zhifeng Wu ◽

Kuo Liao ◽

Yin Ren

Keyword(s):

Spatial Distribution ◽

Large Scale ◽

Atmospheric Pollutants ◽

Gradient Boosting ◽

Recovery Ratio ◽

Moving Window ◽

Light Gradient ◽

Spatiotemporal Interpolation ◽

Step Model ◽

MODIS AOD

Aerosol optical depth (AOD) is a key parameter that reflects the characteristics of aerosols and greatly helps in predicting the concentration of pollutants in the atmosphere. At present, remote sensing inversion has become an important method for obtaining AOD on a large scale. However, AOD data acquired by satellites are often missing, and recovering them has gradually become a popular research topic. In recent years, a large number of AOD recovery algorithms have been proposed, but many of them are not application-oriented: they focus mainly on the accuracy of AOD recovery and neglect the AOD recovery ratio, so recovery accuracy and recovery ratio cannot be balanced. To solve these problems, a two-step model (TWS) that combines multisource AOD data and AOD spatiotemporal relationships is proposed. We used the light gradient boosting machine (LightGBM) model under the gradient boosting machine (GBM) framework to fit the multisource AOD data and fill in missing AOD across data sources. Spatial interpolation and spatiotemporal interpolation methods are limited by buffer factors. We recovered the missing AOD within a moving window. We applied TWS to recover AOD from the Terra satellite’s 2018 AOD product (MOD AOD). After 3 × 3 moving-window TWS recovery, the MOD AOD was closely related to the AOD of the Aerosol Robotic Network (AERONET) (R = 0.87, RMSE = 0.23), and the MOD AOD missing rate was greatly reduced (from 0.88 to 0.1). In addition, the spatial distribution characteristics of the monthly and annual averages of the recovered MOD AOD were consistent with those of the original MOD AOD. These results show that TWS is reliable. This study provides a new method for the restoration of MOD AOD and is of great significance for studying the spatial distribution of atmospheric pollutants.
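
The moving-window idea can be illustrated in simplified form: each missing pixel is filled with the mean of the valid pixels in its 3 × 3 neighbourhood, and the missing rate is compared before and after. This is only a stand-in for window-based recovery; it omits the LightGBM multisource-fitting step and any spatiotemporal weighting used by the actual TWS model, and the grid values are hypothetical.

```python
def missing_rate(grid):
    """Fraction of cells that are missing (None)."""
    cells = [v for row in grid for v in row]
    return sum(v is None for v in cells) / len(cells)


def fill_moving_window(grid, size=3):
    """Fill each missing cell with the mean of the valid original values in a
    size x size window centred on it; cells with no valid neighbour stay None."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] is not None:
                continue
            window = [grid[a][b]
                      for a in range(max(0, i - half), min(rows, i + half + 1))
                      for b in range(max(0, j - half), min(cols, j + half + 1))
                      if grid[a][b] is not None]
            if window:
                filled[i][j] = sum(window) / len(window)
    return filled


# Hypothetical AOD tile with gaps (None)
aod = [[0.2, 0.4, None],
       [None, None, 0.6],
       [0.3, None, None]]
recovered = fill_moving_window(aod)
print(missing_rate(aod), missing_rate(recovered))
```

Only original values are read when filling, so the result does not depend on the order in which cells are visited.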

Non-Blind Image Deconvolution Based on “Ringing” Removal Using Convolutional Neural Network

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-180 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 181-1-181-7

Author(s):

Takahiro Kudo ◽

Takanori Fujisawa ◽

Takuro Yamaguchi ◽

Masaaki Ikehara

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Architecture ◽

Large Scale ◽

Blind Deconvolution ◽

Training Dataset ◽

Image Deconvolution ◽

Classic Problem ◽

Key Points ◽

Blind Image

Image deconvolution has recently become an important issue. There are two kinds of approaches: non-blind and blind. Non-blind deconvolution is a classic image deblurring problem that assumes the point spread function (PSF) is known and spatially invariant. Recently, convolutional neural networks (CNNs) have been used for non-blind deconvolution. Although CNNs can handle complex variations in unknown images, some conventional CNN-based methods can only handle small PSFs and do not consider the large PSFs encountered in the real world. In this paper, we propose a non-blind deconvolution framework based on a CNN that can remove large-scale ringing from a deblurred image. Our method has three key points. First, our network architecture preserves both large and small features in the image. Second, the training dataset is created so as to preserve details. Third, we extend the images to minimize the effects of large ringing at the image borders. In experiments with three kinds of large PSFs, our method produced high-precision results both quantitatively and qualitatively.
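
Non-blind deblurring models the observed image as the sharp image convolved with the known PSF. FFT-based deconvolution implicitly treats the image as periodic, so the mismatch between opposite borders shows up as ringing, and larger PSFs spread it further into the image. Extending the image with mirrored content before deconvolving and cropping afterwards is a common mitigation. The abstract does not state how the images are extended, so the symmetric padding below is an assumption, and the CNN itself is not shown.

```python
def _mirror_index(i, n):
    """Map an out-of-range index back into [0, n) by mirroring at the edges."""
    if i < 0:
        return -i - 1
    if i >= n:
        return 2 * n - i - 1
    return i


def symmetric_pad(image, pad):
    """Extend a 2-D image (list of rows) by `pad` mirrored pixels on every side,
    repeating the edge pixel (the same convention as numpy.pad(mode="symmetric"))."""
    rows, cols = len(image), len(image[0])
    if pad > rows or pad > cols:
        raise ValueError("pad must not exceed the image size")
    return [[image[_mirror_index(i, rows)][_mirror_index(j, cols)]
             for j in range(-pad, cols + pad)]
            for i in range(-pad, rows + pad)]


def crop(image, pad):
    """Undo symmetric_pad after deconvolution."""
    return [row[pad:len(row) - pad] for row in image[pad:len(image) - pad]]


tile = [[1, 2],
        [3, 4]]
padded = symmetric_pad(tile, 1)
print(padded)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```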

Large-scale mapping of the catenas vegetation in Subarctic tundra

Geobotanical mapping ◽

10.31111/geobotmap/1993.3 ◽

1995 ◽

pp. 3-21

Author(s):

S. S. Kholod

Keyword(s):

Spatial Distribution ◽

Vegetation Cover ◽

Large Scale ◽

Block Diagram ◽

Abiotic Factors ◽

Ecological Factors ◽

Spatial Arrangement ◽

Geological Time ◽

Ecological Barriers ◽

Functional Zones

One of the most difficult tasks in large-scale vegetation mapping is clarifying the mechanisms that internally integrate the territorial units of vegetation cover. The traditional way of searching for such mechanisms is to study the ecological factors controlling the spatial heterogeneity of vegetation cover; in essence, this is an autecological analysis of vegetation. We propose another way of searching for the mechanisms of territorial integration of vegetation, one connected with intracoenotic interrelations, in particular with the changing role of the edificator synusium in a community along the altitudinal gradient. This approach is illustrated on a model plot in the subarctic tundra of Central Chukotka. Our further suggestion concerns how to depict these mechanisms on a large-scale vegetation map. As a model object we chose the catena, that is, the landscape formation that includes all geomorphic positions of a slope, joined by the process of material moving down the slope. Peneplanation of a mountain system over a long geological time favours the levelling of the lower (accumulative) parts of slopes. The result of peneplanation is the colonization of these parts of the slope by the vegetation variants corresponding to the lowest part of the catena. The vegetation of this part of the catena performs a certain biogeocoenotic work: the levelling of small infralandscape limits and of boundaries in the vegetation cover. We call this process continualization on the catena. In this process the vegetation variants in the lower part of the catena break up into separate synusiums. This is the process of decumbation of layers described by V. B. Sochava. Further up the slope, the edificator power of the shrub synusiums sharply decreases, and moss and herb synusiums have "to seek" habitats similar to those under the shrub canopy. Competition arises between the synusiums, resulting in a certain spatial assemblage of vegetation cover elements.
In such an assemblage, the position of each element is determined by both biotic (interrelations with other coenotic elements) and abiotic (presence of appropriate habitats) factors. Taking into account the biogeocoenotic character of continualization on the catena, we call such a spatial assemblage an evolutionary-biogeocoenotic series. The space within each evolutionary-biogeocoenotic series is divided by ecological barriers into functional zones, and in each zone the struggle between synusiums has its own expression and direction. In the start zone of the catena (extensive pediment), the interrelations of synusiums and layers control the mutual spatial arrangement of these elements to the greatest extent; here, as a rule, edificator synusiums of low shrubs and dwarf shrubs predominate. In the first-order limit zone (the bend from the pediment to the upper part of the slope), single-species herb and moss synusiums, which often substitute for each other in similar habitats, prevail. In the zone of active colonization of the slope (denudation slope), the coenotic factor plays the smallest role in the spatial distribution of vegetation cover elements; in particular, phytocoenotic interactions take place only within separate microcoenoses of herbs, mosses and lichens. In the zone where the continualization process attenuates (the uppermost parts of the slope, crests), phytocoenotic interactions are almost absent, and the spatial distribution of vegetation cover elements depends exclusively on abiotic factors. The principal scheme of the distribution of vegetation cover elements and the arrangement of functional zones on the catena is shown in the block diagram (fig. 1).

A gradient-boosting decision-tree approach for firm failure prediction: an empirical model evaluation of Chinese listed companies

The Journal of Risk Model Validation ◽

10.21314/jrmv.2017.170 ◽

2017 ◽

Vol 11 (2) ◽

pp. 43-64 ◽

Cited By ~ 6

Author(s):

Jiaming Liu ◽

Chong Wu

Keyword(s):

Decision Tree ◽

Empirical Model ◽

Model Evaluation ◽

Failure Prediction ◽

Listed Companies ◽

Gradient Boosting ◽

Chinese Listed Companies ◽

Tree Approach

DeepSSPred: A Deep Learning Based Sulfenylation site predictor via a novel n-segmented optimize federated feature encoder

Protein and Peptide Letters ◽

10.2174/0929866527666201202103411 ◽

2020 ◽

Vol 27 ◽

Author(s):

Zaheer Ullah Khan ◽

Dechang Pi

Keyword(s):

Large Scale ◽

Computational Models ◽

Research Work ◽

Training Data ◽

Training Dataset ◽

Validation Dataset ◽

Cytokine Signaling ◽

Minority Class ◽

Independent Dataset ◽

Feature Encoding

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid) is a special kind of post-translational modification of proteins that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed to predict sulfenylated cysteine sites. However, the performance of these models has not been satisfactory because of inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: Our motivation in this study is to establish a strong and novel computational predictor that discriminates sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which features are encoded via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique (SMOTE) is then employed to cope with the severe imbalance between SC sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with rigorous 10-fold jackknife cross-validation for model validation and authentication. Results: The proposed framework, with a strongly discriminative representation of the feature space, a capable machine learning engine, and an unbiased representation of the underlying training data, yielded an excellent model that outperforms all existing established studies. The proposed approach achieves an MCC 6% higher than the best existing method. On an independent dataset, the existing best study failed to provide sufficient details.
Compared with the second-best method, the model obtained increases of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and of 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets compared with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC sites, called DeepSSPred. Empirical simulations with a training dataset and an independent validation dataset have revealed the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through a tuned 2D-CNN classifier. We believe that our research will provide insight for further prediction of S-sulfenylation characteristics and functionalities, and we hope that DeepSSPred will be significantly helpful for the large-scale discrimination of unknown SC sites in particular and for designing new pharmaceutical drugs in general.
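
The two generic ingredients named above, SMOTE oversampling and the Matthews correlation coefficient (MCC), can be sketched in a few lines. This is a simplified SMOTE (uniformly chosen base sample, Euclidean k-nearest minority neighbours) on hypothetical feature vectors; DeepSSPred's n-segmented encoding and 2D-CNN are not reproduced.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical 2-D feature vectors for the minority (SC-site) class
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_synthetic=6, k=2)
print(len(new_samples), mcc(tp=40, tn=45, fp=5, fn=10))
```

Oversampling is normally applied inside each training fold only; applying it before cross-validation lets near-copies of held-out samples leak into training.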

AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting MI proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the data set and then applies principal component analysis (PCA) to extract optimal features that enhance classifier performance. After PCA, the reduced features are partitioned into a training set (70%) and a test set (30%). The training set is used to train four widely used classifiers, namely support vector machine, k-nearest neighbor, logistic regression, and decision tree, and the test set is used to evaluate the models with performance metrics including the confusion matrix, accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: Logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves in Figures 4 and 5 show that logistic regression achieves a good AUC-ROC score of around 70%, compared with the k-NN and decision tree algorithms.
Conclusion: From the results, we infer that the proposed model can serve as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models, and that it can predict the presence of acute myocardial infarction in humans from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
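
The preprocessing described in the Methods (mean imputation followed by a 70/30 split) can be sketched as follows. The PCA and classifier steps are left out, and the column meanings and values are hypothetical rather than taken from the Framingham data.

```python
import random


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]
    return filled, means


def train_test_split(rows, labels, train_fraction=0.7, seed=42):
    """Shuffle indices with a fixed seed, then cut into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(rows)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical records: [age, systolic BP, total cholesterol]; None = missing
records = [[50, 130, None], [60, None, 240], [None, 140, 200], [40, 120, 220]]
filled, column_means = mean_impute(records)
print(column_means)  # [50.0, 130.0, 220.0]
```

The abstract describes imputing before splitting; computing the column means on the training portion only, after the split, avoids leaking information from the test set into training.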

Modeling Spatiotemporal Population Changes by Integrating DMSP-OLS and NPP-VIIRS Nighttime Light Data in Chongqing, China

Remote Sensing ◽

10.3390/rs13020284 ◽

2021 ◽

Vol 13 (2) ◽

pp. 284

Author(s):

Dan Lu ◽

Yahui Wang ◽

Qingyuan Yang ◽

Kangchuan Su ◽

Haozhe Zhang ◽

...

Keyword(s):

Spatial Distribution ◽

Relative Error ◽

Urban Areas ◽

Large Scale ◽

Population Distribution ◽

Spatial Optimization ◽

Distribution Data ◽

Mountainous Areas ◽

Mean Relative Error ◽

Nighttime Light

The sustained growth of non-farm wages has led to large-scale migration of the rural population to cities in China, especially in mountainous areas. Studying the spatiotemporal pattern of this migration is of great significance for guiding the spatial optimization of the population and the effective supply of public services in mountainous areas. Here, we determined the spatiotemporal evolution of the population in the Chongqing municipality of China from 2000 to 2018 using multi-period spatial distribution data, including nighttime light (NTL) data from the Defense Meteorological Satellite Program’s Operational Linescan System (DMSP-OLS) and the Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS). A power function relationship was found between the two datasets at the pixel scale, and the mean relative error of NTL integration was 8.19%, 4.78% lower than that achieved by a previous study at the provincial scale. The spatial simulations of population distribution achieved a mean relative error of 26.98%, improved the simulation accuracy for the mountainous population by nearly 20%, and confirmed the feasibility of this method in Chongqing. During the study period, Chongqing’s population increased in the west and decreased in the east, and it increased in low-altitude areas while decreasing in medium-to-high-altitude areas. Population agglomeration was common in all districts and counties; the population density of the central urban areas and their surroundings increased significantly, while that of non-urban areas such as northeast Chongqing decreased significantly.
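
The pixel-scale power function between the two NTL datasets, and the mean relative error used to assess the results, can be sketched as a log-log least-squares fit. Which sensor is treated as the predictor, the fitting procedure, and the values below are assumptions; the abstract does not give the authors' exact procedure.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on ln(y) = ln(a) + b*ln(x).
    All x and y values must be positive."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, as a percentage."""
    return 100 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical paired pixel values (positive values only)
viirs = [1.0, 2.0, 4.0, 8.0, 16.0]
dmsp = [3.1, 8.4, 24.5, 67.0, 190.0]
a, b = fit_power_law(viirs, dmsp)
pred = [a * v ** b for v in viirs]
print(round(a, 3), round(b, 3), round(mean_relative_error(dmsp, pred), 2))
```

Fitting in log space minimises relative rather than absolute errors, which suits radiance values spanning several orders of magnitude; dark pixels with zero values must be excluded or offset before taking logarithms.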
