Building Function Mapping Using Multisource Geospatial Big Data: A Case Study in Shenzhen, China

Building function labelling plays an important role in understanding human activities inside buildings. This study develops a method for classifying building function labels using integrated features derived from remote sensing and crowdsensing data with an extreme gradient boosting tree (XGBoost). The classification framework is validated on a dataset from Shenzhen, China. An extended label system for six building types (residential, commercial, office, industrial, public facilities, and others) was applied, and various social functions were considered. The overall classification accuracies were 88.15% (kappa index = 0.72) and 85.56% (kappa index = 0.69). The importance of features was evaluated using the occurrence frequency of features at decision nodes. In the six-category classification system, the basic building attributes (22.99%) and POIs (46.74%) contributed most to the classification process; moreover, the building footprint (7.40%) and distance to roads (11.76%) also made notable contributions. The results show that it is feasible to extract building environments from POI labels and building footprint geometry with a dimensional reduction model using an autoencoder. Additionally, crowdsensing data (e.g., POI and distance to roads) will become increasingly important as classification tasks become more complicated and the importance of basic building attributes declines.
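The abstract above ranks features by how often they occur at decision nodes. A minimal sketch of that counting idea follows, using a hypothetical toy ensemble written as nested dictionaries; the tree structure and the feature names (`poi_density`, `building_height`, `dist_to_road`) are illustrative assumptions, not the study's data or model. In the XGBoost Python package, a comparable split-count measure is available as the `weight` importance type of `Booster.get_score`.

```python
from collections import Counter

# Hypothetical toy ensemble (an assumption for illustration): internal nodes
# name the feature they split on; leaves carry only a "leaf" value.
toy_trees = [
    {"feature": "poi_density",
     "left": {"feature": "building_height",
              "left": {"leaf": 0.2}, "right": {"leaf": -0.1}},
     "right": {"leaf": 0.4}},
    {"feature": "poi_density",
     "left": {"leaf": -0.3},
     "right": {"feature": "dist_to_road",
               "left": {"leaf": 0.1}, "right": {"leaf": 0.5}}},
]


def count_splits(node, counts):
    """Recursively count how often each feature is used at a decision node."""
    if "feature" not in node:  # leaf node: nothing to count
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def split_frequency_importance(trees):
    """Return each feature's share (in percent) of all decision nodes."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


importance = split_frequency_importance(toy_trees)
for name, share in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.2f}%")
```

Because the shares are normalized over all decision nodes, they sum to 100%, which is how percentages of the kind quoted in the abstract can be read.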

Exploring the relationship between 2D/3D landscape pattern and land surface temperature based on explainable eXtreme Gradient Boosting tree: A case study of Shanghai, China

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.138229 ◽

2020 ◽

Vol 725 ◽

pp. 138229 ◽

Cited By ~ 2

Author(s):

Siyi Yu ◽

Zuoqi Chen ◽

Bailang Yu ◽

Lei Wang ◽

Bin Wu ◽

...

Keyword(s):

Surface Temperature ◽

Land Surface Temperature ◽

Landscape Pattern ◽

Land Surface ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

The Relationship

Comparison of Approaches for Urban Functional Zones Classification Based on Multi-Source Geospatial Data: A Case Study in Yuzhong District, Chongqing, China

Sustainability ◽

10.3390/su11030660 ◽

2019 ◽

Vol 11 (3) ◽

pp. 660 ◽

Cited By ~ 6

Author(s):

Kai Cao ◽

Hui Guo ◽

Ye Zhang

Keyword(s):

Rapid Development ◽

Geospatial Data ◽

Classification Model ◽

Gradient Boosting ◽

Support Vector ◽

Night Time ◽

Heat Map ◽

Extreme Gradient Boosting ◽

Functional Zones

Accurate and timely classification and monitoring of urban functional zones are important in rapidly developing cities, both for understanding the real and changing urban functions of cities and for supporting urban planning and management. Many efforts have been undertaken to identify urban functional zones using various classification approaches and multi-source geospatial datasets. The complexity of this category of classification poses considerable challenges for these studies, especially in terms of classification accuracy; conversely, the rapid development of machine learning technologies provides new opportunities. In this study, a set of commonly used urban functional zone classification approaches, including Multinomial Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine (SVM), and Random Forest, are examined and compared with the newly developed eXtreme Gradient Boosting (XGBoost) model, using the case study of Yuzhong District, Chongqing, China. The investigation is based on multi-variate geospatial data, including night-time imagery, geotagged Weibo data, points of interest (POI) from Gaode, and Baidu Heat Map. This study is the first endeavor to implement the XGBoost model in the field of urban functional zone classification. The results suggest that the XGBoost classification model performed the best and was able to achieve an accuracy of 88.05%, which is significantly higher than the other commonly used approaches. In addition, the integration of night-time imagery, geotagged Weibo data, POI from Gaode, and Baidu Heat Map has also demonstrated its value for the classification of urban functional zones in this case study.

Short-term prediction of building energy consumption employing an improved extreme gradient boosting model: A case study of an intake tower

Energy ◽

10.1016/j.energy.2020.117756 ◽

2020 ◽

Vol 203 ◽

pp. 117756 ◽

Cited By ~ 2

Author(s):

Hongfang Lu ◽

Feifei Cheng ◽

Xin Ma ◽

Gang Hu

Keyword(s):

Energy Consumption ◽

Building Energy ◽

Gradient Boosting ◽

Building Energy Consumption ◽

Short Term ◽

Term Prediction ◽

Extreme Gradient Boosting ◽

Short Term Prediction

How to Use Advanced Fleet Management System to Promote Energy Saving in Transportation: A Survey of Drivers’ Awareness of Fuel-Saving Factors

Journal of Advanced Transportation ◽

10.1155/2021/9987101 ◽

2021 ◽

Vol 2021 ◽

pp. 1-19

Author(s):

Changjian Zhang ◽

Jie He ◽

Chunguang Bai ◽

Xintong Yan ◽

Jian Gong ◽

...

Keyword(s):

Engine Speed ◽

Learning Algorithm ◽

Reliability And Validity ◽

Third Party ◽

Fleet Management ◽

Gradient Boosting ◽

Third Party Logistics ◽

Fuel Saving ◽

Extreme Gradient Boosting

Despite the broad application of advanced fleet management systems (FMSs) in third-party logistics (3PL) companies, there is only a limited understanding of how to employ them to enhance transport energy efficiency. In a case study of a Chinese 3PL company, this paper analyzed data obtained from the online FMS to assess drivers’ awareness of fuel-saving factors. A questionnaire was first designed to investigate the drivers’ awareness of fuel-saving factors, based on reliability and validity tests. Then, Extreme Gradient Boosting (XGBoost), a machine learning algorithm, was utilized to explore the intrinsic impacts of various factors on fuel consumption, with the outputs providing the evaluation basis for the individual awareness of the drivers. The results show a significant deviation in the drivers’ awareness of fuel-saving factors: the three indicators of engine speed, idling condition, and rolling without engine load are seriously underestimated, while the indicators related to the environment are seriously overestimated due to social expectations. In addition, the average speed was found to be the most important fuel-saving indicator besides the load. Based on these findings, this paper recommends that 3PL companies choose routes with more freeways when planning, and that mileage be kept within 800 km as far as possible.

Technology Integration and Analysis Using Boosting and Ensemble

Journal of Open Innovation Technology Market and Complexity ◽

10.3390/joitmc7010027 ◽

2021 ◽

Vol 7 (1) ◽

pp. 27

Author(s):

Sunghae Jun

Keyword(s):

Autonomous Driving ◽

Gradient Boosting ◽

Patent Data ◽

Integrated Technology ◽

Technology Analysis ◽

Technological Field ◽

Extreme Gradient Boosting ◽

Real Domain ◽

Technological Integration

Most studies related to technology analysis have focused on one specific technological field, such as autonomous driving or blockchain. Most technologies, however, are related to each other to a greater or lesser degree. Therefore, it is necessary not only to perform technology analysis focusing on one target technology, but also to analyze several integrated technologies at the same time. In this paper, we propose a methodology for integrating technologies and analyzing the integrated technologies. We integrate patent big data for technological integration and use text mining, boosting, and ensemble methods for integrated technology analysis. To evaluate the performance of the proposed method, we search the patent documents related to disaster artificial intelligence (AI) and extended reality (XR). In our case study, we integrate the patent data from disaster AI and XR technologies and analyze the integrated patent data using regression trees, random forest, extreme gradient boosting, and ensemble models. In this way, we illustrate how our proposed method can be applied to the real domain.

Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China

Energy Conversion and Management ◽

10.1016/j.enconman.2018.02.087 ◽

2018 ◽

Vol 164 ◽

pp. 102-111 ◽

Cited By ~ 101

Author(s):

Junliang Fan ◽

Xiukang Wang ◽

Lifeng Wu ◽

Hanmi Zhou ◽

Fucang Zhang ◽

...

Keyword(s):

Support Vector Machine ◽

Solar Radiation ◽

Global Solar Radiation ◽

Gradient Boosting ◽

Support Vector ◽

Temperature And Precipitation ◽

Extreme Gradient Boosting

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
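The trade-off between ppv and sensitivity described in the abstract above can be illustrated with a small sketch. The labels and predicted probabilities below are made-up toy values, not the study's data, and the function name `ppv_and_sensitivity` is a hypothetical helper; a label of 1 is assumed to mark the outcome being predicted.

```python
def ppv_and_sensitivity(y_true, y_prob, cutoff):
    """Compute positive predictive value and sensitivity at a given cut-off.

    A case is predicted positive when its probability is >= cutoff.
    """
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for t, p in pairs if p >= cutoff and t == 1)  # true positives
    fp = sum(1 for t, p in pairs if p >= cutoff and t == 0)  # false positives
    fn = sum(1 for t, p in pairs if p < cutoff and t == 1)   # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Toy data (assumption for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.55, 0.4, 0.6, 0.45, 0.3, 0.2, 0.35, 0.1]

for cutoff in (0.38, 0.5, 0.65):
    ppv, sens = ppv_and_sensitivity(y_true, y_prob, cutoff)
    print(f"cut-off {cutoff:.2f}: ppv = {ppv:.2f}, sensitivity = {sens:.2f}")
```

On this toy data, raising the cut-off makes the positive predictions more reliable (higher ppv) while catching fewer of the true cases (lower sensitivity), which is the same direction of trade-off the abstract reports.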

XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i4.1076 ◽

2020 ◽

pp. 253-262

Author(s):

Mohammad Hamim Zajuli Al Faroby ◽

Mohammad Isa Irawan ◽

Ni Nyoman Tri Puspaningsih

Keyword(s):

Protein Interactions ◽

Interaction Analysis ◽

Synthesis Process ◽

Gradient Boosting ◽

Protein Protein Interactions ◽

Central Function ◽

Extreme Gradient Boosting ◽

Main Protein ◽

The Right ◽

Roc Score

Protein-protein interaction (PPI) analysis can be used to identify proteins that support the function of the main protein, especially in the synthesis process. Insulin is synthesized by proteins that have the same molecular function covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characterization using the centrality method produces features that describe the central function of each protein. Classification results are measured using measurements, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.
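The abstract above uses network centrality as a feature extractor but does not say which centrality measures were used. As a rough sketch under that caveat, the example below computes degree centrality, one common choice, for a hypothetical toy interaction network; the edge list and the protein names `P1` to `P4` are placeholders, not data from the study.

```python
def degree_centrality(edges):
    """Return degree centrality (degree / (n - 1)) for each node.

    `edges` is an iterable of undirected (node_a, node_b) pairs.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        # With fewer than two nodes the measure is undefined; return zeros.
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical toy network (assumption): "INS" stands for insulin,
# P1..P4 are placeholder protein names.
toy_edges = [("INS", "P1"), ("INS", "P2"), ("INS", "P3"),
             ("P1", "P2"), ("P3", "P4")]

features = degree_centrality(toy_edges)
for protein, score in sorted(features.items()):
    print(f"{protein}: {score:.2f}")
```

Each protein's centrality score could then serve as one column of a feature table passed to a classifier; other measures, such as betweenness or closeness, would add further columns in the same way.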

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (containing six layers with ReLU and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained using confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.
