Building Function Mapping Using Multisource Geospatial Big Data: A Case Study in Shenzhen, China

2021 ◽  
Vol 13 (23) ◽  
pp. 4751
Author(s):  
Jionghua Wang ◽  
Haowen Luo ◽  
Wenyu Li ◽  
Bo Huang

Building function labelling plays an important role in understanding human activities inside buildings. This study develops a method of function label classification using integrated features derived from remote sensing and crowdsensing data with an extreme gradient boosting tree (XGBoost). The classification framework is verified based on a dataset from Shenzhen, China. An extended label system for six building types (residential, commercial, office, industrial, public facilities, and others) was applied, and various social functions were considered. The overall classification accuracies were 88.15% (kappa index = 0.72) and 85.56% (kappa index = 0.69). The importance of features was evaluated using the occurrence frequency of features at decision nodes. In the six-category classification system, the basic building attributes (22.99%) and POIs (46.74%) contributed most to the classification process; moreover, the building footprint (7.40%) and distance to roads (11.76%) also made notable contributions. The result shows that it is feasible to extract building environments from POI labels and building footprint geometry with a dimensional reduction model using an autoencoder. Additionally, crowdsensing data (e.g., POI and distance to roads) will become increasingly important as classification tasks become more complicated and the importance of basic building attributes declines.
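The feature-importance measure described above (how often a feature appears at decision nodes) can be illustrated with a minimal sketch. It assumes a hypothetical tree representation (nested dictionaries with `feature`, `left`, and `right` keys) and made-up feature names; it is not the authors' code. In the XGBoost library this kind of count corresponds, to our understanding, to the `weight` importance type.

```python
from collections import Counter

def split_frequency_importance(trees):
    """Return each feature's share (in %) of all split nodes across the trees."""
    counts = Counter()

    def visit(node):
        # Leaves carry no "feature" key; only internal split nodes are counted.
        if node is None or "feature" not in node:
            return
        counts[node["feature"]] += 1
        visit(node.get("left"))
        visit(node.get("right"))

    for tree in trees:
        visit(tree)

    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}

# Hypothetical toy ensemble: two small trees over invented feature names.
toy_trees = [
    {"feature": "poi_density",
     "left": {"feature": "building_height",
              "left": {"leaf": 0}, "right": {"leaf": 1}},
     "right": {"leaf": 2}},
    {"feature": "poi_density",
     "left": {"leaf": 0},
     "right": {"feature": "road_distance",
               "left": {"leaf": 1}, "right": {"leaf": 2}}},
]

importance = split_frequency_importance(toy_trees)
```

Summing the percentages of all features in one group (for example, all POI-derived features) would give a group-level contribution comparable in form to the figures quoted in the abstract.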

2019 ◽  
Vol 11 (3) ◽  
pp. 660 ◽  
Author(s):  
Kai Cao ◽  
Hui Guo ◽  
Ye Zhang

Accurate and timely classification and monitoring of urban functional zones are important in rapidly developing cities, both to better understand the real and varying urban functions of cities and to support urban planning and management. Many efforts have been undertaken to identify urban functional zones using various classification approaches and multi-source geospatial datasets. The complexity of this category of classification poses tremendous challenges to these studies, especially in terms of classification accuracy; conversely, the rapid development of machine learning technologies provides new opportunities. In this study, a set of commonly used urban functional zone classification approaches, including Multinomial Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine (SVM), and Random Forest, are examined and compared with the newly developed eXtreme Gradient Boosting (XGBoost) model, using the case study of Yuzhong District, Chongqing, China. The investigation is based on multi-variate geospatial data, including night-time imagery, geotagged Weibo data, points of interest (POI) from Gaode, and Baidu Heat Map. This study is the first endeavor to implement the XGBoost model in the field of urban functional zone classification. The results suggest that the XGBoost classification model performed the best and was able to achieve an accuracy of 88.05%, which is significantly higher than that of the other commonly used approaches. In addition, the integration of night-time imagery, geotagged Weibo data, POI from Gaode, and Baidu Heat Map has also demonstrated its value for the classification of urban functional zones in this case study.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Changjian Zhang ◽  
Jie He ◽  
Chunguang Bai ◽  
Xintong Yan ◽  
Jian Gong ◽  
...  

Despite the broad application of advanced fleet management systems (FMSs) in third-party logistics (3PL) companies, there is only a limited understanding of how to employ them to enhance transport energy efficiency. In a case study of a Chinese 3PL company, this paper analyzed data obtained from the online FMS to assess drivers’ awareness of fuel-saving factors. A questionnaire was first designed, based on reliability and validity tests, to investigate the drivers’ awareness of fuel-saving factors. Then, Extreme Gradient Boosting (XGBoost), a machine learning algorithm, was utilized to explore the intrinsic impacts of various factors on fuel consumption, with the outputs providing the evaluation basis for the individual awareness of the drivers. The results show a significant deviation in the drivers’ awareness of fuel-saving factors: the three indicators of engine speed, idling condition, and rolling without engine load are seriously underestimated, while the indicators related to the environment are seriously overestimated due to social expectations. In addition, the average speed was found to be the most important fuel-saving indicator besides the load. Based on these findings, this paper recommends that 3PL companies choose routes with more freeways when planning and keep the mileage within 800 km as far as possible.


Author(s):  
Sunghae Jun

Most studies related to technology analysis have focused on one specific technological field, such as autonomous driving or blockchain. However, most technologies are related to each other to a greater or lesser degree. Therefore, it is necessary not only to perform technology analysis focusing on one target technology, but also to analyze several integrated technologies at the same time. In this paper, we propose a methodology for integrating technologies and analyzing the integrated technologies. We integrate patent big data for technological integration and use text mining, boosting, and ensemble methods for integrated technology analysis. To evaluate the performance of the proposed method, we search the patent documents related to disaster artificial intelligence (AI) and extended reality (XR). In our case study, we integrate the patent data from disaster AI and XR technologies and analyze the integrated patent data using regression trees, random forest, extreme gradient boosting, and ensemble models. In this way, we illustrate how our proposed method can be applied to a real domain.


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
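The trade-off between ppv and sensitivity at different cut-off probabilities can be made concrete with a small sketch. The labels and predicted probabilities below are invented for illustration and are not the study's data; the function name `ppv_and_sensitivity` is likewise hypothetical.

```python
def ppv_and_sensitivity(y_true, y_prob, cutoff):
    """Compute (ppv, sensitivity) when predicting positive for prob >= cutoff."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= cutoff and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < cutoff and y == 1)
    # Guard against empty denominators (no predicted or no actual positives).
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity

# Invented toy data: 1 = positive outcome class, 0 = negative outcome class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.55, 0.45]

default_cut = ppv_and_sensitivity(y_true, y_prob, 0.5)
strict_cut = ppv_and_sensitivity(y_true, y_prob, 0.75)
lenient_cut = ppv_and_sensitivity(y_true, y_prob, 0.35)
```

On this toy data the stricter cut-off raises ppv while lowering sensitivity, and the more lenient cut-off does the opposite, which mirrors the direction of the trade-off reported in the abstract.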


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein Interaction Analysis (PPI) can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function but play different, mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characterization using the centrality method produces  features as a central function of protein. Classification results are measured using measurements, precision, recall, and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.
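As an illustration of using centrality as a feature extractor, the sketch below computes degree centrality on a toy interaction graph. The abstract does not state which centrality measures were used, so degree centrality is an assumption here, and the edge list and node names are invented.

```python
def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected graph."""
    neighbors = {}
    for a, b in edges:
        # Store neighbors in sets so duplicate edges are not double counted.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Invented toy network: "INS" stands in for insulin, A/B/C are placeholders.
toy_edges = [("INS", "A"), ("INS", "B"), ("INS", "C"), ("A", "B")]

centrality = degree_centrality(toy_edges)
```

Each protein's centrality value could then serve as one input column for a classifier; other measures (betweenness, closeness, and so on) would add further columns in the same way.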


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
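Why class imbalance matters when reading an overall accuracy figure can be shown with a short sketch. The confusion matrix below is invented (rows are true classes, columns are predicted classes) and does not come from the study.

```python
def overall_accuracy(cm):
    """Fraction of all samples that lie on the confusion-matrix diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def per_class_recall(cm):
    """Recall for each true class: diagonal count divided by the row total."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(cm)]

# Invented two-class example: 180 majority-class and 20 minority-class samples.
toy_cm = [
    [180, 0],   # true majority class: all predicted correctly
    [10, 10],   # true minority class: only half predicted correctly
]

accuracy = overall_accuracy(toy_cm)
recalls = per_class_recall(toy_cm)
```

Here the overall accuracy is high even though half of the minority class is missed, so per-class measures are worth reporting alongside overall accuracy when classes are unevenly sized.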


Author(s):  
Irfan Ullah Khan ◽  
Nida Aslam ◽  
Malak Aljabri ◽  
Sumayh S. Aljameel ◽  
Mariam Moataz Aly Kamaleldin ◽  
...  

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (containing six layers with ReLU and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained using confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.

