Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap‐based optimism correction methods

This study aims to develop a new approach based on machine learning techniques to assess sustainability performance. Two main dimensions of sustainability, ecological sustainability and human sustainability, were considered. A set of sustainability indicators was used, and the research method was developed using cluster analysis and prediction learning techniques. A Self-Organizing Map (SOM) was applied for data clustering, while Classification and Regression Trees (CART) were applied to assess sustainability performance. The proposed method was evaluated on the Sustainability Assessment by Fuzzy Evaluation (SAFE) dataset, which comprises various indicators of sustainability performance in 128 countries. The SOM clustering technique found eight clusters in the data, and a CART prediction model was built for each cluster. In addition, an ensemble of CART models was constructed in each SOM cluster to increase prediction accuracy. All prediction models were assessed using the adjusted coefficient of determination. The results demonstrated that prediction accuracy was high in all CART models, and that the method combining CART ensembles with clustering provides higher prediction accuracy than individual CART models. The main advantage of the proposed method is its ability to automate the derivation of decision rules from big data for prediction models. The method could be implemented as an effective tool for sustainability performance assessment.
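The abstract names the adjusted coefficient of determination as its assessment measure but does not spell it out. As a minimal sketch, assuming the usual textbook definition (the function names and the `n_predictors` parameter are illustrative and not taken from the paper), it can be computed as follows:

```python
def r_squared(y_true, y_pred):
    """Ordinary coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_predictors):
    """Adjusted R^2, which penalizes R^2 for the number of predictors.

    n_predictors is the number of explanatory variables in the model
    (an assumed input; the abstract does not report it).
    """
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

Because the penalty grows with the number of predictors, the adjusted value is never larger than the plain R² for the same predictions, which makes it a more conservative basis for comparing models of different size.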

Assessment for Thermal Conductivity of Frozen Soil Based on Nonlinear Regression and Support Vector Regression Methods

Advances in Civil Engineering ◽

10.1155/2020/8898126 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Fu-Qing Cui ◽

Wei Zhang ◽

Zhi-Yun Liu ◽

Wei Wang ◽

Jian-bing Chen ◽

...

Keyword(s):

Thermal Conductivity ◽

Support Vector Regression ◽

Nonlinear Regression ◽

Prediction Accuracy ◽

Prediction Models ◽

Frozen Soil ◽

Support Vector ◽

Soil Thermal Conductivity ◽

Fine Grained ◽

Fine Grained Soil

A comprehensive understanding of how soil thermal conductivity varies is a prerequisite for the design and construction of engineering works in permafrost regions. Compared with unfrozen soil, specimen preparation and experimental procedures for testing the thermal conductivity of frozen soil are more complex and challenging. In this work, on the basis that unfrozen soil thermal conductivity reflects the essentially multiphase and porous structure of the soil, prediction models of frozen soil thermal conductivity using nonlinear regression and Support Vector Regression (SVR) methods have been developed. The thermal conductivity of multiple types of soil samples from the Qinghai-Tibet Engineering Corridor (QTEC) was tested by the transient plane source (TPS) method. Correlations between the thermal conductivity of unfrozen and frozen soil have been analyzed and identified. Based on the measured unfrozen soil thermal conductivity, prediction models of frozen soil thermal conductivity for 7 typical soils in the QTEC are proposed. To further facilitate engineering applications, prediction models for two soil categories (coarse-grained and fine-grained soil) have also been proposed. The results demonstrate that, compared with the unsatisfactory prediction accuracy obtained when using water content and dry density as the fitting parameters, the ternary fitting model has a higher thermal conductivity prediction accuracy for the 7 types of frozen soil (more than 98% of the soil specimens have a relative error within 20%). The SVR model further improves the prediction accuracy: more than 98% of the soil specimens have a relative error within 15%. For the coarse-grained and fine-grained soil categories, the two models still have reliable prediction accuracy, with the coefficient of determination (R²) ranging from 0.8 to 0.91, which validates their applicability to small-sample soils. This study provides feasible prediction models for frozen soil thermal conductivity and guidelines for the thermal design and freeze-thaw damage prevention of engineering structures in cold regions.
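The accuracy figures in this abstract are stated as the share of specimens whose relative error falls within a bound (20% or 15%). A minimal sketch of that metric follows; the function name and the sample values in the usage note are hypothetical, and the relative error is assumed to be |predicted - measured| / |measured|, which the abstract does not define explicitly.

```python
def fraction_within_relative_error(measured, predicted, threshold):
    """Share of specimens whose relative error is at most `threshold`.

    Relative error is taken as |predicted - measured| / |measured|
    (an assumed definition). `threshold` is a fraction, e.g. 0.20 for 20%.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    hits = sum(
        1
        for m, p in zip(measured, predicted)
        if abs(p - m) / abs(m) <= threshold
    )
    return hits / len(measured)
```

For example, with made-up conductivities `[1.0, 2.0, 4.0, 5.0]` and predictions `[1.1, 2.5, 4.4, 5.0]`, three of the four specimens fall within a 20% bound, so the function returns 0.75.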

Research on Non-Landslide Selection Method for Landslide Hazard Mapping

10.21203/rs.3.rs-270737/v1 ◽

2021 ◽

Author(s):

Xia Li ◽

Jiulong Cheng ◽

Dehao Yu ◽

Yangchun Han

Keyword(s):

Prediction Accuracy ◽

Prediction Models ◽

Selection Method ◽

Landslide Hazard ◽

River Channel ◽

Information Value ◽

Hazard Mapping ◽

Landslide Prediction ◽

Conditioning Factors ◽

Landslide Hazard Mapping

Most landslide prediction models need to select non-landslide samples. At present, non-landslides are mainly selected by subjective inference or random selection, which makes it easy to pick non-landslides in high-risk areas. To solve this problem and improve the accuracy of landslide prediction, this study proposes selecting non-landslides by Information Value (IV). First, 230 historical landslides and 10 landslide conditioning factors are extracted and interpreted using Remote Sensing (RS) imagery, a Geographic Information System (GIS), and field survey. Second, the random, buffer, river channel or slope, and IV methods are used to obtain non-landslides, and the obtained non-landslides are applied to the popular SVM model for landslide hazard mapping (LHM) in the western area of Tumen City. The landslide hazard map based on the river channel or slope method is seriously inconsistent with the actual situation of the study area. Therefore, only the random, buffer, and IV methods are verified and compared by accuracy, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The results show that the landslide prediction accuracy of all three methods exceeds 80%, with the IV method the highest. In addition, IV identifies the very high hazard regions within a smaller area. Therefore, it is more reasonable to use IV to select non-landslides, and the IV method is more practical for landslide prevention and engineering construction. The results may provide decision makers and planners with basic information on landslide hazard.
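The abstract does not give the formula behind the Information Value method. The sketch below assumes the commonly used form of the information value model, in which the value of a conditioning-factor class is the natural log of the ratio between its share of landslides and its share of the study area; the function and parameter names are illustrative, and the paper's exact formulation may differ.

```python
import math


def information_value(landslides_in_class, total_landslides,
                      class_area, total_area):
    """Information value of one conditioning-factor class.

    Assumed form: ln((N_i / N) / (S_i / S)), where N_i and N are the
    landslide counts in the class and in the whole study area, and
    S_i and S are the corresponding areas.
    """
    if total_landslides <= 0 or class_area <= 0 or total_area <= 0:
        raise ValueError("totals and class area must be positive")
    if landslides_in_class == 0:
        # No landslides observed in this class: the log ratio diverges.
        return float("-inf")
    landslide_share = landslides_in_class / total_landslides
    area_share = class_area / total_area
    return math.log(landslide_share / area_share)
```

Under this form, a positive value marks a class where landslides are over-represented relative to its area, and a negative value marks one where they are under-represented. One plausible reading of the abstract, not confirmed by it, is that non-landslide samples are drawn from low-value classes.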

Predicting Corporate Financial Sustainability Using Novel Business Analytics

Sustainability ◽

10.3390/su11010064 ◽

2018 ◽

Vol 11 (1) ◽

pp. 64 ◽

Cited By ~ 5

Author(s):

Kyoung-jae Kim ◽

Kichun Lee ◽

Hyunchul Ahn

Keyword(s):

Financial Distress ◽

Prediction Accuracy ◽

Prediction Models ◽

Support Vector ◽

Model Parameters ◽

Financial Sustainability ◽

Business Analytics ◽

Financial Distress Prediction ◽

Proposed Model ◽

Distress Prediction

Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for their risk management. As a result, building an effective corporate financial distress prediction model has long been an important research topic. Recently, researchers have sought to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little possibility of overfitting if model parameters are properly tuned. Nonetheless, SVMs generally show high prediction accuracy since they can deal with complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors, such as the parameters of a kernel function and the subsets of appropriate features and instances, are determined by heuristics. In this study, we propose globally optimized SVMs, denoted by GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters simultaneously. The study uses a genetic algorithm (GA) to optimize these multiple heterogeneous design factors of SVMs at the same time. We apply the proposed model to a real-world case of predicting financial distress. Experiments show that the proposed model significantly improves on the prediction accuracy of conventional SVMs.

Prediction of Eye, Hair and Skin Color in Admixed Populations of Latin America

10.1101/2020.12.09.415901 ◽

2020 ◽

Author(s):

Sagnik Palmal ◽

Kaustubh Adhikari ◽

Javier Mendoza-Revilla ◽

Macarena Fuentes-Guajardo ◽

Caio C. Silva de Cerqueira ◽

...

Keyword(s):

Native American ◽

Skin Color ◽

Prediction Accuracy ◽

Prediction Models ◽

Skin Pigmentation ◽

Genetic Ancestry ◽

Learning Approaches ◽

Latin Americans ◽

Eye Color ◽

The Impact

We report an evaluation of prediction accuracy for eye, hair and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) the method of prediction, including classical statistical methods and machine learning approaches; (ii) the inclusion of non-genetic predictors, continental genetic ancestry and pigmentation SNPs in the prediction models; and (iii) the choice between two sets of pigmentation SNPs: the commonly used HIrisPlex-S set (developed in Europeans) and novel SNP sets we defined here based on genome-wide association results in the CANDELA sample. We find that Random Forest or regression are globally the best performing methods. Although continental genetic ancestry has substantial power for prediction of pigmentation in Latin Americans, the inclusion of pigmentation SNPs increases prediction accuracy considerably, particularly for skin color. For hair and eye color, HIrisPlex-S has a similar performance to the CANDELA-specific prediction SNP sets. However, for skin pigmentation the performance of HIrisPlex-S is markedly lower than that of the SNP set defined here, including for predictions in an independent dataset of Native American data. These results reflect the relatively high variation in hair and eye color among Europeans, for whom HIrisPlex-S was developed, whereas their variation in skin pigmentation is comparatively lower. Furthermore, we show that the dataset used in the training of prediction models strongly affects the portability of these models across Europeans and Native Americans.

A Robust UWSN Handover Prediction System Using Ensemble Learning

Sensors ◽

10.3390/s21175777 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5777

Author(s):

Esraa Eldesouky ◽

Mahmoud Bekhit ◽

Ahmed Fathalla ◽

Ahmad Salah ◽

Ahmed Ali

Keyword(s):

Decision Tree ◽

Ensemble Learning ◽

Prediction Accuracy ◽

Performance Metrics ◽

Prediction Models ◽

Sensor Nodes ◽

Wireless Sensor ◽

Water Current ◽

Gradient Boosting ◽

Marine Data

The use of underwater wireless sensor networks (UWSNs) for collaborative monitoring and marine data collection tasks is rapidly increasing. One of the major challenges associated with building these networks is handover prediction, because the mobility model of the sensor nodes is different from that of ground-based wireless sensor network (WSN) devices. Therefore, handover prediction is the focus of the present work. There have been limited efforts to address the handover prediction problem in UWSNs or to use ensemble learning for handover prediction in UWSNs. Hence, we propose the simulation of the sensor node mobility using real marine data collected by the Korea Hydrographic and Oceanographic Agency. These data include the water current speed and direction between data. The proposed simulation consists of a large number of sensor nodes and base stations in a UWSN. Next, we collected the handover events from the simulation, which were utilized as a dataset for the handover prediction task. Finally, we utilized four machine learning prediction algorithms (i.e., gradient boosting, decision tree (DT), Gaussian naive Bayes (GNB), and K-nearest neighbor (KNN)) to predict handover events based on historically collected handover events. The obtained prediction accuracy rates were above 95%. The best prediction accuracy rate achieved by the state-of-the-art method was 56% for any UWSN. Moreover, when the proposed models were evaluated on performance metrics, the measured evaluation scores emphasized the high quality of the proposed prediction models. While the ensemble learning model outperformed the GNB and KNN models, the performance of the ensemble learning and decision tree models was almost identical.
