A Predictive Model of Cost Growth in Construction Projects Using Feature Selection

Author(s):
Negar Tajziyehchi
Mohammad Moshirpour
George Jergeas
Farnaz Sadeghpour

Author(s):
Ricco Rakotomalala
Faouzi Mhamdi

In this chapter, we are interested in classifying proteins from their primary structures. The goal is to automatically assign protein sequences to their families. The main originality of the approach is that we apply the text categorization framework directly to protein classification, with only very minor modifications. The main steps of the task are clearly identified: we extract features from the unstructured dataset using fixed-length n-gram descriptors; we select and combine the most relevant descriptors for the learning phase; and, finally, we select the most promising learning algorithm to produce an accurate predictive model. We obtain two main results. First, the approach is credible, giving accurate results with descriptors of length 2 (2-grams) only. Second, in our context, where many irrelevant descriptors are generated automatically, we must combine aggressive feature selection algorithms with low-variance classifiers such as the SVM (support vector machine).
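The n-gram extraction step can be illustrated with a short sketch. It assumes sequences are plain strings of one-letter amino-acid codes, and it uses a simple document-frequency filter as a placeholder for the selection step; the function names are hypothetical, and this is not the chapter's own selection algorithm. The resulting count vectors could then be passed to a classifier such as a linear SVM (for example, scikit-learn's `LinearSVC`).

```python
from collections import Counter
from typing import Iterable, List


def ngram_counts(sequence: str, n: int = 2) -> Counter:
    """Count overlapping fixed-length n-grams in one amino-acid sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def select_by_document_frequency(sequences: Iterable[str], n: int = 2, k: int = 10) -> List[str]:
    """Keep the k n-grams that occur in the most sequences.

    A simple filter standing in for the feature selection step; ties are
    broken alphabetically so the result is deterministic.
    """
    doc_freq: Counter = Counter()
    for seq in sequences:
        doc_freq.update(set(ngram_counts(seq, n)))  # count each n-gram once per sequence
    ranked = sorted(doc_freq, key=lambda gram: (-doc_freq[gram], gram))
    return ranked[:k]


def vectorize(sequence: str, vocabulary: List[str], n: int = 2) -> List[int]:
    """Map a sequence to a count vector over the selected n-grams."""
    counts = ngram_counts(sequence, n)
    return [counts[gram] for gram in vocabulary]  # Counter yields 0 for absent n-grams
```

For the toy sequences `["MKVL", "MKAA", "AAKV"]` with `k=3`, the selected vocabulary is `["AA", "KV", "MK"]`, and `"MKVL"` maps to `[0, 1, 1]`.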


Information
2019
Vol 10 (6)
pp. 187
Author(s):
Rattanawadee Panthong
Anongnart Srivihok

Liver cancer data typically comprise large, multidimensional datasets. In a dataset with a very large number of features and multiple classes, many features may be irrelevant to pattern classification in machine learning. Feature selection can therefore improve the performance of the classification model and help it achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for a liver cancer classification model. Two classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve classification performance. The IGSFS-CD method achieved an accuracy of 78.36% (sensitivity 0.7841, specificity 0.9159) on LC_dataset-1. On LC_dataset II, it delivered the best performance, with an accuracy of 84.82% (sensitivity 0.8481, specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset can help reduce the complexity of the predictive model.
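The hybrid idea of an information-gain filter followed by sequential forward selection (SFS) can be sketched as below. This is a generic, class-independent sketch under stated assumptions: features are discrete-valued columns, and `score` is a hypothetical callable standing in for a classifier's validation accuracy (for example, a decision tree or naïve Bayes model). The class-dependent variant described in the abstract would repeat the selection separately for each class, which this sketch does not do.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def sequential_forward_selection(
    candidates: Sequence[str], score: Callable[[List[str]], float], max_features: int
) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves score; stop when nothing improves."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        top_score, top_feature = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda t: t[0]
        )
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def ig_then_sfs(
    columns: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    top_k: int,
    score: Callable[[List[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Filter step (rank by information gain) followed by a wrapper step (SFS)."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return sequential_forward_selection(ranked[:top_k], score, max_features)
```

The filter step keeps the wrapper affordable: SFS evaluates the classifier once per remaining candidate at each step, so restricting it to the `top_k` information-gain features bounds the number of model fits.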


Author(s):  
Martin Oloruntobi Dada

Purpose – Using projects executed with both traditional and integrated procurement methods, the study sought to investigate the relationships that exist among project participants and the influence of those relationships on cost growth. The paper aims to discuss these issues.
Design/methodology/approach – Questionnaires were administered for 274 construction projects located in 12 states of Nigeria, including the Federal Capital Territory. Responses were obtained from 96 projects. The data were subjected to both descriptive and inferential analyses.
Findings – In terms of cordiality, relationships between clients and contractors ranked highest, while those within in-house project teams ranked lowest. Cost growth (cost overrun) is significantly correlated with client-contractor, consultant-contractor, client-consultant-contractor and in-house team relationships. No association between procurement method and cost growth was found.
Research limitations/implications – The limited generalizability of the results, owing to the sampling method used, is acknowledged. One implication of the findings is that, in the context of this research, any explanation for cost growth must be sought outside procurement methods.
Practical implications – The findings may help project participants identify variables to consider in anticipating, preventing or managing cost growth in building construction projects, beyond the formalization of contracts and structures.
Originality/value – The research is unique in investigating the association between intangible project team relationships and the tangible variable of cost growth.
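The abstract does not state how cost growth was computed or which correlation test was used, so the following is only a sketch under two assumptions: cost growth is expressed as a percentage of the original contract sum, and relationship ratings are ordinal scores, for which Spearman's rank correlation is a common choice. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cost_growth_percent(contract_sum: float, final_cost: float) -> float:
    """Cost growth relative to the original contract sum (assumed definition)."""
    if contract_sum <= 0:
        raise ValueError("contract_sum must be positive")
    return (final_cost - contract_sum) / contract_sum * 100.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

A significance test would still be needed to call a correlation "significant"; this sketch only computes the coefficient.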


2021
Vol 36
pp. 01014
Author(s):
Fung Yuen Chin
Yong Kheng Goh

Feature selection is the process of selecting a subset of relevant features, removing unnecessary ones, for use in constructing a predictive model. However, high-dimensional data make feature selection more difficult because of the curse of dimensionality. In past research, the performance of a predictive model has always been compared with existing results. When modeling a new dataset, the current practice is to benchmark against the version of the dataset that contains all the features, including redundant features and noise. Here we propose a new optimal baseline for the dataset, obtained by means of features ranked by their mutual information scores. The quality of a dataset depends on the information it contains: the more information the dataset contains, the better the performance of the predictive model. The number of features needed to achieve this new optimal baseline is obtained at the same time and serves as a guideline for the number of features a feature selection method should retain. We also present experimental results showing that the proposed method provides a better baseline, with fewer features, than the existing benchmark that uses all the features.
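One plausible reading of the mutual-information baseline is sketched below: rank features by their mutual information with the label, evaluate a model on each top-k prefix of that ranking, and keep the smallest k that reaches the best score. This interpretation, the discrete-feature assumption, and the hypothetical `evaluate` callable (standing in for a cross-validated model score) are assumptions rather than the authors' stated procedure.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def rank_by_mutual_information(
    columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[str]:
    """Feature names sorted from most to least informative about the labels."""
    return sorted(columns, key=lambda name: mutual_information(columns[name], labels), reverse=True)


def mi_baseline(
    columns: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Best score over top-k prefixes of the MI ranking; ties favour the smaller k."""
    ranked = rank_by_mutual_information(columns, labels)
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict comparison keeps the smallest k on ties
            best_k, best_score = k, score
    return ranked[:best_k], best_score
```

The returned prefix length plays the role of the "number of features needed" mentioned above, which a feature selection method could then be asked to match or beat.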


2017
Vol 2017
pp. 1-18
Author(s):
Andrea Bommert
Jörg Rahnenführer
Michel Lang

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is important not only to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that, for a measure's behaviour in stability assessment, it is most important that the measure contains a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
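The two ingredients named above can be sketched together: a Pearson-correlation stability measure, computed here as the mean pairwise correlation between binary indicator vectors of the feature sets selected on different resamples, and a Pareto front over accuracy, stability and number of features. The exact form of the measure used in the paper may differ; the function names and the `(name, accuracy, stability, n_features)` tuple layout are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all or no features are selected")
    return cov / math.sqrt(var_a * var_b)


def indicator(selected: Set[int], n_features: int) -> List[int]:
    """1 at the index of each selected feature, 0 elsewhere."""
    return [1 if j in selected else 0 for j in range(n_features)]


def pearson_stability(selections: Sequence[Set[int]], n_features: int) -> float:
    """Mean pairwise Pearson correlation between selection indicator vectors."""
    vectors = [indicator(s, n_features) for s in selections]
    pairs = list(combinations(vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


Model = Tuple[str, float, float, int]  # (name, accuracy, stability, n_features)


def pareto_front(models: Sequence[Model]) -> List[Model]:
    """Models not dominated on (higher accuracy, higher stability, fewer features)."""
    def dominates(p: Model, q: Model) -> bool:
        no_worse = p[1] >= q[1] and p[2] >= q[2] and p[3] <= q[3]
        better = p[1] > q[1] or p[2] > q[2] or p[3] < q[3]
        return no_worse and better  # a model never dominates itself

    return [m for m in models if not any(dominates(o, m) for o in models)]
```

Because the correlation centres each indicator vector on its own mean, two random selections of the same size score close to zero rather than rewarding overlap that arises by chance, which is in line with the correction-for-chance property highlighted above.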


2020
Vol 53 (1-2)
pp. 104-118
Author(s):
Songrong Luo
Wenxian Yang
Hongbin Tang

Effective and efficient incipient fault diagnosis is vital to the maintenance and safe operation of large-scale key mechanical systems. Variable predictive model–based class discrimination is a recently developed multiclass discrimination method and has proved to be a promising tool for multi-fault detection. However, vibration signals from dynamic mechanical systems generally follow non-normal distributions, so the original variable predictive model–based class discrimination method may produce inaccurate outcomes. To address this, the present work first introduces an improved variable predictive model–based class discrimination method. In addition, variable predictive model–based class discrimination suffers from computational difficulty when the input features are high-dimensional. Therefore, a novel feature selection method based on similarity-fuzzy entropy is presented to boost the performance of the variable predictive model–based class discrimination classifier. In this method, the ideal feature vectors are optimized to obtain more accurate similarity-fuzzy entropies for the input features, and the feature with the largest similarity-fuzzy entropy value is removed to refine the input feature subset. The optimal input features are then evaluated repeatedly using the improved variable predictive model–based class discrimination classifier until the expected results are achieved. Finally, an incipient multi-fault diagnosis model for a hydraulic piston pump is established and verified experimentally. Comparisons with commonly used methods indicate that the proposed method is more effective and efficient.
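The elimination loop described above (remove the feature with the largest entropy, re-evaluate, repeat until the target is met) can be sketched as follows. The abstract does not give the similarity-fuzzy entropy formula, so this sketch swaps in the De Luca-Termini fuzzy entropy applied to a hypothetical similarity membership `1 - |value - ideal|`, with all values assumed scaled to [0, 1]; `evaluate` is a hypothetical callable standing in for the improved classifier's accuracy.

```python
import math
from typing import Callable, Dict, List, Sequence


def fuzzy_entropy(memberships: Sequence[float]) -> float:
    """De Luca-Termini fuzzy entropy (base 2), averaged over the memberships.

    Equals 1.0 when every membership is 0.5 and 0.0 when all are crisp (0 or 1).
    """
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * log(0) as 0
                total -= p * math.log2(p)
    return total / len(memberships)


def similarity_memberships(values: Sequence[float], ideal: float) -> List[float]:
    """Hypothetical similarity to an ideal feature value; assumes inputs in [0, 1]."""
    return [1.0 - abs(v - ideal) for v in values]


def backward_eliminate(
    features: Dict[str, Sequence[float]],
    ideals: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    target: float,
) -> List[str]:
    """Drop the highest-entropy feature until evaluate(subset) reaches target."""
    subset = list(features)
    while len(subset) > 1 and evaluate(subset) < target:
        worst = max(
            subset,
            key=lambda name: fuzzy_entropy(similarity_memberships(features[name], ideals[name])),
        )
        subset.remove(worst)
    return subset
```

A high entropy here means the feature's values sit ambiguously between matching and not matching its ideal value, which is why such a feature is the first candidate for removal.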

