SAFETY RISK EVALUATIONS OF DEEP FOUNDATION CONSTRUCTION SCHEMES BASED ON IMBALANCED DATA SETS

Safety risk evaluations of deep foundation construction schemes are important to ensure safety. However, the amount of knowledge on these evaluations is large, and the historical data of deep foundation engineering is imbalanced. Some adverse factors influence the quality and efficiency of evaluations using traditional manual evaluation tools. Machine learning guarantees the quality of imbalanced data classifications. In this study, three strategies are proposed to improve the classification accuracy of imbalanced data sets. First, data set information redundancy is reduced using a binary particle swarm optimization algorithm. Then, a classification algorithm is modified using an Adaboost-enhanced support vector machine classifier. Finally, a new classification evaluation standard, namely, the area under the ROC curve, is adopted to ensure the classifier to be impartial to the minority. A transverse comparison experiment using multiple classification algorithms shows that the proposed integrated classification algorithm can overcome difficulties associated with correctly classifying minority samples in imbalanced data sets. The algorithm can also improve construction safety management evaluations, relieve the pressure from the lack of experienced experts accompanying rapid infrastructure construction, and facilitate knowledge reuse in the field of architecture, engineering, and construction.

Download Full-text

Classification with Local Clustering in Imbalanced Data Sets

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.219-220.151 ◽

2011 ◽

Vol 219-220 ◽

pp. 151-155 ◽

Cited By ~ 2

Author(s):

Hua Ji ◽

Hua Xiang Zhang

Keyword(s):

Data Distribution ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Imbalanced Data Sets ◽

Local Clustering ◽

Rare Class ◽

Novel Method ◽

The Cost

In many real-world domains, learning from imbalanced data sets is always confronted. Since the skewed class distribution brings the challenge for traditional classifiers because of much lower classification accuracy on rare classes, we propose the novel method on classification with local clustering based on the data distribution of the imbalanced data sets to solve this problem. At first, we divide the whole data set into several data groups based on the data distribution. Then we perform local clustering within each group both on the normal class and the disjointed rare class. For rare class, the subsequent over-sampling is employed according to the different rates. At last, we apply support vector machines (SVMS) for classification, by means of the traditional tactic of the cost matrix to enhance the classification accuracies. The experimental results on several UCI data sets show that this method can produces much higher prediction accuracies on the rare class than state-of-art methods.

Download Full-text

Imbalanced Data Classification Using Cost-Sensitive Support Vector Machine Based on Information Entropy

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.1756 ◽

2014 ◽

Vol 989-994 ◽

pp. 1756-1761 ◽

Cited By ~ 3

Author(s):

Wei Duan ◽

Liang Jing ◽

Xiang Yang Lu

Keyword(s):

Support Vector Machine ◽

Information Entropy ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Classification Problems ◽

Data Set ◽

Imbalanced Data Sets ◽

Penalty Factor ◽

Imbalanced Data Classification

As a supervised classification algorithm, Support Vector Machine (SVM) has an excellent ability in solving small samples, nonlinear and high dimensional classification problems. However, SVM is inefficient for imbalanced data sets classification. Therefore, a cost sensitive SVM (CSSVM) should be designed for imbalanced data sets classification. This paper proposes a method which constructed CSSVM based on information entropy, and in this method the information entropies of different classes of data set are used to determine the values of penalty factor of CSSVM.

Download Full-text

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Under the study of Kernel Methods, this paper put forward two improved algorithm which called R-SVM & I-SVDD in order to cope with the imbalanced data sets in closed systems. R-SVM used K-means algorithm clustering space samples while I-SVDD improved the performance of original SVDD by imbalanced sample training. Experiment of two sets of system call data set shows that these two algorithms are more effectively and R-SVM has a lower complexity.

Download Full-text

Affinity and class probability-based fuzzy support vector machine for imbalanced data sets

Neural Networks ◽

10.1016/j.neunet.2019.10.016 ◽

2020 ◽

Vol 122 ◽

pp. 289-307 ◽

Cited By ~ 8

Author(s):

Xinmin Tao ◽

Qing Li ◽

Chao Ren ◽

Wenjie Guo ◽

Qing He ◽

...

Keyword(s):

Support Vector Machine ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Fuzzy Support Vector Machine ◽

Imbalanced Data Sets ◽

Class Probability

Download Full-text

Exemplar-Based Learning Classifier System with Dynamic Matching Range for Imbalanced Data

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2017.p0868 ◽

2017 ◽

Vol 21 (5) ◽

pp. 868-875

Author(s):

Hiroyasu Matsushima ◽

Keiki Takadama ◽

◽

Keyword(s):

Imbalanced Data ◽

Data Sets ◽

Sigmoid Function ◽

Learning Classifier System ◽

Data Set ◽

Imbalanced Data Sets ◽

Dynamic Matching ◽

Learning Classifier ◽

Stable Performance ◽

The Given

In this paper, we propose a method to improve ECS-DMR which enables appropriate output for imbalanced data sets. In order to control generalization of LCS in imbalanced data set, we propose a method of applying imbalance ratio of data set to a sigmoid function, and then, appropriately update the matching range. In comparison with our previous work (ECS-DMR), the proposed method can control the generalization of the appropriate matching range automatically to extract the exemplars that cover the given problem space, wchich consists of imbalanced data set. From the experimental results, it is suggested that the proposed method provides stable performance to imbalanced data set. The effect of the proposed method using the sigmoid function considering the data balance is shown.

Download Full-text

An Instance Selection Algorithm Based on ReliefF

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213019500015 ◽

2019 ◽

Vol 28 (01) ◽

pp. 1950001 ◽

Cited By ~ 2

Author(s):

Zeinab Abbasi ◽

Mohsen Rahmani

Keyword(s):

Missing Values ◽

Imbalanced Data ◽

Jaccard Index ◽

Instance Selection ◽

Data Sets ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Data Set ◽

Imbalanced Data Sets ◽

Numeric Data

Due to the increasing growth of data, many methods are proposed to extract useful data and remove noisy data. Instance selection is one of these methods which selects some instances of a data set and removes others. This paper proposes a new instance selection algorithm based on ReliefF, which is a feature selection algorithm. In the proposed algorithm, based on the Jaccard index, the nearest instances of each class are found for each instance. Then, based on the nearest neighbor’s set, the weight of each instance is calculated. Finally, only instances with more weights are selected. This algorithm can reduce data at a specified rate and have the ability to run parallel on the instances. It can work on a variety of data sets with nominal and numeric data with missing values and is also suitable for working with imbalanced data sets. The proposed algorithm tests on three data sets. Results show that the proposed algorithm can reduce the volume of data, without a significant change in classification accuracy of these datasets.

Download Full-text

Learning from imbalanced data sets with a Min-Max modular support vector machine

Frontiers of Electrical and Electronic Engineering in China ◽

10.1007/s11460-011-0127-1 ◽

2011 ◽

Vol 6 (1) ◽

pp. 56-71 ◽

Cited By ~ 8

Author(s):

Lu Bao-Liang ◽

Wang Xiao-Lin ◽

Yang Yang ◽

Zhao Hai

Keyword(s):

Support Vector Machine ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Imbalanced Data Sets

Download Full-text

Learning Imbalanced Data Sets with a Min-Max Modular Support Vector Machine

2007 International Joint Conference on Neural Networks ◽

10.1109/ijcnn.2007.4371209 ◽

2007 ◽

Cited By ~ 1

Author(s):

Zhi-Fei Ye ◽

Bao-Liang Lu

Keyword(s):

Support Vector Machine ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Imbalanced Data Sets

Download Full-text

Variant of Data Particle Geometrical Divide for Imbalanced Data Sets Classification by the Example of Occupancy Detection

Applied Sciences ◽

10.3390/app11114970 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4970

Author(s):

Łukasz Rybak ◽

Janusz Dudczyk

Keyword(s):

Learning Algorithm ◽

Imbalanced Data ◽

Evaluation Process ◽

Unbalanced Data ◽

Data Sets ◽

Classification Problems ◽

Mass Model ◽

Data Set ◽

Imbalanced Data Sets ◽

Occupancy Detection

The history of gravitational classification started in 1977. Over the years, the gravitational approaches have reached many extensions, which were adapted into different classification problems. This article is the next stage of the research concerning the algorithms of creating data particles by their geometrical divide. In the previous analyses it was established that the Geometrical Divide (GD) method outperforms the algorithm creating the data particles based on classes by a compound of 1 ÷ 1 cardinality. This occurs in the process of balanced data sets classification, in which class centroids are close to each other and the groups of objects, described by different labels, overlap. The purpose of the article was to examine the efficiency of the Geometrical Divide method in the unbalanced data sets classification, by the example of real case-occupancy detecting. In addition, in the paper, the concept of the Unequal Geometrical Divide (UGD) was developed. The evaluation of approaches was conducted on 26 unbalanced data sets-16 with the features of Moons and Circles data sets and 10 created based on real occupancy data set. In the experiment, the GD method and its unbalanced variant (UGD) as well as the 1CT1P approach, were compared. Each method was combined with three data particle mass determination algorithms-n-Mass Model (n-MM), Stochastic Learning Algorithm (SLA) and Bath-update Algorithm (BLA). k-fold cross validation method, precision, recall, F-measure, and number of used data particles were applied in the evaluation process. Obtained results showed that the methods based on geometrical divide outperform the 1CT1P approach in the imbalanced data sets classification. The article’s conclusion describes the observations and indicates the potential directions of further research and development of methods, which concern creating the data particle through its geometrical divide.

Download Full-text

Boosting Support Vector Machines for Imbalanced Data Sets

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽

10.1007/978-3-540-68123-6_4 ◽

2008 ◽

pp. 38-47 ◽

Cited By ~ 22

Author(s):

Benjamin X. Wang ◽

Nathalie Japkowicz

Keyword(s):

Support Vector Machines ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Imbalanced Data Sets ◽

Vector Machines

Download Full-text