Clustering Algorithms for Direct Current Track Coded Signals

2019 Joint Rail Conference (2019). DOI: 10.1115/jrc2019-1300

Author(s): Song Qin, Nenad Mijatovic, Jeffrey Fries, James Kiss

Keyword(s): Clustering Algorithm, Clustering Algorithms, High Availability, Digital Analysis, Signaling Systems, Track Circuits, Service Conditions, Track Circuit, Fail Safe

Designed to detect train presence on tracks, track circuits must maintain high availability for railway signaling systems. Because these critical devices are fail-safe, any failure results in a declaration of occupancy for a section of track, which restricts train movements. It is possible to automatically diagnose and, in some cases, predict track circuit failures by performing analytics on the track signals. To perform these analytics, we need to study the coded signals transmitted to and received from the track. However, these signals consist of heterogeneous pulses, which makes them noisy for data analysis. Thus, we need techniques that automatically group similar pulses together. In this paper, we present data cleansing techniques that cluster pulses based on digital analysis and machine learning. We report the results of our evaluation of clustering algorithms that improve the quality of the analytic data. The data were captured under revenue service conditions operated by Alstom. We used the k-means algorithm to cluster the heterogeneous pulses. By tailoring the parameters of this algorithm, we can control which pulses fall into each cluster, allowing further analysis of the track circuit signals to gain insight into track circuit performance.
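As a rough illustration of the clustering step described in this abstract, the sketch below runs plain Lloyd's k-means on a handful of pulses. The feature choice (pulse width and peak amplitude), the sample values, and the function name `kmeans` are assumptions made for illustration; the abstract does not say which features or parameters the authors used.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical pulse features: (width in ms, peak amplitude in V).
pulses = [(112.0, 2.1), (110.0, 2.0), (111.0, 2.2),
          (224.0, 1.1), (226.0, 1.0), (225.0, 1.2)]
labels, centroids = kmeans(pulses, k=2)
```

Here the width dominates the distance because the features are not rescaled; in practice the features would likely need normalization, and `k` is the main parameter to tune.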


A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing (2011), pp. 24-32. DOI: 10.47839/ijc.10.1.733. Cited By ~ 1

Author(s): Nicoleta Rogovschi, Mustapha Lebbah, Younès Bennani

Keyword(s): Clustering Algorithm, Clustering Algorithms, Mixed Data, Categorical Variables, Data Sets, Self Organizing Map, Data Set, Public Data, Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show good topological ordering and homogeneous clustering.


Fine-Tuning an Algorithm for Semantic Document Clustering Using a Similarity Graph

International Journal of Semantic Computing (2016), Vol 10 (04), pp. 527-555. DOI: 10.1142/s1793351x16400195

Author(s): Lubomir Stanchev

Keyword(s): English Language, Clustering Algorithm, Clustering Algorithms, Document Clustering, Fine Tuning, Human Judgment, Multiple Parameters, Similarity Graph, Multiple Metrics

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, F-score, entropy, and purity.
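Two of the evaluation metrics named in this abstract, purity and entropy, can be sketched as follows. These are the common textbook definitions and may differ in detail from the ones used in the article; the function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def _group_by_cluster(clusters, classes):
    """Collect the human-assigned class labels found in each cluster."""
    groups = {}
    for c, y in zip(clusters, classes):
        groups.setdefault(c, []).append(y)
    return groups

def purity(clusters, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    groups = _group_by_cluster(clusters, classes)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority / len(classes)

def entropy(clusters, classes):
    """Size-weighted average of the class entropy (in bits) within each cluster."""
    groups = _group_by_cluster(clusters, classes)
    total = 0.0
    for ys in groups.values():
        n = len(ys)
        h = -sum((m / n) * math.log2(m / n) for m in Counter(ys).values())
        total += (n / len(classes)) * h
    return total

# Toy example: six documents, two clusters, hypothetical category labels.
cluster_ids = [0, 0, 0, 1, 1, 1]
categories = ["earn", "earn", "acq", "acq", "acq", "acq"]
p = purity(cluster_ids, categories)    # 5 of 6 documents match their cluster's majority
h = entropy(cluster_ids, categories)   # only cluster 0 is mixed, so only it contributes
```

Higher purity and lower entropy both indicate clusters that agree more closely with the human-assigned categories.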


A New Length-Based Algebraic Multigrid Clustering Algorithm

VLSI Design (2012), Vol 2012, pp. 1-14. DOI: 10.1155/2012/395260

Author(s): L. Rakai, A. Farshidi, L. Behjat, D. Westwick

Keyword(s): Clustering Algorithm, A Priori, Clustering Algorithms, Algebraic Multigrid, Estimation Technique, Wire Length, Clustering Technique, Length Estimation, Made In

Clustering algorithms have been used to improve the speed and quality of placement. Traditionally, clustering focuses on the local connections between cells. In this paper, a new clustering algorithm is proposed that is based on both the estimated lengths of circuit interconnects and the connectivity. In the proposed algorithm, an a priori length estimation technique is first used to estimate the lengths of nets. Then, the estimated lengths are used in a clustering framework to modify a clustering technique based on algebraic multigrid (AMG), which finds the cells with the highest connectivity. Finally, clusters are formed based on the results of the AMG-based process. In addition, a new physical unclustering technique is proposed. The results show a significant improvement: wire-length reductions of up to 40% can be achieved when using the proposed technique with three academic placers on industry-based circuits. Moreover, the runtime is not significantly degraded and can even be improved.


Algorithms Optimization for Intelligent IoV Applications

Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies - Advances in Computational Intelligence and Robotics (2021), pp. 1-25. DOI: 10.4018/978-1-7998-6870-5.ch001

Author(s): Elmustafa Sayed Ali Ahmed, Zahraa Tagelsir Mohammed, Mona Bakri Hassan, Rashid A. Saeed

Keyword(s): Quality Of Service, Internet Of Things, Network Topology, Clustering Algorithm, Clustering Algorithms, The Internet, Internet Of Vehicles, Topology Changes, The Internet Of Things

Internet of vehicles (IoV) has recently become a promising emerging field of research due to the growing number of vehicles. It is a part of the internet of things (IoT) that deals with vehicle communications. Because vehicular nodes are considered to be always in motion, they cause frequent changes in the network topology. These changes raise issues in IoV such as scalability, dynamic topology changes, and shortest-path routing. In this chapter, the authors discuss different optimization algorithms (i.e., clustering algorithms, ant colony optimization, best interface selection [BIS] algorithm, mobility adaptive density connected clustering algorithm, meta-heuristics algorithms, and quality of service [QoS]-based optimization). These algorithms play an important role in optimizing the operation of IoV networks and promise to enable new intelligent IoV applications.


MR-BIRCH: A scalable MapReduce-based birch clustering algorithm

Journal of Intelligent & Fuzzy Systems (2020), pp. 1-11. DOI: 10.3233/jifs-202079

Author(s): Yufeng Li, HaiTian Jiang, Jiyong Lu, Xiaozhong Li, Zhiwei Sun, ...

Keyword(s): Big Data, Real World, Clustering Algorithm, Clustering Algorithms, Statistical Information, Main Memory, Acceptable Result, Clustering Quality, Synthetic Datasets

Many classical clustering algorithms have been adapted to MapReduce, which provides a novel solution for clustering big data. However, most of these algorithms require several iterations to reach an acceptable result. For each iteration, a new MapReduce job must be executed to load the dataset into main memory, which results in high I/O overhead and poor efficiency. The BIRCH algorithm clusters big data by storing only statistical information about the objects, in CF entries and a CF tree. As the number of tree nodes grows, however, main memory becomes insufficient to hold more objects. BIRCH then has to reduce the tree, which degrades the clustering quality and slows overall execution. To address this problem, this paper adapts BIRCH to MapReduce as MR-BIRCH. In contrast to a great number of MapReduce-based algorithms, MR-BIRCH loads the dataset only once, and the dataset is processed in parallel on several machines. The complexity and scalability were analyzed to evaluate the quality of MR-BIRCH, and MR-BIRCH was compared with Python sklearn BIRCH and Apache Mahout k-means on real-world and synthetic datasets. Experimental results show that, most of the time, MR-BIRCH was better than or equal to sklearn BIRCH and was competitive with Mahout k-means.
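The "statistical information" that BIRCH keeps per CF entry is conventionally the triple (N, LS, SS): the point count, the per-dimension linear sum, and the sum of squared norms. The sketch below shows that standard summary and its additivity; the class and method names are hypothetical, and how MR-BIRCH actually combines entries across machines is not stated in the abstract.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    """Clustering feature: a fixed-size summary of a set of points."""
    n: int        # number of points summarized
    ls: tuple     # linear sum of the points, per dimension
    ss: float     # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        # CF entries are additive: merging two summaries is a component-wise sum.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

# Two points summarized without keeping the points themselves.
cf = CF.from_point((1.0, 2.0)).merge(CF.from_point((3.0, 4.0)))
center = cf.centroid()   # (2.0, 3.0)
spread = cf.radius()     # sqrt(2)
```

Because merging only adds three quantities, summaries built on separate data partitions can be combined without revisiting the raw points, which is one plausible reason the structure suits a parallel setting.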


An improved ant algorithm with LDA-based representation for text document clustering

Journal of Information Science (2016), Vol 43 (2), pp. 275-292. DOI: 10.1177/0165551516638784. Cited By ~ 24

Author(s): Aytug Onan, Hasan Bulut, Serdar Korukoglu

Keyword(s): Clustering Algorithm, Latent Dirichlet Allocation, Clustering Algorithms, Document Clustering, Clustering Methods, Initial Value, Text Document, Clustering Quality, Text Features

Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.


Enhanced K-Means Clustering Algorithm Using Collaborative Filtering Approach

Oriental Journal of Computer Science and Technology (2017), Vol 10 (2), pp. 474-479. DOI: 10.13005/ojcst/10.02.31

Author(s): Ankush Saklecha, Jagdish Raikwal

Keyword(s): Collaborative Filtering, Clustering Algorithm, Clustering Algorithms, Clustering Approach, Improving Accuracy, Accuracy Performance, Number Of Iterations, Filtering Approach, Selection Of

Clustering is a well-known unsupervised learning method in which a set of elements is separated into homogeneous groups. K-means is one of the most popular partition-based clustering algorithms. In the original K-means, however, the quality of the resulting clusters depends largely on the selection of the initial centroids; a poor selection increases the number of iterations and the running time, making the algorithm computationally expensive. Many methods have been proposed to improve the accuracy, performance, and efficiency of the k-means clustering algorithm. This paper proposes an enhanced K-means clustering approach combined with a collaborative filtering approach to recommend quality content to users. This research would help users who otherwise have to scroll through pages of results to find important content.


A SEQUENCE-ELEMENT-BASED HIERARCHICAL CLUSTERING ALGORITHM FOR CATEGORICAL SEQUENCE DATA

International Journal of Information Technology & Decision Making (2005), Vol 04 (01), pp. 81-96. DOI: 10.1142/s0219622005001398. Cited By ~ 5

Author(s): SEUNG-JOON OH, JAE-YEARN KIM

Keyword(s): Hierarchical Clustering, Clustering Algorithm, Sequence Data, Clustering Algorithms, Scientific Data, Sequence Element, Hierarchical Clustering Algorithm, Synthetic Datasets, Better Than

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few existing clustering algorithms consider sequentiality. In this paper, we study how to cluster these sequence datasets. We propose a new similarity measure to compute the similarity between two sequences. In the proposed measure, subsets of a sequence are considered, and the more identical subsets there are, the more similar the two sequences. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional clustering algorithms.


Cross Breed Clustering Algorithm for High Dimensional Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue (2019), Vol 9 (1), pp. 5049-5052. DOI: 10.35940/ijitee.a5313.119119

Keyword(s): Machine Learning, Deep Learning, Data Clustering, Clustering Algorithm, Clustering Algorithms, High Dimensional Data, High Dimensional, Growing Domain, Present World

Clustering plays a major role in machine learning and data mining. Deep learning is a fast-growing domain, and the quality of clustering results can be improved by adopting deep learning algorithms. Many clustering algorithms process various datasets to obtain better results. For high-dimensional data, however, obtaining quality clustering results with existing clustering algorithms remains an issue. In this paper, a cross breed clustering algorithm for high-dimensional data is used. Various datasets are used to obtain the results.


Develop a dynamic DBSCAN algorithm for solving initial parameter selection problem of the DBSCAN algorithm

Indonesian Journal of Electrical Engineering and Computer Science (2021), Vol 23 (3), pp. 1602. DOI: 10.11591/ijeecs.v23.i3.pp1602-1610

Author(s): Md. Zakir Hossain, Md. Jakirul Islam, Md. Waliur Rahman Miah, Jahid Hasan Rony, Momotaz Begum

Keyword(s): Clustering Algorithm, Clustering Algorithms, Cluster Radius, Dbscan Algorithm, Clustering Quality, Data Clusters, Minimum Number, The Given, Clustering Problems

The amount of data has been increasing exponentially in every sector, such as banking securities, healthcare, education, manufacturing, consumer-trade, transportation, and energy. Much of this data is noisy, irregular in shape, or contains outliers. In such cases, it is challenging to find the desired data clusters using conventional clustering algorithms. DBSCAN is a popular clustering algorithm that is widely used for noisy, arbitrarily shaped data with outliers. However, its performance depends heavily on the proper selection of the cluster radius (Eps) and the minimum number of points (MinPts) required to form a cluster for the given dataset. In real-world clustering problems, it is difficult to select exact values of Eps and MinPts for clustering unknown datasets. To address this, this paper proposes a dynamic DBSCAN algorithm that calculates suitable values of Eps and MinPts dynamically, thereby increasing the clustering quality for the given problem. This paper evaluates the performance of the dynamic DBSCAN algorithm on seven challenging datasets. The experimental results confirm the effectiveness of the dynamic DBSCAN algorithm compared with well-known clustering algorithms.
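To make the roles of Eps and MinPts concrete, here is a minimal sketch of the standard (non-dynamic) DBSCAN procedure. It is not the dynamic variant proposed in the paper, whose parameter-selection rule the abstract does not describe; the function name, the sample points, and the parameter values are illustrative assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster              # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
        cluster += 1
    return labels

# Two tight groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
good = dbscan(points, eps=1.5, min_pts=3)      # [0, 0, 0, 1, 1, 1, -1]
too_small = dbscan(points, eps=0.5, min_pts=3) # every point is labeled noise
```

The second call shows the sensitivity the abstract refers to: with the same data and MinPts, shrinking Eps from 1.5 to 0.5 turns two clean clusters into pure noise.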
