Clustering Algorithms for Direct Current Track Coded Signals

Author(s):  
Song Qin ◽  
Nenad Mijatovic ◽  
Jeffrey Fries ◽  
James Kiss

Track circuits detect train presence on tracks and must therefore maintain high availability for railway signaling systems. Because these critical devices are fail-safe, any failure results in a declaration of occupancy in a section of track, which restricts train movements. It is possible to automatically diagnose and, in some cases, predict track circuit failures by performing analytics on the track signals. To perform these analytics, we need to study the coded signals transmitted to and received from the track. However, these signals consist of heterogeneous pulses, which makes the raw data noisy for analysis. Thus, we need techniques that automatically group similar pulses into homogeneous clusters. In this paper, we present data cleansing techniques that cluster pulses based on digital analysis and machine learning. We report the results of our evaluation of clustering algorithms that improve the quality of the analytic data. The data were captured under revenue service conditions operated by Alstom. We used the k-means algorithm to cluster the heterogeneous pulses. By tailoring the parameters of this algorithm, we can control which pulses fall into each cluster, allowing further analysis of the track circuit signals to gain insight into track circuit performance.
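To make the k-means step concrete, the following is a minimal sketch of Lloyd's k-means applied to pulses described by two features. The feature choice (peak amplitude and pulse width), the sample values, and the function name `kmeans` are assumptions made for illustration; the abstract does not state which pulse features or parameters the authors used.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical pulse features: (peak amplitude in volts, width in milliseconds).
pulses = [(1.0, 100.0), (1.1, 102.0), (0.9, 98.0),
          (2.0, 250.0), (2.1, 255.0), (1.9, 245.0)]
labels, centroids = kmeans(pulses, k=2)
```

In this toy data the width feature dominates the distance because it has the larger numeric range. With real signals, features would normally be scaled first so that each one contributes comparably.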

2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized clustering of the data. The higher the weight of a variable, the more the clustering algorithm takes into account the information conveyed by that variable. The learning of these topological maps is combined with a weighting process over the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.


2016 ◽  
Vol 10 (04) ◽  
pp. 527-555
Author(s):  
Lubomir Stanchev

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents, using one similarity metric that is based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, f-score, entropy, and purity.
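Two of the evaluation metrics named above, purity and entropy, can be sketched as follows. This uses one common set of definitions (size-weighted majority fraction for purity, size-weighted Shannon entropy in bits for entropy); the article may use a different normalization, and the function name and sample labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference classes."""
    clusters = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        clusters[cid].append(label)
    n = len(class_labels)
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        # Purity: share of all documents that belong to their cluster's majority class.
        purity += max(counts.values()) / n
        # Entropy: class mix inside the cluster, weighted by the cluster's size.
        h = -sum((c / len(members)) * math.log2(c / len(members))
                 for c in counts.values())
        entropy += (len(members) / n) * h
    return purity, entropy

# Hypothetical example: six documents, two clusters, two human-assigned categories.
purity, entropy = purity_and_entropy(
    [0, 0, 0, 1, 1, 1],
    ["a", "a", "b", "b", "b", "b"],
)
```

Higher purity and lower entropy both indicate clusters that align more closely with the human-assigned categories.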


VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-14
Author(s):  
L. Rakai ◽  
A. Farshidi ◽  
L. Behjat ◽  
D. Westwick

Clustering algorithms have been used to improve the speed and quality of placement. Traditionally, clustering focuses on the local connections between cells. In this paper, a new clustering algorithm based on the estimated lengths of circuit interconnects and on connectivity is proposed. In the proposed algorithm, an a priori length estimation technique is first used to estimate the lengths of nets. Then, the estimated lengths are used in a clustering framework to modify a clustering technique based on algebraic multigrid (AMG) that finds the cells with the highest connectivity. Finally, clusters are formed based on the results of the AMG-based process. In addition, a new physical unclustering technique is proposed. The results show that significant reductions in wire length, of up to 40%, can be achieved when using the proposed technique with three academic placers on industry-based circuits. Moreover, the runtime is not significantly degraded and can even be improved.


Author(s):  
Elmustafa Sayed Ali Ahmed ◽  
Zahraa Tagelsir Mohammed ◽  
Mona Bakri Hassan ◽  
Rashid A. Saeed

The Internet of vehicles (IoV) has recently emerged as a promising field of research due to the growing number of vehicles on the road. It is the part of the internet of things (IoT) that deals with vehicle communications. Because vehicular nodes are considered to be always in motion, they cause frequent changes in the network topology. These changes raise issues in IoV such as scalability, dynamic topology changes, and shortest-path routing. In this chapter, the authors discuss different optimization algorithms (i.e., clustering algorithms, ant colony optimization, best interface selection [BIS] algorithm, mobility adaptive density connected clustering algorithm, meta-heuristics algorithms, and quality of service [QoS]-based optimization). These algorithms play an important role in optimizing the operation of IoV networks and promise to enable new intelligent IoV applications.


2020 ◽  
pp. 1-11
Author(s):  
Yufeng Li ◽  
HaiTian Jiang ◽  
Jiyong Lu ◽  
Xiaozhong Li ◽  
Zhiwei Sun ◽  
...  

Many classical clustering algorithms have been adapted to MapReduce, which provides a novel solution for clustering big data. However, most of these algorithms require several iterations to reach an acceptable result. For each iteration, a new MapReduce job must be executed to load the dataset into main memory, which results in high I/O overhead and poor efficiency. The BIRCH algorithm clusters big data by storing only statistical information about the objects, in CF entries and a CF tree, but as the number of tree nodes grows, main memory becomes insufficient to hold more objects. BIRCH then has to shrink the tree, which degrades the clustering quality and slows down the overall execution. To deal with this problem, BIRCH was adapted to MapReduce in this paper; the resulting algorithm is called MR-BIRCH. In contrast to a great number of MapReduce-based algorithms, MR-BIRCH loads the dataset only once, and the dataset is processed in parallel on several machines. The complexity and scalability were analyzed to evaluate the quality of MR-BIRCH, and MR-BIRCH was compared with Python sklearn BIRCH and Apache Mahout k-means on real-world and synthetic datasets. Experimental results show that, most of the time, MR-BIRCH was better than or equal to sklearn BIRCH, and that it was competitive with Mahout k-means.
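The CF entries mentioned above can be illustrated with a short sketch. It follows the standard BIRCH definition of a clustering feature as the triple (N, LS, SS): point count, per-dimension linear sum, and sum of squared norms. The class name `CFEntry` and its methods are hypothetical and are not taken from MR-BIRCH or from sklearn.

```python
import math
from dataclasses import dataclass

@dataclass
class CFEntry:
    n: int       # number of points summarized
    ls: tuple    # linear sum of the points, per dimension
    ss: float    # sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, tuple(point), sum(x * x for x in point))

    def merge(self, other):
        # CF entries are additive, so merging never revisits the raw points.
        return CFEntry(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error

# Two points summarized without storing either of them.
cf = CFEntry.from_point((0.0, 0.0)).merge(CFEntry.from_point((2.0, 0.0)))
```

Because entries are additive, partial summaries built on separate machines can be combined with `merge`, which is the property that makes a single-pass, parallel formulation plausible.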


2016 ◽  
Vol 43 (2) ◽  
pp. 275-292 ◽  
Author(s):  
Aytug Onan ◽  
Hasan Bulut ◽  
Serdar Korukoglu

Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.


2017 ◽  
Vol 10 (2) ◽  
pp. 474-479
Author(s):  
Ankush Saklecha ◽  
Jagdish Raikwal

Clustering is a well-known unsupervised learning method. In clustering, a set of elements is separated into homogeneous groups. K-means is one of the most popular partition-based clustering algorithms in research. However, in the original K-means the quality of the resulting clusters depends heavily on the selection of the initial centroids; a poor selection increases the number of iterations and the running time, which makes the algorithm computationally expensive. Many methods have been proposed for improving the accuracy, performance, and efficiency of the k-means clustering algorithm. This paper proposes an enhanced K-means clustering approach, combined with a collaborative filtering approach, to recommend quality content to users. This research would help users who otherwise have to scroll through pages of results to find important content.
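As an illustration of how initial centroid selection can be made less arbitrary, here is a minimal sketch of k-means++ seeding, one widely used initialization strategy. It is shown only as an example of the class of methods mentioned above and is not necessarily the enhancement this paper proposes. The function name and sample points are hypothetical, and the sketch assumes the input contains at least k distinct points.

```python
import random

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centroids, favouring points far from those already chosen."""
    rng = random.Random(seed)
    seeds = [rng.choice(points)]  # the first seed is chosen uniformly at random
    while len(seeds) < k:
        # Squared distance from each point to its nearest already-chosen seed.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
            for p in points
        ]
        # Sample the next seed with probability proportional to that distance,
        # so already-chosen points (distance 0) cannot be picked again.
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds

points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
seeds = kmeans_pp_seeds(points, k=2)
```

Seeds chosen this way tend to be spread across the data, which usually reduces the number of iterations k-means needs compared with purely random starting centroids.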


Author(s):  
SEUNG-JOON OH ◽  
JAE-YEARN KIM

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few existing clustering algorithms consider sequentiality. In this paper, we study how to cluster these sequence datasets. We propose a new similarity measure to compute the similarity between two sequences. In the proposed measure, subsets of a sequence are considered, and the more identical subsets there are, the more similar the two sequences. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional clustering algorithms.


Clustering plays a major role in machine learning and in data mining. Deep learning is a fast-growing domain at present, and the quality of clustering results can be improved by adopting deep learning algorithms. Many clustering algorithms process various datasets to obtain better results. For high-dimensional data, however, obtaining quality clustering results with the existing clustering algorithms is still an open issue. In this paper, a cross-breed clustering algorithm for high-dimensional data is used, and various datasets are used to obtain the results.


Author(s):  
Md. Zakir Hossain ◽  
Md. Jakirul Islam ◽  
Md. Waliur Rahman Miah ◽  
Jahid Hasan Rony ◽  
Momotaz Begum

The amount of data has been increasing exponentially in every sector, such as banking, securities, healthcare, education, manufacturing, consumer trade, transportation, and energy. Much of this data is noisy, varies in shape, and contains outliers. In such cases, it is challenging to find the desired data clusters using conventional clustering algorithms. DBSCAN is a popular clustering algorithm that is widely used for data with noise, arbitrary shapes, and outliers. However, its performance depends heavily on the proper selection of the cluster radius (Eps) and the minimum number of points (MinPts) required to form clusters for the given dataset. In real-world clustering problems, it is difficult to select exact values of Eps and MinPts for clustering unknown datasets. To address this, this paper proposes a dynamic DBSCAN algorithm that calculates suitable values for Eps and MinPts dynamically, thereby increasing the clustering quality for the given problem. This paper evaluates the performance of the dynamic DBSCAN algorithm on seven challenging datasets. The experimental results confirm the effectiveness of the dynamic DBSCAN algorithm compared with well-known clustering algorithms.
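To show why Eps is hard to fix by hand and how it can be estimated from the data, here is a minimal sketch of the classic sorted k-distance heuristic: compute each point's distance to its k-th nearest neighbour, sort the values, and look for the sharp rise. This is a standard manual aid and is not the dynamic procedure proposed in the paper; the function name and sample points are hypothetical, and k is commonly tied to the intended MinPts.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Four points forming a dense unit square plus one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kd = k_distances(points, k=2)
```

Here the first four values are 1.0 and the last is far larger, so an Eps slightly above 1.0 would keep the square together and leave the distant point as noise. This brute-force version compares every pair of points, so it is suitable only for small datasets.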

