A Graph Clustering Algorithm for the Homology Detection

2011 ◽  
Vol 52-54 ◽  
pp. 1981-1986
Author(s):  
Li Xiao ◽  
Jing Zhong Xiao

To detect homologous files (files involving plagiarism) among a large number of source program samples, a new graph-based cluster detection algorithm is proposed. The algorithm is divided into two phases. In the first phase, the pairwise similarity between the sample program files is calculated based on program keywords. In the second phase, a graph clustering algorithm is applied to the results of the first phase, so that homologous files (files involving plagiarism) form a cluster. The simulation results show that the algorithm improves the detection rate compared with the traditional homologous file detection algorithm and can determine which files are homologous.
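The two phases described above can be sketched as follows. This is a minimal illustration and not the authors' method: the Jaccard measure over keyword sets, the similarity threshold, and the use of connected components as the clustering step are all assumptions, as are the names `keyword_similarity` and `cluster_homologous`.

```python
from itertools import combinations


def keyword_similarity(a, b):
    """Jaccard similarity of two keyword collections (an assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cluster_homologous(files, threshold=0.8):
    """Group files whose keyword similarity reaches the threshold.

    files: dict mapping a file name to its list of keywords.
    Returns a sorted list of clusters (sorted name lists) with 2+ members.
    """
    names = sorted(files)
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Phase 1: pairwise similarity. Phase 2: link similar files in a graph
    # and read off its connected components (assumed clustering step).
    for a, b in combinations(names, 2):
        if keyword_similarity(files[a], files[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(g for g in groups.values() if len(g) > 1)


samples = {
    "a.c": ["int", "for", "while", "return"],
    "b.c": ["int", "for", "while", "return"],
    "c.c": ["int", "for", "while", "if"],
    "d.c": ["class", "def", "import"],
}
clusters = cluster_homologous(samples, threshold=0.5)
```

With the hypothetical samples above, the three C-like files share enough keywords to end up in one cluster, while `d.c` stays unclustered.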

2020 ◽  
Vol 39 (6) ◽  
pp. 8139-8147
Author(s):  
Ranganathan Arun ◽  
Rangaswamy Balamurugan

In Wireless Sensor Networks (WSNs), the energy of sensor nodes is not necessarily sufficient. To extend the lifetime of a WSN, it is essential to minimize energy consumption. Selecting a group head, or Cluster Head (CH), from among the nodes with higher energy is a prominent method for extending the lifetime of a WSN. Both intra-cluster and inter-cluster communication depend on the CH, so the energy level of the CH determines the lifetime of its cluster. When developing clustering algorithms, the difficult task is to determine the energy consumption of heterogeneous WSNs (HWSNs). The proposed work, based on Chaotic Firefly Algorithm CH (CFACH) selection, is named “Novel Distributed Entropy Energy-Efficient Clustering Algorithm”, or DEEEC for short, for HWSNs. The DEEEC algorithm, which performs CH selection, has two main stages. In the first stage, temporary CHs and their entropy values are identified using a correlative measure of residual and original energy. In addition, the rotating epoch and its entropy value must be predicted automatically by the sensor nodes. In the second stage, if any member of the cluster has larger residual energy, the temporary CHs are modified in the direction of the deciding set. Through these two stages of CH selection, nodes with higher energy have a higher probability of becoming CHs. The DEEEC algorithm is simulated in MATLAB. The simulation results show that the DEEEC algorithm performs well in terms of energy consumption and network lifetime when compared with the traditional clustering protocols currently used in heterogeneous WSNs.


2021 ◽  
Vol 11 (10) ◽  
pp. 4497
Author(s):  
Dongming Chen ◽  
Mingshuo Nie ◽  
Jie Wang ◽  
Yun Kong ◽  
Dongqi Wang ◽  
...  

To analyze the temporal structures in evolutionary networks, we propose a community detection algorithm based on graph representation learning. The proposed algorithm employs a Laplacian matrix to obtain the node relationship information of the directly connected edges of the network structure at the previous time slice; a deep sparse autoencoder learns a representation of the network structure at the current time slice; and the K-means clustering algorithm partitions the low-dimensional feature matrix of the network structure at the current time slice into communities. Experiments on three real datasets show that the proposed algorithm outperformed the baselines in effectiveness and feasibility.


2021 ◽  
Vol 18 (2) ◽  
pp. 172988142110087
Author(s):  
Qiao Huang ◽  
Jinlong Liu

Vision-based road lane detection plays a key role in driver assistance systems. While existing lane recognition algorithms have demonstrated detection rates above 90%, validation tests were usually conducted on limited scenarios, and significant gaps remain when the algorithms are applied to real-life autonomous driving. The goal of this article was to identify these gaps and to suggest research directions that can bridge them. A straight lane detection algorithm based on the linear Hough transform (HT) was used as an example to evaluate possible perception issues under challenging scenarios, including various road types, different weather conditions and shade, changing lighting conditions, and so on. The study found that the HT-based algorithm achieved an acceptable detection rate against simple backgrounds, such as highway driving or conditions with distinguishable contrast between lane boundaries and their surroundings. However, it failed to recognize road dividing lines under varied lighting conditions. The failure was attributed to the binarization process failing to extract lane features before detection. In addition, the existing HT-based algorithm was disturbed by lane-like objects such as guardrails, railways, bikeways, utility poles, pedestrian sidewalks, buildings and so on. Overall, these findings support the need to make current road lane detection algorithms more robust against interference and illumination variations. Moreover, the widely used algorithm has the potential to raise the lane boundary detection rate if an appropriate search range restriction and illumination classification process is added.
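The voting step of a linear Hough transform can be sketched as below. This is a generic textbook formulation, not the implementation evaluated in the article; it assumes the edge points have already been extracted by binarization, and the function name `hough_strongest_line` and the one-pixel rho resolution are illustrative choices.

```python
import math
from collections import Counter


def hough_strongest_line(points, n_theta=180):
    """Return (rho, theta_degrees, votes) of the most-voted straight line.

    points: iterable of (x, y) edge pixels from a binarized image.
    Each point votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it; rho is quantized to whole pixels.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    (rho, t), count = max(votes.items(), key=lambda kv: kv[1])
    return rho, 180.0 * t / n_theta, count


# A vertical lane marking at x = 5 plus two stray edge pixels.
vertical = [(5, y) for y in range(100)] + [(20, 7), (33, 50)]
line = hough_strongest_line(vertical)
```

Because every bright pixel votes, lane-like structures such as guardrails collect votes in the same way as real lane markings, which is consistent with the interference problem the abstract reports.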


2020 ◽  
Vol 16 (4) ◽  
pp. 15-29
Author(s):  
Jayalakshmi D. ◽  
Dheeba J.

The incidence of skin cancer has been increasing in recent years, and the disease can become dangerous if not detected early. Computer-aided diagnosis systems can assist dermatologists with skin cancer detection by examining lesion features more critically. In this article, pre-processing and segmentation methods for skin lesion images are reviewed in detail by investigating existing and prevalent segmentation methods for the diagnosis of skin cancer. The pre-processing stage is divided into two phases: in the first phase, a median filter is used to remove artifacts; in the second phase, an improved K-means clustering with outlier removal (KMOR) algorithm is suggested. The proposed method was tested on the publicly available Danderm database. The improved cluster-based algorithm gives an accuracy of 92.8% with a sensitivity of 93% and specificity of 90% with an AUC value of 0.90435. From the experimental results, it is evident that the clustering algorithm has performed well in detecting the border of the lesion and is suitable for pre-processing dermoscopic images.
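The first pre-processing phase, median filtering, can be illustrated with a small sketch. The window size, the border handling (using only the neighbours that exist) and the function name `median_filter` are assumptions made for illustration; the abstract does not specify them.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Apply a (2*radius+1)-square median filter to a 2D grayscale image.

    image: list of equal-length rows of pixel intensities.
    At the borders the window is clipped to the pixels that exist
    (an assumed border policy).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            # median_low always returns a value present in the window.
            out[r][c] = median_low(window)
    return out


# Uniform 5x5 patch with one bright artifact pixel in the centre.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 255
cleaned = median_filter(patch)
```

The isolated bright pixel is replaced by the surrounding intensity, which is the behaviour that makes a median filter suitable for removing small artifacts before clustering.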


2020 ◽  
Vol 1 (2) ◽  
pp. 101-123
Author(s):  
Hiroaki Shiokawa ◽  
Yasunori Futamura

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is one of the fundamental techniques for understanding the structures present in such complex graphs. In the Web and data mining communities, modularity-based graph clustering algorithms are successfully used in many applications. However, it is difficult for modularity-based methods to find fine-grained clusters hidden in large-scale graphs; the methods fail to reproduce the ground truth. In this paper, we present a novel modularity-based algorithm, CAV, that shows better clustering results than the traditional algorithm. The proposed algorithm incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. Additionally, this paper presents a novel efficient algorithm, P-CAV, that further improves the clustering speed of CAV; P-CAV is an extension of CAV that uses thread-based parallelization on a many-core CPU. Our extensive experiments on synthetic and public datasets demonstrate the superior performance of our approaches over state-of-the-art approaches.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of the existing algorithms under both sequential and parallel conditions.
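The second phase, the Lloyd iteration, can be sketched as follows. The covering algorithm of the first phase is not described in enough detail here to reproduce, so this sketch simply assumes that some initial centers are already available; the function name `lloyd` and the sample data are hypothetical.

```python
def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iteration starting from the given initial centers.

    points, centers: sequences of equal-length numeric tuples.
    Returns (final_centers, labels), where labels[i] is the index of
    the center closest to points[i].
    """
    def nearest(p, cs):
        # Index of the center with the smallest squared Euclidean distance.
        return min(
            range(len(cs)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cs[i])),
        )

    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        members = [[] for _ in centers]
        for p in points:
            members[nearest(p, centers)].append(p)
        # Move each center to the mean of its members; keep empty ones fixed.
        updated = [
            tuple(sum(axis) / len(m) for axis in zip(*m)) if m else c
            for c, m in zip(centers, members)
        ]
        if updated == centers:
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    return centers, labels


data = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centers, labels = lloyd(data, [(1, 1), (9, 1)])
```

In this sketch the number of clusters is fixed by the length of the initial center list, which is why a first phase that chooses both k and the starting centers matters for the final result.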


2011 ◽  
Vol 225-226 ◽  
pp. 996-999
Author(s):  
Li Jun Sun ◽  
Shou Yong Zhang ◽  
Wei Sheng Wang ◽  
Xiao Ning Zhang

In an adaptive echo canceller, a detection algorithm able to distinguish echo path change (EPC) from double-talk (DT) is vital to ensure that the adaptive filter tap coefficients are updated in case of EPC and frozen during the DT period. The paper presents a new echo cancellation algorithm that protects adaptive filter performance during double-talk in acoustic echo cancellation for teleconferencing without a dedicated detector. A judgment value, composed of the correlation between the far-end signal and the near-end received signal and the pre-correlation of the error signal, is used directly in the iteration formula to control the iteration speed of the filter. Computer simulation results verify that the algorithm provides good double-talk protection and distinguishes EPC from DT efficiently, with lower computational complexity than comparable algorithms.


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Laura Millán-Roures ◽  
Irene Epifanio ◽  
Vicente Martínez

A functional data analysis (FDA) based methodology for detecting anomalous flows in urban water networks is introduced. Primary hydraulic variables are recorded in real time by telecontrol systems, so they are functional data (FD). In the first stage, the data are validated (false data are detected) and reconstructed, since there could be not only false data, but also missing and noisy data. FDA tools such as tolerance bands for FD and smoothing for dense and sparse FD are used. In the second stage, functional outlier detection tools are used in two phases. In Phase I, the data are cleared of anomalies to ensure that they are representative of the in-control system. The objective of Phase II is system monitoring. A new functional outlier detection method based on archetypal analysis is also proposed. The methodology is applied and illustrated with real data. A simulation study is also carried out to assess the performance of the outlier detection techniques, including our proposal. The results are very promising.

