An Improved Version of K-medoid Algorithm using CRO

Clustering is the process of grouping a set of patterns into disjoint clusters such that each cluster contains similar patterns. Many clustering algorithms have been proposed. K-medoid is a variant of k-mean that represents each cluster by an actual point in the cluster, rather than by the mean as in k-mean, in order to handle outliers and reduce noise in the cluster. To enhance the performance of the k-medoid algorithm and obtain more accurate clusters, a hybrid algorithm is proposed that uses the CRO algorithm along with k-medoid. In this method, CRO is used to broaden the search for the optimal medoid and to improve clustering by producing more precise results. The performance of the new algorithm is evaluated by comparing its results with those of five clustering algorithms (k-mean, k-medoid, DB/rand/1/bin, a CRO-based clustering algorithm, and hybrid CRO-k-mean) on four real-world datasets from the UCI machine learning data repository: Lung cancer, Iris, Breast cancer Wisconsin, and Haberman’s survival. The results were compared based on different metrics and show that the proposed algorithm improves clustering by giving more accurate results.
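The difference between a medoid and a mean can be illustrated with a minimal sketch. It covers only the general medoid idea, not the CRO hybrid described in the abstract; the function names and the sample cluster are hypothetical and chosen purely for illustration.

```python
import math

def medoid(points):
    """Return the member of `points` with the smallest total distance to the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def mean_point(points):
    """Return the coordinate-wise mean, which need not be a member of `points`."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

# Hypothetical cluster with one outlier at x = 100 (illustrative data only).
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (100.0, 0.0)]

print(medoid(cluster))      # an actual data point from the cluster
print(mean_point(cluster))  # a synthetic point pulled toward the outlier
```

Because the medoid must be one of the observed points, a single distant outlier shifts it far less than it shifts the mean.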


The Utilization of Physics Parameter to Classify Histopathology Types of Invasive Ductal Carcinoma (IDC) and Invasive Lobular Carcinoma (ILC) by using K-Nearest Neighbourhood (KNN) Method

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i4.pp2442-2450 ◽

2018 ◽

Vol 8 (4) ◽

pp. 2442

Author(s):

Anak Agung Ngurah Gunawan ◽

I Wayan Supardi ◽

S. Poniman ◽

Bagus G. Dharmawan

Keyword(s):

Breast Cancer ◽

Invasive Ductal Carcinoma ◽

Invasive Lobular Carcinoma ◽

Ductal Carcinoma ◽

Lobular Carcinoma ◽

Diagnose Breast Cancer ◽

Object Based ◽

Computer Aided ◽

The Mean ◽

Learning Data

Medical imaging has evolved continuously since 1996. The development of Computer Aided Diagnostic (CAD) systems is very helpful to radiologists in diagnosing breast cancer. The KNN method classifies an object based on the learning data that are nearest to it. We analysed two types of cancer, IDC and ILC. Ten parameters were observed at distances of 1-10 pixels in 145 IDC and 7 ILC. We found that the Mean of Hm(yd,d) at 1-5 pixels is the only significant parameter that distinguishes IDC from ILC. This parameter at 1-5 pixels should be applied in the KNN method. This finding needs to be tested in different areas before it is applied in cancer diagnostics.
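A generic k-nearest-neighbour classifier can be sketched as follows. This is a plain majority-vote illustration and not the authors' pipeline; the function name `knn_predict`, the single made-up feature, and the labels `class_a` and `class_b` are all assumptions with no relation to the study's measurements.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up one-dimensional feature values (illustrative only).
train_points = [(1.0,), (1.2,), (0.9,), (5.0,), (5.1,)]
train_labels = ["class_a", "class_a", "class_a", "class_b", "class_b"]

print(knn_predict(train_points, train_labels, (1.1,), k=3))
```

With strongly imbalanced classes, such as the 145 versus 7 split mentioned above, a plain majority vote tends to favour the larger class, so the choice of k matters.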


An Improved Spectral Clustering Community Detection Algorithm Based on Probability Matrix

Discrete Dynamics in Nature and Society ◽

10.1155/2020/4540302 ◽

2020 ◽

Vol 2020 ◽

pp. 1-6

Author(s):

Shuxia Ren ◽

Shubo Zhang ◽

Tao Wu

Keyword(s):

Community Detection ◽

Spectral Clustering ◽

Clustering Algorithm ◽

Transition Probability ◽

Clustering Algorithms ◽

Detection Algorithm ◽

Community Information ◽

The Mean ◽

Community Detection Algorithm ◽

Spectral Clustering Algorithm

The similarity graphs of most spectral clustering algorithms carry lots of wrong community information. In this paper, we propose a probability matrix and a novel improved spectral clustering algorithm based on the probability matrix for community detection. First, the Markov chain is used to calculate the transition probability between nodes, and the probability matrix is constructed by the transition probability. Then, the similarity graph is constructed with the mean probability matrix. Finally, community detection is achieved by optimizing the NCut objective function. The proposed algorithm is compared with SC, WT, FG, FluidC, and SCRW on artificial networks and real networks. Experimental results show that the proposed algorithm can detect communities more accurately and has better clustering performance.
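The first step described above, computing transition probabilities between nodes with a Markov chain, can be sketched as a row normalization of the adjacency matrix. This is the standard one-step random-walk construction and is assumed here for illustration; the abstract does not specify how the authors build their mean probability matrix, and the function name and example graph are hypothetical.

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into one-step transition probabilities."""
    rows = []
    for row in adjacency:
        degree = sum(row)
        # An isolated node has no outgoing edges; leave its row as zeros.
        rows.append([w / degree if degree else 0.0 for w in row])
    return rows

# Hypothetical 3-node path graph: 0 - 1 - 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

for row in transition_matrix(adjacency):
    print(row)
```

Each non-empty row sums to 1, so entry (i, j) can be read as the probability that a random walker at node i moves to node j in one step.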


Adaptive Initialization Method for K-Means Algorithm

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.740817 ◽

2021 ◽

Vol 4 ◽

Author(s):

Jie Yang ◽

Yu-Kai Wang ◽

Xin Yao ◽

Chin-Teng Lin

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Superior Performance ◽

Local Optima ◽

Initial Cluster ◽

Higher Dimensional ◽

Real World Datasets ◽

Random Method

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm uses a random method to determine the initial cluster centers, which makes clustering results prone to local optima and in turn degrades clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK) which can adapt to the various characteristics of different datasets and obtain better clustering performance with stable results. For larger or higher-dimensional datasets, we also leverage random sampling in AIMK (named AIMK-RS) to reduce the time complexity. Twenty-two real-world datasets were used for performance comparisons. The experimental results show that AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS can significantly reduce the time complexity to O(n). Moreover, we use AIMK to initialize K-medoids and spectral clustering, and better performance is also observed. The above results demonstrate the superior performance and good scalability of AIMK or AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.


An Improved Similarity-based Clustering Algorithm for Multi-database Mining

10.20944/preprints202104.0256.v1 ◽

2021 ◽

Author(s):

Salim Miloudi ◽

Yulin Wang ◽

Wenjia Ding

Keyword(s):

Clustering Algorithm ◽

Learning Algorithm ◽

Clustering Algorithms ◽

Back Propagation ◽

Mean Value ◽

Learning Rate ◽

Database Mining ◽

Gradient Based ◽

Series Of Experiments ◽

The Mean

Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ multiple databases to generate and evaluate $m\in[1, (n^2-n)/2]$ candidate clusterings in order to select the ideal partitioning which optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive when choosing which database pairs are eligible to be grouped together. Consequently, a trivial result is produced by putting all the $n$ databases in one cluster or by returning $n$ singleton clusters. To tackle the latter problem, we propose a learning algorithm to reduce the fuzziness in the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model will improve the certainty of the clustering algorithm by correctly identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm to assess the candidate clusterings generated on the fly in fewer, upper-bounded iterations. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
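Two quantities from the abstract can be checked with a short sketch: the count of $(n^2-n)/2$ database pairs, and the way binary entropy peaks when a similarity value is ambiguous. The entropy here is the plain, unweighted form in bits; the weighting used by the authors is not given in the abstract, and the function names are hypothetical.

```python
import math
from itertools import combinations

def database_pairs(n):
    """Enumerate all unordered pairs among n databases; there are (n^2 - n) / 2."""
    return list(combinations(range(n), 2))

def binary_entropy(s):
    """Unweighted binary entropy (in bits) of a similarity value s in [0, 1]."""
    if s in (0.0, 1.0):
        return 0.0
    return -(s * math.log2(s) + (1 - s) * math.log2(1 - s))

n = 5
print(len(database_pairs(n)), (n * n - n) // 2)
for s in (0.1, 0.5, 0.9):
    print(s, binary_entropy(s))
```

Similarity values near 0.5 carry the highest entropy, which matches the indecision described above; pushing values toward 0 or 1 lowers the entropy and makes the grouping decision clearer.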


GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211922 ◽

2021 ◽

pp. 1-13

Author(s):

Li Yihong ◽

Wang Yunpeng ◽

Li Tao ◽

Lan Xiaolong ◽

Song Han

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Adjusted Rand Index ◽

K Nearest Neighbors ◽

Normalized Mutual Information ◽

Core Samples ◽

Real World Datasets

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms, which can find arbitrary shapes of clusters, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited as it is quite sensitive to the parameters eps and MinPts. Eps represents the eps-neighborhood and MinPts stands for a minimum number of points. Additionally, a dataset with large variations in densities will probably trap DBSCAN because its parameters are fixed. In order to overcome these limitations, we propose a new density-clustering algorithm called GNN-DBSCAN which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of the grid, the dataset space is divided into a finite number of cells. After that, the nearest neighbors lying in every filled cell and adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, it uses a dynamic radius based on k-nearest neighbors to cluster the datasets. The dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on datasets with large variations in densities. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI) and V-measure of our proposed algorithm exceed those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.
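The role of eps and MinPts in classic DBSCAN can be sketched with the core-sample test alone. This illustrates the baseline algorithm only, not GNN-DBSCAN's grid or dynamic radius; it assumes the common convention that a point counts itself among its neighbours, and the function name and sample points are hypothetical.

```python
import math

def core_samples(points, eps, min_pts):
    """Flag each point as a core sample if at least min_pts points
    (including the point itself) lie within distance eps of it."""
    flags = []
    for p in points:
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        flags.append(neighbours >= min_pts)
    return flags

# Hypothetical data: a tight group of three points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]

print(core_samples(points, eps=1.5, min_pts=3))
print(core_samples(points, eps=0.5, min_pts=3))
```

Shrinking eps from 1.5 to 0.5 removes every core sample in this toy dataset, which is the kind of parameter sensitivity the abstract describes.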


Clustering Algorithm for Human Behavior Recognition Based on Biosignal Analysis

Human Behavior Recognition Technologies ◽

10.4018/978-1-4666-3682-8.ch010 ◽

2013 ◽

pp. 212-224

Author(s):

Neuza Nunes ◽

Diliana Rebelo ◽

Rodolfo Abreu ◽

Hugo Gamboa ◽

Ana Fred

Keyword(s):

Time Series ◽

Human Behavior ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Behavior Recognition ◽

Biosignal Analysis ◽

Synthetic Signal ◽

The Mean ◽

Cepstral Coefficients ◽

Human Behavior Recognition

Time series unsupervised clustering is accurate in various domains, and there is an increased interest in time series clustering algorithms for human behavior recognition. The authors have developed an algorithm for biosignals clustering, which captures the general morphology of a signal’s cycles in one mean wave. In this chapter, they further validate and consolidate it and make a quantitative comparison with a state-of-the-art algorithm that uses distances between data’s cepstral coefficients to cluster the same biosignals. They are able to successfully replicate the cepstral coefficients algorithm, and the comparison showed that the mean wave approach is more accurate for the type of signals analyzed, having a 19% higher accuracy value. The authors also test the mean wave algorithm on biosignals containing three different activities, and achieve an accuracy of 96.9%. Finally, they perform a noise immunity test with a synthetic signal and notice that the algorithm remains stable for signal-to-noise ratios higher than 2, only decreasing its accuracy with noise of amplitude equal to the signal. The necessary validation tests performed in this study confirmed the high accuracy level of the developed clustering algorithm for biosignals that express human behavior.


Applying Improved Clustering Algorithm into EC Environment Data Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.596.951 ◽

2014 ◽

Vol 596 ◽

pp. 951-959 ◽

Cited By ~ 2

Author(s):

Yu Peng Ma ◽

Bo Ma ◽

Tong Hai Jiang

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Service Providers ◽

Clustering Algorithms ◽

Customer Segmentation ◽

Data Mining Technique ◽

Directed Learning ◽

On Line ◽

Browsing Behavior ◽

Learning Data

With the rising number of electronic commerce (EC) customers, EC service providers are keen to analyze the on-line browsing behavior of the customers on their web sites and learn their specific features. Clustering is a popular non-directed learning data mining technique for partitioning a dataset into a set of clusters. Although there are many clustering algorithms, none is superior for the task of customer segmentation. This suggests that a suitable clustering algorithm should be developed for the EC environment. In this paper we address this situation and propose an improved k-means algorithm, which is effective at excluding noisy data and improving clustering accuracy. Experimental results obtained in a real EC environment are provided to demonstrate the effectiveness and feasibility of the proposed approach.


Estimating the risk of lung cancer and cardiac mortality from doses to the lung and heart from modern tangent-only breast radiotherapy

Journal of Radiotherapy in Practice ◽

10.1017/s1460396918000080 ◽

2018 ◽

Vol 17 (3) ◽

pp. 260-265

Author(s):

Loukas A. Georgiou ◽

Adam F. Farmer

Keyword(s):

Breast Cancer ◽

Lung Cancer ◽

Heart Disease ◽

Collaborative Group ◽

Cardiac Mortality ◽

Breast Radiotherapy ◽

Heart Dose ◽

Treatment Techniques ◽

The Mean ◽

Whole Heart

Purpose: The Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) reported that the risks of breast cancer treatment in women who smoke may outweigh the benefits. The data used doses from published reports using a variety of treatment techniques. In our study, the risks of lung cancer and heart disease were determined from a modern era tangential-only technique. Methods and materials: Doses to the lung and heart were obtained for tangential radiotherapy to the breast or chest wall. The risks of lung cancer incidence and cardiac mortality were calculated by taking the ratio of our doses to those published by the EBCTCG. Results: A total of 77 women were identified meeting our inclusion criteria. The mean combined whole lung dose was 2·0 Gy. The mean whole heart dose was 0·9 Gy. The risks of lung cancer and cardiac mortality in a 50-year-old life-long smoker were estimated to be 1·5 and <1%, respectively. Conclusions: Tangential-only radiotherapy delivered substantially lower doses to the combined whole lung and whole heart than those reported by the EBCTCG. In this cohort, the risks of radiation-induced lung cancer and heart disease are outweighed by the benefits of radiotherapy even in those who are smokers.


Online Education and Wireless Network Coordination of Electronic Music Creation and Performance under Artificial Intelligence

Wireless Communications and Mobile Computing ◽

10.1155/2021/5999152 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Ning Xu ◽

Yuanyuan Zhao

Keyword(s):

Artificial Intelligence ◽

Online Education ◽

Wireless Network ◽

Electronic Music ◽

Clustering Algorithm ◽

Mean Squared Error ◽

Clustering Algorithms ◽

Music Performance ◽

The Mean ◽

And Performance

This paper aims to study the collaboration of online education and wireless networks in electronic music creation and performance under artificial intelligence (AI). This paper uses a fuzzy clustering algorithm (FCA), designs the sensor network-related equipment, and uses AI to design an electronic music creation system. The analysis of simulation experiments suggests that, as the number of neighbors increases, the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the collaborative filtering and fuzzy C-means clustering algorithms show a downward trend. However, with the same number of neighbors, the filtering matching algorithm has greater mean values of MAE and MSE than FCA. Meanwhile, on the electronic music performance system of AI, the digital module is designed and the sound data are imaged on the oscilloscope, and the collaboration of electronic music online education and wireless network is completed. The following conclusion is drawn: modularizing the creative mode of intelligent electronic music has achieved higher computational efficiency. Through the oscilloscope, the sound feature is converted into the image structure, and the corresponding sound and image mode is formed, which realizes the purpose of online electronic music intelligent matching and optimizes the effect of online education. In the AI environment, the matching degree of verification electronic music curriculum resources is better than that of traditional matching algorithms, and the accuracy is higher.
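The two error measures named above have standard definitions, sketched here for reference. The function names and sample values are hypothetical and are not taken from the paper's experiments.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """MSE: average of the squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up ratings and predictions (illustrative only).
actual = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
```

MSE penalizes large errors more heavily than MAE because each difference is squared, so the two measures can rank algorithms differently when a few predictions are far off.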


Effect of the number of bone lesions on efficacy of zoledronic acid for prevention of skeletal-related events (SREs) in patients with bone metastases from solid tumors

Journal of Clinical Oncology ◽

10.1200/jco.2006.24.18_suppl.8529 ◽

2006 ◽

Vol 24 (18_suppl) ◽

pp. 8529-8529 ◽

Cited By ~ 2

Author(s):

N. Shirina ◽

R. E. Coleman ◽

Y. M. Chen

Keyword(s):

Breast Cancer ◽

Prostate Cancer ◽

Lung Cancer ◽

Zoledronic Acid ◽

Bone Metastases ◽

Cancer Patients ◽

Solid Tumors ◽

Morbidity Rate ◽

Bone Lesions ◽

The Mean

8529 Background: It has been postulated that greater numbers of bone metastases and thus greater tumor burden may lead to increased skeletal morbidity. To assess the effect that the number of baseline bone metastases may have on the efficacy of zoledronic acid in patients with solid tumors, we conducted a retrospective analysis of 3 large, randomized, controlled trials. Methods: Data were evaluated from the intent-to-treat population with breast cancer (n = 739), prostate cancer (n = 397), or lung cancer and other solid tumors (n = 480) who were treated with zoledronic acid 4 mg, pamidronate 90 mg, or placebo and had information available on number of baseline bone lesions. Patients were stratified into 2 groups: those with ≤ 3 bone lesions or > 3 lesions. Results: In general, patients with > 3 lesions had a higher skeletal morbidity rate (SMR) compared with patients with ≤ 3 lesions (Table 1), and zoledronic acid reduced SREs regardless of the number of bone lesions, but the benefit of zoledronic acid appeared greater in patients with > 3 lesions. In patients with lung cancer and other solid tumors who had > 3 bone lesions, zoledronic acid significantly reduced the mean SMR (P = .008) and significantly prolonged time to first SRE (median, 171 vs 84 days; P = .005) compared with placebo. In prostate cancer patients with > 3 bone lesions, zoledronic acid also significantly reduced the mean SMR compared with placebo (Table 1). In breast cancer patients with > 3 bone lesions, the mean SMRs were similar for zoledronic acid and pamidronate groups (Table 1). Conclusions: Patients with a greater number of bone lesions are at higher risk for skeletal complications and receive greater clinical benefit from treatment with zoledronic acid. [Table: see text]
