Hierarchical and K-means Clustering in the Line Drawing Data Shape Using Procrustes Analysis

2021
Vol 5 (3)
pp. 306
Author(s):
Ridho Ananda
Agi Prasetiadi

One of the problems in the clustering process is that the objects under inquiry are multivariate measures containing geometrical information, which calls for shape clustering. Because Procrustes is a technique for obtaining a similarity measure between two shapes, it can serve as a solution. This paper therefore uses Procrustes as the main step in the clustering method. The algorithms proposed for shape clustering with Procrustes are hierarchical goodness-of-fit of Procrustes (HGoFP), k-means goodness-of-fit of Procrustes (KMGoFP), hierarchical ordinary Procrustes analysis (HOPA), and k-means ordinary Procrustes analysis (KMOPA). These algorithms were evaluated using the Rand index, Jaccard index, F-measure, and purity. The data used was the line drawing dataset, which consists of 180 drawings classified into six clusters. The results showed that the HGoFP, KMGoFP, HOPA, and KMOPA algorithms performed well on the Rand index, F-measure, and purity, with 0.697 as the minimum value. On the Jaccard index, only the HGoFP, KMGoFP, and HOPA algorithms gave good clustering results, with 0.561 as the minimum value; KMOPA had the worst Jaccard index, at about 0.300. In terms of running time, the fastest algorithm was HGoFP, at 4.733. Based on these results, the proposed algorithms are well suited to clustering the objects in the line drawing dataset, and HGoFP is the suggested choice.
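As a rough illustration of the HGoFP idea, the sketch below computes pairwise Procrustes disparities with SciPy's `procrustes` (its disparity is a goodness-of-fit measure after optimal translation, scaling, and rotation) and feeds them to average-linkage hierarchical clustering. The random shapes, landmark count, and linkage method are assumptions for illustration, not details from the paper.

```python
# A minimal sketch of Procrustes-based hierarchical clustering, assuming
# each line drawing is resampled to a fixed number of 2-D landmark points.
import numpy as np
from scipy.spatial import procrustes
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def procrustes_distance_matrix(shapes):
    """Pairwise Procrustes disparities between (n_points, 2) landmark arrays."""
    n = len(shapes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            _, _, disparity = procrustes(shapes[i], shapes[j])
            d[i, j] = d[j, i] = disparity
    return d

# 180 hypothetical drawings, each with 50 landmark points (toy data)
rng = np.random.default_rng(0)
shapes = [rng.standard_normal((50, 2)) for _ in range(180)]

dist = procrustes_distance_matrix(shapes)
Z = linkage(squareform(dist), method="average")   # hierarchical step
labels = fcluster(Z, t=6, criterion="maxclust")   # six clusters, as in the paper
```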

Author(s):  
Rui Zhang
Christian Walder
Marian-Andrei Rizoiu
Lexing Xie

In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the stationarity of the Hawkes process, we efficiently sample random branching structures, and thus split the Hawkes process into clusters of Poisson processes. We derive two algorithms, a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization, and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show that our methods can infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state of the art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that, on diffusions related to online videos, the learned kernels reflect the perceived longevity of different content types such as music or pet videos.
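A minimal sketch of the branching-structure sampling step described above: each event is attributed either to the background rate or to an earlier event, with probability proportional to the triggering kernel evaluated at the time lag. The exponential kernel and parameter values are assumptions for illustration; the paper infers the kernel non-parametrically.

```python
# Sampling a random branching structure under the cluster representation:
# parent -1 means the event is a background "immigrant"; otherwise the
# parent index identifies the earlier event that triggered it.
import numpy as np

def sample_branching(times, mu, phi, seed=0):
    """For each event time, sample its parent: -1 = background."""
    rng = np.random.default_rng(seed)
    parents = []
    for i, t in enumerate(times):
        weights = np.array([mu] + [phi(t - times[j]) for j in range(i)])
        probs = weights / weights.sum()
        parents.append(int(rng.choice(len(weights), p=probs)) - 1)
    return parents

times = np.sort(np.random.default_rng(1).uniform(0, 10, size=20))
phi = lambda dt: 0.8 * np.exp(-dt)          # assumed exponential kernel
parents = sample_branching(times, mu=0.5, phi=phi)
# Events with parent -1 seed the Poisson immigrant clusters; the remaining
# events split into triggered sub-clusters, one Poisson process each.
```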


2021
Author(s):
E. Elakiya
R. Kanagaraj
N. Rajkumar

At every moment, a huge volume of data and information is communicated through social networks. Analyzing huge amounts of text data is tedious, time consuming, and expensive, and manual sorting leads to mistakes and inconsistency. The document processing phase is still not capable of extracting data the way a human reader does. Furthermore, the significance of content in a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm is used to reduce time complexity by moving multiple spiders instead of a single spider, and the spiders are constructed dynamically depending on the volume of the corpus. In some cases tokens may relate to more than one topic, so topics need to be detected semantically. The Multiple Semantic Spider Hunting Algorithm is proposed based on the semantics among terms; associations between words are drawn using semantic lexicons, and topics or lists of opinions are generated from the knowledge graph. News articles were gathered from five dissimilar topics: sports, business, education, tourism, and media. The usefulness of the proposed algorithms was evaluated using precision, recall, F-measure, accuracy, true positives, false positives, and topic detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results, and its topic detection percentage was compared against Naïve Bayes, neural networks, decision trees, and Particle Swarm Optimization. The Spider Hunting Algorithm achieved more than 90% precise detection of topics and subtopics.
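A hedged sketch of how a semantic-lexicon step like the one described could attach one token to several topics, here using WordNet path similarity as the lexicon. The seed words, threshold, and choice of WordNet are illustrative assumptions, not the paper's actual lexicon or parameters.

```python
# Token-to-topic assignment via a semantic lexicon (WordNet here).
# Requires: nltk.download("wordnet")
from itertools import product
from nltk.corpus import wordnet as wn

def max_similarity(w1, w2):
    """Highest WordNet path similarity over all synset pairs (0 if none)."""
    sims = [s1.path_similarity(s2) or 0.0
            for s1, s2 in product(wn.synsets(w1), wn.synsets(w2))]
    return max(sims, default=0.0)

def assign_topics(token, topic_seeds, threshold=0.2):
    """A token may be assigned to several topics, as the paper allows."""
    return [topic for topic, seeds in topic_seeds.items()
            if any(max_similarity(token, s) >= threshold for s in seeds)]

# Hypothetical seed words for two of the five topics
topic_seeds = {"sports": ["game", "team"], "business": ["market", "trade"]}
print(assign_topics("tournament", topic_seeds))
```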


2018
Vol 1 (01)
pp. 09-13
Author(s):
Sufiatul Maryana
Lita Karlitasari

The library of the Faculty of Mathematics and Natural Science (FMIPA) has a collection of books and other print media, a total of 2,678 books, with 7,237 visitors and 2,148 borrowers. The available book search system was very helpful for visitors looking for books, especially if the system offered book recommendations. Book recommendations were generated using one of the data mining techniques, namely association rule mining. In developing this recommendation system, the KDD (Knowledge Discovery from Database) model was used. The data used was the transaction history of book borrowing in the category "chemistry" for the last five months, September 2014 to February 2015. Association rule mining has two main processes: finding frequent patterns and generating rules. To find frequent patterns, the CT-PRO algorithm was used, with minimum support values of 1 and 2. Once the patterns were found, the confidence value of each pattern was calculated; the minimum confidence values used ranged from 10% to 100%, and the recommendation rules were based on these confidence values. Comparing minimum support values shows that the greater the minimum support, the fewer borrowing patterns are generated, and vice versa. Comparing minimum confidence values shows that the greater the minimum confidence, the fewer recommendation rules are given. A small code sketch of the support/confidence step follows.

Keywords: Library, Recommendation System, Knowledge Discovery from Database (KDD), Association Rule Mining, CT-PRO Algorithm
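The sketch below computes support and confidence over toy borrowing transactions and keeps rules above the minimum confidence. CT-PRO's compressed FP-tree structure for mining frequent patterns is not reproduced here, and the transactions and thresholds are illustrative.

```python
# Support/confidence computation behind the rule-generation step.
from itertools import combinations

transactions = [                       # hypothetical borrowing history
    {"organic chemistry", "biochemistry"},
    {"organic chemistry", "analytical chemistry"},
    {"organic chemistry", "biochemistry"},
]

def support(itemset):
    """Absolute support: number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

min_support, min_confidence = 2, 0.5   # the paper used supports of 1 and 2
items = set().union(*transactions)
frequent = [frozenset(c) for k in (1, 2)
            for c in combinations(items, k) if support(set(c)) >= min_support]

for itemset in (f for f in frequent if len(f) == 2):
    for a in itemset:
        antecedent, consequent = {a}, itemset - {a}
        conf = support(itemset) / support(antecedent)
        if conf >= min_confidence:
            print(f"{antecedent} -> {consequent}  (confidence {conf:.0%})")
```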


2020
Vol 13 (4)
pp. 694-705
Author(s):
K.R. Kosala Devi
V. Deepa

Background: Congenital heart disease is an abnormality of the heart's structure, and predicting tetralogy of Fallot is a difficult task. A cluster is a collection of data objects that are similar to one another within the same group and different from the objects in other clusters. To detect edges, the clustering mechanism improves its accuracy through segmentation and colour space conversion of an image, implemented in Fuzzy c-Means with Edge and Local Information. Objective: To predict tetralogy of Fallot in a heart, the clustering mechanism is used. Fuzzy c-Means with Edge and Local Information provides the accuracy to detect the edges of the defect and thus identify congenital heart disease efficiently. Methods: Fuzzy c-Means with Edge and Local Information, one of the finest image clustering methods, introduces weights for each pixel value to increase edge detection accuracy, examining pixel values within local neighbour windows to improve exactness. For evaluation, the Adjusted Rand Index metric is used to achieve accurate measurement. Results: The cluster metrics Adjusted Rand Index and Jaccard index are used to evaluate Fuzzy c-Means with Edge and Local Information, and it gives accurate results in identifying edges. Compared to other clustering methods, the evaluation yields values of 0.2, 0.6363, and 0.8333. Conclusion: Tetralogy of Fallot is accurately identified, and the method gives better performance in detecting edges. It will also be useful for identifying further defects in various heart diseases in an accurate manner. Fuzzy c-Means with Edge and Local Information and the gray-level co-occurrence matrix are more promising than other clustering techniques.
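For orientation, a minimal sketch of plain Fuzzy c-Means on pixel features is given below. The paper's variant additionally weights each pixel using edge and local-neighbourhood information, which is omitted here, and the toy data and parameters are illustrative assumptions.

```python
# Plain Fuzzy c-Means: alternate between weighted-mean centre updates and
# membership updates u_ik proportional to d_ik^(-2/(m-1)).
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, iters=100, seed=0):
    """x: (n_pixels, n_features). Returns membership matrix u and centers."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(x))          # random memberships
    for _ in range(iters):
        um = u ** m
        centers = um.T @ x / um.sum(axis=0)[:, None]    # fuzzy-weighted means
        d = np.linalg.norm(x[:, None, :] - centers[None], axis=2) + 1e-12
        u = 1.0 / (d ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)               # normalise per pixel
    return u, centers

pixels = np.random.default_rng(1).random((1000, 1))     # toy grayscale values
u, centers = fuzzy_c_means(pixels)
labels = u.argmax(axis=1)                               # hard cluster labels
```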


2012
Vol 04 (01)
pp. 1250009
Author(s):
TANAEEM M. MOOSA
M. SOHEL RAHMAN

In the point-set embeddability problem, we are given a plane graph G with n vertices and a point set S with the same number of points. The goal is to decide whether there exists a straight-line drawing of G in which each vertex is represented as a distinct point of S, and to provide such an embedding if one exists. This problem was recently solved in O(n^2 log n) time for plane 3-trees. In this paper, we present a new efficient algorithm with time complexity O(n^{4/3+ε} log n). We also present an O(nk^4) time algorithm for the case when |S| = k > n. This is a significant improvement over the best algorithm for this case in the literature, which runs in O(nk^8) time.
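A hedged sketch of the elementary feasibility test underlying point-set embeddability: given one candidate assignment of vertices to points, verify that no two vertex-disjoint edges cross as straight segments. The paper's algorithm for plane 3-trees is far more efficient; this brute-force check only illustrates what a valid straight-line drawing must satisfy.

```python
# Validity check for a candidate vertex-to-point assignment: a straight-line
# drawing is valid if no two vertex-disjoint edges properly cross.
from itertools import combinations

def ccw(a, b, c):
    """Signed area orientation test for points a, b, c."""
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def segments_cross(p1, p2, p3, p4):
    """Proper intersection test for segments p1p2 and p3p4."""
    d1, d2 = ccw(p3, p4, p1), ccw(p3, p4, p2)
    d3, d4 = ccw(p1, p2, p3), ccw(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def is_valid_drawing(edges, pos):
    """pos maps each vertex to its assigned point of S."""
    for (u, v), (x, y) in combinations(edges, 2):
        if {u, v} & {x, y}:          # edges sharing a vertex cannot cross
            continue
        if segments_cross(pos[u], pos[v], pos[x], pos[y]):
            return False
    return True

edges = [(0, 1), (1, 2), (2, 0), (0, 3)]                 # a tiny plane graph
pos = {0: (0, 0), 1: (4, 0), 2: (2, 3), 3: (2, -2)}      # candidate assignment
print(is_valid_drawing(edges, pos))                      # True
```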


2019
Vol 6 (4)
pp. 349
Author(s):
Rimbun Siringoringo
Jamaluddin Jamaluddin

The Fuzzy C-Means (FCM) algorithm is one of the popular fuzzy clustering techniques. Compared with hard clustering algorithms, FCM is more flexible, but it is significantly sensitive to the initial cluster centers, which makes its final output hard to control and leaves it easily trapped in a local optimum. To overcome this problem, this study proposes an improved FCM with the Particle Swarm Optimization (PSO) algorithm to determine better cluster centers for high-dimensional, unstructured sentiment clustering. The study uses product review data collected from several online shopping websites in Indonesia. Initial processing of the product review data consists of case folding, non-alphanumeric removal, stop word removal, and stemming. PSO is applied to determine suitable cluster centers. The clustering performance criteria are the Rand index, F-measure, and objective function value (OFV). The results show that FCM-PSO outperforms conventional FCM on all three criteria; the better OFV indicates that FCM-PSO converges faster and handles noise better.
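A minimal sketch of the FCM-PSO idea described above: particles encode candidate sets of cluster centres and standard PSO minimises the FCM objective to supply better starting centres. The inertia and acceleration constants are common defaults, and the toy TF-IDF-like data is an assumption, not the paper's setup.

```python
# PSO over candidate cluster-centre sets, scored by the FCM objective
# J = sum_ik u_ik^m * d_ik^2 with memberships induced by the centres.
import numpy as np

def fcm_objective(centers, x, m=2.0):
    d = np.linalg.norm(x[:, None, :] - centers[None], axis=2) + 1e-12
    u = 1.0 / (d ** (2 / (m - 1)))
    u /= u.sum(axis=1, keepdims=True)
    return ((u ** m) * d ** 2).sum()

def pso_centers(x, c=3, particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = (c, x.shape[1])
    pos = rng.uniform(x.min(), x.max(), (particles, *dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fcm_objective(p, x) for p in pos])
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random((2, particles, *dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([fcm_objective(p, x) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest                       # use as the FCM starting centres

docs = np.random.default_rng(2).random((200, 50))   # toy review vectors
centers = pso_centers(docs)
```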


2021
Vol 11 (1)
Author(s):
Ching-Wei Wang
Yi-An Liou
Yi-Jia Lin
Cheng-Chang Chang
Pei-Hsuan Chu
...  

Every year cervical cancer affects more than 300,000 people, and on average one woman is diagnosed with cervical cancer every minute. Early diagnosis and classification of cervical lesions greatly boosts the chance of successful treatment, so automated diagnosis and classification of cervical lesions from Papanicolaou (Pap) smear images has become highly demanded. To the authors' best knowledge, this is the first study of fully automated cervical lesion analysis on whole slide images (WSIs) of conventional Pap smear samples. The presented deep learning-based cervical lesion diagnosis system is demonstrated not only to detect high-grade squamous intraepithelial lesions (HSILs) or higher (squamous cell carcinoma; SQCC), which usually indicate that patients must be referred to colposcopy immediately, but also to process WSIs rapidly, in seconds, for practical clinical usage. We evaluate this framework at scale on a dataset of 143 whole slide images, and the proposed method achieves a precision of 0.93, recall of 0.90, F-measure of 0.88, and Jaccard index of 0.84, showing that the system is capable of segmenting HSILs or higher (SQCC) with high precision and reaches sensitivity comparable to the reference standard produced by pathologists. Based on Fisher's Least Significant Difference (LSD) test (P < 0.0001), the proposed method performs significantly better than the two state-of-the-art benchmark methods (U-Net and SegNet) in precision, F-measure, and Jaccard index. For run time, the proposed method takes only 210 seconds to process a WSI, 20 times faster than U-Net and 19 times faster than SegNet. In summary, the proposed method both detects HSILs or higher (SQCC), indicating patients for further treatment, including colposcopy and surgery to remove the lesion, and rapidly processes WSIs in seconds for practical clinical use.
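For reference, a short sketch of how the reported mask-level metrics (precision, recall, F-measure, Jaccard index) can be computed from predicted and ground-truth binary segmentation masks. The toy masks are assumptions, and the deep-learning pipeline itself is not reproduced.

```python
# Pixel-wise segmentation metrics from two binary masks.
import numpy as np

def segmentation_metrics(pred, truth):
    tp = np.logical_and(pred, truth).sum()     # true positive pixels
    fp = np.logical_and(pred, ~truth).sum()    # false positives
    fn = np.logical_and(~pred, truth).sum()    # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)              # intersection over union
    return precision, recall, f_measure, jaccard

rng = np.random.default_rng(0)
truth = rng.random((512, 512)) > 0.5           # toy ground-truth mask
pred = truth.copy()
pred[:64] = ~pred[:64]                         # imperfect prediction
print(segmentation_metrics(pred, truth))
```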


2019
Vol 16 (1)
pp. 87
Author(s):
Ni Putu Mira Diantari Sadia
Agus Fredy Maradona

The purpose of this study was to determine the importance of the role of corporate value in the manufacturing industry among public companies in Indonesia. Specifically, this study examines whether company size and ownership structure play a role in increasing company value, especially through the capital structure. The study focuses on manufacturing companies in Indonesia listed on the Indonesia Stock Exchange (IDX). Company samples were determined by purposive sampling, with the criterion of being a manufacturing company listed on the Indonesia Stock Exchange during 2016; data were collected through documentation study and analyzed using path analysis.

The results show a minimum company size of 12.74 and a maximum of 30.87, with an average of 23.98 and a standard deviation of 4.87. The institutional ownership variable has a minimum value of 0.87 and a maximum of 99.38, with an average of 59.57 and a standard deviation of 29.22. The capital structure variable has a minimum of 10.52 and a maximum of 1,658.82, with an average of 133.98 and a standard deviation of 19.33. For the company value variable, the minimum is 0.14 and the maximum is 62.78, with an average of 3.35 and a standard deviation of 20.41. Considering the cut-off values and the goodness-of-fit results, the model meets the criteria well and is feasible for further testing.

