Text Clustering Algorithm of Co-Occurrence Word Based on Association-Rule Mining

2014 ◽  
Vol 599-601 ◽  
pp. 1749-1752
Author(s):  
Chun Xia Jin ◽  
Hui Zhang ◽  
Qiu Chan Bai

Analysis of text features shows that co-occurring words in a document express its topic information more strongly and more accurately. This paper therefore proposes a text clustering algorithm based on word co-occurrence and association-rule mining. The method uses association-rule mining to extract the word co-occurrences that express the topic information of a document. The co-occurrence words are then used to build a document model and a co-occurrence word similarity measure, and a hierarchical clustering algorithm based on word co-occurrence performs the text clustering. Experimental results show that the proposed method improves the efficiency and accuracy of text clustering compared with other algorithms.
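The abstract above does not give the mining or similarity details, so the following is only a minimal sketch under stated assumptions: co-occurrences are taken to be word pairs that appear together in a minimum fraction of a document's sentences, and document similarity is taken to be the Jaccard overlap of those pair sets. The names `cooccurrence_pairs` and `jaccard`, the `min_support` parameter, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(sentences, min_support=1.0):
    """Return word pairs that co-occur in at least min_support of the sentences."""
    counts = Counter()
    for sentence in sentences:
        # Sort the unique words so each pair has one canonical ordering.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    threshold = min_support * len(sentences)
    return {pair for pair, n in counts.items() if n >= threshold}


def jaccard(a, b):
    """Jaccard similarity between two sets of co-occurrence pairs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy documents, each given as a list of sentences.
doc_a = ["text clustering groups documents", "clustering groups similar text"]
doc_b = ["text clustering uses similarity", "similarity drives text clustering"]
doc_c = ["bug reports need triage", "triage assigns bug reports"]

pairs_a = cooccurrence_pairs(doc_a)
pairs_b = cooccurrence_pairs(doc_b)
pairs_c = cooccurrence_pairs(doc_c)

print(jaccard(pairs_a, pairs_b))  # 0.2: the two documents share ("clustering", "text")
print(jaccard(pairs_a, pairs_c))  # 0.0: no shared co-occurrence pairs
```

A pairwise similarity of this kind could serve as the input to an agglomerative (hierarchical) clustering step; that step is not shown here.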

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes, and it is called an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and they are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed on the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
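One possible reading of "grouping on the highest similarity value with a mean threshold" is sketched below. It is an illustration only: it swaps the WordNet-based Jiang-Conrath measure for plain bag-of-words cosine similarity, and it assumes that a sentence is paired with its most similar peer only when that similarity exceeds the mean of all pairwise similarities. The function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(sentences):
    """Map each sentence index to its most similar peer, if above the mean similarity."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sims = {(i, j): cosine(bags[i], bags[j])
            for i in range(n) for j in range(i + 1, n)}
    mean = sum(sims.values()) / len(sims)  # assumed "mean threshold"
    matches = {}
    for i in range(n):
        candidates = [(sims[min(i, j), max(i, j)], j) for j in range(n) if j != i]
        score, j = max(candidates, key=lambda t: t[0])
        if score > mean:
            matches[i] = j
    return matches


# Hypothetical sample sentences.
sentences = [
    "bank approved loan request",
    "bank rejected loan request",
    "river flooded small town",
    "river flooded small village",
]
print(best_matches(sentences))  # {0: 1, 1: 0, 2: 3, 3: 2}
```

Here the two loan sentences and the two river sentences each score 0.75 against their partner and 0 against everything else, so the mean threshold of 0.25 separates the two groups.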


Author(s):  
Meera Sharma ◽  
Abhishek Tandon ◽  
Madhu Kumari ◽  
V. B. Singh

Bug triaging is the process of deciding what to do with newly arriving bug reports. In this paper, we mine association rules to predict the assignee of a newly reported bug from different bug attributes, namely severity, priority, component and operating system. To deal with the problem of large data sets, we divide the large data set into subsets using the k-means clustering algorithm. We use an Apriori algorithm in MATLAB to generate association rules, and we extract the association rules for the top 5 assignees in each cluster. The proposed method has been empirically validated on 14,696 bug reports from the Mozilla open source software projects Seamonkey, Firefox and Bugzilla. In our approach, we observe that with these attributes (severity, priority, component and operating system) as antecedents, essential rules outnumber redundant rules, whereas in [M. Sharma and V. B. Singh, Clustering-based association rule mining for bug assignee prediction, Int. J. Business Intell. Data Mining 11(2) (2017) 130–150] essential rules are fewer than redundant rules in every cluster. The proposed method provides an improvement over existing techniques for the bug assignment problem.
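The idea of rules with bug attributes as antecedents and the assignee as consequent can be illustrated with a small sketch. This is not the authors' MATLAB Apriori implementation: it is a simplified Python example that counts only full attribute combinations and omits both the k-means partitioning and the essential/redundant rule analysis. The bug reports, the thresholds, and the name `mine_rules` are hypothetical.

```python
from collections import Counter

# Hypothetical bug reports: (severity, component, assignee).
reports = [
    ("major", "ui", "alice"),
    ("major", "ui", "alice"),
    ("major", "ui", "bob"),
    ("minor", "network", "bob"),
    ("minor", "network", "bob"),
]


def mine_rules(rows, min_support=0.2, min_confidence=0.6):
    """Return (antecedent, assignee, support, confidence) for rules passing both thresholds."""
    antecedent_counts = Counter(row[:-1] for row in rows)  # attributes only
    rule_counts = Counter(rows)                            # attributes plus assignee
    rules = []
    for row, count in rule_counts.items():
        antecedent, assignee = row[:-1], row[-1]
        support = count / len(rows)                        # share of all reports
        confidence = count / antecedent_counts[antecedent] # share within the antecedent
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, assignee, support, confidence))
    return rules


for antecedent, assignee, support, confidence in mine_rules(reports):
    print(antecedent, "->", assignee, round(support, 2), round(confidence, 2))
```

On this toy data, the rule (major, ui) -> alice passes with support 0.4 and confidence 2/3, while (major, ui) -> bob is dropped because its confidence is only 1/3.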


2015 ◽  
Vol 6 (2) ◽  
Author(s):  
Rizal Setya Perdana ◽  
Umi Laili Yuhana

Software quality is a research area in software engineering that plays a substantial role in building good-quality software systems. Predicting software defects, which arise from deviations from the specification process or from anything that may cause operational failure, has been a research topic for more than 30 years. This paper focuses on predicting defects that occur in program code (code defects). The approach uses software code patterns in the NASA data set that potentially cause defects in order to predict defects. The patterns are found using Association Rule Mining with Cumulative Support Thresholds, which automatically produces the optimal support and confidence values without requiring user input. In testing, the automatic prediction of software code defects achieved an accuracy of 82.35%.

