document cluster
Recently Published Documents


TOTAL DOCUMENTS 18 (FIVE YEARS 3)
H-INDEX 6 (FIVE YEARS 1)

2021 ◽  
pp. 594-607
Author(s):  
Chandrakant D. Patel ◽  
Jayeshkumar M. Patel

Text classification, document clustering, and similar endeavors are important research areas underpinning a wide range of problems and applications in the area of web intelligence. A typical approach, when clustering or otherwise classifying documents algorithmically, is to represent a web document as a vector of numbers derived from the frequencies of words in that document, where the words are taken from the entire collection of documents under consideration. An established approach is to use a list of “stop” words (also known as a ‘stop list’), which identifies frequent words such as ‘and’ and ‘or’ that are unlikely to help distinguish between documents. Words in the stop list are therefore not included in the vector that represents a document. Current stop lists are arguably outdated in light of fluctuations in word usage, and arguably also naive with respect to certain candidate generic stop lists, which calls their suitability into question. We explore this by developing new stop lists using a rule-based method.
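A minimal Python sketch of the vector representation described above, assuming naive regex tokenization and a hypothetical seven-word `STOP_LIST`. It only shows how a stop list removes frequent words from the shared vocabulary before term-frequency vectors are built; it does not reproduce the authors' rule-based procedure for deriving new stop lists.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_LIST = {"and", "or", "the", "a", "of", "to", "in"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs, stop_list):
    """Collect every non-stop word from the whole collection, in sorted order."""
    vocab = set()
    for doc in docs:
        vocab.update(w for w in tokenize(doc) if w not in stop_list)
    return sorted(vocab)

def to_vector(doc, vocab):
    """Represent one document as word frequencies over the shared vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]

docs = ["Cats and dogs", "Dogs or wolves and dogs"]
vocab = build_vocabulary(docs, STOP_LIST)
vectors = [to_vector(d, vocab) for d in docs]
print(vocab)    # ['cats', 'dogs', 'wolves']
print(vectors)  # [[1, 1, 0], [0, 2, 1]]
```

Because "and" and "or" never enter the vocabulary, they contribute no dimensions to any document vector, which is exactly the effect a stop list is meant to have.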


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Hua Dai ◽  
Xuelong Dai ◽  
Xiao Li ◽  
Xun Yi ◽  
Fu Xiao ◽  
...  

Because of privacy concerns, cloud service users choose to encrypt their personal data before outsourcing it to the cloud. However, efficient search over encrypted cloud data is difficult to achieve, so designing an efficient and accurate search scheme over large-scale encrypted cloud data is a challenge. In this paper, we integrate the bisecting k-means algorithm with a multibranch tree structure and propose the α-filtering tree search scheme based on bisecting k-means clusters. The novel index tree is built bottom-up, and a greedy depth-first algorithm filters out nonrelevant document clusters by calculating the relevance score between the filtering vector and the query vector. The α-filtering tree improves efficiency without loss of search accuracy. Experiments on a real-world dataset demonstrate the effectiveness of our scheme.
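The cluster-filtering idea can be illustrated with a simplified plaintext sketch. Everything here is an assumption rather than the authors' α-filtering tree: documents and queries are unencrypted non-negative vectors scored by inner product, the tree is binary and built top-down by recursive 2-means splits (the paper builds a multibranch tree bottom-up), and each node's hypothetical `bound` vector is the element-wise maximum of its documents' vectors, so `q @ bound` upper-bounds every relevance score below that node and whole clusters can be skipped during a greedy depth-first search.

```python
import heapq
import numpy as np

def split_in_two(X, iters=20, seed=0):
    """Partition the rows of X into two groups with plain 2-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def build_tree(X, ids, leaf_size=2, seed=0):
    """Recursively bisect the documents in ids; each node stores a hypothetical
    'bound' vector, the element-wise max of its documents' vectors."""
    node = {"ids": ids, "bound": X[ids].max(axis=0), "children": []}
    if len(ids) > leaf_size:
        labels = split_in_two(X[ids], seed=seed)
        parts = [ids[labels == c] for c in range(2)]
        if all(len(p) > 0 for p in parts):  # stop if the split is degenerate
            node["children"] = [build_tree(X, p, leaf_size, seed) for p in parts]
    return node

def search(node, X, q, k, best):
    """Greedy depth-first top-k search that prunes clusters which cannot win.

    best is a min-heap of (score, doc_id) holding the current top-k results.
    """
    # For non-negative X and q, q @ bound >= q @ x for every document x below node.
    if len(best) == k and q @ node["bound"] <= best[0][0]:
        return  # no document in this cluster can enter the top-k
    if not node["children"]:
        for i in node["ids"]:
            heapq.heappush(best, (float(q @ X[i]), int(i)))
            if len(best) > k:
                heapq.heappop(best)
        return
    # Greedy order: descend into the child with the larger upper bound first.
    for child in sorted(node["children"], key=lambda c: -float(q @ c["bound"])):
        search(child, X, q, k, best)

def top_k(X, q, k, leaf_size=2):
    """Return the ids of the k documents with the highest inner-product score."""
    X = np.asarray(X, dtype=float)
    tree = build_tree(X, np.arange(len(X)), leaf_size)
    best = []
    search(tree, X, np.asarray(q, dtype=float), k, best)
    return [i for _, i in sorted(best, reverse=True)]

docs = [[3, 0, 0], [2, 1, 0], [0, 3, 1], [0, 2, 2], [1, 0, 3], [0, 0, 4]]
print(top_k(docs, [1, 0, 0], k=2))  # [0, 1]
print(top_k(docs, [0, 2, 1], k=3))  # [2, 3, 5]
```

Because the bound never underestimates the best score in a subtree, pruning returns the same top-k documents as a brute-force scan (up to tie handling) while scoring fewer documents, which is the kind of efficiency-without-accuracy-loss trade the abstract describes.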


Author(s):  
Ravi Kumar Saidala ◽  
Nagaraju Devarakonda

This article proposes a new data clustering method that finds optimal clusters by incorporating chaotic maps into the standard NOA. NOA, a recently developed optimization technique, has been shown to generate optimal results efficiently at the lowest solution cost. Incorporating chaotic maps into metaheuristics helps algorithms diversify their search in both phases, exploring and exploiting the solution space more thoroughly. To make NOA more efficient and to avoid premature convergence, this work incorporates chaotic maps into it; the resulting variants are termed CNOAs. Ten different chaotic maps are incorporated individually into the standard NOA to test optimization performance. CNOA is first benchmarked on 23 standard functions. The new clustering method, which uses CNOA, is then tested by solving 10 UCI data clustering problems and 4 web document clustering problems. Comparisons are made using statistical and graphical results, and the simulations and comparisons show the superiority of the proposed optimal clustering algorithm.
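The abstract does not give NOA's update rules, so the sketch below does not implement NOA or CNOA. Under that caveat, it only illustrates the general idea of driving a metaheuristic with a chaotic map: the logistic map with r = 4 (one common chaotic map) replaces a uniform random number generator in a hypothetical greedy perturbation search over k centroids that minimizes the sum of squared errors (SSE). The function names and parameters are assumptions made for illustration.

```python
import numpy as np

def logistic_map(z0=0.7, r=4.0):
    """Yield an endless chaotic sequence in [0, 1] from z_{n+1} = r * z_n * (1 - z_n)."""
    z = z0
    while True:
        z = r * z * (1.0 - z)
        yield z

def draw(chaos, shape):
    """Fill an array of the given shape with successive chaotic values."""
    n = int(np.prod(shape))
    return np.array([next(chaos) for _ in range(n)]).reshape(shape)

def sse(X, centroids):
    """Clustering cost: squared distance from each point to its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

def chaotic_centroid_search(X, k, iters=300, step=0.3, z0=0.7):
    """Hypothetical greedy search (not NOA/CNOA): perturb the centroids with
    chaotic numbers and keep a candidate only if it lowers the SSE."""
    X = np.asarray(X, dtype=float)
    chaos = logistic_map(z0)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Chaotic initialisation inside the bounding box of the data.
    best = lo + draw(chaos, (k, X.shape[1])) * (hi - lo)
    best_cost = sse(X, best)
    for t in range(iters):
        scale = step * (1.0 - t / iters)             # shrink moves over time
        delta = 2.0 * draw(chaos, best.shape) - 1.0  # map [0, 1] to [-1, 1]
        cand = best + scale * delta * (hi - lo)
        cost = sse(X, cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, cost = chaotic_centroid_search(points, k=2)
print(centroids.round(2), round(cost, 3))
```

The chaotic sequence is deterministic for a given `z0`, so runs are reproducible; swapping in a different chaotic map only requires replacing `logistic_map`, which loosely mirrors how the article tests ten maps individually.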


Author(s):  
Katsuhiro Honda ◽  
Takaya Nakano ◽  
Chi-Hyon Oh ◽  
Seiki Ubukata ◽  
...  

The interpretability of fuzzy co-cluster partitions was shown to improve when exclusive penalties were introduced on both object and item memberships, whereas conventional fuzzy co-clustering imposed exclusivity only on object memberships. In real applications, however, fully exclusive constraints may have inappropriate effects on some items, and partially exclusive penalties should be imposed to reflect the characteristics of each item. For example, in customer-product analysis, a product's degree of popularity may measure its compatibility with multiple customer groups, so exclusive penalties should be imposed only on certain specific products. In this paper, the conventional exclusive constraint model is further modified by imposing exclusive penalties only on selected items, and the effects of partially exclusive partitions are demonstrated from the viewpoints of both partition quality and collaborative filtering applicability. In a document-keyword analysis experiment, word class is shown to be useful for selecting which keywords to treat exclusively, improving the interpretability of document clusters. In a collaborative filtering experiment, recommendation capability is shown to improve when intrinsic differences in the popularity of each product are taken into account.
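For orientation, here is a sketch of entropy-regularized fuzzy co-clustering (an FCCM-style model): object memberships `U` are exclusive across clusters (each object's memberships sum to 1), while item memberships `W` are normalized over items within each cluster. The `exclusive` mask and `gamma` penalty, which subtract an item's membership in the other clusters from its score, are a hypothetical stand-in for a partially exclusive item penalty; they are not the formulation used in the paper.

```python
import numpy as np

def softmax(s, axis):
    """Numerically stable softmax along the given axis."""
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def fccm(X, C, lam_u=0.5, lam_w=0.5, exclusive=None, gamma=1.0, iters=50, seed=0):
    """Entropy-regularized fuzzy co-clustering of an object-item matrix X (n x m).

    U[c, i]: object memberships, summing to 1 over clusters for each object.
    W[c, j]: item memberships, summing to 1 over items within each cluster.
    exclusive: hypothetical boolean mask of items that get an extra penalty.
    """
    n, m = X.shape
    rng = np.random.default_rng(seed)
    W = softmax(rng.random((C, m)), axis=1)
    mask = np.zeros(m, dtype=bool) if exclusive is None else np.asarray(exclusive, dtype=bool)
    U = np.full((C, n), 1.0 / C)
    for _ in range(iters):
        # Object update: support of cluster c for object i is sum_j W[c, j] * X[i, j].
        U = softmax((W @ X.T) / lam_u, axis=0)
        # Item update: support of cluster c for item j is sum_i U[c, i] * X[i, j].
        score = U @ X
        # Hypothetical penalty, masked items only: subtract gamma times the item's
        # membership in all other clusters, discouraging items shared across clusters.
        others = W.sum(axis=0, keepdims=True) - W
        score = score - gamma * others * mask[None, :]
        W = softmax(score / lam_w, axis=1)
    return U, W

X = np.array([[3, 3, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]], dtype=float)
U, W = fccm(X, C=2, exclusive=[False, True, False, True])
print(U.round(2))
print(W.round(2))
```

Setting `exclusive=None` gives the plain model with no item penalty, so the mask is the only place where item-specific treatment enters, which is the kind of per-item selectivity the abstract argues for.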


2011 ◽  
Vol 6 (1) ◽  
pp. 43-58
Author(s):  
Fabiano Fernandes dos Santos ◽  
Veronica Oliveira de Carvalho ◽  
Solange Oliveira Rezende

Author(s):  
Zhang Xiaodan ◽  
Hu Xiaohua ◽  
Xia Jiali ◽  
Zhou Xiaohua ◽  
Achananuparp Palakorn

In this article, we present a graph-based knowledge representation for clustering biomedical digital library literature. An efficient clustering method is developed to identify the ontology-enriched k-highest density term subgraphs that capture the core semantic relationship information about each document cluster. The distance between each document and the k term graph clusters is calculated, and each document is then assigned to the closest term cluster. Extensive experimental results on two PubMed document sets (Disease10 and OHSUMED23) show that our approach is comparable to spherical k-means. The contributions of our approach are as follows: (1) we provide two corpus-level graph representations to improve document clustering, a term co-occurrence graph and an abstract-title graph; (2) we develop an efficient and effective document clustering algorithm by identifying k distinguishable class-specific core term subgraphs using terms’ global and local importance information; and (3) the identified term clusters give a meaningful explanation of the document clustering results.
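A small sketch of two ingredients mentioned above: a corpus-level term co-occurrence graph and the assignment of each document to its closest term cluster. The term clusters are supplied by hand here instead of being mined as ontology-enriched k-highest density subgraphs, `density` uses the simple total-edge-weight-per-term definition, and the Jaccard distance in `assign` is an assumed choice, since the abstract does not specify the distance.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(docs):
    """Weighted term graph: edge (a, b) counts the documents containing both terms."""
    edges = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges

def density(terms, edges):
    """Total weight of the edges inside a term set divided by its number of terms."""
    inside = sum(w for (a, b), w in edges.items() if a in terms and b in terms)
    return inside / len(terms)

def assign(doc_terms, term_clusters):
    """Index of the term cluster with the smallest Jaccard distance to the document."""
    doc = set(doc_terms)

    def jaccard_distance(cluster):
        return 1.0 - len(doc & cluster) / len(doc | cluster)

    return min(range(len(term_clusters)), key=lambda c: jaccard_distance(term_clusters[c]))

docs = [["gene", "protein", "cell"], ["gene", "protein"],
        ["heart", "blood", "pressure"], ["blood", "pressure"]]
edges = cooccurrence_graph(docs)
clusters = [{"gene", "protein", "cell"}, {"heart", "blood", "pressure"}]  # hand-picked
print(edges[("gene", "protein")])                         # 2
print(round(density(clusters[0], edges), 3))              # 1.333
print(assign(["protein", "gene", "mutation"], clusters))  # 0
print(assign(["blood", "pressure"], clusters))            # 1
```

Because each document is matched against a small set of named terms, the chosen cluster can be explained by listing the overlapping terms, which echoes the interpretability point in contribution (3).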

