Self-Tuned Descriptive Document Clustering using a Predictive Network

Author(s):  
K. Syed Kousar Niasi ◽  
P. Sidheshwari

A document network is defined as a collection of documents that are connected by links. Document clustering has become ubiquitous due to the widespread use of online databases, such as academic search engines. Topic modeling has become a widely used tool for document management because of its superior performance. However, few topic models differentiate the importance of documents on different topics. In this survey, we apply text ranking algorithms to documents to improve topic modeling and propose incorporating link-based ranking into topic modeling. Text summarization plays an important role in information retrieval. The snippets that web search engines generate for every query result are one application of text summarization. In existing text summarization techniques, indexing is based on the words present in the document, and the index consists of an array of posting lists. Document features such as term frequency and text length are used to assign indexing weights to words. Specifically, topical ranking is used to compute the topic-level rank of documents, which indicates the importance of documents on specific topics. By taking the topical rank of a document as the probability that the document is involved in the corresponding topic, a generalized relation is built between ranking and topic modeling. In this thesis, we implement a topic discovery model for a large number of medical databases. The model is trained on the datasets, and key terms are extracted using text mining and fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy problem in health and medical corpora and provides a new method to estimate the number of topics.
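The abstract does not say which link-based ranking algorithm is used, so the following is only a minimal sketch of the general idea, assuming plain PageRank computed by power iteration over a document link graph. The function name `pagerank`, the damping value, and the toy network are illustrative assumptions, not the authors' topical ranking method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank documents in a directed link graph by power iteration.

    `links` maps each document id to the list of documents it links to.
    This is generic PageRank, used here as a stand-in for link-based ranking.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every document receives the same baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this document's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling document: spread its rank evenly over all documents.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy network: d2, d3 and d4 all link to d1.
toy_links = {"d1": ["d2"], "d2": ["d1"], "d3": ["d1"], "d4": ["d1", "d2"]}
scores = pagerank(toy_links)
top_document = max(scores, key=scores.get)
```

The scores sum to one, so they can be read as a probability distribution over documents. That is the property the abstract relies on when it treats a document's rank as the probability of its involvement in a topic.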

2021 ◽  
Vol 172 ◽  
pp. 114652
Author(s):  
Nabil Alami ◽  
Mohammed Meknassi ◽  
Noureddine En-nahnahi ◽  
Yassine El Adlouni ◽  
Ouafae Ammor

2020 ◽  
Vol 13 (44) ◽  
pp. 4474-4482
Author(s):  
Vasantha Kumari Garbhapu ◽  

Objective: To compare topic modeling techniques, since the no free lunch theorem states that, under a uniform distribution over search problems, all machine learning algorithms perform equally. Hence, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer on an English Bible data set, which has not been studied yet. Methods: This comparative study is divided into three levels. In the first level, Bible data was extracted from the sources and preprocessed to remove words and characters that were not useful for obtaining the semantic structures or patterns needed to build a meaningful corpus. In the second level, the preprocessed data were converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency – Inverse Document Frequency) was used to assess how relevant a word is to a document in the corpus. In the third level, the Latent Semantic Analysis and Latent Dirichlet Allocation methods were applied to the resulting corpus to study the feasibility of the techniques. Findings: Based on our evaluation, we observed that LDA achieves 60 to 75% better performance than LSA on within-corpus document similarity and on document similarity with an unseen document. Additionally, LDA showed a better coherence score (0.58018) than LSA (0.50395). Moreover, for any word within the corpus, word association showed better results with LDA. Some words are homonyms whose meaning depends on context; for example, in the Bible, "bear" carries meanings of both punishment and birth. In our study, LDA word association results are much closer to human word associations than those of LSA. Novelty: LDA was found to be the computationally efficient and interpretable method for the English Bible data set of the New International Version, a data set that had not previously been created. Keywords: Topic modeling; LSA; LDA; word association; document similarity; Bible data set
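The second level of the method (bag of words followed by TF-IDF weighting) can be illustrated with a small sketch. The abstract does not state which TF-IDF variant was used, so the version below assumes length-normalized term frequency and idf = log(N / df). The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already lowercased and tokenized.
corpus = [
    ["light", "shine", "darkness"],
    ["light", "world"],
    ["bear", "fruit", "light"],
]
weights = tf_idf(corpus)
```

In this toy corpus "light" occurs in every document, so its weight is zero everywhere, while terms confined to one document receive positive weight. This is the sense in which TF-IDF measures how relevant a word is to a particular document.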


Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, the design of a chat-bot that can answer questions related to college information and compare colleges will be very useful and novel. Methods: In this paper, a novel conversational interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, this chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, this chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in the NLP capabilities of information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank are first reviewed and compared in this paper before they are implemented in the chat-bot design. This chat-bot improves the user experience considerably by returning concise answers to specific queries, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.


Author(s):  
Tiffany Renteria-Vazquez ◽  
Warren S. Brown ◽  
Christine Kang ◽  
Mark Graves ◽  
Fulvia Castelli ◽  
...  

2018 ◽  
Vol 10 (11) ◽  
pp. 112
Author(s):  
Jialu Xu ◽  
Feiyue Ye

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted. Understanding the queries and mining query intention is critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. The calculation of query relevance is based on query information, using query co-occurrence and query embedding vectors. Adding ranking to query-URL pairs allows the strength of the relation between a query and a URL to be calculated more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of our proposed query recommendation algorithm, which achieves superior performance compared to other algorithms.


Author(s):  
Nina Rizun

In this chapter, the authors present the results of developing a text-mining methodology for increasing the reliability of the functioning of a Socio-technical System (STS). Taking into account the revealed strengths and weaknesses of the Discriminant and Probabilistic approaches to Latent Semantic Relations analysis with respect to abstracting and summarization, the Methodology of Two-level Single Document Summarization was developed. The Methodology has the following novel elements: it is based on obtaining a multi-level topical framework of the document (abstracting), and it uses the synergy of consistently combining the two approaches to identify conceptually significant elements of the text (summarization). Examples demonstrating the basic workability of the proposed Methodology are presented. Such approaches should help humans improve the quality of decision-making support for STS in real time.


2020 ◽  
Author(s):  
Mala Saraswat ◽  
Shampa Chakraverty

With the advent of e-commerce sites and social media, users express their preferences and tastes freely through user-generated content such as reviews and comments. In order to promote cross-selling, e-commerce sites such as eBay and Amazon regularly use such inputs from multiple domains and suggest items in which users may be interested. In this paper, we propose a topic coherence-based cross-domain recommender model. The core concept is to use topic modeling to extract topics from user-generated content such as reviews and combine them with reliable semantic coherence techniques to link different domains, using Wikipedia as a reference corpus. We experiment with different topic coherence methods such as pointwise mutual information (PMI) and explicit semantic analysis (ESA). The experimental results demonstrate that, compared with a cross-domain recommender system based on semantic clustering, our approach yields 22.6% higher precision when using PMI as the topic coherence measure and 54.4% higher precision when using ESA.
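PMI-based topic coherence can be sketched as the average pairwise PMI of a topic's words, with probabilities estimated from co-occurrence in a reference corpus. The abstract does not give the exact formulation, so the smoothing constant `eps`, the document-level co-occurrence window, and the function name `pmi_coherence` are assumptions. The tiny reference corpus below is a hypothetical stand-in for Wikipedia.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, reference_docs, eps=1e-12):
    """Average pairwise PMI of `topic_words`.

    Probabilities are estimated from document-level co-occurrence in
    `reference_docs`; `eps` is an assumed smoothing term that avoids log(0).
    """
    doc_sets = [set(doc) for doc in reference_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)


# Hypothetical reference corpus (the paper uses Wikipedia instead).
reference = [
    ["camera", "lens", "zoom"],
    ["camera", "lens"],
    ["novel", "author"],
    ["novel", "plot", "author"],
]
coherent = pmi_coherence(["camera", "lens"], reference)
mixed = pmi_coherence(["camera", "novel"], reference)
```

Words that tend to appear together ("camera", "lens") score above zero, while words that never co-occur in the reference corpus score strongly negative. A coherence measure of this kind can therefore indicate whether topics drawn from different domains are semantically related.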


2013 ◽  
Vol 22 (05) ◽  
pp. 1360008
Author(s):  
PATRICIA J. CROSSNO ◽  
ANDREW T. WILSON ◽  
TIMOTHY M. SHEAD ◽  
WARREN L. DAVIS ◽  
DANIEL M. DUNLAVY

We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
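The bipartite matching of LSA concepts with LDA topics by cosine similarity can be sketched as follows. The abstract does not describe how TopicView selects edges, so the fixed `threshold`, the function names, and the toy term-weight vectors are illustrative assumptions and not the application's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_factors(lsa_concepts, lda_topics, threshold=0.5):
    """Return bipartite edges (concept index, topic index, similarity).

    An edge is kept when the cosine similarity reaches `threshold`
    (an assumed cut-off, chosen here only for illustration).
    """
    edges = []
    for i, concept in enumerate(lsa_concepts):
        for j, topic in enumerate(lda_topics):
            sim = cosine(concept, topic)
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


# Hypothetical factors over a shared three-term vocabulary.
lsa_concepts = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
lda_topics = [[0.1, 0.1, 0.8], [0.7, 0.3, 0.0]]
edges = match_factors(lsa_concepts, lda_topics)
pairs = [(i, j) for i, j, _ in edges]
```

Both models' factors must be expressed over the same vocabulary, in the same term order, for the comparison to be meaningful. In the toy data, concept 0 pairs with topic 1 and concept 1 pairs with topic 0.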

