Modified Firefly Algorithm and Fuzzy C-Mean Clustering Based Semantic Information Retrieval

Author(s):  
M. Subramaniam ◽  
A. Kathirvel ◽  
E. Sabitha ◽  
H. Anwar Basha

As the volume of electronic data grows, searching for and retrieving relevant information from the internet becomes an extremely difficult task. Information Retrieval (IR) systems typically return results based on the user's query keywords. This is no longer sufficient for the large volume of online data, and precision suffers because such systems consider only syntactic-level search. Many earlier search engines use a variety of techniques for semantic document extraction and measure relevance between documents with page-ranking methods, but these suffer from long search times. To reduce query search time, the research system implements a Modified Firefly Algorithm (MFA) combined with an Intelligent Ontology and Latent Dirichlet Allocation based Information Retrieval (IOLDAIR) model. In the proposed methodology, a set of web documents, Facebook comments, and tweets is taken as the dataset. The dataset is pre-processed using tokenization. A strong ontology is built from information collected from diverse websites. Keywords are identified and semantic analysis of the user query is performed through ontology matching based on Jaccard similarity. Features are extracted from the semantic analysis, and the optimal features are then selected with the Modified Firefly Algorithm (MFA). Fuzzy C-Means (FCM) clustering groups the relevant documents and ranks them. Finally, the relevant information is extracted with the IOLDAIR model. The main benefits of the proposed technique are higher relevancy, the capability of handling big data, and fast retrieval. Experimental results show that the presented method performs better than the previous system.
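A minimal sketch (not the authors' IOLDAIR code) of the Jaccard-based ontology matching step described above: the user query and each ontology concept are treated as token sets, and concepts whose Jaccard score clears a threshold are kept as semantic expansions of the query. The concept names and the threshold are illustrative assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_ontology(query: str, ontology_concepts: dict, threshold: float = 0.2):
    """Return ontology concepts whose labels/synonyms overlap with the query."""
    q_tokens = set(query.lower().split())
    matches = []
    for concept, terms in ontology_concepts.items():
        score = jaccard(q_tokens, {t.lower() for t in terms})
        if score >= threshold:
            matches.append((concept, score))
    return sorted(matches, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-ontology used only to show the call pattern.
    ontology = {
        "machine_learning": ["machine", "learning", "model", "training"],
        "information_retrieval": ["information", "retrieval", "search", "query", "document"],
    }
    print(match_ontology("semantic document retrieval for a user query", ontology))
```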

Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast information available on the internet. Even for a specific domain, such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, the design of a chat-bot that can answer questions about college information and compare colleges would be both useful and novel. Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it understands the user's query intent, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques from natural language processing (NLP) and text mining (TM). Results: The NLP capabilities for information retrieval and text summarization using the machine learning techniques Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank are reviewed and compared in this paper before being implemented in the chat-bot design. The chat-bot improves the user experience considerably by answering specific queries concisely, which takes less time than reading the entire document. Students, parents, and faculty can more efficiently get answers on a variety of topics, such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents. Conclusion: The purpose of this paper was to follow the advancement of NLP technologies and implement them in a novel application.
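As an illustration of the TextRank summarization skill mentioned in the abstract, the following sketch scores sentences with PageRank over a TF-IDF cosine-similarity graph. It is a generic TextRank variant, not the paper's implementation, and assumes scikit-learn, networkx, and the NLTK punkt tokenizer data are available.

```python
import networkx as nx
from nltk.tokenize import sent_tokenize            # requires the nltk "punkt" data
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(text: str, n_sentences: int = 3) -> str:
    """Extractive summary: keep the n highest-ranked sentences in original order."""
    sentences = sent_tokenize(text)
    if len(sentences) <= n_sentences:
        return text
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)                 # sentence-to-sentence similarity
    graph = nx.from_numpy_array(sim)               # weighted similarity graph
    scores = nx.pagerank(graph)                    # TextRank = PageRank on this graph
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```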


Natural Language Processing uses word embeddings to map words to vectors. The context vector is one technique for mapping words to vectors; it captures the importance of terms in the document corpus. Context vectors can be derived using various methods, such as neural networks, latent semantic analysis, and knowledge-base methods. This paper proposes a novel system that devises an enhanced context vector machine, called eCVM. eCVM is able to determine context phrases and their importance. eCVM uses latent semantic analysis, the existing context vector machine, dependency parsing, named entities, topics from latent Dirichlet allocation, and various word forms (nouns, adjectives, and verbs) to build the context. eCVM uses the context vector and the PageRank algorithm to find the importance of a term in a document and is tested on the BBC News dataset. The results of eCVM are compared with the state of the art for context derivation. The proposed system shows improved performance over existing systems on standard evaluation parameters.
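A small sketch, under stated assumptions, of the "context plus PageRank" idea in the eCVM abstract: candidate terms are linked when they co-occur in a sentence, and PageRank over the resulting graph estimates each term's importance in the document. This illustrates the generic technique only, not the eCVM system.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx


def term_importance(sentences: list) -> dict:
    """sentences: list of tokenized sentences (lists of candidate terms)."""
    weights = defaultdict(float)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1.0                 # co-occurrence weight
    graph = nx.Graph()
    graph.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items())
    return nx.pagerank(graph, weight="weight")     # higher score = more central term


if __name__ == "__main__":
    doc = [["context", "vector", "term"],
           ["context", "pagerank", "term"],
           ["vector", "importance"]]
    print(sorted(term_importance(doc).items(), key=lambda x: -x[1])[:3])
```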


2019 ◽  
Vol 12 (1) ◽  
pp. 105-116
Author(s):  
Qiuyu Zhu ◽  
Dongmei Li ◽  
Cong Dai ◽  
Qichen Han ◽  
Yi Lin

With the rapid development of the Internet, information retrieval models based on keyword matching no longer meet users' requirements, because people with different query histories often have different retrieval intentions. A user's query history often implies their interests. Therefore, it is of great importance to improve recall and precision by applying query history to the judgment of retrieval intentions. To this end, this article studies user query history and proposes a method to construct a user interest model from it. Accordingly, the authors design a model called PLSA-based Personalized Information Retrieval with Network Regularization. Finally, the model is applied to academic information retrieval and compared with Baidu Scholar and a personalized information retrieval model based on the probabilistic latent semantic analysis topic model. The experimental results show that the model can effectively extract topics and retrieve results that better satisfy users' requirements, and it noticeably improves retrieval quality. In addition, the retrieval model can be used not only in academic information retrieval, but also in personalized retrieval for microblog search, recommendation, and other tasks.
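The paper's PLSA model with network regularization is not reproduced here. As a rough stand-in sketch, the code below extracts topics from a user's query history with scikit-learn's NMF under a generalized Kullback-Leibler loss (closely related to PLSA), so the resulting topic mixtures could feed a personalized re-ranking step. All function and variable names are illustrative assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer


def query_history_topics(history: list, n_topics: int = 5, n_top_words: int = 8):
    """Fit a PLSA-like topic model (KL-loss NMF) over a user's past queries."""
    counts = CountVectorizer(stop_words="english").fit(history)
    X = counts.transform(history)
    nmf = NMF(n_components=n_topics, beta_loss="kullback-leibler",
              solver="mu", max_iter=500, random_state=0).fit(X)
    vocab = counts.get_feature_names_out()
    topics = [[vocab[i] for i in comp.argsort()[::-1][:n_top_words]]
              for comp in nmf.components_]
    # nmf.transform(counts.transform([new_query])) yields a topic mixture
    # that can be compared against document topic mixtures for re-ranking.
    return nmf, topics
```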


Author(s):  
Junaid Rashid ◽  
Syed Muhammad Adnan Shah ◽  
Aun Irtaza

Topic modeling is an effective text mining and information retrieval approach to organizing knowledge with various contents under a specific topic. Text documents in the form of news articles are growing rapidly on the web, and analysis of these documents is very important in the fields of text mining and information retrieval. Extracting meaningful information from these documents is a challenging task. One approach for discovering themes in text documents is topic modeling, but this approach still needs a new perspective to improve its performance. In topic modeling, documents contain topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach using the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining and achieves higher performance than its competitors LDA and LSA.
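The sketch below shows the generic idea behind k-means-style topic modeling, assuming scikit-learn: cluster TF-IDF document vectors with k-means and read each cluster centroid's top-weighted terms as that topic's words. The KTM paper's specific refinements are not reproduced.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def kmeans_topics(docs: list, n_topics: int = 10, n_top_words: int = 10):
    """Cluster documents and return (cluster label per doc, top words per cluster)."""
    vec = TfidfVectorizer(stop_words="english", max_df=0.9)
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(X)
    vocab = vec.get_feature_names_out()
    topics = [[vocab[i] for i in center.argsort()[::-1][:n_top_words]]
              for center in km.cluster_centers_]
    return km.labels_, topics
```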


Information retrieval systems are used to retrieve documents based on keyword search. Semantic-based information retrieval goes beyond standard information retrieval and uses related information to get documents from the corpus, but semantic retrieval of documents is not efficient enough in real time. Content from the user's profile is used for searching web documents: documents that exactly match the user's requirements are retrieved, which improves personalized retrieval. In this paper, a topic-modelling-based methodology is proposed to improve the accuracy of information retrieval for the user, using Latent Dirichlet Allocation (LDA) and Weighted Nearest Neighbor (WNN) models. The LDA model is developed to retrieve documents based on topics, and the topic-based retrieval is further improved with a personalization technique based on the WNN model. Experimental analysis of the combined personalization and semantic retrieval of documents shows improved precision compared with existing topic modeling.
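A hedged sketch of an LDA-plus-nearest-neighbour personalization pipeline of the kind outlined above (the paper's exact WNN weighting is not specified here): documents are mapped to LDA topic mixtures, a user profile is the mean mixture of previously relevant documents, and candidates are ranked by similarity to that profile.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_for_user(docs: list, user_doc_ids: list, n_topics: int = 20):
    """Rank documents for a user described by the ids of documents they liked."""
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(X)                               # doc-topic mixtures
    profile = theta[user_doc_ids].mean(axis=0, keepdims=True)  # user interest vector
    scores = cosine_similarity(profile, theta).ravel()         # personalised relevance
    return np.argsort(-scores)                                 # document ids, best first
```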


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text across a collection of documents. Friendbook infers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles among users, and recommends friends to users if their lifestyles are highly similar. Modeling a user's daily life as life documents, lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research discipline, and differing subjective views can cause misinterpretations. There is therefore an urgent need for an effective and feasible approach to check submitted research papers with the support of automated software. Text mining methods can solve the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with a synonym algorithm, which finds synonyms of text index-wise using the English WordNet dictionary; another variant, LSA without synonyms, finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
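To make the "LSA with synonyms" step concrete, the sketch below expands terms with WordNet synonyms before TF-IDF and truncated SVD (LSA) and measures similarity in the latent space. It is an illustration only, not the paper's system, and assumes scikit-learn plus the NLTK WordNet corpus are installed.

```python
from nltk.corpus import wordnet                     # requires the nltk "wordnet" data
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline


def expand_with_synonyms(text: str) -> str:
    """Append WordNet synonyms of each token to the text before vectorization."""
    tokens = text.lower().split()
    extra = {lemma.name().replace("_", " ")
             for tok in tokens for syn in wordnet.synsets(tok)
             for lemma in syn.lemmas()}
    return " ".join(tokens) + " " + " ".join(extra)


def lsa_similarity(docs: list, query: str, n_components: int = 100):
    """Cosine similarity of the query to each document in the LSA space."""
    expanded = [expand_with_synonyms(d) for d in docs]
    lsa = make_pipeline(TfidfVectorizer(stop_words="english"),
                        TruncatedSVD(n_components=min(n_components, len(docs) - 1)))
    doc_vecs = lsa.fit_transform(expanded)
    q_vec = lsa.transform([expand_with_synonyms(query)])
    return cosine_similarity(q_vec, doc_vecs).ravel()
```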


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Anis Zouaghi ◽  
Mounir Zrigui ◽  
Georges Antoniadis ◽  
Laroussi Merhbene

We propose a new approach for determining the adequate sense of Arabic words. To that end, we propose an algorithm based on information retrieval measures to identify the context of use that is closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences that indicate a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous words, an exact string-matching algorithm, and the corpus. We use measures from the information retrieval domain, Harman, Croft, and Okapi, combined with the Lesk algorithm, to assign the correct sense among those proposed.
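A generic simplified-Lesk sketch of the overlap idea that the abstract combines with IR weighting (Harman, Croft, Okapi); it is not the authors' Arabic system. Each candidate sense carries a "context of use" (example sentences), and the sense whose context shares the most words with the target sentence wins.

```python
def simplified_lesk(sentence: str, sense_contexts: dict) -> str:
    """sense_contexts maps a sense label to the sentences that illustrate it."""
    context_tokens = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, examples in sense_contexts.items():
        sense_tokens = set(" ".join(examples).lower().split())
        overlap = len(context_tokens & sense_tokens)   # raw word overlap
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    # Hypothetical English example, purely to show the call pattern.
    contexts = {
        "bank_river": ["the river bank was muddy after rain"],
        "bank_finance": ["she deposited money at the bank branch"],
    }
    print(simplified_lesk("he fished from the muddy bank of the river", contexts))
```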


2021 ◽  
pp. 1-11
Author(s):  
V.S. Anoop ◽  
P. Deepak ◽  
S. Asharaf

Online social networks are considered among the most disruptive platforms, where people communicate with each other on any topic, ranging from funny cat videos to cancer support. The widespread diffusion of mobile platforms such as smartphones has caused the number of messages shared on such platforms to grow rapidly, so more intelligent and scalable algorithms are needed for efficient extraction of useful information. This paper proposes a method for retrieving relevant information from social network messages using a distributional-semantics-based framework powered by topic modeling. The proposed framework combines Latent Dirichlet Allocation and a distributional representation of phrases (Phrase2Vec) for effective information retrieval from online social networks. Extensive and systematic experiments on messages collected from Twitter (tweets) show that this approach outperforms some state-of-the-art approaches in terms of precision and accuracy, and that better information retrieval is possible using the proposed method.
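A rough sketch of the LDA-plus-phrase-embedding combination described above, using gensim (an assumed choice); the paper's exact fusion of the two signals is not reproduced. Phrases are detected with gensim's Phrases model and embedded with Word2Vec, while LDA models topics over the same tweets.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Phrases, Word2Vec


def build_models(tweets: list, n_topics: int = 20):
    """tweets: list of tokenized messages (lists of strings)."""
    phrases = Phrases(tweets, min_count=2, threshold=5.0)
    phrased = [phrases[t] for t in tweets]        # e.g. "cancer support" -> "cancer_support"
    w2v = Word2Vec(phrased, vector_size=100, window=5, min_count=1)  # phrase embeddings
    dictionary = Dictionary(phrased)
    corpus = [dictionary.doc2bow(t) for t in phrased]
    lda = LdaModel(corpus, num_topics=n_topics, id2word=dictionary)  # topic signal
    return phrases, w2v, lda
```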


1984 ◽  
Vol 8 (2) ◽  
pp. 63-66 ◽  
Author(s):  
C.P.R. Dubois

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluative technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.

