Modified Firefly Algorithm and Fuzzy C-Mean Clustering Based Semantic Information Retrieval

Author(s):  
M. Subramaniam ◽  
A. Kathirvel ◽  
E. Sabitha ◽  
H. Anwar Basha

As the volume of electronic data grows, searching for and retrieving relevant information from the internet becomes an extremely difficult task. Information Retrieval (IR) systems typically return results based on the user's query keywords. This is no longer sufficient for the large volume of online data, and precision suffers because such systems consider only syntactic-level search. Many earlier search engines use a variety of techniques for semantic document extraction and measure relevance between documents with page-ranking methods, but these suffer from long search times. To reduce query search time, the research system implements a Modified Firefly Algorithm (MFA) combined with an Intelligent Ontology and Latent Dirichlet Allocation based Information Retrieval (IOLDAIR) model. In the proposed methodology, a set of web documents, Facebook comments, and tweets is taken as the dataset. The dataset is pre-processed using tokenization. A strong ontology is built from information collected from diverse websites. Keywords are identified and semantic analysis of the user query is performed through ontology matching based on Jaccard similarity. Features are extracted from the semantic analysis, and the optimal features are then selected with the Modified Firefly Algorithm (MFA). Fuzzy C-Means (FCM) clustering groups the relevant documents and ranks them. Finally, the relevant information is extracted with the IOLDAIR model. The main benefits of the proposed technique are higher relevancy, the capability of handling big data, and fast retrieval. Experimental results show that the presented method performs better than the previous system.
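A minimal sketch (not the authors' IOLDAIR code) of the Jaccard-based ontology matching step described above: the user query and each ontology concept are treated as token sets, and concepts whose Jaccard score clears a threshold are kept as semantic expansions of the query. The concept names and the threshold are illustrative assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_ontology(query: str, ontology_concepts: dict, threshold: float = 0.2):
    """Return ontology concepts whose labels/synonyms overlap with the query."""
    q_tokens = set(query.lower().split())
    matches = []
    for concept, terms in ontology_concepts.items():
        score = jaccard(q_tokens, {t.lower() for t in terms})
        if score >= threshold:
            matches.append((concept, score))
    return sorted(matches, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-ontology used only to show the call pattern.
    ontology = {
        "machine_learning": ["machine", "learning", "model", "training"],
        "information_retrieval": ["information", "retrieval", "search", "query", "document"],
    }
    print(match_ontology("semantic document retrieval for a user query", ontology))
```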

Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast information available on the internet. Even for a specific domain, such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, the design of a chat-bot that can answer questions about college information and compare colleges would be both useful and novel. Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it understands the user's query intent, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques from natural language processing (NLP) and text mining (TM). Results: The NLP capabilities for information retrieval and text summarization using the machine learning techniques Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank are reviewed and compared in this paper before being implemented in the chat-bot design. The chat-bot improves the user experience considerably by answering specific queries concisely, which takes less time than reading the entire document. Students, parents, and faculty can more efficiently get answers on a variety of topics, such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents. Conclusion: The purpose of this paper was to follow the advancement of NLP technologies and implement them in a novel application.
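As an illustration of the TextRank summarization skill mentioned in the abstract, the following sketch scores sentences with PageRank over a TF-IDF cosine-similarity graph. It is a generic TextRank variant, not the paper's implementation, and assumes scikit-learn, networkx, and the NLTK punkt tokenizer data are available.

```python
import networkx as nx
from nltk.tokenize import sent_tokenize            # requires the nltk "punkt" data
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(text: str, n_sentences: int = 3) -> str:
    """Extractive summary: keep the n highest-ranked sentences in original order."""
    sentences = sent_tokenize(text)
    if len(sentences) <= n_sentences:
        return text
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)                 # sentence-to-sentence similarity
    graph = nx.from_numpy_array(sim)               # weighted similarity graph
    scores = nx.pagerank(graph)                    # TextRank = PageRank on this graph
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```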


Natural Language Processing uses word embeddings to map words to vectors. The context vector is one technique for mapping words to vectors; it captures the importance of terms in the document corpus. Context vectors can be derived using various methods, such as neural networks, latent semantic analysis, and knowledge-base methods. This paper proposes a novel system that devises an enhanced context vector machine, called eCVM. eCVM is able to determine context phrases and their importance. eCVM uses latent semantic analysis, the existing context vector machine, dependency parsing, named entities, topics from latent Dirichlet allocation, and various word forms (nouns, adjectives, and verbs) to build the context. eCVM uses the context vector and the PageRank algorithm to find the importance of a term in a document and is tested on the BBC News dataset. The results of eCVM are compared with the state of the art for context derivation. The proposed system shows improved performance over existing systems on standard evaluation parameters.
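A small sketch, under stated assumptions, of the "context plus PageRank" idea in the eCVM abstract: candidate terms are linked when they co-occur in a sentence, and PageRank over the resulting graph estimates each term's importance in the document. This illustrates the generic technique only, not the eCVM system.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx


def term_importance(sentences: list) -> dict:
    """sentences: list of tokenized sentences (lists of candidate terms)."""
    weights = defaultdict(float)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1.0                 # co-occurrence weight
    graph = nx.Graph()
    graph.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items())
    return nx.pagerank(graph, weight="weight")     # higher score = more central term


if __name__ == "__main__":
    doc = [["context", "vector", "term"],
           ["context", "pagerank", "term"],
           ["vector", "importance"]]
    print(sorted(term_importance(doc).items(), key=lambda x: -x[1])[:3])
```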


2019 ◽  
Vol 12 (1) ◽  
pp. 105-116
Author(s):  
Qiuyu Zhu ◽  
Dongmei Li ◽  
Cong Dai ◽  
Qichen Han ◽  
Yi Lin

With the rapid development of the Internet, information retrieval models based on keyword matching no longer meet users' requirements, because people with different query histories often have different retrieval intentions. A user's query history often implies their interests. Therefore, it is of great importance to improve recall and precision by applying query history to the judgment of retrieval intentions. To this end, this article studies user query history and proposes a method to construct a user interest model from it. Accordingly, the authors design a model called PLSA-based Personalized Information Retrieval with Network Regularization. Finally, the model is applied to academic information retrieval and compared with Baidu Scholar and a personalized information retrieval model based on the probabilistic latent semantic analysis topic model. The experimental results show that the model can effectively extract topics and retrieve results that better satisfy users' requirements, and it noticeably improves retrieval quality. In addition, the retrieval model can be used not only in academic information retrieval, but also in personalized retrieval for microblog search, recommendation, and other tasks.
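The paper's PLSA model with network regularization is not reproduced here. As a rough stand-in sketch, the code below extracts topics from a user's query history with scikit-learn's NMF under a generalized Kullback-Leibler loss (closely related to PLSA), so the resulting topic mixtures could feed a personalized re-ranking step. All function and variable names are illustrative assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer


def query_history_topics(history: list, n_topics: int = 5, n_top_words: int = 8):
    """Fit a PLSA-like topic model (KL-loss NMF) over a user's past queries."""
    counts = CountVectorizer(stop_words="english").fit(history)
    X = counts.transform(history)
    nmf = NMF(n_components=n_topics, beta_loss="kullback-leibler",
              solver="mu", max_iter=500, random_state=0).fit(X)
    vocab = counts.get_feature_names_out()
    topics = [[vocab[i] for i in comp.argsort()[::-1][:n_top_words]]
              for comp in nmf.components_]
    # nmf.transform(counts.transform([new_query])) yields a topic mixture
    # that can be compared against document topic mixtures for re-ranking.
    return nmf, topics
```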


Author(s):  
Junaid Rashid ◽  
Syed Muhammad Adnan Shah ◽  
Aun Irtaza

Topic modeling is an effective text mining and information retrieval approach to organizing knowledge with various contents under a specific topic. Text documents in the form of news articles are growing rapidly on the web, and analysis of these documents is very important in the fields of text mining and information retrieval. Extracting meaningful information from these documents is a challenging task. One approach for discovering themes in text documents is topic modeling, but this approach still needs a new perspective to improve its performance. In topic modeling, documents contain topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach using the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining and achieves higher performance than its competitors LDA and LSA.
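The sketch below shows the generic idea behind k-means-style topic modeling, assuming scikit-learn: cluster TF-IDF document vectors with k-means and read each cluster centroid's top-weighted terms as that topic's words. The KTM paper's specific refinements are not reproduced.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def kmeans_topics(docs: list, n_topics: int = 10, n_top_words: int = 10):
    """Cluster documents and return (cluster label per doc, top words per cluster)."""
    vec = TfidfVectorizer(stop_words="english", max_df=0.9)
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(X)
    vocab = vec.get_feature_names_out()
    topics = [[vocab[i] for i in center.argsort()[::-1][:n_top_words]]
              for center in km.cluster_centers_]
    return km.labels_, topics
```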


Information retrieval systems are used to retrieve documents based on keyword search. Semantic-based information retrieval goes beyond standard information retrieval and uses related information to get documents from the corpus, but semantic retrieval of documents is not efficient enough in real time. Content from the user's profile is used for searching web documents: documents that exactly match the user's requirements are retrieved, which improves personalized retrieval. In this paper, a topic-modelling-based methodology is proposed to improve the accuracy of information retrieval for the user, using Latent Dirichlet Allocation (LDA) and Weighted Nearest Neighbor (WNN) models. The LDA model is developed to retrieve documents based on topics, and the topic-based retrieval is further improved with a personalization technique based on the WNN model. Experimental analysis of the combined personalization and semantic retrieval of documents shows improved precision compared with existing topic modeling.
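A hedged sketch of an LDA-plus-nearest-neighbour personalization pipeline of the kind outlined above (the paper's exact WNN weighting is not specified here): documents are mapped to LDA topic mixtures, a user profile is the mean mixture of previously relevant documents, and candidates are ranked by similarity to that profile.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_for_user(docs: list, user_doc_ids: list, n_topics: int = 20):
    """Rank documents for a user described by the ids of documents they liked."""
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(X)                               # doc-topic mixtures
    profile = theta[user_doc_ids].mean(axis=0, keepdims=True)  # user interest vector
    scores = cosine_similarity(profile, theta).ravel()         # personalised relevance
    return np.argsort(-scores)                                 # document ids, best first
```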


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text across a collection of documents. Friendbook infers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles among users, and recommends friends to users if their lifestyles are highly similar. Modeling a user's daily life as life documents, lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research discipline, and differing subjective views can cause misinterpretations. There is therefore an urgent need for an effective and feasible approach to check submitted research papers with the support of automated software. Text mining methods can solve the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with a synonym algorithm, which finds synonyms of text index-wise using the English WordNet dictionary; another variant, LSA without synonyms, finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
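To make the "LSA with synonyms" step concrete, the sketch below expands terms with WordNet synonyms before TF-IDF and truncated SVD (LSA) and measures similarity in the latent space. It is an illustration only, not the paper's system, and assumes scikit-learn plus the NLTK WordNet corpus are installed.

```python
from nltk.corpus import wordnet                     # requires the nltk "wordnet" data
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline


def expand_with_synonyms(text: str) -> str:
    """Append WordNet synonyms of each token to the text before vectorization."""
    tokens = text.lower().split()
    extra = {lemma.name().replace("_", " ")
             for tok in tokens for syn in wordnet.synsets(tok)
             for lemma in syn.lemmas()}
    return " ".join(tokens) + " " + " ".join(extra)


def lsa_similarity(docs: list, query: str, n_components: int = 100):
    """Cosine similarity of the query to each document in the LSA space."""
    expanded = [expand_with_synonyms(d) for d in docs]
    lsa = make_pipeline(TfidfVectorizer(stop_words="english"),
                        TruncatedSVD(n_components=min(n_components, len(docs) - 1)))
    doc_vecs = lsa.fit_transform(expanded)
    q_vec = lsa.transform([expand_with_synonyms(query)])
    return cosine_similarity(q_vec, doc_vecs).ravel()
```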


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Anis Zouaghi ◽  
Mounir Zrigui ◽  
Georges Antoniadis ◽  
Laroussi Merhbene

We propose a new approach for determining the adequate sense of Arabic words. To that end, we propose an algorithm based on information retrieval measures to identify the context of use that is closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences that indicate a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous words, an exact string-matching algorithm, and the corpus. We use measures from the information retrieval domain, Harman, Croft, and Okapi, combined with the Lesk algorithm, to assign the correct sense among those proposed.
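A generic simplified-Lesk sketch of the overlap idea that the abstract combines with IR weighting (Harman, Croft, Okapi); it is not the authors' Arabic system. Each candidate sense carries a "context of use" (example sentences), and the sense whose context shares the most words with the target sentence wins.

```python
def simplified_lesk(sentence: str, sense_contexts: dict) -> str:
    """sense_contexts maps a sense label to the sentences that illustrate it."""
    context_tokens = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, examples in sense_contexts.items():
        sense_tokens = set(" ".join(examples).lower().split())
        overlap = len(context_tokens & sense_tokens)   # raw word overlap
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    # Hypothetical English example, purely to show the call pattern.
    contexts = {
        "bank_river": ["the river bank was muddy after rain"],
        "bank_finance": ["she deposited money at the bank branch"],
    }
    print(simplified_lesk("he fished from the muddy bank of the river", contexts))
```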


2021 ◽  
pp. 1-11
Author(s):  
V.S. Anoop ◽  
P. Deepak ◽  
S. Asharaf

Online social networks are considered among the most disruptive platforms, where people communicate with each other on any topic, ranging from funny cat videos to cancer support. The widespread diffusion of mobile platforms such as smartphones has caused the number of messages shared on such platforms to grow rapidly, so more intelligent and scalable algorithms are needed for efficient extraction of useful information. This paper proposes a method for retrieving relevant information from social network messages using a distributional-semantics-based framework powered by topic modeling. The proposed framework combines Latent Dirichlet Allocation and a distributional representation of phrases (Phrase2Vec) for effective information retrieval from online social networks. Extensive and systematic experiments on messages collected from Twitter (tweets) show that this approach outperforms some state-of-the-art approaches in terms of precision and accuracy, and that better information retrieval is possible using the proposed method.
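A rough sketch of the LDA-plus-phrase-embedding combination described above, using gensim (an assumed choice); the paper's exact fusion of the two signals is not reproduced. Phrases are detected with gensim's Phrases model and embedded with Word2Vec, while LDA models topics over the same tweets.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Phrases, Word2Vec


def build_models(tweets: list, n_topics: int = 20):
    """tweets: list of tokenized messages (lists of strings)."""
    phrases = Phrases(tweets, min_count=2, threshold=5.0)
    phrased = [phrases[t] for t in tweets]        # e.g. "cancer support" -> "cancer_support"
    w2v = Word2Vec(phrased, vector_size=100, window=5, min_count=1)  # phrase embeddings
    dictionary = Dictionary(phrased)
    corpus = [dictionary.doc2bow(t) for t in phrased]
    lda = LdaModel(corpus, num_topics=n_topics, id2word=dictionary)  # topic signal
    return phrases, w2v, lda
```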


1984 ◽  
Vol 8 (2) ◽  
pp. 63-66 ◽  
Author(s):  
C.P.R. Dubois

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluative technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.

