Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽

10.2174/2665997201999201022191540 ◽

2020 ◽

Vol 01 ◽

Author(s):

Radha Guha

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Text Summarization ◽

The Internet ◽

Specific Domain ◽

User Query ◽

College Information ◽

Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even within a specific domain such as a college or university website, a user may find it difficult to browse through all the links to find relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be useful and novel. Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then summarize the retrieved text using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in NLP for information retrieval and text summarization, using the machine learning techniques of Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank, are first reviewed and compared in this paper before being implemented in the chat-bot design. The chat-bot improves the user experience by returning concise answers to specific queries, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.
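
As an illustration of the TextRank technique named in the abstract above, the following is a minimal sketch of graph-based extractive summarization using only the Python standard library. It is not the paper's implementation: the naive sentence splitter, the word-overlap similarity, and the function names (split_sentences, similarity, textrank_summary) are assumptions made for this example.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # Word overlap normalised by the log of each sentence's vocabulary size.
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # Power iteration of the weighted PageRank update.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(best)]
```

Calling textrank_summary(page_text, top_n=3) on a retrieved web page would return the three highest-ranked sentences in document order. A production system would typically replace the word-overlap similarity with embedding-based similarity such as Word2Vec or GloVe vectors.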


Modified Firefly Algorithm and Fuzzy C-Mean Clustering Based Semantic Information Retrieval

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2012 ◽

2021 ◽

Author(s):

M. Subramaniam ◽

A. Kathirvel ◽

E. Sabitha ◽

H. Anwar Basha

Keyword(s):

Information Retrieval ◽

Firefly Algorithm ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Online Data ◽

Web Documents ◽

User Query ◽

FCM Clustering ◽

Improved Performance ◽

Modified Firefly Algorithm

As the volume of electronic data has grown, searching for and retrieving essential information from the internet has become an extremely difficult task. Information Retrieval (IR) systems normally present information based on the keywords in the user's query. This is no longer sufficient given the large volume of online data, and precision is low because such systems search only at the syntactic level. Furthermore, many earlier search engines use a variety of techniques for semantic document extraction and measure the relevance between documents using page ranking methods, but these approaches suffer from long search times. To reduce query search time, this research implements a Modified Firefly Algorithm (MFA) adapted with an Intelligent Ontology and Latent Dirichlet Allocation based Information Retrieval (IOLDAIR) model. In the proposed methodology, a set of web documents, Facebook comments and tweets is taken as the dataset. The dataset is pre-processed by tokenization. A strong ontology is built from information collected from diverse websites. Keywords are identified, and semantic analysis against the user query is carried out by ontology matching using Jaccard similarity. Feature extraction is based on this semantic analysis. The best features are then selected by the Modified Firefly Algorithm (MFA). Fuzzy C-Means (FCM) clustering groups the relevant documents, which are then ranked. Finally, the relevant information is extracted using the IOLDAIR model. The major benefits of the technique are increased relevance, the capability to deal with big data, and fast retrieval. The experimental results show that the presented method achieves improved performance compared with the previous system.
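
The ontology matching step described above relies on Jaccard similarity, which is the size of the intersection of two sets divided by the size of their union. The sketch below shows one way it could be applied to match a tokenized query against ontology concepts. The tiny ontology, the tokenizer, and the function names are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    # Lowercase alphanumeric tokens; a simple stand-in for the paper's tokenization step.
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty.
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_concept(query, ontology):
    # Return the ontology concept whose term set is most similar to the query.
    query_terms = tokenize(query)
    return max(ontology, key=lambda concept: jaccard(query_terms, ontology[concept]))


# Hypothetical two-concept ontology, used only for illustration.
ontology = {
    "admissions": ["admission", "apply", "deadline", "application"],
    "fees": ["fee", "tuition", "payment", "scholarship"],
}
print(best_concept("When is the application deadline", ontology))  # prints "admissions"
```

Jaccard similarity ignores term frequency and word order, so it is cheap to compute but treats every shared keyword as equally informative.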


BUILD KNOWLEDGE GRAPH FROM HETEROGENEOUS DOCUMENTS

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v47i05.761 ◽

2021 ◽

Vol 47 (05) ◽

Author(s):

NGUYỄN CHÍ HIẾU

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Question Answering ◽

Semantic Analysis ◽

Knowledge Graph ◽

Question Answering Systems ◽

Knowledge Graphs

Knowledge graphs have been applied in many fields in recent years, such as search engines, semantic analysis, and question answering. However, building knowledge graphs faces many obstacles in terms of methodologies, data and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use methods from Natural Language Processing and deep learning to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.


Open-Ended Questions

Employee Surveys and Sensing ◽

10.1093/oso/9780190939717.003.0013 ◽

2020 ◽

pp. 202-218

Author(s):

Subhadra Dutta ◽

Eric M. O’Rourke

Keyword(s):

Machine Learning ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Written Language ◽

Text Data ◽

Employee Survey ◽

Trade Offs ◽

Word Relatedness ◽

Survey Responses

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.


A brief review on text summarization methods

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.25070 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 728

Author(s):

Rasmita Rautray ◽

Lopamudra Swain ◽

Rasmita Dash ◽

Rajashree Dash

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Large Corpus

At present, text summarization is a popular and active field of research in both the Information Retrieval (IR) and Natural Language Processing (NLP) communities. Summarization is important for IR because it is an efficient means of identifying useful information by condensing documents from a large corpus of data. In this study, different aspects of text summarization methods, together with their strengths, limitations and gaps, are presented.


Thematic Context Derivator Algorithm for Enhanced Context Vector Machine: eCVM

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4564.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4872-4877

Keyword(s):

Language Processing ◽

Latent Semantic Analysis ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Named Entities ◽

PageRank Algorithm ◽

Context Vector ◽

Improved Performance ◽

Evaluation Parameters ◽

Thematic Context

Natural Language Processing uses word embeddings to map words into vectors. The context vector is one technique for mapping words into vectors. It gives the importance of terms in the document corpus. Context vectors can be derived using various methods such as neural networks, latent semantic analysis and knowledge-base methods. This paper proposes a novel system, an enhanced context vector machine called eCVM. eCVM is able to determine context phrases and their importance. To build the context, eCVM uses latent semantic analysis, the existing context vector machine, dependency parsing, named entities, topics from latent Dirichlet allocation, and various word forms such as nouns, adjectives and verbs. eCVM uses the context vector and the PageRank algorithm to find the importance of a term in a document, and is tested on the BBC news dataset. The results of eCVM are compared with the state of the art for context derivation. The proposed system shows improved performance over existing systems on standard evaluation parameters.


OBIRE

International Journal of Distributed Systems and Technologies ◽

10.4018/jdst.2010100105 ◽

2010 ◽

Vol 1 (4) ◽

pp. 58-73

Author(s):

Xiangyu Liu ◽

Maozhen Li ◽

Yang Liu ◽

Man Qi

Keyword(s):

Fuzzy Logic ◽

Information Retrieval ◽

Domain Knowledge ◽

P2P Network ◽

The Internet ◽

Bibliographic Record ◽

P2P Networks ◽

Bibliographic Information ◽

User Query ◽

Bibliographic Records

It has been widely recognized that bibliographic information plays an increasingly important role in scientific research. Peer-to-peer (P2P) networks provide an effective environment for people belonging to a community to share various resources on the Internet. This paper presents OBIRE, an ontology-based P2P network for bibliographic information retrieval. For a user query, OBIRE computes a degree of match to indicate the similarity of a published record to the query. When searching for information, users can incorporate their domain knowledge into their queries, which guides OBIRE to discover the bibliographic records that are of most interest to them. In addition, fuzzy-logic-based user recommendations are used to compute the trustworthiness of the set of keywords used by a bibliographic record, which assists users in selecting bibliographic records. OBIRE is evaluated in terms of precision and recall, and experimental results show the effectiveness of OBIRE in bibliographic information retrieval.
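
Precision and recall, the two measures used to evaluate OBIRE, can be computed from the set of retrieved records and the set of records judged relevant. The sketch below uses the standard set-based definitions. The record identifiers are made up for illustration and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    # precision = relevant retrieved / all retrieved
    # recall    = relevant retrieved / all relevant
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r2", "r4", "r5"]
precision, recall = precision_recall(retrieved, relevant)
# Two of four retrieved records are relevant (precision 0.5),
# and two of three relevant records were retrieved (recall 2/3).
```

The two measures trade off against each other: returning more records tends to raise recall and lower precision, which is why retrieval systems are usually reported on both.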


College Information Chat-Bot System Based on Natural Language Processing

Journal of Xidian University ◽

10.37896/jxu14.5/086 ◽

2020 ◽

Vol 14 (5) ◽

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

College Information ◽

Chat Bot


PLSA-Based Personalized Information Retrieval with Network Regularization

Journal of Information Technology Research ◽

10.4018/jitr.2019010108 ◽

2019 ◽

Vol 12 (1) ◽

pp. 105-116

Author(s):

Qiuyu Zhu ◽

Dongmei Li ◽

Cong Dai ◽

Qichen Han ◽

Yi Lin

Keyword(s):

Information Retrieval ◽

Semantic Analysis ◽

Topic Model ◽

Rapid Development ◽

Probabilistic Latent Semantic Analysis ◽

Retrieval Model ◽

User Interest ◽

Model Based ◽

User Query ◽

Academic Information

With the rapid development of the Internet, information retrieval models based on keyword matching no longer meet users' requirements, because people with different query histories often have different retrieval intentions. A user's query history often implies their interests. It is therefore important to improve recall and precision by using query history to infer retrieval intentions. To this end, this article studies user query history and proposes a method for constructing a user interest model from it. Accordingly, the authors design a model called PLSA-based Personalized Information Retrieval with Network Regularization. Finally, the model is applied to academic information retrieval, and the authors compare it with Baidu Scholar and with the personalized information retrieval model based on the probabilistic latent semantic analysis topic model. The experimental results show that this model can effectively extract topics and retrieves results that better satisfy users' requirements. The model also noticeably improves the quality of retrieval results. In addition, the retrieval model can be used not only in academic information retrieval but also in personalized retrieval for microblog search, associative recommendation, and similar tasks.


An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

Mehran University Research Journal of Engineering and Technology ◽

10.22581/muet1982.2001.20 ◽

2020 ◽

Vol 39 (1) ◽

pp. 213-222

Author(s):

Junaid Rashid ◽

Syed Muhammad Adnan Shah ◽

Aun Irtaza

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Topic Modeling ◽

Clustering Algorithm ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

State Of The Art ◽

Text Documents ◽

New Perspective ◽

Better Than

Topic modeling is an effective text mining and information retrieval approach to organizing knowledge with various contents under a specific topic. Text documents in the form of news articles are increasing very fast on the web. Analysis of these documents is very important in the fields of text mining and information retrieval, and extracting meaningful information from them is a challenging task. One approach to discovering the themes of text documents is topic modeling, but this approach still needs a new perspective to improve its performance. In topic modeling, documents have topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach that uses the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining, where it achieves higher performance than its competitors LDA and LSA.
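
The abstract does not give the details of KTM, so the following is only a generic sketch of how k-means clustering can yield topic-like word groups: documents are turned into bag-of-words count vectors, clustered with k-means, and the highest-weighted words of each centroid are read off as a topic. The deterministic farthest-point initialization and all function names are assumptions made for this example, not the authors' method.

```python
from collections import Counter


def vectorize(docs):
    # Bag-of-words count vectors over a sorted vocabulary.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({word for c in counts for word in c})
    return vocab, [[c[word] for word in vocab] for c in counts]


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_words(vocab, centroid, n=3):
    # The n vocabulary words with the largest centroid weight.
    ranked = sorted(zip(vocab, centroid), key=lambda pair: -pair[1])
    return [word for word, _ in ranked[:n]]


docs = [
    "stock market shares rise",
    "market shares fall stock",
    "football match goal score",
    "goal score football team",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
topics = [top_words(vocab, c) for c in centroids]
```

On this toy corpus the two finance documents and the two football documents end up in separate clusters. Real corpora would normally use TF-IDF weighting and stop-word removal before clustering.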


Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Due to the widespread availability of information and the diversity of its sources, there is a need to produce accurate text summaries with minimal time and effort. A summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases the readability of the generated summaries is not satisfactory, because the summaries do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper we use syntactic and semantic analysis to propose an automatic system for Arabic text summarization. The system is capable of understanding the meaning of the information and retrieves only the relevant part. The effectiveness of the proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries are compared against those produced by humans and by previous research.
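
The ROUGE measure mentioned above scores a generated summary by its n-gram overlap with a human reference summary. The sketch below computes ROUGE-1 (unigram) precision, recall and F1 with clipped counts. It assumes plain whitespace tokenization, which is a simplification: Arabic text, as in the EASC corpus, would normally need language-specific normalization and tokenization first. The function name rouge1 is chosen for this example.

```python
from collections import Counter


def rouge1(candidate, reference):
    # Unigram counts; whitespace tokenization is an assumption of this sketch.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


precision, recall, f1 = rouge1("the cat sat on the mat", "the cat lay on the mat")
# Five of the six words overlap, so precision, recall and F1 are all 5/6.
```

ROUGE-1 rewards shared vocabulary but not word order or meaning, so it is usually reported alongside ROUGE-2 or ROUGE-L.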
