Semantic Relatedness Estimation using the Layout Information of Wikipedia Articles

Patrick Chan; Yoshinori Hijikata; Toshiya Kuramochi; Shogo Nishida

doi:10.4018/ijcini.2013040103

International Journal of Cognitive Informatics and Natural Intelligence

2013, Vol 7 (2), pp. 30-48

Cited By ~ 2

Keyword(s): Information Retrieval; Word Frequency; Language Processing; Semantic Analysis; State Of The Art; Empirical Evaluation; Low Frequency; Semantic Relatedness; Conventional Approach; Explicit Semantic Analysis

Computing the semantic relatedness between two words or phrases is an important problem in fields such as information retrieval and natural language processing. Explicit Semantic Analysis (ESA), a state-of-the-art approach to this problem, uses word frequency to estimate relevance. As a result, the relevance of low-frequency words cannot always be estimated well. To improve the relevance estimates for low-frequency words and concepts, the authors apply regression to a word's frequency, its location in an article, and its text style to calculate relevance. The relevance value is subsequently used to compute semantic relatedness. Empirical evaluation shows that, for low-frequency words, the authors’ method estimates semantic relatedness better than ESA. Furthermore, when all words of the dataset are considered, the combination of the authors’ proposed method and the conventional approach outperforms the conventional approach alone.
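The abstract does not say which regression model or feature encoding the authors use, so the following is only a minimal sketch under stated assumptions: plain linear least squares over three hypothetical features (raw frequency, a 0/1 flag for appearing in the lead section, and a 0/1 flag for bold text). The training rows and labels are synthetic, generated from a known linear rule so the fit can be checked; they are not data from the paper.

```python
# Hypothetical training rows: [bias, frequency, in_lead_section, is_bold].
ROWS = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 5.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [1.0, 8.0, 0.0, 1.0],
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
]
# Synthetic relevance labels: 0.1*frequency + 0.5*in_lead_section + 0.3*is_bold.
LABELS = [0.9, 0.5, 0.7, 1.1, 0.3, 0.4]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Assumes the feature columns are not collinear.
    """
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * t for row, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(weights, features):
    """Relevance estimate for one word occurrence profile."""
    return sum(w * x for w, x in zip(weights, features))


weights = fit_least_squares(ROWS, LABELS)
# A word seen once, in the lead section, not in bold.
print(predict(weights, [1.0, 1.0, 1.0, 0.0]))
```

In this toy setting, a low-frequency word still receives a sizeable relevance score when it appears in a prominent location, which is the effect the abstract describes.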

Towards optimize-ESA for text semantic similarity: A case study of biomedical text

International Journal of Electrical and Computer Engineering (IJECE)

doi:10.11591/ijece.v10i3.pp2934-2943

2020, Vol 10 (3), pp. 2934

Author(s): Khaoula Mrhar; Mounia Abik

Keyword(s): Semantic Similarity; Language Processing; Semantic Analysis; Semantic Relatedness; High Dimensional; Specific Domain; Large Matrix; Index Matrix; Explicit Semantic Analysis

Explicit Semantic Analysis (ESA) is an approach to measuring the semantic relatedness between terms or documents based on their similarity to documents of a reference corpus, usually Wikipedia. ESA has received tremendous attention in natural language processing (NLP) and information retrieval. However, ESA relies on a huge Wikipedia index matrix: in its interpretation step, it multiplies this large matrix by a term vector to produce a high-dimensional vector. Consequently, both the interpretation and similarity steps are expensive, and much time is lost in unnecessary operations. This paper proposes an enhancement to ESA, called optimize-ESA, that reduces the dimension at the interpretation stage by computing the semantic similarity within a specific domain. The experimental results show clearly that our method correlates much better with human judgement than the full-version ESA approach.
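The interpretation and similarity steps mentioned in this abstract can be illustrated with a small sketch. It assumes a toy four-article corpus in place of Wikipedia, TF-IDF weights for the term-to-concept index, and cosine similarity between concept vectors; the corpus and the function names (`build_index`, `interpret`, `relatedness`) are hypothetical, and the sketch shows generic ESA rather than the optimize-ESA variant the paper proposes.

```python
import math

# Hypothetical toy "concept" corpus standing in for Wikipedia articles.
CONCEPTS = {
    "Cat": "cat feline pet purr",
    "Dog": "dog canine pet bark",
    "Pet": "pet cat dog animal",
    "Car": "car engine wheel road",
}


def build_index(concepts):
    """Map each term to {concept: tf-idf weight} (a sparse index matrix)."""
    n = len(concepts)
    tf = {}
    df = {}
    for name, text in concepts.items():
        counts = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
        tf[name] = counts
        for word in counts:
            df[word] = df.get(word, 0) + 1
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            index.setdefault(word, {})[name] = count * math.log(n / df[word])
    return index


def interpret(text, index):
    """Interpretation step: sum the concept vectors of the words in text."""
    vec = {}
    for word in text.split():
        for concept, weight in index.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec


def cosine(u, v):
    """Similarity step: cosine between two sparse concept vectors."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(a, b, index):
    return cosine(interpret(a, index), interpret(b, index))


index = build_index(CONCEPTS)
print(relatedness("cat", "dog", index))  # share the "Pet" concept
print(relatedness("cat", "car", index))  # share no concept
```

With a real Wikipedia index, every concept vector has one dimension per article, which is the cost the abstract refers to. Restricting `CONCEPTS` to the articles of one domain would shrink those vectors, which is one plausible reading of the dimension reduction described above.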

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science

doi:10.2174/2665997201999201022191540

2020, Vol 01

Author(s): Radha Guha

Keyword(s): Information Retrieval; Language Processing; Latent Dirichlet Allocation; Semantic Analysis; Text Summarization; The Internet; Specific Domain; User Query; College Information; Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even in a specific domain, such as a college or university website, it may be difficult for a user to browse through all the links to find relevant answers quickly.

Objective: In this scenario, the design of a chat-bot that can answer questions related to college information and compare colleges will be very useful and novel.

Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM).

Results: Advances in NLP capabilities for information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank are first reviewed and compared in this paper, before they are implemented in the chat-bot design. The chat-bot improves the user experience tremendously by giving concise answers to specific queries, which takes less time than reading the entire document. Students, parents, and faculty can more efficiently get answers on a variety of information, such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents.

Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum

doi:10.1145/3458553.3458554

2019, Vol 53 (2), pp. 3-10

Author(s): Muthu Kumar Chandrasekaran; Philipp Mayr

Keyword(s): Information Retrieval; Natural Language Processing; Natural Language; Research And Development; Language Processing; Digital Libraries; State Of The Art; Shared Task; Processing Information; Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.

BUILD KNOWLEDGE GRAPH FROM HETEROGENEOUS DOCUMENTS

Journal of Science and Technology - IUH

doi:10.46242/jst-iuh.v47i05.761

2021, Vol 47 (05)

Author(s): NGUYỄN CHÍ HIẾU

Keyword(s): Information Retrieval; Deep Learning; Natural Language Processing; Natural Language; Language Processing; Question Answering; Semantic Analysis; Knowledge Graph; Question Answering Systems; Knowledge Graphs

In recent years, knowledge graphs have been applied in many fields, such as search engines, semantic analysis, and question answering. However, building knowledge graphs faces many obstacles, including methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use natural language processing and deep learning methods to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.

Concept-Based Information Retrieval Using Explicit Semantic Analysis

ACM Transactions on Information Systems

doi:10.1145/1961209.1961211

2011, Vol 29 (2), pp. 1-34

Cited By ~ 146

Author(s): Ofer Egozi; Shaul Markovitch; Evgeniy Gabrilovich

Keyword(s): Information Retrieval; Semantic Analysis; Explicit Semantic Analysis

Explicit Semantic Analysis for computing semantic relatedness of biomedical text

2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence)

doi:10.1109/confluence.2014.6949325

2014

Author(s): Ayush Jaiswal; Anunay Bhargava

Keyword(s): Semantic Analysis; Semantic Relatedness; Biomedical Text; Explicit Semantic Analysis

A Topological Method for Comparing Document Semantics

doi:10.5121/csit.2020.101411

2020

Author(s): Yuqi Kong; Fanchao Meng; Ben Carterette

Keyword(s): Information Retrieval; Natural Language Processing; Language Processing; State Of The Art; Vector Space Model; The Other; Space Model; Topological Persistence; Art Methods; Novel Algorithm

Comparing document semantics is one of the toughest tasks in both natural language processing and information retrieval. To date, tools for this task remain rare. Moreover, most relevant methods are devised from statistical or vector space model perspectives, and almost none from a topological perspective. In this paper, we aim to offer a different voice. We propose a novel algorithm based on topological persistence for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset with human judges’ results, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and beats most state-of-the-art methods, though it ties with NLTK.

Extended Explicit Semantic Analysis for Calculating Semantic Relatedness of Web Resources

Sustaining TEL: From Innovation to Learning and Practice - Lecture Notes in Computer Science

doi:10.1007/978-3-642-16020-2_22

2010, pp. 324-339

Cited By ~ 8

Author(s): Philipp Scholl; Doreen Böhnstedt; Renato Domínguez García; Christoph Rensing; Ralf Steinmetz

Keyword(s): Semantic Analysis; Semantic Relatedness; Web Resources; Explicit Semantic Analysis

Lexical constraints in phonological acquisition

Journal of Child Language

doi:10.1017/s0305000999003797

1999, Vol 26 (2), pp. 261-294

Cited By ~ 41

Author(s): JUDITH A. GIERUT; MICHELE L. MORRISETTE; ANNETTE HUST CHAMPION

Keyword(s): Word Frequency; Language Processing; Low Frequency; Dependent Measure; Sound Change; Phonological Acquisition; Lexical Diffusion; Phonological Change; Lexical Constraints; Neighbourhood Structure

Lexical diffusion, as characterized by interword variation in production, was examined in phonological acquisition. The lexical variables of word frequency and neighbourhood density were hypothesized to facilitate sound change to varying degrees. Twelve children with functional phonological delays, aged 3;0 to 7;4, participated in an alternating treatments experiment to promote sound change. Independent variables were crossed to yield all logically possible combinations of high/low frequency and high/low density in treatment; the dependent measure was generalization accuracy in production. Results indicated that word frequency was most facilitative in sound change, whereas dense neighbourhood structure was least facilitative. The salience of frequency and avoidance of high density are discussed relative to the type of phonological change being induced in children's grammars, either phonetic or phonemic, and to the nature of children's representations. Results are further interpreted with reference to interactive models of language processing and optimality theoretic accounts of linguistic structure.

Semantic Diversity Accounts for the “Missing” Word Frequency Effect in Stroke Aphasia: Insights Using a Novel Method to Quantify Contextual Variability in Meaning

Journal of Cognitive Neuroscience

doi:10.1162/jocn.2011.21614

2011, Vol 23 (9), pp. 2432-2446

Cited By ~ 63

Author(s): Paul Hoffman; Timothy T. Rogers; Matthew A. Lambon Ralph

Keyword(s): Word Frequency; Language Processing; Conceptual Knowledge; Computational Models; Semantic Analysis; Semantic Dementia; Frequency Effect; Word Frequency Effect; Frequency Effects; Control Processes

Word frequency is a powerful predictor of language processing efficiency in healthy individuals and in computational models. Puzzlingly, frequency effects are often absent in stroke aphasia, challenging the assumption that word frequency influences the behavior of any computational system. To address this conundrum, we investigated divergent effects of frequency in two comprehension-impaired patient groups. Patients with semantic dementia have degraded conceptual knowledge as a consequence of anterior temporal lobe atrophy and show strong frequency effects. Patients with multimodal semantic impairments following stroke (semantic aphasia [SA]), in contrast, show little or no frequency effect. Their deficits arise from impaired control processes that bias activation toward task-relevant aspects of knowledge. We hypothesized that high-frequency words exert greater demands on cognitive control because they are more semantically diverse: they tend to appear in a broader range of linguistic contexts and have more variable meanings. Using latent semantic analysis, we developed a new measure of semantic diversity that reflected the variability of a word's meaning across different contexts. Frequency, but not diversity, was a significant predictor of comprehension in semantic dementia, whereas diversity was the best predictor of performance in SA. Most importantly, SA patients did show typical frequency effects but only when the influence of diversity was taken into account. These results are consistent with the view that higher-frequency words place higher demands on control processes, so that when control processes are damaged the intrinsic processing advantages associated with higher-frequency words are masked.
