A Topological Method for Comparing Document Semantics

2020 ◽  
Author(s):  
Yuqi Kong ◽  
Fanchao Meng ◽  
Ben Carterette

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, tools for this task remain rare; on the other hand, most relevant methods are devised from statistical or vector space model perspectives, but nearly none from a topological perspective. In this paper, we hope to strike a different note. We propose a novel algorithm based on topological persistence for comparing semantic similarity between two documents. Our experiments are conducted on a document dataset with human judges' results, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and beats most state-of-the-art methods, though it ties with NLTK.
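The abstract does not spell out the construction, but the core object of topological persistence (0-dimensional here) can be sketched with a union-find over a weighted word graph: components born at filtration value 0 die as edges merge them, and the resulting death profiles of two documents can be compared. All names, the graph construction, and the profile distance below are illustrative assumptions, not the paper's method.

```python
# Toy sketch: 0-dimensional persistence from a word-distance filtration,
# then a naive distance between the two persistence profiles.

def persistence_0d(points, edges):
    """edges: list of (weight, u, v); returns sorted death times of components."""
    parent = {p: p for p in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, u, v in sorted(edges):          # process edges in filtration order
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            deaths.append(w)               # one component dies at this value
    return sorted(deaths)

def profile_distance(d1, d2):
    """L1 distance between zero-padded death-time profiles."""
    n = max(len(d1), len(d2))
    a = d1 + [0.0] * (n - len(d1))
    b = d2 + [0.0] * (n - len(d2))
    return sum(abs(x - y) for x, y in zip(a, b))

# Two tiny "documents" as weighted word graphs
doc_a = persistence_0d(["cat", "dog", "pet"],
                       [(0.2, "cat", "dog"), (0.5, "dog", "pet")])
doc_b = persistence_0d(["car", "road", "trip"],
                       [(0.3, "car", "road"), (0.4, "road", "trip")])
print(profile_distance(doc_a, doc_b))
```

A real implementation would also track higher-dimensional features and use a proper bottleneck or Wasserstein distance between persistence diagrams.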

2019 ◽  
Vol 53 (2) ◽  
pp. 3-10
Author(s):  
Muthu Kumar Chandrasekaran ◽  
Philipp Mayr

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.


Author(s):  
Davide Picca ◽  
Dominique Jaccard ◽  
Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has obtained a high level of success. Interactions between NLP and Serious Games have started, and some of them already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2014 ◽  
Vol 22 (1) ◽  
pp. 73-95 ◽  
Author(s):  
GÁBOR BEREND

Keyphrases are the most important phrases of a document, which makes them suitable for improving natural language processing tasks, including information retrieval, document classification, document visualization, summarization and categorization. Here, we propose a supervised framework augmented by novel extra-textual information derived primarily from Wikipedia. Wikipedia is utilized in such an advantageous way that – unlike most other methods relying on Wikipedia – a full textual index of all the Wikipedia articles is not required by our approach, as we only exploit the category hierarchy and a list of multiword expressions derived from Wikipedia. This approach is not only less resource intensive, but also produces comparable or superior results compared to previous similar works. Our thorough evaluations also suggest that the proposed framework performs consistently well on multiple datasets, being competitive with or even outperforming the results obtained by other state-of-the-art methods. Besides introducing features that incorporate extra-textual information, we also experimented with a novel way of representing features that are derived from the POS tagging of the keyphrase candidates.
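The two Wikipedia-derived resources the abstract names (a multiword-expression list and, separately, the category hierarchy) suggest a candidate-generation step like the following sketch. The POS pattern and the tiny MWE set are toy stand-ins, not the paper's actual feature set.

```python
# Sketch: generating keyphrase candidates by combining a noun-phrase-like
# POS pattern with a Wikipedia-derived multiword-expression list.

def candidates(tokens_with_pos, mwe_list, max_len=3):
    """Return n-grams matching a simple POS pattern or the MWE list."""
    out = []
    n = len(tokens_with_pos)
    for i in range(n):
        for j in range(i + 1, min(i + 1 + max_len, n + 1)):
            words = [w for w, _ in tokens_with_pos[i:j]]
            tags = [t for _, t in tokens_with_pos[i:j]]
            phrase = " ".join(words)
            # keep phrases ending in a noun whose tokens are adjectives/nouns,
            # or phrases found verbatim in the Wikipedia-derived list
            if phrase in mwe_list or (tags[-1].startswith("NN")
                                      and all(t.startswith(("JJ", "NN")) for t in tags)):
                out.append(phrase)
    return out

toks = [("natural", "JJ"), ("language", "NN"),
        ("processing", "NN"), ("improves", "VBZ")]
print(candidates(toks, {"natural language processing"}))
```

In the supervised framework itself, such candidates would then be scored by a classifier over the extra-textual features.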


Author(s):  
Fazel Keshtkar ◽  
Ledong Shi ◽  
Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorize dishes and find semantic relations among them based on reviewers' comments and menu descriptions. Our goal is to apply state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations between dishes and their reviews. We also aim to extract common patterns between different dishes across restaurants and review comments and, in reverse, to explore dishes with semantic relations. A dataset of articles related to restaurants, with the dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish given a few words describing their interest. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g., 'steak', 'hamburger'; Chinese, e.g., 'stir fry', 'dumplings'; Japanese, e.g., 'sushi'.
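The retrieval step the abstract describes (matching a few descriptive words to a dish via its review text) can be illustrated with a bag-of-words cosine similarity. The real study uses word2vec embeddings and t-SNE; the counters and review snippets below are toy stand-ins for that pipeline.

```python
# Minimal sketch: represent each dish by its review text and return the
# dish whose reviews are closest (cosine similarity) to a user's query.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

reviews = {
    "sushi":    Counter("fresh fish rice delicate rice".split()),
    "stir fry": Counter("hot wok vegetables rice sauce".split()),
    "steak":    Counter("grilled beef juicy charred".split()),
}
query = Counter("rice fish".split())
best = max(reviews, key=lambda d: cosine(reviews[d], query))
print(best)
```

Swapping the raw counts for averaged word2vec vectors would let the same lookup match "raw fish" to "sushi" even without shared surface words.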


Author(s):  
Yasufumi Takama ◽  
Kaoru Hirota

We propose a new concept of intelligent support systems for topic-based information retrieval. As information retrieval (IR) on the World Wide Web (WWW) becomes widespread, new types of tools and systems are increasingly needed that not only find the specific pages the user wants, but also help the user learn about a particular field of interest. Two systems based on this consideration are introduced in this paper. One is the Fish View system for supporting document-ordering. It focuses on the user's document-ordering (making diagrams) while reading, and the user's viewpoint is represented by a combination of a small number of concepts taken from an existing concept structure dictionary. The extracted viewpoint can be used for measuring the similarity among documents, using fisheye matching, an extended Vector Space Model. The other is the query network for visualizing the topic distribution through WWW IR; its concept, which employs the Immune Network model, is introduced with preliminary experiments.
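The viewpoint idea can be sketched as a similarity measure that boosts the weight of concepts in the user's extracted viewpoint before a standard vector-space comparison. The boosting scheme below is an invented stand-in for fisheye matching, which the abstract does not specify.

```python
# Sketch: viewpoint-weighted term overlap. Concepts in the user's viewpoint
# contribute more to the score than ordinary shared terms.

def weighted_overlap(doc_terms, query_terms, viewpoint, boost=3.0):
    score = 0.0
    for t in query_terms:
        if t in doc_terms:
            score += boost if t in viewpoint else 1.0
    return score

doc = {"retrieval", "fish", "network", "immune"}
viewpoint = {"immune", "network"}  # concepts from the user's diagram
print(weighted_overlap(doc, {"immune", "retrieval"}, viewpoint))
```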


2013 ◽  
Vol 07 (03) ◽  
pp. 257-290 ◽  
Author(s):  
KE HAO ◽  
PHILLIP C-Y SHEU ◽  
HIROSHI YAMAGUCHI

This paper addresses semantic search of Web services using natural language processing. First we survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a service search framework based on the vector space model to combine the traditional frequency weighted term-document matrix, the syntactical information extracted from a lexical database and a dependency grammar parser. In particular, instead of using terms as the rows in a term-document matrix, we propose using synsets from WordNet to distinguish different meanings of a word under different contexts as well as clustering different words with similar meanings. Also based on the characteristics of Web services descriptions, we propose an approach to identifying semantically important terms to adjust weightings. Our experiments show that our approach achieves its goal well.
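The paper's central substitution (synsets, rather than surface terms, as the rows of the term-document matrix) can be shown in a few lines: synonyms map to one shared row, so documents using different words for the same concept still overlap. The tiny synset table below is a toy stand-in for WordNet, and the identifiers merely imitate WordNet's naming style.

```python
# Sketch: map surface terms to synset identifiers so that synonyms
# ("car"/"automobile", "buy"/"purchase") share one matrix row.
from collections import Counter

SYNSET = {"car": "vehicle.n.01", "automobile": "vehicle.n.01",
          "buy": "purchase.v.01", "purchase": "purchase.v.01"}

def synset_counts(tokens):
    """Count synset rows; unknown words fall back to themselves."""
    return Counter(SYNSET.get(t, t) for t in tokens)

d1 = synset_counts("buy a car".split())
d2 = synset_counts("purchase an automobile".split())
shared = set(d1) & set(d2)
print(sorted(shared))
```

With raw terms the two descriptions would share nothing; with synset rows they share both content concepts, which is exactly what the proposed service-search matrix exploits.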


Author(s):  
Francisco Claude ◽  
Daniil Galaktionov ◽  
Roberto Konow ◽  
Susana Ladra ◽  
Óscar Pedreira

Author profiling consists in determining some demographic attributes — such as gender, age, nationality, language, religion, and others — of an author for a given document. This task, which has applications in fields such as forensics, security, or marketing, has been approached from different areas, especially from linguistics and natural language processing, by extracting different types of features from training documents, usually content- and style-based features. In this paper we address the problem by using several compression-inspired strategies that generate different models without analyzing or extracting specific features from the textual content, making them style-oblivious approaches. We analyze the behavior of these techniques, combine them and compare them with other state-of-the-art methods. We show that they can be competitive in terms of accuracy, giving the best predictions for some domains, and they are efficient in time performance.
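One classic compression-inspired strategy of the kind the abstract describes is the normalized compression distance (NCD): texts that compress well when concatenated are deemed similar, with no feature extraction at all. The choice of `zlib` as the compressor is ours for illustration; the paper does not name its compressors.

```python
# Sketch: normalized compression distance with zlib. Lower is more similar.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"she said she would arrive when she could " * 20
b = b"she said she would arrive when she could " * 20
c = b"completely different text about compilers and parsing " * 20
print(ncd(a, b) < ncd(a, c))  # near-identical texts score lower
```

For profiling, one would compute NCD between an unseen document and reference corpora per class (e.g. per gender or age band) and predict the closest class.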


2021 ◽  
Author(s):  
Sohrab Ferdowsi ◽  
Nikolay Borissov ◽  
Elham Kashani ◽  
David Vicente Alvarez ◽  
Jenny Copara ◽  
...  

In the context of searching for COVID-19 related scientific literature, we present an information retrieval methodology for effectively finding relevant publications for different information needs. We discuss the different components of our architecture, consisting of traditional information retrieval models as well as modern neural natural language processing algorithms. We present recipes to better adapt these components to the case of an infodemic, where, on one hand, the number of publications grows exponentially and, on the other hand, the topics of interest evolve as the pandemic progresses. The methodology was evaluated in the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Looking back at this challenge, we provide additional insights and discuss further useful impacts.
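The traditional IR component of such an architecture is typically a lexical ranker like BM25; the minimal scorer below illustrates that part only (the neural re-ranking components are out of scope for a sketch, and the toy documents are our own).

```python
# Minimal BM25 scorer over tokenized documents.
import math
from collections import Counter

def bm25(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["covid", "vaccine", "trial"],
        ["weather", "forecast", "model"],
        ["covid", "transmission", "mask"]]
print(bm25(["covid", "vaccine"], docs))
```

Adapting to an infodemic then mostly means refreshing the index and term statistics as the collection and topics shift.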


2017 ◽  
Vol 13 (4) ◽  
pp. 89-108 ◽  
Author(s):  
Santosh Kumar Bharti ◽  
Ramkrushna Pradhan ◽  
Korra Sathya Babu ◽  
Sanjay Kumar Jena

In Natural Language Processing (NLP), sarcasm analysis in text is considered one of the most challenging tasks. It has been broadly researched in recent years. The property of sarcasm that makes it hard to detect is the gap between the literal and the intended meaning. It is a particular kind of sentiment which is capable of flipping the entire sense of a text. Sarcasm is often expressed verbally through the use of high pitch with heavy tonal stress. Other clues of sarcasm are gestures such as rolling of the eyes, hand movements, head shaking, etc. However, these clues are absent in textual data, which makes the detection of sarcasm dependent upon several other factors. In this article, six algorithms are proposed to analyze sarcasm in Twitter tweets. These algorithms are based on the possible occurrences of sarcasm in tweets. Finally, the experimental results of the proposed algorithms were compared with some of the existing state-of-the-art approaches.
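One widely used textual cue of the kind such algorithms build on is the contrast between a positive sentiment word and a commonly negative situation in the same tweet. The rule and the tiny word lists below are illustrative stand-ins, not one of the article's six algorithms.

```python
# Sketch: flag a tweet as possibly sarcastic when a positive sentiment
# word co-occurs with a stereotypically negative situation phrase.

POSITIVE = {"love", "great", "wonderful"}
NEGATIVE_SITUATIONS = {"being ignored", "waiting in line", "monday mornings"}

def contrast_rule(tweet: str) -> bool:
    text = tweet.lower()
    has_pos = any(w in text.split() for w in POSITIVE)
    has_neg = any(s in text for s in NEGATIVE_SITUATIONS)
    return has_pos and has_neg

print(contrast_rule("I love waiting in line at the bank"))
```

A production detector would combine several such cues (hashtags, punctuation, interjections) and learn the situation phrases from data rather than hard-coding them.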

