A Topological Method for Comparing Document Semantics

2020 ◽  
Author(s):  
Yuqi Kong ◽  
Fanchao Meng ◽  
Ben Carterette

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, tools for this task remain rare; on the other hand, most relevant methods are devised from statistical or vector space model perspectives, but nearly none from a topological perspective. In this paper, we hope to strike a different note. We propose a novel algorithm based on topological persistence for comparing semantic similarity between two documents. Our experiments are conducted on a document dataset with human judges' results, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and beats most state-of-the-art methods, though it ties with NLTK.
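The abstract does not spell out the construction, but the core object of topological persistence (0-dimensional here) can be sketched with a union-find over a weighted word graph: components born at filtration value 0 die as edges merge them, and the resulting death profiles of two documents can be compared. All names, the graph construction, and the profile distance below are illustrative assumptions, not the paper's method.

```python
# Toy sketch: 0-dimensional persistence from a word-distance filtration,
# then a naive distance between the two persistence profiles.

def persistence_0d(points, edges):
    """edges: list of (weight, u, v); returns sorted death times of components."""
    parent = {p: p for p in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, u, v in sorted(edges):          # process edges in filtration order
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            deaths.append(w)               # one component dies at this value
    return sorted(deaths)

def profile_distance(d1, d2):
    """L1 distance between zero-padded death-time profiles."""
    n = max(len(d1), len(d2))
    a = d1 + [0.0] * (n - len(d1))
    b = d2 + [0.0] * (n - len(d2))
    return sum(abs(x - y) for x, y in zip(a, b))

# Two tiny "documents" as weighted word graphs
doc_a = persistence_0d(["cat", "dog", "pet"],
                       [(0.2, "cat", "dog"), (0.5, "dog", "pet")])
doc_b = persistence_0d(["car", "road", "trip"],
                       [(0.3, "car", "road"), (0.4, "road", "trip")])
print(profile_distance(doc_a, doc_b))
```

A real implementation would also track higher-dimensional features and use a proper bottleneck or Wasserstein distance between persistence diagrams.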

2019 ◽  
Vol 53 (2) ◽  
pp. 3-10
Author(s):  
Muthu Kumar Chandrasekaran ◽  
Philipp Mayr

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.


Author(s):  
Davide Picca ◽  
Dominique Jaccard ◽  
Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has obtained a high level of success. Interactions between NLP and Serious Games have started, and some of them already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2014 ◽  
Vol 22 (1) ◽  
pp. 73-95 ◽  
Author(s):  
GÁBOR BEREND

Keyphrases are the most important phrases of a document, which makes them suitable for improving natural language processing tasks, including information retrieval, document classification, document visualization, summarization and categorization. Here, we propose a supervised framework augmented by novel extra-textual information derived primarily from Wikipedia. Wikipedia is utilized in such an advantageous way that – unlike most other methods relying on Wikipedia – a full textual index of all the Wikipedia articles is not required by our approach, as we only exploit the category hierarchy and a list of multiword expressions derived from Wikipedia. This approach is not only less resource intensive, but also produces comparable or superior results compared to previous similar works. Our thorough evaluations also suggest that the proposed framework performs consistently well on multiple datasets, being competitive with or even outperforming the results obtained by other state-of-the-art methods. Besides introducing features that incorporate extra-textual information, we also experimented with a novel way of representing features that are derived from the POS tagging of the keyphrase candidates.
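The two Wikipedia-derived resources the abstract names (a multiword-expression list and, separately, the category hierarchy) suggest a candidate-generation step like the following sketch. The POS pattern and the tiny MWE set are toy stand-ins, not the paper's actual feature set.

```python
# Sketch: generating keyphrase candidates by combining a noun-phrase-like
# POS pattern with a Wikipedia-derived multiword-expression list.

def candidates(tokens_with_pos, mwe_list, max_len=3):
    """Return n-grams matching a simple POS pattern or the MWE list."""
    out = []
    n = len(tokens_with_pos)
    for i in range(n):
        for j in range(i + 1, min(i + 1 + max_len, n + 1)):
            words = [w for w, _ in tokens_with_pos[i:j]]
            tags = [t for _, t in tokens_with_pos[i:j]]
            phrase = " ".join(words)
            # keep phrases ending in a noun whose tokens are adjectives/nouns,
            # or phrases found verbatim in the Wikipedia-derived list
            if phrase in mwe_list or (tags[-1].startswith("NN")
                                      and all(t.startswith(("JJ", "NN")) for t in tags)):
                out.append(phrase)
    return out

toks = [("natural", "JJ"), ("language", "NN"),
        ("processing", "NN"), ("improves", "VBZ")]
print(candidates(toks, {"natural language processing"}))
```

In the supervised framework itself, such candidates would then be scored by a classifier over the extra-textual features.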


Author(s):  
Fazel Keshtkar ◽  
Ledong Shi ◽  
Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorize dishes and find semantic relations among them based on reviewers' comments and menu descriptions. Our goal is to apply state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations between dishes and their reviews. We also aim to extract common patterns between different dishes across restaurants and review comments and, in reverse, to explore dishes with semantic relations. A dataset of articles related to restaurants, with the dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish given a few words describing their interest. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g., 'steak', 'hamburger'; Chinese, e.g., 'stir fry', 'dumplings'; Japanese, e.g., 'sushi'.
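The retrieval step the abstract describes (matching a few descriptive words to a dish via its review text) can be illustrated with a bag-of-words cosine similarity. The real study uses word2vec embeddings and t-SNE; the counters and review snippets below are toy stand-ins for that pipeline.

```python
# Minimal sketch: represent each dish by its review text and return the
# dish whose reviews are closest (cosine similarity) to a user's query.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

reviews = {
    "sushi":    Counter("fresh fish rice delicate rice".split()),
    "stir fry": Counter("hot wok vegetables rice sauce".split()),
    "steak":    Counter("grilled beef juicy charred".split()),
}
query = Counter("rice fish".split())
best = max(reviews, key=lambda d: cosine(reviews[d], query))
print(best)
```

Swapping the raw counts for averaged word2vec vectors would let the same lookup match "raw fish" to "sushi" even without shared surface words.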


Author(s):  
Yasufumi Takama ◽  
Kaoru Hirota

We propose a new concept of intelligent support systems for topic-based information retrieval. As information retrieval (IR) on the World Wide Web (WWW) becomes widespread, new types of tools and systems are increasingly needed that not only find the specific pages the user wants, but also help the user learn about a particular field of interest. Two systems based on this consideration are introduced in this paper. One is the Fish View system for supporting document-ordering. It focuses on the user's document-ordering (making diagrams) while reading, and the user's viewpoint is represented by a combination of a small number of concepts taken from an existing concept structure dictionary. The extracted viewpoint can be used for measuring the similarity among documents, using fisheye matching, an extended Vector Space Model. The other is the query network for visualizing the topic distribution through WWW IR; its concept, which employs the Immune Network model, is introduced with preliminary experiments.
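The viewpoint idea can be sketched as a similarity measure that boosts the weight of concepts in the user's extracted viewpoint before a standard vector-space comparison. The boosting scheme below is an invented stand-in for fisheye matching, which the abstract does not specify.

```python
# Sketch: viewpoint-weighted term overlap. Concepts in the user's viewpoint
# contribute more to the score than ordinary shared terms.

def weighted_overlap(doc_terms, query_terms, viewpoint, boost=3.0):
    score = 0.0
    for t in query_terms:
        if t in doc_terms:
            score += boost if t in viewpoint else 1.0
    return score

doc = {"retrieval", "fish", "network", "immune"}
viewpoint = {"immune", "network"}  # concepts from the user's diagram
print(weighted_overlap(doc, {"immune", "retrieval"}, viewpoint))
```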


2013 ◽  
Vol 07 (03) ◽  
pp. 257-290 ◽  
Author(s):  
KE HAO ◽  
PHILLIP C-Y SHEU ◽  
HIROSHI YAMAGUCHI

This paper addresses semantic search of Web services using natural language processing. First we survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a service search framework based on the vector space model to combine the traditional frequency weighted term-document matrix, the syntactical information extracted from a lexical database and a dependency grammar parser. In particular, instead of using terms as the rows in a term-document matrix, we propose using synsets from WordNet to distinguish different meanings of a word under different contexts as well as clustering different words with similar meanings. Also based on the characteristics of Web services descriptions, we propose an approach to identifying semantically important terms to adjust weightings. Our experiments show that our approach achieves its goal well.
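The paper's central substitution (synsets, rather than surface terms, as the rows of the term-document matrix) can be shown in a few lines: synonyms map to one shared row, so documents using different words for the same concept still overlap. The tiny synset table below is a toy stand-in for WordNet, and the identifiers merely imitate WordNet's naming style.

```python
# Sketch: map surface terms to synset identifiers so that synonyms
# ("car"/"automobile", "buy"/"purchase") share one matrix row.
from collections import Counter

SYNSET = {"car": "vehicle.n.01", "automobile": "vehicle.n.01",
          "buy": "purchase.v.01", "purchase": "purchase.v.01"}

def synset_counts(tokens):
    """Count synset rows; unknown words fall back to themselves."""
    return Counter(SYNSET.get(t, t) for t in tokens)

d1 = synset_counts("buy a car".split())
d2 = synset_counts("purchase an automobile".split())
shared = set(d1) & set(d2)
print(sorted(shared))
```

With raw terms the two descriptions would share nothing; with synset rows they share both content concepts, which is exactly what the proposed service-search matrix exploits.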


Author(s):  
Francisco Claude ◽  
Daniil Galaktionov ◽  
Roberto Konow ◽  
Susana Ladra ◽  
Óscar Pedreira

Author profiling consists in determining some demographic attributes — such as gender, age, nationality, language, religion, and others — of an author for a given document. This task, which has applications in fields such as forensics, security, or marketing, has been approached from different areas, especially from linguistics and natural language processing, by extracting different types of features from training documents, usually content- and style-based features. In this paper we address the problem by using several compression-inspired strategies that generate different models without analyzing or extracting specific features from the textual content, making them style-oblivious approaches. We analyze the behavior of these techniques, combine them and compare them with other state-of-the-art methods. We show that they can be competitive in terms of accuracy, giving the best predictions for some domains, and they are efficient in time performance.
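One classic compression-inspired strategy of the kind the abstract describes is the normalized compression distance (NCD): texts that compress well when concatenated are deemed similar, with no feature extraction at all. The choice of `zlib` as the compressor is ours for illustration; the paper does not name its compressors.

```python
# Sketch: normalized compression distance with zlib. Lower is more similar.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"she said she would arrive when she could " * 20
b = b"she said she would arrive when she could " * 20
c = b"completely different text about compilers and parsing " * 20
print(ncd(a, b) < ncd(a, c))  # near-identical texts score lower
```

For profiling, one would compute NCD between an unseen document and reference corpora per class (e.g. per gender or age band) and predict the closest class.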


2021 ◽  
Author(s):  
Sohrab Ferdowsi ◽  
Nikolay Borissov ◽  
Elham Kashani ◽  
David Vicente Alvarez ◽  
Jenny Copara ◽  
...  

In the context of searching for COVID-19 related scientific literature, we present an information retrieval methodology for effectively finding relevant publications for different information needs. We discuss the different components of our architecture, consisting of traditional information retrieval models as well as modern neural natural language processing algorithms. We present recipes to better adapt these components to the case of an infodemic, where, on one hand, the number of publications grows exponentially and, on the other hand, the topics of interest evolve as the pandemic progresses. The methodology was evaluated in the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Looking back at this challenge, we provide additional insights and discuss further useful impacts.
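The traditional IR component of such an architecture is typically a lexical ranker like BM25; the minimal scorer below illustrates that part only (the neural re-ranking components are out of scope for a sketch, and the toy documents are our own).

```python
# Minimal BM25 scorer over tokenized documents.
import math
from collections import Counter

def bm25(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["covid", "vaccine", "trial"],
        ["weather", "forecast", "model"],
        ["covid", "transmission", "mask"]]
print(bm25(["covid", "vaccine"], docs))
```

Adapting to an infodemic then mostly means refreshing the index and term statistics as the collection and topics shift.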


2017 ◽  
Vol 13 (4) ◽  
pp. 89-108 ◽  
Author(s):  
Santosh Kumar Bharti ◽  
Ramkrushna Pradhan ◽  
Korra Sathya Babu ◽  
Sanjay Kumar Jena

In Natural Language Processing (NLP), sarcasm analysis in text is considered one of the most challenging tasks. It has been broadly researched in recent years. The property of sarcasm that makes it hard to detect is the gap between the literal and the intended meaning. It is a particular kind of sentiment which is capable of flipping the entire sense of a text. Sarcasm is often expressed verbally through the use of high pitch with heavy tonal stress. Other clues of sarcasm are gestures such as rolling of the eyes, hand movements, head shaking, etc. However, these clues are absent in textual data, which makes the detection of sarcasm dependent upon several other factors. In this article, six algorithms are proposed to analyze sarcasm in Twitter tweets. These algorithms are based on the possible occurrences of sarcasm in tweets. Finally, the experimental results of the proposed algorithms were compared with some of the existing state-of-the-art approaches.
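One widely used textual cue of the kind such algorithms build on is the contrast between a positive sentiment word and a commonly negative situation in the same tweet. The rule and the tiny word lists below are illustrative stand-ins, not one of the article's six algorithms.

```python
# Sketch: flag a tweet as possibly sarcastic when a positive sentiment
# word co-occurs with a stereotypically negative situation phrase.

POSITIVE = {"love", "great", "wonderful"}
NEGATIVE_SITUATIONS = {"being ignored", "waiting in line", "monday mornings"}

def contrast_rule(tweet: str) -> bool:
    text = tweet.lower()
    has_pos = any(w in text.split() for w in POSITIVE)
    has_neg = any(s in text for s in NEGATIVE_SITUATIONS)
    return has_pos and has_neg

print(contrast_rule("I love waiting in line at the bank"))
```

A production detector would combine several such cues (hashtags, punctuation, interjections) and learn the situation phrases from data rather than hard-coding them.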

