Computing Lexical Contrast

Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually created lexicons focus on opposites, such as hot and cold. Opposites come in many kinds, such as antipodals, complementaries, and gradable opposites. However, existing lexicons often do not classify opposites into these kinds. Nor do they explicitly list word pairs that are not opposites but nevertheless have some degree of contrast in meaning, such as warm and cold or tropical and freezing. We propose an automatic method to identify contrasting word pairs based on the hypothesis that if a pair of words, A and B, is contrasting, then there is a pair of opposites, C and D, such that A and C are strongly related and B and D are strongly related. (For example, there exists the pair of opposites hot and cold such that tropical is related to hot and freezing is related to cold.) We call this the contrast hypothesis. We begin with a large crowdsourcing experiment to determine the amount of human agreement on the concept of oppositeness and its different kinds. In the process, we flesh out key features of different kinds of opposites. We then present an automatic and empirical measure of lexical contrast that relies on the contrast hypothesis, corpus statistics, and the structure of a Roget-like thesaurus. Using four different data sets, we evaluate our approach on two tasks: solving “most contrasting word” questions and distinguishing synonyms from opposites. The results are analyzed across four parts of speech and across five different kinds of opposites. We show that the proposed measure of lexical contrast obtains high precision and large coverage, outperforming existing methods.
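
The contrast hypothesis can be illustrated as a simple yes/no test. This is a minimal sketch under loud assumptions: OPPOSITES and RELATED are tiny hand-made tables, and strongly_related and contrasting are hypothetical names. The measure described above is graded and derives relatedness from a Roget-like thesaurus and corpus statistics; this toy version only shows the shape of the check.

```python
# Hypothetical toy data: a seed list of opposites and a hand-made relatedness
# table. In the described method, relatedness comes from a Roget-like
# thesaurus and corpus statistics, not from a fixed dictionary.
OPPOSITES = [("hot", "cold"), ("good", "bad")]

RELATED = {
    "tropical": {"hot", "warm", "humid"},
    "freezing": {"cold", "icy", "frozen"},
    "warm": {"hot", "tropical"},
    "icy": {"cold", "freezing"},
}


def strongly_related(a: str, b: str) -> bool:
    # A word counts as related to itself and to anything listed for it.
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def contrasting(a: str, b: str) -> bool:
    # A and B contrast if some opposite pair (C, D) has A~C and B~D,
    # trying both orientations of each pair.
    for c, d in OPPOSITES:
        for x, y in ((c, d), (d, c)):
            if strongly_related(a, x) and strongly_related(b, y):
                return True
    return False
```

For example, contrasting("tropical", "freezing") holds because tropical is related to hot and freezing to cold, while contrasting("tropical", "humid") does not.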


Dagstuhl seminar 19461 on conversational search

ACM SIGIR Forum, 2020, Vol 54 (1), pp. 1-11. DOI: 10.1145/3451964.3451967

Author(s): Avishek Anand, Lawrence Cavedon, Matthias Hagen, Hideo Joho, Mark Sanderson, ...

Keyword(s): Information Retrieval; Natural Language Processing; Natural Language; Human Computer Interaction; Language Processing; Research Agenda; Web Search; Dialogue Systems; Future Directions; Working Groups

In the week of November 10–15, 2019, 44 researchers from the fields of information retrieval and Web search, natural language processing, human computer interaction, and dialogue systems met for Dagstuhl Seminar 19461 "Conversational Search" to share the latest developments in conversational search and to discuss its research agenda and future directions. The clear signal from the seminar is that many research areas have opportunities to advance conversational search and that collaboration in an interdisciplinary community is essential to achieving its goals. This report gives an overview of the program and selected findings of the working groups.


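Design and Development of a Chatbot Application as an Anime Information Search Medium Using Regular Expression Pattern Matching (Rancang Bangun Aplikasi Chatbot Sebagai Media Pencarian Informasi Anime Menggunakan Regular Expression Pattern Matching)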

Jurnal ULTIMATICS, 2017, Vol 9 (1), pp. 19-24. DOI: 10.31937/ti.v9i1.559. Cited by ~1

Author(s): David Domarco, Ni Made Satvika Iswari

Keyword(s): Information Retrieval; Expression Pattern; Pattern Matching; Language Processing; Regular Expression; Technology Development; Data Retrieval; Index Terms; Retrieval Engine; Behavioral Intention To Use

Technology development has affected many areas of life, especially entertainment. One of the fastest-growing entertainment industries is anime. Anime has evolved into a trend and a hobby, especially for people in Asia. The number of anime fans grows every year, and fans try to find as much information as possible about their favorite anime. Therefore, this study developed a chatbot application as an anime information retrieval medium using the regular expression pattern matching method. The application is intended to make it easier for anime fans to search for information about the anime they like. With this application, users get convenient, interactive retrieval of anime data that is not available when searching via search engines. The chatbot meets the standards of an information retrieval engine with very good results: 72% precision and 100% recall, giving a harmonic mean (F1 score) of 83.7%. As a hedonic application, the chatbot influences Behavioral Intention to Use by 83% and Immersion by 82%.

Index Terms: anime, chatbot, information retrieval, Natural Language Processing (NLP), Regular Expression Pattern Matching
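
Regular expression pattern matching for a question-answering chatbot can be sketched as a list of (pattern, field) rules tried in order against the user's message. The KB toy knowledge base, the two patterns, and the answer function are hypothetical illustrations, not the application's actual rules or data. The f1 helper checks the reported harmonic mean: 2 × 0.72 × 1.00 / (0.72 + 1.00) ≈ 0.837.

```python
import re

# Hypothetical toy knowledge base; the study's actual anime data is not shown.
KB = {
    "title a": {"episodes": 12, "studio": "Studio X"},
}

# Hypothetical rules: each pairs a regular expression with the KB field it asks about.
PATTERNS = [
    (re.compile(r"how many episodes (?:does|do) (?P<title>.+?) have\??$", re.IGNORECASE), "episodes"),
    (re.compile(r"(?:which|what) studio (?:made|produced) (?P<title>.+?)\??$", re.IGNORECASE), "studio"),
]


def answer(message: str) -> str:
    # Try each rule in order; the first matching pattern decides the intent.
    text = message.strip()
    for pattern, field in PATTERNS:
        match = pattern.search(text)
        if match:
            title = match.group("title")
            record = KB.get(title.lower())
            if record is None:
                return f"Sorry, I have no data on {title}."
            return f"{title}: {field} = {record[field]}"
    return "Sorry, I did not understand the question."


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For instance, answer("How many episodes does Title A have?") returns "Title A: episodes = 12". Rule order matters: a more general pattern placed first would shadow more specific ones.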


Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science, 2020, Vol 01. DOI: 10.2174/2665997201999201022191540

Author(s): Radha Guha

Keyword(s): Information Retrieval; Language Processing; Latent Dirichlet Allocation; Semantic Analysis; Text Summarization; The Internet; Specific Domain; User Query; College Information; Chat Bot

Background: In an era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even for a specific domain like a college or university website, it may be difficult for users to browse through all the links to find relevant answers quickly.

Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be useful and novel.

Methods: This paper designs and implements a novel conversational-interface chat-bot application with information retrieval and text summarization skills. First, the chat-bot has a simple dialog skill: when it understands the intent of a user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot searches the internet and then summarizes the retrieved text using advanced natural language processing (NLP) and text mining (TM) techniques.

Results: Before implementing them in the chat-bot, the paper reviews and compares the information retrieval and text summarization capabilities of the machine learning techniques Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank. The chat-bot improves the user experience by returning concise answers to specific queries, which takes less time than reading the entire document. Students, parents, and faculty can more efficiently get answers on a variety of topics, such as admission criteria, fees, course offerings, the notice board, attendance, grades, placements, faculty profiles, research papers, and patents.

Conclusion: The purpose of this paper was to follow advances in NLP technologies and implement them in a novel application.
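
TextRank, one of the summarization techniques compared above, can be sketched as an extractive summarizer. This minimal version is not the chat-bot's implementation: it assumes a naive regex sentence splitter, compares sentences by word-set overlap normalized by log sentence length (a set-based variant of the TextRank similarity), and ranks sentences with PageRank-style power iteration. The function names are hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    # Word overlap normalised by the log of the sentence lengths (set-based).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_summary(text: str, k: int = 2, d: float = 0.85, iterations: int = 50) -> list[str]:
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    bags = [words(s) for s in sentences]
    # Weighted, undirected sentence graph without self-loops.
    w = [[similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    # Power iteration of the weighted PageRank update with damping factor d.
    for _ in range(iterations):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

Keeping the original sentence order in the output makes the extract read more naturally than listing sentences by score.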


Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval

2019. DOI: 10.1145/3342827

Keyword(s): Information Retrieval; Natural Language Processing; Natural Language; Language Processing; International Conference


Thai Fake News Detection Based on Information Retrieval, Natural Language Processing and Machine Learning

SN Computer Science, 2021, Vol 2 (6). DOI: 10.1007/s42979-021-00775-6

Author(s): Phayung Meesad

Keyword(s): Machine Learning; Information Retrieval; Natural Language Processing; Natural Language; Language Processing; Fake News


Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum, 2021, Vol 55 (1), pp. 1-2. DOI: 10.1145/3476415.3476434

Author(s): Bhaskar Mitra

Keyword(s): Information Retrieval; Language Processing; Large Scale; Web Search; Real Life; Inverted Index; Information Need; Product Model; Performance Improvements; Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms, such as a person's name or a product model number, that were not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed and where it should be positioned among other results. Exposure-aware IR systems may optimize for objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions in a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To retrieve efficiently from large collections, we develop a framework that incorporates query term independence [Mitra et al., 2019] into any deep model, enabling large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
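
The benefit of query term independence can be shown with a minimal sketch: if a document's score is the sum of independent per-term scores, every (term, document) score can be computed offline and stored as a posting in an inverted index, so query time reduces to summing postings. Here term_score is a hypothetical stand-in (log of one plus term frequency) for the deep model, and build_index and search are illustrative names, not code from the thesis.

```python
import math
from collections import defaultdict


def term_score(term: str, doc_tokens: list[str]) -> float:
    # Hypothetical stand-in for a learned model scoring ONE query term
    # against a document; here simply log(1 + term frequency).
    return math.log1p(doc_tokens.count(term))


def build_index(docs: dict[str, str], vocabulary: set[str]) -> dict[str, list[tuple[str, float]]]:
    # Offline: under query term independence, each (term, document) score
    # can be precomputed and stored as a posting in an inverted index.
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in sorted(vocabulary):
            score = term_score(term, tokens)
            if score > 0:
                index[term].append((doc_id, score))
    return dict(index)


def search(index: dict[str, list[tuple[str, float]]], query: str, k: int = 3) -> list[tuple[str, float]]:
    # Online: a document's score is the sum of its precomputed per-term scores.
    totals: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, []):
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]
```

With a real neural term scorer, postings could also be kept for terms absent from a document, which is how such a model can still address vocabulary mismatch while using an inverted index.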


Developing the Persian Wordnet of Verbs Using Supervised Learning

ACM Transactions on Asian and Low-Resource Language Information Processing, 2021, Vol 20 (4), pp. 1-18. DOI: 10.1145/3450969

Author(s): Zahra Mousavi, Heshaam Faili

Keyword(s): Information Retrieval; Natural Language Processing; Language Processing; Supervised Classification; Word Sense; Direct Influence; Training Set; Bilingual Dictionary; Automated Method; Princeton Wordnet

Wordnets are now extensively used as a major resource in natural language processing and information retrieval tasks, so the accuracy of a wordnet directly influences the performance of the applications that use it. This paper presents a fully automated method for extending a previously developed Persian wordnet with more comprehensive and accurate verb entries. First, some Persian verbs are linked to Princeton WordNet (PWN) synsets using a bilingual dictionary. We propose a feature set related to the semantic behavior of compound verbs, which make up the majority of Persian verbs. This feature set is used in a supervised classification system to select the proper links for inclusion in the wordnet. We also use a pre-existing Persian wordnet, FarsNet, and a similarity-based method to produce a training set. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets, and 67,000 word-sense pairs; it substantially outperforms the previous Persian wordnet, which has about 16,000 words, 22,000 PWN synsets, and 38,000 word-sense pairs.
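
The two main steps (dictionary-based candidate linking, then filtering the candidate links) can be sketched with toy data. BILINGUAL, SYNSETS, the placeholder verb fa_verb_1, and the synset labels are all hypothetical, and the single overlap feature with a fixed threshold only stands in for the proposed compound-verb feature set and the supervised classifier.

```python
from collections import Counter

# Hypothetical toy resources; real entries would come from a bilingual
# Persian-English dictionary and Princeton WordNet.
BILINGUAL = {
    "fa_verb_1": ["eat", "consume"],
}
SYNSETS = {
    "eat": ["syn:eat#1", "syn:eat#2"],
    "consume": ["syn:eat#1", "syn:consume#2"],
}


def candidate_links(verb: str) -> dict[str, float]:
    # Link the source verb to every synset of every English translation.
    # Score = fraction of translations sharing the synset (one illustrative
    # feature, not the paper's feature set).
    translations = BILINGUAL.get(verb, [])
    if not translations:
        return {}
    counts = Counter(s for word in translations for s in set(SYNSETS.get(word, [])))
    return {synset: n / len(translations) for synset, n in counts.items()}


def select_links(verb: str, threshold: float = 0.75) -> list[str]:
    # Hypothetical threshold rule standing in for the trained classifier.
    return sorted(s for s, score in candidate_links(verb).items() if score >= threshold)
```

Here select_links("fa_verb_1") keeps only the synset that both translations agree on; a trained classifier would instead weigh many such features learned from the FarsNet-derived training set.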


Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum, 2019, Vol 53 (2), pp. 3-10. DOI: 10.1145/3458553.3458554

Author(s): Muthu Kumar Chandrasekaran, Philipp Mayr

Keyword(s): Information Retrieval; Natural Language Processing; Natural Language; Research And Development; Language Processing; Digital Libraries; State Of The Art; Shared Task; Processing Information; Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 aimed to encourage IR researchers and digital library professionals to develop new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. The workshop included several paper sessions and the 5th edition of the CL-SciSumm Shared Task.


Learning emotional word embeddings for sentiment analysis

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-13. DOI: 10.3233/jifs-201993

Author(s): Qingtian Zeng, Xishi Zhao, Xiaohui Hu, Hua Duan, Zhongying Zhao, ...

Keyword(s): Sentiment Analysis; Language Processing; State Of The Art; Research Problem; Emotional Word; Classification Model; Data Sets; Word Embeddings; Real World Data; Text Documents

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, state-of-the-art algorithms for learning word representations from large amounts of text ignore emotional information, which is a significant research problem. To address this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. The method first represents document features with pre-trained word vectors using two different linear weighting methods. The resulting document vectors are then fed to a neural-network-based classification model that is trained as a text sentiment classifier. In this way, the emotional polarity of the text is propagated into the word vectors. Experimental results on three kinds of real-world data sets show that the proposed EWE model outperforms other state-of-the-art models on text sentiment prediction, text similarity calculation, and word emotional expression tasks.
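
The idea of propagating sentiment polarity into word vectors can be sketched as follows, under simplifying assumptions: a plain average as the single linear weighting, small random vectors in place of pre-trained ones, and logistic regression in place of the neural-network classifier. init_embeddings, doc_vector, predict, and train are hypothetical names. Because the classification loss is also back-propagated into the word vectors, words from positive documents drift toward the positive side of the classifier.

```python
import math
import random


def init_embeddings(vocab: list[str], dim: int = 4, seed: int = 0) -> dict[str, list[float]]:
    # Stand-in for pre-trained vectors: small random values (assumption).
    rng = random.Random(seed)
    return {t: [rng.gauss(0.0, 0.1) for _ in range(dim)] for t in vocab}


def doc_vector(tokens: list[str], emb: dict[str, list[float]]) -> list[float]:
    # Uniform linear weighting, i.e. the plain average of the word vectors.
    dim = len(emb[tokens[0]])
    vec = [0.0] * dim
    for t in tokens:
        for i in range(dim):
            vec[i] += emb[t][i] / len(tokens)
    return vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(tokens, emb, w, b) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, doc_vector(tokens, emb))) + b)


def train(docs, labels, emb, epochs: int = 200, lr: float = 0.5):
    dim = len(next(iter(emb.values())))
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            x = doc_vector(tokens, emb)
            g = predict(tokens, emb, w, b) - y  # d(log loss)/dz
            # Back-propagate into the word vectors using the current weights...
            for t in tokens:
                for i in range(dim):
                    emb[t][i] -= lr * g * w[i] / len(tokens)
            # ...then take a gradient step on the classifier itself.
            for i in range(dim):
                w[i] -= lr * g * x[i]
            b -= lr * g
    return w, b
```

The word vectors are updated in place, so after training the embedding table itself carries the learned polarity and can be reused outside the classifier.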


Integrating natural language processing and information retrieval in a troubleshooting help desk

IEEE Expert, 1993, Vol 8 (6), pp. 9-17. DOI: 10.1109/64.248348. Cited by ~5

Author(s): P.G. Anick

Keyword(s): Information Retrieval; Natural Language Processing; Natural Language; Language Processing; Help Desk
