Natural language technology and query expansion: issues, state-of-the-art and perspectives

AbstractAn increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases of which discourse is structured along with some of their formal properties. It then lays out the current state-of-the-art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse structure-enabled technology.

Download Full-text

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4 th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5 th edition of the CL-SciSumm Shared Task.

Download Full-text

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Download Full-text

Recommending Relevant Tutorial Fragments for API-Related Natural Language Questions

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500406 ◽

2021 ◽

Vol 31 (09) ◽

pp. 1251-1275

Author(s):

Di Wu ◽

Xiao-Yuan Jing ◽

Haowen Chen ◽

Xiaohui Kong ◽

Jifeng Xuan

Keyword(s):

Natural Language ◽

State Of The Art ◽

Metric Learning ◽

Application Programming Interface ◽

Manual Annotation ◽

Candidate List ◽

Novel Approach ◽

Application Programming ◽

Programming Interface ◽

Reciprocal Rank

Application Programming Interface (API) tutorial is an important API learning resource. To help developers learn APIs, an API tutorial is often split into a number of consecutive units that describe the same topic (i.e. tutorial fragment). We regard a tutorial fragment explaining an API as a relevant fragment of the API. Automatically recommending relevant tutorial fragments can help developers learn how to use an API. However, existing approaches often employ supervised or unsupervised manner to recommend relevant fragments, which suffers from much manual annotation effort or inaccurate recommended results. Furthermore, these approaches only support developers to input exact API names. In practice, developers often do not know which APIs to use so that they are more likely to use natural language to describe API-related questions. In this paper, we propose a novel approach, called Tutorial Fragment Recommendation (TuFraRec), to effectively recommend relevant tutorial fragments for API-related natural language questions, without much manual annotation effort. For an API tutorial, we split it into fragments and extract APIs from each fragment to build API-fragment pairs. Given a question, TuFraRec first generates several clarification APIs that are related to the question. We use clarification APIs and API-fragment pairs to construct candidate API-fragment pairs. Then, we design a semi-supervised metric learning (SML)-based model to find relevant API-fragment pairs from the candidate list, which can work well with a few labeled API-fragment pairs and a large number of unlabeled API-fragment pairs. In this way, the manual effort for labeling the relevance of API-fragment pairs can be reduced. Finally, we sort and recommend relevant API-fragment pairs based on the recommended strategy. We evaluate TuFraRec on 200 API-related natural language questions and two public tutorial datasets (Java and Android). The results demonstrate that on average TuFraRec improves NDCG@5 by 0.06 and 0.09, and improves Mean Reciprocal Rank (MRR) by 0.07 and 0.09 on two tutorial datasets as compared with the state-of-the-art approach.

Download Full-text

Multimodal Systems

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.45 ◽

2014 ◽

Author(s):

Elisabeth André ◽

Jean-Claude Martin

Keyword(s):

Natural Language ◽

Rapid Growth ◽

Large Data ◽

Access To Information ◽

Language Technology ◽

Data Archives ◽

Multimodal Systems ◽

Integral Role ◽

Effective Output

Recent years have witnessed a rapid growth in the development of multimodal systems. Improving technology and tools enable the development of more intuitive styles of interaction and convenient ways of accessing large data archives. Starting from the observation that natural language plays an integral role in many multimodal systems, this chapter focuses on the use of natural language in combination with other modalities, such as body gestures or gaze. It addresses the following three issues: (1) how to integrate multimodal input including spoken or typed language in a synergistic manner; (2) how to combine natural language with other modalities in order to generate more effective output; and (3) how to make use of natural language technology in combination with other modalities in order to enable better access to information.

Download Full-text

Natural Language Technology in Mobile Devices: Two Grounding Frameworks

Mobile Speech and Advanced Natural Language Solutions ◽

10.1007/978-1-4614-6018-3_7 ◽

2012 ◽

pp. 185-196

Author(s):

Jerome R. Bellegarda

Keyword(s):

Natural Language ◽

Mobile Devices ◽

Language Technology

Download Full-text

Analyzing Compositionality-Sensitivity of NLI Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016867 ◽

2019 ◽

Vol 33 ◽

pp. 6867-6874 ◽

Cited By ~ 1

Author(s):

Yixin Nie ◽

Yicheng Wang ◽

Mohit Bansal

Keyword(s):

Natural Language ◽

High Probability ◽

State Of The Art ◽

Model Performance ◽

Analysis Tool ◽

Bag Of Words ◽

Compositional Semantics ◽

Sensitivity Testing ◽

Performance Loss ◽

Future Work

Success in natural language inference (NLI) should require a model to understand both lexical and compositional semantics. However, through adversarial evaluation, we find that several state-of-the-art models with diverse architectures are over-relying on the former and fail to use the latter. Further, this compositionality unawareness is not reflected via standard evaluation on current datasets. We show that removing RNNs in existing models or shuffling input words during training does not induce large performance loss despite the explicit removal of compositional information. Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models’ actual compositionality awareness. We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models’ compositional understanding.

Download Full-text

Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/137 ◽

2019 ◽

Author(s):

Siying Wu ◽

Zheng-Jun Zha ◽

Zilei Wang ◽

Houqiang Li ◽

Feng Wu

Keyword(s):

Natural Language ◽

State Of The Art ◽

Cross Entropy ◽

Image Captioning ◽

Value Network ◽

Entropy Loss ◽

Fine Grained ◽

Performance Improvements ◽

Single Sentence ◽

Multiple State

Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides more expressive and fine-grained description for storytelling. Existing approaches mainly optimize paragraph generator towards minimizing word-wise cross entropy loss, which neglects linguistic hierarchy of paragraph and results in ``sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning effective paragraph generator. We propose a new hierarchical policy-value architecture which exploits compositionality at token-to-token and sentence-to-sentence levels simultaneously and can preserve the semantic and syntactic constituent integrity. Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods.

Download Full-text

SANTM: Efficient Self-attention-driven Network for Text Matching

ACM Transactions on Internet Technology ◽

10.1145/3426971 ◽

2022 ◽

Vol 22 (3) ◽

pp. 1-21

Author(s):

Prayag Tiwari ◽

Amit Kumar Jaiswal ◽

Sahil Garg ◽

Ilsun You

Keyword(s):

Natural Language ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Matching Problems ◽

Attention Model ◽

Extra Information ◽

Textual Entailment ◽

Benchmark Datasets ◽

Text Matching

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. Self-attention model takes only one sentence as an input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted either in symmetrical or asymmetrical scopes. For instance, paraphrase detection is an asymmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of self-attention mechanism and proposes an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features while updating the rest of the components. Our model follows evaluation on two benchmark datasets cover tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with much fewer parameters.

Download Full-text

Issues in the Syntactic Parsing of Queries for a Natural Language Interface to Databases

Handbook of Research on Natural Language Processing and Smart Service Systems - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-4730-4.ch007 ◽

2021 ◽

pp. 157-179

Author(s):

Alexander Gelbukh ◽

José A. Martínez F. ◽

Andres Verastegui ◽

Alberto Ochoa

Keyword(s):

Natural Language ◽

State Of The Art ◽

Experimental Tests ◽

Syntactic Parsing ◽

Natural Language Interfaces ◽

Natural Language Interface ◽

Overall Performance

In this chapter, an exhaustive parser is presented. The parser was developed to be used in a natural language interface to databases (NLIDB) project. This chapter includes a brief description of state-of-the-art NLIDBs, including a description of the methods used and the performance of some interfaces. Some of the general problems in natural language interfaces to databases are also explained. The exhaustive parser was developed, aiming at improving the overall performance of the interface; therefore, the interface is also briefly described. This chapter also presents the drawbacks discovered during the experimental tests of the parser, which show that it is unsuitable for improving the NLIDB performance.

Download Full-text