Single-shot Semantic Matching Network for Moment Localization in Videos

Author(s):
Xinfang Liu, Xiushan Nie, Junya Teng, Li Lian, Yilong Yin

Moment localization in videos using natural language refers to finding the most relevant segment of a video given a natural language query. Most existing methods require video segment candidates for matching against the query, which incurs extra computational cost, and they may fail to locate relevant moments of arbitrary length. To address these issues, we present a lightweight single-shot semantic matching network (SSMN) that avoids the complex computation required to match the query against segment candidates and can, in theory, locate moments of any length. In the proposed SSMN, video features are first uniformly sampled to a fixed number, while the query sentence features are generated and enhanced by GloVe, long short-term memory (LSTM), and soft-attention modules. The video and sentence features are then fed to an enhanced cross-modal attention model to mine the semantic relationships between vision and language. Finally, a score predictor and a location predictor locate the start and end indexes of the queried moment. We evaluate the proposed method on two benchmark datasets, and the experimental results demonstrate that SSMN outperforms state-of-the-art methods in both precision and efficiency.
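The cross-modal attention step described above can be illustrated with a minimal sketch in plain Python. The function name, toy dimensions, and the dot-product/softmax formulation are illustrative assumptions, not the paper's exact model: each sampled video frame attends over the query-word features and is replaced by a query-conditioned context vector.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_modal_attention(video_feats, query_feats):
    """For each video frame vector, compute scaled dot-product scores
    against every query-word vector, softmax them, and return the
    attention-weighted sum of query features (one context per frame)."""
    d = len(query_feats[0])
    attended = []
    for v in video_feats:
        scores = [sum(vi * qi for vi, qi in zip(v, q)) / math.sqrt(d)
                  for q in query_feats]
        w = softmax(scores)
        ctx = [sum(w[j] * query_feats[j][k] for j in range(len(query_feats)))
               for k in range(d)]
        attended.append(ctx)
    return attended
```

With one-hot toy features, a frame aligned with a query word receives most of that word's mass in its context vector; in the real model the inputs would be learned embeddings.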

Author(s):
Kun Zhang, Guangyi Lv, Linyuan Wang, Le Wu, Enhong Chen, ...

Sentence semantic matching requires an agent to determine the semantic relation between two sentences, and it is widely used in natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among matching methods, the attention mechanism plays an important role in capturing semantic relations and properly aligning the elements of two sentences. Previous methods utilized the attention mechanism to select the important parts of sentences in a single pass. However, the important parts of a sentence change dynamically as understanding deepens, so selecting them all at once may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) for sentence semantic matching, which pays close attention to a small region of the sentences at each step and re-reads the important words for better semantic understanding. Specifically, we first employ an Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from the bottom-most word-embedding input to the top-most recurrent output. Second, we utilize a Dynamic Re-read (DRr) unit that attends closely to one important word at a time, conditioned on the information learned so far, and re-reads the important words for better sentence semantic understanding. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net models sentence semantics more precisely and significantly improves sentence semantic matching performance. In addition, some of the findings in our experiments are consistent with findings from psychological research.
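The re-read idea, one important word at a time conditioned on what has been learned so far, can be sketched as follows. The scoring and state update here are simplified stand-ins for the paper's GRU-based DRr unit, and the function name is an illustrative assumption:

```python
def dynamic_reread(word_vecs, steps):
    """At each step, score every word against the current reading state,
    re-read (pick) the single best-scoring word, and blend it back into
    the state, so later picks depend on earlier ones."""
    d = len(word_vecs[0])
    n = len(word_vecs)
    # initial state: mean of the word vectors
    state = [sum(w[k] for w in word_vecs) / n for k in range(d)]
    picks = []
    for _ in range(steps):
        scores = [sum(wi * si for wi, si in zip(w, state)) for w in word_vecs]
        best = max(range(n), key=lambda i: scores[i])  # one word at a time
        picks.append(best)
        # fold the re-read word into the state
        state = [(s + word_vecs[best][k]) / 2.0 for k, s in enumerate(state)]
    return picks
```

Because the state is updated after every pick, the same salient word can legitimately be re-read across steps, which is the behavior the abstract emphasizes.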


2022, Vol. 22 (3), pp. 1-21
Author(s):
Prayag Tiwari, Amit Kumar Jaiswal, Sahil Garg, Ilsun You

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only one sentence as input, with no extra information; one can then use the final hidden state or a pooling operation. However, text-matching problems can be interpreted in either symmetrical or asymmetrical scopes. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are asymmetrical tasks. In this article, we leverage the attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the remaining components. We evaluate our model on two benchmark datasets covering textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
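The single-sentence pooling mentioned above can be sketched as self-attentive pooling: score each token with a learned vector, softmax the scores, and take the weighted sum as the sentence representation. This is a generic sketch, not the article's exact architecture, and the scoring vector `w` stands in for learned parameters:

```python
import math

def self_attentive_pooling(token_vecs, w):
    """Collapse a variable-length sentence into a single vector:
    score each token by its dot product with w, softmax the scores,
    and return the attention-weighted sum of the token vectors."""
    scores = [sum(ti * wi for ti, wi in zip(t, w)) for t in token_vecs]
    m = max(scores)
    es = [math.exp(s - m) for s in scores]
    z = sum(es)
    alphas = [e / z for e in es]
    d = len(token_vecs[0])
    return [sum(alphas[i] * token_vecs[i][k] for i in range(len(token_vecs)))
            for k in range(d)]
```

The resulting fixed-size vector is what a downstream matcher would consume, regardless of sentence length.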


Author(s):
Jaydeep Sen, Ashish Mittal, Diptikalyan Saha, Karthik Sankaranarayanan

Query completion systems are well studied in the context of information retrieval systems that handle keyword queries. However, Natural Language Interface to Databases (NLIDB) systems, which focus on syntactically correct and semantically complete queries to obtain high-precision answers, require a fundamentally different approach to the query completion problem than IR systems do. To the best of our knowledge, we are the first to focus on the problem of query completion for NLIDB systems. In particular, we introduce a novel concept of functional partitioning of an ontology and then design algorithms that intelligently use the components obtained from functional partitioning to extend a state-of-the-art NLIDB system to produce accurate and semantically meaningful query completions in the absence of query logs. We test the proposed query completion framework on multiple benchmark datasets and demonstrate the efficacy of our technique empirically.


Sensors, 2021, Vol. 21 (3), pp. 1012
Author(s):
Jisu Hwang, Incheol Kim

Due to the development of computer vision and natural language processing technologies in recent years, there has been growing interest in multimodal intelligent tasks that require concurrently understanding various forms of input data, such as images and text. Vision-and-language navigation (VLN) requires aligning and grounding multimodal input data to enable real-time perception of the task status from panoramic images and natural language instructions. This study proposes a novel deep neural network model (JMEBS) with joint multimodal embedding and backtracking search for VLN tasks. The proposed JMEBS model uses a transformer-based joint multimodal embedding module that exploits both multimodal context and temporal context. It also employs backtracking-enabled greedy local search (BGLS), a novel algorithm with a backtracking feature designed to improve the task success rate and optimize the navigation path, based on local and global scores related to candidate actions. A novel global scoring method further improves performance by comparing the partial trajectories searched thus far with multiple natural language instructions. The performance of the proposed model on various operations was experimentally demonstrated and compared with other models using the Matterport3D Simulator and the room-to-room (R2R) benchmark dataset.
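The search strategy can be sketched generically: greedily extend the path with the best-scoring untried action, and backtrack one step when a dead end is reached. This is a simplified stand-in for BGLS; the single `score` callback here conflates the paper's separate local and global scores, and the graph representation is an assumption:

```python
def backtracking_greedy_search(neighbors, score, start, goal, max_steps=100):
    """Greedy local search with backtracking: follow the best-scoring
    untried neighbor at each node; when no untried neighbor remains,
    pop the path one step and try the next-best action from there."""
    path = [start]
    tried = {start: set()}   # actions already attempted from each node
    steps = 0
    while path and steps < max_steps:
        steps += 1
        node = path[-1]
        if node == goal:
            return path
        candidates = [n for n in neighbors.get(node, [])
                      if n not in tried[node] and n not in path]
        if not candidates:
            path.pop()       # dead end: backtrack
            continue
        best = max(candidates, key=lambda n: score(path, n))
        tried[node].add(best)
        tried.setdefault(best, set())
        path.append(best)
    return None              # no path found within the step budget
```

On a toy graph where the greedy choice leads into a dead end, the backtracking step recovers and reaches the goal via the lower-scoring branch.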


Author(s):
Siva Reddy, Mirella Lapata, Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase, guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show that our semantic parser improves over the state of the art.


Quantum, 2017, Vol. 1, pp. 27
Author(s):
Pavel Sekatski, Michalis Skotiniotis, Janek Kołodyński, Wolfgang Dür

We establish general limits on how precisely a parameter, e.g., frequency or the strength of a magnetic field, can be estimated with the aid of full and fast quantum control. We consider uncorrelated noisy evolutions of N qubits and show that fast control allows one to fully restore the Heisenberg scaling (~1/N^2) for all rank-one Pauli noise except dephasing. For all other types of noise, the asymptotic quantum enhancement is unavoidably limited to a constant-factor improvement over the standard quantum limit (~1/N), even when allowing for the full power of fast control. The latter holds in both the single-shot and infinitely-many-repetitions scenarios. Even in this case, however, fast quantum control helps to increase the improvement factor. Furthermore, for frequency estimation with finite resources, we show how a parallel scheme utilizing any fixed number of entangled qubits but no fast quantum control can be outperformed by a simple, easily implementable sequential scheme that requires entanglement only between one sensing qubit and one auxiliary qubit.


2021, Vol. 2021, pp. 1-8
Author(s):
Miaoyuan Shi

With the development of deep learning and its wide application in the field of natural language, question answering over knowledge graphs based on deep learning has gradually become a focus of attention. The natural language query is converted into a structured query to identify the entities and attributes in the user's query, and the identified entities and attributes are then used to retrieve answers from the knowledge graph. Leveraging the advantage of deep learning in capturing sentence information, the model incorporates an attention mechanism to obtain the semantic vectors of the relevant attributes in the query, and uses a parameter-sharing mechanism to insert candidate attributes into triples within the same model to obtain the semantic vectors of the candidates. Experiments show that on a 100,000-triple RDF dataset, a single-entity query with the MIQE model takes no more than 3 seconds and a join query no more than 5 seconds; on a one-million-triple RDF dataset, a single-entity query takes no more than 8 seconds and a join query no more than 10 seconds. The experimental data show that the deep-learning-based knowledge-graph question-answering system for intelligent construction engineering has good horizontal scalability.


Author(s):
Kaan Ant, Ugur Sogukpinar, Mehmet Fatif Amasyali

The use of databases that contain semantic relationships between words is becoming increasingly widespread as a way to make natural language processing more effective. Unlike the bag-of-words approach, semantic spaces give the distances between words, but they do not express the relation types. In this study, we show how semantic spaces can be used to find the type of relationship, and we compare this with the template method. According to results obtained at very large scale, semantic spaces are more successful for the is_a and opposite relations, whereas the template approach is more successful for the at_location, made_of, and non-relational types.
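One common way to query a semantic space for a relation type, offered here as an illustrative sketch rather than the study's exact method, is to represent each word pair by its offset vector and assign it to the relation whose prototype offset is most similar by cosine. The relation names match the abstract; the prototype construction is an assumption:

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def offset(vec_a, vec_b):
    """Offset vector of a word pair (a, b) in the semantic space."""
    return [b - a for a, b in zip(vec_a, vec_b)]

def classify_relation(pair_offset, prototypes):
    """Assign the pair to the relation (e.g. is_a, opposite) whose
    prototype offset vector is closest by cosine similarity."""
    return max(prototypes, key=lambda r: cosine(pair_offset, prototypes[r]))
```

In practice the prototype for each relation type would be averaged over many known example pairs; the template method, by contrast, matches lexical patterns in text rather than vector geometry.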


Author(s):
Md. Asifuzzaman Jishan, Khan Raqib Mahmud, Abul Kalam Al Azad

We present a learning model that generates natural language descriptions of images. The model exploits the connections between natural language and visual data by producing text-line-based content from a given image. Our Hybrid Recurrent Neural Network model builds on Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bi-directional Recurrent Neural Network (BRNN) models. We conducted experiments on three benchmark datasets: Flickr8K, Flickr30K, and MS COCO. Our hybrid model uses the LSTM to encode text lines or sentences independently of object location and the BRNN for word representation, which reduces computational complexity without compromising the accuracy of the descriptor. The model achieves better accuracy in retrieving natural-language-based descriptions on these datasets.

