Person Retrieval in Surveillance Using Textual Query: A Review

2021 ◽  
Author(s):  
Hiren Galiyawala ◽  
Mehul S Raval

Recent advances in biometrics, computer vision, and natural language processing have opened opportunities for person retrieval from surveillance videos using a textual query. The prime objective of such a surveillance system is to locate a person from a description, e.g., a short woman with a pink t-shirt and white skirt carrying a black purse. She has brown hair. Such a description contains attributes like gender, height, type of clothing, colour of clothing, hair colour, and accessories. These attributes are formally known as soft biometrics. They help bridge the semantic gap between a human description and a machine, since a textual query contains the person’s soft biometric attributes. It is also not feasible to manually search through huge volumes of surveillance footage to retrieve a specific person. Hence, automatic person retrieval using vision- and language-based algorithms is becoming popular. In comparison to other state-of-the-art reviews, the contributions of the paper are as follows: 1. It recommends the most discriminative soft biometrics for specific challenging conditions. 2. It integrates benchmark datasets and retrieval methods for objective performance evaluation. 3. It provides a complete snapshot of techniques based on features, classifiers, number of soft biometric attributes, type of deep neural network, and performance measures. 4. It comprehensively covers person retrieval, from methods based on handcrafted features to end-to-end approaches based on natural language description.
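As an illustration of attribute-driven retrieval, the following is a minimal sketch assuming a hypothetical `predict_attributes` classifier that maps a detected person crop to soft biometric labels; it is not the pipeline of any surveyed method.

```python
from dataclasses import dataclass

@dataclass
class Query:
    gender: str         # e.g. "female"
    torso_colour: str   # e.g. "pink"
    leg_colour: str     # e.g. "white"

def predict_attributes(crop):
    """Hypothetical stand-in: a real system runs CNN attribute heads here."""
    return {"gender": "female", "torso_colour": "pink", "leg_colour": "white"}

def retrieve(person_crops, query):
    """Keep only detections whose predicted soft biometrics match the query."""
    matches = []
    for crop in person_crops:
        attrs = predict_attributes(crop)
        if (attrs["gender"] == query.gender
                and attrs["torso_colour"] == query.torso_colour
                and attrs["leg_colour"] == query.leg_colour):
            matches.append(crop)
    return matches

hits = retrieve(["crop_1", "crop_2"], Query("female", "pink", "white"))
```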


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 1012
Author(s):  
Jisu Hwang ◽  
Incheol Kim

Due to the development of computer vision and natural language processing technologies in recent years, there has been growing interest in multimodal intelligent tasks that require concurrently understanding various forms of input data, such as images and text. Vision-and-language navigation (VLN) requires aligning and grounding multimodal input data to enable real-time perception of the task status from panoramic images and natural language instructions. This study proposes a novel deep neural network model (JMEBS) with joint multimodal embedding and backtracking search for VLN tasks. The proposed JMEBS model uses a transformer-based joint multimodal embedding module that exploits both multimodal and temporal context. It also employs backtracking-enabled greedy local search (BGLS), a novel algorithm with a backtracking feature designed to improve the task success rate and optimize the navigation path, based on the local and global scores of candidate actions. A novel global scoring method further improves performance by comparing the partial trajectories searched thus far with multiple natural language instructions. The performance of the proposed model on various operations was experimentally demonstrated and compared with other models using the Matterport3D Simulator and the room-to-room (R2R) benchmark dataset.
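A minimal sketch of a backtracking-enabled greedy search in the spirit of BGLS follows; the `candidates_fn`, `local_score`, and `global_score` callables are hypothetical stand-ins, and the paper's thresholds and trajectory-comparison details differ.

```python
def bgls(start, candidates_fn, local_score, global_score,
         max_steps=30, threshold=0.5):
    path = [start]
    alternatives = []  # next-best actions stored for backtracking
    for _ in range(max_steps):
        cands = candidates_fn(path[-1])
        if not cands:
            break
        # Rank candidate actions by combined local + global score.
        ranked = sorted(
            cands,
            key=lambda a: local_score(path, a) + global_score(path, a),
            reverse=True,
        )
        best = ranked[0]
        score = local_score(path, best) + global_score(path, best)
        if score < threshold and alternatives:
            # Backtrack: undo the last move and retry a stored alternative.
            path.pop()
            best = alternatives.pop()
        elif len(ranked) > 1:
            alternatives.append(ranked[1])
        path.append(best)
    return path
```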


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Quantifying semantic similarity between concepts is an essential component of domains like Natural Language Processing, Information Retrieval, and Question Answering, enabling better understanding of text and the relationships within it. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous due to its richness in defining a word and all of its relationships with other words. In this paper, we propose an approach to quantify the similarity between concepts that exploits the synsets and gloss definitions of different concepts in WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of each contextual word, and the confidence of a word's occurrence in another word's definition when calculating similarity. Evaluation on different gold-standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.
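For illustration, the following is a simplified gloss-overlap similarity over WordNet synsets using NLTK; the paper's measure additionally weights contextual words and occurrence confidence, which this sketch omits.

```python
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def gloss_bag(word):
    """Collect content words from every sense's gloss and synonyms."""
    bag = set()
    for syn in wn.synsets(word):
        bag.update(syn.definition().lower().split())
        bag.update(lemma.lower() for lemma in syn.lemma_names())
    return bag

def gloss_similarity(w1, w2):
    """Jaccard overlap of the two words' gloss bags."""
    b1, b2 = gloss_bag(w1), gloss_bag(w2)
    if not b1 or not b2:
        return 0.0
    return len(b1 & b2) / len(b1 | b2)

print(gloss_similarity("car", "automobile"))
```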


Author(s):  
Janjanam Prabhudas ◽  
C. H. Pradeep Reddy

The enormous increase in information, together with the computational abilities of machines, has created innovative applications in natural language processing that invoke machine learning models. This chapter surveys the trends of natural language processing employing machine learning and its models in the context of text summarization. It is organized to help the researcher understand technical perspectives on feature representation and the models to consider before applying them to language-oriented tasks. Further, the chapter reviews the primary models of deep learning, their applications, and their performance in the context of language processing. Its primary focus is to present the technical research findings and gaps of deep-learning-based text summarization, along with state-of-the-art deep learning models for the task.


2018 ◽  
Vol 8 (10) ◽  
pp. 1850 ◽  
Author(s):  
Zhibin Guan ◽  
Kang Liu ◽  
Yan Ma ◽  
Xu Qian ◽  
Tongkai Ji

Image caption generation is an attractive research area that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly focus on generating the final caption directly, which may lose significant identifying information about objects in the raw image. We therefore propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. MLALR uses middle-level attributes predicted from object regions to retouch the intermediate image description generated by our language generation model. Its advantage is that it can correct descriptive errors in the intermediate description and make the final caption more accurate. Moreover, evaluation on the MSCOCO, Flickr8K, and Flickr30K benchmark datasets with the BLEU, METEOR, ROUGE-L, CIDEr, and SPICE metrics validated the impressive performance of our MLALR method.
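As a toy illustration of the retouching idea, the sketch below substitutes predicted middle-level attribute phrases into an intermediate caption; MLALR's actual retouching is learned, not rule-based as here.

```python
def retouch(intermediate_caption, region_attributes):
    """Replace generic object words with their predicted attribute phrases.

    region_attributes maps an object word to a middle-level attribute
    phrase, e.g. {"dog": "brown dog", "ball": "red ball"}.
    """
    words = intermediate_caption.split()
    return " ".join(region_attributes.get(w, w) for w in words)

caption = retouch("a dog chases a ball",
                  {"dog": "brown dog", "ball": "red ball"})
print(caption)  # -> "a brown dog chases a red ball"
```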


Author(s):  
Rahul Sharan Renu ◽  
Gregory Mocko

The objective of this research is to investigate the requirements and performance of parts-of-speech tagging of assembly work instructions. Natural language processing of assembly work instructions is required to perform data mining with the objective of knowledge reuse. Assembly work instructions are key process engineering elements that allow for predictable assembly quality of products and predictable assembly lead times. Authoring assembly work instructions is a subjective process, and it has been observed that most of them are not grammatically complete sentences. It is hypothesized that this can lead to false parts-of-speech tags from natural language processing tools. To test this hypothesis, two parts-of-speech taggers are used to tag 500 assembly work instructions obtained from the automotive industry: one from the Natural Language Toolkit (nltk.org) and one from the Stanford Natural Language Processing Group (nlp.stanford.edu). For each tagger, two experiments are conducted. In the first, the assembly work instructions are input to the tagger in raw form; in the second, they are preprocessed to make them grammatically complete and then input to the tagger. The Stanford tagger with preprocessed assembly work instructions produced the fewest false parts-of-speech tags.
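The NLTK half of the setup can be reproduced in a few lines; the raw and completed instructions below are illustrative examples, not items from the study's dataset.

```python
import nltk
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

raw = "Torque bolt to 25 Nm"                       # terse, verb-initial
complete = "You should torque the bolt to 25 Nm."  # grammatically complete

print(nltk.pos_tag(nltk.word_tokenize(raw)))
print(nltk.pos_tag(nltk.word_tokenize(complete)))
# The imperative verb "Torque" in the raw form is prone to being
# mis-tagged as a noun; the completed sentence disambiguates it.
```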


2018 ◽  
Author(s):  
Debanjan Mahata ◽  
John Kuriakose ◽  
Rajiv Ratn Shah ◽  
Roger Zimmermann

Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents to train multi-word phrase embeddings, which are used for thematic representation of scientific articles and for ranking the keyphrases extracted from them with theme-weighted PageRank. Evaluations on benchmark datasets produce state-of-the-art results.
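A sketch of theme-weighted PageRank over a phrase similarity graph follows, using networkx; the `embed` function and `theme_vec` are hypothetical, and Key2Vec's graph construction differs in detail.

```python
import networkx as nx
import numpy as np

def rank_keyphrases(phrases, embed, theme_vec):
    graph = nx.Graph()
    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            u, v = embed(p), embed(q)
            # Edge weight: cosine similarity between phrase embeddings.
            sim = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
            graph.add_edge(p, q, weight=max(sim, 0.0))
    # Personalization biases the random walk toward on-theme phrases.
    theme = {p: max(float(np.dot(embed(p), theme_vec)), 1e-6) for p in phrases}
    scores = nx.pagerank(graph, personalization=theme, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)
```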


Author(s):  
Jie Liu ◽  
Shaowei Chen ◽  
Bingquan Wang ◽  
Jiaxin Zhang ◽  
Na Li ◽  
...  

Joint entity and relation extraction is critical for many natural language processing (NLP) tasks and has attracted increasing research interest. However, it still faces the challenges of identifying overlapping relation triplets along with complete entity boundaries and detecting multi-type relations. In this paper, we propose an attention-based joint model, mainly comprising an entity extraction module and a relation detection module, to address these challenges. The key to our model is a supervised multi-head self-attention mechanism, used as the relation detection module, that learns token-level correlations for each relation type separately. With this attention mechanism, our model can effectively identify overlapping relations and flexibly predict each relation type with its corresponding intensity. To verify the effectiveness of our model, we conduct comprehensive experiments on two benchmark datasets. The experimental results demonstrate that our model achieves state-of-the-art performance.
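A minimal PyTorch sketch of per-relation-type attention scoring follows; it illustrates the idea of one attention head per relation type but is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """One attention head per relation type; the head's token-pair scores
    act as relation intensities, supervised with binary labels per type."""

    def __init__(self, hidden_dim, num_types, head_dim=64):
        super().__init__()
        self.num_types, self.head_dim = num_types, head_dim
        self.q = nn.Linear(hidden_dim, num_types * head_dim)
        self.k = nn.Linear(hidden_dim, num_types * head_dim)

    def forward(self, h):                     # h: (batch, seq, hidden)
        b, t, _ = h.shape
        q = self.q(h).view(b, t, self.num_types, self.head_dim)
        k = self.k(h).view(b, t, self.num_types, self.head_dim)
        # scores[b, r, i, j]: intensity of relation r from token i to token j
        scores = torch.einsum("bird,bjrd->brij", q, k) / self.head_dim ** 0.5
        return torch.sigmoid(scores)

model = RelationAttention(hidden_dim=256, num_types=5)
out = model(torch.randn(2, 20, 256))          # shape: (2, 5, 20, 20)
```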


Author(s):  
Ningyu Zhang ◽  
Shumin Deng ◽  
Xu Cheng ◽  
Xi Chen ◽  
Yichi Zhang ◽  
...  

Previous research has demonstrated the power of leveraging prior knowledge to improve the performance of deep models in natural language processing. However, traditional methods neglect the fact that external knowledge bases contain redundant and irrelevant knowledge. In this study, we conducted an in-depth empirical investigation into downstream tasks and found that knowledge-enhanced approaches do not always yield satisfactory improvements. We therefore investigate the fundamental reasons for ineffective knowledge infusion and present selective injection for language pretraining, a model-agnostic method that is readily pluggable into previous approaches. Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods.
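One plausible reading of selective injection is a learned gate that suppresses irrelevant knowledge per token; the sketch below is an assumption-laden illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn

class SelectiveInjection(nn.Module):
    """Gate knowledge embeddings before adding them to token states."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, token_h, know_h):
        # g near 0 suppresses redundant or irrelevant knowledge.
        g = torch.sigmoid(self.gate(torch.cat([token_h, know_h], dim=-1)))
        return token_h + g * know_h

layer = SelectiveInjection(dim=768)
fused = layer(torch.randn(2, 16, 768), torch.randn(2, 16, 768))
```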


1998 ◽  
Vol 37 (01) ◽  
pp. 01-07 ◽  
Author(s):  
G.J. Kuperman ◽  
C. Friedman ◽  
G. Hripcsak

Abstract: While natural language processing systems are beginning to see clinical use, it remains unclear whether they can be disseminated effectively through the health care community. MedLEE, a general-purpose natural language processor developed for Columbia-Presbyterian Medical Center, was compared to physicians’ ability to detect seven clinical conditions in 200 Brigham and Women’s Hospital chest radiograph reports. Using the system on the new institution’s reports resulted in a small but measurable drop in performance (it was distinguishable from the physicians at p = 0.011). By adjusting the interpretation of the processor’s coded output (without changing the processor itself), local behavior was better accommodated, and performance improved until it was indistinguishable from the physicians. Pairs of physicians disagreed on at least one condition for 22% of reports; the sources of disagreement appeared to be interpretation of findings, gauging the likelihood and degree of disease, and coding errors.

