Chronological Ordering Based on Context Overlap Detection

2012 ◽  
Vol 2 (4) ◽  
pp. 31-44
Author(s):  
Mohamed H. Haggag ◽  
Bassma M. Othman

Context processing plays an important role in many Natural Language Processing applications, and sentence ordering is one of the critical tasks in text generation. The order in which sentences appear in the raw source texts need not be preserved in the generated text, so chronological sentence ordering is of high importance in this regard. Some studies have followed linguistic syntactic analysis, while others have used statistical approaches. This paper proposes a new model for sentence ordering based on semantic analysis, in which word-level semantics forms the seed for sentence-level semantic relations. The model introduces a clustering technique based on the relatedness of sentence senses. Sentences are then chronologically ordered through two main steps: overlap detection and chronological cause-effect rules. Overlap detection drills down into each cluster to step through its sentences in chronological sequence, while cause-effect rules provide the linguistic knowledge controlling the relations between sentences. Evaluation showed that the proposed model can process texts of any size, is not domain specific, and allows the cause-effect rules to be extended for specific ordering needs.
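A rough sketch of this kind of pipeline is given below; it is purely illustrative rather than the authors' implementation, and the embedding model, cluster count, and word-overlap heuristic are all assumptions. It clusters sentences by semantic relatedness and then greedily chains the sentences inside each cluster by surface overlap, standing in for the overlap-detection step (the cause-effect rules would apply on top of this ordering).

```python
# Minimal sketch of cluster-then-order: embed sentences, cluster by
# relatedness, then greedily chain each cluster by word overlap.
# The embedding model and cluster count are assumptions.
from sentence_transformers import SentenceTransformer  # assumed backend
from sklearn.cluster import AgglomerativeClustering

def order_by_overlap(sentences):
    """Greedy chaining: each next sentence maximizes word overlap with
    the previous one, approximating the overlap-detection step."""
    remaining = list(sentences)
    ordered = [remaining.pop(0)]
    while remaining:
        prev = set(ordered[-1].lower().split())
        nxt = max(remaining, key=lambda s: len(prev & set(s.lower().split())))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

def cluster_and_order(sentences, n_clusters=3):
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(embeddings)
    clusters = {}
    for sent, label in zip(sentences, labels):
        clusters.setdefault(label, []).append(sent)
    # Cause-effect rules would further adjust pair order within clusters.
    return {label: order_by_overlap(group) for label, group in clusters.items()}
```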

2021 ◽  
Vol 30 (6) ◽  
pp. 526-534
Author(s):  
Evelina Fedorenko ◽  
Cory Shain

Understanding language requires applying cognitive operations (e.g., memory retrieval, prediction, structure building) that are relevant across many cognitive domains to specialized knowledge structures (e.g., a particular language’s lexicon and syntax). Are these computations carried out by domain-general circuits or by circuits that store domain-specific representations? Recent work has characterized the roles in language comprehension of the language network, which is selective for high-level language processing, and the multiple-demand (MD) network, which has been implicated in executive functions and linked to fluid intelligence and thus is a prime candidate for implementing computations that support information processing across domains. The language network responds robustly to diverse aspects of comprehension, but the MD network shows no sensitivity to linguistic variables. We therefore argue that the MD network does not play a core role in language comprehension and that past findings suggesting the contrary are likely due to methodological artifacts. Although future studies may reveal some aspects of language comprehension that require the MD network, evidence to date suggests that those will not be related to core linguistic processes such as lexical access or composition. The finding that the circuits that store linguistic knowledge carry out computations on those representations aligns with general arguments against the separation of memory and computation in the mind and brain.


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2671
Author(s):  
Yu Zhang ◽  
Junan Yang ◽  
Xiaoshuai Li ◽  
Hui Liu ◽  
Kun Shao

Recent studies have shown that natural language processing (NLP) models are vulnerable to adversarial examples: inputs maliciously crafted by adding small, humanly imperceptible perturbations to benign inputs so as to lead the target model to false predictions. Compared with character- and sentence-level textual adversarial attacks, word-level attacks can generate higher-quality adversarial examples, especially in a black-box setting. However, existing attack methods usually require a huge number of queries to successfully deceive the target model, which is costly in a real adversarial scenario and makes finding appropriate attack models difficult. We therefore propose a novel attack method whose main idea is to fully utilize the adversarial examples generated by a local model, transferring part of the attack to the local model so that it is completed ahead of time and the cost of attacking the target model is reduced. Extensive experiments on three public benchmarks show that our attack method not only improves the success rate but also reduces the cost, outperforming the baselines by a significant margin.
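The surrogate-first idea can be sketched as follows (illustrative only; the scoring interface, synonym source, and query budget are assumptions): candidate word substitutions are ranked for free on a local model, and only the most promising candidates spend paid queries on the black-box target.

```python
# Illustrative sketch of a query-efficient word-level attack: rank word
# substitutions on a free local (surrogate) model first, then spend
# black-box queries on the target only for the top candidates.
# local_model, target_query, and the synonym table are assumed interfaces.

def local_score(sentence, local_model, true_label):
    """Surrogate confidence in the true label; lower = more adversarial."""
    return local_model.predict_proba([sentence])[0][true_label]

def attack(sentence, true_label, local_model, target_query, synonyms, budget=50):
    words = sentence.split()
    candidates = []
    for i, w in enumerate(words):
        for sub in synonyms.get(w, []):
            perturbed = " ".join(words[:i] + [sub] + words[i + 1:])
            candidates.append((local_score(perturbed, local_model, true_label),
                               perturbed))
    # Transfer step: the local model does the expensive ranking for free.
    candidates.sort(key=lambda pair: pair[0])
    queries = 0
    for _, perturbed in candidates:
        if queries >= budget:
            break
        queries += 1
        if target_query(perturbed) != true_label:  # one paid black-box query
            return perturbed, queries
    return None, queries
```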


2013 ◽  
Vol 416-417 ◽  
pp. 1552-1557
Author(s):  
Xiao Xu Hu

Hypothesis combination is a principal method for improving the performance of machine translation (MT) systems. State-of-the-art strategies include sentence-level and word-level methods, each with its own advantages and disadvantages, and current strategies depend mainly on statistical methods with little guidance from rich linguistic knowledge. This paper proposes a hybrid framework that combines the strengths of the sentence-level and word-level methods. In the word-level stage, the method selects well-translated words according to each word's part of speech and the translation ability, for that part of speech, of the MT system that generated the word. Experimental results with different MT systems prove the effectiveness of this approach.
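A toy sketch of the word-selection stage (illustrative; the per-system POS reliability table, the alignment format, and the tagger are assumptions): each candidate word is scored by how reliably its source system translates that part of speech.

```python
# Toy sketch of POS-conditioned word selection for hypothesis combination.
# pos_ability is an assumed table: pos_ability[system][pos] = estimated
# translation quality of that system for that part of speech.
import nltk  # assumed tagger; requires nltk.download("averaged_perceptron_tagger")

def select_words(aligned_candidates, pos_ability):
    """aligned_candidates: list of positions, each a list of
    (word, system) pairs proposed by different MT systems."""
    combined = []
    for candidates in aligned_candidates:
        best_word, best_score = None, float("-inf")
        for word, system in candidates:
            pos = nltk.pos_tag([word])[0][1]
            score = pos_ability.get(system, {}).get(pos, 0.0)
            if score > best_score:
                best_word, best_score = word, score
        combined.append(best_word)
    return " ".join(combined)
```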


2021 ◽  
Vol 11 (2) ◽  
pp. 626
Author(s):  
Jeong-Myeong Choi ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

In Korean, spacing is very important for the readability and context of sentences. Moreover, in natural language processing for Korean, a sentence with incorrect spacing changes the structure of the sentence, which degrades performance. In previous work, spacing errors were corrected using n-gram-based statistical methods and morphological analyzers, and recently many studies using deep learning have been conducted. In this study, we address the spacing error correction problem at both the syllable level and the morpheme level. The proposed model combines a convolutional neural network layer, which learns syllable and morphological pattern information in sentences, with a bidirectional long short-term memory layer, which learns forward and backward sequence information. Performance was evaluated by accuracy at the syllable level and by precision, recall, and F1 score at the word level. The experiments confirmed that performance improved over the previous study.
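A minimal PyTorch sketch of such a CNN + BiLSTM tagger follows (illustrative; the vocabulary size, layer dimensions, and binary space/no-space labeling are assumptions):

```python
# Minimal sketch of a CNN + BiLSTM spacing tagger: for each syllable,
# predict whether a space should follow it. Sizes are illustrative.
import torch
import torch.nn as nn

class SpacingModel(nn.Module):
    def __init__(self, vocab_size=2000, embed_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Convolution over syllable windows captures local morphological patterns.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=3, padding=1)
        # BiLSTM captures forward and backward sequence context.
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)  # space / no-space after syllable

    def forward(self, syllable_ids):             # (batch, seq_len)
        x = self.embed(syllable_ids)              # (batch, seq_len, embed_dim)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        x, _ = self.lstm(x)                       # (batch, seq_len, 2*hidden)
        return self.out(x)                        # per-syllable logits

logits = SpacingModel()(torch.randint(1, 2000, (4, 50)))  # smoke test
```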


2021 ◽  
Vol 11 (21) ◽  
pp. 9938
Author(s):  
Kun Shao ◽  
Yu Zhang ◽  
Junan Yang ◽  
Hui Liu

Deep learning models are vulnerable to backdoor attacks; in existing research, the success rate of textual backdoor attacks based on data poisoning can reach 100%. To strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method based on poisoned-sample recognition. Our method consists of two steps. The first adds a controlled noise layer after the model's embedding layer and trains a preliminary model in which the backdoor is only weakly embedded, or not embedded at all, which reduces the effectiveness of poisoned samples; this model is then used to make an initial identification of poisoned samples in the training set, narrowing the search range. The second step uses all the training data to train an infected model with the backdoor embedded, which reclassifies the samples selected in the first step and finally identifies the poisoned samples. Detailed experiments show that our defense method effectively defends against a variety of backdoor attacks (character-level, word-level, and sentence-level), outperforming the baseline method. For a BERT model trained on the IMDB dataset, the method can even reduce the success rate of word-level backdoor attacks to 0%.
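The first-step noise layer can be sketched as below (illustrative; the Gaussian form, noise scale, and injection point are assumptions): noise added after the embedding layer during preliminary training blunts the trigger signal carried by poisoned samples.

```python
# Illustrative sketch of a controlled noise layer inserted after the
# embedding layer; the noise scale (sigma) is an assumed hyperparameter.
import torch
import torch.nn as nn

class NoisyEmbedding(nn.Module):
    """Wraps an embedding layer and adds Gaussian noise during training,
    weakening backdoor triggers carried by poisoned samples."""
    def __init__(self, embedding: nn.Embedding, sigma: float = 0.1):
        super().__init__()
        self.embedding = embedding
        self.sigma = sigma

    def forward(self, token_ids):
        x = self.embedding(token_ids)
        if self.training:  # noise only while training the preliminary model
            x = x + self.sigma * torch.randn_like(x)
        return x

# Usage: wrap an existing model's embedding before preliminary training, then
# flag training samples the preliminary model misclassifies with high
# confidence as candidate poisoned samples for the second-step recheck.
emb = NoisyEmbedding(nn.Embedding(30522, 768), sigma=0.1)
```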


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Sumei Ruan ◽  
Xusheng Sun ◽  
Ruanxingchen Yao ◽  
Wei Li

To detect comprehensive clues and provide more accurate forecasting in the early stage of financial distress, researchers have emphasized digitizing lengthy but indispensable textual disclosures, such as Management Discussion and Analysis (MD&A), in addition to financial indicators. However, most studies split the long text into words and represent it as word-count vectors, introducing massive amounts of irrelevant information while ignoring meaningful context. To represent large texts efficiently, this study proposes an end-to-end neural network model based on hierarchical self-attention, with a state-of-the-art pretrained model introduced for context-aware text embedding. The proposed model has two notable characteristics. First, the hierarchical self-attention assigns high weights only to essential content at the word and sentence levels and automatically neglects information irrelevant to risk prediction, which makes it suitable for extracting the effective parts of large-scale text. Second, after fine-tuning, the word embeddings adapt to the specific contexts of the samples and convey the original text more accurately without excessive manual processing. Experiments confirm that adding text improves the accuracy of financial distress forecasting and that the proposed model outperforms benchmark models on AUC and F2-score. For visualization, the elements of the hierarchical self-attention weight matrices act as scalars estimating the importance of each word and sentence; in this way, "red-flag" statements that imply financial risk are identified and highlighted in the original text, providing effective references for decision-makers.
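A compact sketch of hierarchical attention pooling (illustrative; the dimensions and the single-query additive-attention form are assumptions): word vectors are attention-pooled into sentence vectors and sentence vectors into a document vector, with the attention weights reusable for highlighting.

```python
# Sketch of two-level (word -> sentence -> document) attention pooling.
# Dimensions are illustrative; inputs would come from a pretrained encoder.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Single-query additive attention: pools a sequence of vectors into
    one vector and returns the weights for interpretability."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1))

    def forward(self, x):                            # x: (batch, seq, dim)
        weights = torch.softmax(self.score(x), dim=1)  # (batch, seq, 1)
        return (weights * x).sum(dim=1), weights.squeeze(-1)

word_pool, sent_pool = AttentionPool(768), AttentionPool(768)
words = torch.randn(8, 40, 768)                # 8 sentences x 40 word vectors
sent_vecs, word_w = word_pool(words)           # word-level importance weights
doc_vec, sent_w = sent_pool(sent_vecs.unsqueeze(0))  # sentence-level weights
# word_w / sent_w can be used to highlight "red-flag" words and sentences.
```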


Author(s):  
Zhongbin Xie ◽  
Shuai Ma

Semantically matching two text sequences (usually two sentences) is a fundamental problem in NLP. Most previous methods either encode each of the two sentences into a vector representation (sentence-level embedding) or leverage word-level interaction features between the two sentences. In this study, we propose to take the sentence-level embedding features and the word-level interaction features as two distinct views of a sentence pair, and unify them with a framework of Variational Autoencoders such that the sentence pair is matched in a semi-supervised manner. The proposed model is referred to as Dual-View Variational AutoEncoder (DV-VAE), where the optimization of the variational lower bound can be interpreted as an implicit Co-Training mechanism for two matching models over distinct views. Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DV-VAE over several strong semi-supervised and supervised text matching models.
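A highly simplified sketch of the two-view idea (illustrative; the fusion, loss form, and dimensions are assumptions and omit most of the DV-VAE machinery): both views of a sentence pair are encoded into a shared latent variable, a classifier predicts the match label from the latent, and unlabeled pairs contribute only the variational (ELBO) terms.

```python
# Highly simplified sketch of a two-view VAE for text matching: the
# sentence-embedding view and the word-interaction view are fused into a
# shared latent z; a classifier predicts match/no-match from z.
import torch
import torch.nn as nn

class TwoViewVAE(nn.Module):
    def __init__(self, view_dim=768, latent=64):
        super().__init__()
        self.enc = nn.Linear(2 * view_dim, 2 * latent)  # fused views -> mu, logvar
        self.dec = nn.Linear(latent, 2 * view_dim)      # reconstruct both views
        self.clf = nn.Linear(latent, 2)                 # match / no-match

    def forward(self, sent_view, inter_view):
        h = torch.cat([sent_view, inter_view], dim=-1)
        mu, logvar = self.enc(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon_loss = (self.dec(z) - h).pow(2).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        # Unlabeled pairs train on the ELBO alone; labeled pairs add a
        # classification loss on self.clf(z).
        return self.clf(z), recon_loss + kl
```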


2021 ◽  
Author(s):  
Jin Wang ◽  
Marisa N. Lytle ◽  
Yael Weiss ◽  
Brianna L. Yamasaki ◽  
James R. Booth

This dataset examines language development with a longitudinal design and includes diffusion- and T1-weighted structural magnetic resonance imaging (MRI), task-based functional MRI (fMRI), and a battery of psycho-educational assessments and parental questionnaires. We collected data from 5.5-6.5-year-old children (ses-5) and followed them up at 7-8 years old (ses-7) and again at 8.5-10 years old (ses-9). To increase the sample size at the older time points, another cohort of 7-8-year-old children (ses-7) was recruited and followed up at 8.5-10 years old (ses-9). In total, 322 children who completed at least one structural and functional scan were included. Children performed four fMRI tasks: two word-level tasks examining phonological and semantic processing and two sentence-level tasks investigating semantic and syntactic processing. The MRI data are valuable for examining changes over time in interactive specialization, given the use of multiple imaging modalities and tasks in this longitudinal design. In addition, the extensive psycho-educational assessments and questionnaires provide opportunities to explore brain-behavior and brain-environment associations.


2007 ◽  
Vol 363 (1493) ◽  
pp. 1037-1054 ◽  
Author(s):  
Lorraine K Tyler ◽  
William Marslen-Wilson

The research described here combines psycholinguistically well-motivated questions about different aspects of human language comprehension with behavioural and neuroimaging studies of normal performance, incorporating both subtractive analysis techniques and functional connectivity methods, and applying these tasks and techniques to the analysis of the functional and neural properties of brain-damaged patients with selective linguistic deficits in the relevant domains. The results of these investigations point to a set of partially dissociable sub-systems supporting three major aspects of spoken language comprehension, involving regular inflectional morphology, sentence-level syntactic analysis and sentence-level semantic interpretation. Differential patterns of fronto-temporal connectivity for these three domains confirm that the core aspects of language processing are carried out in a fronto-temporo-parietal language system which is modulated in different ways as a function of different linguistic processing requirements. No one region or sub-region holds the key to a specific language function; each requires the coordination of activity within a number of different regions. Functional connectivity analysis plays the critical role of indicating the regions which directly participate in a given sub-process, by virtue of their joint time-dependent activity. By revealing these codependencies, connectivity analysis sharpens the pattern of structure–function relations underlying specific aspects of language performance.


Author(s):  
Nicolás José Fernández-Martínez ◽  
Carlos Periñán-Pascual

Location-based systems require rich geospatial data in emergency and crisis-related situations (e.g. earthquakes, floods, terrorist attacks, car accidents or pandemics) for the geolocation of not only a given incident but also the affected places and people in need of immediate help, which could potentially save lives and prevent further damage to urban or environmental areas. Given the sparsity of geotagged tweets, geospatial data must be obtained from the locative references mentioned in textual data such as tweets. In this context, we introduce nLORE (neural LOcative Reference Extractor), a deep-learning system that serves to detect locative references in English tweets by making use of the linguistic knowledge provided by LORE. nLORE, which captures fine-grained complex locative references of any type, outperforms not only LORE, but also well-known general-purpose or domain-specific off-the-shelf entity-recognizer systems, both qualitatively and quantitatively. However, LORE shows much better runtime efficiency, which is especially important in emergency-based and crisis-related scenarios that demand quick intervention to send first responders to affected areas and people. This highlights the often undervalued yet very important role of rule-based models in natural language processing for real-life and real-time scenarios.

