Text Normalization Using Encoder–Decoder Networks Based on the Causal Feature Extractor

2020, Vol. 10 (13), pp. 4551
Author(s): Adrián Javaloy, Ginés García-Mateos

The encoder–decoder architecture is a well-established, effective and widely used approach in many tasks of natural language processing (NLP), among other domains. It consists of two closely collaborating components: an encoder that transforms the input into an intermediate form, and a decoder that produces the output. This paper proposes a new method for the encoder, named Causal Feature Extractor (CFE), based on three main ideas: causal convolutions, dilation and bidirectionality. We apply this method to text normalization, a ubiquitous problem that appears as the first step of many text-to-speech (TTS) systems. Given a text with symbols, the problem consists of rewriting the text exactly as it should be read by the TTS system. We use an attention-based encoder–decoder architecture with a fine-grained character-level approach rather than the usual word-level one. The proposed CFE is compared to other common encoders, such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks. Experimental results show the feasibility of CFE, which achieves better results in terms of accuracy, number of parameters, convergence time, and use of an attention mechanism based on attention matrices. The obtained accuracy ranges from 83.5% to 96.8% of correctly normalized sentences, depending on the dataset. Moreover, the proposed method is generic and can be applied to different types of input, such as text, audio and images.
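To make the core idea concrete, the following is a minimal PyTorch sketch (not taken from the paper's code) of a causal, dilated 1-D convolution, the building block the CFE is based on: each output position depends only on current and past inputs, and stacking layers with growing dilation widens the receptive field. Class and parameter names are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a causal, dilated 1-D convolution: the output at
# position t only depends on inputs at positions <= t.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        # Left-pad so that no future positions leak into the output.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad only on the left (past)
        return self.conv(x)

# Stacking layers with growing dilation (1, 2, 4, ...) widens the receptive field
# exponentially while keeping the causal constraint.
encoder = nn.Sequential(*[CausalConv1d(64, kernel_size=3, dilation=2 ** i) for i in range(4)])
x = torch.randn(8, 64, 100)                      # e.g. 100 character embeddings
h = encoder(x)                                   # same length, causal receptive field
```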

2020, Vol. 10 (17), pp. 5772
Author(s): Adrián Javaloy, Ginés García-Mateos

Deep learning methods are gaining popularity in different application domains, especially in natural language processing. It is commonly believed that, with a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used typology is the encoder-decoder architecture, where the input data is transformed into an intermediate code by an encoder, and a decoder then takes this code to produce the output. Different types of networks can be used in the encoder and the decoder, depending on the problem of interest, such as convolutional neural networks (CNN) or long short-term memory (LSTM) networks. This paper uses for the encoder a recently proposed method called the Causal Feature Extractor (CFE). It is based on causal convolutions (i.e., convolutions that depend only on one direction of the input), dilation (i.e., increasing the aperture size of the convolutions) and bidirectionality (i.e., independent networks in both directions). Preliminary results are presented on three different tasks and compared with state-of-the-art methods: bilingual translation, LaTeX decompilation and audio transcription. The proposed method achieves promising results, showing its versatility in working with text, audio and images. Moreover, it has a shorter training time, requiring less time per iteration, and makes good use of attention mechanisms based on attention matrices.
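The bidirectionality idea can be illustrated with a short hedged sketch: two independent causal convolution stacks process the input and its time-reversed copy, and their outputs are concatenated. All names, layer counts and sizes below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the "bidirectionality" idea: two independent causal stacks,
# one over the input and one over its time-reversed copy, concatenated afterwards.
import torch
import torch.nn as nn

def causal_conv(channels, k, d):
    # Left padding keeps the convolution causal (no access to future steps).
    return nn.Sequential(nn.ConstantPad1d(((k - 1) * d, 0), 0.0),
                         nn.Conv1d(channels, channels, k, dilation=d))

class BidirectionalCFE(nn.Module):
    def __init__(self, channels=64, layers=4):
        super().__init__()
        self.fwd = nn.Sequential(*[causal_conv(channels, 3, 2 ** i) for i in range(layers)])
        self.bwd = nn.Sequential(*[causal_conv(channels, 3, 2 ** i) for i in range(layers)])

    def forward(self, x):                      # x: (batch, channels, time)
        left = self.fwd(x)                     # looks only at the past
        right = self.bwd(x.flip(-1)).flip(-1)  # looks only at the future
        return torch.cat([left, right], dim=1)

features = BidirectionalCFE()(torch.randn(2, 64, 50))  # -> (2, 128, 50)
```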


Author(s): Cunxiao Du, Zhaozheng Chen, Fuli Feng, Lei Zhu, Tian Gan, ...

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite their significance, deep models ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce an interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, the EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We evaluated the proposed approach on several benchmark datasets, including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research.
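A hedged sketch of the word-level interaction idea follows: each word representation is matched against each class representation, and the resulting interaction matrix is aggregated into class scores. The class embeddings and the aggregation MLP below are illustrative simplifications, not EXAM's exact architecture.

```python
# Hedged sketch of a word-level interaction mechanism: explicit matching signals
# between each word and each class, aggregated into class logits.
import torch
import torch.nn as nn

class WordClassInteraction(nn.Module):
    def __init__(self, dim, num_classes, seq_len):
        super().__init__()
        self.class_emb = nn.Parameter(torch.randn(num_classes, dim))
        self.aggregate = nn.Sequential(nn.Linear(seq_len, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, words):                          # words: (batch, seq_len, dim)
        # Interaction matrix: one matching score per (word, class) pair.
        interaction = torch.einsum('bsd,cd->bcs', words, self.class_emb)
        # Aggregate each class's word-level signals into a single logit.
        return self.aggregate(interaction).squeeze(-1)  # (batch, num_classes)

logits = WordClassInteraction(dim=128, num_classes=10, seq_len=40)(torch.randn(4, 40, 128))
```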


2010, Vol. 22 (8), pp. 1844-1851
Author(s): Shirley-Ann Rueschemeyer, Daan van Rooij, Oliver Lindemann, Roel M. Willems, Harold Bekkering

Recent research indicates that language processing relies on brain areas dedicated to perception and action. For example, processing words denoting manipulable objects has been shown to activate a fronto-parietal network involved in actual tool use. This is suggested to reflect the knowledge the subject has about how objects are moved and used. However, information about how to use an object may be much more central to the conceptual representation of an object than information about how to move an object. Therefore, there may be much more fine-grained distinctions between objects on the neural level, especially related to the usability of manipulable objects. In the current study, we investigated whether a distinction can be made between words denoting (1) objects that can be picked up to move (e.g., volumetrically manipulable objects: bookend, clock) and (2) objects that must be picked up to use (e.g., functionally manipulable objects: cup, pen). The results show that functionally manipulable words elicit greater levels of activation in the fronto-parietal sensorimotor areas than volumetrically manipulable words. This suggests that indeed a distinction can be made between different types of manipulable objects. Specifically, how an object is used functionally rather than whether an object can be displaced with the hand is reflected in semantic representations in the brain.


Author(s): Ali Ebrahimnejad, Mohammad Enayattabr, Homayun Motameni, Harish Garg

In recent years, numerous researchers have examined and analyzed several different types of uncertainty in shortest path (SP) problems. However, SP problems in which the costs of arcs are expressed in terms of mixed interval-valued fuzzy numbers are less frequently addressed. Here, for solving such uncertain SP problems, a new procedure is first developed to approximate the summation of mixed interval-valued fuzzy numbers using alpha cuts. Then, an extended distance function is introduced for comparing path weights. Finally, a modified artificial bee colony (MABC) algorithm is used to find the interval-valued membership function of the SP in such a mixed interval-valued fuzzy network. The proposed algorithm is illustrated via two applications of SP problems in wireless sensor networks, and the results are compared with those derived from genetic algorithm (GA) and particle swarm optimization (PSO) algorithms, based on three indexes: convergence iteration, convergence time and run time. The obtained results confirm that the MABC algorithm requires fewer convergence iterations and less convergence and implementation time than the GA and PSO algorithms.
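The alpha-cut summation step can be illustrated with a simplified sketch that adds ordinary triangular fuzzy numbers level by level via interval arithmetic; the mixed interval-valued case treated in the paper generalizes this idea. The triangular membership functions and helper names are illustrative assumptions.

```python
# Hedged sketch of alpha-cut based addition of fuzzy numbers, the building block
# extended in the paper to mixed interval-valued fuzzy arc costs.
import numpy as np

def alpha_cut_triangular(a, b, c, alpha):
    """Interval [lower, upper] of a triangular fuzzy number (a, b, c) at level alpha."""
    return np.array([a + alpha * (b - a), c - alpha * (c - b)])

def fuzzy_sum(tri1, tri2, levels=11):
    """Approximate the sum by adding alpha-cut intervals level by level."""
    alphas = np.linspace(0.0, 1.0, levels)
    return [(alpha,
             alpha_cut_triangular(*tri1, alpha) + alpha_cut_triangular(*tri2, alpha))
            for alpha in alphas]

# Example: adding two triangular arc costs (2, 3, 5) and (1, 4, 6).
for alpha, interval in fuzzy_sum((2, 3, 5), (1, 4, 6)):
    print(f"alpha={alpha:.1f}  sum-interval={interval}")
```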


2021, Vol. 10 (7), pp. 474
Author(s): Bingqing Wang, Bin Meng, Juan Wang, Siyu Chen, Jian Liu

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on BERT and the Transformers framework was constructed and used to classify and extract more than 210,000 residents' festival activities from 1.13 million Sina Weibo (Chinese "Twitter") posts collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences between different types of festivals. The results show that traditional culture significantly influences residents' festivals, reflecting residents' motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents participating in festival activities: the main festival activities are distributed in the central area within the Fifth Ring Road in Beijing, whereas expressions of feelings during festivals are mainly distributed outside the Fifth Ring Road. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, providing a new research paradigm for studying residents' festival activities and adding residents' perception of festivals. The research results provide a basis for the design and management of the Chinese festival system.
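As an illustration of the classification step, the following hedged sketch loads a Chinese BERT checkpoint with the Hugging Face Transformers library and scores a single Weibo-style post. The checkpoint name, label set, and the need for fine-tuning on labelled posts are assumptions made for the example; the study's exact model and data are not reproduced here.

```python
# Hedged sketch of a BERT-based classifier for Weibo posts using Hugging Face
# Transformers. The checkpoint, labels and post are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["festival_activity", "other"]            # illustrative label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels))   # would need fine-tuning on labelled posts

post = "今天和家人一起过中秋节，吃月饼赏月。"        # example Weibo-style post
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[int(logits.argmax(dim=-1))])          # predicted label for the post
```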


2019, Vol. 2019, pp. 1-9
Author(s): Yizhe Wang, Cunqian Feng, Yongshun Zhang, Sisan He

Precession is a common micromotion form of space targets, introducing additional micro-Doppler (m-D) modulation into the radar echo. Effective classification of space targets is of great significance for further micromotion parameter extraction and identification. Feature extraction is a key step in the classification process, largely influencing the final classification performance. This paper presents two methods for classifying different types of space precession targets from high-resolution range profiles (HRRPs). We first establish the precession model of space targets and analyze their scattering characteristics, and then compute electromagnetic data for the cone target, cone-cylinder target, and cone-cylinder-flare target. Experimental results demonstrate that a support vector machine (SVM) using histogram of oriented gradients (HOG) features achieves a good result, whereas a deep convolutional neural network (DCNN) obtains a higher classification accuracy. The DCNN combines the feature extractor and the classifier, automatically mining the high-level signatures of HRRPs through training. In addition, the efficiency of the two classification processes is compared using the same dataset.
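A hedged sketch of the HOG + SVM baseline is given below, treating each HRRP observation as a 2-D time-range image from which HOG features are extracted and fed to a support vector machine. The data shapes, HOG parameters and toy data are illustrative assumptions rather than the paper's actual setup.

```python
# Hedged sketch of the HOG + SVM baseline: HRRP sequences treated as 2-D
# time-range images, HOG features extracted, and an SVM trained on them.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(hrrp_image):
    # hrrp_image: 2-D array (pulses x range cells) for one target observation.
    return hog(hrrp_image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Toy data standing in for simulated electromagnetic HRRPs of three target types.
rng = np.random.default_rng(0)
X = np.stack([hog_features(rng.random((64, 64))) for _ in range(60)])
y = np.repeat([0, 1, 2], 20)          # cone / cone-cylinder / cone-cylinder-flare

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                # training accuracy on the toy data
```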


Author(s): Yonatan Belinkov, James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.


Author(s): Anton Batliner, Bernd Möbius

Automatic speech processing (ASP) is understood as covering word recognition, the processing of higher linguistic components (syntax, semantics, and pragmatics), and the processing of computational paralinguistics (CP), which deals with speaker states and traits. This chapter attempts to track the role of prosody in ASP from the word level up to CP. A short history of the field from 1980 to 2020 distinguishes the early years (until 2000)—when the prosodic contribution to the modelling of linguistic phenomena, such as accents, boundaries, syntax, semantics, and dialogue acts, was the focus—from the later years, when the focus shifted to paralinguistics; prosody ceased to be visible. Different types of predictor variables are addressed, among them high-performance power features as well as leverage features, which can also be employed in teaching and therapy.


2019
Author(s): Roberta Rocca, Kenny R. Coventry, Kristian Tylén, Marlene Staib, Torben E. Lund, ...

Spatial demonstratives are powerful linguistic tools used to establish joint attention. Identifying the meaning of semantically underspecified expressions like "this one" hinges on the integration of linguistic and visual cues, attentional orienting and pragmatic inference. This synergy between language and extralinguistic cognition is pivotal to language comprehension in general, but especially prominent in demonstratives.

In this study, we aimed to elucidate which neural architectures enable this intertwining between language and extralinguistic cognition using a naturalistic fMRI paradigm. In our experiment, 28 participants listened to a specially crafted dialogical narrative with a controlled number of spatial demonstratives. A fast multiband-EPI acquisition sequence (TR = 388 ms) combined with finite impulse response (FIR) modelling of the hemodynamic response was used to capture signal changes at word-level resolution.

We found that spatial demonstratives bilaterally engage a network of parietal areas, including the supramarginal gyrus, the angular gyrus, and precuneus, implicated in information integration and visuospatial processing. Moreover, demonstratives recruit frontal regions, including the right FEF, implicated in attentional orienting and reference frame shifts. Finally, using multivariate similarity analyses, we provide evidence for a general involvement of the dorsal ("where") stream in the processing of spatial expressions, as opposed to ventral pathways encoding object semantics.

Overall, our results suggest that language processing relies on a distributed architecture, recruiting neural resources for perception, attention, and extra-linguistic aspects of cognition in a dynamic and context-dependent fashion.


2020, Vol. 34 (05), pp. 7391-7398
Author(s): Muhammad Asif Ali, Yifang Sun, Bing Li, Wei Wang

Fine-Grained Named Entity Typing (FG-NET) is a key component of natural language processing (NLP). It aims at classifying an entity mention into a wide range of entity types. Due to the large number of entity types, distant supervision is used to collect training data for this task, which noisily assigns type labels to entity mentions irrespective of the context. In order to alleviate the noisy labels, existing approaches to FG-NET analyze entity mentions entirely independently of each other and assign type labels solely based on the mention's sentence-specific context. This is inadequate for highly overlapping and/or noisy type labels, as it hinders information passing across sentence boundaries. For this reason, we propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification. Experimental evaluation shows that the proposed model outperforms existing research by relative scores of up to 10.2% and 8.3% for macro-F1 and micro-F1, respectively.
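A hedged sketch of an edge-weighted, attention-based graph convolution layer in this spirit follows: pairwise attention scores between mention nodes are re-weighted by corpus-level edge weights before aggregation. The graph construction, scoring function and dimensions are illustrative assumptions, not the authors' exact model.

```python
# Hedged sketch of an edge-weighted graph convolution with attention over
# corpus-level neighbours; all hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

class EdgeWeightedAttentiveGCN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.att = nn.Linear(2 * dim, 1)

    def forward(self, x, edge_weight):             # x: (nodes, dim); edge_weight: (nodes, nodes)
        n = x.size(0)
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, -1),
                           x.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = self.att(pairs).squeeze(-1) + edge_weight.log().clamp(min=-1e4)
        alpha = torch.softmax(scores, dim=-1)      # attention re-weighted by edge weights
        return torch.relu(alpha @ self.proj(x))    # refined mention representations

x = torch.randn(5, 16)                             # 5 entity mentions, 16-dim features
w = torch.rand(5, 5) + 1e-6                        # positive corpus-level edge weights
refined = EdgeWeightedAttentiveGCN(16)(x, w)
```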

