Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction

2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.
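The core idea of injecting subword-level information into word vectors can be illustrated with a fastText-style sketch: a word's vector is enriched with the average of its character n-gram vectors, so rare inflected forms share parameters with related words. This is an illustrative stand-in, not the paper's exact method; all names and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

# Hypothetical embedding tables; in practice these are trained jointly with the LM.
word_emb = {}
ngram_emb = {}

def char_ngrams(word, n=3):
    """Character trigrams of a word, padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def lookup(table, key):
    if key not in table:
        table[key] = rng.normal(scale=0.1, size=DIM)
    return table[key]

def subword_informed_vector(word):
    """Word vector enriched with the average of its character n-gram vectors."""
    sub = np.mean([lookup(ngram_emb, g) for g in char_ngrams(word)], axis=0)
    return lookup(word_emb, word) + sub

v = subword_informed_vector("unhappiness")
```

Because "happiness" and "unhappiness" share trigrams, their subword-informed vectors share components even if one word is unseen at the word level, which is what makes such models attractive for high type-to-token-ratio languages.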

2021 ◽  
Vol 15 ◽  
Author(s):  
Jianwei Zhang ◽  
Xubin Zhang ◽  
Lei Lv ◽  
Yining Di ◽  
Wei Chen

Background: Learning discriminative representations from large-scale data sets has seen breakthroughs over the past decades. However, it remains a thorny problem to generate representative embeddings from limited examples, for example, from a class containing only one image. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed. It tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances in FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. Then we propose a general framework for the FSL problem and give a taxonomy under the framework. We survey two FSL directions: learning policy and meta-learning. Results: We review advanced applications of FSL, including image classification, object detection, image segmentation, and other tasks, as well as the corresponding benchmarks, to provide an overview of recent progress. Conclusion: FSL needs to be further studied in medical imaging, language models, and reinforcement learning in future work. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.


2020 ◽  
Vol 500 (3) ◽  
pp. 3838-3853
Author(s):  
Fuyu Dong ◽  
Yu Yu ◽  
Jun Zhang ◽  
Xiaohu Yang ◽  
Pengjie Zhang

ABSTRACT The integrated Sachs–Wolfe (ISW) effect is caused by the decay of the cosmological gravitational potential and is therefore a unique probe of dark energy. However, its robust detection is still problematic. Various tensions exist between different data sets, between different large-scale structure (LSS) tracers, and between data and the ΛCDM theory prediction. We propose a novel method of ISW measurement by cross-correlating the cosmic microwave background (CMB) and the LSS traced by 'low-density positions' (LDPs). It isolates the ISW effect generated by low-density regions of the universe, but is insensitive to selection effects associated with voids. We apply it to the DR8 galaxy catalogue of the DESI Legacy imaging surveys and obtain the LDPs at z ≤ 0.6 over ∼20 000 deg² of sky coverage. We then cross-correlate with the Planck temperature map and detect the ISW effect at 3.2σ. We further compare the measurement with numerical simulations of the concordance ΛCDM cosmology and find the ISW amplitude parameter AISW = 1.14 ± 0.38 when we adopt an LDP definition radius $R_\mathrm{s} = 3^{\prime}$, fully consistent with the prediction of the standard ΛCDM cosmology (AISW = 1). This agreement with ΛCDM cosmology holds for all the galaxy samples and $R_\mathrm{s}$ values that we have investigated. Furthermore, the S/N is comparable to that of galaxy ISW measurements. These results demonstrate the LDP method as a competitive alternative to existing ISW measurement methods and provide independent checks on existing tensions.
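The LDP idea can be sketched on a toy flat-sky grid: select pixels with no galaxies within a chosen radius, then compare the mean CMB temperature at those positions with the global mean. This is a minimal illustration of the selection step only, with made-up numbers; the actual pipeline works on curved-sky maps and DESI Legacy catalogues.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy flat-sky maps: galaxy counts and a CMB temperature map on the same grid.
n = 64
galaxy_counts = rng.poisson(lam=0.3, size=(n, n))
cmb_temp = rng.normal(scale=100.0, size=(n, n))  # in microkelvin, illustrative

def low_density_positions(counts, radius=1):
    """Pixels with no galaxies within `radius` pixels (a toy stand-in for R_s)."""
    ldp = np.ones_like(counts, dtype=bool)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            shifted = np.roll(np.roll(counts, dx, axis=0), dy, axis=1)
            ldp &= (shifted == 0)
    return ldp

ldp_mask = low_density_positions(galaxy_counts, radius=1)
# A crude stacked-signal estimate: mean temperature at LDPs minus the global mean.
if ldp_mask.any():
    delta_T = cmb_temp[ldp_mask].mean() - cmb_temp.mean()
```

In a real measurement the noise map here would carry an actual ISW imprint at low-density regions, and the cross-correlation would be computed as an angular power spectrum or stacking profile rather than a single difference.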


Author(s):  
Kai Liu ◽  
Hua Wang ◽  
Fei Han ◽  
Hao Zhang

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant environmental appearance variations. In this paper, we propose a novel method to learn a location representation that integrates the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations caused by long-term visual changes, we formulate our objective using non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes a ratio of ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results have validated the effectiveness of our new method in long-term visual place recognition applications.
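For readers unfamiliar with the norms involved: the ℓ2,1-norm of a matrix is the sum of the ℓ2-norms of its rows, which makes the objective robust to outlier samples. The ratio objective and orthogonality constraint below are an illustrative form under assumed shapes, not the paper's exact formulation.

```python
import numpy as np

def l21_norm(M):
    """Sum of the l2 norms of the rows of M (the l2,1 matrix norm)."""
    return np.linalg.norm(M, axis=1).sum()

# Toy ratio objective over a projection W with orthonormal columns, e.g.
#   min_W ||X W - Y||_{2,1} / ||X W||_{2,1}
# (illustrative shapes and data, not the paper's exact objective).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 samples, 8 features
Y = rng.normal(size=(20, 4))   # 4-dimensional targets
W = np.linalg.qr(rng.normal(size=(8, 4)))[0]  # strictly orthogonal columns

ratio = l21_norm(X @ W - Y) / l21_norm(X @ W)
```

Because the ℓ2-norm of a row is not squared, a single badly mismatched sample contributes linearly rather than quadratically, which is the robustness property the abstract refers to.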


Author(s):  
Xinyan Huang ◽  
Xinjun Wang ◽  
Yan Zhang ◽  
Jinxin Zhao

A trace of an entity is a behavior trajectory of the entity. Periodicity is a frequent phenomenon in the traces of an entity, and finding periodic traces for an entity is essential to understanding its behavior. However, mining periodic traces is a complex procedure, involving the unfixed period of a trace, the existence of multiple periodic traces, the large number of events of an entity, and the complexity of the model needed to represent all the events. Moreover, existing methods cannot offer the desired efficiency for periodic trace mining. In this paper, a graph model (an event relationship graph) is first adopted to represent all the events of an entity; then a novel and efficient algorithm, TracesMining, is proposed to mine all the periodic traces. In our algorithm, events are first clustered according to the similarity of their activity attributes, with each cluster receiving a distinct label; second, a novel method is proposed to mine all the star patterns from the event relationship graph; finally, an efficient method is proposed to merge all the stars to obtain the periodic traces. Our algorithm achieves high efficiency by deviating from the existing edge-by-edge pattern-growth framework, reducing the heavy cost of calculating the support of a pattern, and avoiding the production of many redundant patterns. In addition, our algorithm can mine all the large periodic traces and most small periodic traces. Extensive experimental studies on synthetic data sets demonstrate the effectiveness of our method.
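The notion of an "unfixed period" can be made concrete with a much-simplified stand-in: given a sequence of clustered event labels, find the smallest period with which the sequence repeats. This toy search is not the TracesMining algorithm (which works on an event relationship graph), just an illustration of what periodicity of a trace means.

```python
def smallest_period(events):
    """Smallest p such that events[i] == events[i + p] for all valid i
    (a much-simplified stand-in for detecting a trace's period)."""
    n = len(events)
    for p in range(1, n):
        if all(events[i] == events[i + p] for i in range(n - p)):
            return p
    return n

# A toy event-label trace with a period of 3 (labels are hypothetical).
trace = ["login", "browse", "buy", "login", "browse", "buy", "login", "browse"]
period = smallest_period(trace)  # 3
```

Note that the trace need not contain a whole number of cycles: the definition only requires that every label agree with the one a period earlier, which is why partial final cycles are handled naturally.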


2012 ◽  
Vol 20 (2) ◽  
pp. 235-259 ◽  
Author(s):  
MARTHA YIFIRU TACHBELIE ◽  
SOLOMON TEFERRA ABATE ◽  
WOLFGANG MENZEL

Abstract: This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out-of-vocabulary rate has been observed as a result of using subwords or morphemes. Thus a severe problem of morphologically rich languages has been addressed. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing the quality based on the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the size of the vocabulary increases (20k or more) the morpheme-based systems suffer from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size even with the use of higher order (quadrogram) n-gram language models.
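The out-of-vocabulary (OOV) reduction the abstract reports can be demonstrated with a tiny comparison: compute the OOV rate over whole words versus over morpheme units of the same text. The hyphen-separated segmentations below are purely illustrative tokens, not real Amharic morphological analyses.

```python
def oov_rate(train_units, test_units):
    """Fraction of test tokens whose unit was unseen in training."""
    vocab = set(train_units)
    unseen = sum(1 for u in test_units if u not in vocab)
    return unseen / len(test_units)

# Hypothetical segmentations of the same text: whole words vs. morphemes.
train_words = ["bet-alehu", "bet-ochch", "sew-ochch"]  # illustrative only
test_words = ["bet-alehu", "sew-alehu"]

word_oov = oov_rate(train_words, test_words)  # "sew-alehu" is unseen as a word

def morphs(words):
    return [m for w in words for m in w.split("-")]

morph_oov = oov_rate(morphs(train_words), morphs(test_words))
```

The unseen word "sew-alehu" is composed entirely of morphemes observed in training, so the morpheme-level OOV rate drops to zero here; with high type-to-token ratios this effect is what shrinks OOV rates in practice.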


2018 ◽  
Vol 6 ◽  
pp. 529-541 ◽  
Author(s):  
Jacob Buckman ◽  
Graham Neubig

In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions — including polysemy and the existence of multiword lexical items — into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
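The marginalization over lattice paths described above is a forward dynamic program: each node accumulates, in log space, the total probability of all paths reaching it. The sketch below uses a toy two-token lattice ("new" + "york" vs. the multiword token "new_york") with made-up probabilities; it illustrates the computation, not the authors' neural parameterization.

```python
import math

def lattice_log_prob(edges, n):
    """Log of the total probability marginalized over all paths from node 0 to node n.
    `edges` maps a start node to a list of (end_node, log_prob) arcs."""
    alpha = [-math.inf] * (n + 1)  # forward log-probabilities per node
    alpha[0] = 0.0
    for i in range(n):
        if alpha[i] == -math.inf:
            continue
        for j, lp in edges.get(i, []):
            # log-sum-exp accumulation of path probabilities into node j
            a, b = alpha[j], alpha[i] + lp
            alpha[j] = b if a == -math.inf else max(a, b) + math.log1p(math.exp(-abs(a - b)))
    return alpha[n]

# Toy lattice: "new" then "york" (0.6 * 0.5), or "new_york" as one token (0.3).
edges = {0: [(1, math.log(0.6)), (2, math.log(0.3))],
         1: [(2, math.log(0.5))]}
total = lattice_log_prob(edges, 2)  # exp(total) = 0.6 * 0.5 + 0.3 = 0.6
```

Because the sum runs over every path, multiword items and polysemous alternatives contribute jointly to the sequence probability instead of forcing a single segmentation choice.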


2021 ◽  
Author(s):  
Xu Dong ◽  
Huipeng Li

Abstract The output of the conventional Teager energy operator (TEO) is approximately equal to the squared product of the instantaneous amplitude and the instantaneous frequency ($A^2\Omega^2$). The original TEO can effectively enhance transient shock components and suppress non-impacting elements, but it also changes the frequency distribution of the original shock. In this paper, a complete Teager energy operator is proposed, and its expression is more exact than that of the original method. By keeping the positive and negative distribution of the shock signal x(t), the fundamental-frequency energy of the impulses can be effectively enhanced. The incipient fault characteristics of large-scale rotating machinery are typically micro shock pulses, extremely weak and mixed with heavy noise. Preprocessing the fault signal and enhancing the micro shock components are essential means of extracting early fault features. In the experimental part, the applicability of the proposed method is verified by a simulated micro impact signal, common bearing fault data sets, and practically measured data from a test bench.
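The conventional discrete TEO the abstract builds on is ψ[x](n) = x(n)² − x(n−1)·x(n+1); for a pure tone A·cos(Ωn) it yields exactly A²·sin²(Ω), which approximates A²Ω² for small Ω. The sketch below verifies that identity numerically (the proposed "complete" operator from the paper is not reproduced here).

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    Returns the operator output for interior samples (length len(x) - 2)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone x(n) = A*cos(Omega*n), the output is exactly A^2 * sin^2(Omega),
# which is approximately A^2 * Omega^2 when Omega is small.
A, Omega = 2.0, 0.05
n = np.arange(1000)
psi = teager_energy(A * np.cos(Omega * n))
```

Because the operator involves only three adjacent samples, it responds almost instantaneously to transient shocks, which is why it is popular for enhancing impulsive fault signatures.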


Author(s):  
Xiuying Chen ◽  
Zhangming Chan ◽  
Shen Gao ◽  
Meng-Hsuan Yu ◽  
Dongyan Zhao ◽  
...  

Timeline summarization aims to concisely summarize an evolution trajectory along a timeline, and existing timeline summarization approaches are all based on extractive methods. In this paper, we propose the task of abstractive timeline summarization, which aims to concisely paraphrase the information in time-stamped events. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a memory-based timeline summarization model (MTS). Concretely, we propose a time-event memory to establish a timeline, and use the time position of events on this timeline to guide the generation process. Besides, in each decoding step, we incorporate event-level information into word-level attention to avoid confusion between events. Extensive experiments are conducted on a large-scale real-world dataset, and the results show that MTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
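One standard way to incorporate event-level information into word-level attention is hierarchical attention: each word's attention weight is rescaled by the attention weight of the event it belongs to. The sketch below shows that combination with toy scores; it is a generic illustration, not MTS's exact formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy attention scores for 2 events with 3 words each (values are made up).
event_scores = np.array([1.0, 2.0])
word_scores = np.array([[0.5, 1.0, 0.2],
                        [2.0, 0.1, 0.3]])

event_attn = softmax(event_scores)                        # (2,), over events
word_attn = np.vstack([softmax(s) for s in word_scores])  # (2, 3), per event
# Event-level attention rescales each word's weight, keeping words from
# different events distinct during decoding.
combined = event_attn[:, None] * word_attn                # still sums to 1
```

Since each per-event word distribution sums to one and the event distribution sums to one, the combined weights form a valid distribution over all words while remaining grouped by event.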


2019 ◽  
Author(s):  
Yukun Feng ◽  
Hidetaka Kamigaito ◽  
Hiroya Takamura ◽  
Manabu Okumura
