Learning to Learn Morphological Inflection for Resource-Poor Languages

2020
Vol 34 (05)
pp. 8058-8065
Author(s):
Katharina Kann
Samuel R. Bowman
Kyunghyun Cho

We propose to cast the task of morphological inflection—mapping a lemma to an indicated inflected form—for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines. In particular, it obtains a 31.7% higher absolute accuracy than a previously proposed cross-lingual transfer model and outperforms the previous state of the art by 1.7% absolute accuracy on average over languages.
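
As an illustration, the recipe above can be sketched with the first-order (Reptile-style) approximation of meta-learning rather than the full MAML-style objective the paper builds on: each high-resource language is one task, and the shared initialization is repeatedly nudged toward parameters adapted to that language. The model, `loss_fn`, and batch iterator are assumed placeholders for a seq2seq inflection model, not the authors' code.

```python
# Sketch only: first-order (Reptile-style) approximation of the meta-learning
# loop; the paper itself builds on MAML. `loss_fn(model, batch)` is an assumed
# wrapper around a seq2seq inflection model's training loss.
import copy
import torch

def meta_step(model, batches, loss_fn, inner_lr=1e-3, meta_lr=0.1, steps=5):
    """One meta-update on a single high-resource source language."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):                       # inner loop: adapt to the language
        loss = loss_fn(adapted, next(batches))
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                        # outer update: move the shared
        for p, q in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (q - p)               # initialization toward the task
```

Looping `meta_step` over the high-resource source languages yields the initialization; fine-tuning it on the small target-language dataset then replaces training from scratch.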

2020
Vol 8
pp. 109-124
Author(s):
Shuyan Zhou
Shruti Rijhwani
John Wieting
Jaime Carbonell
Graham Neubig

Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages, but these do not extend well to low-resource languages with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in low-resource languages by utilizing resources in closely related languages, but performance still lags far behind that of their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple but effective: we experiment with them on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared with state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
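
For illustration, a candidate-generation baseline in this spirit can be sketched with character n-gram similarity between mention strings and KB entry names. This is a generic retrieval sketch, not the paper's improved model; the KB entries below are invented.

```python
# Generic retrieval sketch, not the authors' model: rank KB entity names by
# character n-gram TF-IDF similarity to a mention string.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

kb_names = ["Berlin", "Bern", "Bermuda"]          # invented toy KB
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
kb_matrix = vectorizer.fit_transform(kb_names)

def top_k_candidates(mention, k=30):
    """Return up to k KB entries most similar to the mention."""
    scores = cosine_similarity(vectorizer.transform([mention]), kb_matrix)[0]
    order = scores.argsort()[::-1][:k]
    return [(kb_names[i], round(float(scores[i]), 3)) for i in order]

print(top_k_candidates("Berlín", k=2))            # accent-robust string match
```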


2016
Vol 42 (2)
pp. 277-306
Author(s):
Pidong Wang
Preslav Nakov
Hwee Tou Ng

Most of the world's languages are resource-poor for statistical machine translation; still, many of them are related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext, from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments with Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields an improvement of 7.26 BLEU points over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.
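
A toy sketch of the word-level part of this adaptation, assuming a paraphrase table mapping RICH words to POOR words (in practice learned from the small POOR–TGT bitext, e.g. by pivoting over TGT). The Malay-to-Indonesian entries below are hypothetical examples, and the real system also uses phrase-level paraphrases and morphological variants.

```python
# Toy word-level adaptation sketch; the real system also uses phrase-level
# paraphrases and cross-lingual morphological variants. The paraphrase table
# entries below (Malay -> Indonesian) are hypothetical.
def adapt_sentence(rich_tokens, paraphrase_table):
    """Rewrite a RICH-language sentence toward the POOR language."""
    adapted = []
    for tok in rich_tokens:
        candidates = paraphrase_table.get(tok)    # {poor_word: probability}
        adapted.append(max(candidates, key=candidates.get) if candidates else tok)
    return adapted

table = {"kerana": {"karena": 0.9},
         "boleh": {"bisa": 0.7, "boleh": 0.3}}
print(adapt_sentence("saya boleh pergi kerana itu".split(), table))
# -> ['saya', 'bisa', 'pergi', 'karena', 'itu']
```

The adapted RICH–TGT bitext can then be combined with the small POOR–TGT bitext to train the final POOR-to-TGT system.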


Electronics
2021
Vol 10 (12)
pp. 1412
Author(s):
Jurgita Kapočiūtė-Dzikienė
Askars Salimbajevs
Raivis Skadiņš

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine translating it from another language, and (2) there are cross-lingual solutions that work without training data in the target language. Consequently, in this research, we use an English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). In seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and a lazy learning approach (cosine similarity as a memory-based method). We propose and evaluate several strategies to overcome the data scarcity problem with machine translation, cross-lingual models, and a combination of the two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. The accuracy of ~0.842, achieved on the English dataset with completely monolingual models, is considered our top line. However, the cross-lingual approaches demonstrate similar accuracy levels, reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 for German, French, Lithuanian, Latvian, and Portuguese, respectively.
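
The lazy-learning route can be sketched as follows: embed the English training utterances with a multilingual sentence transformer and assign each target-language query the intent of its most cosine-similar training example. The checkpoint name below is one public multilingual model, not necessarily the one used in the paper, and the utterances and intents are invented.

```python
# Lazy-learning sketch: nearest-neighbour intent detection via a multilingual
# sentence transformer. The checkpoint is one public multilingual model, not
# necessarily the paper's; utterances and intents are invented.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

train_texts = ["book a flight to Boston", "what is the weather today"]
train_intents = ["book_flight", "get_weather"]
train_emb = model.encode(train_texts, convert_to_tensor=True)

def predict_intent(query):
    """Assign the intent of the most cosine-similar English training example."""
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, train_emb)[0]
    return train_intents[int(scores.argmax())]

print(predict_intent("wie ist das Wetter heute"))  # German query -> get_weather
```

Because the training memory stays in English and only the shared embedding space is multilingual, this approach needs no machine-translated or target-language training data at all.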


Author(s):  
Oscar Täckström
Dipanjan Das
Slav Petrov
Ryan McDonald
Joakim Nivre

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
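
The coupling of constraints can be illustrated with a small lattice-pruning sketch: a type-level tag dictionary restricts each word's candidate tags, and a token-level tag projected across word-aligned bitext narrows them further when it is consistent. This only shows how the pruned lattice is formed; the paper then trains a partially observed CRF over it. Tags and the dictionary entry are toy examples.

```python
# Illustrative lattice pruning with coupled constraints; the paper trains a
# partially observed CRF over the pruned lattice. Tags and the dictionary
# entry are toy examples.
ALL_TAGS = {"NOUN", "VERB", "ADJ", "DET"}

def allowed_tags(token, type_dict, projected_tag):
    """Intersect a type-level dictionary constraint with a projected
    token-level constraint, trusting the projection only when consistent."""
    type_set = set(type_dict.get(token, ALL_TAGS))   # Wiktionary-style constraint
    if projected_tag in type_set:                    # token constraint agrees
        return {projected_tag}
    return type_set                                  # fall back to type constraint

type_dict = {"walk": ["NOUN", "VERB"]}
print(allowed_tags("walk", type_dict, "VERB"))       # {'VERB'}
print(allowed_tags("walk", type_dict, "ADJ"))        # {'NOUN', 'VERB'}
```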


2020
Author(s):
Yuan Yuan
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data are scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements of 1.91% to 6.69%.
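
A minimal sketch of this pre-training objective, assuming each pixel's series is a (time, bands) array: random time steps are contaminated with noise and a small Transformer encoder is trained to reconstruct the original observations at those steps. All shapes, the noise model, and the hyperparameters are illustrative, not the paper's.

```python
# Sketch of the masked-reconstruction pre-training; shapes, noise model, and
# hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

class SITSEncoder(nn.Module):
    def __init__(self, n_bands=10, d_model=64):
        super().__init__()
        self.proj = nn.Linear(n_bands, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_bands)      # reconstruction head

    def forward(self, x):                            # x: (batch, time, bands)
        return self.head(self.encoder(self.proj(x)))

def pretrain_step(model, series, mask_ratio=0.15):
    """Contaminate random time steps and reconstruct the originals."""
    mask = torch.rand(series.shape[:2]) < mask_ratio # which (pixel, step) to corrupt
    corrupted = series.clone()
    corrupted[mask] = torch.randn_like(corrupted[mask])
    recon = model(corrupted)
    return nn.functional.mse_loss(recon[mask], series[mask])

loss = pretrain_step(SITSEncoder(), torch.randn(8, 24, 10))
loss.backward()
```

After pre-training on unlabeled series, the same encoder (with a classification head in place of the reconstruction head) is fine-tuned end-to-end on the small labeled set.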


2021
Vol 11 (1)
Author(s):
Fangzhou Xu
Yunjing Miao
Yanan Sun
Dongju Guo
Jiali Xu
...  

Abstract. Deep learning networks have been successfully applied to transfer learning so that models can be adapted from a source domain to different target domains. This study uses multiple convolutional neural networks to decode the electroencephalogram (EEG) of stroke patients in order to design an effective motor imagery (MI) brain-computer interface (BCI) system. The study introduces fine-tuning to transfer model parameters and reduce training time. The performance of the proposed framework is evaluated by the models' ability to perform two-class MI recognition. The results show that the best framework is the combination of EEGNet and a fine-tuned transferred model. The average classification accuracy of the proposed model across 11 subjects is 66.36%, and the algorithm complexity is much lower than that of the other models. This good performance indicates that the EEGNet model has great potential for MI stroke rehabilitation based on a BCI system. It also demonstrates the efficiency of transfer learning for improving the performance of EEG-based stroke rehabilitation in a BCI system.
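
A sketch of the fine-tuning step, under the assumption of an EEGNet-style PyTorch module: parameters in the early feature-extraction blocks are frozen and only the remaining layers are re-trained on the target patient's data. The prefix names are placeholders, not the authors' code.

```python
# Fine-tuning sketch under the assumption of an EEGNet-style PyTorch module;
# the prefix names below are placeholders for its early convolutional blocks.
import torch

def prepare_for_finetune(model, frozen_prefixes=("firstconv", "depthwise")):
    """Freeze source-domain feature layers; return an optimizer over the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = not name.startswith(frozen_prefixes)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)      # small LR for fine-tuning
```

The returned optimizer then drives a short training loop on the target subject's MI trials, which is what shortens training time relative to training from scratch.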


2013
Vol 10 (9)
pp. 15373-15414
Author(s):
J. Otto
D. Berveiller
F.-M. Bréon
N. Delpierre
G. Geppert
...  

Abstract. Despite an emerging body of literature linking canopy albedo to forest management, understanding of the process is still fragmented. We combined a stand-level forest gap model with a canopy radiation transfer model and satellite-derived model parameters to quantify the effects of forest thinning (that is, removing trees at a certain time during the forest rotation) on summertime canopy albedo. The effects of different forest species (pine, beech, oak) and four thinning strategies (light to intense thinning regimes) were examined. During stand establishment, summertime canopy albedo is driven by tree species. In the later stages of stand development, the effect of tree species on summertime canopy albedo decreases in favour of an increasing influence of forest thinning. These trends continue until the end of the rotation, where thinning explains up to 50% of the variance in near-infrared canopy albedo and up to 70% of the variance in visible canopy albedo. More intense thinning lowers the summertime shortwave albedo of the canopy by as much as 0.02 compared to unthinned forest. The structural changes associated with forest thinning can be described by the change in leaf area index (LAI) in combination with crown volume. However, forests with identical canopy structure can have different summertime albedo values due to their location: the further north a forest is situated, the larger the solar zenith angle and thus the higher the summertime canopy albedo, independent of wavelength. Despite the increase of absolute summertime canopy albedo values with latitude, the difference in canopy albedo between managed and unmanaged forest decreases with increasing latitude. Forest management thus strongly alters summertime forest albedo.


2021
Author(s):
Ryusei Ishii
Patrice Carbonneau
Hitoshi Miyamoto

Archival imagery dating back to the mid-twentieth century holds information that pre-dates urban expansion and the worst impacts of climate change. In this research, we examine deep learning colorisation methods applied to historical aerial images in Japan. Specifically, we attempt to colorize monochrome images of river basins by applying Neural Style Transfer (NST). First, we created RGB orthomosaics (1 m) for reaches of three Japanese rivers: the Kurobe, Ishikari, and Kinu. From the orthomosaics, we extracted 60 thousand image tiles of 100 × 100 pixels in order to train the CNN used in NST. The image tiles were classified into six classes: urban, river, forest, tree, grass, and paddy field. Second, we used the VGG16 model pre-trained on ImageNet data in a transfer learning approach where we freeze a variable number of layers. We fine-tuned the training epochs, learning rate, and frozen layers of VGG16 in order to derive the optimal CNN used in NST. The fine-tuning resulted in F-measure accuracies of 0.961, 0.947, and 0.917 when freezing 7, 11, and 15 layers, respectively. Third, we colorized monochrome aerial images by NST with the retrained model weights. Here we used RGB images of seven Japanese rivers and the corresponding grayscale versions to evaluate the NST colorization performance. The RMSE between the RGB and resultant colorized images showed the best performance with a lower content layer (6), a shallower freeze layer (7), and a larger style/content weighting ratio (1.0 × 10⁵). The NST hyperparameter analysis indicated that the colorized images became rougher when the content layer was selected deeper in the VGG model, because the deeper the layer, the more abstract the features extracted from the original image. It was also confirmed that the Kurobe and Ishikari rivers showed higher colorisation accuracy, which might stem from the fact that the fine-tuning training dataset was extracted from images of these rivers. Finally, we colorized historical monochrome images of the Kurobe river with the best NST parameters, resulting in quality high enough to compare with the RGB images. The results indicate that fine-tuning the NST model can achieve performance sufficient to proceed to land cover classification in future research work.
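
A sketch of the frozen-layer transfer learning described above, assuming PyTorch/torchvision: VGG16 pre-trained on ImageNet has its first N feature modules frozen and its final classifier replaced with a six-class head for the land-cover tiles. The layer counts mirror the 7/11/15 freeze settings compared in the study; everything else is illustrative.

```python
# Transfer-learning sketch (PyTorch/torchvision assumed): freeze the first
# n_frozen modules of VGG16's feature extractor and attach a 6-class head.
import torch.nn as nn
from torchvision import models

def build_vgg16(num_classes=6, n_frozen=7):
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    for layer in list(model.features)[:n_frozen]:    # freeze early conv blocks
        for param in layer.parameters():
            param.requires_grad = False
    model.classifier[-1] = nn.Linear(4096, num_classes)  # new land-cover head
    return model

model = build_vgg16(n_frozen=7)   # the study compared freezing 7, 11, 15 layers
```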

