scholarly journals Leveraging Multimodal Out-of-Domain Information to Improve Low-Resource Speech Translation

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Wenbo Zhu ◽  
Hao Jin ◽  
WeiChang Yeh ◽  
Jianwen Chen ◽  
Lufeng Luo ◽  
...  

Speech translation (ST) is a bimodal conversion task from source speech to target text. Deep learning-based ST systems generally require sufficient training data to obtain competitive results, even with state-of-the-art models. However, because of the small-sample problem, the available training data usually does not meet this completeness condition. Most low-resource ST approaches improve data integrity with a single model, but such optimization works along a single dimension and has limited effectiveness. In contrast, multimodality leverages data features from different dimensions for multiperspective modeling. This approach lets the modalities compensate for one another's gaps, enhancing the data representation and improving the utilization of the training samples. Leveraging the enormous amount of multimodal out-of-domain information to improve low-resource tasks is therefore a new challenge. This paper describes how to use multimodal out-of-domain information to improve low-resource models. First, we propose a low-resource ST framework that reconstructs large-scale label-free audio by combining self-supervised learning. At the same time, we introduce a pretrained machine translation (MT) model to complement the text embedding and fine-tune decoding. In addition, we analyze the similarity at the decoder side. We reduce invalid multimodal pseudolabels by performing random depth pruning in the similarity layer, which minimizes error propagation, and we use an additional connectionist temporal classification (CTC) loss in the nonsimilarity layer to optimize the ensemble loss. Finally, we study the weighting ratio of the fusion technique in the multimodal decoder. Our experimental results show that the proposed method is promising for low-resource ST, with improvements of up to +3.6 BLEU points over baseline low-resource ST models.
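The abstract mentions an additional CTC loss that is combined with the decoder loss. A minimal sketch in plain Python, assuming the combination is a simple weighted sum `lam * CTC + (1 - lam) * CE`; the weight `lam`, the helper names, and the use of one shared target sequence for both heads are hypothetical, and the paper's exact layer placement is not reproduced. The CTC term uses the standard forward (alpha) recursion over a blank-interleaved target, computed in log space.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")

def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_loss(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `target` under CTC.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    """
    # Interleave blanks around every label: [b, y1, b, y2, ..., b].
    ext: List[int] = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])     # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    total = logsumexp(alpha[S - 1], alpha[S - 2]) if S > 1 else alpha[0]
    return -total

def cross_entropy(log_probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Token-level CE for teacher-forced decoder outputs (summed over positions)."""
    return -sum(lp[y] for lp, y in zip(log_probs, target))

def joint_loss(enc_log_probs, dec_log_probs, target, lam: float = 0.3) -> float:
    """Hypothetical ensemble loss: lam * CTC (encoder) + (1 - lam) * CE (decoder)."""
    return lam * ctc_loss(enc_log_probs, target) + (1 - lam) * cross_entropy(dec_log_probs, target)
```

With a 2-symbol vocabulary, uniform probabilities, two frames, and target `[1]`, the valid alignments are `1 1`, `_ 1`, and `1 _`, so the CTC probability is 3/4; a repeated target such as `[1, 1]` needs at least three frames and gets infinite loss.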

2021 ◽  
Author(s):  
Wilson Wongso ◽  
Henry Lucky ◽  
Derwin Suhartono

Abstract The Sundanese language has over 32 million speakers worldwide, but it has reaped little to no benefit from recent advances in natural language understanding. As with other low-resource languages, the only alternative has been to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, most of our monolingual models outperformed larger multilingual models despite being pre-trained on less data overall. Subsequent analyses show that our models benefit strongly from the size of the Sundanese pre-training corpus and do not exhibit socially biased behavior. We have released our models for other researchers and practitioners to use.


2021 ◽  
Author(s):  
Samreen Ahmed ◽  
shakeel khoja

In recent years, low-resource Machine Reading Comprehension (MRC) has made significant progress, with models achieving remarkable performance on datasets in various languages. However, none of these models has been customized for Urdu. This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu reading comprehension (RC) worksheets from Cambridge O-level books. UQuAD1.0 is a large-scale Urdu dataset for extractive machine reading comprehension, consisting of 49k question-answer pairs in question, passage, and answer format. Of these, 45,000 QA pairs were generated by machine translation of the original SQuAD1.0 and approximately 4,000 pairs via crowdsourcing. In this study, we used two types of MRC models: a rule-based baseline and advanced Transformer-based models. Because we found that the latter outperform the baseline, we concentrate solely on Transformer-based architectures. Using XLM-RoBERTa and multilingual BERT, we obtain F1 scores of 0.66 and 0.63, respectively.
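The F1 scores above are presumably the SQuAD-style token-overlap F1 used for extractive QA; the abstract does not say so, so treat that as an assumption. A minimal sketch: both answers are normalized, and F1 is computed from the multiset of shared tokens, taking the best match when a question has several reference answers. The official SQuAD script also removes English articles; this sketch omits that step and instead strips a few Arabic-script punctuation marks, which is likewise an assumption.

```python
import string
from collections import Counter
from typing import List, Sequence

# Assumption: besides ASCII punctuation, strip the Arabic comma (U+060C),
# Arabic question mark (U+061F), and Urdu full stop (U+06D4).
PUNCT = set(string.punctuation) | {"\u060c", "\u061f", "\u06d4"}

def normalize(text: str) -> List[str]:
    """Lower-case, drop punctuation, and split on whitespace."""
    return "".join(ch for ch in text.lower() if ch not in PUNCT).split()

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred, gold = normalize(prediction), normalize(ground_truth)
    if not pred or not gold:
        return float(pred == gold)
    common = Counter(pred) & Counter(gold)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def dataset_f1(predictions: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Average, over questions, of the best F1 against any reference answer."""
    scores = [max(f1_score(p, r) for r in refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores)
```

For example, the prediction "the cat sat" against the reference "cat sat down" shares two of three tokens on each side, giving precision = recall = F1 = 2/3.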


2020 ◽  
pp. 1-22
Author(s):  
Sukanta Sen ◽  
Mohammed Hasanuzzaman ◽  
Asif Ekbal ◽  
Pushpak Bhattacharyya ◽  
Andy Way

Abstract Neural machine translation (NMT) has recently shown promising results on publicly available benchmark datasets and is being rapidly adopted in various production systems. However, it requires a high-quality, large-scale parallel corpus, and such a corpus is not always available because building one requires time, money, and professionals. Hence, many existing large-scale parallel corpora are limited to specific languages and domains. In this paper, we propose an effective approach to improve an NMT system in a low-resource scenario without using any additional data. Our approach augments the original training data with parallel phrases extracted from that same training data using a statistical machine translation (SMT) system. We apply the approach to gated recurrent unit (GRU) and Transformer networks. We use Hindi–English and Hindi–Bengali datasets for the Health, Tourism, and Judicial (Hindi–English only) domains. We train our NMT models for 10 translation directions, each using only 5–23k parallel sentences. Experiments show improvements of 1.38–15.36 BiLingual Evaluation Understudy (BLEU) points over the baseline systems, and show that Transformer models perform better than GRU models in low-resource scenarios. In addition, we find that for some translation directions our proposed method outperforms SMT, which is known to work better than neural models in low-resource scenarios. To further show the effectiveness of our proposed model, we also apply our approach to another interesting NMT task, old-to-modern English translation, using a tiny parallel corpus of only 2.7K sentences. For this task, we use publicly available old-modern English text, the old side of which is approximately 1000 years old. Evaluation on this task shows significant improvement over the baseline NMT.
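The augmentation step above extracts parallel phrases from the training data with an SMT system. A minimal sketch of the standard alignment-consistent phrase-pair extraction used in phrase-based SMT, assuming a word alignment is already available (in practice it would come from a word aligner; here it is written by hand). It simplifies the textbook algorithm by not extending phrases over unaligned boundary words, and `extract_phrases`, `augment`, and `max_len` are illustrative names, not the authors' code.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links

def extract_phrases(src: List[str], tgt: List[str], align: Alignment,
                    max_len: int = 3) -> Set[Tuple[str, str]]:
    """Return phrase pairs consistent with the word alignment.

    A pair (src[i1..i2], tgt[j1..j2]) is consistent when no link connects a
    word inside one span to a word outside the other, and at least one link
    lies inside the pair. Unaligned boundary words are not used to grow spans.
    """
    pairs: Set[Tuple[str, str]] = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in align if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Inconsistent if a target word in [j1, j2] links outside [i1, i2].
            if any((j1 <= j <= j2) and not (i1 <= i <= i2) for (i, j) in align):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

def augment(corpus: List[Tuple[List[str], List[str], Alignment]],
            max_len: int = 3) -> List[Tuple[str, str]]:
    """Original sentence pairs followed by all phrase pairs extracted from them."""
    augmented = [(" ".join(s), " ".join(t)) for s, t, _ in corpus]
    for s, t, a in corpus:
        augmented.extend(sorted(extract_phrases(s, t, a, max_len)))
    return augmented
```

With a one-to-one alignment of "the house is small" to "das haus ist klein" and `max_len=2`, this yields the four word pairs plus the three adjacent two-word pairs; no new data enters, which matches the "no additional data" constraint in the abstract.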


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5191
Author(s):  
Jin Zhang ◽  
Fengyuan Wei ◽  
Fan Feng ◽  
Chunyang Wang

Convolutional neural networks provide an ideal solution for hyperspectral image (HSI) classification. However, classification performance is unsatisfactory when only limited training samples are available. Focusing on “small sample” hyperspectral classification, we proposed a novel 3D-2D convolutional neural network (CNN) model named AD-HybridSN (Attention-Dense-HybridSN). In our proposed model, a dense block reuses shallow features, with the aim of better exploiting hierarchical spatial–spectral features. Subsequent depthwise separable convolutional layers are used to discriminate the spatial information. Spatial–spectral features are further refined by channel attention and spatial attention, applied after every 3D convolutional layer and every 2D convolutional layer, respectively. Experimental results indicate that our proposed model can learn more discriminative spatial–spectral features using very little training data. On the Indian Pines, Salinas, and University of Pavia datasets, AD-HybridSN obtains 97.02%, 99.59%, and 98.32% overall accuracy using only 5%, 1%, and 1% of the labeled data for training, respectively, far better than all of the compared models.
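The results above depend on training with only a small fraction (5% or 1%) of the labeled pixels and are reported as overall accuracy (OA). A minimal sketch of drawing such a training subset per class and computing OA; whether the authors sampled per class is an assumption, as is the hypothetical `min_per_class` floor that keeps rare classes represented at 1%.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def stratified_split(labels: Sequence[int], fraction: float, seed: int = 0,
                     min_per_class: int = 1) -> Tuple[List[int], List[int]]:
    """Split labeled-pixel indices into train/test, sampling `fraction` of each class.

    Assumes background (unlabeled) pixels were removed beforehand.
    `min_per_class` is a hypothetical floor so that tiny classes still appear
    in the training set.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train: List[int] = []
    test: List[int] = []
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        n = min(len(idxs), max(min_per_class, round(fraction * len(idxs))))
        train.extend(idxs[:n])
        test.extend(idxs[n:])
    return sorted(train), sorted(test)

def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """OA: fraction of test pixels whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

At 5%, a class with 200 pixels contributes 10 training pixels, while a class with only 3 pixels would round to zero and is kept at one pixel by the floor.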


Author(s):  
Y. Hamrouni ◽  
É. Paillassa ◽  
V. Chéret ◽  
C. Monteil ◽  
D. Sheeren

Abstract. The current availability of Earth Observation satellite data at high spatial and temporal resolutions makes it possible to map large areas. Although supervised classification is the most widely adopted approach, its performance depends heavily on the availability and quality of training data. However, gathering samples from field surveys or through photo interpretation is often expensive and time-consuming, especially when the area to be classified is large. In this paper, we propose an active learning-based technique to address this issue by reducing the labelling effort required for supervised classification while increasing the classifier's generalisation capabilities across space. Experiments were conducted to identify poplar plantations at three different sites in France using Sentinel-2 time series. To characterise the age of the identified poplar stands, temporal means of Sentinel-1 backscatter coefficients were computed. The results are promising and show that the active learning-based approach can achieve performance similar (poplar F-score ≥ 90%) to traditional passive learning (i.e., with random selection of samples) with up to 50% fewer training samples. Sentinel-1 annual means demonstrated their potential to differentiate two stand ages, with an overall accuracy of 83% regardless of the cultivar considered.
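The abstract does not name the query strategy used to decide which samples to label next. One common choice in pool-based active learning is entropy-based uncertainty sampling; the sketch below assumes that strategy, and `fit`, `predict_proba`, and `oracle` (for example, a photo interpreter) are hypothetical caller-supplied callables, not a specific library API.

```python
import math
from typing import Callable, List, Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k pool samples whose predictions are most uncertain."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]

def active_learning_loop(fit: Callable, predict_proba: Callable, oracle: Callable,
                         labeled: list, pool: list, k: int, rounds: int):
    """Pool-based loop: train, query the k most uncertain samples, label them, repeat."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)
        picked = set(select_queries(predict_proba(model, pool), k))
        labeled += [(x, oracle(x)) for i, x in enumerate(pool) if i in picked]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return fit(labeled)
```

Passive learning corresponds to replacing `select_queries` with a random draw of k indices; the labelling savings reported above come from spending the labelling budget on samples the current classifier finds ambiguous.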


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5276 ◽  
Author(s):  
Fan Feng ◽  
Shuangting Wang ◽  
Chunyang Wang ◽  
Jin Zhang

Every pixel in a hyperspectral image contains detailed spectral information in hundreds of narrow bands captured by hyperspectral sensors. Pixel-wise classification of a hyperspectral image is the cornerstone of various hyperspectral applications. Nowadays, deep learning models, represented by the convolutional neural network (CNN), provide an ideal solution for feature extraction and have made remarkable achievements in supervised hyperspectral classification. However, hyperspectral image annotation is time-consuming and laborious, and available training data are usually limited. Because of this “small-sample problem”, CNN-based hyperspectral classification remains challenging. Focusing on hyperspectral classification with limited samples, we designed an 11-layer CNN model called R-HybridSN (Residual-HybridSN) from the perspective of network optimization. By combining a 3D-2D CNN, residual learning, and depthwise separable convolutions, R-HybridSN can better learn deep hierarchical spatial–spectral features from very little training data. The performance of R-HybridSN is evaluated on three publicly available hyperspectral datasets with different amounts of training samples. Using only 5%, 1%, and 1% of the labeled data for training on Indian Pines, Salinas, and University of Pavia, respectively, R-HybridSN achieves classification accuracies of 96.46%, 98.25%, and 96.59%, far better than the compared models.
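One reason depthwise separable convolutions suit small training sets is that they need far fewer weights than a standard convolution. A sketch of the arithmetic for a single 2D layer, ignoring biases; the 3×3 kernel and 64→128 channel sizes are hypothetical, not the actual layer sizes of R-HybridSN. A standard layer has k²·C_in·C_out weights, while the separable version has k²·C_in (depthwise) plus C_in·C_out (pointwise), a ratio of 1/C_out + 1/k².

```python
def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k 2D convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

standard = conv2d_params(3, 64, 128)             # 73,728 weights
separable = separable_conv2d_params(3, 64, 128)  # 576 + 8,192 = 8,768 weights
print(f"ratio = {separable / standard:.3f}")      # ratio = 0.119, i.e. 1/128 + 1/9
```

Roughly an eightfold reduction in this configuration leaves fewer parameters to fit from 1% of the labeled pixels.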


Plants ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 2633
Author(s):  
Zane K. J. Hartley ◽  
Andrew P. French

Wheat head detection is a core computer vision problem related to plant phenotyping that has seen increased interest in recent years as large-scale datasets have been made available for research. In deep learning problems with limited training data, synthetic data have been shown to improve performance by increasing the number of available training examples, but their effectiveness has been limited by domain shift. To overcome this, many adversarial approaches such as Generative Adversarial Networks (GANs) have been proposed to better align the distribution of synthetic data with that of real images through domain augmentation. In this paper, we examine the impact of using synthetic data to supplement the original dataset when performing wheat head detection on the global wheat head challenge dataset. Through our experiments, we demonstrate the challenges of performing domain augmentation when the target domain is large and diverse. We then present a novel approach to improving scores by using heatmap regression as a support network, and clustering to combat the high variation of the target domain.
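The abstract does not describe how the heatmap regression targets are built. A common formulation renders a Gaussian peak at each object centre and trains the support network to regress that map; the sketch below assumes this formulation, with a hypothetical `sigma` and boxes given as `(x_min, y_min, x_max, y_max)`. Peaks are merged with `max()` rather than summed so that neighbouring wheat heads never push the target above 1.0.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def box_centers(boxes: Sequence[Box]) -> List[Tuple[float, float]]:
    """Convert bounding boxes to (cx, cy) centre points."""
    return [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]

def gaussian_heatmap(height: int, width: int, centers: Sequence[Tuple[float, float]],
                     sigma: float = 2.0) -> List[List[float]]:
    """Target map with one Gaussian peak per object centre, merged by max()."""
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                heat[y][x] = max(heat[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return heat

def mse(pred: List[List[float]], target: List[List[float]]) -> float:
    """Mean squared error between a predicted and a target heatmap."""
    n = sum(len(row) for row in target)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n
```

In practice the map is usually rendered at the network's output stride rather than at full image resolution; that choice, like `sigma`, is left open here.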


2008 ◽  
Vol 31 (4) ◽  
pp. 19
Author(s):  
I Pasic ◽  
A Shlien ◽  
A Novokmet ◽  
C Zhang ◽  
U Tabori ◽  
...  

Introduction: Osteosarcoma (OS), a common Li-Fraumeni syndrome (LFS)-associated neoplasm, is a common bone malignancy of children and adolescents. Sporadic OS is also characterized by a young age of onset and high genomic instability, suggesting a genetic contribution to the disease. This study examined the contribution of novel DNA structural variation elements, copy number variations (CNVs), to OS susceptibility. Given our finding of excessive constitutional DNA CNV in LFS patients, which often coincides with cancer-related genes, we hypothesized that constitutional CNV may also provide clues about the aetiology of LFS-related sporadic neoplasms such as OS. Methods: CNV in blood DNA of 26 patients with sporadic OS was compared with that of 263 normal control samples from the International HapMap Project, as well as 62 local controls. Analysis was performed on DNA hybridized to the Affymetrix Genome-Wide Human SNP Array 6.0 using Partek Genomics Suite. Results: There was no detectable difference in the average number of CNVs, CNV length, or total structural variation (the product of average CNV number and length) between individuals with OS and controls. Although these data are preliminary (small sample size), they argue against the presence of constitutional genomic instability in individuals with sporadic OS. Conclusion: We found that the majority of tumours from patients with sporadic OS show copy number (CN) loss at chr3q13.31, raising the possibility that chr3q13.31 may represent a “driver” region in OS aetiology. In at least one OS tumour displaying CN loss at chr3q13.31, we demonstrate decreased expression of a known tumour suppressor gene located at chr3q13.31. We are investigating the role of chr3q13.31 in the development of OS.
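The "total structural variation" measure above is the product of the average CNV number and the average CNV length. A sketch of that arithmetic for one group of individuals, assuming the average length is pooled over all CNVs in the group (the abstract does not say whether it is pooled or averaged per individual). The input below is toy data, not study data.

```python
from statistics import mean
from typing import Dict, List

def cnv_summary(cnv_lengths: List[List[int]]) -> Dict[str, float]:
    """Summarise CNV calls for one group of individuals.

    cnv_lengths[i] holds the lengths (bp) of the CNVs called in individual i.
    Assumption: the average CNV length is pooled over all CNVs in the group.
    """
    counts = [len(calls) for calls in cnv_lengths]
    pooled = [length for calls in cnv_lengths for length in calls]
    avg_number = mean(counts)
    avg_length = mean(pooled) if pooled else 0.0
    return {
        "avg_number": avg_number,
        "avg_length": avg_length,
        # Total structural variation = average CNV number x average CNV length.
        "total_sv": avg_number * avg_length,
    }

# Toy input (not study data): two individuals with 2 and 1 CNVs.
print(cnv_summary([[1000, 3000], [2000]]))
```

Here the average number is 1.5 CNVs, the pooled average length is 2,000 bp, and the total structural variation is 3,000 bp. The OS and control groups would each be summarised this way before being compared.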


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid formation) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed to predict sulfenylated cysteine (SC) sites. However, the performance of these models has not been satisfactory because of inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor that discriminates sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which encoded features are obtained via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique (SMOTE) is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network (2D-CNN) was employed, with rigorous 10-fold jackknife cross-validation for model validation and authentication. Results: The proposed framework, with a strong discrete representation of the feature space, a machine learning engine, and an unbiased representation of the underlying training data, yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in terms of the Matthews correlation coefficient (MCC) than the best existing method. For the independent dataset, the best existing study did not provide sufficient details.
Compared with the second-best method, the model obtained increases of 7.5% in accuracy (ACC), 1.22% in sensitivity (Sn), 12.91% in specificity (Sp), and 13.12% in MCC on the training data, and of 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets compared with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. Empirical simulation outcomes on a training dataset and an independent validation dataset have revealed the efficacy of the proposed theoretical model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our research will provide potential insight into further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
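SMOTE is used above to rebalance the minority SC-site class. A minimal plain-Python sketch of the SMOTE idea, assuming feature vectors are lists of floats and Euclidean distance: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name and defaults are illustrative; DeepSSPred's actual settings are not given in the abstract.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate `n_new` synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

To balance the classes, `n_new` would be the majority count minus the minority count. Oversampling should be applied only to the training folds of the cross-validation so that synthetic points derived from test samples do not leak into training.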

