BERT-based ensemble methods with data augmentation for legal textual entailment in COLIEE statute law task

Author(s):  
Masaharu Yoshioka ◽  
Yasuhiro Aoki ◽  
Youta Suzuki
2005 ◽  
Vol 533 (2) ◽  
pp. 147-159 ◽  
Author(s):  
M.J. Sáiz-Abajo ◽  
B.-H. Mevik ◽  
V.H. Segtnan ◽  
T. Næs

Author(s):  
Qian Yu ◽  
Wai Lam

Data imbalance is a key limiting factor for Learning to Rank (LTR) models in information retrieval. Resampling methods and ensemble methods cannot handle the imbalance problem well since none of them incorporate more informative data into the training procedure of LTR models. We propose a data generation model based on Adversarial Autoencoder (AAE) for tackling the data imbalance in LTR via informative data augmentation. This model can be utilized for handling two types of data imbalance, namely, imbalance regarding relevance levels for a particular query and imbalance regarding the amount of relevance judgements in different queries. In the proposed model, relevance information is disentangled from the latent representations in this AAE-based model in order to reconstruct data with specific relevance levels. The semantic information of queries, derived from word embeddings, is incorporated in the adversarial training stage for regularizing the distribution of the latent representation. Two informative data augmentation strategies suitable for LTR are designed utilizing the proposed data generation model. Experiments on benchmark LTR datasets demonstrate that our proposed framework can significantly improve the performance of LTR models.


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


Author(s):  
Alex Hernández-García ◽  
Johannes Mehrer ◽  
Nikolaus Kriegeskorte ◽  
Peter König ◽  
Tim C. Kietzmann

2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

The Bayesian classification rule used for the classification of the observations of the (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of the observed data augmentation to the Bayesian risk is examined for three different nonlinear widely applicable spatial correlation models. The explicit expression of the Bayesian risk for the classification of augmented data is derived. Numerical comparison of these models by the variability of Bayesian risk in case of the first-order neighbourhood scheme is performed.


2020 ◽  
Vol 64 (4) ◽  
pp. 40412-1-40412-11
Author(s):  
Kexin Bai ◽  
Qiang Li ◽  
Ching-Hsin Wang

Abstract To address the issues of the relatively small size of brain tumor image datasets, severe class imbalance, and low precision in existing segmentation algorithms for brain tumor images, this study proposes a two-stage segmentation algorithm integrating convolutional neural networks (CNNs) and conventional methods. Four modalities of the original magnetic resonance images were first preprocessed separately. Next, preliminary segmentation was performed using an improved U-Net CNN containing deep monitoring, residual structures, dense connection structures, and dense skip connections. The authors adopted a multiclass Dice loss function to deal with class imbalance and successfully prevented overfitting using data augmentation. The preliminary segmentation results subsequently served as the a priori knowledge for a continuous maximum flow algorithm for fine segmentation of target edges. Experiments revealed that the mean Dice similarity coefficients of the proposed algorithm in whole tumor, tumor core, and enhancing tumor segmentation were 0.9072, 0.8578, and 0.7837, respectively. The proposed algorithm presents higher accuracy and better stability in comparison with some of the more advanced segmentation algorithms for brain tumor images.


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>


Author(s):  
Gabriel Ribeiro ◽  
Marcos Yamasaki ◽  
Helon Vicente Hultmann Ayala ◽  
Leandro Coelho ◽  
Viviana Mariani

Sign in / Sign up

Export Citation Format

Share Document