CAWA: An Attention-Network for Credit Attribution

2020 ◽  
Vol 34 (05) ◽  
pp. 8472-8479
Author(s):  
Saurav Manchanda ◽  
George Karypis

Credit attribution is the task of associating individual parts in a document with their most appropriate class labels. It is an important task with applications to information retrieval and text summarization. When labeled training data is available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consuming. In this paper, we present Credit Attribution With Attention (CAWA), a neural-network-based approach that, instead of using sentence-level labeled data, uses the set of class labels associated with an entire document as a source of distant supervision. CAWA combines an attention mechanism with a multilabel classifier into an end-to-end learning framework to perform credit attribution, labeling the individual sentences of the input document using the resulting attention weights. CAWA improves upon the state-of-the-art credit attribution approach by not constraining a sentence to belong to just one class, but instead modeling each sentence as a distribution over all classes, leading to better modeling of semantically similar classes. Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform the competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches.
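
A minimal sketch of the core CAWA idea, assuming precomputed sentence embeddings; layer sizes and variable names are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AttentionCreditAttribution(nn.Module):
    def __init__(self, emb_dim: int, num_classes: int):
        super().__init__()
        self.attn = nn.Linear(emb_dim, num_classes)  # per-class sentence relevance
        self.clf = nn.Linear(emb_dim, num_classes)   # shared multilabel classifier

    def forward(self, sents: torch.Tensor):
        # sents: (S, emb_dim) sentence embeddings of one document
        scores = self.attn(sents)                    # (S, C) relevance logits
        weights = torch.softmax(scores, dim=0)       # normalize over sentences per class
        doc_vecs = weights.t() @ sents               # (C, emb_dim) class-specific doc vectors
        doc_logits = (self.clf.weight * doc_vecs).sum(-1) + self.clf.bias  # (C,)
        return doc_logits, scores

model = AttentionCreditAttribution(emb_dim=128, num_classes=5)
sents = torch.randn(10, 128)                         # 10 sentences, toy embeddings
doc_logits, scores = model(sents)

# Distant supervision: only document-level labels are observed during training
doc_labels = torch.zeros(5); doc_labels[[1, 3]] = 1.0
loss = nn.functional.binary_cross_entropy_with_logits(doc_logits, doc_labels)

# Credit attribution: each sentence gets a distribution over all classes,
# so semantically similar classes can share a sentence
sentence_label_dist = torch.softmax(scores, dim=1)   # (S, C)
```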

Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train a relation extractor without human annotations. However, the generated training data typically contain massive noise and may result in poor performance under vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for a distantly supervised relation extractor. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state-of-the-art, and further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
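
A minimal sketch of sentence-level selective attention over a bag of sentences, the building block the approach above extends across relations and across bags; the relation embedding and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SelectiveAttention(nn.Module):
    def __init__(self, emb_dim: int, num_relations: int):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, emb_dim)  # one query per relation
        self.clf = nn.Linear(emb_dim, num_relations)

    def forward(self, bag: torch.Tensor, relation: torch.Tensor):
        # bag: (num_sentences, emb_dim) encodings of all sentences for one entity pair
        query = self.rel_emb(relation)            # (emb_dim,)
        scores = bag @ query                      # match each sentence to the relation
        alpha = torch.softmax(scores, dim=0)      # down-weight noisy/mismatched sentences
        bag_repr = alpha @ bag                    # (emb_dim,) attention-pooled bag vector
        return self.clf(bag_repr), alpha

att = SelectiveAttention(emb_dim=64, num_relations=10)
bag = torch.randn(6, 64)                          # 6 sentences mentioning one entity pair
logits, alpha = att(bag, torch.tensor(3))         # attend w.r.t. the bag's DS label
# C2SA extends this: cross-relation attention correlates the relation queries, and a
# second, cross-bag attention up-weights higher-quality entity pairs.
```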


Author(s):  
Sri Hartini ◽  
Zuherman Rustam ◽  
Glori Stephani Saragih ◽  
María Jesús Segovia Vargas

<span id="docs-internal-guid-4935b5ce-7fff-d9fa-75c7-0c6a5aa1f9a6"><span>Banks have a crucial role in the financial system. When many banks suffer from the crisis, it can lead to financial instability. According to the impact of the crises, the banking crisis can be divided into two categories, namely systemic and non-systemic crisis. When systemic crises happen, it may cause even stable banks bankrupt. Hence, this paper proposed a random forest for estimating the probability of banking crises as prevention action. Random forest is well-known as a robust technique both in classification and regression, which is far from the intervention of outliers and overfitting. The experiments were then constructed using the financial crisis database, containing a sample of 79 countries in the period 1981-1999 (annual data). This dataset has 521 samples consisting of 164 crisis samples and 357 non-crisis cases. From the experiments, it was concluded that utilizing 90 percent of training data would deliver 0.98 accuracy, 0.92 sensitivity, 1.00 precision, and 0.96 F1-Score as the highest score than other percentages of training data. These results are also better than state-of-the-art methods used in the same dataset. Therefore, the proposed method is shown promising results to predict the probability of banking crises.</span></span>


2019 ◽  
Vol 2019 (4) ◽  
pp. 54-71
Author(s):  
Asad Mahmood ◽  
Faizan Ahmad ◽  
Zubair Shafiq ◽  
Padmini Srinivasan ◽  
Fareed Zaffar

Stylometric authorship attribution aims to identify an anonymous or disputed document's author by examining its writing style. The development of powerful machine-learning-based stylometric authorship attribution methods presents a serious privacy threat for individuals such as journalists and activists who wish to publish anonymously. Researchers have proposed several authorship obfuscation approaches that try to make appropriate changes (e.g., word/phrase replacements) to evade attribution while preserving semantics. Unfortunately, existing authorship obfuscation approaches fall short because they either require some manual effort, require significant training data, or do not work for long documents. To address these limitations, we propose a genetic algorithm based random search framework called Mutant-X which can automatically obfuscate text to successfully evade attribution while keeping the semantics of the obfuscated text similar to the original text. Specifically, Mutant-X sequentially makes changes in the text using mutation and crossover techniques while being guided by a fitness function that takes into account both attribution probability and semantic relevance. While Mutant-X requires black-box knowledge of the adversary's classifier, it does not require any additional training data and works on documents of any length. We evaluate Mutant-X against a variety of authorship attribution methods on two different text corpora. Our results show that Mutant-X can decrease the accuracy of state-of-the-art authorship attribution methods by as much as 64% while preserving the semantics much better than existing automated authorship obfuscation approaches. While Mutant-X advances the state-of-the-art in automated authorship obfuscation, we find that it does not generalize to a stronger threat model where the adversary uses a different attribution classifier than what Mutant-X assumes. Our findings warrant the need for future research to improve the generalizability (or transferability) of automated authorship obfuscation approaches.
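
A minimal sketch of the Mutant-X search loop: mutate word choices with crossover, and keep candidates whose fitness trades off attribution probability against semantic drift. The classifier, similarity, and synonym functions below are toy stand-ins, not the paper's components:

```python
import random

def fitness(text, original, attribution_prob, similarity, alpha=1.0, beta=1.0):
    # Reward a lower probability of correct attribution and higher semantic similarity
    return -alpha * attribution_prob(text) + beta * similarity(text, original)

def mutate(words, synonyms, rate=0.1):
    return [random.choice(synonyms(w)) if random.random() < rate else w for w in words]

def crossover(a, b):
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def mutant_x(original, attribution_prob, similarity, synonyms,
             pop_size=20, generations=50):
    words = original.split()
    population = [mutate(words, synonyms) for _ in range(pop_size)]
    score = lambda w: fitness(" ".join(w), original, attribution_prob, similarity)
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]       # keep the fitter half
        children = [mutate(crossover(random.choice(elite), random.choice(elite)),
                           synonyms) for _ in range(pop_size - len(elite))]
        population = elite + children
    return " ".join(max(population, key=score))

# Toy stand-ins (real use: a black-box attribution classifier, an embedding-based
# semantic similarity, and a word-embedding synonym lookup):
obfuscated = mutant_x(
    "the quick brown fox jumps over the lazy dog",
    attribution_prob=lambda t: 0.9 if "quick" in t else 0.2,
    similarity=lambda t, o: len(set(t.split()) & set(o.split())) / len(set(o.split())),
    synonyms=lambda w: {"quick": ["quick", "fast", "swift"]}.get(w, [w]),
)
print(obfuscated)
```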


2019 ◽  
Vol 7 ◽  
pp. 121-138 ◽  
Author(s):  
Rumen Dangovski ◽  
Li Jing ◽  
Preslav Nakov ◽  
Mićo Tatalović ◽  
Marin Soljačić

Stacking long short-term memory (LSTM) cells or gated recurrent units (GRUs) as part of a recurrent neural network (RNN) has become a standard approach to solving a number of tasks ranging from language modeling to text summarization. Although LSTMs and GRUs were designed to model long-range dependencies more accurately than conventional RNNs, they nevertheless have problems copying or recalling information from the distant past. Here, we derive a phase-coded representation of the memory state, the Rotational Unit of Memory (RUM), that unifies the concepts of unitary learning and associative memory. We show experimentally that RNNs based on RUMs can solve basic sequential tasks such as memory copying and memory recall much better than LSTMs/GRUs. We further demonstrate that by replacing LSTM/GRU with RUM units we can apply neural networks to real-world problems such as language modeling and text summarization, yielding results comparable to the state of the art.
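
A heavily simplified sketch of the norm-preserving rotation at the heart of this idea: a rotation in the plane spanned by two vectors is applied to the memory state, so the update is unitary and cannot blow up or vanish in norm. How RUM parameterizes and learns these rotations is more involved; this only illustrates the mechanism:

```python
import torch

def rotation_matrix(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Rotation in the 2-D plane spanned by a and b that takes direction a to
    # direction b; identity on the orthogonal complement, hence norm-preserving.
    u = a / a.norm()
    v = b - (u @ b) * u
    v = v / v.norm()
    cos = (a @ b) / (a.norm() * b.norm())
    sin = torch.sqrt(torch.clamp(1 - cos**2, min=0.0))
    plane = torch.stack([u, v], dim=1)            # (d, 2) basis of the rotation plane
    givens = torch.stack([torch.stack([cos, -sin]),
                          torch.stack([sin, cos])])  # 2x2 Givens rotation
    eye = torch.eye(a.numel())
    return eye - torch.outer(u, u) - torch.outer(v, v) + plane @ givens @ plane.t()

# One RUM-style step: two (toy) projections of the input define a rotation,
# which is then applied to the hidden memory state.
d = 8
h = torch.randn(d)                                # memory state
x_proj, tau = torch.randn(d), torch.randn(d)      # toy input projections
h_next = rotation_matrix(x_proj, tau) @ h
print(h.norm(), h_next.norm())                    # norms match: the update is unitary
```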


Author(s):  
Shaobo Min ◽  
Xuejin Chen ◽  
Zheng-Jun Zha ◽  
Feng Wu ◽  
Yongdong Zhang

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks connected by three types of attention models at different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing the features they infer from the same input. To this end, the attention models are designed based on a propagation analysis of noisy gradients at different layers, which allows them to effectively discover incorrect labels and weaken their influence during the parameter-update process. By exchanging multi-level features within the two-stream architecture, the effect of noisy labels in each sub-network is reduced by decreasing the noisy gradients. Furthermore, a hierarchical distillation is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. Experiments on the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses the state-of-the-art fully-supervised results.
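
A minimal sketch of the mutual-attention idea: two sub-networks see the same input, and a gate built from both feature maps down-weights activations (and hence the gradients flowing back through them) where the streams disagree. The shapes and the sigmoid gating form are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class MutualAttentionGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Agreement between the two streams decides how much signal passes through
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))  # (N, C, H, W) in [0, 1]
        return feat_a * g, feat_b * g             # gated features for both streams

gate = MutualAttentionGate(channels=16)
fa, fb = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
ga, gb = gate(fa, fb)
# Because the gate multiplies the features, gradients from suspect (noisy-label)
# regions are scaled down by the same factor during backpropagation.
```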


2020 ◽  
Vol 10 (7) ◽  
pp. 2601 ◽  
Author(s):  
Indriani P. Astono ◽  
James S. Welsh ◽  
Stephan Chalup ◽  
Peter Greer

In this paper, we develop an optimised state-of-the-art 2D U-Net model by studying the effects of the individual deep learning model components on prostate segmentation performance. We found that for upsampling, the combination of interpolation and convolution is better than the use of transposed convolution. Combining feature maps within each convolution block is only beneficial when done through a skip connection with concatenation. With respect to pooling, average pooling is better than strided convolution, max, RMS, or L2 pooling. Introducing a batch normalisation layer before the activation layer gives a further performance improvement. The optimisation is based on a private dataset, as it has a fixed 2D resolution and voxel size for every image, which mitigates the need for a resizing operation in the data preparation process. Non-enhancing data preprocessing was applied, and five-fold cross-validation was used to evaluate the fully automatic segmentation approach. We show it outperforms the traditional methods previously applied to the private dataset, as well as other comparable state-of-the-art 2D models on the public dataset PROMISE12.
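
A minimal sketch of a decoder block assembling the choices reported above: upsampling by interpolation followed by convolution (rather than transposed convolution), a skip connection with concatenation, and batch normalisation placed before the activation. Channel counts are illustrative:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),   # BN before the activation, per the findings above
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.reduce(self.up(x))               # interpolation followed by convolution
        x = torch.cat([x, skip], dim=1)           # skip connection with concatenation
        return self.conv(x)

block = UpBlock(in_ch=64, skip_ch=32, out_ch=32)
out = block(torch.randn(1, 64, 16, 16), torch.randn(1, 32, 32, 32))
print(out.shape)                                  # torch.Size([1, 32, 32, 32])
```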


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories within a single domain and typically requires a large number of labeled images, which are expensive to obtain. Utilizing web data has been an attractive option to meet the training-data demands of convolutional neural networks (CNNs), especially when well-labeled data is scarce. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has conventionally been addressed by reducing the noise level of the web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable, and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against the state-of-the-art methods.
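
A minimal sketch of an adversarial discriminative loss that pushes the features of standard and web images toward a shared representation; the gradient-reversal formulation below is one common way to realise this and is an assumption, not necessarily the paper's exact construction:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                              # reverse gradients into the encoder

encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(128, 1))  # predicts: standard (0) vs web (1)

std_x, web_x = torch.randn(8, 256), torch.randn(8, 256)  # toy feature inputs
feats = encoder(torch.cat([std_x, web_x]))
domain = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
logits = discriminator(GradReverse.apply(feats))
adv_loss = nn.functional.binary_cross_entropy_with_logits(logits, domain)
adv_loss.backward()
# The discriminator learns to tell the two domains apart, while the reversed
# gradient trains the encoder to make them indistinguishable (coherent).
```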


2021 ◽  
Vol 11 (21) ◽  
pp. 9938
Author(s):  
Kun Shao ◽  
Yu Zhang ◽  
Junan Yang ◽  
Hui Liu

Deep learning models are vulnerable to backdoor attacks: in existing research, the success rate of textual backdoor attacks based on data poisoning reaches as high as 100%. To strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method based on poisoned-sample recognition. Our method consists of two parts. The first step adds a controlled noise layer after the model's embedding layer and trains a preliminary model in which the backdoor is only weakly embedded, if at all, reducing the effectiveness of poisoned samples; we then use this model to make an initial pass over the training set and narrow the search range for poisoned samples. The second step uses all the training data to train an infected model with the backdoor embedded, which is used to reclassify the samples selected in the first step and finally identify the poisoned samples. Detailed experiments show that our defense method effectively defends against a variety of backdoor attacks (character-level, word-level, and sentence-level), outperforming the baseline method. For a BERT model trained on the IMDB dataset, the method can even reduce the success rate of word-level backdoor attacks to 0%.
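
A minimal sketch of the first defense step: a controlled noise layer inserted after the embedding layer, so the preliminary model is trained with the backdoor only weakly embedded. The Gaussian noise form and scale are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NoisyEmbedding(nn.Module):
    def __init__(self, vocab_size: int, dim: int, noise_std: float = 0.5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.noise_std = noise_std

    def forward(self, tokens: torch.Tensor):
        e = self.emb(tokens)
        if self.training:                         # perturb only during training
            e = e + self.noise_std * torch.randn_like(e)
        return e

layer = NoisyEmbedding(vocab_size=30000, dim=128)
tokens = torch.randint(0, 30000, (4, 16))         # batch of 4 toy token sequences
noisy = layer(tokens)
# The preliminary model trained on these perturbed embeddings is then used to flag
# likely-poisoned training samples, narrowing the search for the second step.
```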


Author(s):  
Zengwei Huo ◽  
Xin Geng

Zero-shot learning predicts new classes even when no training data is available for those classes. Solutions to conventional zero-shot learning usually depend on side information such as attributes or text corpora, but such side information is not easy to obtain or use. Fortunately, in many classification tasks the class labels are ordered and therefore closely related to each other. This paper deals with zero-shot learning for ordinal classification. The key idea is to use label relevance to expand supervision information from seen labels to unseen labels. The proposed method, SIDL, generates a supervision intensity distribution (SID) that contains each label's supervision intensity, and then learns a mapping from instance to SID. Experiments on two typical ordinal classification problems, i.e., head pose estimation and age estimation, show that SIDL performs significantly better than the compared regression methods. Furthermore, SIDL is much more robust against an increase in the number of unseen labels than the other compared baselines.
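
A minimal sketch of generating a supervision intensity distribution over an ordered label set: supervision from a seen label is spread to its neighbours, so unseen labels still receive signal. The Gaussian kernel, its width, and the head-pose label grid are illustrative assumptions:

```python
import numpy as np

def supervision_intensity(seen_label: float, labels: np.ndarray, sigma: float = 15.0):
    # Intensity decays with distance in label space and is normalised to sum to 1
    intensity = np.exp(-0.5 * ((labels - seen_label) / sigma) ** 2)
    return intensity / intensity.sum()

labels = np.arange(0, 91, 15)                     # head poses 0..90 deg, 15-deg steps
sid = supervision_intensity(seen_label=30, labels=labels)
print(dict(zip(labels.tolist(), sid.round(3))))
# A model is then trained to map an instance to its SID rather than to a single
# label, letting it score poses (or ages) that never appeared in training.
```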


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Tiantian Chen ◽  
Nianbin Wang ◽  
Hongbin Wang ◽  
Haomin Zhan

Distant supervision (DS) has been widely used for relation extraction (RE) because it automatically generates large-scale labeled data. However, it suffers from a wrong-labeling problem, which hurts RE performance. In addition, existing methods lack useful semantic features for some positive training instances. To address these problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on relation trigger words as a sentence selector that filters out noisy sentences, alleviating the wrong-labeling problem. After clean instances are obtained, we build the interaction representation using a word-level attention mechanism based on the entity pair, dynamically increasing the weights of words related to the entity pair, which provides more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 F1 points on a widely used dataset, showing that it performs significantly better than state-of-the-art RE systems.
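
A minimal sketch of the two ideas above: a trigger-word pattern filter that drops DS-labeled sentences whose relation label has no lexical support, and a word-level attention that up-weights words related to the entity pair. The trigger lexicon and scoring are illustrative assumptions:

```python
import torch

# Toy trigger lexicon: trigger word -> relation it supports
TRIGGERS = {"founded": "/business/company/founders",
            "born": "/people/person/place_of_birth"}

def keep_sentence(sentence: str, relation: str) -> bool:
    # Keep a DS-labeled sentence only if some trigger word supports its relation
    return any(rel == relation and word in sentence.lower()
               for word, rel in TRIGGERS.items())

print(keep_sentence("Jobs founded Apple in Cupertino.",
                    "/business/company/founders"))   # True -> clean instance

def entity_attention(word_embs: torch.Tensor, head: torch.Tensor, tail: torch.Tensor):
    # Words similar to the entity-pair representation get larger weights
    query = (head + tail) / 2
    alpha = torch.softmax(word_embs @ query, dim=0)
    return alpha @ word_embs                      # interaction representation

words = torch.randn(12, 50)                       # 12 words, toy 50-d embeddings
repr_vec = entity_attention(words, words[0], words[3])
```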

