Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code

Artificial Intelligence and Law ◽

10.1007/s10506-021-09301-8 ◽

2021 ◽

Author(s):

Andrea Tagarelli ◽

Andrea Simeri

Keyword(s):

Deep Learning ◽

Language Processing ◽

Civil Code ◽

Fine Tuning ◽

Learning Approaches ◽

Retrieval Task ◽

Learning Framework ◽

Text Classifiers ◽

The Law ◽

Prediction Problems

Abstract Modeling law search and retrieval as prediction problems has recently emerged as a predominant approach in law intelligence. Focusing on the law article retrieval task, we present a deep learning framework named LamBERTa, which is designed for civil-law codes, and specifically trained on the Italian civil code. To our knowledge, this is the first study proposing an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Encoder Representations from Transformers) learning framework, which has recently attracted increased attention among deep learning approaches, showing outstanding effectiveness in several natural language processing and learning tasks. We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, for law article retrieval as a classification task. One key aspect of our LamBERTa framework is that we conceived it to address an extreme classification scenario, which is characterized by a high number of classes, the few-shot learning problem, and the lack of test query benchmarks for Italian legal prediction tasks. To solve such issues, we define different methods for the unsupervised labeling of the law articles, which can in principle be applied to any law article code system. We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different types, for single-label as well as multi-label evaluation tasks. Empirical evidence has shown the effectiveness of LamBERTa, and also its superiority against widely used deep-learning text classifiers and a few-shot learner conceived for an attribute-aware prediction task.
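The abstract does not spell out its labeling methods, so the following is only a hypothetical sketch of how articles could be turned into labeled training examples without manual annotation: each article id is treated as a class, and every window of `n` consecutive sentences from an article becomes one training unit for that class. The names `split_sentences`, `label_units`, and `toy_code`, the sentence-splitting rule, and the placeholder texts are all assumptions made for illustration, not the authors' scheme.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.' or ';' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def label_units(articles, n=2):
    """Build (text, article_id) pairs from windows of n consecutive sentences.

    Articles with at most n sentences yield a single unit holding the
    whole article, so that every class gets at least one example.
    """
    examples = []
    for article_id, text in articles.items():
        sentences = split_sentences(text)
        if len(sentences) <= n:
            examples.append((" ".join(sentences), article_id))
            continue
        for i in range(len(sentences) - n + 1):
            examples.append((" ".join(sentences[i:i + n]), article_id))
    return examples


# Placeholder texts, not real law articles.
toy_code = {
    "art_1": "Sentence A. Sentence B. Sentence C.",
    "art_2": "Sentence D.",
}
examples = label_units(toy_code, n=2)
for text, label in examples:
    print(label, "|", text)
```

The resulting pairs could then be fed to any text classifier; a real code with thousands of articles would produce the many-class, few-examples-per-class setting the abstract describes.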

Generation of Cross-Lingual Word Vectors for Low-Resourced Languages Using Deep Learning and Topological Metrics in a Data-Efficient Way

Electronics ◽

10.3390/electronics10121372 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1372

Author(s):

Sanjanasri JP ◽

Vijay Krishna Menon ◽

Soman KP ◽

Rajendran S ◽

Agnieszka Wolk

Keyword(s):

Deep Learning ◽

Language Processing ◽

Semantic Space ◽

Semantic Interpretation ◽

Learning Approaches ◽

Qualitative Comparison ◽

Bilingual Dictionary ◽

Pos Tagging ◽

Part Of Speech ◽

Cross Lingual

Linguists have long focused on the qualitative comparison of semantics across languages. Evaluating semantic interpretation between disparate language pairs such as English and Tamil is an even more formidable task than for Slavic languages. Word embeddings in Natural Language Processing (NLP) offer an opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We show empirically that, with a bilingual dictionary of a thousand words and a corresponding small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that those are not the only possible applications.
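The idea of projecting one embedding space onto another can be illustrated with a much simpler stand-in than the deep models the abstract refers to: a single linear map fitted by gradient descent on dictionary pairs. The sketch below is an assumption-laden toy, with 2-D vectors invented for illustration and a hypothetical function name `fit_linear_map`; it does not reproduce the paper's transfer function.

```python
def apply_map(vec, W):
    """Multiply a row vector by matrix W (lists of lists)."""
    return [sum(vec[i] * W[i][j] for i in range(len(vec)))
            for j in range(len(W[0]))]


def fit_linear_map(src, tgt, lr=0.1, steps=500):
    """Fit W minimising the mean squared error between src @ W and tgt."""
    n, d_in, d_out = len(src), len(src[0]), len(tgt[0])
    W = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(steps):
        grad = [[0.0] * d_out for _ in range(d_in)]
        for x, y in zip(src, tgt):
            pred = apply_map(x, W)
            for i in range(d_in):
                for j in range(d_out):
                    grad[i][j] += 2.0 * x[i] * (pred[j] - y[j]) / n
        for i in range(d_in):
            for j in range(d_out):
                W[i][j] -= lr * grad[i][j]
    return W


# Toy "dictionary": source vectors and their target-space counterparts.
# The target space here is simply the source space rotated by 90 degrees.
W_true = [[0.0, 1.0], [-1.0, 0.0]]
source_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target_vecs = [apply_map(v, W_true) for v in source_vecs]

W = fit_linear_map(source_vecs, target_vecs)
# Project a source vector that was not in the dictionary.
projected = apply_map([2.0, 3.0], W)
print(projected)
```

With real embeddings the dictionary would supply the `source_vecs`/`target_vecs` pairs, and words outside the dictionary would be projected with the learned map.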

Deep Learning Approaches for Spoken and Natural Language Processing

10.1007/978-3-030-79778-2 ◽

2021 ◽

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Approaches

Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction

Wireless Communications and Mobile Computing ◽

10.1155/2021/7181815 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Juncai Li ◽

Xiaofei Jiang

Keyword(s):

Deep Learning ◽

Language Processing ◽

Large Scale ◽

Molecular Data ◽

Molecular Property ◽

Property Prediction ◽

Learning Framework ◽

Learning Techniques ◽

Potential Benefits ◽

Current Sequence

Molecular property prediction is an essential task in drug discovery. Most deep learning approaches focus either on designing novel molecular representations or on combining them with more advanced models. However, researchers pay less attention to the potential benefits of massive unlabeled molecular data (e.g., ZINC). The task becomes increasingly challenging owing to the limited scale of labeled data. Motivated by recent advances in pretrained models for natural language processing, we note that a drug molecule can, to some extent, be viewed as language. In this paper, we investigate how to adapt the pretrained model BERT to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with a pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale prediction BERT model is pretrained to generate the embedding of molecular substructures, by using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, the pretrained BERT model can be fine-tuned on various molecular property prediction tasks. To examine the performance of our proposed Mol-BERT, we conduct several experiments on 4 widely used molecular datasets. In comparison to the traditional and state-of-the-art baselines, the results illustrate that our proposed Mol-BERT can outperform the current sequence-based methods and achieve at least 2% improvement in ROC-AUC score on the Tox21, SIDER, and ClinTox datasets.

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v1 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, which improves classification accuracy by 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
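The pretext task described in the abstract, predicting contaminated observations from the whole time series, can be sketched at the data-preparation level as follows. This is a minimal illustration under stated assumptions: the contamination rate, the uniform noise, and the names `contaminate` and `masked_mse` are hypothetical choices made here, and the Transformer that would produce the predictions is not shown.

```python
import random


def contaminate(series, rate=0.15, noise_scale=0.5, seed=0):
    """Randomly perturb some time steps of a pixel's series.

    Returns the corrupted series and a boolean mask marking which
    time steps were contaminated (the positions to be reconstructed).
    """
    rng = random.Random(seed)
    corrupted, mask = [], []
    for obs in series:
        if rng.random() < rate:
            corrupted.append(
                [v + rng.uniform(-noise_scale, noise_scale) for v in obs]
            )
            mask.append(True)
        else:
            corrupted.append(list(obs))
            mask.append(False)
    return corrupted, mask


def masked_mse(predictions, originals, mask):
    """Mean squared error computed only over contaminated time steps."""
    errors = [
        (p - o) ** 2
        for pred, orig, m in zip(predictions, originals, mask) if m
        for p, o in zip(pred, orig)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy series: 20 time steps, 2 spectral bands per observation.
series = [[0.1 * t, 0.2 * t] for t in range(20)]
corrupted, mask = contaminate(series, rate=0.3, seed=1)

# A model would map `corrupted` to predictions; using the corrupted
# input itself as the "prediction" gives the loss of a do-nothing model.
baseline_loss = masked_mse(corrupted, series, mask)
print(sum(mask), "contaminated steps, baseline loss:", baseline_loss)
```

Restricting the loss to the contaminated positions is one common design choice for this kind of objective; whether the paper does exactly this is not stated in the abstract.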

Covid-19 detection via deep neural network and occlusion sensitivity maps

10.36227/techrxiv.14100890 ◽

2021 ◽

Author(s):

Noor Ahmad ◽

Muhammad Aminu ◽

Mohd Halim Mohd Noor

Keyword(s):

Neural Network ◽

Deep Learning ◽

Deep Neural Network ◽

State Of The Art ◽

Color Images ◽

Fine Tuning ◽

Training Dataset ◽

Learning Approaches ◽

Learning Models ◽

Sensitivity Maps

Deep learning approaches have attracted a lot of attention in the automatic detection of Covid-19, and transfer learning is the most common approach. However, the majority of pre-trained models are trained on color images, which can cause inefficiencies when fine-tuning them on Covid-19 images, which are often grayscale. To address this issue, we propose a deep learning architecture called CovidNet, which requires a relatively small number of parameters. CovidNet accepts grayscale images as inputs and is suitable for training with a limited training dataset. Experimental results show that CovidNet outperforms other state-of-the-art deep learning models for Covid-19 detection.

Boosting Automated Sleep Staging Performance in Big Datasets using Population Sub-grouping

SLEEP ◽

10.1093/sleep/zsab027 ◽

2021 ◽

Author(s):

Samaneh Nasiri ◽

Gari D Clifford

Keyword(s):

Deep Learning ◽

Feature Space ◽

Learning Approaches ◽

Sleep Staging ◽

Learning Framework ◽

Sensitivity Score ◽

Instrument Noise ◽

Novel Method ◽

Clustering Approach ◽

Statistical Relationships

Abstract Current approaches to automated sleep staging from the electroencephalogram (EEG) rely on constructing large labeled training and test corpora by aggregating data from different individuals. However, many of the subjects in the training set may exhibit changes in the EEG that are very different from those of the subjects in the test set. Training an algorithm on such data without accounting for this diversity can cause underperformance. Moreover, test data may have unexpected sensor misplacement or different instrument noise and spectral responses. This work proposes a novel method to effectively identify relevant individuals based on their similarities. The proposed method embeds all training patients into a shared and robust feature space. Individuals that share strong statistical relationships and are similar based on their EEG signals are clustered in this feature space before being passed to a deep learning framework for classification. Using 994 patient EEGs from the 2018 Physionet Challenge (≈ 6,561 hours of recording), we demonstrate that the clustering approach significantly boosts performance compared to state-of-the-art deep learning approaches. The proposed method improves, on average, the precision score from 0.72 to 0.81, the sensitivity score from 0.74 to 0.82, and Cohen’s Kappa coefficient from 0.64 to 0.75 under 10-fold cross-validation.
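For readers unfamiliar with the last metric reported above, Cohen's Kappa measures agreement between two labelings after discounting the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The short sketch below computes it for two label sequences; the function name `cohen_kappa` and the toy stage labels are assumptions for illustration and are unrelated to the paper's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example: reference vs. predicted stages for four epochs.
reference = ["W", "W", "N2", "N2"]
predicted = ["W", "W", "N2", "W"]
kappa = cohen_kappa(reference, predicted)
print(kappa)  # observed 0.75, chance 0.5, so kappa = 0.5
```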

Improving Techniques for Naïve Bayes Text Classifiers

Handbook of Research on Text and Web Mining Technologies ◽

10.4018/978-1-59904-990-8.ch007 ◽

2010 ◽

pp. 111-127

Author(s):

Han-joon Kim

Keyword(s):

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Systems ◽

Classification Model ◽

Learning Approaches ◽

Learning Framework ◽

The Em Algorithm ◽

Meta Learning ◽

Text Classifiers

This chapter introduces two practical techniques for improving Naïve Bayes text classifiers, which are widely used for text classification. Naïve Bayes is regarded as a practical text classification algorithm because of its simple classification model, reasonable classification accuracy, and easily updated model. Thus, many researchers have a strong incentive to improve Naïve Bayes by combining it with other meta-learning approaches such as EM (Expectation Maximization) and Boosting. The EM approach combines Naïve Bayes with the EM algorithm, and the Boosting approach uses Naïve Bayes as a base classifier in the AdaBoost algorithm. For both approaches, a special uncertainty measure suited to Naïve Bayes learning is used. In the Naïve Bayes learning framework, these approaches are expected to be practical solutions to the problem of insufficient training documents in text classification systems.
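As background for the base classifier the chapter builds on, here is a minimal multinomial Naïve Bayes text classifier with Laplace (add-one) smoothing. It is a plain baseline sketch only: the EM and Boosting extensions and the uncertainty measure discussed in the chapter are not implemented, and the function names and toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect per-class document counts, per-class word counts, and vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace-smoothed log likelihood of the token given the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration.
training_docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda attached", "ham"),
    ("project meeting tomorrow", "ham"),
]
model = train_nb(training_docs)
print(predict_nb(model, "cheap pills"))
print(predict_nb(model, "meeting tomorrow"))
```

Because the model is just a set of counts, adding new training documents only requires incrementing them, which is the "easy update" property the chapter mentions.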

A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning

Database ◽

10.1093/database/baz116 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 2

Author(s):

Tao Chen ◽

Mingfen Wu ◽

Hexi Li

Keyword(s):

Deep Learning ◽

Large Scale ◽

Relation Extraction ◽

Training Model ◽

Biomedical Literature ◽

Training Data ◽

Fine Tuning ◽

Learning Approaches ◽

Additional Time ◽

Clinical Records

Abstract The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most of the current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. Firstly, we show the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, traditional Chinese medicine literature corpus and i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (giving a relative improvement of 22.2, 7.77, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.