Dual-objective fine-tuning of BERT for entity matching

An increasing number of data providers have adopted shared numbering schemes such as GTIN, ISBN, DUNS, or ORCID numbers for identifying entities in the respective domain. This means for data integration that shared identifiers are often available for a subset of the entity descriptions to be integrated while such identifiers are not available for others. The challenge in these settings is to learn a matcher for entity descriptions without identifiers using the entity descriptions containing identifiers as training data. The task can be approached by learning a binary classifier which distinguishes pairs of entity descriptions for the same real-world entity from descriptions of different entities. The task can also be modeled as a multi-class classification problem by learning classifiers for identifying descriptions of individual entities. We present a dual-objective training method for BERT, called JointBERT, which combines binary matching and multi-class classification, forcing the model to predict the entity identifier for each entity description in a training pair in addition to the match/non-match decision. Our evaluation across five entity matching benchmark datasets shows that dual-objective training can increase the matching performance for seen products by 1% to 5% F1 compared to single-objective Transformer-based methods, given that enough training data is available for both objectives. In order to gain a deeper understanding of the strengths and weaknesses of the proposed method, we compare JointBERT to several other BERT-based matching methods as well as baseline systems along a set of specific matching challenges. This evaluation shows that JointBERT, given enough training data for both objectives, outperforms the other methods on tasks involving seen products, while it underperforms for unseen products. Using a combination of LIME explanations and domain-specific word classes, we analyze the matching decisions of the different deep learning models and conclude that BERT-based models are better at focusing on relevant word classes compared to RNN-based models.

Download Full-text

An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/525 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xin Liu ◽

Kai Liu ◽

Xiang Li ◽

Jinsong Su ◽

Yubin Ge ◽

...

Keyword(s):

Reading Comprehension ◽

Knowledge Transfer ◽

Training Data ◽

Target Domain ◽

Domain Specific ◽

Mutual Knowledge ◽

Benchmark Datasets ◽

Knowledge Distillation ◽

The Many ◽

Machine Reading

The lack of sufficient training data in many domains, poses a major challenge to the construction of domain-specific machine reading comprehension (MRC) models with satisfying performance. In this paper, we propose a novel iterative multi-source mutual knowledge transfer framework for MRC. As an extension of the conventional knowledge transfer with one-to-one correspondence, our framework focuses on the many-to-many mutual transfer, which involves synchronous executions of multiple many-to-one transfers in an iterative manner.Specifically, to update a target-domain MRC model, we first consider other domain-specific MRC models as individual teachers, and employ knowledge distillation to train a multi-domain MRC model, which is differentially required to fit the training data and match the outputs of these individual models according to their domain-level similarities to the target domain. After being initialized by the multi-domain MRC model, the target-domain MRC model is fine-tuned to match both its training data and the output of its previous best model simultaneously via knowledge distillation. Compared with previous approaches, our framework can continuously enhance all domain-specific MRC models by enabling each model to iteratively and differentially absorb the domain-shared knowledge from others. Experimental results and in-depth analyses on several benchmark datasets demonstrate the effectiveness of our framework.

Download Full-text

Intelligent Neural Network Schemes for Multi-Class Classification

Applied Sciences ◽

10.3390/app9194036 ◽

2019 ◽

Vol 9 (19) ◽

pp. 4036 ◽

Cited By ~ 1

Author(s):

You ◽

Wu ◽

Lee ◽

Liu

Keyword(s):

Neural Network ◽

Clustering Algorithm ◽

Classification Problem ◽

Machine Learning Techniques ◽

Training Dataset ◽

Reduction Techniques ◽

Learning Techniques ◽

Benchmark Datasets ◽

Dimensionality Reduction Techniques ◽

Multi Class Classification

Multi-class classification is a very important technique in engineering applications, e.g., mechanical systems, mechanics and design innovations, applied materials in nanotechnologies, etc. A large amount of research is done for single-label classification where objects are associated with a single category. However, in many application domains, an object can belong to two or more categories, and multi-label classification is needed. Traditionally, statistical methods were used; recently, machine learning techniques, in particular neural networks, have been proposed to solve the multi-class classification problem. In this paper, we develop radial basis function (RBF)-based neural network schemes for single-label and multi-label classification, respectively. The number of hidden nodes and the parameters involved with the basis functions are determined automatically by applying an iterative self-constructing clustering algorithm to the given training dataset, and biases and weights are derived optimally by least squares. Dimensionality reduction techniques are adopted and integrated to help reduce the overfitting problem associated with the RBF networks. Experimental results from benchmark datasets are presented to show the effectiveness of the proposed schemes.

Download Full-text

NONPARALLEL HYPERPLANES PROXIMAL CLASSIFIERS BASED ON MANIFOLD REGULARIZATION FOR LABELED AND UNLABELED EXAMPLES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001413500158 ◽

2013 ◽

Vol 27 (05) ◽

pp. 1350015 ◽

Cited By ~ 4

Author(s):

ZHI-XIA YANG

Keyword(s):

Linear Equations ◽

Classification Problem ◽

Twin Support Vector Machine ◽

Support Vector ◽

Manifold Regularization ◽

Classification Problems ◽

Intrinsic Structure ◽

Benchmark Datasets ◽

Multi Class Classification ◽

Proximal Classifiers

In this paper, we propose two Laplacian nonparallel hyperplane proximal classifiers (LapNPPCs) for semi-supervised and full-supervised classification problem respectively by adding manifold regularization terms. Due to the manifold regularization terms, our LapNPPCs are able to exploit the intrinsic structure of the patterns of the training set. Furthermore, our classifiers only need to solve two systems of linear equations rather than two quadratic programming (QP) problems as needed in Laplacian twin support vector machine (LapTSVM) (Z. Qi, Y. Tian and Y. Shi, Neural Netw.35 (2012) 46–53). Numerical experiments on toy and UCI benchmark datasets show that the accuracy of our LapNPPCs is comparable with other classifiers, such as the standard SVM, TWSVM and LapTSVM, etc. It is also the case that based on our LapNPPCs, some other TWSVM type classifiers with manifold regularization can be constructed by choosing different norms and loss functions to deal with semi-supervised binary and multi-class classification problems.

Download Full-text

Few Shot Network Compression via Cross Distillation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5718 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3203-3210

Author(s):

Haoli Bai ◽

Jiaxiang Wu ◽

Irwin King ◽

Michael Lyu

Keyword(s):

Training Data ◽

Fine Tuning ◽

Privacy And Security ◽

Estimation Errors ◽

Security Issues ◽

Model Compression ◽

Network Output ◽

Benchmark Datasets ◽

And Performance ◽

Network Compression

Model compression has been widely adopted to obtain light-weighted deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few shot network compression lies in high estimation errors from the original network during inference, since the compressed network can easily over-fits on the few training instances. The estimation errors could propagate and accumulate layer-wisely and finally deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving hidden layers of teacher and student network, layer-wisely accumulated estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments n benchmark datasets demonstrate that cross distillation can significantly improve the student network's accuracy when only a few training instances are available.

Download Full-text

Adaptive Semi-Supervised Learning with Discriminative Least Squares Regression

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/337 ◽

2017 ◽

Cited By ~ 9

Author(s):

Minnan Luo ◽

Lingling Zhang ◽

Feiping Nie ◽

Xiaojun Chang ◽

Buyue Qian ◽

...

Keyword(s):

Least Squares ◽

Supervised Learning ◽

Closed Form Solution ◽

Form Solution ◽

Training Data ◽

Least Squares Regression ◽

Adaptive Parameter ◽

Proposed Model ◽

Benchmark Datasets ◽

Multi Class Classification

Semi-supervised learning plays a significant role in multi-class classification, where a small number of labeled data are more deterministic while substantial unlabeled data might cause large uncertainties and potential threats. In this paper, we distinguish the label fitting of labeled and unlabeled training data through a probabilistic vector with an adaptive parameter, which always ensures the significant importance of labeled data and characterizes the contribution of unlabeled instance according to its uncertainty. Instead of using traditional least squares regression (LSR) for classification, we develop a new discriminative LSR by equipping each label with an adjustment vector. This strategy avoids incorrect penalization on samples that are far away from the boundary and simultaneously facilitates multi-class classification by enlarging the geometrical distance of instances belonging to different classes. An efficient alternative algorithm is exploited to solve the proposed model with closed form solution for each updating rule. We also analyze the convergence and complexity of the proposed algorithm theoretically. Experimental results on several benchmark datasets demonstrate the effectiveness and superiority of the proposed model for multi-class classification tasks.

Download Full-text

On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5907 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4739-4746

Author(s):

Xiangrui Li ◽

Xin Li ◽

Deng Pan ◽

Dongxiao Zhu

Keyword(s):

Neural Networks ◽

Visual Recognition ◽

Binary Classification ◽

Training Data ◽

Necessary Condition ◽

Deep Convolutional Neural Networks ◽

Target Class ◽

Benchmark Datasets ◽

Negative Class ◽

Multi Class Classification

Deep convolutional neural networks (CNNs) trained with logistic and softmax losses have made significant advancement in visual recognition tasks in computer vision. When training data exhibit class imbalances, the class-wise reweighted version of logistic and softmax losses are often used to boost performance of the unweighted version. In this paper, motivated to explain the reweighting mechanism, we explicate the learning property of those two loss functions by analyzing the necessary condition (e.g., gradient equals to zero) after training CNNs to converge to a local minimum. The analysis immediately provides us explanations for understanding (1) quantitative effects of the class-wise reweighting mechanism: deterministic effectiveness for binary classification using logistic loss yet indeterministic for multi-class classification using softmax loss; (2) disadvantage of logistic loss for single-label multi-class classification via one-vs.-all approach, which is due to the averaging effect on predicted probabilities for the negative class (e.g., non-target classes) in the learning process. With the disadvantage and advantage of logistic loss disentangled, we thereafter propose a novel reweighted logistic loss for multi-class classification. Our simple yet effective formulation improves ordinary logistic loss by focusing on learning hard non-target classes (target vs. non-target class in one-vs.-all) and turned out to be competitive with softmax loss. We evaluate our method on several benchmark datasets to demonstrate its effectiveness.

Download Full-text

Dog Breed Identification with Fine tuning of Pre-trained models

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1464.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3677-3680

Keyword(s):

Neural Networks ◽

Transfer Learning ◽

Convolutional Neural Networks ◽

Classification Problem ◽

Training Data ◽

Fine Tuning ◽

Dog Breed ◽

Classification Of Images ◽

Multiclass Classifier

Dog Breed identification is a specific application of Convolutional Neural Networks. Though the classification of Images by Convolutional Neural Network serves to be efficient method, still it has few drawbacks. Convolutional Neural Networks requires a large amount of images as training data and basic time for training the data and to achieve higher accuracy on the classification. To overcome this substantial time we use Transfer Learning. In computer vision, transfer learning refers to the use of a pre-trained models to train the CNN. By Transfer learning, a pre-trained model is trained to provide solution to classification problem which is similar to the classification problem we have. In this project we are using various pre-trained models like VGG16, Xception, InceptionV3 to train over 1400 images covering 120 breeds out of which 16 breeds of dogs were used as classes for training and obtain bottleneck features from these pre-trained models. Finally, Logistic Regression a multiclass classifier is used to identify the breed of the dog from the images and obtained 91%, 94%,95% validation accuracy for these different pre-trained models VGG16, Xception, InceptionV3.

Download Full-text

Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecule Neural Network

10.26434/chemrxiv.7151435.v2 ◽

2018 ◽

Author(s):

Roman Zubatyuk ◽

Justin S. Smith ◽

Jerzy Leszczynski ◽

Olexandr Isayev

Keyword(s):

Neural Network ◽

Molecular System ◽

Computational Cost ◽

Chemical Properties ◽

The State ◽

Molecular Properties ◽

Training Data ◽

Dft Methods ◽

Benchmark Datasets ◽

Quantum Phenomena

<p>Atomic and molecular properties could be evaluated from the fundamental Schrodinger’s equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. The resulting model shows on several benchmark datasets the state-of-the-art accuracy, comparable to the results of orders of magnitude more expensive DFT methods. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets utilizing multimodal information from previous training. The model can learn implicit solvation energy (like SMD) utilizing only a fraction of original training data, and archive MAD error of 1.1 kcal/mol compared to experimental solvation free energies in MNSol database.</p>

Download Full-text

Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Diagnostics ◽

10.3390/diagnostics11061052 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1052

Author(s):

Leang Sim Nguon ◽

Kangwon Seo ◽

Jung-Hyun Lim ◽

Tae-Jun Song ◽

Sung-Hyun Cho ◽

...

Keyword(s):

Decision Making ◽

Deep Learning ◽

Network Model ◽

Endoscopic Ultrasonography ◽

Data Augmentation ◽

Clinical Information ◽

Training Data ◽

Fine Tuning ◽

Cystic Neoplasm ◽

Cystic Neoplasms

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.

Download Full-text

Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01495-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Pilar López-Úbeda ◽

Alexandra Pomares-Quimbaya ◽

Manuel Carlos Díaz-Galiano ◽

Stefan Schulz

Keyword(s):

Language Processing ◽

Classification Problem ◽

Snomed Ct ◽

Language Resources ◽

Clinical Specialty ◽

Controlled Vocabularies ◽

Clinical Text ◽

Domain Specific ◽

Medical Terms ◽

Core Vocabulary

Abstract Background Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain such as the UMLS metathesaurus or SNOMED CT are widely used for this purpose, but with limitations such as lexical ambiguity of clinical terms. However, most of them are unambiguous within text limited to a given clinical specialty. This is one rationale besides others to classify clinical text by the clinical specialty to which they belong. Results This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is biomedical NLP tasks benefit from collections of domain terms that are specific to clinical subdomains. We use PubMed queries that generate sub-domain specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem obtaining improvements of 6 percentage points in the F-measure compared to the baseline using Multilayer Perceptron, thus demonstrating the hypothesis that a specialized term set improves NLP tasks. Conclusion The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.

Download Full-text