P2V-MAP: Mapping Market Structures for Large Retail Assortments

2019 ◽  
Vol 56 (4) ◽  
pp. 557-580 ◽  
Author(s):  
Sebastian Gabel ◽  
Daniel Guhl ◽  
Daniel Klapper

The authors propose a new, exploratory approach for analyzing market structures that leverages two recent methodological advances in natural language processing and machine learning. They customize a neural network language model to derive latent product attributes by analyzing the co-occurrences of products in shopping baskets. Applying dimensionality reduction to the latent attributes yields a two-dimensional product map. This method is well-suited to retailers because it relies on data that are readily available from their checkout systems and facilitates their analyses of cross-category product complementarity, in addition to within-category substitution. The approach has high usability because it is automated, scalable, and does not require a priori assumptions. Its results are easy to interpret and update as new market basket data are collected. The authors validate their approach both by conducting an extensive simulation study and by comparing their results with those of state-of-the-art econometric methods for modeling product relationships. The application of this approach using data collected at a leading German grocery retailer underlines its usefulness and provides novel findings that are relevant to assortment-related decisions.
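A minimal sketch of the basket-embedding idea described in the abstract, assuming gensim's word2vec (skip-gram) and scikit-learn's t-SNE as stand-ins for the authors' customized language model and dimensionality-reduction step; baskets play the role of sentences and product IDs the role of words, and all hyperparameters here are illustrative assumptions.

```python
# Sketch only: learn latent product attributes from basket co-occurrence, then map to 2-D.
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

baskets = [
    ["milk", "cereal", "bananas"],
    ["beer", "chips", "salsa"],
    ["milk", "coffee", "cereal"],
    ["beer", "chips", "beef"],
]  # each basket = one "sentence" of co-purchased products

# Latent product attributes from within-basket co-occurrences (skip-gram).
model = Word2Vec(sentences=baskets, vector_size=16, window=10,
                 min_count=1, sg=1, epochs=50, seed=0)

products = list(model.wv.index_to_key)
vectors = model.wv[products]

# Reduce the latent attributes to a two-dimensional product map.
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)
for p, (x, y) in zip(products, coords):
    print(f"{p:>8s}  ({x:6.2f}, {y:6.2f})")
```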

2019 ◽  
Author(s):  
Evgeniia Diachek ◽  
Idan Blank ◽  
Matthew Siegelman ◽  
Josef Affourtit ◽  
Evelina Fedorenko

Abstract Aside from the language-selective left-lateralized fronto-temporal network, language comprehension sometimes additionally recruits a domain-general bilateral fronto-parietal network implicated in executive functions: the multiple demand (MD) network. However, the nature of the MD network’s contributions to language comprehension remains debated. To illuminate the role of this network in language processing, we conducted a large-scale fMRI investigation using data from 30 diverse word and sentence comprehension experiments (481 unique participants, 678 scanning sessions). In line with prior findings, the MD network was active during many language tasks. Moreover, similar to the language-selective network, which is robustly lateralized to the left hemisphere, these responses were stronger in the left-hemisphere MD regions. However, in stark contrast with the language-selective network, the MD network responded more strongly (i) to lists of unconnected words than to sentences, and critically, (ii) in paradigms with an explicit task compared to passive comprehension paradigms. In fact, many passive comprehension tasks failed to elicit a response above the fixation baseline in the MD network, in contrast to strong responses in the language-selective network. In tandem, these results argue against a role for the MD network in core aspects of sentence comprehension like inhibiting irrelevant meanings or parses, keeping intermediate representations active in working memory, or predicting upcoming words or structures. These results align with recent evidence of relatively poor tracking of the linguistic signal by the MD regions during naturalistic comprehension, and instead suggest that the MD network’s engagement during language processing likely reflects effort associated with extraneous task demands. Significance Statement Domain-general executive processes, like working memory and cognitive control, have long been implicated in language comprehension, including in neuroimaging studies that have reported activation in domain-general multiple demand (MD) regions for linguistic manipulations. However, much prior evidence has come from paradigms where language interpretation is accompanied by extraneous tasks. Using a large fMRI dataset (30 experiments/481 participants/678 sessions), we demonstrate that MD regions are engaged during language comprehension in the presence of task demands, but not during passive reading/listening—conditions that strongly activate the fronto-temporal language network. These results present a fundamental challenge to proposals whereby linguistic computations, like inhibiting irrelevant meanings, keeping representations active in working memory, or predicting upcoming elements, draw on domain-general executive resources.


2019 ◽  
Vol 28 (01) ◽  
pp. 1950002
Author(s):  
Yo Han Lee ◽  
Dong W. Kim ◽  
Myo Taeg Lim

In this paper, a new two-level recurrent neural network language model (RNNLM) based on the continuous bag-of-words (CBOW) model for application to sentence classification is presented. The vector representations of words learned by a neural network language model have been shown to carry semantic and sentiment information and are useful in various natural language processing tasks. A disadvantage of CBOW is that it considers only a fixed-length context because its basic structure is a neural network with a fixed-length input. In contrast, the RNNLM has no limit on context length but considers only the preceding context words. Therefore, the advantage of the RNNLM is complementary to the disadvantage of CBOW. The proposed model encodes many linguistic patterns and improves on sentiment analysis and question classification benchmarks compared with previously reported methods.
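A hedged PyTorch sketch of the general idea of combining CBOW-style word embeddings with a recurrent layer for sentence classification; the layer sizes and the way the two levels are joined here are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CBOWRNNClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)             # CBOW-style lookup table
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)   # recurrent level: no fixed context length
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)               # (batch, seq_len, embed_dim)
        _, h = self.rnn(emb)                      # h: (1, batch, hidden_dim)
        return self.out(h.squeeze(0))             # class logits

# Toy usage: 4 sentences of 12 tokens each.
model = CBOWRNNClassifier(vocab_size=1000)
logits = model(torch.randint(0, 1000, (4, 12)))
print(logits.shape)                               # torch.Size([4, 2])
```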


2019 ◽  
Vol 26 (11) ◽  
pp. 1297-1304 ◽  
Author(s):  
Yuqi Si ◽  
Jingqi Wang ◽  
Hua Xu ◽  
Kirk Roberts

Abstract Objective Neural network–based representations (“embeddings”) have dramatically advanced natural language processing (NLP) tasks, including clinical NLP tasks such as concept extraction. Recently, however, more advanced embedding methods and representations (eg, ELMo, BERT) have further pushed the state of the art in NLP, yet there are no common best practices for how to integrate these representations into clinical tasks. The purpose of this study, then, is to explore the space of possible options in utilizing these new models for clinical concept extraction, including comparing these to traditional word embedding methods (word2vec, GloVe, fastText). Materials and Methods Both off-the-shelf, open-domain embeddings and pretrained clinical embeddings from MIMIC-III (Medical Information Mart for Intensive Care III) are evaluated. We explore a battery of embedding methods consisting of traditional word embeddings and contextual embeddings and compare these on 4 concept extraction corpora: i2b2 2010, i2b2 2012, SemEval 2014, and SemEval 2015. We also analyze the impact of the pretraining time of a large language model like ELMo or BERT on the extraction performance. Last, we present an intuitive way to understand the semantic information encoded by contextual embeddings. Results Contextual embeddings pretrained on a large clinical corpus achieve new state-of-the-art performance across all concept extraction tasks. The best-performing model outperforms all state-of-the-art methods with respective F1-measures of 90.25, 93.18 (partial), 80.74, and 81.65. Conclusions We demonstrate the potential of contextual embeddings through the state-of-the-art performance these methods achieve on clinical concept extraction. Additionally, we demonstrate that contextual embeddings encode valuable semantic information not accounted for in traditional word representations.
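A minimal sketch of framing clinical concept extraction as token classification over contextual embeddings with the Hugging Face transformers library; the checkpoint and the i2b2-style label set below are placeholders, and the paper's models are pretrained on clinical text (e.g., MIMIC-III) and fine-tuned before the predictions are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-problem", "I-problem", "B-test", "I-test", "B-treatment", "I-treatment"]
checkpoint = "bert-base-uncased"  # stand-in for a clinically pretrained contextual model

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(labels))

text = "Patient denies chest pain but reports shortness of breath."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits               # (1, seq_len, num_labels)

pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(f"{tok:15s} {labels[pid]}")             # untrained head: outputs are random until fine-tuned
```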


2021 ◽  
Vol 12 (2) ◽  
pp. 1-24
Author(s):  
Md Abul Bashar ◽  
Richi Nayak

Language models (LMs) have become a common method of transfer learning in Natural Language Processing (NLP) tasks when working with small labeled datasets. An LM is pretrained on an easily available large unlabelled text corpus and then fine-tuned on the labelled data of the target (i.e., downstream) task. Because an LM is designed to capture the linguistic aspects of semantics, it can be biased toward linguistic features. We argue that exposing an LM during fine-tuning to instances that capture the diverse semantic aspects (e.g., topical, linguistic, semantic relations) present in the dataset will improve its performance on the underlying task. We propose a Mixed Aspect Sampling (MAS) framework that samples instances capturing different semantic aspects of the dataset and uses an ensemble classifier to improve classification performance. Experimental results show that MAS outperforms random sampling as well as state-of-the-art active learning models on abuse detection tasks, where it is hard to collect the labelled data needed to build an accurate classifier.
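An illustrative sketch only, not the authors' MAS implementation: one classifier is trained per semantic "aspect" of the data and the ensemble's probabilities are averaged. The aspect labels here are hand-assigned stand-ins (in practice they might come from topic models or clustering), and a logistic regression on TF-IDF features substitutes for the fine-tuned LM members used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["you are awful", "have a nice day", "totally useless person",
          "thanks for the help", "nobody likes you", "great discussion"]
labels = np.array([1, 0, 1, 0, 1, 0])          # 1 = abusive, 0 = benign (toy data)
aspects = np.array([0, 0, 1, 1, 0, 1])         # which semantic aspect each instance represents

X = TfidfVectorizer().fit_transform(texts)

# One ensemble member per aspect, trained on that aspect's sample.
members = [LogisticRegression().fit(X[aspects == a], labels[aspects == a])
           for a in np.unique(aspects)]

# Ensemble prediction: average the members' class probabilities.
probs = np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)
print((probs > 0.5).astype(int))
```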


2016 ◽  
Vol 105 (1) ◽  
pp. 51-61 ◽  
Author(s):  
Jorge Ferrández-Tordera ◽  
Sergio Ortiz-Rojas ◽  
Antonio Toral

Abstract Language models (LMs) are an essential element in statistical approaches to natural language processing for tasks such as speech recognition and machine translation (MT). The advent of big data has made massive amounts of data available for building LMs; in fact, for the most prominent languages, it is not feasible with current techniques and hardware to train LMs on all the data available today. At the same time, it has been shown that the more data used for an LM, the better the performance, e.g. in MT, with no indication yet of reaching a plateau. This paper presents CloudLM, an open-source cloud-based LM intended for MT, which allows distributed LMs to be queried. CloudLM relies on Apache Solr and provides the functionality of state-of-the-art language modelling (it builds upon KenLM) while allowing massive LMs to be queried (as the use of local memory is drastically reduced), at the expense of slower decoding speed.
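A hedged sketch of what the client side of a Solr-backed n-gram query could look like, in the spirit of CloudLM but not its actual API: the core URL and the field names "ngram" and "logprob" are assumptions about how an LM table might be indexed, and the shorter-n-gram fallback below omits proper backoff weights for brevity.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/ngrams/select"   # hypothetical Solr core holding the LM table

def lookup_logprob(tokens):
    """Fetch the stored log-probability of an n-gram from Solr, or None if absent."""
    params = {"q": 'ngram:"%s"' % " ".join(tokens), "wt": "json", "rows": 1}
    docs = requests.get(SOLR_URL, params=params, timeout=5).json()["response"]["docs"]
    return docs[0]["logprob"] if docs else None

def sentence_logprob(words, order=3):
    """Sum n-gram log-probabilities, trying the longest available n-gram at each position."""
    total = 0.0
    for i in range(len(words)):
        for n in range(min(order, i + 1), 0, -1):
            lp = lookup_logprob(words[i - n + 1:i + 1])
            if lp is not None:
                total += lp
                break
    return total

print(sentence_logprob("this is a test sentence".split()))
```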


2020 ◽  
Author(s):  
Alireza Roshanzamir ◽  
Hamid Aghajan ◽  
Mahdieh Soleymani Baghshah

Abstract Background: We developed transformer-based deep learning models based on natural language processing for early diagnosis of Alzheimer’s disease from the picture description test. Methods: The lack of large datasets poses the most important limitation for using complex models that do not require feature engineering. Transformer-based pre-trained deep language models have recently made a large leap in NLP research and application. These models are pre-trained on available large datasets to understand natural language texts appropriately, and have been shown to subsequently perform well on classification tasks with small training sets. The overall classification model is a simple classifier on top of the pre-trained deep language model. Results: The models are evaluated on picture description test transcripts of the Pitt corpus, which contains data from 170 AD patients (257 interviews) and 99 healthy controls (243 interviews). The large bidirectional encoder representations from transformers (BERT-Large) embedding with a logistic regression classifier achieves a classification accuracy of 88.08%, which improves the state of the art by 2.48%. Conclusions: Using pre-trained language models can improve AD prediction. This not only solves the problem of the lack of sufficiently large datasets, but also reduces the need for expert-defined features.
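A minimal sketch of the pipeline shape described above: a frozen pre-trained BERT encoder produces transcript embeddings, and a simple logistic-regression classifier is fit on top. The transcripts and labels below are toy placeholders, and the smaller bert-base checkpoint stands in for the BERT-Large embedding used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    """Return the [CLS] embedding of each transcript from the frozen encoder."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**enc).last_hidden_state[:, 0, :]    # [CLS] token vectors
    return out.numpy()

transcripts = ["the boy is on the stool reaching for the cookie jar",
               "there is a um a a thing and the water is um"]
labels = [0, 1]                                            # 0 = control, 1 = AD (toy labels)

clf = LogisticRegression().fit(embed(transcripts), labels)
print(clf.predict(embed(["the sink is overflowing while the mother dries dishes"])))
```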


2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Alexander MacLean ◽  
Alexander Wong

The introduction of Bidirectional Encoder Representations from Transformers (BERT) was a major breakthrough for transfer learning in natural language processing, enabling state-of-the-art performance across a large variety of complex language understanding tasks. In the realm of clinical language modeling, the advent of BERT led to the creation of ClinicalBERT, a state-of-the-art deep transformer model pretrained on a wealth of patient clinical notes to facilitate downstream predictive tasks in the clinical domain. While ClinicalBERT has been widely leveraged by the research community as the foundation for building clinical domain-specific predictive models, given its improved performance on the Medical Natural Language Inference (MedNLI) challenge compared to the seminal BERT model, the fine-grained behaviour and intricacies of this popular clinical language model have not been well studied. Without this deeper understanding, it is very challenging to know where ClinicalBERT does well given its additional exposure to clinical knowledge, where it does not, and where it can be improved in a meaningful manner. Motivated to gain this deeper understanding, this study presents a critical behaviour exploration of the ClinicalBERT deep transformer model using the MedNLI challenge dataset to better understand the following: 1) decision-making similarities between ClinicalBERT and BERT (leveraging a new metric we introduce called Model Alignment), 2) where ClinicalBERT holds advantages over BERT given its exposure to clinical knowledge, and 3) where ClinicalBERT struggles compared to BERT. The insights gained about the behaviour of ClinicalBERT will help guide new directions for designing and training clinical language models in a way that not only addresses the remaining gaps and facilitates further improvements in clinical language understanding performance, but also highlights the limitations and boundaries of use for such models.
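An illustrative sketch only: one simple way to quantify decision-making similarity between two models is the fraction of examples on which their predictions agree. The paper's Model Alignment metric has its own definition, which may differ from this toy agreement score; the MedNLI-style labels below are placeholders.

```python
import numpy as np

def prediction_agreement(preds_a, preds_b):
    """Fraction of examples on which the two models make the same prediction."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a == preds_b))

# Toy MedNLI-style predictions: 0 = entailment, 1 = neutral, 2 = contradiction
bert_preds         = [0, 1, 2, 2, 1, 0, 0, 2]
clinicalbert_preds = [0, 1, 2, 1, 1, 0, 2, 2]
print(f"agreement: {prediction_agreement(bert_preds, clinicalbert_preds):.2f}")  # 0.75
```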


2021 ◽  
Author(s):  
Sanjar Adilov

SMILES is a line notation for entering and representing molecules. Being inherently a language construct, it allows molecular data to be modeled in a self-supervised fashion by employing machine learning methods for natural language processing (NLP). The recent success of attention-based neural networks in NLP has made large-corpora transformer pretraining a de facto standard for learning representations and transferring knowledge to downstream tasks. In this work, we attempt to adapt transformer capabilities to a large SMILES corpus by constructing a GPT-2-like language model. We show experimentally that a pretrained causal transformer captures general knowledge that can be successfully transferred to downstream tasks such as focused molecule generation and single-/multi-output molecular-property prediction. For each task, we freeze the model parameters and attach trainable lightweight networks between attention blocks, called adapters, as an alternative to fine-tuning. With a relatively modest setup, our transformer outperforms the recently proposed ChemBERTa transformer and approaches state-of-the-art MoleculeNet and Chemprop results. Overall, transformers pretrained on SMILES corpora are promising alternatives that do not require handcrafted feature engineering, make few assumptions about the structure of the data, and scale well with the pretraining data size.
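A hedged sketch of the adapter idea mentioned above: a small bottleneck network whose output is added residually to a frozen transformer block's hidden states, so that only the adapter (and task head) parameters are trained during transfer. The bottleneck size and placement are assumptions and may differ from the paper's configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual connection keeps the frozen pretrained representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Toy usage: hidden states for 2 SMILES strings, 32 tokens each, width 768.
adapter = Adapter()
print(adapter(torch.randn(2, 32, 768)).shape)    # torch.Size([2, 32, 768])

# During transfer, only the adapter parameters (plus a task head) are updated.
print("trainable adapter parameters:", sum(p.numel() for p in adapter.parameters()))
```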


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

Sentence completion systems are actively studied by many researchers because they reduce cognitive effort and enhance the user experience. A review of the literature reveals that most work in this area is on English, with limited effort spent on other languages, especially vernacular languages. This work aims to develop a state-of-the-art sentence completion system for Punjabi, the 10th most spoken language in the world. The presented work is the outcome of experiments with various neural network language model combinations. A new Sentence Search Algorithm (SSA) and a patching system are developed to search, complete, and rank completed sub-strings and return syntactically rich sentence(s). Quantitative and qualitative evaluation metrics were used to evaluate the system. The results are promising, and the best-performing model completes a given sub-string with greater acceptability. The best-performing model is used to develop the user interface.
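A generic sketch of language-model-driven sentence completion with beam search, for orientation only; this is not the paper's Sentence Search Algorithm or patching system, and the English GPT-2 checkpoint below is merely a stand-in for a Punjabi neural language model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = "The weather today is"
inputs = tokenizer(prefix, return_tensors="pt")

# Beam search returns several ranked completions of the given sub-string.
outputs = model.generate(**inputs, max_new_tokens=10, num_beams=5,
                         num_return_sequences=3, early_stopping=True,
                         pad_token_id=tokenizer.eos_token_id)
for i, seq in enumerate(outputs, 1):
    print(f"{i}. {tokenizer.decode(seq, skip_special_tokens=True)}")
```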


2019 ◽  
Vol 9 (9) ◽  
pp. 1871 ◽  
Author(s):  
Chanrith Poleak ◽  
Jangwoo Kwon

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as a decoder for the language model. However, despite this improvement, LSTM itself has its own shortcomings as a model because the structure is complicated and its nature is inherently sequential. This paper proposes a model using a simple convolutional network for both the encoder and decoder functions of image captioning, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results that are competitive with state-of-the-art image captioning models across different evaluation metrics, while using a much simpler model and enabling parallel graphics processing unit (GPU) computation during training, resulting in faster training times.
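A hedged sketch of the convolutional-decoder idea: caption tokens pass through causal (left-padded) 1-D convolutions instead of an LSTM, conditioned on an image feature vector, so all positions are computed in parallel during training. The layer sizes and conditioning scheme are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvCaptionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, image_dim=2048, kernel_size=3, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(image_dim, embed_dim)       # inject image features
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, embed_dim, kernel_size) for _ in range(layers))
        self.out = nn.Linear(embed_dim, vocab_size)
        self.pad = kernel_size - 1                            # causal left padding

    def forward(self, image_feats, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens) + self.img_proj(image_feats).unsqueeze(1)
        x = x.transpose(1, 2)                                 # (batch, embed_dim, seq_len)
        for conv in self.convs:
            x = F.relu(conv(F.pad(x, (self.pad, 0))))         # pad left only: no future leakage
        return self.out(x.transpose(1, 2))                    # (batch, seq_len, vocab_size)

# Toy usage: all caption positions are predicted in parallel (unlike an LSTM).
decoder = ConvCaptionDecoder(vocab_size=10000)
logits = decoder(torch.randn(2, 2048), torch.randint(0, 10000, (2, 15)))
print(logits.shape)                                           # torch.Size([2, 15, 10000])
```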

