Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics

International audience Recently, Azari et al (2006) showed that (AIC) criterion and its corrected versions cannot be directly applied to model selection for longitudinal data with correlated errors. They proposed two model selection criteria, AICc and RICc, by applying likelihood and residual likelihood approaches. These two criteria are estimators of the Kullback-Leibler's divergence distance which is asymmetric. In this work, we apply the likelihood and residual likelihood approaches to propose two new criteria, suitable for small samples longitudinal data, based on the Kullback's symmetric divergence. Their performance relative to others criteria is examined in a large simulation study

Download Full-text

Evaluation of the Benchmark Dose Method for Dichotomous Data: Model Dependence and Model Selection

Regulatory Toxicology and Pharmacology ◽

10.1006/rtph.2002.1578 ◽

2002 ◽

Vol 36 (2) ◽

pp. 184-197 ◽

Cited By ~ 30

Author(s):

Salomon Sand ◽

Agneta Falk Filipsson ◽

Katarina Victorin

Keyword(s):

Model Selection ◽

Data Model ◽

Benchmark Dose ◽

Model Dependence ◽

Dichotomous Data ◽

Dose Method

Download Full-text

Longitudinal data model selection

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2005.05.009 ◽

2006 ◽

Vol 50 (11) ◽

pp. 3053-3066 ◽

Cited By ~ 17

Author(s):

Rahman Azari ◽

Lexin Li ◽

Chih-Ling Tsai

Keyword(s):

Model Selection ◽

Longitudinal Data ◽

Data Model

Download Full-text

Criteria to Validate Count Data Model Selection

Springer Proceedings in Mathematics & Statistics - Stochastic Models, Statistics and Their Applications ◽

10.1007/978-3-030-28665-1_32 ◽

2019 ◽

pp. 429-436

Author(s):

Annika Homburg

Keyword(s):

Model Selection ◽

Count Data ◽

Data Model ◽

Count Data Model

Download Full-text

Corpus-Based Paraphrase Detection Experiments and Review

Information ◽

10.3390/info11050241 ◽

2020 ◽

Vol 11 (5) ◽

pp. 241

Author(s):

Tedo Vrbanec ◽

Ana Meštrović

Keyword(s):

Deep Learning ◽

Model Selection ◽

Question Answering ◽

State Of The Art ◽

Detection Threshold ◽

Text Summarization ◽

Distance Measures ◽

Authorship Attribution ◽

Learning Models ◽

Plagiarism Detection

Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three different public available corpora: Microsoft Research Paraphrase Corpus, Clough and Stevenson and Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing: hyper-parameters, sub-model selection—where they exist (e.g., Skipgram vs. CBOW), distance measures, and semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.

Download Full-text