Identifying Duplicate Questions in Community Question Answering Forums Using Machine Learning Approaches

Author(s):  
Divya Vanam ◽  
Venkateswara Rao Pulipati
2018 ◽  
Vol 25 (1) ◽  
pp. 5-41
Author(s):  
PRESLAV NAKOV ◽  
LLUÍS MÀRQUEZ ◽  
ALESSANDRO MOSCHITTI ◽  
HAMDY MUBARAK

Abstract: We analyze resources and models for Arabic community Question Answering (cQA). In particular, we focus on CQA-MD, our cQA corpus for Arabic in the domain of medical forums. We describe the corpus and the main challenges it poses due to its mix of informal and formal language, and of different Arabic dialects, as well as due to its medical nature. We further present a shared task on cQA at SemEval, the International Workshop on Semantic Evaluation, based on this corpus. We discuss the features and the machine learning approaches used by the teams who participated in the task, with focus on the models that exploit syntactic information using convolutional tree kernels and neural word embeddings. We further analyze and extend the outcome of the SemEval challenge by training a meta-classifier combining the output of several systems. This allows us to compare different features and different learning algorithms in an indirect way. Finally, we analyze the most frequent errors common to all approaches, categorizing them into prototypical cases, and zooming into the way syntactic information in tree kernel approaches can help solve some of the most difficult cases. We believe that our analysis and the lessons learned from the process of corpus creation as well as from the shared task analysis will be helpful for future research on Arabic cQA.


2021 ◽  
Author(s):  
Wai Keen Vong ◽  
Brenden M. Lake

In order to learn the mappings from words to referents, children must integrate co-occurrence information across individually ambiguous pairs of scenes and utterances, a challenge known as cross-situational word learning. In machine learning, recent multimodal neural networks have been shown to learn meaningful visual-linguistic mappings from cross-situational data, as needed to solve problems such as image captioning and visual question answering. These networks are potentially appealing as cognitive models because they can learn from raw visual and linguistic stimuli, something previous cognitive models have not addressed. In this paper, we examine whether recent machine learning approaches can help explain various behavioral phenomena from the psychological literature on cross-situational word learning. We consider two variants of a multimodal neural network architecture, and look at seven different phenomena associated with cross-situational word learning, and word learning more generally. Our results show that these networks can learn word-referent mappings from a single epoch of training, matching the amount of training found in cross-situational word learning experiments. Additionally, these networks capture some, but not all of the phenomena we studied, with all of the failures related to reasoning via mutual exclusivity. These results provide insight into the kinds of phenomena that arise naturally from relatively generic neural network learning algorithms, and which word learning phenomena require additional inductive biases.


Community question answering (CQA) systems are rapidly gaining attention. Several researchers have actively worked on improving the theories associated with question answering (QA) systems. This paper reviews the literature on QA systems. We discuss the early contributions to QA systems along with their present and future scope. We have categorized the works reported in the literature into 20 subgroups according to their significance and relevance. The works in each group are presented along with their inter-relevance. Determining question and answer quality is the prime challenge addressed by many researchers. Modeling similar questions, identifying experts in advance, and understanding seeker satisfaction are also considered potential challenges. Most researchers have experimented on popular CQA sites such as Yahoo! Answers, Wiki Answers, Baidu Knows, Brainly, Quora, PubMed and Stack Overflow. Machine learning, probabilistic modeling, deep learning and hybrid approaches show profound significance in addressing the various challenges encountered with QA systems. Today, the paradigm of CQA systems has shifted toward serving as Open Educational Resources for the learning community.


2017 ◽  
Vol 56 (03) ◽  
pp. 209-216
Author(s):  
Said Ouatik El Alaoui ◽  
Mourad Sarrouti

Summary. Background and Objective: Biomedical question type classification is one of the important components of an automatic biomedical question answering system. The performance of the latter depends directly on the performance of its biomedical question type classification system, which consists of assigning a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary. Methods: In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine learning algorithms. Finally, the class label is predicted using the trained classifiers. Results: Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method exhibits significantly improved performance when compared to four baseline systems. The proposed method achieves a roughly 10-point increase over the best baseline in terms of accuracy. Moreover, the obtained results show that using handcrafted lexico-syntactic patterns as the feature provider for a support vector machine (SVM) leads to the highest accuracy of 89.40%. Conclusion: The proposed method can automatically classify BioASQ questions into one of the four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.
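
The feature extraction step described in this abstract can be illustrated with a minimal sketch. The patterns below are hypothetical and chosen only for illustration; they are not the handcrafted patterns from the paper, and the function name `extract_features` is an assumption. Each question is mapped to a binary vector with one entry per pattern, which could then be passed to a classifier such as an SVM.

```python
import re

# Hypothetical lexico-syntactic patterns, for illustration only.
# They are NOT the patterns proposed in the paper.
PATTERNS = [
    ("starts_with_auxiliary",
     re.compile(r"^(is|are|does|do|can|has|have|was|were)\b", re.IGNORECASE)),
    ("starts_with_wh_word",
     re.compile(r"^(what|which|who|where|when)\b", re.IGNORECASE)),
    ("has_list_cue",
     re.compile(r"^(list|name)\b|\bwhich\s+\w+\s+are\b|\bwhat\s+are\b",
                re.IGNORECASE)),
    ("starts_with_how_or_why",
     re.compile(r"^(how|why)\b", re.IGNORECASE)),
]


def extract_features(question):
    """Return a binary feature vector with one entry per pattern."""
    text = question.strip()
    return [1 if pattern.search(text) else 0 for _, pattern in PATTERNS]


if __name__ == "__main__":
    for q in ["Is aspirin an anticoagulant?",
              "Which genes are associated with cystic fibrosis?",
              "How does metformin work?"]:
        print(q, extract_features(q))
```

In this sketch, a question that opens with an auxiliary verb activates the first feature, which a trained classifier would likely associate with the yes/no category; the actual mapping from features to labels is learned, not hard-coded.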


2019 ◽  
Vol 19 (5-6) ◽  
pp. 636-653
Author(s):  
TIANTIAN GAO ◽  
PAUL FODOR ◽  
MICHAEL KIFER

Abstract: The inherent difficulty of knowledge specification and the lack of trained specialists are some of the key obstacles on the way to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace. Knowledge and query authoring using natural language, especially controlled natural language (CNL), is one of the promising approaches that could enable domain experts, who are not trained logicians, to both create formal knowledge and query it. In previous work, we introduced the KALM system (Knowledge Authoring Logic Machine) that supports knowledge authoring (and simple querying) with very high accuracy that at present is unachievable via machine learning approaches. The present paper expands on the question answering aspect of KALM and introduces KALM-QA (KALM for Question Answering) that is capable of answering much more complex English questions. We show that KALM-QA achieves 100% accuracy on an extensive suite of movie-related questions, called MetaQA, which contains almost 29,000 test questions and over 260,000 training questions. We contrast this with a published machine learning approach, which falls far short of this high mark.


With the steep rise in the number of internet users, Community Question Answering (CQA) has attracted many research communities. Providing a correct answer to a user's question from a large collection of text data requires understanding the question properly, which makes suggesting a precise answer a challenging task. Question Answering (QA) is therefore more challenging than the common information retrieval task performed by many search engines. In this paper, an automatic prediction of the quality of CQA answers is proposed. This is accomplished by using five well-known machine learning algorithms. Questions asked by users are usually based on a topic or theme. We exploit this feature by identifying the category of the posted question and mapping it to the corresponding question. Similarly, the answers posted by multiple users are processed for answer category mapping. The results show that for Question Classification (QC), Linear Support Vector Classification (LSVC) is the best classifier, and Multinomial Logistic Regression (MLR) is the most suitable for Answer Classification (AC). The MS MARCO dataset is used as the underlying dataset for retrieving and testing the question and answer classifiers. Yahoo Answers is used as a gold reference during testing throughout our experiments. Experimental results show that the proposed technique is efficient and outperforms Metzler and Kanungo’s (MK++) [1] while providing the best answer summary satisfying the user’s queries.


2019 ◽  
Vol 70 (3) ◽  
pp. 214-224
Author(s):  
Bui Ngoc Dung ◽  
Manh Dzung Lai ◽  
Tran Vu Hieu ◽  
Nguyen Binh T. H.

Video surveillance is an emerging research field of intelligent transport systems. This paper presents some techniques that use machine learning and computer vision for vehicle detection and tracking. First, machine learning approaches using Haar-like features and the AdaBoost algorithm for vehicle detection are presented. Second, approaches to detect vehicles using background subtraction based on a Gaussian mixture model and to track vehicles using optical flow and multiple Kalman filters are given. The method has the advantage of distinguishing and tracking multiple vehicles individually. The experimental results demonstrate the high accuracy of the method.
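
As a rough illustration of the Kalman filtering idea mentioned in this abstract, the sketch below smooths a noisy sequence of one vehicle coordinate. It is a scalar filter with a random-walk motion model, which is a deliberate simplification and not the paper's multi-filter tracker; the function name `kalman_1d` and the default variances are assumptions made for the example.

```python
def kalman_1d(measurements, process_var=1e-2, measurement_var=1.0):
    """Smooth a sequence of noisy scalar positions with a 1-D Kalman filter.

    Assumes a random-walk motion model (the position is expected to stay
    roughly where it was); the variances are illustrative defaults.
    """
    estimate = float(measurements[0])  # initialise from the first measurement
    error_var = 1.0                    # initial uncertainty of the estimate
    estimates = [estimate]
    for z in measurements[1:]:
        # Predict: the position is carried over, the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        estimates.append(estimate)
    return estimates


if __name__ == "__main__":
    noisy_x = [100.0, 103.5, 101.2, 106.8, 104.9, 108.1]
    print(kalman_1d(noisy_x))
```

Tracking several vehicles individually would, under the same assumptions, amount to keeping one such filter state per detected vehicle and updating each with the measurement assigned to it.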

