Learning semantic representation with neural networks for community question answering retrieval

2016 ◽  
Vol 93 ◽  
pp. 75-83 ◽  
Author(s):  
Guangyou Zhou ◽  
Yin Zhou ◽  
Tingting He ◽  
Wensheng Wu
2018 ◽  
Vol 30 (6) ◽  
pp. 1647-1672 ◽  
Author(s):  
Bei Wu ◽  
Bifan Wei ◽  
Jun Liu ◽  
Zhaotong Guo ◽  
Yuanhao Zheng ◽  
...  

Most community question answering (CQA) websites manage large numbers of question-answer pairs (QAPs) through topic-based organization, which may not satisfy users' fine-grained search demands. Facets of topics serve as a powerful tool to navigate, refine, and group the QAPs. In this work, we propose FACM, a model to annotate QAPs with facets by extending convolutional neural networks (CNNs) with a matching strategy. First, phrase information is incorporated into text representation by CNNs with different kernel sizes. Then, through a matching strategy among QAPs and facet label texts (FaLTs) acquired from Wikipedia, we generate similarity matrices to deal with the facet heterogeneity. Finally, a three-channel CNN is trained for facet label assignment of QAPs. Experiments on three real-world data sets show that FACM outperforms the state-of-the-art methods.
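As an illustration of the matching step described in the abstract above, the following minimal sketch builds a similarity matrix between the token vectors of one QAP and those of one facet label text. The abstract does not say which similarity function FACM uses; cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(qap_vectors, falt_vectors):
    """Rows index QAP tokens, columns index facet-label-text tokens."""
    return [[cosine(q, f) for f in falt_vectors] for q in qap_vectors]


# Hypothetical toy embeddings: three QAP tokens, two FaLT tokens.
qap_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
falt_vectors = [[1.0, 0.0], [0.0, 2.0]]

matrix = similarity_matrix(qap_vectors, falt_vectors)
for row in matrix:
    print([round(x, 3) for x in row])
```

A matrix of this shape could then be treated as an image-like input channel for a CNN, which is one plausible reading of the three-channel design; the actual channel construction is not given in the abstract.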


Information ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 48
Author(s):  
Alejandro Figueroa ◽  
Billy Peralta ◽  
Orietta Nicolis

For almost every online service, it is fundamental to understand the patterns, differences and trends revealed by age demographic analysis; consider, for example, the discovery of malicious activity, including identity theft, violation of community guidelines and fake profiles. In the particular case of platforms such as Facebook, Twitter and Yahoo! Answers, user demographics affect revenues and user experience; demographics help ensure that the needs of each cohort are fulfilled by personalizing and contextualizing content. Although technology has become more accessible, and thus ever more prevalent in both personal and professional lives, older people continue to trail Gen Z and Millennials in its adoption. This trailing brings about an under-representation that has a harmful influence on demographic analysis and on supervised machine learning models. To that end, this paper pioneers attempts at examining this and other major challenges facing three distinct modalities when dealing with community question answering (cQA) platforms (i.e., texts, images and metadata). As for textual inputs, we propose an age-batched greedy curriculum learning (AGCL) approach to lessen the effects of their inherent class imbalances. When built on top of FastText shallow neural networks, AGCL achieved an increase of ca. 4% in macro-F1-score with respect to baseline systems (i.e., off-the-shelf deep neural networks). With regard to metadata, our experiments show that random forest classifiers significantly improve their performance when individuals close to generational borders are excluded (up to 20% more accuracy); and by experimenting with neural network-based visual classifiers, we discovered that images are the most challenging modality for age prediction. In fact, it is hard for a visual inspection to connect profile pictures with age cohorts, and there are considerable differences in their group distributions with respect to metadata and textual inputs. All in all, we envisage that our findings will be highly relevant as guidelines for constructing assorted multimodal supervised models for automatic age recognition across cQA platforms.
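The abstract above reports gains in macro-F1, a metric that weights every class equally and therefore exposes the class imbalance it describes. The sketch below computes macro-F1 from scratch on a toy example; the cohort labels and predictions are hypothetical and are not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced data: 8 "young" users, 2 "older" users.
y_true = ["young"] * 8 + ["older"] * 2
# A degenerate classifier that always predicts the majority cohort.
y_pred = ["young"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("macro-F1:", round(macro_f1(y_true, y_pred), 4))
```

In this toy case the majority-only classifier keeps a high accuracy while its macro-F1 drops sharply, because the minority cohort contributes an F1 of zero to the average.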


2017 ◽  
Author(s):  
Sheng Zhang ◽  
Jiajun Cheng ◽  
Hui Wang ◽  
Xin Zhang ◽  
Pei Li ◽  
...  

2017 ◽  
Vol 3 (2) ◽  
pp. 51-65
Author(s):  
Daniele Bonadiman ◽  
Antonio Uva ◽  
Alessandro Moschitti

2015 ◽  
Author(s):  
Xiaoqiang Zhou ◽  
Baotian Hu ◽  
Qingcai Chen ◽  
Buzhou Tang ◽  
Xiaolong Wang

2017 ◽  
Vol 24 (4) ◽  
pp. 505-509 ◽  
Author(s):  
Yang Xiang ◽  
Qingcai Chen ◽  
Xiaolong Wang ◽  
Yang Qin

Author(s):  
Ryan Cotterell ◽  
Hinrich Schütze

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DErivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.
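The simple additive composition that the abstract above uses as a point of comparison can be sketched as follows. This is only an illustration of additive composition in general, not the authors' probabilistic model; the function name and the three-dimensional morpheme vectors are made up for the example.

```python
def compose_additive(morphemes, vectors):
    """Sum the vectors of a word's morphemes to get a vector for the whole word."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for morpheme in morphemes:
        for i, value in enumerate(vectors[morpheme]):
            total[i] += value
    return total


# Hypothetical morpheme embeddings (illustrative values only).
morpheme_vectors = {
    "question": [0.5, 0.25, 0.0],
    "able": [0.0, 0.5, 0.25],
    "ly": [0.25, 0.0, 0.5],
}

# Segmentation taken from the abstract's example: question+able+ly.
word_vector = compose_additive(["question", "able", "ly"], morpheme_vectors)
print(word_vector)
```

Addition ignores the order of the morphemes, which is one reason an order-sensitive composition such as a recurrent network might do better, as the abstract reports.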

