Text mining stackoverflow

2016 ◽  
Vol 29 (2) ◽  
pp. 255-275 ◽  
Author(s):  
Arash Joorabchi ◽  
Michael English ◽  
Abdulhussain E. Mahdi

Purpose – The use of social media and in particular community Question Answering (Q & A) websites by learners has increased significantly in recent years. The vast amounts of data posted on these sites provide an opportunity to investigate the topics under discussion and those receiving most attention. The purpose of this paper is to automatically analyse the content of a popular computer programming Q & A website, StackOverflow (SO), determine the exact topics of posted Q & As, and narrow down their categories to help determine subject difficulties of learners. By doing so, the authors have been able to rank identified topics and categories according to their frequencies, and therefore, mark the most asked about subjects and, hence, identify the most difficult and challenging topics commonly faced by learners of computer programming and software development. Design/methodology/approach – In this work the authors have adopted a heuristic research approach combined with a text mining approach to investigate the topics and categories of Q & A posts on the SO website. Almost 186,000 Q & A posts were analysed and their categories refined using Wikipedia as a crowd-sourced classification system. After identifying and counting the occurrence frequency of all the topics and categories, their semantic relationships were established. These data were then presented as a rich graph which could be visualized using graph visualization software such as Gephi. Findings – The reported results and corresponding discussion indicate that the insight gained from the process can be further refined and potentially used by instructors, teachers, and educators to pay more attention to and focus on the commonly occurring topics/subjects when designing their course material, delivery, and teaching methods. Research limitations/implications – The proposed approach limits the scope of the analysis to a subset of Q & As which contain one or more links to Wikipedia. Therefore, developing more sophisticated text mining methods capable of analysing a larger portion of available data would improve the accuracy and generalizability of the results. Originality/value – The application of text mining and data analytics technologies in education has created a new interdisciplinary field of research between the education and information sciences, called Educational Data Mining (EDM). The work presented in this paper falls under this field of research; and it is an early attempt at investigating the practical applications of text mining technologies in the area of computer science (CS) education.
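The abstract above describes counting how often topics occur in posts that link to Wikipedia. A minimal Python sketch of that counting step follows. It is illustrative only and is not the authors' pipeline: the regular expression, the function names (`wikipedia_topics`, `topic_frequencies`) and the sample posts are all assumptions made for this example, and the category refinement and graph construction steps are not covered.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Assumption: topics are identified by links to English Wikipedia articles.
WIKI_LINK = re.compile(r'https?://en\.wikipedia\.org/wiki/([^\s"<>#?)]+)')


def wikipedia_topics(post_body):
    """Return the distinct Wikipedia article titles linked from one post."""
    return {unquote(title).replace("_", " ") for title in WIKI_LINK.findall(post_body)}


def topic_frequencies(posts):
    """Count, for each topic, the number of posts that link to it."""
    counts = Counter()
    for body in posts:
        counts.update(wikipedia_topics(body))
    return counts


# Hypothetical sample data, not taken from the study.
sample_posts = [
    'See <a href="https://en.wikipedia.org/wiki/Recursion">recursion</a>.',
    "Read https://en.wikipedia.org/wiki/Hash_table and https://en.wikipedia.org/wiki/Recursion",
    "No links here.",
]
freqs = topic_frequencies(sample_posts)
for topic, count in freqs.most_common():
    print(topic, count)
```

Posts without a Wikipedia link contribute nothing to the counts, which mirrors the limitation the abstract itself notes.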

2019 ◽  
Vol 72 (1) ◽  
pp. 1-16
Author(s):  
Alton Y.K. Chua ◽  
Snehasish Banerjee

Purpose The purpose of this paper is to explore the use of community question answering sites (CQAs) on the topic of terrorism. Three research questions are investigated: what are the dominant themes reflected in terrorism-related questions? How do answer characteristics vary with question themes? How does users’ anonymity relate to question themes and answer characteristics? Design/methodology/approach Data include 300 questions that attracted 2,194 answers on the community question answering site Yahoo! Answers. Content analysis was employed. Findings The questions reflected the community’s information needs ranging from the life of extremists to counter-terrorism policies. Answers were laden with negative emotions reflecting hate speech and Islamophobia, making claims that were rarely verifiable. Users who posted sensitive content generally remained anonymous. Practical implications This paper raises awareness of how CQAs are used to exchange information about sensitive topics such as terrorism. It calls for governments and law enforcement agencies to collaborate with major social media companies to develop a process for cross-platform blacklisting of users and content, as well as identifying those who are vulnerable. Originality/value Theoretically, it contributes to the academic discourse on terrorism in CQAs by exploring the type of questions asked, and the sort of answers they attract. Methodologically, the paper serves to enrich the literature around terrorism and social media that has hitherto mostly drawn data from Facebook and Twitter.


2018 ◽  
Vol 52 (3) ◽  
pp. 329-350 ◽  
Author(s):  
Abhishek Kumar Singh ◽  
Naresh Kumar Nagwani ◽  
Sudhakar Pandey

Purpose Recently, with the high volume of users and user content in Community Question Answering (CQA) sites, the quality of answers provided by users has become a major concern. One way to address this problem is to find the expert users (answerers) who can provide high-quality, relevant answers. The purpose of this paper is to find the expert users for the newly posted questions of the CQA sites. Design/methodology/approach In this paper, a new algorithm, RANKuser, is proposed for identifying the expert users of CQA sites. The proposed RANKuser algorithm consists of three major stages. In the first stage, the folksonomy relation between users, tags, and queries is established. User profile attributes, namely, reputation, tags, and badges, are also considered in the folksonomy. In the second stage, expertise scores of the user are calculated based on reputation, badges, and tags. Finally, in the third stage, the expert users are identified by extracting the top N users based on expertise score. Findings In this work, with the help of the proposed ranking algorithm, expert users are identified for newly posted questions. The proposed user ranking algorithm (RANKuser) is also compared with other existing ranking algorithms, namely, ML-KNN, rankSVM, LDA, STM, CQARank, and an EV-based model, using performance parameters such as hamming loss, accuracy, average precision, one error, F-measure, and normalized discounted cumulative gain. The proposed ranking method is also compared to the original ranking of CQA sites using the paired t-test. The experimental results demonstrate the effectiveness of the proposed RANKuser algorithm in comparison with the existing ranking algorithms. Originality/value This paper proposes and implements a new algorithm for expert user identification in CQA sites. By utilizing the folksonomy in CQA sites and user profile information, this algorithm identifies the experts.
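The second and third stages described above (scoring users on reputation, badges and tags, then taking the top N) can be sketched in Python as follows. This is not the RANKuser algorithm itself: the abstract does not give the scoring formula, so the weighted sum, the weights (`w_rep`, `w_badge`, `w_tag`), the normalisation and the `UserProfile` type are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile holding the three attributes named in the abstract."""
    name: str
    reputation: int
    badges: int
    tags: set = field(default_factory=set)


def expertise_score(user, question_tags, max_rep, max_badges,
                    w_rep=0.5, w_badge=0.2, w_tag=0.3):
    """Assumed score: weighted sum of normalised reputation, badges and tag overlap."""
    rep = user.reputation / max_rep
    badge = user.badges / max_badges
    overlap = len(user.tags & question_tags) / len(question_tags) if question_tags else 0.0
    return w_rep * rep + w_badge * badge + w_tag * overlap


def top_n_experts(users, question_tags, n):
    """Return the names of the n highest-scoring users for a question's tags."""
    max_rep = max((u.reputation for u in users), default=0) or 1
    max_badges = max((u.badges for u in users), default=0) or 1
    ranked = sorted(
        users,
        key=lambda u: expertise_score(u, question_tags, max_rep, max_badges),
        reverse=True,
    )
    return [u.name for u in ranked[:n]]


# Hypothetical users, not data from the paper.
users = [
    UserProfile("alice", reputation=1000, badges=10, tags={"python", "pandas"}),
    UserProfile("bob", reputation=500, badges=20, tags={"java"}),
    UserProfile("carol", reputation=100, badges=2, tags={"python", "numpy"}),
]
experts = top_n_experts(users, {"python", "pandas"}, n=2)
print(experts)
```

With these assumed weights a user with high reputation but no matching tags can still outrank a low-reputation user with partial tag overlap; a different weighting would change that trade-off.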


2015 ◽  
Vol 39 (1) ◽  
pp. 104-118 ◽  
Author(s):  
Alton Y.K. Chua ◽  
Snehasish Banerjee

Purpose – The purpose of this paper is to investigate the ways in which effectiveness of answers in Yahoo! Answers, one of the largest community question answering sites (CQAs), is related to question types and answerer reputation. Effective answers are defined as those that are detailed, readable, superior in quality and contributed promptly. Five question types that were studied include factoid, list, definition, complex interactive and opinion. Answerer reputation refers to the past track record of answerers in the community. Design/methodology/approach – The data set comprises 1,459 answers posted in Yahoo! Answers in response to 464 questions that were distributed across the five question types. The analysis was done using factorial analysis of variance. Findings – The results indicate that factoid, definition and opinion questions are comparable in attracting high quality as well as readable answers. Although reputed answerers generally fared better in offering detailed and high-quality answers, novices were found to submit more readable responses. Moreover, novices were more prompt in answering factoid, list and definition questions. Originality/value – By analysing variations in answer effectiveness with a twin focus on question types and answerer reputation, this study explores a strand of CQA research that has hitherto received limited attention. The findings offer insights to users and designers of CQAs.


2020 ◽  
Vol 11 (9) ◽  
pp. 2169-2182
Author(s):  
Edib Smolo ◽  
Abubakar Muhammad Musa

Purpose The purpose of this paper is to discuss the concepts of hilah (legal stratagem or legal trick) and makhraj (legal exit) and to examine their relevance and application in contemporary Islamic financial services and products. Design/methodology/approach This paper uses the qualitative research approach to provide a theoretical overview of hilah and makhraj literally and technically and to examine their practical applications in Islamic financial products and services. In particular, this paper evaluates several Islamic financial contracts and examines their practices in light of the implications of hilah or makhraj. Findings The paper finds that there is a glaring difference in perception and application of hilah and makhraj, as argued by some scholars. It has been found that the principle of hilah has been extensively used in the Islamic finance industry as a way to circumvent the riba prohibition. For example, Islamic financial instruments such as bay’ bithaman al-ajil, bay’ al-‘inah, tawarruq, commodity murabahah, musharakah mutanaqisah and, in some cases, the sale and lease back sukuk are found to be tainted by hilah. Research limitations/implications Because this is a theoretical paper, the topic should be explored in more detail, and Islamic financial services and products should be critically reviewed in line with these two principles to ascertain whether they are in line with Shariah requirements and devoid of hilah practices, and to align the industry with the maqasid al-Shariah. Practical implications This paper identifies a serious challenge that Islamic finance practitioners face in product development in their effort to provide more competitive services to their customers. As a result, it demonstrates the need to proactively use makhraj in innovating Islamic financial products and proffering more sustainable and competitive solutions. Originality/value This paper discusses a topic that attempts to dispel the suspicious perceptions of some analysts as to the genuineness of Islamic financial practices.


2017 ◽  
Vol 51 (1) ◽  
pp. 17-34 ◽  
Author(s):  
Hei-Chia Wang ◽  
Che-Tsung Yang ◽  
Yi-Hao Yen

Purpose Community question answering (CQA) websites provide an open and free way to share knowledge about general topics on the internet. However, inquirers may not obtain useful answers and those who are qualified to provide answers may also miss opportunities to share their expertise without any notice. To address this problem, the purpose of this paper is to provide the means for inquirers to access archived answers and to identify effective subject matter experts for target questions. Design/methodology/approach This paper presents a question answering promoter, called QAP, for the CQA services. The proposed QAP facilitates the use of filtered archived answers regarded as explicit knowledge and recommended experts regarded as sources of implicit knowledge for the given target questions. Findings The experimental results indicate that QAP can leverage knowledge sharing by refining archived answers upon creditability and distributing raised questions to qualified potential experts. Research limitations/implications The proposed method is designed for the traditional Chinese corpus. Originality/value This paper proposes an integrated framework of answer selection and expert finding that uses the bottom-up multipath evaluation algorithm, an underlying voting model, the agglomerative hierarchical clustering technique and feature approaches of answer trustworthiness measuring, identification of satisfied learners and credibility of repliers. Experiments using the corpus crawled from Yahoo! Knowledge Plus under designed scenarios are conducted and the results are shown in fine detail.


2021 ◽  
Vol 123 (13) ◽  
pp. 37-58
Author(s):  
Silvia Ranfagni ◽  
Monica Faraoni ◽  
Lamberto Zollo ◽  
Virginia Vannucci

Purpose The purpose of this paper is to propose a research approach to investigate brand alignment by exploiting textual data from online brand communities in the coffee industry. Specifically, consumer brand associations from user-generated content (UGC) and company brand associations from firm-generated content (FGC) are explored to measure the alignment between brand identity and brand image. The selected context of research is the beverage industry wherein companies are called on to develop appropriate digital websites and brand communication strategies to enhance the consumers' brand experience. Design/methodology/approach The authors introduce a research approach that integrates netnography with text mining analysis. Since brand associations were the basis of the study’s analysis, the authors focused on text mining procedures, providing data (co-occurrences) corresponding to brand associations that consumers perceive and that the company communicates. Data were used to develop the measurements of brand alignment. Findings The main findings of this research highlight the importance for both scholars and practitioners of determining brand alignment of beverage products in online communities. Knowing the alignment between the way a company communicates its brand identity and how this is perceived by consumers allows for effectively reviewing brand communication. Originality/value Although the combined analysis of the alignment between brand image and brand identification has received attention in marketing literature, most scholars have neglected how to measure brand alignment. This is a need for many marketing managers in the coffee industry who are now moving in digital environments where the role of consumers is not that of receivers of brand communication but rather that of cocreators of brand value.


2020 ◽  
Vol 54 (4) ◽  
pp. 437-459 ◽  
Author(s):  
Ming Li ◽  
Ying Li ◽  
YingCheng Xu ◽  
Li Wang

Purpose In community question answering (CQA), people who answer questions assume readers have mastered the content in the answers. Nevertheless, some readers cannot understand all content. Thus, there is a need for further explanation of the concepts that appear in the answers. Moreover, the large number of question and answer (Q&A) documents makes manual retrieval difficult. This paper aims to alleviate these issues for CQA websites. Design/methodology/approach In the paper, an algorithm for recommending explanatory Q&A documents is proposed. Q&A documents are modeled with the biterm topic model (BTM) (Yan et al., 2013). Then, the growing neural gas (GNG) algorithm (Fritzke, 1995) is used to cluster Q&A documents. To train multiple classifiers, three features are extracted from the Q&A categories. Thereafter, an ensemble classification model is constructed to identify the explanatory relationships. Finally, the explanatory Q&A documents are recommended. Findings The GNG algorithm shows good clustering performance. The ensemble classification model performs better than other classifiers. Both the effect and quality scores of explanatory Q&A recommendations are high. These scores indicate the practicality and good performance of the proposed recommendation algorithm. Research limitations/implications The proposed algorithm alleviates information overload in CQA from the new perspective of recommending explanatory knowledge. It provides new insight into research on recommendations in CQA. Moreover, in practice, CQA websites can use it to help retrieve Q&A documents and facilitate understanding of their contents. However, the algorithm is for the general recommendation of Q&A documents and does not consider individual personalized characteristics. In future work, personalized recommendations will be evaluated. Originality/value A novel explanatory Q&A recommendation algorithm is proposed for CQA to alleviate the burden of manual retrieval and Q&A overload. The novel GNG clustering algorithm and ensemble classification model provide a more accurate way to identify explanatory Q&A documents. The method of ranking the explanatory Q&A documents improves the effectiveness and quality of the recommendation. The proposed algorithm improves the accuracy and efficiency of retrieving explanatory Q&A documents. It assists users in grasping answers easily.


2019 ◽  
Vol 3 (3) ◽  
pp. 348-372
Author(s):  
Zhengfa Yang ◽  
Qian Liu ◽  
Baowen Sun ◽  
Xin Zhao

Purpose This paper aims to support both those who have only just begun their research into Community Question Answering (CQA) expert recommendation and those who are already working on this issue, and to ease the extension of our understanding through future research. Design/methodology/approach In this paper, keywords such as “CQA”, “Social Question Answering”, “expert recommendation”, “question routing” and “expert finding” are used to search major digital libraries. The final sample includes a list of 83 relevant articles authored in academia as well as industry that have been published from January 1, 2008 to March 1, 2019. Findings This study proposes a comprehensive framework to categorize extant studies into three broad areas of CQA expert recommendation research: understanding profile modeling, recommendation approaches and recommendation system impacts. Originality/value This paper focuses on discussing and sorting out the key research issues from these three research genres. Finally, it identifies conflicting and contradictory results and research gaps in the existing research, and puts forward urgent research topics.


2014 ◽  
Vol 31 (8) ◽  
pp. 15-18
Author(s):  
Denise Brush

Purpose – The purpose of this paper is to provide guidance to librarians about whether to keep or withdraw books on pre-Internet computer programming languages. Design/methodology/approach – For each of the programming languages considered, this article provides historical background and an assessment of current academic library collection needs. Findings – Many older languages (COBOL, FORTRAN, C, Lisp, Prolog, and Ada) are still in use and need reliable sources available for reference. Additionally, books about obsolete languages have educational value due to their influence on the development of newer languages such as C++ and Java. Practical applications – This information will be useful to academic librarians who want to make the best choices about keeping or withdrawing computer programming books. Originality/value – Most librarians responsible for managing computer science collections do not have a computer programming background, so they do not know which older languages are still important.


2019 ◽  
Vol 53 (4) ◽  
pp. 456-483 ◽  
Author(s):  
Ming Li ◽  
Lisheng Chen ◽  
Yingcheng Xu

Purpose A large number of questions are posted on community question answering (CQA) websites every day. Providing a set of core questions will ease the question overload problem. These core questions should cover the main content of the original question set. There should be low redundancy within the core questions and a consistent distribution with the original question set. The paper aims to discuss these issues. Design/methodology/approach In the paper, a method named QueExt is proposed for extracting core questions. First, questions are modeled using a biterm topic model. Then, these questions are clustered based on particle swarm optimization (PSO). With the clustering results, the number of core questions to be extracted from each cluster can be determined. Afterwards, a multi-objective PSO algorithm is proposed to extract the core questions. Both PSO algorithms are integrated with operators from genetic algorithms to avoid local optima. Findings Extensive experiments on real data collected from the well-known CQA website Zhihu have been conducted, and the experimental results demonstrate the method's superior performance over other benchmark methods. Research limitations/implications The proposed method provides new insight into and enriches research on information overload in CQA. It performs better than other methods in extracting core short text documents, and thus provides a better way to extract core data. The PSO is a novel method used for selecting core questions. The research on the application of the PSO model is expanded. The study also contributes to research on PSO-based clustering. With the integration of K-means++, the key parameter, the number of clusters, is optimized. Originality/value A novel core question extraction method in CQA is proposed, which provides a novel and efficient way to alleviate question overload. The PSO model is extended and used in a novel way to select core questions. The PSO model is integrated with the K-means++ method to optimize the number of clusters, which is the key parameter in text clustering based on PSO. It provides a new way to cluster texts.
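The abstract above states that the number of core questions taken from each cluster is determined from the clustering results, and that the core set should keep a distribution consistent with the original question set. One simple way to do this is proportional allocation, sketched below in Python. The abstract does not say how QueExt computes these counts, so the largest-remainder rule and the function name `allocate_core_counts` are assumptions made purely for illustration.

```python
def allocate_core_counts(cluster_sizes, k):
    """Split k core questions across clusters in proportion to cluster size.

    Uses the largest-remainder rule (an assumption, not the paper's method):
    each cluster gets the floor of its proportional quota, and leftover slots
    go to the clusters with the largest fractional parts.
    """
    total = sum(cluster_sizes)
    if total == 0 or k <= 0:
        return [0] * len(cluster_sizes)

    quotas = [k * size / total for size in cluster_sizes]
    counts = [int(q) for q in quotas]
    leftover = k - sum(counts)

    # Hand out the remaining slots by descending fractional part.
    by_remainder = sorted(
        range(len(quotas)),
        key=lambda i: quotas[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical cluster sizes: 5, 3 and 2 questions, with 4 core questions wanted.
counts = allocate_core_counts([5, 3, 2], 4)
print(counts)
```

The sketch only fixes how many questions each cluster contributes; which questions are chosen within a cluster is the job of the multi-objective PSO step, which is not reproduced here.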

