Spam detection and high-quality features to analyse question–answer pairs

2020 ◽  
Vol 38 (5/6) ◽  
pp. 1013-1033
Author(s):  
Hei Chia Wang ◽  
Yu Hung Chiang ◽  
Si Ting Lin

Purpose In community question and answer (CQA) services, because of user subjectivity and the limits of knowledge, the distribution of answer quality can vary drastically – from highly related to irrelevant or even spam answers. Previous studies of CQA portals have faced two important issues: answer quality analysis and spam answer filtering. Therefore, the purposes of this study are to filter spam answers in advance using two-phase identification methods and then automatically classify the different types of question and answer (QA) pairs by deep learning. Finally, this study proposes a comprehensive study of answer quality prediction for different types of QA pairs. Design/methodology/approach This study proposes an integrated model with a two-phase identification method that filters spam answers in advance and uses a deep learning method [recurrent convolutional neural network (R-CNN)] to automatically classify various types of questions. Logistic regression (LR) is further applied to examine which answer quality features significantly indicate high-quality answers to different types of questions. Findings There are four prominent findings. (1) This study confirms that conducting spam filtering before an answer quality analysis can reduce the proportion of high-quality answers that are misjudged as spam answers. (2) The experimental results show that answer quality is better when question types are included. (3) The analysis results for different classifiers show that the R-CNN achieves the best macro-F1 scores (74.8%) in the question type classification module. (4) Finally, the experimental results by LR show that author ranking, answer length and common words could significantly impact answer quality for different types of questions. Originality/value The proposed system is simultaneously able to detect spam answers and provide users with quick and efficient retrieval mechanisms for high-quality answers to different types of questions in CQA. 
Moreover, this study further validates that crucial features exist among the different types of questions that can impact answer quality. Overall, an identification system automatically summarises high-quality answers for each type of question from the pool of messy answers in CQA, which can be very useful in helping users make decisions.
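
The abstract above reports a macro-F1 score for question type classification. As a minimal sketch of how that metric is generally computed (the per-class F1 scores are averaged without weighting by class size), the following uses hypothetical question-type labels and predictions that do not come from the study:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, for illustration only
y_true = ["fact", "fact", "opinion", "opinion", "advice", "advice"]
y_pred = ["fact", "opinion", "opinion", "opinion", "advice", "fact"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally, a rare question type that is classified poorly lowers macro-F1 as much as a common one would.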

2017 ◽  
Vol 29 (6) ◽  
pp. 793-806 ◽  
Author(s):  
PengPeng Hu ◽  
Taku Komura ◽  
Duan Li ◽  
Ge Wu ◽  
Yueqi Zhong

Purpose The purpose of this paper is to present a novel framework for reconstructing a 3D textile model with synthesized texture. Design/methodology/approach First, a pipeline for 3D textile reconstruction based on KinectFusion is proposed to obtain a better 3D model. Second, the “DeepTextures” method is applied to generate new textures for various three-dimensional textile models. Findings Experimental results show that the proposed method can conveniently reconstruct a three-dimensional textile model with synthesized texture. Originality/value A novel pipeline is designed to obtain high-quality 3D textile models based on KinectFusion. The accuracy and robustness of KinectFusion are improved via a turntable. To the best of the authors’ knowledge, this is the first paper to explore synthesized textile texture for the 3D textile model. The work goes beyond simply mapping the texture onto the 3D model and also explores the application of artificial intelligence in the field of textiles.


Author(s):  
Lei Xing-lin ◽  
Huang Shan-fan ◽  
Guo Zhong-xiao ◽  
Guo Xiao-yu

As a safety device to alleviate the loss of reactor coolant, the siphon breaking system is widely used in nuclear power plants. Researchers are very interested in this technique because of its “passive” characteristic. Vertical downward air-water two-phase flow is encountered in the siphon breaking process. Previous research has focused mainly on physical parameters such as water flow rate, air flow rate, pressure drop and the undershooting height. Void fraction, as a key parameter in multiphase flow, should also be studied in the siphon breaking phenomenon. Therefore, a needle-contact capacitance probe is used for flow-phase identification and a single-wire capacitance probe for obtaining the average value of gas distribution along a straight line. Experimental results show that the flow pattern during the vertical downward air-water two-phase flow is mostly annular flow. As gas enters the pipeline, the void fraction profile against time can be divided into three stages. The slope in the first stage is similar to that in the third. However, the slope decreases in the middle stage. The experimental results also show that the actual time needed to break the siphon flow is as short as about 6 s. The void fraction at the end of the siphon breaking process is about 0.38. During this stage, a large amount of gas is sucked into the downcomer and little water is inhaled. The gas phase results in a convergent effect, where the air intake is the direct and fundamental reason for siphon breaking.


2019 ◽  
Vol 36 (6) ◽  
pp. 1913-1933
Author(s):  
Amitava Choudhury ◽  
Snehanshu Pal ◽  
Ruchira Naskar ◽  
Amitava Basumallick

Purpose The purpose of this paper is to develop an automated phase segmentation model for complex microstructures. The mechanical and physical properties of metals and alloys are influenced by their microstructure, and therefore the investigation of microstructure is essential. The coexistence of random or sometimes patterned distributions of different microstructural features such as phases, grains and defects makes microstructure highly complex, and accordingly identification or recognition of individual phases, grains and defects within a microstructure is difficult. Design/methodology/approach In this context, computer vision and image processing techniques are effective aids to understanding and properly interpreting microscopic images. Microstructure-based image processing mainly focuses on image segmentation, boundary detection and grain size approximation. In this paper, a new approach is presented for automated phase segmentation from 2D microstructure images. The benefit of the proposed work is the ability to identify the dominant phase in complex microstructure images. The proposed model is trained and tested with 373 different ultra-high carbon steel (UHCS) microscopic images. Findings In this paper, Sobel and Watershed transformation algorithms are used for identification of dominating phases, and a deep learning model is used for identification of phase class from microstructural images. Originality/value For the first time, the authors have implemented edge detection followed by watershed segmentation and deep learning (convolutional neural network) to identify phases of UHCS microstructure.
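
The abstract above names Sobel edge detection as the first step before watershed segmentation. The sketch below illustrates only the Sobel gradient magnitude on a tiny synthetic grayscale grid, in plain Python. It is not the authors' implementation; the image values are made up, and the watershed and CNN stages are not shown.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude for interior pixels; borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Synthetic 4x4 image: dark left half, bright right half (a vertical boundary)
image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
```

In a pipeline of the kind the abstract describes, a gradient map like `edges` would typically serve as the relief on which a watershed transform separates regions.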


2020 ◽  
Vol 72 (6) ◽  
pp. 887-907
Author(s):  
Lei Li ◽  
Chengzhi Zhang ◽  
Daqing He

Purpose With the growth in popularity of academic social networking sites, evaluating the quality of the academic information they contain has become increasingly important. Users' evaluations of this are based on predefined criteria, with external factors affecting how important these are seen to be. As few studies on these influences exist, this research explores the factors affecting the importance of criteria used for judging high-quality answers on academic social Q&A sites. Design/methodology/approach Scholars who had recommended answers on ResearchGate Q&A were asked to complete a questionnaire survey to rate the importance of various criteria for evaluating the quality of these answers. Statistical analysis methods were used to analyze the data from 215 questionnaires to establish the influence of scholars' demographic characteristics, the question types, the discipline and the combination of these factors on the importance of each evaluation criterion. Findings Particular disciplines and academic positions had a significant impact on the importance ratings of the criteria of relevance, completeness and credibility. Also, some combinations of factors had a significant impact: for example, older scholars tended to view verifiability as more important to the quality of answers to information-seeking questions than to discussion-seeking questions within the LIS and Art disciplines. Originality/value This research can help academic social Q&A platforms recommend high-quality answers based on different influencing factors, in order to meet the needs of scholars more effectively.


2019 ◽  
Vol 30 (6) ◽  
pp. 3139-3162 ◽  
Author(s):  
Elzbieta Fornalik-Wajs ◽  
Aleksandra Roszko ◽  
Janusz Donizak ◽  
Anna Kraszewska

Purpose The properties of nanofluids have made them interesting for various areas such as engineering, medicine and cosmetology. The research discussed here pertains to the specific problem of heat transfer enhancement with application of a magnetic field. The main idea was to transfer high heat rates by using nanofluids containing metallic non-ferrous particles. The expectation was based on the changed nanofluid properties. However, the results of experimental analysis did not meet it: the heat transfer effect was smaller than in the case of the base fluid. The only way to understand the process was to involve computational fluid dynamics, which could help to clarify this issue. The purpose of this research is a deep understanding of the effect of an external magnetic field on nanofluid heat transfer. Design/methodology/approach In the presented experimental and numerical studies, water and silver nanofluids were considered. From the numerical point of view, three approaches to modelling the nanofluid in a strong magnetic field were used: single-phase Euler, Euler–Euler and Euler–Lagrange. In the two-phase approach, the momentum transfer equations for individual phases were coupled through the interphase momentum transfer term expressing the volume force exerted by one phase on the other. Findings The results of numerical simulation predicted a decrease of convection heat transfer for the nanofluid with respect to pure water, which agreed with the experimental results. The experimental and numerical results are in good agreement with each other, which confirms the right choice of the two-phase approach in analysis of nanofluid thermo-magnetic convection. Originality/value The Euler–Lagrange approach exhibits the best match with the experimental results.


2016 ◽  
Vol 33 (6) ◽  
pp. 1753-1766 ◽  
Author(s):  
Chin-Fu Kuo ◽  
Yung-Feng Lu ◽  
Bao-Rong Chang

Purpose – The purpose of this paper is to investigate the scheduling problem of real-time jobs executing on a DVS processor. The jobs must complete their executions by their deadlines and the energy consumption must also be minimized. Design/methodology/approach – A two-phase energy-efficient scheduling algorithm is proposed to solve the scheduling problem for real-time jobs. In the off-line phase, the maximum instantaneous total density and instantaneous total density (ITD) are proposed to derive the speed of the processor for each time instance. The derived speeds are saved for run time. In the on-line phase, the authors set the processor speed according to the derived speeds and set a timer to expire at the corresponding end time instance of the used speed. Findings – When the DVS processor executes a job at a proper speed, the energy consumption of the system can be minimized. Research limitations/implications – This paper does not consider jobs with precedence constraints. These can be explored in future work. Practical implications – The experimental results of the proposed schemes are presented to show their effectiveness. Originality/value – The experimental results show that the proposed scheduling algorithm, ITD, can achieve energy savings and keep the processor fully utilized.
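
The abstract above does not define instantaneous total density, so the following is only a sketch under a stated assumption: each job's density is taken as its required work divided by the length of its release-to-deadline window, and the off-line phase sums the densities of the jobs whose windows cover each interval. This matches the well-known "average rate" style of speed assignment in the DVS literature and may differ from the paper's ITD. The job tuples are hypothetical.

```python
def total_density(jobs, t):
    """Sum of work / (deadline - release) over jobs whose window contains t."""
    return sum(work / (deadline - release)
               for release, deadline, work in jobs
               if release <= t < deadline)


def speed_profile(jobs):
    """Off-line phase: one (start, end, speed) entry per interval between events."""
    events = sorted({r for r, _, _ in jobs} | {d for _, d, _ in jobs})
    return [(start, end, total_density(jobs, start))
            for start, end in zip(events, events[1:])]


# Hypothetical jobs as (release, deadline, work) in arbitrary time/cycle units
jobs = [(0, 10, 5), (2, 6, 2), (4, 10, 3)]
profile = speed_profile(jobs)
for start, end, speed in profile:
    print(f"[{start}, {end}): speed {speed}")
```

The stored profile plays the role of the speeds "saved for run time": an on-line phase would set the processor to each entry's speed and arm a timer for that entry's end instant. Over the whole horizon the profile supplies exactly as much work as the jobs demand.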


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lucas Willian Aguiar Mattias ◽  
Carlos Andres Millan Paramo

Purpose This paper analyzes the effect that is generated in the dynamic response of a Commonwealth Advisory Aeronautical Council building for different types of power spectra. This article also compares synthetic wind method (SWM) results with wind tunnel tests and other numerical approaches. Design/methodology/approach One of the main methodologies developed in Brazil, the SWM, is employed to determine the dynamic wind loads. The Davenport, Lumley and Panowski, Harris, von Karman and Kaimal models are used in SWM to generate the resonant harmonics. Lateral pressures are calculated by the wind speed deflection profile for 30, 35, 40 and 45 m/s. The structure is processed in Autodesk Robot Structural Analysis with numerical analysis in FEM by the Hilber–Hughes–Taylor method. To corroborate the synthetic wind with experimental results, displacement curves are developed for wind tunnel experimental results, Davenport method, Eurocode and NBR 6123, together with the SWM. Findings Results show that for 30 m/s, the lowest convergence of the power spectra models was presented and that the greatest difference found was below 10%. In addition, it was shown that Eurocode 1-4 can lead to oversizing, while NBR 6123 can lead to undersizing, compared with the experimental results. Finally, results by the Davenport method, wind tunnel test and synthetic wind showed good accuracy. Originality/value By carrying out this comparative analysis, this work presents an important contribution in the field of calculating the dynamic response of tall buildings. Studies with these comparisons to corroborate the SWM had not yet been carried out.


2022 ◽  
Vol 31 (1) ◽  
pp. 1-46
Author(s):  
Chao Liu ◽  
Cuiyun Gao ◽  
Xin Xia ◽  
David Lo ◽  
John Grundy ◽  
...  

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility —whether the reported experimental results can be obtained by other researchers using authors’ artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability —whether the reported experimental result can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review on 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models. 
Meanwhile, our experimental results show the importance of reproducibility and replicability, where the reported performance of a DL model could not be reproduced because of an unstable optimization process. Replicability could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.
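
The abstract above notes that performance can be sensitive to how data are sampled. One small, generic ingredient of a reproduction package is a seeded, deterministic train/test split. The sketch below shows that idea in plain Python; the function name, seed value and split fraction are illustrative assumptions and are not taken from the study.

```python
import random


def seeded_split(items, seed, test_fraction=0.2):
    """Shuffle with a private, seeded generator so the split can be re-created."""
    rng = random.Random(seed)          # does not touch the global random state
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10))              # stand-in for dataset indices
train_a, test_a = seeded_split(samples, seed=42)
train_b, test_b = seeded_split(samples, seed=42)
print(train_a == train_b and test_a == test_b)
```

Recording the seed alongside the code and data lets other researchers regenerate the same partition. It does not by itself remove nondeterminism that arises inside a DL framework's training loop.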


2018 ◽  
Vol 70 (3) ◽  
pp. 269-287 ◽  
Author(s):  
Lei Li ◽  
Daqing He ◽  
Chengzhi Zhang ◽  
Li Geng ◽  
Ke Zhang

Purpose Academic social question and answer (Q&A) sites are now utilised by millions of scholars and researchers for seeking and sharing discipline-specific information. However, little is known about the factors that can affect their votes on the quality of an answer, nor how the discipline might influence these factors. The paper aims to discuss this issue. Design/methodology/approach Using 1,021 answers collected over three disciplines (library and information services, history of art, and astrophysics) in ResearchGate, statistical analysis was performed to identify the characteristics of high-quality academic answers, and comparisons were made across the three disciplines. In particular, two major categories of characteristics, those of the answer provider and those of the answer content, were extracted and examined. Findings The results reveal that high-quality answers on academic social Q&A sites tend to possess two characteristics: first, they are provided by scholars with higher academic reputations (e.g. more followers); and second, they provide objective information (e.g. longer answers with fewer subjective opinions). However, the impact of these factors varies across disciplines, e.g., objectivity is more favourable in physics than in other disciplines. Originality/value The study is envisioned to help academic Q&A sites to select and recommend high-quality answers across different disciplines, especially in a cold-start scenario where the answer has not received enough judgements from peers.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wenjing Lu

This paper proposes a deep learning-based method for mitosis detection in breast histopathology images. A main problem in mitosis detection is that most of the datasets only have weak labels, i.e., only the coordinates indicating the center of the mitosis region. This makes most of the existing powerful object detection methods difficult to use in mitosis detection. To solve this problem, this paper first applies a CNN-based algorithm to segment the mitosis regions pixel-wise, based on which bounding boxes of mitosis are generated as strong labels. Based on the generated bounding boxes, an object detection network is trained to accomplish mitosis detection. Experimental results show that the proposed method is effective in detecting mitosis, and its accuracy outperforms state-of-the-art methods in the literature.
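
The abstract above describes generating bounding boxes from pixel-wise segmentation output. A minimal sketch of that conversion step, assuming the segmentation has already been thresholded into a binary mask and that each 4-connected blob is one region, is shown below. The mask is synthetic and the function name is hypothetical; the paper's own procedure may differ.

```python
from collections import deque


def mask_to_boxes(mask):
    """Return (row_min, col_min, row_max, col_max) for each 4-connected blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent
            seen[r][c] = True
            queue = deque([(r, c)])
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r_min, c_min, r_max, c_max))
    return boxes


# Synthetic binary mask with two separate regions
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = mask_to_boxes(mask)
print(boxes)
```

Boxes produced this way could then serve as training targets for an object detector, which is the role the abstract assigns to the generated strong labels.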

