scholarly journals The Challenges of Modeling and Predicting Online Review Helpfulness

2021 ◽  
Author(s):  
Rogério Figueredo de Sousa ◽  
Thiago Alexandre Salgueiro Pardo

Predicting review helpfulness is an important task in Natural Language Processing. It is useful for dealing with the huge amount of online reviews on varied domains and languages, helping and guiding users on what to read and consider in their daily decisions. However, there are limited initiatives to investigate the nature of this task and how hard it is. This paper aims to fulfill this gap, providing a better understanding of it. Two complementary experiments are performed in order to uncover patterns of usefulness evaluation as performed by humans and relevant features for machine prediction. To assure our results, we run the experiments for two different domains: movies and apps. We show that humans agree on the process of assigning helpfulness to reviews, despite the difficulty of the task. More than this, people perform this process systematically and consistently. Finally, we empirically identify the most relevant content features for machine learning prediction of review helpfulness.

2019 ◽  
Vol 13 (2) ◽  
pp. 159-165
Author(s):  
Manik Sharma ◽  
Gurvinder Singh ◽  
Rajinder Singh

Background: For almost every domain, a tremendous degree of data is accessible in an online and offline mode. Billions of users are daily posting their views or opinions by using different online applications like WhatsApp, Facebook, Twitter, Blogs, Instagram etc. Objective: These reviews are constructive for the progress of the venture, civilization, state and even nation. However, this momentous amount of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract the thoughts, expression, emotions, critics, appraisal from the data posted by different persons. It is one of the prevailing research techniques that coalesce and employ the features from natural language processing. Here, an amalgamated approach has been employed to mine online reviews. Results: To improve the results of genetic algorithm based opining mining patent, here, a hybrid genetic algorithm and ontology based 3-tier natural language processing framework named GAO_NLP_OM has been designed. First tier is used for preprocessing and corrosion of the sentences. Middle tier is composed of genetic algorithm based searching module, ontology for English sentences, base words for the review, complete set of English words with item and their features. Genetic algorithm is used to expedite the polarity mining process. The last tier is liable for semantic, discourse and feature summarization. Furthermore, the use of ontology assists in progressing more accurate opinion mining model. Conclusion: GAO_NLP_OM is supposed to improve the performance of genetic algorithm based opinion mining patent. The amalgamation of genetic algorithm, ontology and natural language processing seems to produce fast and more precise results. The proposed framework is able to mine simple as well as compound sentences. However, affirmative preceded interrogative, hidden feature and mixed language sentences still be a challenge for the proposed framework.


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. RESULTS A total of 5026 people who downloaded the app during the study window, among those 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot “Satya” increased thus proving the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL Not Applicable


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


2019 ◽  
pp. 1-8 ◽  
Author(s):  
Tomasz Oliwa ◽  
Steven B. Maron ◽  
Leah M. Chase ◽  
Samantha Lomnicki ◽  
Daniel V.T. Catenacci ◽  
...  

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.


Sign in / Sign up

Export Citation Format

Share Document