Text Analysis on Health Product Reviews Using R Approach

Author(s):  
Nasibah Husna Mohd Kadir ◽  
Sharifah Aliman

On social media, product reviews contain text, emoticons, numbers, and symbols, which makes text summarization difficult. Text analytics is one of the key techniques for exploring unstructured data. The purpose of this study is to handle such unstructured data by sorting and summarizing review data through web-based text analytics using an R approach. A comparative table of studies on Natural Language Processing (NLP) features shows that the web-based text analytics approach in R can analyze unstructured data using R's data-processing packages. It combines all the NLP features in the menu of the text analytics process, presented as labeled steps, making it easier for users to view the full text summarization. This study uses health product reviews from Shaklee as the data set. The proposed approach shows acceptable performance in terms of system-feature execution compared with the baseline system.

Author(s):  
Sohrab Rahimi ◽  
Sam Mottahedi ◽  
Xi Liu

This study puts forth a new method for studying socio-spatial boundaries using georeferenced community-authored restaurant reviews. We show that food choice, drink choice, and restaurant ambience can be good indicators of the socio-economic status of the ambient population in different neighborhoods. To this end, we use Yelp user reviews to distinguish neighborhoods in terms of their food purchases and identify the resulting boundaries in 10 North American metropolitan areas. The data set includes restaurant reviews as well as a limited number of user check-ins and ratings in those cities. We use Natural Language Processing (NLP) techniques to select a set of potential features pertaining to food, drink, and ambience from Yelp user comments for each geolocated restaurant. We then select the features that determine one's choice of restaurant and the rating that he or she gives it. After identifying these features, we identify neighborhoods where similar tastes prevail. We show that the neighborhoods identified through our method differ in statistically significant ways on demographic factors such as income, racial composition, and education. We suggest that this method helps urban planners understand the social dynamics of contemporary cities in the absence of information on the service-oriented cultural characteristics of urban communities.


The World Wide Web's content has grown rapidly over the past years; it holds a vast and continuously growing amount of multimedia resources, particularly documentary data. One of the major contributors of documentary content is the social media platform Facebook, where people, or netizens, actively share their opinions about topics and posts, whether or not those are related to them. With the huge amount of documentary data accessible on social media, researchers in the field of opinion mining have many research directions to pursue. A netizen's comment on a particular post can be either negative or positive. This study discusses whether a netizen's opinion or comment is positive or negative, that is, how he or she feels about a specific topic posted on Facebook; this can be measured using sentiment analysis. Sentiment analysis, the combination of Natural Language Processing and textual analytics, is used to extract data in a useful manner. This study is based on product reviews by Filipinos in the Filipino, English, and Taglish (mixed Filipino and English) languages. To categorize comments effectively, the Naïve Bayes algorithm was implemented in the developed web system.
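As a minimal sketch of the classification step described above, the following Python snippet trains a multinomial Naïve Bayes model with add-one smoothing on a tiny, invented set of labeled Taglish-style reviews. The training examples, tokenization, and labels are illustrative assumptions, not the study's actual data or pipeline:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial Naive Bayes model.
    docs: list of (token_list, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_counts.values())
    priors = {c: math.log(n / total) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify(tokens, priors, word_counts, vocab):
    """Return the label with the highest log-posterior,
    using add-one (Laplace) smoothing for unseen counts."""
    best, best_score = None, float("-inf")
    for c in priors:
        denom = sum(word_counts[c].values()) + len(vocab)
        score = priors[c] + sum(
            math.log((word_counts[c][w] + 1) / denom)
            for w in tokens if w in vocab)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented mixed Filipino/English training reviews.
train = [
    ("ang ganda ng product sobrang effective".split(), "positive"),
    ("love it super ganda highly recommended".split(), "positive"),
    ("sayang pera hindi effective ang pangit".split(), "negative"),
    ("pangit quality nasira agad hindi recommended".split(), "negative"),
]
model = train_nb(train)
print(classify("sobrang ganda highly recommended".split(), *model))  # positive
```

A real system would add tokenization for punctuation and emoticons, but the smoothing and log-probability scoring are the core of the Naïve Bayes approach the study names.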


Author(s):  
Samson Oluwaseun Fadiya

Text analytics applies to most businesses, particularly education; for instance, if an organization or university suspects that employees are leaking confidential data to competitors, text analytics can help analyze large volumes of employees' email messages. The massive volume of both structured and unstructured data originates principally from social media and Web 2.0. The analysis of online messages, tweets, and other types of unstructured text data constitutes what we call text analytics, which has developed over the last few years through the emergence of various algorithms and applications for processing data alongside privacy and IT security. This chapter aims to identify common problems faced when using different media for data in education; one can analyze such information by performing sentiment analysis with text analytics, extracting useful information from text documents using IBM's Annotation Query Language (AQL).


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Molham Al-Maleh ◽  
Said Desouki

Abstract: Natural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks such as text translation and sentiment analysis, has used deep neural network models to improve results. Recent text summarization methods follow the sequence-to-sequence encoder–decoder framework, in which neural networks are trained jointly on both input and output. Deep neural networks take advantage of big datasets to improve their results. These networks are supported by the attention mechanism, which handles long texts more efficiently by identifying focus points in the text, and by the copy mechanism, which allows the model to copy words from the source directly into the summary. In this research, we re-implement the basic summarization model that applies the sequence-to-sequence framework to the Arabic language, to which this model had not previously been applied for text summarization. We first build an Arabic data set of summarized article headlines, consisting of approximately 300 thousand entries, each pairing an article introduction with its corresponding headline. We then apply baseline summarization models to this data set and compare the results using the ROUGE metric.
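Since the comparison above relies on the ROUGE metric, a minimal Python sketch of ROUGE-1 (clipped unigram overlap) may clarify how a candidate summary is scored against a reference; the example sentences are invented for illustration:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision, recall, and F1 between two token lists,
    counting each overlapping unigram at most as often as it appears
    in the reference (clipped counts)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1

reference = "government announces new economic reform plan".split()
candidate = "government announces economic plan".split()
p, r, f = rouge_1(candidate, reference)
print(round(p, 2), round(r, 2), round(f, 2))
```

Published evaluations typically also report ROUGE-2 (bigrams) and ROUGE-L (longest common subsequence), which follow the same overlap-counting idea.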


2016 ◽  
Vol 8s1 ◽  
pp. BII.S37791 ◽  
Author(s):  
Manabu Torii ◽  
Sameer S. Tilak ◽  
Son Doan ◽  
Daniel S. Zisook ◽  
Jung-wei Fan

In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.
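The study's detector is machine-learned, but a first high-recall pass over millions of reviews is often lexicon-based. The Python sketch below shows such a filter; the seed terms and sample reviews are illustrative assumptions, not the study's actual lexicon or data:

```python
import re

# Hypothetical seed lexicon of health-issue terms; a real study would
# curate this from annotated reviews and use it to bootstrap a classifier.
HEALTH_TERMS = [
    r"allerg(?:y|ic|ies)", r"migraine", r"nausea", r"rash",
    r"stomach\s*ache", r"headache", r"gluten[- ]free", r"diabet(?:es|ic)",
]
HEALTH_RE = re.compile(r"\b(?:" + "|".join(HEALTH_TERMS) + r")\b",
                       re.IGNORECASE)

def flag_health_related(review):
    """Return the health-issue terms mentioned in a review, if any."""
    return HEALTH_RE.findall(review)

reviews = [
    "Great snack, but it gave my son a terrible rash and a headache.",
    "Crunchy and fresh, arrived quickly. Would buy again!",
    "I am diabetic and this sugar-free version works well for me.",
]
for r in reviews:
    print(bool(flag_health_related(r)), flag_health_related(r))
```

Reviews flagged this way can then be labeled and used to train the kind of supervised classifier the abstract describes.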


2021 ◽  
Author(s):  
Fahed Mubarak Braik ◽  
Abdulla Sulaiman Al Shehhi ◽  
Luigi Saputelli ◽  
Carlos Mata ◽  
Dorzhi Badmaev ◽  
...  

Abstract: The purpose of this paper is to communicate experiences in the development of an innovative concept named "ASK Thamama", an automated data and information retrieval engine driven by artificial intelligence techniques, including text analytics and natural language processing. ASK is an AI-enabled conversational search engine used to retrieve information from various internal data repositories using natural-language queries. The text-processing and conversational engine is built on available open-source software, requiring minimal coding of new libraries. A data set of 1,000 documents was used to validate key functionalities: the engine achieved 90% accuracy on search queries and provided specific answers for 80% of queries framed as questions. This work shows encouraging results and demonstrates the value that AI-enabled methodologies can bring to natural-language search by enabling automated workflows for data and information retrieval. The developed AI methodology has great potential for integration into an end-to-end knowledge-management workflow, turning available document repositories into valuable insights with little to no human intervention.
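A core building block of any such retrieval engine is ranking documents against a natural-language query. The Python sketch below implements TF-IDF weighting with cosine similarity over a tiny invented document set; it is a generic illustration of the retrieval step, not ASK Thamama's actual implementation:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: tf[w] * idf[w] for w in tf})
    return vecs, idf

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented mini-repository of document titles.
docs = [
    "reservoir pressure decline analysis report".split(),
    "drilling schedule for offshore platform".split(),
    "annual pressure survey of the reservoir".split(),
]
vecs, idf = tfidf_vectors(docs)

query = "reservoir pressure report".split()
qtf = Counter(query)
qvec = {w: qtf[w] * idf.get(w, 0.0) for w in query}
best = max(range(len(docs)), key=lambda i: cosine(qvec, vecs[i]))
print(best)  # document 0 matches all three query terms
```

A conversational engine would layer question parsing and answer extraction on top, but some form of weighted term matching like this typically supplies the candidate documents.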


Land ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 123
Author(s):  
Nathan Morrow ◽  
Nancy B. Mock ◽  
Andrea Gatto ◽  
Julia LeMense ◽  
Margaret Hudson

Localized actionable evidence for addressing threats to the environment and human security lacks a comprehensive conceptual frame that incorporates the challenges associated with active conflicts. A coherent conceptual frame of protective pathways, linking the previously disciplinarily divided literatures on environmental security, human security, and resilience and identifying key relationships, is used to analyze a novel, unstructured data set of Global Environment Facility (GEF) programmatic documents. Sub-national geospatial analysis of GEF documentation on projects in Africa finds that 73% of districts with GEF land-degradation projects were co-located with active conflict events. This study applies Natural Language Processing to a unique data set of 1,500 GEF evaluations to identify text entities associated with conflict. Additional project case studies explore the sequence and relationships of environmental and human security concepts that lead to project success or failure. Differences between biodiversity and climate change projects are discussed, but political crisis, poverty, and disaster emerged as the entities most frequently extracted in association with conflict in environmental protection projects. Insecurity weakened institutions and fractured communities, leading both directly and indirectly to conflict-related damage to environmental programming and desired outcomes. Simple causal explanations, found to be inconsistent in previous large-scale statistical associations, also inadequately describe the dynamics and relationships found in the extracted text entities and case summaries. Emergent protective pathways that emphasize poverty and conflict reduction, facilitated by institutional strengthening and inclusion, present promising possibilities.
Future research with innovative machine learning and other techniques for working with unstructured data may provide additional evidence for implementing actions that address climate change and environmental degradation while strengthening resilience and human security. Resilient, participatory, and polycentric governance is key to fostering this process.


Author(s):  
Luca Cagliero ◽  
Paolo Garza ◽  
Moreno La Quatra

The recent advances in multimedia and web-based applications have eased access to large collections of textual documents. To automate document analysis, the research community has devoted considerable effort to extracting short summaries of document content. However, most early summarization methods were tailored to English-language corpora or to collections of documents all written in the same language. More recently, joint efforts of the machine learning and natural language processing communities have produced more portable and flexible solutions that can be applied to documents written in different languages. This chapter first overviews the most relevant language-specific summarization algorithms. It then presents the most recent advances in multi- and cross-lingual text summarization. The chapter classifies the presented methodologies, highlights their main pros and cons, and discusses perspectives for extending current research towards cross-lingual summarization systems.


Author(s):  
Anto Arockia Rosaline R. ◽  
Parvathi R.

Text analytics is the process of extracting high-quality information from text. A set of statistical, linguistic, and machine learning techniques is used to represent the information content of various textual sources for purposes such as data analysis, research, or investigation. Text is the most common medium of communication in social media, and understanding it involves a variety of tasks, including text classification and the handling of slang and mixed languages. Traditional Natural Language Processing (NLP) techniques require extensive pre-processing to handle such text. When the word "Amazon" occurs in a social media text, a meaningful approach is needed to determine whether it refers to the forest or to the Kindle. Most of the time, NLP techniques fail to handle slang and spelling variants correctly. Messages on Twitter are so short that it is difficult to build semantic connections between them, and some messages, such as "Gud nite", contain no real dictionary words yet are still used for communication.
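The "Amazon" example above is a word-sense disambiguation problem. A minimal Lesk-style heuristic scores each candidate sense by the overlap between its signature words and the surrounding context; in the Python sketch below, the sense signatures and example text are invented for illustration:

```python
def disambiguate(context_tokens, senses):
    """Pick the sense whose signature words overlap most with the
    context (a simplified Lesk heuristic)."""
    scores = {name: len(sig & set(context_tokens))
              for name, sig in senses.items()}
    return max(scores, key=scores.get)

# Hypothetical hand-built signatures for two senses of "amazon".
SENSES = {
    "rainforest": {"forest", "river", "rain", "trees", "brazil", "jungle"},
    "company": {"kindle", "order", "delivery", "shipping", "prime", "buy"},
}

text = "just got my kindle from amazon fast delivery".split()
print(disambiguate(text, SENSES))  # company
```

Modern systems replace hand-built signatures with learned word embeddings, but the underlying idea of comparing context to sense representations is the same.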


2004 ◽  
Vol 22 ◽  
pp. 457-479 ◽  
Author(s):  
G. Erkan ◽  
D. R. Radev

We introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing and test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper, we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods of computing centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and the other systems participating in DUC in most cases. Furthermore, the LexRank-with-threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to noise in the data that may result from imperfect topical clustering of documents.
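The centrality computation described above can be sketched in a few lines of Python: build bag-of-words sentence vectors, threshold their cosine similarities into an adjacency matrix, row-normalize it, and run PageRank-style power iteration. The sentences, stopword list, and threshold below are illustrative choices, not the paper's DUC setup:

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "in", "on", "of", "over", "many"}

def sentence_vector(sentence):
    """Bag-of-words vector: lowercase, strip periods, drop stopwords."""
    words = sentence.lower().replace(".", "").split()
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=60):
    """Score sentences by eigenvector centrality of the thresholded
    cosine-similarity graph, via PageRank-style power iteration."""
    vecs = [sentence_vector(s) for s in sentences]
    n = len(vecs)
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    for row in adj:  # row-normalize into transition probabilities
        s = sum(row)
        for j in range(n):
            row[j] = row[j] / s if s else 1.0 / n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n +
                  damping * sum(scores[i] * adj[i][j] for i in range(n))
                  for j in range(n)]
    return scores

sentences = [
    "Heavy rain flooded the city center.",
    "Flooding damaged many homes in the city.",
    "Volunteers repaired damaged homes over the weekend.",
    "The stock market closed higher today.",
]
scores = lexrank(sentences)
print(max(range(len(sentences)), key=lambda i: scores[i]))  # sentence 1
```

Sentence 1 scores highest because it links to both neighbors in the flooding cluster, while the off-topic stock sentence sits isolated; an extractive summarizer would pick the top-scoring sentences.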

