Accurate Information Extraction from Customer Comments Posted Online

2019 ◽  
Vol 8 (4) ◽  
pp. 2151-2153

Customer comments are integral to identifying the failures and successes of a product, and customers' buying patterns depend heavily on the comments posted online. Online reviews/comments can be broadly classified as positive, negative, or neutral, and many tools on the market can perform this classification. However, several flaws in classification methods can skew the results, such as "unidentified/hidden information in neutral comments", "wrong keyword extraction when splitting words", and "fake comments detectable from the frequency of duplicate comments or reviewers". This paper addresses these problems using product comments posted on the Amazon website and proposes a flow chart and algorithm to resolve them.
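The classification and duplicate-frequency checks described above can be sketched as follows. This is a minimal illustration only: the lexicons and the duplicate threshold are assumptions, not taken from the paper.

```python
from collections import Counter

# Illustrative sentiment lexicons (assumptions, not from the paper).
POSITIVE = {"great", "excellent", "love", "good", "perfect"}
NEGATIVE = {"bad", "poor", "broken", "terrible", "waste"}

def classify(comment: str) -> str:
    """Label a comment positive, negative, or neutral by lexicon word counts."""
    words = comment.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def flag_duplicates(comments, threshold=2):
    """Flag comment texts repeated >= threshold times (possible fake reviews)."""
    counts = Counter(c.strip().lower() for c in comments)
    return {c for c, n in counts.items() if n >= threshold}
```

A neutral label here simply means no lexicon hit on either side, which is exactly where the paper's "hidden information in neutral comments" concern applies.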

Author(s):  
Hamid Reza Marateb ◽  
Mislav Jordanic ◽  
Monica Rojas-Martínez ◽  
Joan Francesc Alonso ◽  
Leidy Yanet Serna ◽  
...  

2014 ◽  
Vol 539 ◽  
pp. 464-468
Author(s):  
Zhi Min Wang

This paper introduces page-segmentation ideas into the preprocessing of web pages. A page-segmentation technique locates the region containing the target information; that region is then processed according to ontology-based extraction rules, ultimately yielding the required information. Experiments on two real datasets, compared against related work, show that this method achieves good extraction results.


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Briton Park ◽  
Nicholas Altieri ◽  
John DeNero ◽  
Anobel Y Odisho ◽  
Bin Yu

Abstract

Objective: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer-to-cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations, which give both location-based information and document-level labels for each pathology report.

Materials and Methods: Our data consist of 250 pathology reports each for kidney, colon, and lung cancer, from 2002 to 2019, from a single institution (UCSF). For each report, we classified 5 attributes: procedure, tumor location, histology, grade, and presence of lymphovascular invasion. We develop novel NLP techniques involving transfer learning and string similarity trained on enriched annotations, and compare the HCTC and ZSS methods to the state of the art, including conventional machine-learning methods as well as deep-learning methods.

Results: For our HCTC method, we see an improvement of up to 0.1 micro-F1 and 0.04 macro-F1, averaged across cancers and applicable attributes. For our ZSS method, we see an improvement of up to 0.26 micro-F1 and 0.23 macro-F1, averaged across cancers and applicable attributes. These comparisons are made after adjusting training-data sizes to correct for the 20% increase in annotation time for enriched annotations compared with ordinary annotations.

Conclusions: Methods based on transfer learning across cancers, and augmenting information-extraction methods with string-similarity priors, can significantly reduce the amount of labeled data needed for accurate information extraction from pathology reports.
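The ZSS idea of using the class label's own text as an auxiliary feature can be loosely illustrated with off-the-shelf string similarity. The label set below is hypothetical, and the paper's actual method is trained on enriched annotations; this is only a sketch of the matching step.

```python
from difflib import SequenceMatcher

# Hypothetical histology labels; the paper's classes come from pathology reports.
HISTOLOGY_LABELS = [
    "adenocarcinoma",
    "squamous cell carcinoma",
    "clear cell carcinoma",
]

def zero_shot_match(extracted: str, labels) -> str:
    """Map an extracted string to the closest label by character-level similarity,
    a rough stand-in for matching report text against label names."""
    return max(labels,
               key=lambda l: SequenceMatcher(None, extracted.lower(), l).ratio())
```

The appeal of this style of matching is that a label never seen in training can still be assigned, as long as its name resembles the extracted text.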


Author(s):  
Bernard Espinasse ◽  
Sébastien Fournier ◽  
Fred Freitas ◽  
Shereen Albitar ◽  
Rinaldo Lima

Due to the size and diversity of information on the Web, gathering relevant information is a highly complex task. The main problem with most information-retrieval approaches is that they neglect page context, owing to an inherent deficiency: search engines are based on keyword indexing, which cannot capture context. In restricted domains, taking context into account through a domain ontology may lead to more relevant and accurate information gathering. In recent years, we have conducted research under this hypothesis and accordingly proposed an agent- and ontology-based restricted-domain cooperative information-gathering approach that can be instantiated in information-gathering systems for specific domains, such as academia and tourism. In this chapter, the authors present this approach and a generic software architecture, named AGATHE-2, which is a full-fledged scalable multi-agent system. Besides offering in-depth treatment of these domains through the use of a domain ontology, this new version uses machine-learning techniques over linguistic information to accelerate the knowledge acquisition necessary for information extraction from Web pages. AGATHE-2 is an agent- and ontology-based system that collects and classifies relevant Web pages about a restricted domain, using Boosted Wrapper Induction (BWI), a machine-learning algorithm, to perform adaptive information extraction.


2014 ◽  
Vol 40 (3) ◽  
pp. 116-121 ◽  
Author(s):  
Kuldeep Chaurasia ◽  
Pradeep Kumar Garg

The growing availability of satellite data has increased the need for information extraction that can be used in various applications, including topographic map updating, city planning, pattern recognition, and machine vision. Accurate information extraction from satellite images involves the integration of additional measures such as texture and shape. This paper investigates the extraction of topographic objects from satellite images by incorporating texture information and data fusion. The applicability of various texture measures based on the gray-level co-occurrence matrix, along with the effect of varying the pixel window, is also discussed. The classification results indicate that the homogeneity texture image generated using a 3×3 window is best suited for topographic object extraction. The best classification results, with an overall accuracy of 85.0% and a kappa coefficient of 0.80, are obtained when classification is performed on the fused image (multispectral + PAN + texture).
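A minimal sketch of the gray-level co-occurrence matrix (GLCM) homogeneity measure discussed above, computed for a single horizontal pixel offset. This is a simplification: the paper's pipeline presumably slides the window over the whole image and may combine several offsets.

```python
import numpy as np

def glcm(patch, levels=8):
    """Normalized gray-level co-occurrence matrix for horizontally
    adjacent pixels (offset (0, 1)) in a small quantized patch."""
    m = np.zeros((levels, levels))
    for i, j in zip(patch[:, :-1].ravel(), patch[:, 1:].ravel()):
        m[i, j] += 1
    total = m.sum()
    return m / total if total else m

def homogeneity(patch, levels=8):
    """GLCM homogeneity: sum over (i, j) of p(i, j) / (1 + |i - j|).
    Close to 1 for uniform patches, smaller for high local contrast."""
    p = glcm(patch, levels)
    i, j = np.indices(p.shape)
    return float((p / (1 + np.abs(i - j))).sum())
```

A perfectly uniform 3×3 patch scores 1.0, while a checkerboard patch (every neighbour differing by one gray level) scores 0.5, which is why homogeneity maps highlight smooth topographic surfaces.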


Author(s):  
Brenda Scholtz ◽  
Thashen Padayachy ◽  
Oluwande Adewoyin

This article presents findings from pilot testing of elements of an information extraction (IE) prototype designed to assist legal researchers in engaging with case law databases. The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases. Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs. In respect of the prototype’s extraction of CRT attributes (case title, date, journal, and action), none of the 50 extractions produced fully accurate attribute information. The article outlines the prototype, the pilot testing process, and the test findings, and then concludes with a discussion of where the prototype needs to be improved.


Author(s):  
Jingqi Wang ◽  
Yuankai Ren ◽  
Zhi Zhang ◽  
Hua Xu ◽  
Yaoyun Zhang

Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapidly accumulating chemical patents call for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020 ChEMU Task of Chemical Reaction Extraction from Patents. The task consisted of two subtasks: (1) named entity recognition, to identify compounds and the different semantic roles in a chemical reaction, and (2) event extraction, to identify event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1. To build an end-to-end system with high performance, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimizing tokenization and pre-training patent language models based on self-supervision to domain-knowledge-based rules. Our hybrid approaches combining different strategies achieved state-of-the-art results in both subtasks, with a top-ranked F1 of 0.957 for entity recognition and a top-ranked F1 of 0.9536 for event extraction, indicating that the proposed approaches are promising.
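The tokenization issue mentioned above can be illustrated with a hedged sketch: chemical names such as 2-amino-4-chlorophenol should survive as single tokens rather than being split on hyphens, as a general-purpose tokenizer would do. The regex below is an assumption for illustration, not the team's actual tokenizer.

```python
import re

# Hypothetical patent-aware token pattern: runs of alphanumerics joined by
# hyphens or commas stay together, so systematic chemical names are one token.
CHEM_TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-,'()\[\]][A-Za-z0-9]+)*|\S")

def tokenize(text: str):
    """Split patent text into tokens without breaking chemical names apart."""
    return CHEM_TOKEN.findall(text)
```

Keeping the full name as one token matters for subtask 1: a compound split into fragments is far harder for a named-entity recognizer to tag consistently.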


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
M. Saef Ullah Miah ◽  
Junaida Sulaiman ◽  
Talha Bin Sarwar ◽  
Kamal Z. Zamli ◽  
Rajan Jose

Keywords play a significant role in selecting topic-related documents easily. Topics or keywords assigned by humans or experts provide accurate information; however, this practice is expensive in terms of resources and time. Hence, it is preferable to utilize automated keyword-extraction techniques. Nevertheless, before adopting an automated process, it is necessary to check how similar expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores between keywords generated by different supervised and unsupervised automated keyword-extraction algorithms and expert-provided keywords from the electric double-layer capacitor (EDLC) domain. The paper also analyses which texts provide better keywords, such as positive sentences or all sentences of a document. From the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction; from the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords to the expert-provided keywords, the Jaccard, cosine, and cosine-with-word-vector similarity indexes are employed in this study. The experiment shows that the MultipartiteRank extraction technique, measured with the cosine-with-word-vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain, or on recommender systems, to select more suitable keyword-extraction and similarity-index-calculation techniques.
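The Jaccard and plain cosine indexes used in the study can be sketched over keyword lists as follows. The word-vector cosine variant would additionally require pretrained embeddings, which are omitted here.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard index: size of the keyword intersection over the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine similarity over keyword count vectors (no word vectors here)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Jaccard rewards exact overlap only, while cosine on count vectors degrades more gracefully when one list is longer than the other; the word-vector variant goes further by also crediting semantically related, non-identical keywords.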

