Accurate Information Extraction from Customer Comments Posted Online

2019 ◽  
Vol 8 (4) ◽  
pp. 2151-2153

Customer comments are integral to identifying the failures and successes of a product, and customers' buying patterns depend heavily on the comments posted online. Online reviews/comments can be broadly classified as positive, negative, or neutral, and many tools on the market can perform this classification. However, several flaws in classification methods can skew the results, such as "unidentified/hidden information in neutral comments", "wrong keyword extraction when splitting words", and "fake comments detectable from the frequency of duplicate comments or reviewers". This paper addresses these problems using product comments posted on the Amazon website and proposes a flow chart and algorithm to resolve them.
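The classification and duplicate-frequency checks described above can be sketched as follows. This is a minimal illustration only: the lexicons and the duplicate threshold are assumptions, not taken from the paper.

```python
from collections import Counter

# Illustrative sentiment lexicons (assumptions, not from the paper).
POSITIVE = {"great", "excellent", "love", "good", "perfect"}
NEGATIVE = {"bad", "poor", "broken", "terrible", "waste"}

def classify(comment: str) -> str:
    """Label a comment positive, negative, or neutral by lexicon word counts."""
    words = comment.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def flag_duplicates(comments, threshold=2):
    """Flag comment texts repeated >= threshold times (possible fake reviews)."""
    counts = Counter(c.strip().lower() for c in comments)
    return {c for c, n in counts.items() if n >= threshold}
```

A neutral label here simply means no lexicon hit on either side, which is exactly where the paper's "hidden information in neutral comments" concern applies.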

Author(s):  
Hamid Reza Marateb ◽  
Mislav Jordanic ◽  
Monica Rojas-Martínez ◽  
Joan Francesc Alonso ◽  
Leidy Yanet Serna ◽  
...  

2014 ◽  
Vol 539 ◽  
pp. 464-468
Author(s):  
Zhi Min Wang

This paper introduces page-segmentation ideas into the preprocessing of web pages. A page-segmentation technique locates the region containing the target information; that region is then processed according to ontology-based extraction rules, ultimately yielding the required information. Experiments on two real datasets, compared against related work, show that this method achieves good extraction results.


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Briton Park ◽  
Nicholas Altieri ◽  
John DeNero ◽  
Anobel Y Odisho ◽  
Bin Yu

Abstract

Objective: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer-to-cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations, which give both location-based information and document-level labels for each pathology report.

Materials and Methods: Our data consist of 250 pathology reports each for kidney, colon, and lung cancer, from 2002 to 2019, from a single institution (UCSF). For each report, we classified 5 attributes: procedure, tumor location, histology, grade, and presence of lymphovascular invasion. We develop novel NLP techniques involving transfer learning and string similarity trained on enriched annotations, and compare the HCTC and ZSS methods to the state of the art, including conventional machine-learning methods as well as deep-learning methods.

Results: For our HCTC method, we see an improvement of up to 0.1 micro-F1 and 0.04 macro-F1, averaged across cancers and applicable attributes. For our ZSS method, we see an improvement of up to 0.26 micro-F1 and 0.23 macro-F1, averaged across cancers and applicable attributes. These comparisons are made after adjusting training-data sizes to correct for the 20% increase in annotation time for enriched annotations compared with ordinary annotations.

Conclusions: Methods based on transfer learning across cancers, and augmenting information-extraction methods with string-similarity priors, can significantly reduce the amount of labeled data needed for accurate information extraction from pathology reports.
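The ZSS idea of using the class label's own text as an auxiliary feature can be loosely illustrated with off-the-shelf string similarity. The label set below is hypothetical, and the paper's actual method is trained on enriched annotations; this is only a sketch of the matching step.

```python
from difflib import SequenceMatcher

# Hypothetical histology labels; the paper's classes come from pathology reports.
HISTOLOGY_LABELS = [
    "adenocarcinoma",
    "squamous cell carcinoma",
    "clear cell carcinoma",
]

def zero_shot_match(extracted: str, labels) -> str:
    """Map an extracted string to the closest label by character-level similarity,
    a rough stand-in for matching report text against label names."""
    return max(labels,
               key=lambda l: SequenceMatcher(None, extracted.lower(), l).ratio())
```

The appeal of this style of matching is that a label never seen in training can still be assigned, as long as its name resembles the extracted text.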


Author(s):  
Bernard Espinasse ◽  
Sébastien Fournier ◽  
Fred Freitas ◽  
Shereen Albitar ◽  
Rinaldo Lima

Due to the size and diversity of information on the Web, gathering relevant information is a highly complex task. The main problem with most information-retrieval approaches is that they neglect page context, owing to an inherent deficiency: search engines are based on keyword indexing, which cannot capture context. In restricted domains, taking context into account through a domain ontology may lead to more relevant and accurate information gathering. In recent years, we have conducted research under this hypothesis and accordingly proposed an agent- and ontology-based restricted-domain cooperative information-gathering approach that can be instantiated in information-gathering systems for specific domains, such as academia and tourism. In this chapter, the authors present this approach and a generic software architecture, named AGATHE-2, which is a full-fledged scalable multi-agent system. Besides offering in-depth treatment of these domains through the use of a domain ontology, this new version uses machine-learning techniques over linguistic information to accelerate the knowledge acquisition necessary for information extraction from Web pages. AGATHE-2 is an agent- and ontology-based system that collects and classifies relevant Web pages about a restricted domain, using Boosted Wrapper Induction (BWI), a machine-learning algorithm, to perform adaptive information extraction.


2014 ◽  
Vol 40 (3) ◽  
pp. 116-121 ◽  
Author(s):  
Kuldeep Chaurasia ◽  
Pradeep Kumar Garg

The growing availability of satellite data has increased the need for information extraction that can be used in various applications, including topographic map updating, city planning, pattern recognition, and machine vision. Accurate information extraction from satellite images involves the integration of additional measures such as texture and shape. This paper investigates the extraction of topographic objects from satellite images by incorporating texture information and data fusion. The applicability of various texture measures based on the gray-level co-occurrence matrix, along with the effect of varying the pixel window, is also discussed. The classification results indicate that the homogeneity texture image generated using a 3×3 window is best suited for topographic object extraction. The best classification results, with an overall accuracy of 85.0% and a kappa coefficient of 0.80, are obtained when classification is performed on the fused image (multispectral + PAN + texture).
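A minimal sketch of the gray-level co-occurrence matrix (GLCM) homogeneity measure discussed above, computed for a single horizontal pixel offset. This is a simplification: the paper's pipeline presumably slides the window over the whole image and may combine several offsets.

```python
import numpy as np

def glcm(patch, levels=8):
    """Normalized gray-level co-occurrence matrix for horizontally
    adjacent pixels (offset (0, 1)) in a small quantized patch."""
    m = np.zeros((levels, levels))
    for i, j in zip(patch[:, :-1].ravel(), patch[:, 1:].ravel()):
        m[i, j] += 1
    total = m.sum()
    return m / total if total else m

def homogeneity(patch, levels=8):
    """GLCM homogeneity: sum over (i, j) of p(i, j) / (1 + |i - j|).
    Close to 1 for uniform patches, smaller for high local contrast."""
    p = glcm(patch, levels)
    i, j = np.indices(p.shape)
    return float((p / (1 + np.abs(i - j))).sum())
```

A perfectly uniform 3×3 patch scores 1.0, while a checkerboard patch (every neighbour differing by one gray level) scores 0.5, which is why homogeneity maps highlight smooth topographic surfaces.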


Author(s):  
Brenda Scholtz ◽  
Thashen Padayachy ◽  
Oluwande Adewoyin

This article presents findings from pilot testing of elements of an information extraction (IE) prototype designed to assist legal researchers in engaging with case law databases. The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases. Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs. In respect of the prototype’s extraction of CRT attributes (case title, date, journal, and action), none of the 50 extractions produced fully accurate attribute information. The article outlines the prototype, the pilot testing process, and the test findings, and then concludes with a discussion of where the prototype needs to be improved.


Author(s):  
Jingqi Wang ◽  
Yuankai Ren ◽  
Zhi Zhang ◽  
Hua Xu ◽  
Yaoyun Zhang

Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapidly accumulating chemical patents call for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020 ChEMU Task of Chemical Reaction Extraction from Patents. The task consisted of two subtasks: (1) named entity recognition, to identify compounds and the different semantic roles in a chemical reaction, and (2) event extraction, to identify event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1. To build an end-to-end system with high performance, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimizing tokenization and pre-training patent language models based on self-supervision to domain-knowledge-based rules. Our hybrid approaches combining different strategies achieved state-of-the-art results in both subtasks, with a top-ranked F1 of 0.957 for entity recognition and a top-ranked F1 of 0.9536 for event extraction, indicating that the proposed approaches are promising.
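The tokenization issue mentioned above can be illustrated with a hedged sketch: chemical names such as 2-amino-4-chlorophenol should survive as single tokens rather than being split on hyphens, as a general-purpose tokenizer would do. The regex below is an assumption for illustration, not the team's actual tokenizer.

```python
import re

# Hypothetical patent-aware token pattern: runs of alphanumerics joined by
# hyphens or commas stay together, so systematic chemical names are one token.
CHEM_TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-,'()\[\]][A-Za-z0-9]+)*|\S")

def tokenize(text: str):
    """Split patent text into tokens without breaking chemical names apart."""
    return CHEM_TOKEN.findall(text)
```

Keeping the full name as one token matters for subtask 1: a compound split into fragments is far harder for a named-entity recognizer to tag consistently.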


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
M. Saef Ullah Miah ◽  
Junaida Sulaiman ◽  
Talha Bin Sarwar ◽  
Kamal Z. Zamli ◽  
Rajan Jose

Keywords play a significant role in selecting topic-related documents easily. Topics or keywords assigned by humans or experts provide accurate information; however, this practice is expensive in terms of resources and time. Hence, it is preferable to utilize automated keyword-extraction techniques. Nevertheless, before adopting an automated process, it is necessary to check how similar expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores between keywords generated by different supervised and unsupervised automated keyword-extraction algorithms and expert-provided keywords from the electric double-layer capacitor (EDLC) domain. The paper also analyses which texts provide better keywords, such as positive sentences or all sentences of a document. From the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction; from the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords to the expert-provided keywords, the Jaccard, cosine, and cosine-with-word-vector similarity indexes are employed in this study. The experiment shows that the MultipartiteRank extraction technique, measured with the cosine-with-word-vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain, or on recommender systems, to select more suitable keyword-extraction and similarity-index-calculation techniques.
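The Jaccard and plain cosine indexes used in the study can be sketched over keyword lists as follows. The word-vector cosine variant would additionally require pretrained embeddings, which are omitted here.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard index: size of the keyword intersection over the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine similarity over keyword count vectors (no word vectors here)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Jaccard rewards exact overlap only, while cosine on count vectors degrades more gracefully when one list is longer than the other; the word-vector variant goes further by also crediting semantically related, non-identical keywords.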

