Accurate information extraction for quantitative financial events

Author(s):  
Hassan H. Malik ◽  
Vikas S. Bhardwaj ◽  
Huascar Fiorletta
Author(s):  
Hamid Reza Marateb ◽  
Mislav Jordanic ◽  
Monica Rojas-Martínez ◽  
Joan Francesc Alonso ◽  
Leidy Yanet Serna ◽  
...  

2014 ◽  
Vol 539 ◽  
pp. 464-468
Author(s):  
Zhi Min Wang

This paper introduces page-segmentation ideas into the preprocessing of web pages. Page segmentation is first used to locate the region containing the target information; that region is then processed according to ontology-based extraction rules, ultimately yielding the required information. Experiments on two real datasets, with comparisons against related work, show that the method achieves good extraction results.
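The segment-then-extract pipeline described above (segment the page into regions, select the extraction region, apply ontology-derived rules) can be sketched as follows. The region-selection heuristic and the rule patterns here are illustrative assumptions, not the paper's actual ontology rules:

```python
import re
from html.parser import HTMLParser

class BlockSegmenter(HTMLParser):
    """Collect the text of block-level elements as candidate regions."""
    BLOCK_TAGS = {"div", "p", "td", "li", "section", "article"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._buf = []

    def handle_data(self, data):
        self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and self._buf:
            text = " ".join(" ".join(self._buf).split())
            if text:
                self.blocks.append(text)
            self._buf = []

# Hypothetical ontology-derived extraction rules: concept -> regex.
RULES = {
    "price": re.compile(r"\$\s?\d+(?:\.\d{2})?"),
    "year":  re.compile(r"\b(?:19|20)\d{2}\b"),
}

def extract(html):
    seg = BlockSegmenter()
    seg.feed(html)
    # Pick the region with the most rule hits -- a simple stand-in
    # for the paper's region-selection step.
    def hits(block):
        return sum(bool(rx.search(block)) for rx in RULES.values())
    region = max(seg.blocks, key=hits, default="")
    return {name: m.group(0) for name, rx in RULES.items()
            if (m := rx.search(region))}
```

Restricting rule matching to the best-scoring region is what keeps navigation menus and boilerplate blocks from polluting the extracted fields.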


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Briton Park ◽  
Nicholas Altieri ◽  
John DeNero ◽  
Anobel Y Odisho ◽  
Bin Yu

Abstract
Objective: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer-to-cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations, which give both location-based information and document-level labels for each pathology report.
Materials and Methods: Our data consist of 250 pathology reports each for kidney, colon, and lung cancer from 2002 to 2019 from a single institution (UCSF). For each report, we classified 5 attributes: procedure, tumor location, histology, grade, and presence of lymphovascular invasion. We develop novel NLP techniques involving transfer learning and string similarity trained on enriched annotations. We compare the HCTC and ZSS methods to the state of the art, including conventional machine learning methods as well as deep learning methods.
Results: For our HCTC method, we see an improvement of up to 0.1 micro-F1 and 0.04 macro-F1 averaged across cancers and applicable attributes. For our ZSS method, we see an improvement of up to 0.26 micro-F1 and 0.23 macro-F1 averaged across cancers and applicable attributes. These comparisons are made after adjusting training data sizes to correct for the 20% increase in annotation time for enriched annotations compared to ordinary annotations.
Conclusions: Methods based on transfer learning across cancers and methods augmented with string similarity priors can significantly reduce the amount of labeled data needed for accurate information extraction from pathology reports.
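The zero-shot string-similarity idea, scoring candidate class labels against text in the report rather than requiring labeled examples for each class, can be illustrated with a minimal sketch. The similarity measure (`difflib`) and the label strings here are stand-ins, not the authors' trained method:

```python
from difflib import SequenceMatcher

def zero_shot_classify(report_text, class_labels):
    """Pick the class whose label string best matches some window of
    the report text (a crude stand-in for a learned similarity)."""
    words = report_text.lower().split()
    best_label, best_score = None, -1.0
    for label in class_labels:
        target = label.lower()
        n = max(1, len(target.split()))
        # Slide a window of the label's word length over the report.
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            score = SequenceMatcher(None, window, target).ratio()
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score

# Hypothetical histology labels and report snippet, for illustration.
histologies = ["clear cell carcinoma", "papillary carcinoma", "chromophobe"]
text = "Final diagnosis: clear-cell carcinoma, Fuhrman grade 2."
label, score = zero_shot_classify(text, histologies)
```

Because the class label itself carries the signal, a new attribute value can be scored without any annotated reports for it, which is the appeal of the zero-shot setup.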


Author(s):  
Bernard Espinasse ◽  
Sébastien Fournier ◽  
Fred Freitas ◽  
Shereen Albitar ◽  
Rinaldo Lima

Due to the size of the Web and the diversity of its information, gathering relevant information on the Web is a highly complex task. The main problem with most information retrieval approaches is that they neglect page context, owing to an inherent limitation: search engines are based on keyword indexing, which cannot capture context. In restricted domains, taking context into account through the use of a domain ontology may lead to more relevant and accurate information gathering. In recent years, we have conducted research under this hypothesis and accordingly proposed an agent- and ontology-based restricted-domain cooperative information gathering approach that can be instantiated in information gathering systems for specific domains, such as academia, tourism, etc. In this chapter, the authors present this approach and a generic software architecture, named AGATHE-2, which is a full-fledged scalable multi-agent system. Besides offering in-depth treatment of these domains through the use of a domain ontology, this new version uses machine learning techniques over linguistic information in order to accelerate the knowledge acquisition necessary for information extraction over Web pages. AGATHE-2 is an agent- and ontology-based system that collects and classifies relevant Web pages about a restricted domain, using BWI (Boosted Wrapper Induction), a machine learning algorithm, to perform adaptive information extraction.
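The contrast drawn above between bare keyword indexing and ontology-aware gathering can be sketched by scoring pages on how many distinct domain concepts they cover. The toy ontology and threshold below are assumptions for illustration, not AGATHE-2's actual knowledge base:

```python
# A toy domain ontology: concept -> lexicalizations (assumed, for illustration).
ACADEMIA_ONTOLOGY = {
    "person": {"professor", "phd student", "researcher"},
    "event":  {"conference", "workshop", "seminar"},
    "output": {"paper", "thesis", "journal"},
}

def ontology_score(page_text, ontology):
    """Fraction of ontology concepts mentioned on the page; concept
    coverage is a crude but context-aware alternative to raw keyword
    frequency."""
    text = page_text.lower()
    matched = {concept for concept, terms in ontology.items()
               if any(term in text for term in terms)}
    return len(matched) / len(ontology)

def gather(pages, ontology, threshold=0.5):
    """Keep pages that cover enough of the domain ontology."""
    return [p for p in pages if ontology_score(p, ontology) >= threshold]
```

A page mentioning one concept many times scores no higher than a page mentioning it once, so coverage of distinct concepts, rather than keyword frequency, drives relevance.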


2019 ◽  
Vol 8 (4) ◽  
pp. 2151-2153

Customer comments are integral to identifying the failures and successes of a product, and customers' buying patterns depend heavily on the comments posted online. Online reviews/comments can be broadly classified as positive, negative, or neutral, and many tools available on the market can be used for this classification. However, these classification methods have various flaws that can skew the results, such as unidentified/hidden information in neutral comments, wrong keyword extraction while splitting words, and fake comments identified by the frequency of duplicate comments or reviewers. This paper addresses these problems using product comments posted on the Amazon website and proposes a flow chart and algorithm for handling them.
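Two of the problems listed, frequency-based fake-comment filtering and three-way polarity classification, can be sketched as below. The sentiment lexicons and the repeat threshold are illustrative assumptions, not the paper's proposed algorithm:

```python
from collections import Counter

# Tiny illustrative lexicons; a real system would use a full lexicon
# or a trained classifier.
POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"bad", "poor", "broken", "terrible"}

def normalize(comment):
    return " ".join(comment.lower().split())

def drop_suspected_fakes(comments, max_repeats=2):
    """Keep at most `max_repeats` copies of any exact-duplicate comment;
    further repeats are treated as suspected fakes."""
    seen = Counter()
    kept = []
    for comment in comments:
        seen[normalize(comment)] += 1
        if seen[normalize(comment)] <= max_repeats:
            kept.append(comment)
    return kept

def polarity(comment):
    """Three-way polarity from lexicon overlap; ties fall to neutral,
    which is where hidden information tends to accumulate."""
    words = set(normalize(comment).split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Note that the neutral bucket here is a catch-all for ties and lexicon misses, which is exactly why the paper flags neutral comments as hiding unidentified information.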


2014 ◽  
Vol 40 (3) ◽  
pp. 116-121 ◽  
Author(s):  
Kuldeep Chaurasia ◽  
Pradeep Kumar Garg

The growing availability of satellite data has increased the need for information extraction methods that can be used in various applications, including topographic map updating, city planning, pattern recognition, and machine vision. Accurate information extraction from satellite images requires the integration of additional measures such as texture and shape. In this paper, the extraction of topographic objects from satellite images by incorporating texture information and data fusion is investigated. The applicability of various texture measures based on the gray-level co-occurrence matrix, along with the effect of varying the pixel window, is also discussed. The classification results indicate that the homogeneity texture image generated using a 3×3 window size is best suited for topographic object extraction. The best classification results, with an overall accuracy of 85.0% and a kappa coefficient of 0.80, are obtained when classification is performed on the fused image (multispectral + PAN + texture).
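The homogeneity measure over a gray-level co-occurrence matrix (GLCM) can be computed as in the sketch below, here for a single window with a horizontal pixel offset; the paper would slide such a window (e.g. 3×3) over the whole image to build the texture band:

```python
def glcm(window, levels):
    """Normalized gray-level co-occurrence matrix for a horizontal
    (row, col) -> (row, col+1) offset over a small window of integer
    gray levels in [0, levels)."""
    counts = [[0] * levels for _ in range(levels)]
    for row in window:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(map(sum, counts)) or 1
    return [[c / total for c in row] for row in counts]

def homogeneity(window, levels=8):
    """GLCM homogeneity: sum of P(i, j) / (1 + |i - j|).
    Near 1 for uniform texture, small for high local contrast."""
    p = glcm(window, levels)
    return sum(p[i][j] / (1 + abs(i - j))
               for i in range(levels) for j in range(levels))
```

A perfectly uniform window scores 1.0, while a window alternating between extreme gray levels scores close to 0, which is why homogeneity separates smooth topographic surfaces from cluttered ones.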


Author(s):  
Brenda Scholtz ◽  
Thashen Padayachy ◽  
Oluwande Adewoyin

This article presents findings from pilot testing of elements of an information extraction (IE) prototype designed to assist legal researchers in engaging with case law databases. The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases. Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs. In respect of the prototype’s extraction of CRT attributes (case title, date, journal, and action), none of the 50 extractions produced fully accurate attribute information. The article outlines the prototype, the pilot testing process, and the test findings, and then concludes with a discussion of where the prototype needs to be improved.
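The kind of CRT extraction being piloted above can be sketched with a citation regex over judgment text. The pattern below handles only a simple "Party v Party (year)" shape and is purely illustrative; real case-law citation formats vary widely, which is precisely why the prototype's accuracy was low:

```python
import re

# Illustrative pattern for simple "Smith v Jones (1998)"-style citations.
CITATION = re.compile(
    r"\b([A-Z][\w.&'-]*(?:\s+[A-Z][\w.&'-]*)*)\s+v\s+"
    r"([A-Z][\w.&'-]*(?:\s+[A-Z][\w.&'-]*)*)\s+\((\d{4})\)"
)

def extract_crts(judgment_text):
    """Return the cases referred to (CRTs) found in a source case,
    with the title and date attributes the prototype targets."""
    return [{"title": f"{m.group(1)} v {m.group(2)}", "year": m.group(3)}
            for m in CITATION.finditer(judgment_text)]
```

Counting the returned list against a hand-annotated gold count per source case is the evaluation the pilot test performed.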


Author(s):  
Jingqi Wang ◽  
Yuankai Ren ◽  
Zhi Zhang ◽  
Hua Xu ◽  
Yaoyun Zhang

Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapid accumulation of chemical patents calls for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020 ChEMU Task of Chemical Reaction Extraction from Patents. The task consisted of two subtasks: (1) named entity recognition, to identify compounds and the different semantic roles in a chemical reaction, and (2) event extraction, to identify the event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1. To build an end-to-end system with high performance, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimizing the tokenization and pre-training patent language models based on self-supervision to domain knowledge-based rules. Our hybrid approaches combining these strategies achieved state-of-the-art results in both subtasks, with a top-ranked F1 of 0.957 for entity recognition and a top-ranked F1 of 0.9536 for event extraction, indicating that the proposed approaches are promising.
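One of the strategies named above, optimizing tokenization for chemical text, can be illustrated as follows: a naive tokenizer fragments systematic chemical names at hyphens and commas, while a chemistry-aware pattern keeps them whole. The regexes are illustrative assumptions, not the team's actual tokenizer:

```python
import re

def naive_tokenize(text):
    """Split on any non-alphanumeric character; this fragments
    chemical names like 2-amino-4-chlorophenol."""
    return re.findall(r"[A-Za-z0-9]+", text)

# Chemistry-aware sketch: keep hyphens, apostrophes, and locant commas
# inside a token together; emit other punctuation as its own token.
CHEM_TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-,'][A-Za-z0-9]+)*|[^\sA-Za-z0-9]")

def chem_tokenize(text):
    return CHEM_TOKEN.findall(text)
```

Keeping a compound mention in one token matters downstream: an entity recognizer cannot label a compound whose name the tokenizer has already split into five pieces.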


Author(s):  
David C. Joy ◽  
Suichu Luo ◽  
John R. Dunlap ◽  
Dick Williams ◽  
Siqi Cao

In physics, chemistry, materials science, biology, and medicine, it is very important to have accurate information about the stopping power of various media for electrons, that is, the average energy loss per unit path length due to inelastic Coulomb collisions with the atomic electrons of the specimen along their trajectories. Techniques such as photoemission spectroscopy, Auger electron spectroscopy, and electron energy loss spectroscopy have been used in measurements of electron-solid interaction. In this paper we present a comprehensive technique which combines experimental and theoretical work to determine the electron stopping power of various materials by electron energy loss spectroscopy (EELS). As an example, we measured the stopping power of Si, C, and their compound SiC. The method, results, and discussion are described briefly below. The stopping power calculation is based on the modified Bethe formula at low energy, where Neff and Ieff are the effective number of electrons participating in the process and the effective value of the mean ionization potential, respectively. Neff and Ieff can be obtained from the sum-rule relations, as we discussed before [3], using the energy loss function Im(−1/ε).
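The modified Bethe formula referred to above did not survive in this text. In the form commonly written for low-energy stopping power with effective parameters, and consistent with the stated roles of Neff and Ieff, it reads approximately as follows (a reconstruction, not necessarily the authors' exact expression):

```latex
% Low-energy modified Bethe stopping power with effective parameters
% (reconstruction; N_eff and I_eff as defined in the surrounding text).
\[
  -\frac{dE}{ds} \;=\; \frac{2\pi e^{4} N_{\mathrm{eff}}}{E}\,
  \ln\!\left(\frac{1.166\,E}{I_{\mathrm{eff}}}\right)
\]
```

Here E is the electron kinetic energy and s the path length; the energy dependence of Neff and Ieff is what the EELS measurement of Im(−1/ε) supplies through the sum rules.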


Author(s):  
Arezki Tagnit-Hamou ◽  
Shondeep L. Sarkar

All the desired properties of cement depend primarily on the physicochemical characteristics of the clinker from which the cement is produced. The mineralogical composition of the clinker is the most important parameter influencing these properties.

Optical microscopy provides reasonably accurate information on the thermal history of the clinker, while XRDA remains the proven method of phase identification, and the bulk chemical composition of the clinker can be readily obtained from XRFA. Nevertheless, all these microanalytical techniques are somewhat limited in their applications, and the SEM/EDXA combination uniquely fills this gap by virtue of its high-resolution imaging capability and the possibility of instantaneous chemical analysis of individual phases.

Inhomogeneities and impurities in the raw meal, and the influence of kiln conditions such as sintering and cooling rate, being directly related to the microstructure, can be effectively determined by SEM/EDXA. In addition, several physical characteristics of cement, such as rheology, grindability, and hydraulicity, also depend on the clinker microstructure.

