EliIE: An open-source information extraction system for clinical trial eligibility criteria

2017 ◽  
Vol 24 (6) ◽  
pp. 1062-1071 ◽  
Author(s):  
Tian Kang ◽  
Shaodian Zhang ◽  
Youlan Tang ◽  
Gregory W Hruby ◽  
Alexander Rusanov ◽  
...  

Abstract
Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.
Materials and Methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer’s clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling–based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation.
Conclusions: This study presents EliIE, an OMOP CDM–based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise to improve.
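To illustrate the negation detection step mentioned above, here is a minimal sketch in the spirit of NegEx: a negation trigger that precedes an entity within a short token window marks the entity as negated. This is not EliIE's implementation; the trigger list, the window size, and the function name `is_negated` are assumptions chosen for illustration, and the real NegEx lexicon and scoping rules are considerably richer.

```python
import re

# Hypothetical, abbreviated trigger list (assumption); NegEx uses a much larger lexicon.
NEGATION_TRIGGERS = ("no", "not", "without", "absence of", "free of")
# Assumed maximum number of tokens inspected before the entity.
WINDOW = 5


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if a negation trigger precedes `entity` within WINDOW tokens."""
    text = sentence.lower()
    start = text.find(entity.lower())
    if start == -1:
        # Entity not present in the sentence: nothing to negate.
        return False
    # Keep only the last WINDOW alphabetic tokens before the entity.
    preceding = re.findall(r"[a-z]+", text[:start])[-WINDOW:]
    # Pad with spaces so triggers match whole words only.
    scope = " " + " ".join(preceding) + " "
    return any(f" {trigger} " in scope for trigger in NEGATION_TRIGGERS)


if __name__ == "__main__":
    print(is_negated("No history of stroke", "stroke"))
    print(is_negated("History of stroke", "stroke"))
```

A rule of this kind only looks backward from the entity; handling triggers that follow the entity or that end a negation scope would need additional rules.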

1995 ◽  
Vol 1 (4) ◽  
pp. 363-388 ◽  
Author(s):  
R. Evans ◽  
R. Gaizauskas ◽  
L. J. Cahill ◽  
J. Walker ◽  
J. Richardson ◽  
...  

Abstract: The Portable Extendable Traffic Information Collator (POETIC) is an information extraction system that extracts traffic information from free text occurring in police incident logs and initiates (simulated) broadcasts of traffic bulletins to motorists when appropriate. POETIC is a second stage prototype system; the initial prototype (TIC, see Evans and Hartley 1990) was limited to the practices and requirements of a single police force. In POETIC, the architecture and data representations have been generalised to make the system tailorable to many different police force ‘domains’. In this paper we describe these developments, and report on tests of the system on authentic input data from three police domains.


2018 ◽  
Vol 25 (2) ◽  
pp. 287-306 ◽  
Author(s):  
Cleiton Fernando Lima Sena ◽  
Daniela Barreiro Claro

Abstract: The amount of digital data keeps growing. On the Web, a vast collection of heterogeneous data is generated daily, and a significant portion of it is available in natural language. Open Information Extraction (Open IE) enables the extraction of facts from large quantities of text written in natural language. In this work, we propose an Open IE method to extract facts from texts written in Portuguese. We developed two new rules that generalize inference by transitivity and by symmetry. Consequently, this approach increases the number of implicit facts extracted from a sentence. Our novel symmetric inference approach is based on a list of symmetric features. Our results confirmed that our method outperforms closely related work in both precision and number of valid extractions. Considering the number of minimal facts, our approach is equivalent to the most relevant methods in the literature.
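The idea of inferring implicit facts by transitivity and symmetry can be sketched over (argument, relation, argument) triples. This is only an illustration under stated assumptions, not the authors' rules: the function name `infer_implicit_facts`, the example relations, and the use of English phrases (rather than Portuguese) are all hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (argument 1, relation, argument 2)


def infer_implicit_facts(
    facts: Set[Triple],
    symmetric: Set[str],
    transitive: Set[str],
) -> Set[Triple]:
    """Return facts implied by symmetry or transitivity that are not in `facts`."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        new: Set[Triple] = set()
        for a, rel, b in closed:
            # Symmetry: (a, rel, b) implies (b, rel, a).
            if rel in symmetric:
                new.add((b, rel, a))
            # Transitivity: (a, rel, b) and (b, rel, d) imply (a, rel, d).
            if rel in transitive:
                for c, rel2, d in closed:
                    if rel2 == rel and c == b and a != d:
                        new.add((a, rel, d))
        if not new <= closed:
            closed |= new
            changed = True
    return closed - facts


if __name__ == "__main__":
    # Hypothetical extracted facts and relation lists, for illustration only.
    extracted = {
        ("A", "is part of", "B"),
        ("B", "is part of", "C"),
        ("X", "is married to", "Y"),
    }
    for fact in sorted(infer_implicit_facts(extracted, {"is married to"}, {"is part of"})):
        print(fact)
```

The loop repeats until no new triple appears, so chains longer than two steps are also closed; which relations count as symmetric or transitive is supplied by the caller, mirroring the abstract's mention of a list of symmetric features.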


Author(s):  
Shuang Peng ◽  
Mengdi Zhou ◽  
Minghui Yang ◽  
Haitao Mi ◽  
Shaosheng Cao ◽  
...  
