An end-to-end joint model for evidence information extraction from court record document

2020 ◽  
Vol 57 (6) ◽  
pp. 102305 ◽  
Author(s):  
Donghong Ji ◽  
Peng Tao ◽  
Hao Fei ◽  
Yafeng Ren
2017 ◽  
Author(s):  
Rasmus Berg Palm ◽  
Dirk Hovy ◽  
Florian Laws ◽  
Ole Winther

2020 ◽  
Author(s):  
Xi Yang ◽  
Hansi Zhang ◽  
Xing He ◽  
Jiang Bian ◽  
Yonghui Wu

BACKGROUND Patients’ family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in structured databases and is instead often documented in clinical narratives. Natural language processing (NLP) is the key technology for extracting patients’ FH from clinical narratives. In 2019, the National NLP Clinical Challenge (n2c2) organized shared tasks to solicit NLP methods for FH information extraction. OBJECTIVE This study presents our end-to-end FH extraction system developed during the 2019 n2c2 open shared task, as well as the new transformer-based models that we developed after the challenge. We seek to develop a machine learning–based solution for FH information extraction without hand-crafted, task-specific rules. METHODS We developed deep learning–based systems for FH concept extraction and relation identification. We explored deep learning models including long short-term memory with conditional random fields (LSTM-CRF) and bidirectional encoder representations from transformers (BERT), and developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared 3 different strategies for using BERT output representations for relation identification. RESULTS Our system was among the top-ranked systems (3 out of 21) in the challenge. Our best system achieved micro-averaged F1 scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After the challenge, we further explored new transformer-based models and improved performance on the two subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved performance comparable to the best system (0.6810) reported in the challenge. CONCLUSIONS This study demonstrated the feasibility of using deep learning methods to extract FH information from clinical narratives.
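
The majority voting strategy mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the tag names, and the tie-breaking rule (prefer the earliest model's label among tied labels) are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models by majority vote.

    predictions: list of label sequences, one per model, all of equal length.
    Ties are broken in favour of the earliest model in the list
    (an assumption; the abstract does not state a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("at least one prediction sequence is required")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all prediction sequences must have the same length")

    voted = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Hypothetical tag names, used only for this example.
model_outputs = [
    ["B-FM", "O", "O"],
    ["B-FM", "B-OBS", "O"],
    ["O", "B-OBS", "O"],
]
ensemble = majority_vote(model_outputs)
```

With the three hypothetical model outputs shown, each token receives the label chosen by at least two of the three models.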


Author(s):  
Muhammad Zeshan Afzal ◽  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker

This work presents an end-to-end trainable approach for detecting mathematical formulas in scanned document images. Since many OCR engines cannot reliably handle formulas, it is essential to isolate them in order to obtain clean text for information extraction from the document. Our proposed pipeline comprises a hybrid task cascade network with deformable convolutions and a ResNeXt-101 backbone. Both of these modifications improve detection. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. On the ICDAR-2017 POD dataset, we achieve an overall accuracy of 96% and an overall error reduction of 13%. Furthermore, the results on the Marmot dataset are improved for both isolated and embedded formulas: we achieve an accuracy of 98.78% for isolated formulas and 90.21% for embedded formulas, corresponding to an error reduction rate of 43% for isolated and 17.9% for embedded formulas.
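
The abstract above reports both accuracies and error reduction rates. As a hedged sketch, assuming "error reduction" means the relative decrease in error (1 minus accuracy) with respect to a baseline, the relationship can be computed as follows. The abstract does not state the definition or the baseline accuracies, so the function name and the numbers in the example are hypothetical.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction in error rate when moving from a baseline to a new system.

    Both arguments are accuracies in [0, 1]. The error rate is taken to be
    1 - accuracy, which is an assumption about how the abstract defines it.
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers: a baseline at 90% accuracy improved to 95%
# halves the error rate (10% -> 5%).
reduction = relative_error_reduction(0.90, 0.95)
```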


Author(s):  
Zekai Wang ◽  
Hongzhi Liu ◽  
Yingpeng Du ◽  
Zhonghai Wu ◽  
Xing Zhang

Most heterogeneous information network (HIN) based recommendation models rely on modeling users and items with meta-paths. However, they always model users and items in isolation under each meta-path, which may mislead information extraction. In addition, they consider only the structural features of HINs when modeling users and items during HIN exploration, which may cause information useful for recommendation to be lost irreversibly. To address these problems, we propose a HIN-based unified embedding model for recommendation, called HueRec. We assume that each user or item has some common characteristics across different meta-paths, and we use data from all meta-paths to learn unified user and item representations. The interrelation between meta-paths is thus utilized to alleviate the problems of data sparsity and noise on any single meta-path. Unlike existing models, which first explore HINs and then make recommendations, we combine these two parts into an end-to-end model to avoid losing useful information in the initial phases. In addition, we embed all users, items and meta-paths into related latent spaces. Therefore, we can measure users’ preferences for meta-paths to improve the performance of personalized recommendation. Extensive experiments show that HueRec consistently outperforms state-of-the-art methods.


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2278 ◽  
Author(s):  
Dian Zhang ◽  
Brendan Heery ◽  
Maria O’Neil ◽  
Suzanne Little ◽  
Noel E. O’Connor ◽  
...  

Understanding hydrological processes in large, open areas, such as catchments, and further modelling these processes are still open research questions. The system proposed in this work provides an automatic end-to-end pipeline, from data collection to information extraction, that can potentially help hydrologists better understand hydrological processes using a data-driven approach. In this work, the performance of a low-cost, off-the-shelf, self-contained sensor unit, which was originally designed and used to monitor levels of liquids such as AdBlue, fuel and lubricants in a sealed tank environment, is first examined. This process validates that the sensor provides accurate water level information for open water level monitoring tasks. Using the dataset collected from eight sensor units, an end-to-end pipeline that automates the data collection, data processing and information extraction processes is proposed. Within the pipeline, a data-driven anomaly detection method automatically extracts rapid changes in measurement trends at a catchment scale. The lag-time of the test site (Dodder catchment, Dublin, Ireland) is also analyzed. Subsequently, the water level response in the catchment due to storm events during the 27-month deployment period is illustrated. To support reproducible and collaborative research, the collected dataset and the source code of this work will be made publicly available for research purposes.
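
To make the idea of extracting rapid changes from a water level series concrete, here is a minimal sketch based on thresholding the difference between consecutive readings. It is a simple stand-in and not the anomaly detection method of the paper, which the abstract does not specify; the function name, the threshold and the sample readings are all hypothetical.

```python
def rapid_changes(levels, threshold):
    """Return indices where the level changes by more than `threshold`
    relative to the previous reading.

    levels: sequence of water level readings at a fixed sampling interval.
    threshold: minimum absolute change between consecutive readings that
    counts as a rapid change (same unit as the readings).
    """
    return [
        i
        for i in range(1, len(levels))
        if abs(levels[i] - levels[i - 1]) > threshold
    ]


# Hypothetical readings in metres: a sharp rise at index 3 and a sharp
# drop at index 5.
readings = [1.0, 1.1, 1.0, 2.5, 2.6, 1.0]
events = rapid_changes(readings, threshold=1.0)
```

A fixed threshold on first differences is the simplest possible choice; a real deployment would likely need to account for sensor noise and differing sampling intervals.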


10.2196/22982 ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. e22982
Author(s):  
Xi Yang ◽  
Hansi Zhang ◽  
Xing He ◽  
Jiang Bian ◽  
Yonghui Wu


