The structural extraction of Chinese medical narratives

2018 ◽  
Author(s):  
Rongzhi Zhang ◽  
Haifei Zhang ◽  
Zhiyu Yao ◽  
Zhengxing Huang

Medical narratives document a vast amount of clinical data. This data has a valuable secondary purpose, as it may be used to optimize health service delivery and improve the quality of medical care. However, medical narratives are typically recorded in an unstructured manner, which complicates the process of extracting the structured information required for optimization. In this paper, we address this problem by applying and comparing two models, a rule-based model and a model based on conditional random fields (CRFs), to a data set of Chinese medical narratives. Among 4626 manually annotated Chinese medical narratives, collected from Shanxi Dayi Hospital in China, the rule-based model achieved 95.87% precision, 69.82% recall, and an F-score of 80.80%, and the CRF-based model realized 95.99% precision, 65.11% recall, and a 77.59% F-score. These experimental results demonstrate the efficacy of both proposed models for structural extraction from Chinese medical narratives.
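The reported F-scores can be checked against the reported precision and recall. The sketch below assumes the abstract uses the balanced F-score (F1, the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-score %) as given in the abstract.
reported = {
    "rule-based": (95.87, 69.82, 80.80),
    "CRF-based": (95.99, 65.11, 77.59),
}

for name, (p, r, f_reported) in reported.items():
    f = f_score(p, r)
    print(f"{name}: computed F = {f:.2f}%, reported F = {f_reported:.2f}%")
```

Under that assumption, the computed values agree with the reported ones to two decimal places, and the gap between the two models comes almost entirely from recall rather than precision.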

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities.


2021 ◽  
Vol 13 (3) ◽  
pp. 465
Author(s):  
Shuyang Wang ◽  
Xiaodong Mu ◽  
Dongfang Yang ◽  
Hao He ◽  
Peng Zhao

Road extraction from remote sensing images is of great significance to urban planning, navigation, disaster assessment, and other applications. Although deep neural networks have shown a strong ability in road extraction, it remains a challenging task due to complex circumstances and factors such as occlusion. To improve the accuracy and connectivity of road extraction, we propose an inner convolution integrated encoder-decoder network with post-processing by directional conditional random fields. First, we design an inner convolutional network that propagates information slice-by-slice within feature maps, thus enhancing the learning of road topology and linear features. Additionally, we present directional conditional random fields, which improve the quality of the extracted roads by adding the direction of roads to the energy function of the conditional random fields. The experimental results on the Massachusetts road dataset show that the proposed approach achieves high-quality segmentation results, with an F1-score of 84.6%, outperforming comparable state-of-the-art approaches. The visualization results show that the proposed approach can effectively extract roads from remote sensing images and can, to some extent, solve the road connectivity problem caused by occlusions.


2020 ◽  
Vol 10 (7) ◽  
pp. 2303 ◽  
Author(s):  
Mariana Dias ◽  
João Boné ◽  
João C. Ferreira ◽  
Ricardo Ribeiro ◽  
Rui Maia

The protection of sensitive data is becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but, in most cases, the processes behind them are still manual or semi-automatic. In this work, we have developed a component that can extract and classify sensitive data from unstructured text in European Portuguese. The objective was to create a system that allows organizations to understand their data and comply with legal and security requirements. We studied a hybrid approach to the problem of Named Entity Recognition for the Portuguese language. This approach combines several techniques, such as rule-based/lexical-based models, machine learning algorithms, and neural networks. The rule-based and lexical-based approaches were used only for a set of specific classes. For the remaining classes of entities, two statistical models were tested, Conditional Random Fields and Random Forest, and, finally, a Bidirectional LSTM (Bi-LSTM) approach was tested. Regarding the statistical models, Conditional Random Fields obtained the best results, with an F1-score of 65.50%. With the Bi-LSTM approach, we achieved an F1-score of 83.01%. The corpora used for training and testing were the HAREM Golden Collection, the SIGARRA News Corpus, and the DataSense NER Corpus.


Author(s):  
Oliver Ray ◽  
Amy Conroy ◽  
Rozano Imansyah

This paper introduces a method called SUmmarisation with Majority Opinion (SUMO) that integrates and extends two prior approaches for abstractively and extractively summarising UK House of Lords cases. We show how combining two previously distinct lines of work allows us to better address the challenges resulting from this court’s unusual tradition of publishing the opinions of multiple judges with no formal statement of the reasoning (if any) agreed by a majority. We do this by applying natural language processing and machine learning, specifically Conditional Random Fields (CRFs), to a data set we created by fusing together expert-annotated sentence labels from the HOLJ corpus of rhetorical role summary relevance with the ASMO corpus of agreement statement and majority opinion. By using CRFs and a bespoke summary generator on our enriched data set, we show a significant quantitative F1-score improvement in rhetorical role and relevance classification of 10–15% over the state-of-the-art SUM system; and we show a significant qualitative improvement in the quality of our summaries, which closely resemble gold-standard multi-judge abstracts according to a proof-of-principle user study.


Author(s):  
Yasanthi Hirimutugoda

Proteins are the workhorses of the cell, performing biological functions by interacting with other proteins. Many statistical methods for protein-protein interaction (PPI) have been studied without considering time-dependent changes in the networks and their functionalities. I introduce a novel method that uses Conditional Random Fields (CRF) to model PPI networks as dynamic in nature, evolving as a time-varying multivariate distribution. This research is directed towards implementing this new combinatorial algorithm on massively parallel architectures such as Graphics Processing Units (GPUs) for efficient computation on large-scale bioinformatics datasets. I compared CRF with the proposed novel method, which combines CRF with the Block Coordinate Descent algorithm, on a human protein-protein interaction data set. Both were implemented on a GPU-accelerated computing architecture, and the proposed method showed advantages in predicting protein-protein interaction sites. I also show that the proposed approach is 6.13% more efficient than standalone CRF++ in predicting protein-protein interaction sites.


2020 ◽  
Vol 26 (6) ◽  
pp. 677-690
Author(s):  
Kareem Darwish ◽  
Mohammed Attia ◽  
Hamdy Mubarak ◽  
Younes Samih ◽  
Ahmed Abdelali ◽  
...  

This work introduces robust multi-dialectal part-of-speech tagging trained on an annotated data set of Arabic tweets in four major dialect groups: Egyptian, Levantine, Gulf, and Maghrebi. We implement two different sequence tagging approaches. The first uses conditional random fields (CRFs), while the second combines word- and character-based representations in a deep neural network with stacked layers of convolutional and recurrent networks with a CRF output layer. We successfully exploit a variety of features that help generalize our models, such as Brown clusters and stem templates. Also, we develop robust joint models that tag multi-dialectal tweets and outperform uni-dialectal taggers. We achieve a combined accuracy of 92.4% across all dialects, with per-dialect results ranging between 90.2% and 95.4%. We obtained the results using a train/dev/test split of 70/10/20 for a data set of 350 tweets per dialect.
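As a quick illustration of what the stated 70/10/20 split means for 350 tweets per dialect, the sketch below computes the partition sizes with integer arithmetic. The helper `split_sizes` is hypothetical, and the abstract does not say how the authors actually partitioned the data (for example, randomly or by some stratification), so only the sizes are derived here.

```python
def split_sizes(n_items: int, ratios: tuple = (70, 10, 20)) -> tuple:
    """Return (train, dev, test) sizes for integer percentage ratios.

    Hypothetical helper: any remainder from integer division is
    assigned to the training partition.
    """
    total = sum(ratios)
    sizes = [n_items * r // total for r in ratios]
    sizes[0] += n_items - sum(sizes)  # give leftover items to train
    return tuple(sizes)


dialects = ["Egyptian", "Levantine", "Gulf", "Maghrebi"]
train, dev, test = split_sizes(350)

for dialect in dialects:
    print(f"{dialect}: train={train}, dev={dev}, test={test}")
```

With these figures, each dialect contributes 245 training, 35 development, and 70 test tweets, so the per-dialect accuracies are each measured on a fairly small test partition.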


Author(s):  
Nikitin A.E. ◽  
Znamenskiy I.A. ◽  
Shikhova Yu.A. ◽  
Kuzmina I.V. ◽  
Melchenko D.S. ◽  
...  

This study provides a retrospective analysis of work to ensure high-quality medical care in an unfavorable epidemic situation. COVID-19 led to the implementation of a program to prevent the spread of infection, the re-profiling of medical institutions, and the introduction of restrictive and anti-epidemic measures. Our experience has shown the effectiveness of changing the order of medical care, the organization of departments, and patient routing. The study reflects the measures implemented in the hospital departments, the Department of Clinical and Laboratory Diagnostics, the Department of Radiation Diagnostics, and the Pathology Department. To ensure the safety of patients, it was decided to place each patient in a single-bed room arranged like an infectious disease isolation box. The safety of employees was ensured by the use of personal protective equipment, minimization of contact time with patients, and preventive weekly examination of staff for SARS-CoV-2 infection. The organized and well-coordinated work of the entire staff of the institution made it possible to prevent the spread of COVID-19 among employees, to detect cases of infection in a timely manner, and to carry out appropriate isolation and monitoring measures. At the time the infectious diseases departments completed their work, the mortality rate among patients was less than 9%. Our experience in reorganizing a multi-specialty facility can be used in the future when working with patients who have had COVID-19, as well as in the context of a worsening epidemic situation.

