sentence boundary
Recently Published Documents


TOTAL DOCUMENTS

97
(FIVE YEARS 21)

H-INDEX

11
(FIVE YEARS 1)

2022 ◽  
Vol 3 (1) ◽  
pp. 1-14
Author(s):  
Kahyun Lee ◽  
Mehmet Kayaalp ◽  
Sam Henry ◽  
Özlem Uzuner

Many modern entity recognition systems, including the current state-of-the-art de-identification systems, are based on bidirectional long short-term memory (biLSTM) units augmented by a conditional random field (CRF) sequence optimizer. These systems process the input sentence by sentence. This approach prevents the systems from capturing dependencies over sentence boundaries and makes accurate sentence boundary detection a prerequisite. Since sentence boundary detection can be problematic, especially in clinical reports, where dependencies and co-references across sentence boundaries are abundant, these systems have clear limitations. In this study, we built a new system on the framework of one of the current state-of-the-art de-identification systems, NeuroNER, to overcome these limitations. This new system incorporates context embeddings through forward and backward n-grams without using sentence boundaries. Our context-enhanced de-identification (CEDI) system captures dependencies over sentence boundaries and bypasses the sentence boundary detection problem altogether. We enhanced this system with deep affix features and an attention mechanism to capture the pertinent parts of the input. The CEDI system outperforms NeuroNER on the 2006 i2b2 de-identification challenge dataset, the 2014 i2b2 shared task de-identification dataset, and the 2016 CEGS N-GRID de-identification dataset (p < 0.01). All datasets comprise narrative clinical reports in English but contain different note types varying from discharge summaries to psychiatric notes. Enhancing CEDI with deep affix features and the attention mechanism further increased performance.
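The idea of taking forward and backward n-gram context from the running token stream, rather than from a pre-split sentence, can be illustrated with a minimal sketch. This is not the CEDI or NeuroNER implementation: the function name `context_ngrams`, the `<PAD>` symbol, and the padding behavior at document edges are assumptions made only for illustration.

```python
from typing import List, Tuple


def context_ngrams(
    tokens: List[str], index: int, n: int, pad: str = "<PAD>"
) -> Tuple[List[str], List[str]]:
    """Return the n tokens before and the n tokens after tokens[index].

    The window is cut from the whole document's token stream, so it can
    cross sentence-final punctuation. Hypothetical helper; padding with
    `pad` at the document edges is an assumption of this sketch.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")

    backward = tokens[max(0, index - n):index]
    forward = tokens[index + 1:index + 1 + n]

    # Pad so that both windows always have exactly n items.
    backward = [pad] * (n - len(backward)) + backward
    forward = forward + [pad] * (n - len(forward))
    return backward, forward


# Example document as a flat token list; no sentence splitting is applied.
doc = ["Dr", ".", "Smith", "saw", "the", "patient", ".", "He"]
before, after = context_ngrams(doc, 6, 2)
```

In this example the window around the second full stop includes "He" from the following sentence, which is the kind of cross-boundary context a sentence-by-sentence system would not see.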


2021 ◽  
Vol 4 (1) ◽  
pp. 38
Author(s):  
Joan Santoso ◽  
Esther Irawati Setiawan ◽  
Christian Nathaniel Purwanto ◽  
Fachrul Kurniawan

Sentence boundary detection is a crucial pre-processing step in natural language processing, because the border between one sentence and the next can be ambiguous. Because there are multiple separators and varied sentence patterns, treating every full stop as the end of a sentence is sometimes inappropriate. This research uses a deep learning approach to split an Indonesian news document into sentences, so there is no need to define handcrafted features or rules. As in part-of-speech tagging and named entity recognition, we use sequence labeling to determine sentence boundaries. Two labels are used: O for a non-boundary token and E for the last token of a sentence. To do this, we used the Bi-LSTM approach, which has been widely used in sequence labeling. We show that our approach works for Indonesian text using pre-trained Indonesian embeddings, as in previous studies. This study achieved an F1-score of 98.49 percent. Compared with previous studies, this represents a significant improvement.
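The O/E labeling scheme described above can be sketched as follows. This shows only the data representation, not the Bi-LSTM model; the function names `label_tokens` and `split_by_labels` are hypothetical, and keeping any trailing unlabeled tokens as a final sentence is an assumption of this sketch.

```python
from typing import List, Tuple


def label_tokens(sentences: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten tokenized sentences and tag each token with O or E.

    E marks the last token of a sentence; O marks every other token.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        if not sentence:
            continue  # skip empty sentences
        tokens.extend(sentence)
        labels.extend(["O"] * (len(sentence) - 1) + ["E"])
    return tokens, labels


def split_by_labels(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Rebuild sentences from a token stream and its O/E labels."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    sentences: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "E":
            sentences.append(current)
            current = []
    if current:
        # Assumption: tokens after the last E form a final sentence.
        sentences.append(current)
    return sentences


gold = [["Harga", "naik", "."], ["Pasar", "tutup", "."]]
tokens, labels = label_tokens(gold)
restored = split_by_labels(tokens, labels)
```

A trained sequence labeler would predict the `labels` list from `tokens`; `split_by_labels` then turns those predictions back into sentences.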


Author(s):  
Asma Mekki ◽  
Inès Zribi ◽  
Mariem Ellouze ◽  
Lamia Hadrich Belguith

Author(s):  
Natalia Nikolenkova ◽  

The article deals with the punctuation of Church Slavonic texts in the mid-17th century and the formation, beginning at this time, of the sentence as a unit of written text. First, it is shown that the rules for marking a sentence boundary are described in the Grammar of 1648: the use of the dot sign and of a capital letter after it. Examining the translation of Blau’s Atlas made in Moscow in the second half of the 1650s, we find that both the translators and the scribes of the fair copies sought to comply with these new norms. The activity of the scribes influenced the punctuation of Church Slavonic printed books of the second half of the 17th century.


2020 ◽  
Vol 35 (10) ◽  
pp. 1239-1256 ◽  
Author(s):  
Anne Helder ◽  
Charles A. Perfetti ◽  
Paul van den Broek