A Context-Enhanced De-identification System

2022 ◽  
Vol 3 (1) ◽  
pp. 1-14
Author(s):  
Kahyun Lee ◽  
Mehmet Kayaalp ◽  
Sam Henry ◽  
Özlem Uzuner

Many modern entity recognition systems, including the current state-of-the-art de-identification systems, are based on bidirectional long short-term memory (biLSTM) units augmented by a conditional random field (CRF) sequence optimizer. These systems process the input sentence by sentence. This approach prevents the systems from capturing dependencies over sentence boundaries and makes accurate sentence boundary detection a prerequisite. Since sentence boundary detection can be problematic especially in clinical reports, where dependencies and co-references across sentence boundaries are abundant, these systems have clear limitations. In this study, we built a new system on the framework of one of the current state-of-the-art de-identification systems, NeuroNER, to overcome these limitations. This new system incorporates context embeddings through forward and backward n-grams without using sentence boundaries. Our context-enhanced de-identification (CEDI) system captures dependencies over sentence boundaries and bypasses the sentence boundary detection problem altogether. We enhanced this system with deep affix features and an attention mechanism to capture the pertinent parts of the input. The CEDI system outperforms NeuroNER on the 2006 i2b2 de-identification challenge dataset, the 2014 i2b2 shared task de-identification dataset, and the 2016 CEGS N-GRID de-identification dataset (p < 0.01). All datasets comprise narrative clinical reports in English but contain different note types varying from discharge summaries to psychiatric notes. Enhancing CEDI with deep affix features and the attention mechanism further increased performance.
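The abstract's core idea, context windows built from forward and backward n-grams over an unsegmented token stream, can be illustrated with a small sketch. This is an illustrative reading of the CEDI approach, not the authors' implementation; the function name and toy sentence are ours.

```python
# Sketch: context construction via forward/backward n-grams over a flat
# token stream, with no sentence segmentation step at all.

def context_ngrams(tokens, i, n):
    """Return (backward, forward) n-gram contexts for the token at index i.

    Contexts may cross sentence boundaries because the token stream is
    never segmented; contexts at the edges are simply truncated.
    """
    backward = tokens[max(0, i - n):i]
    forward = tokens[i + 1:i + 1 + n]
    return backward, forward

# Example: the context of "He" reaches back across a sentence boundary,
# which sentence-by-sentence systems would discard.
tokens = ["Patient", "was", "discharged", ".", "He", "follows", "up", "Monday"]
back, fwd = context_ngrams(tokens, tokens.index("He"), n=3)
```

Because no boundary detector is consulted, an error-prone segmentation step is removed from the pipeline entirely, which is the limitation the CEDI system targets.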

2018 ◽  
Vol 8 (5) ◽  
pp. 529-546
Author(s):  
Christofer Laurell ◽  
Sten Soderman

Purpose – The purpose of this paper is to provide a systematic review of articles on sport published in leading business studies journals within marketing, organisational studies and strategy. Design/methodology/approach – Based on a review of 38 identified articles within the subfields of marketing, strategy and organisation studies published between 2000 and 2015, the articles’ topical, theoretical and methodological orientation within the studied subfields were analysed, followed by a cross-subfield analysis. Findings – The authors identify considerable differences in topical, theoretical and methodological orientation among the studied subfields’ associated articles. Overall, the authors also find that articles across all subfields tend to be focussed on contributing to mature theory, even though the subfield of marketing in particular exhibits contributions to nascent theory in contrast to organisation studies and strategy. Originality/value – This paper contributes by illustrating the current state of research that is devoted or related to the phenomenon of sport within three subfields in business studies. Furthermore, the authors discuss the role played by leading business studies journals vis-à-vis sport sector-specific journals and offer avenues for future research.


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 45 ◽  
Author(s):  
Shardrom Johnson ◽  
Sherlock Shen ◽  
Yuanchen Liu

Named Entity Recognition (NER), which usually draws on linguistic features such as Part-Of-Speech (POS) tags, is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding, considering three aspects, namely character-embedding, word-embedding, and position-embedding, stitched in the order we give, and thus get their dependencies; based on this we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) model for the Chinese NER task. Passing the comprehensive-embedding through a Bidirectional Long Short-Term Memory (BiLSTM) layer captures the connection between historical and future information, and an attention mechanism then captures the connection between the content of the sentence at the current position and that at any other location. Finally, we utilize a Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that the proposed CWPC_BiAtt model is well qualified for the NER task on the Microsoft Research Asia (MSRA) dataset and the Weibo NER corpus. High precision and recall were obtained, which verifies the stability of the model. Position-embedding in the comprehensive-embedding compensates for the attention mechanism by providing position information for the otherwise unordered sequence, which shows that the comprehensive-embedding is complete. Looking at the entire model, our proposed CWPC_BiAtt has three distinct characteristics: completeness, simplicity, and stability. Our proposed CWPC_BiAtt model achieved the highest F-score, achieving state-of-the-art performance on the MSRA dataset and the Weibo NER corpus.
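The "comprehensive-embedding" the abstract describes is a fixed-order concatenation of three per-token vectors before the BiLSTM layer. A minimal sketch, with toy hand-written vectors standing in for the learned embeddings of the paper:

```python
# Sketch of "comprehensive-embedding": character-, word-, and position-level
# vectors are stitched together in a fixed order for each token before the
# BiLSTM layer. Dimensions and values here are illustrative only.

def comprehensive_embedding(char_vec, word_vec, pos_vec):
    """Concatenate the three embeddings in the fixed order char -> word -> position."""
    return char_vec + word_vec + pos_vec  # plain list concatenation

char_vec = [0.1, 0.2]        # character-level features of the token
word_vec = [0.3, 0.4, 0.5]   # word-level embedding
pos_vec = [1.0]              # position information for the attention layer

emb = comprehensive_embedding(char_vec, word_vec, pos_vec)
```

The position component matters because attention by itself is order-insensitive; concatenating it into the token representation is what the abstract calls the "completeness" of the embedding.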


2019 ◽  
Vol 55 (2) ◽  
pp. 239-269
Author(s):  
Michał Marcińczuk ◽  
Aleksander Wawer

Abstract In this article we discuss the current state of the art in named entity recognition for Polish. We present publicly available resources and open-source tools for named entity recognition. The overview includes various kinds of resources, i.e. guidelines, annotated corpora (NKJP, KPWr, CEN, PST) and lexicons (NELexiconS, PNET, Gazetteer). We present the major NER tools for Polish (Sprout, NERF, Liner2, Parallel LSTM-CRFs and PolDeepNer) and discuss their performance on the reference datasets. In the article we cover identification of named entity mentions in running text, local and global entity categorization, fine- and coarse-grained categorization, and lemmatization of proper names.


Author(s):  
Moemmur Shahzad ◽  
Ayesha Amin ◽  
Diego Esteves ◽  
Axel-Cyrille Ngonga Ngomo

We investigate the problem of named entity recognition in user-generated text such as social media posts. This task is rendered particularly difficult by the restricted length and limited grammatical coherence of this data type. Current state-of-the-art approaches rely on external sources such as gazetteers to alleviate some of these restrictions. We present a neural model able to outperform the state of the art on this task without resorting to gazetteers or similar external sources of information. Our approach relies on word-, character-, and sentence-level information for NER in short text. Social media posts like tweets often have associated images that may provide auxiliary context relevant to understanding these texts. Hence, we also incorporate visual information and introduce an attention component which computes attention weight probabilities over textual and text-relevant visual contexts separately. Our model outperforms the current state of the art on various NER datasets. On WNUT 2016 and 2017, our model achieved 53.48% and 50.52% F1 score, respectively. With the multimodal model, our system also outperforms the current state of the art with an F1 score of 74% on the multimodal dataset. Our evaluation further suggests that our model also goes beyond the current state of the art on newswire data, corroborating its suitability for various NER tasks.
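The attention component described here computes weight probabilities over textual and visual contexts separately, i.e. two independent softmaxes. A hedged sketch with toy relevance scores (the paper's scores are produced by learned layers, which are omitted here):

```python
import math

# Sketch: separate attention distributions over textual and visual contexts.
# Each context list gets its own softmax, so the two distributions are
# normalised independently, as the abstract describes.

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

text_scores = [2.0, 1.0, 0.5]    # toy relevance of each textual context item
visual_scores = [0.3, 1.7]       # toy relevance of each image region

text_attn = softmax(text_scores)      # probabilities over textual context
visual_attn = softmax(visual_scores)  # probabilities over visual context
```

Keeping the two distributions separate lets the model weight image regions without forcing text and image features to compete for the same probability mass.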


Author(s):  
Shengqiong Wu ◽  
Hao Fei ◽  
Yafeng Ren ◽  
Donghong Ji ◽  
Jingye Li

In this paper, we propose to enhance the pair-wise aspect and opinion terms extraction (PAOTE) task by incorporating rich syntactic knowledge. We first build a syntax fusion encoder for encoding syntactic features, including a label-aware graph convolutional network (LAGCN) for modeling dependency edges, labels, and POS tags in a unified manner, as well as a local-attention module encoding POS tags for better term boundary detection. During pairing, we then adopt Biaffine and Triaffine scoring for high-order aspect-opinion term pairing, while re-harnessing the syntax-enriched representations in the LAGCN for syntax-aware scoring. Experimental results on four benchmark datasets demonstrate that our model outperforms current state-of-the-art baselines while yielding explainable predictions based on syntactic knowledge.
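Biaffine scoring, the pairing mechanism named above, assigns each candidate (aspect, opinion) pair a score of the form s = a^T W o + b^T [a; o] + c. A pure-Python sketch with toy parameters; in the paper W, b and c are learned jointly with the encoder, and the triaffine variant extends this to three interacting representations:

```python
# Sketch of a biaffine pairing score between an aspect-term representation a
# and an opinion-term representation o:  s = a^T W o + b^T [a; o] + c.
# All parameter values below are illustrative toys.

def biaffine_score(a, o, W, b, c):
    """Compute a^T W o + b^T [a; o] + c using plain lists."""
    bilinear = sum(a[i] * W[i][j] * o[j]
                   for i in range(len(a)) for j in range(len(o)))
    linear = sum(x * w for x, w in zip(a + o, b))  # b^T [a; o]
    return bilinear + linear + c

a = [1.0, 0.0]            # aspect-term vector
o = [0.0, 1.0]            # opinion-term vector
W = [[0.2, 0.8],
     [0.5, 0.1]]          # bilinear interaction weights
b = [0.1, 0.1, 0.1, 0.1]  # linear weights over the concatenation
c = -0.05                 # bias

score = biaffine_score(a, o, W, b, c)
```

The bilinear term is what lets the score model interactions between the two representations, rather than judging each term independently.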


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Marco Humbel ◽  
Julianne Nyhan ◽  
Andreas Vlachidis ◽  
Kim Sloan ◽  
Alexandra Ortolja-Baird

Purpose – By mapping out the capabilities, challenges and limitations of named-entity recognition (NER), this article aims to synthesise the state of the art of NER in the context of the early modern research field and to inform discussions about the kind of resources, methods and directions that may be pursued to enrich the application of the technique going forward. Design/methodology/approach – Through an extensive literature review, this article maps out the current capabilities, challenges and limitations of NER and establishes the state of the art of the technique in the context of the early modern, digitally augmented research field. It also presents a new case study of NER research undertaken by Enlightenment Architectures: Sir Hans Sloane's Catalogues of his Collections (2016–2021), a Leverhulme-funded research project and collaboration between the British Museum and University College London, with contributing expertise from the British Library and the Natural History Museum. Findings – Currently, it is not possible to benchmark the capabilities of NER as applied to documents of the early modern period. The authors also draw attention to the situated nature of authority files and current conceptualisations of NER, leading them to the conclusion that more robust reporting and critical analysis of NER approaches and findings is required. Research limitations/implications – This article examines NER as applied to early modern textual sources, which are mostly studied by Humanists. As addressed in this article, detailed reporting of NER processes and outcomes is not necessarily valued by the disciplines of the Humanities, with the result that it can be difficult to locate relevant data and metrics in project outputs. The authors have tried to mitigate this by contacting the projects discussed in this paper directly, to further verify the details they report here. Practical implications – The authors suggest that a forum is needed where tools are evaluated according to community standards. Within the wider NER community, the MUC and CoNLL corpora are used for such experimental set-ups and are accompanied by a conference series, and may be seen as a useful model for this. The ultimate nature of such a forum must be discussed with the whole research community of the early modern domain. Social implications – NER is an algorithmic intervention that transforms data according to certain rules, patterns or training data, and it ultimately affects how the authors interpret the results. The creation, use and promotion of algorithmic technologies like NER is not a neutral process, and neither is their output. A more critical understanding of the role and impact of NER on early modern documents and research, and a sharper focus on some of the data- and human-centric aspects of NER routines that are currently overlooked, are called for in this paper. Originality/value – This article presents a state-of-the-art snapshot of NER, its applications and potential, in the context of early modern research. It also seeks to inform discussions about the kinds of resources, methods and directions that may be pursued to enrich the application of NER going forward. It draws attention to the situated nature of authority files and current conceptualisations of NER, and concludes that more robust reporting of NER approaches and findings is urgently required. The Appendix sets out a comprehensive summary of the digital tools and resources surveyed in this article.


2014 ◽  
Vol 27 (8) ◽  
pp. 1257-1264 ◽  
Author(s):  
John Dumay

Purpose – The purpose of this paper is to offer reflections and critique not only on the current state of the art of intellectual capital research (ICR) from an interdisciplinary accounting research (IAR) perspective, but also on its future directions. Design/methodology/approach – This paper offers a critical reflection based on the author's observations as an IC researcher, reviewer and editor. The author also supports the arguments with some evidence from research about IC research. Findings – The author argues that most ICR falls short of achieving “the most advanced level of knowledge and technology” of the art because it inherits flaws from prior research, thus threatening its legitimacy and impact. Research limitations/implications – The author argues that researchers need to go back to the methodological drawing board when designing IAR so that future research can achieve its full potential. To do so, researchers also need their research to be transformational, to engender change, and to be transdisciplinary, encompassing research beyond the current boundaries of accounting and management. Originality/value – The author identifies and introduces three research shortcuts that prevent ICR projects from being state of the art: copycat, Furphy and technophobic research. These shortcuts provide insights into why not all ICR is “state of the art”.


2019 ◽  
Vol 36 (4) ◽  
pp. 465-491 ◽  
Author(s):  
Gianluca Piero Maria Virgilio

Purpose The purpose of this paper is to provide the current state of knowledge about the Flash Crash. It has been one of the remarkable events of the decade and its causes are still a matter of debate. Design/methodology/approach This paper reviews the literature from the early days to the most recent findings, and critically compares the most important hypotheses about the possible causes of the crisis. Findings Among the causes of the Flash Crash, the literature has proposed the following: a large selling program triggering the sales wave; small but not negligible delays suffered by the exchange computers; the micro-structure of the financial markets; the price fall leading to margin cover and forced sales; some types of feedback loops leading to a downward price spiral; stop-loss orders coupled with scarce liquidity that triggered price reductions, in turn leading to further stop-loss activation; the use of Intermarket Sweep Orders, that is, orders that sacrificed the search for the best price for speed of execution; and dumb algorithms. Originality/value The findings are condensed into a set of policy implications and recommendations.


2019 ◽  
Vol 26 (6) ◽  
pp. 1505-1523 ◽  
Author(s):  
Peyman Badakhshan ◽  
Kieran Conboy ◽  
Thomas Grisold ◽  
Jan vom Brocke

Purpose Business Process Management (BPM) is key for successful organisational management. However, BPM techniques are often criticized for their inability to deal with continuous and significant change and uncertainty. Following recent calls to make BPM more agile and flexible towards change, this study presents the results of a systematic literature review (SLR) of agile concepts in BPM. Analysing and synthesising previous works and drawing on agility research in the field of IS, this paper introduces a framework for agile BPM. Integrating different components that define agility in the context of BPM, this framework offers a number of important implications. On the theoretical side, the authors argue that the concept of agile BPM departs in some important ways from traditional BPM research. This, in turn, points to various opportunities for future research. On the practical side, the authors suggest that emerging technologies, such as process mining, embody important features that help organisations to be more responsive to change. The paper aims to discuss these issues. Design/methodology/approach To assess the state of the art of agility in BPM research, the authors conducted an SLR. More specifically, the authors drew on the approach of vom Brocke et al. (2009, 2015), which consists of five steps: defining the scope of the review; conceptualising the topic; searching for literature; analysing and synthesising the literature; and developing a research agenda. Findings This study presents the results of a systematic review of agile concepts in BPM and proposes a resulting research framework that can be used to strengthen the concept of agile BPM, providing an agenda for research in this rapidly growing and increasingly necessary area of BPM.
Originality/value In this paper, the authors establish a shared understanding of agile BPM and develop an agile BPM framework that represents the current state as well as implications for research and practice in agile BPM.


Author(s):  
Greg Durrett ◽  
Dan Klein

We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.

