sequence labeling
Recently Published Documents


TOTAL DOCUMENTS: 273 (FIVE YEARS 153)

H-INDEX: 13 (FIVE YEARS 5)

Author(s):  
Xianwen Liao ◽  
Yongzhong Huang ◽  
Peng Yang ◽  
Lei Chen

In this article, we define a computable word segmentation unit, study its probability characteristics, and use them to establish an unsupervised statistical language model (SLM) for a new pre-trained sequence labeling framework. The proposed SLM is an optimization model whose objective is to maximize the total binding force of all candidate word segmentation units in a sentence, without annotated datasets or vocabularies. To solve the SLM, we design a recursive divide-and-conquer dynamic programming algorithm. We integrate the SLM with popular sequence labeling models and run experiments on Vietnamese word segmentation, part-of-speech tagging, and named entity recognition. The results show that the SLM improves the performance of these sequence labeling tasks. Using less than 10% of the training data and no dictionary, our sequence labeling framework outperforms the state-of-the-art Vietnamese word segmentation toolkit VnCoreNLP on the cross-dataset test. The SLM has no hyper-parameters to tune, is completely unsupervised, and is applicable to any other analytic language, so it adapts well to new domains.
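The optimization described above (choose the segmentation whose units have the largest total score) can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' recursive divide-and-conquer algorithm: `best_segmentation`, the `score` callback, the optional `max_len` cap, and the toy binding-force values are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Unit = Tuple[str, ...]


def best_segmentation(
    syllables: Sequence[str],
    score: Callable[[Unit], float],
    max_len: Optional[int] = None,
) -> Tuple[List[Unit], float]:
    """Return the segmentation that maximizes the summed unit scores.

    `score` stands in for the "binding force" of a candidate unit; how the
    paper computes it is not given in the abstract, so it is passed in.
    `max_len` is an optional cap added for this sketch only.
    """
    n = len(syllables)
    best = [float("-inf")] * (n + 1)  # best[i]: best total for syllables[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        lo = 0 if max_len is None else max(0, end - max_len)
        for start in range(lo, end):
            cand = best[start] + score(tuple(syllables[start:end]))
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Follow the back-pointers from the end of the sentence to recover units.
    units: List[Unit] = []
    end = n
    while end > 0:
        start = back[end]
        units.append(tuple(syllables[start:end]))
        end = start
    units.reverse()
    return units, best[n]


# Toy binding-force table (made-up numbers, for illustration only).
TOY_FORCE = {("học", "sinh"): 2.0, ("sinh", "học"): 1.5}


def toy_score(unit: Unit) -> float:
    return TOY_FORCE.get(unit, 0.0)


units, total = best_segmentation(["học", "sinh", "học", "sinh", "học"], toy_score)
```

With these toy scores, the ambiguous syllable sequence is split so that the two higher-scoring units win over the overlapping alternative.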


Author(s):  
Hu Feifei ◽  
Zeng Shibo ◽  
Hong Danke ◽  
Zhang Situo ◽  
Song Yongwei ◽  
...  

As the decision-making brain of power system operation, grid dispatching and control combines large volumes of data, mechanism analysis, operating procedures, and professional experience. This closely matches the direction of the new generation of artificial intelligence, which is characterized by data-driven and knowledge-guided approaches. However, current dispatching control still relies on experience and manual analysis. Because the control center's data are massive and diverse and there are no logical models linking the plans, control personnel must supply a large amount of experience and knowledge association themselves, which means repetitive mental labor and a relatively low level of intelligence. Therefore, deep learning is applied to learning power control knowledge, and a semantic understanding network based on deep Long Short-Term Memory (LSTM) is proposed. It uses sequence labeling to extract deep semantic information relating keywords to query questions, and it identifies the key information in a query to achieve fine-grained, precise querying. Experiments show that the proposed network outperforms previous methods: it achieves better performance in the joint extraction of fine-grained evaluation words and evaluation objects, extracts the key information and deep semantic information of query problems and corresponding cases, and enables power scheduling based on voice interaction. The model can be effectively applied in the field of power dispatching to solve a large number of problems in power dispatching and control.
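To make concrete how sequence labeling output becomes extracted key information, the sketch below decodes per-token BIO tags into labeled phrases. The BIO scheme, the function name `bio_to_spans`, and the example labels `DEVICE` and `TIME` are assumptions for illustration; the abstract does not state which tagging scheme or label set the network uses.

```python
from typing import List, Sequence, Tuple


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (label, phrase) pairs.

    "B-X" opens a span of type X, "I-X" continues it, and "O" is outside
    any span. A stray "I-X" with no open span of type X starts a new one.
    """
    spans: List[Tuple[str, str]] = []
    label = None
    start = 0
    # The trailing "O" sentinel closes a span that runs to the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, " ".join(tokens[start:i])))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


# Hypothetical query and tags, not taken from the paper's data.
query = ["show", "transformer", "load", "for", "today"]
tags = ["O", "B-DEVICE", "I-DEVICE", "O", "B-TIME"]
key_info = bio_to_spans(query, tags)
```

In a setup like this, the tagger predicts one tag per token and the decoded spans are what a downstream query component would consume.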


2021 ◽  
Vol 22 (S1) ◽  
Author(s):  
Ying Xiong ◽  
Shuai Chen ◽  
Buzhou Tang ◽  
Qingcai Chen ◽  
Xiaolong Wang ◽  
...  

Background: Biomedical named entity recognition (NER) is a fundamental task of biomedical text mining that finds the boundaries of entity mentions in biomedical text and determines their entity type. To accelerate the development of biomedical NER techniques in Spanish, the PharmaCoNER organizers launched a competition to recognize pharmacological substances, compounds, and proteins. Biomedical NER is usually treated as a sequence labeling task, and almost all state-of-the-art sequence labeling methods ignore the meaning of the different entity types. In this paper, we investigate methods for introducing the meaning of entity types into deep learning methods for biomedical NER and apply them to the PharmaCoNER 2019 challenge. The meaning of each entity type is represented by its definition information.
Materials and methods: We investigate how to use entity definition information in two kinds of methods: (1) SQuAD-style machine reading comprehension (MRC) methods, which treat entity definition information as the query and biomedical text as the context and predict answer spans as entities; (2) span-level one-pass (SOne) methods, which predict entity spans one type at a time and introduce entity type meaning through entity definition information. All models are trained and tested on the PharmaCoNER 2019 corpus, and their performance is evaluated by strict micro-averaged precision, recall, and F1-score.
Results: Entity definition information improves both the SQuAD-style MRC and SOne methods by about 0.003 in micro-averaged F1-score. The SQuAD-style MRC model using entity definition information as the query achieves the best performance, with a micro-averaged precision of 0.9225, a recall of 0.9050, and an F1-score of 0.9137. It outperforms the best model of the PharmaCoNER 2019 challenge by 0.0032 in F1-score. Compared with the state-of-the-art model without using manually-crafted features, our model obtains a 1% improvement in F1-score, which is significant. These results indicate that entity definition information is useful for deep learning methods on biomedical NER.
Conclusion: Our entity-definition-enhanced models achieve a state-of-the-art micro-averaged F1-score of 0.9137, which implies that entity definition information has a positive impact on biomedical NER. In the future, we will explore more entity definition information from knowledge graphs.
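The strict micro-averaged metrics named in this abstract can be sketched as follows. The assumption here is that "strict" means a predicted entity counts only when its start, end, and type all match a gold entity exactly; the function names, the `(start, end, type)` span representation, and the `CHEM`/`PROT` labels are illustrative and are not the challenge's official scorer.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(
    gold_docs: Iterable[Set[Span]], pred_docs: Iterable[Set[Span]]
) -> Tuple[float, float, float]:
    """Pool counts over all documents, then compute P, R, and F1 once."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # exact boundary and type match
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Placeholder spans: the second prediction has the right boundaries but the
# wrong type, so under strict matching it is both a false positive and a miss.
gold = [{(0, 9, "PROT"), (15, 22, "CHEM")}, {(3, 8, "CHEM")}]
pred = [{(0, 9, "PROT"), (15, 22, "PROT")}, {(3, 8, "CHEM")}]
p, r, f = micro_prf(gold, pred)
```

As a consistency check on the reported figures, `round(f1_from(0.9225, 0.9050), 4)` gives 0.9137, which matches the F1-score stated in the abstract.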


2021 ◽  
Author(s):  
Shubo Tian ◽  
Pengfei Yin ◽  
Hansi Zhang ◽  
Arslan Erdengasileng ◽  
Jiang Bian ◽  
...  

To enable electronic screening of eligible patients for clinical trials, free-text clinical trial eligibility criteria should be translated into a computable format. Natural language processing (NLP) techniques have the potential to automate this process. In this study, we explored a supervised multi-input multi-output (MIMO) sequence labeling model to parse eligibility criteria into combinations of fact and condition tuples. Our experiments on a small manually annotated training dataset showed that the MIMO framework with a BERT-based encoder using all the input sequences achieved an overall lenient-level AUROC of 0.61. Although the performance is suboptimal, representing eligibility criteria as logical and semantically clear tuples can potentially make the subsequent translation of these tuples into database queries more reliable.


Author(s):  
Minxiang Ye ◽  
Vladimir Stankovic ◽  
Lina Stankovic ◽  
Srdjan Lulic ◽  
Andras Anderla ◽  
...  

2021 ◽  
Author(s):  
Dongyub Lee ◽  
Byeongil Ko ◽  
Myeong Cheol Shin ◽  
Taesun Whang ◽  
Daniel Lee ◽  
...  

2021 ◽  
Author(s):  
Shunxiang Zhang ◽  
Han qing Xu ◽  
Guang li Zhu ◽  
Xiang Chen ◽  
Kuang Ching Li

New sentiment words in product reviews are valuable resources that directly reflect users' opinions. Processing data to extract new sentiment words can provide better information services for users and theoretical support for related research in edge computing. Traditional methods for extracting new sentiment words generally ignore context and syntactic information, which leads to low accuracy and recall. To tackle this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, the probability that a new word is a sentiment word is calculated using location rules derived from the sequence labeling result, and a candidate set of new sentiment words is obtained according to this probability. Then, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is collected through fine-grained filtering, which includes calculating Pointwise Mutual Information (PMI) and the difference coefficient of positive and negative corpus (DC-PNC). The experimental results demonstrate the effectiveness of the new sentiment words extracted by the proposed method, which clearly improve the accuracy and recall of sentiment analysis.
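The PMI quantity used in the filtering step can be illustrated with document-level co-occurrence counts, using PMI(a, b) = log2( p(a, b) / (p(a) p(b)) ). This is a minimal sketch under assumptions: the abstract does not say what a candidate word is paired with or how probabilities are estimated, so pairing a candidate with a known sentiment word and counting per-review co-occurrence are choices made here for illustration, as are the function name and toy reviews. DC-PNC is not sketched because its definition is not given in the abstract.

```python
import math
from typing import Iterable, Set


def pmi(word_a: str, word_b: str, docs: Iterable[Set[str]]) -> float:
    """Pointwise mutual information of two words over a set of documents.

    Each document is a set of its words, so probabilities are estimated as
    the fraction of documents containing a word (or both words).
    Returns -inf when the two words never co-occur.
    """
    docs = list(docs)
    n = len(docs)
    count_a = sum(1 for d in docs if word_a in d)
    count_b = sum(1 for d in docs if word_b in d)
    count_ab = sum(1 for d in docs if word_a in d and word_b in d)
    if n == 0 or count_ab == 0:
        return float("-inf")
    # p(a,b) / (p(a) * p(b)) simplifies to count_ab * n / (count_a * count_b).
    return math.log2(count_ab * n / (count_a * count_b))


# Toy reviews (made up): "crisp" is the candidate, "great" a known sentiment word.
reviews = [
    {"screen", "crisp", "great"},
    {"crisp", "great"},
    {"battery", "poor"},
    {"battery", "slow"},
]
association = pmi("crisp", "great", reviews)
```

A candidate that scores high against known sentiment words would be kept by a filter of this kind, while one that never co-occurs with them would be dropped.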


Author(s):  
Yaqing Wang ◽  
Subhabrata Mukherjee ◽  
Haoda Chu ◽  
Yuancheng Tu ◽  
Ming Wu ◽  
...  