Improvement in Domain-Specific Named Entity Recognition by Utilizing the Real-World Data

2017 ◽  
Vol 24 (5) ◽  
pp. 655-668
Author(s):  
Suzushi Tomori ◽  
Takashi Ninomiya ◽  
Shinsuke Mori
BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Tae-Hwan Kim ◽  
Hun Do Cho ◽  
Yong Won Choi ◽  
Hyun Woo Lee ◽  
Seok Yun Kang ◽  
...  

Abstract Background: Since the results of the ToGA trial were published, trastuzumab-based chemotherapy has been the standard first-line treatment for HER2-positive recurrent or primary metastatic gastric cancer (RPMGC). However, real-world data have rarely been reported. We therefore investigated the outcomes of trastuzumab-based chemotherapy in a single center. Methods: This study analyzed real-world data from 47 patients with HER2-positive RPMGC treated with trastuzumab-based chemotherapy at a single institution. Results: With a median follow-up of 18.8 months in survivors, the median overall survival (OS) and progression-free survival were 12.8 and 6.9 months, respectively, and the overall response rate was 64%. In multivariate analysis, Eastern Cooperative Oncology Group performance status 2 and a massive amount of ascites were independent poor prognostic factors for OS, while surgical resection before or after chemotherapy was associated with favorable OS. In addition, 5 patients who underwent conversion surgery after chemotherapy, all with R0 resection, demonstrated an encouraging median OS of 30.8 months. Conclusions: Trastuzumab-based chemotherapy in patients with HER2-positive RPMGC in the real world demonstrated outcomes almost comparable to those of the ToGA trial. Moreover, conversion surgery can be actively considered in fit patients with a favorable response after trastuzumab-based chemotherapy.


Named Entity Recognition is the process of identifying named entities, which act as designators in a sentence. These designators are domain specific. The proposed system identifies named entities in Malayalam text from the tourism domain, which generally include names of persons, places, organizations, dates, etc. The system uses word, part-of-speech, and lexicalized features to estimate the probability that a word belongs to a named entity category and to classify it accordingly. The probability is calculated by supervised machine learning from the word and part-of-speech features in a tagged training corpus, combined with rules based on lexicalized features.
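The abstract above does not say which learning method or rules the system uses. As a rough illustration only, the sketch below estimates tag probabilities by relative frequency over (word, part-of-speech) pairs in a tagged corpus, backs off to the part-of-speech alone for unseen words, and applies a lexical rule to unseen words first. The toy corpus (English-script stand-ins rather than Malayalam), the tag names, the suffix rule, and all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Toy tagged corpus of (word, part-of-speech, entity tag) triples.
# Invented for illustration; not taken from the paper.
TRAIN = [
    ("Kochi", "NNP", "PLACE"),
    ("Kochi", "NNP", "PLACE"),
    ("Munnar", "NNP", "PLACE"),
    ("Ravi", "NNP", "PERSON"),
    ("visited", "VB", "O"),
    ("beach", "NN", "O"),
]

# Hypothetical lexicalized rule: a word suffix that suggests an entity tag.
SUFFIX_RULES = {"puram": "PLACE"}


def train(corpus):
    """Count tags per (word, POS) pair and per POS alone."""
    by_word_pos = defaultdict(Counter)
    by_pos = defaultdict(Counter)
    for word, pos, tag in corpus:
        by_word_pos[(word, pos)][tag] += 1
        by_pos[pos][tag] += 1
    return by_word_pos, by_pos


def tag_probabilities(word, pos, model):
    """Relative-frequency estimate of P(tag | word, POS), backing off to P(tag | POS)."""
    by_word_pos, by_pos = model
    counts = by_word_pos.get((word, pos)) or by_pos.get(pos) or Counter()
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def classify(word, pos, model):
    """Apply suffix rules to unseen words, otherwise pick the most probable tag."""
    by_word_pos, _ = model
    if (word, pos) not in by_word_pos:
        for suffix, tag in SUFFIX_RULES.items():
            if word.lower().endswith(suffix):
                return tag
    probs = tag_probabilities(word, pos, model)
    return max(probs, key=probs.get) if probs else "O"


model = train(TRAIN)
print(classify("Kochi", "NNP", model))
print(classify("Ramapuram", "NN", model))
```

A real system would likely smooth the counts and use context features as well; this sketch only shows how corpus statistics and hand-written rules can be combined in one classifier.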


2020 ◽  
Author(s):  
Usman Naseem ◽  
Matloob Khushi ◽  
Vinay Reddy ◽  
Sakthivel Rajendran ◽  
Imran Razzak ◽  
...  

Abstract Background: In recent years, with the growing amount of biomedical documents, coupled with advancement in natural language processing algorithms, the research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging because (i) training data are often limited, (ii) an entity can refer to multiple types and concepts depending on its context, and (iii) the domain relies heavily on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), or BioALBERT, an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence to better learn context-dependent representations, and incorporated parameter reduction strategies to minimise memory usage and reduce training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. Performance increased for (i) disease-type corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chem-type corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein-type corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species-type corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), leading to state-of-the-art results. Conclusions: The performance of the proposed model on four different biomedical entity types shows that our model is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT models, which are available to the research community for use in future research.
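In ALBERT, the self-supervised loss that models inter-sentence coherence is sentence-order prediction: the model sees two consecutive text segments and must tell whether they appear in their original order or have been swapped. The abstract does not describe how BioALBERT prepares its training data, so the following is only a minimal sketch of how such training pairs could be built. The function name, the 50/50 swap probability, and the placeholder segments are assumptions for illustration.

```python
import random


def make_sop_examples(segments, rng):
    """Build (first, second, label) triples from consecutive text segments.

    label 1 means the pair is in its original order,
    label 0 means the two segments were swapped.
    """
    examples = []
    for first, second in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            examples.append((first, second, 1))  # keep original order
        else:
            examples.append((second, first, 0))  # swap to create a negative
    return examples


# Placeholder segments standing in for consecutive passages of one document.
segments = ["s1", "s2", "s3", "s4"]
examples = make_sop_examples(segments, random.Random(0))
for first, second, label in examples:
    print(label, first, second)
```

Because both positives and negatives come from the same document, the classifier cannot solve the task from topic cues alone and has to attend to the ordering of the discourse.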


2018 ◽  
Vol 44 (8) ◽  
pp. 1191-1198 ◽  
Author(s):  
Alberto Carmona-Bayonas ◽  
Paula Jiménez-Fonseca ◽  
Isabel Echavarria ◽  
Manuel Sánchez Cánovas ◽  
Gema Aguado ◽  
...  
