Feasibility of Feature-based Indexing, Clustering, and Search of Clinical Trials

Summary. Background: When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and ineffective indexing methods for reducing the trial search space. Objectives: This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. Methods: We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. Results: We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all 1,437 features, we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. Conclusions: It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.
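
The clustering and iterative filtering described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the trial names, the feature names, the Jaccard similarity, the 0.5 threshold, and the single-link grouping are not taken from the study, which does not state its similarity measure or clustering algorithm here.

```python
from itertools import combinations

# Hypothetical trials, each reduced to a set of required eligibility features.
trials = {
    "trial_a": {"female", "age>=18", "stage_iii", "her2_positive"},
    "trial_b": {"female", "age>=18", "stage_iii", "er_positive"},
    "trial_c": {"female", "age>=18", "stage_iii", "her2_positive", "chemo_naive"},
    "trial_d": {"postmenopausal", "er_positive", "chemo_naive"},
}

def jaccard(a, b):
    """Share of features two trials have in common (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster(trials, threshold=0.5):
    """Single-link grouping: trials whose similarity meets the threshold are merged."""
    parent = {name: name for name in trials}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for x, y in combinations(trials, 2):
        if jaccard(trials[x], trials[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in trials:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

def filter_trials(trials, feature, patient_has_it):
    """Answer one eligibility question and drop trials the answer rules out.

    A trial is dropped when it requires a feature the patient lacks.
    """
    if patient_has_it:
        return dict(trials)
    return {name: feats for name, feats in trials.items() if feature not in feats}

clusters = cluster(trials)

# Two questions asked in sequence shrink the candidate set step by step.
remaining = filter_trials(trials, "chemo_naive", False)
remaining = filter_trials(remaining, "er_positive", False)
```

With these made-up inputs, the three stage III trials end up in one cluster and `trial_d` in another, and the two answers leave only `trial_a`. The filter treats every feature as a positive requirement; exclusion criteria would need a richer representation.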

A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-59569-6_30 ◽

2017 ◽

pp. 254-259 ◽

Cited By ~ 2

Author(s):

Mete Taşpınar ◽

Murat Can Ganiz ◽

Tankut Acarman

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Word Embeddings ◽

Named Entity ◽

Simple Machine ◽

Machine Learning Approach ◽

Feature Based

A Word Similarity Feature-based Semi-supervised Approach for Named Entity Recognition

2019 International Conference on System Science and Engineering (ICSSE) ◽

10.1109/icsse.2019.8823539 ◽

2019 ◽

Author(s):

Ze Wang ◽

Zhongyang Han ◽

Jun Zhao ◽

Wei Wang ◽

Feng Jin

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Word Similarity ◽

Named Entity ◽

Feature Based

A Sequence-to-Set Network for Nested Named Entity Recognition

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/542 ◽

2021 ◽

Author(s):

Zeqi Tan ◽

Yongliang Shen ◽

Shuai Zhang ◽

Weiming Lu ◽

Yueting Zhuang

Keyword(s):

Language Processing ◽

Named Entity Recognition ◽

Recognition Task ◽

Search Space ◽

Entity Recognition ◽

Bipartite Matching ◽

Named Entity ◽

Proposed Model ◽

Fixed Set ◽

Sequence Method

Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on nested NER. Span-based methods, which treat entity recognition as a span classification task, can deal with nested entities naturally, but they suffer from a huge search space and a lack of interactions between entities. To address these issues, we propose a novel sequence-to-set neural network for nested NER. Instead of specifying candidate spans in advance, we provide a fixed set of learnable vectors to learn the patterns of the valuable spans. We utilize a non-autoregressive decoder to predict the final set of entities in one pass, in which we are able to capture dependencies between entities. Compared with the sequence-to-sequence method, our model is more suitable for such an unordered recognition task, as it is insensitive to the label order. In addition, we utilize a loss function based on bipartite matching to compute the overall training loss. Experimental results show that our proposed model achieves state-of-the-art results on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017. The code is available at https://github.com/zqtan1024/sequence-to-set.
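
The idea of a bipartite-matching loss over an unordered set of entities can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation (see their repository for that): the cost function, the `(start, end, label)` tuples, and the brute-force search over permutations are all assumptions chosen to keep the example dependency-free. It also assumes there are at least as many prediction slots as gold entities.

```python
from itertools import permutations

NO_ENTITY = None  # padding target for slots that should predict nothing

def pair_cost(pred, gold):
    """Toy cost: number of mismatched fields between a prediction and a target."""
    if pred is NO_ENTITY or gold is NO_ENTITY:
        # Matching "nothing" with "nothing" is free; otherwise pay the maximum.
        return 0 if pred is gold else 3
    return sum(p != g for p, g in zip(pred, gold))

def match(preds, golds):
    """Return (total cost, assignment) for the cheapest one-to-one matching.

    assignment[i] is the index of the padded gold target given to preds[i].
    Brute force over permutations, so only suitable for tiny illustrative sets;
    a practical system would use the Hungarian algorithm instead.
    """
    padded = list(golds) + [NO_ENTITY] * (len(preds) - len(golds))
    best = None
    for perm in permutations(range(len(padded))):
        cost = sum(pair_cost(preds[i], padded[j]) for i, j in enumerate(perm))
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best

# Hypothetical nested gold entities as (start, end, label) and three prediction slots.
golds = [(0, 4, "ORG"), (2, 4, "GPE")]
preds = [(2, 4, "GPE"), NO_ENTITY, (0, 4, "PER")]

cost, assignment = match(preds, golds)
cost_reordered, _ = match(preds, list(reversed(golds)))
```

Because the loss is computed after the cheapest matching is found, reordering the gold entities leaves the cost unchanged, which is the order-insensitivity the abstract refers to.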

A Probabilistic Feature Based Maximum Entropy Model for Chinese Named Entity Recognition

Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead - Lecture Notes in Computer Science ◽

10.1007/11940098_20 ◽

2006 ◽

pp. 189-196 ◽

Cited By ~ 2

Author(s):

Suxiang Zhang ◽

Xiaojie Wang ◽

Juan Wen ◽

Ying Qin ◽

Yixin Zhong

Keyword(s):

Maximum Entropy ◽

Named Entity Recognition ◽

Entity Recognition ◽

Maximum Entropy Model ◽

Entropy Model ◽

Named Entity ◽

Feature Based

Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00386 ◽

2021 ◽

Vol 9 ◽

pp. 586-604

Author(s):

Abbas Ghaddar ◽

Philippe Langlais ◽

Ahmad Rashid ◽

Mehdi Rezagholizadeh

Keyword(s):

Data Augmentation ◽

State Of The Art ◽

Contextual Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Context Aware ◽

Named Entity ◽

Feature Based ◽

Adversarial Training ◽

Novel Model

In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias of NER models. Our results indicate that all state-of-the-art models we tested show such a bias, with BERT fine-tuned models significantly outperforming feature-based (LSTM-CRF) ones on NRB despite having comparable (sometimes lower) performance on standard benchmarks. To mitigate this bias, we propose a novel model-agnostic training method that adds learnable adversarial noise to some entity mentions, thus forcing models to focus more strongly on the contextual signal and leading to significant gains on NRB. Combining it with two other training strategies, data augmentation and parameter freezing, leads to further gains.

A Detailed Analysis and Improvement of Feature-Based Named Entity Recognition for Turkish

Speech and Computer - Lecture Notes in Computer Science ◽

10.1007/978-3-030-26061-3_2 ◽

2019 ◽

pp. 9-19

Author(s):

Arda Akdemir ◽

Tunga Güngör

Keyword(s):

Detailed Analysis ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Feature Based

KLOSURE: Closing in on open–ended patient questionnaires with text mining

Journal of Biomedical Semantics ◽

10.1186/s13326-019-0215-3 ◽

2019 ◽

Vol 10 (S1) ◽

Cited By ~ 3

Author(s):

Irena Spasić ◽

David Owen ◽

Andrew Smith ◽

Kate Button

Keyword(s):

Feature Extraction ◽

Text Mining ◽

Clinical Decision Making ◽

Named Entity Recognition ◽

Clinical Decision ◽

Entity Recognition ◽

Free Text ◽

Feature Vectors ◽

Patient Questionnaires

Background: Knee injury and Osteoarthritis Outcome Score (KOOS) is an instrument used to quantify patients’ perceptions about their knee condition and associated problems. It is administered as a 42-item closed-ended questionnaire in which patients are asked to self-assess five outcomes: pain, other symptoms, activities of daily living, sport and recreation activities, and quality of life. We developed KLOG as a 10-item open-ended version of the KOOS questionnaire in an attempt to obtain deeper insight into patients’ opinions including their unmet needs. However, the open-ended nature of the questionnaire incurs analytical overhead associated with the interpretation of responses. The goal of this study was to automate such analysis. We implemented KLOSURE as a system for mining free-text responses to the KLOG questionnaire. It consists of two subsystems, one concerned with feature extraction and the other one concerned with classification of feature vectors. Feature extraction is performed by a set of four modules whose main functionalities are linguistic pre-processing, sentiment analysis, named entity recognition and lexicon lookup respectively. Outputs produced by each module are combined into feature vectors. The structure of feature vectors will vary across the KLOG questions. Finally, Weka, a machine learning workbench, was used for classification of feature vectors. Results: The precision of the system varied between 62.8 and 95.3%, whereas the recall varied from 58.3 to 87.6% across the 10 questions. The overall performance in terms of F-measure varied between 59.0 and 91.3% with an average of 74.4% and a standard deviation of 8.8. Conclusions: We demonstrated the feasibility of mining open-ended patient questionnaires. By automatically mapping free text answers onto a Likert scale, we can effectively measure the progress of rehabilitation over time. In comparison to traditional closed-ended questionnaires, our approach offers much richer information that can be utilised to support clinical decision making. In conclusion, we demonstrated how text mining can be used to combine the benefits of qualitative and quantitative analysis of patient experiences.
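
For readers unfamiliar with the reported metrics, the following Python sketch shows how precision, recall, and F-measure are computed per question and then summarised. The counts are hypothetical, and the abstract does not say how the scores were averaged over classes or whether the standard deviation is the sample or population form; the sample form is assumed here.

```python
from statistics import mean, stdev

def prf(tp, fp, fn):
    """Precision, recall and balanced F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical per-question counts: (true positives, false positives, false negatives).
counts = {"Q1": (8, 2, 2), "Q2": (6, 2, 4), "Q3": (9, 1, 3)}

scores = {question: prf(*c) for question, c in counts.items()}
f_values = [f for _, _, f in scores.values()]

average_f = mean(f_values)   # summary across questions
spread_f = stdev(f_values)   # sample standard deviation (an assumption)
```

The F-measure is the harmonic mean of precision and recall, so a question with high precision but low recall still receives a modest score.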

Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2019.103252 ◽

2019 ◽

Vol 96 ◽

pp. 103252 ◽

Cited By ~ 11

Author(s):

Yao Chen ◽

Changjiang Zhou ◽

Tianxin Li ◽

Hong Wu ◽

Xia Zhao ◽

...

Keyword(s):

Adverse Drug Event ◽

Named Entity Recognition ◽

Entity Recognition ◽

Drug Event ◽

Named Entity ◽

Lexical Feature ◽

Feature Based

Improving Feature Extraction in Named Entity Recognition Based on Maximum Entropy Model

2006 International Conference on Machine Learning and Cybernetics ◽

10.1109/icmlc.2006.258916 ◽

2006 ◽

Cited By ~ 3

Author(s):

Wei Jiang ◽

Yi Guan ◽

Xiao-long Wang

Keyword(s):

Feature Extraction ◽

Maximum Entropy ◽

Named Entity Recognition ◽

Entity Recognition ◽

Maximum Entropy Model ◽

Entropy Model ◽

Named Entity

A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/34/4/13163 ◽

2019 ◽

Vol 34 (4) ◽

pp. 311-321 ◽

Cited By ~ 1

Author(s):

Minh Quang Nhat Pham

Keyword(s):

Named Entity Recognition ◽

Recognition System ◽

Word Embedding ◽

Entity Recognition ◽

Shape Features ◽

Encoding Scheme ◽

Named Entity ◽

Sequence Labeling ◽

Feature Based ◽

Word Shape

In this report, we describe the named-entity recognition system we entered in the VLSP 2018 evaluation campaign. We formalized the task as a sequence labeling problem using the BIO encoding scheme. We applied a feature-based model which combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compared several methods for dealing with nested entities in the dataset. We showed that combining the tags of entities at all levels to train a single sequence labeling model (joint-tag model) improved the accuracy of nested named-entity recognition.
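
A minimal Python sketch can make BIO encoding, joint tags, and word-shape features concrete. The example sentence, the `+` separator between levels, and the exact shape alphabet (`X`, `x`, `d`, with repeats collapsed) are assumptions for illustration; the report's own tag format and feature definitions may differ.

```python
def word_shape(token):
    """Map characters to X (upper), x (lower), d (digit) and collapse repeats."""
    classes = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # keep punctuation as-is
        if not classes or classes[-1] != c:
            classes.append(c)
    return "".join(classes)

def bio_tags(n_tokens, spans):
    """BIO tags for one nesting level; spans are (start, end_exclusive, label)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags

def joint_tags(n_tokens, levels):
    """Join the BIO tags of every nesting level into one tag per token."""
    per_level = [bio_tags(n_tokens, spans) for spans in levels]
    return ["+".join(token_tags) for token_tags in zip(*per_level)]

# Hypothetical sentence with a location nested inside an organization.
tokens = ["University", "of", "Hanoi", "opened"]
levels = [
    [(0, 3, "ORG")],  # outer level
    [(2, 3, "LOC")],  # inner level
]

tags = joint_tags(len(tokens), levels)
shapes = [word_shape(t) for t in tokens]
```

Here `tags` is `["B-ORG+O", "I-ORG+O", "I-ORG+B-LOC", "O+O"]`, so a single sequence labeler sees both levels at once, at the price of a larger tag set.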
