Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model to make features more distinguishable. Soft Voting is applied to obtain the final classification of the ensemble model. The dataset is from the standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning was able to improve word embedding representation and the focal loss reduced the impact of data imbalance on model performance.
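
As a rough illustration of two components named in this abstract, the sketch below computes the focal loss for a single example and combines several models' class probabilities by soft voting. It is a minimal sketch, not the authors' implementation: the function names `focal_loss` and `soft_vote` are hypothetical, the defaults `gamma=2.0` and `alpha=1.0` are assumptions (the abstract gives no hyperparameters), and equal model weights are assumed for the vote.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities for the example
    target : index of the true class
    gamma  : focusing parameter (assumed default); 0 gives plain cross-entropy
    alpha  : class weight (assumed default of 1.0, i.e. unweighted)
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def soft_vote(prob_lists):
    """Average class-probability vectors from several models (equal weights).

    Returns (index of the class with the highest mean probability, mean vector).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg

# Three hypothetical base models scoring one criterion over three categories.
model_outputs = [
    [0.6, 0.4, 0.0],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
]
predicted_class, mean_probs = soft_vote(model_outputs)
```

Because the factor (1 - p_t)**gamma shrinks toward zero for confidently correct predictions, well-classified examples from frequent categories contribute little to the loss, which is why the technique is commonly used for imbalanced data.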

2008 ◽  
Vol 18 (1) ◽  
pp. 123-138 ◽  
Author(s):  
Milos Radovanovic ◽  
Mirjana Ivanovic

Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers - Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web-page. Different transformations of input data: stemming, normalization, logtf and idf, together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics - accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation which corresponds to each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.
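
The logtf, idf and normalization transformations mentioned above can be sketched as follows. The abstract does not give the exact formulas used in the study, so this example assumes common textbook variants: logtf as log(1 + tf), idf as log(N / df), and normalization to unit Euclidean length. The function name `logtf_idf` and the whitespace tokenizer are hypothetical simplifications.

```python
import math
from collections import Counter

def logtf_idf(docs, normalize=True):
    """Build bag-of-words vectors weighted by log(1 + tf) * log(N / df).

    docs      : list of strings, tokenized here by lowercasing and splitting
    normalize : if True, scale each vector to unit Euclidean length
    Returns one {term: weight} dictionary per document.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: math.log(1 + c) * math.log(n_docs / df[t]) for t, c in tf.items()}
        if normalize:
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors

vectors = logtf_idf(["web search results", "web page topics", "search engine"])
```

Under these assumed formulas, a term that occurs in every document receives weight zero, and rarer terms such as "results" outweigh more common ones such as "web" within the same document.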


2020 ◽  
Author(s):  
Eudocia Q Lee ◽  
Michael Weller ◽  
Joohee Sul ◽  
Stephen J Bagley ◽  
Solmaz Sahebjam ◽  
...  

Abstract Building on an initiative to enhance clinical trial participation involving the Society for Neuro-Oncology, the Response Assessment in Neuro-Oncology Working Group, patient advocacy groups, clinical trial cooperative groups, and other partners, we evaluate the impact of eligibility criteria and trial conduct on neuro-oncology clinical trial participation. Clinical trials often carry forward eligibility criteria from prior studies that may be overly restrictive and unnecessary and needlessly limit patient accrual. Inclusion and exclusion criteria should be evaluated based on the goals and design of the study and whether they impact patient safety and/or treatment efficacy. In addition, we evaluate clinical trial conduct as a barrier to accrual and discuss strategies to minimize such barriers for neuro-oncology trials.


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 98 ◽  
Author(s):  
Krzysztof Kamycki ◽  
Tomasz Kapuscinski ◽  
Mariusz Oszust

In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out constraining the warping path and reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. In this work, the comparative evaluation of the proposed augmentation method against related techniques on representative multivariate time-series datasets is presented. The performance of methods is examined using the nearest neighbor classifier with the dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on the classification performance is investigated, taking into account entire datasets and cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of the obtained classification accuracy.
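
For readers unfamiliar with the alignment step referred to above, the following is a minimal sketch of dynamic time warping with a band constraint on the warping path. It is not the proposed augmentation method; it only shows how restricting the path reduces its flexibility. The function name `dtw_distance`, the `window` parameter, and the restriction to univariate series with absolute-difference cost are assumptions made for brevity.

```python
def dtw_distance(a, b, window=None):
    """DTW distance between two univariate series with an optional band.

    window : maximum allowed |i - j| along the warping path; None means
             unconstrained. It is widened to |len(a) - len(b)| when needed
             so that a valid path always exists.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

early_peak = [0, 1, 0, 0, 0]
late_peak = [0, 0, 0, 1, 0]
free = dtw_distance(late_peak, early_peak)            # path may shift the peak
banded = dtw_distance(late_peak, early_peak, window=0)  # diagonal path only
```

With no constraint the two peaks can be aligned exactly, while a band of width zero forces a point-by-point comparison, so the constrained alignment is deliberately suboptimal.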


2021 ◽  
Vol 0 (0) ◽  
pp. 0-0
Author(s):  
Lauren S. Singer ◽  
Alexander Z. Feldman ◽  
Robin A. Buerki ◽  
Craig M. Horbinski ◽  
Rimas V. Lukas ◽  
...  

2021 ◽  
Author(s):  
Jiali Nie ◽  
Wenke Tan ◽  
Houmei Zhang

Abstract Automatic modulation classification (AMC) plays an increasingly vital role in cognitive radio (CR), cognitive electronic warfare, and other areas. It aims at classifying the modulation modes of the received signals accurately and provides a guarantee for the subsequent detailed parameter identification. Deep learning (DL) methods allow the computer to automatically learn the pattern features and integrate features into the process of building the model, thereby reducing the incompleteness caused by manually designed features. At the same time, DL methods have been applied in the AMC field owing to their powerful ability to process complex data and have achieved excellent performance in recent years. In this paper, we propose a deep ensemble learning AMC network, which uses a multi-model ensemble method to fuse multiple DL features. Specifically, different DL models are integrated by ensemble learning, which enhances the learning ability of the single model. With the proposed ensemble model trained on a measured wireless signal dataset, we conclude that the ensemble structure of Inception and CLDNN can fuse spatial features and temporal features, and achieve state-of-the-art performance in AMC tasks. Besides, the impact of the inphase/quadrature (I/Q) sample length on wireless signals is further investigated, and we find that the classification accuracy of the deep ensemble model is improved by 0.7% to 10% compared to the single model under various sample lengths. In addition, we visualize convergence clustering with t-distributed stochastic neighbor embedding (t-SNE), and the visualization results show that the deep ensemble model has a stronger clustering ability than a single model.


10.2196/17832 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e17832
Author(s):  
Kun Zeng ◽  
Zhiwei Pan ◽  
Yibin Xu ◽  
Yingying Qu

Background Eligibility criteria are the main strategy for screening appropriate participants for clinical trials. Automatic analysis of clinical trial eligibility criteria by digital screening, leveraging natural language processing techniques, can improve recruitment efficiency and reduce the costs involved in promoting clinical research. Objective We aimed to create a natural language processing model to automatically classify clinical trial eligibility criteria. Methods We proposed a classifier for short text eligibility criteria based on ensemble learning, where a set of pretrained models was integrated. The pretrained models included state-of-the-art deep learning methods for training and classification, including Bidirectional Encoder Representations from Transformers (BERT), XLNet, and A Robustly Optimized BERT Pretraining Approach (RoBERTa). The classification results by the integrated models were combined as new features for training a Light Gradient Boosting Machine (LightGBM) model for eligibility criteria classification. Results Our proposed method obtained an accuracy of 0.846, a precision of 0.803, and a recall of 0.817 on a standard data set from a shared task of an international conference. The macro F1 value was 0.807, outperforming the state-of-the-art baseline methods on the shared task. Conclusions We designed a model for screening short text classification criteria for clinical trials based on multimodel ensemble learning. Through experiments, we concluded that performance was improved significantly with a model ensemble compared to a single model. The introduction of focal loss could reduce the impact of class imbalance to achieve better performance.


2020 ◽  
Vol 6 (4) ◽  
pp. 24
Author(s):  
Michalis Giannopoulos ◽  
Anastasia Aidini ◽  
Anastasia Pentari ◽  
Konstantina Fotiadou ◽  
Panagiotis Tsakalides

Multispectral sensors constitute a core Earth observation image technology generating massive high-dimensional observations. To address the communication and storage constraints of remote sensing platforms, lossy data compression becomes necessary, but it unavoidably introduces unwanted artifacts. In this work, we consider the encoding of multispectral observations into high-order tensor structures which can naturally capture multi-dimensional dependencies and correlations, and we propose a resource-efficient compression scheme based on quantized low-rank tensor completion. The proposed method is also applicable to the case of missing observations due to environmental conditions, such as cloud cover. To quantify the performance of compression, we consider both typical image quality metrics as well as the impact on state-of-the-art deep learning-based land-cover classification schemes. Experimental analysis on observations from the ESA Sentinel-2 satellite reveals that even minimal compression can have negative effects on classification performance which can be efficiently addressed by our proposed recovery scheme.


2017 ◽  
Vol 24 (4) ◽  
pp. 781-787 ◽  
Author(s):  
Kevin Zhang ◽  
Dina Demner-Fushman

Abstract Objective: To develop automated classification methods for eligibility criteria in ClinicalTrials.gov to facilitate patient-trial matching for specific populations such as persons living with HIV or pregnant women. Materials and Methods: We annotated 891 interventional cancer trials from ClinicalTrials.gov based on their eligibility for human immunodeficiency virus (HIV)-positive patients using their eligibility criteria. These annotations were used to develop classifiers based on regular expressions and machine learning (ML). After evaluating classification of cancer trials for eligibility of HIV-positive patients, we sought to evaluate the generalizability of our approach to more general diseases and conditions. We annotated the eligibility criteria for 1570 of the most recent interventional trials from ClinicalTrials.gov for HIV-positive and pregnancy eligibility, and the classifiers were retrained and reevaluated using these data. Results: On the cancer-HIV dataset, the baseline regex model, the bag-of-words ML classifier, and the ML classifier with named entity recognition (NER) achieved macro-averaged F2 scores of 0.77, 0.87, and 0.87, respectively; the addition of NER did not result in a significant performance improvement. On the general dataset, ML + NER achieved macro-averaged F2 scores of 0.91 and 0.85 for HIV and pregnancy, respectively. Discussion and Conclusion: The eligibility status of specific patient populations, such as persons living with HIV and pregnant women, for clinical trials is of interest to both patients and clinicians. We show that it is feasible to develop a high-performing, automated trial classification system for eligibility status that can be integrated into consumer-facing search engines as well as patient-trial matching systems.
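
The macro-averaged F2 score reported above weights recall more heavily than precision. The sketch below uses the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), averaged with equal weight over classes. The function names `f_beta` and `macro_f_beta` are hypothetical, the labels in the example are invented for illustration, and the abstract does not state the authors' exact averaging procedure, so this should be read as one plausible computation.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def macro_f_beta(y_true, y_pred, beta=2.0):
    """Unweighted mean of per-class F-beta scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)

# Invented labels for illustration only.
truth = ["eligible", "eligible", "ineligible", "ineligible"]
preds = ["eligible", "ineligible", "ineligible", "ineligible"]
score = macro_f_beta(truth, preds)
```

With beta = 2, a classifier with precision 0.5 and recall 1.0 scores higher than one with precision 1.0 and recall 0.5, which suits a screening task where missing an eligible trial is costlier than a false alarm.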

