Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Abstract Background Eligibility criteria are the primary means of screening target participants for a clinical trial. Automated classification of clinical trial eligibility criteria text with machine learning methods can improve recruitment efficiency and reduce the cost of clinical research. However, existing methods suffer from poor classification performance because eligibility criteria text data are complex and imbalanced. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embeddings of each base model so that features of different classes are easier to distinguish. Soft voting is applied to produce the final classification of the ensemble model. The dataset comes from evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that the improvement was statistically significant. Conclusions A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
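
As a rough illustration of two components named in this abstract, the sketch below shows a per-example focal loss and soft voting over class probabilities. It is not the authors' implementation: the function names, the default `gamma=2.0` and `alpha=1.0`, and the toy probabilities are all assumptions chosen for the example.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    probs is a list of predicted class probabilities and target is the index
    of the true class. gamma and alpha are illustrative defaults (assumed).
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def soft_vote(model_probs):
    """Average per-class probabilities across models.

    Returns (index of the highest averaged probability, averaged probabilities).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg

# Toy example: two of three models narrowly prefer class 1, but the
# averaged probabilities favour class 0, so soft voting picks class 0.
label, avg = soft_vote([[0.9, 0.1, 0.0], [0.4, 0.6, 0.0], [0.4, 0.6, 0.0]])
print(label)  # 0

# A well-classified example contributes far less loss than a misclassified one.
print(focal_loss([0.9, 0.1], 0) < focal_loss([0.2, 0.8], 0))  # True
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy, which is one way to see that the `(1 - p_t)**gamma` factor only down-weights examples the model already classifies confidently.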


Document representations for classification of short web-page descriptions

Yugoslav journal of operations research ◽

10.2298/yjor0801123r ◽

2008 ◽

Vol 18 (1) ◽

pp. 123-138 ◽

Cited By ~ 1

Author(s):

Milos Radovanovic ◽

Mirjana Ivanovic

Keyword(s):

Text Categorization ◽

Web Search ◽

Classification Performance ◽

Document Representation ◽

Web Page ◽

Naive Bayes ◽

Large Hierarchy ◽

The Impact ◽

Document Representations

Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers: Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web-page. Different transformations of input data (stemming, normalization, logtf and idf), together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics: accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation which corresponds to each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.
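
To make the transformations named above concrete, here is a minimal sketch of a bag-of-words vectorizer with optional logtf, idf, and normalization steps. The exact formulas are assumptions (logtf as `log(1 + tf)`, idf as `log(N / df)`, normalization as division by the Euclidean norm), whitespace tokenization is a simplification, and stemming and dimensionality reduction are left out.

```python
import math
from collections import Counter

def bow_vectors(docs, use_logtf=True, use_idf=True, normalize=True):
    """Build bag-of-words vectors with optional weighting steps.

    Returns (vocab, vectors), where each vector is aligned with the sorted vocab.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = []
        for term in vocab:
            w = float(tf[term])
            if use_logtf:
                w = math.log(1.0 + w)             # dampen repeated terms
            if use_idf:
                w *= math.log(n_docs / df[term])  # down-weight common terms
            vec.append(w)
        if normalize:
            norm = math.sqrt(sum(x * x for x in vec))
            if norm > 0:
                vec = [x / norm for x in vec]     # scale to unit length
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = bow_vectors(["web search results", "web page topics", "search engine"])
print(vocab)
```

Each flag can be toggled independently, which mirrors the study's focus on the effect of individual transformations rather than on a single best representation.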


Optimizing eligibility criteria and clinical trial conduct to enhance clinical trial participation for primary brain tumor patients

Neuro-Oncology ◽

10.1093/neuonc/noaa015 ◽

2020 ◽

Author(s):

Eudocia Q Lee ◽

Michael Weller ◽

Joohee Sul ◽

Stephen J Bagley ◽

Solmaz Sahebjam ◽

...

Keyword(s):

Clinical Trial ◽

Response Assessment ◽

Primary Brain Tumor ◽

Trial Participation ◽

Cooperative Groups ◽

Clinical Trial Participation ◽

Eligibility Criteria ◽

Trial Conduct ◽

The Impact ◽

Clinical Trial Conduct

Abstract Building on an initiative to enhance clinical trial participation involving the Society for Neuro-Oncology, the Response Assessment in Neuro-Oncology Working Group, patient advocacy groups, clinical trial cooperative groups, and other partners, we evaluate the impact of eligibility criteria and trial conduct on neuro-oncology clinical trial participation. Clinical trials often carry forward eligibility criteria from prior studies that may be overly restrictive and unnecessary and needlessly limit patient accrual. Inclusion and exclusion criteria should be evaluated based on the goals and design of the study and whether they impact patient safety and/or treatment efficacy. In addition, we evaluate clinical trial conduct as a barrier to accrual and discuss strategies to minimize such barriers for neuro-oncology trials.


Data Augmentation with Suboptimal Warping for Time-Series Classification

Sensors ◽

10.3390/s20010098 ◽

2019 ◽

Vol 20 (1) ◽

pp. 98 ◽

Cited By ~ 3

Author(s):

Krzysztof Kamycki ◽

Tomasz Kapuscinski ◽

Mariusz Oszust

Keyword(s):

Time Series ◽

Data Augmentation ◽

Nearest Neighbor ◽

Multivariate Time Series ◽

Metric Learning ◽

Classification Performance ◽

Training Dataset ◽

Time Series Classification ◽

Extensive Evaluation ◽

The Impact

In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out constraining the warping path and reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. In this work, the comparative evaluation of the proposed augmentation method against related techniques on representative multivariate time-series datasets is presented. The performance of methods is examined using the nearest neighbor classifier with the dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on the classification performance is investigated, taking into account entire datasets and cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of the obtained classification accuracy.
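
For readers unfamiliar with the NN-DTW baseline mentioned above, the following sketch computes dynamic time warping between two univariate series of possibly different lengths, with an optional band that constrains the warping path, and uses it in a 1-nearest-neighbor classifier. It does not reproduce the paper's augmentation method; the `window` parameter, the absolute-difference local cost, and the univariate setting are assumptions made to keep the example short.

```python
def dtw_distance(a, b, window=None):
    """DTW distance between sequences a and b.

    window limits how far the warping path may stray from the diagonal
    (None means unconstrained). The band is widened to at least
    |len(a) - len(b)| so that a valid path always exists.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nn_dtw_predict(query, train_series, train_labels, window=None):
    """Return the label of the training series closest to query under DTW."""
    best = min(range(len(train_series)),
               key=lambda k: dtw_distance(query, train_series[k], window))
    return train_labels[best]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(nn_dtw_predict([0, 1, 1], [[0, 0, 1], [5, 5, 6]], ["low", "high"]))  # low
```

A smaller `window` reduces the flexibility of the alignment, which is the general idea behind constraining the warping path, although the specific constraint used in the paper may differ.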


Automated classification of fauna in seabed photographs: the impact of training and validation dataset size, with considerations for the class imbalance

Progress In Oceanography ◽

10.1016/j.pocean.2021.102612 ◽

2021 ◽

pp. 102612

Author(s):

Jennifer M. Durden ◽

Brett Hosking ◽

Brian J. Bett ◽

Danelle Cline ◽

Henry A. Ruhl

Keyword(s):

Class Imbalance ◽

Validation Dataset ◽

Automated Classification ◽

Dataset Size ◽

The Impact


The impact of the molecular classification of glioblastoma on the interpretation of therapeutic clinical trial results

Chinese Clinical Oncology ◽

10.21037/cco-21-33 ◽

2021 ◽

Vol 0 (0) ◽

pp. 0-0

Author(s):

Lauren S. Singer ◽

Alexander Z. Feldman ◽

Robin A. Buerki ◽

Craig M. Horbinski ◽

Rimas V. Lukas ◽

...

Keyword(s):

Clinical Trial ◽

Molecular Classification ◽

Clinical Trial Results ◽

The Impact


Deep Ensemble Learning for Automatic Modulation Classification

10.21203/rs.3.rs-927161/v1 ◽

2021 ◽

Author(s):

Jiali Nie ◽

Wenke Tan ◽

Houmei Zhang

Keyword(s):

Ensemble Learning ◽

Vital Role ◽

Learning Ability ◽

Sample Length ◽

Complex Data ◽

Ensemble Model ◽

Modulation Classification ◽

Single Model ◽

Automatic Modulation Classification ◽

The Impact

Abstract Automatic modulation classification (AMC) plays an increasingly vital role in cognitive radio (CR), cognitive electronic warfare, and other areas. It aims to classify the modulation modes of received signals accurately and provides a basis for the subsequent detailed parameter identification. Deep learning (DL) methods allow the computer to learn pattern features automatically and integrate feature learning into the process of building the model, thereby reducing the incompleteness caused by hand-designed features. DL methods have also been applied in the AMC field because of their powerful ability to process complex data, and they have achieved excellent performance in recent years. In this paper, we propose a deep ensemble learning AMC network, which uses a multi-model ensemble method to fuse multiple DL features. Specifically, different DL models are integrated by ensemble learning, which enhances the learning ability of the single model. With the proposed ensemble model trained on a measured wireless signal dataset, we conclude that the ensemble structure of Inception and CLDNN can fuse spatial and temporal features and achieve state-of-the-art performance in AMC tasks. In addition, we investigate the impact of the in-phase/quadrature (I/Q) sample length on wireless signals and find that the classification accuracy of the deep ensemble model is 0.7% to 10% higher than that of the single model across sample lengths. We also visualize convergence clustering with t-distributed stochastic neighbor embedding (t-SNE), and the visualization results show that the deep ensemble model has a stronger clustering ability than a single model.


An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation

JMIR Medical Informatics ◽

10.2196/17832 ◽

2020 ◽

Vol 8 (7) ◽

pp. e17832

Author(s):

Kun Zeng ◽

Zhiwei Pan ◽

Yibin Xu ◽

Yingying Qu

Keyword(s):

Clinical Trial ◽

Clinical Trials ◽

Natural Language Processing ◽

Ensemble Learning ◽

Language Processing ◽

Text Classification ◽

State Of The Art ◽

Shared Task ◽

Eligibility Criteria ◽

Short Text

Background Eligibility criteria are the main strategy for screening appropriate participants for clinical trials. Automatic analysis of clinical trial eligibility criteria by digital screening, leveraging natural language processing techniques, can improve recruitment efficiency and reduce the costs involved in promoting clinical research. Objective We aimed to create a natural language processing model to automatically classify clinical trial eligibility criteria. Methods We proposed a classifier for short text eligibility criteria based on ensemble learning, where a set of pretrained models was integrated. The pretrained models included state-of-the-art deep learning methods for training and classification, including Bidirectional Encoder Representations from Transformers (BERT), XLNet, and A Robustly Optimized BERT Pretraining Approach (RoBERTa). The classification results by the integrated models were combined as new features for training a Light Gradient Boosting Machine (LightGBM) model for eligibility criteria classification. Results Our proposed method obtained an accuracy of 0.846, a precision of 0.803, and a recall of 0.817 on a standard data set from a shared task of an international conference. The macro F1 value was 0.807, outperforming the state-of-the-art baseline methods on the shared task. Conclusions We designed a model for screening short text classification criteria for clinical trials based on multimodel ensemble learning. Through experiments, we concluded that performance was improved significantly with a model ensemble compared to a single model. The introduction of focal loss could reduce the impact of class imbalance to achieve better performance.
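
The accuracy, precision, recall, and macro F1 figures reported above can be computed along the following lines. This is a sketch under assumptions: precision, recall, and F1 are each computed per class and then averaged with equal class weights, and the shared task's official scorer may differ in details such as how empty classes are handled.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Every class counts equally in the macro averages, regardless of size.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }

scores = macro_scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(scores)
```

Because small classes weigh as much as large ones in a macro average, this metric is sensitive to the class imbalance that the abstract mentions, which accuracy alone can hide.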


Classification of Compressed Remote Sensing Multispectral Images via Convolutional Neural Networks

Journal of Imaging ◽

10.3390/jimaging6040024 ◽

2020 ◽

Vol 6 (4) ◽

pp. 24

Author(s):

Michalis Giannopoulos ◽

Anastasia Aidini ◽

Anastasia Pentari ◽

Konstantina Fotiadou ◽

Panagiotis Tsakalides

Keyword(s):

Remote Sensing ◽

Classification Performance ◽

Low Rank ◽

Multispectral Images ◽

Negative Effects ◽

Compression Scheme ◽

Sensing Platforms ◽

The Impact ◽

And Storage

Multispectral sensors constitute a core Earth observation image technology generating massive high-dimensional observations. To address the communication and storage constraints of remote sensing platforms, lossy data compression becomes necessary, but it unavoidably introduces unwanted artifacts. In this work, we consider the encoding of multispectral observations into high-order tensor structures, which can naturally capture multi-dimensional dependencies and correlations, and we propose a resource-efficient compression scheme based on quantized low-rank tensor completion. The proposed method is also applicable to the case of missing observations due to environmental conditions, such as cloud cover. To quantify the performance of compression, we consider both typical image quality metrics and the impact on state-of-the-art deep learning-based land-cover classification schemes. Experimental analysis on observations from the ESA Sentinel-2 satellite reveals that even minimal compression can have negative effects on classification performance, which can be efficiently addressed by our proposed recovery scheme.


Automated classification of eligibility criteria in clinical trials to facilitate patient-trial matching for specific patient populations

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocw176 ◽

2017 ◽

Vol 24 (4) ◽

pp. 781-787 ◽

Cited By ~ 7

Author(s):

Kevin Zhang ◽

Dina Demner-Fushman

Keyword(s):

Clinical Trials ◽

Pregnant Women ◽

Hiv Positive ◽

Automated Classification ◽

Eligibility Criteria ◽

Specific Patient ◽

Persons Living With Hiv ◽

Cancer Trials ◽

Living With Hiv

Abstract Objective: To develop automated classification methods for eligibility criteria in ClinicalTrials.gov to facilitate patient-trial matching for specific populations such as persons living with HIV or pregnant women. Materials and Methods: We annotated 891 interventional cancer trials from ClinicalTrials.gov based on their eligibility for human immunodeficiency virus (HIV)-positive patients using their eligibility criteria. These annotations were used to develop classifiers based on regular expressions and machine learning (ML). After evaluating classification of cancer trials for eligibility of HIV-positive patients, we sought to evaluate the generalizability of our approach to more general diseases and conditions. We annotated the eligibility criteria for 1570 of the most recent interventional trials from ClinicalTrials.gov for HIV-positive and pregnancy eligibility, and the classifiers were retrained and reevaluated using these data. Results: On the cancer-HIV dataset, the baseline regex model, the bag-of-words ML classifier, and the ML classifier with named entity recognition (NER) achieved macro-averaged F2 scores of 0.77, 0.87, and 0.87, respectively; the addition of NER did not result in a significant performance improvement. On the general dataset, ML + NER achieved macro-averaged F2 scores of 0.91 and 0.85 for HIV and pregnancy, respectively. Discussion and Conclusion: The eligibility status of specific patient populations, such as persons living with HIV and pregnant women, for clinical trials is of interest to both patients and clinicians. We show that it is feasible to develop a high-performing, automated trial classification system for eligibility status that can be integrated into consumer-facing search engines as well as patient-trial matching systems.
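
The F2 score used above weights recall more heavily than precision. The sketch below shows the general F-beta formula together with a toy regular-expression check for HIV mentions. The pattern and the helper name `mentions_hiv` are hypothetical and are not taken from the paper; a usable rule-based classifier would also need to tell inclusion from exclusion criteria and handle negation.

```python
import re

# Hypothetical pattern for illustration only (not the authors' regex).
HIV_PATTERN = re.compile(r"\b(hiv|human immunodeficiency virus)\b", re.IGNORECASE)

def mentions_hiv(criteria_text):
    """Return True if the criteria text mentions HIV at all."""
    return bool(HIV_PATTERN.search(criteria_text))

def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 favours recall, beta = 1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(mentions_hiv("Known HIV-positive status"))   # True
print(mentions_hiv("Pregnant or nursing women"))   # False

# The same precision/recall pair scores higher under F2 when recall is the larger value.
print(f_beta(0.5, 1.0) > f_beta(1.0, 0.5))  # True
```

A macro-averaged F2 would apply `f_beta` to each class's precision and recall and then take the unweighted mean across classes.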


Statistical analysis of the impact of distortion (correction) on an automated classification of celiac disease

2011 17th International Conference on Digital Signal Processing (DSP) ◽

10.1109/icdsp.2011.6004900 ◽

2011 ◽

Cited By ~ 12

Author(s):

M. Liedlgruber ◽

A. Uhl ◽

A. Vecsei

Keyword(s):

Celiac Disease ◽

Statistical Analysis ◽

Distortion Correction ◽

Automated Classification ◽

The Impact
