Ensemble learning‐based classification of microarray cancer data on tree‐based features

Guesh Dagnew; B.H. Shekar

doi:10.1049/ccs2.12003

Classification of Multi-class Microarray Cancer Data Using Ensemble Learning Method

Data Analytics and Learning, Lecture Notes in Networks and Systems, 2018, pp. 279-292. doi:10.1007/978-981-13-2514-4_24

Author(s): B. H. Shekar; Guesh Dagnew

Keyword(s): Ensemble Learning, Learning Method, Cancer Data

Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

BMC Medical Informatics and Decision Making, 2021, Vol 21 (S2). doi:10.1186/s12911-021-01492-z

Author(s): Kun Zeng; Yibin Xu; Ge Lin; Likeng Liang; Tianyong Hao

Keyword(s): Clinical Trial, Ensemble Learning, Metric Learning, Classification Performance, Ensemble Model, Automated Classification, Eligibility Criteria, Data Imbalance, The Impact

Abstract. Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features of different categories are easier to distinguish. Soft Voting is applied to produce the final classification of the ensemble model. The dataset comes from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results: Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, a standard t-test on the performance improvement gave a p-value of 2.152e-07, indicating that the improvement is statistically significant. Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
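The abstract names two mechanisms without spelling them out. Focal loss shrinks the loss of well-classified examples so that training concentrates on hard (often minority-class) examples. Soft voting averages the class probabilities of the base models. The following plain-Python sketch illustrates both for a single example. It is not the authors' implementation. The default gamma = 2, the optional per-class `alpha` weights, and the three base-model probability vectors are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted class probabilities for the example
    target -- index of the true class
    gamma  -- focusing parameter; gamma = 0 gives (weighted) cross-entropy
    alpha  -- optional per-class weights (hypothetical; e.g. larger for rare classes)
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    alpha_t = alpha[target] if alpha is not None else 1.0
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> Tuple[int, List[float]]:
    """Soft voting: (weighted) average of each base model's class probabilities.
    Returns the predicted class index and the averaged probabilities."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    avg = [sum(w * p[c] for w, p in zip(weights, model_probs)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Hypothetical class probabilities from three base models for one text, 3 classes.
base_outputs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]]
label, averaged = soft_vote(base_outputs)
print(label, [round(p, 3) for p in averaged])

# An easy example (p_t = 0.9) versus a hard one (p_t = 0.1) under focal loss.
print(focal_loss([0.9, 0.1], 0), focal_loss([0.1, 0.9], 0))
```

For these made-up outputs the averaged probabilities are about [0.433, 0.4, 0.167], so class 0 wins even though the second model prefers class 1. With gamma = 2, the easy example's loss drops from about 0.105 (cross-entropy) to about 0.001. The hard example keeps most of its loss: about 1.87 versus 2.30.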

A Novel Hybrid Feature Selection and Ensemble Learning Framework for Unbalanced Cancer Data Diagnosis with Transcriptome and Functional Proteomic

IEEE Access, 2021, pp. 1-1. doi:10.1109/access.2021.3070428

Author(s): Xianfang Tang; Lijun Cai; Yajie Meng; Changlong Gu; Jialiang Yang; ...

Keyword(s): Feature Selection, Ensemble Learning, Cancer Data, Learning Framework

Multiclass classification of leukemia cancer data using Fuzzy Support Vector Machine (FSVM) with feature selection using Principal Component Analysis (PCA)

Journal of Physics: Conference Series, 2021, Vol 1725, pp. 012012. doi:10.1088/1742-6596/1725/1/012012

Author(s): I R Fauzi; Z Rustam; A Wibowo

Keyword(s): Principal Component Analysis, Support Vector Machine, Feature Selection, Principal Component, Component Analysis, Multiclass Classification, Support Vector, Fuzzy Support Vector Machine, Cancer Data

A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data

International Journal on Soft Computing Artificial Intelligence and Applications, 2016, Vol 5 (2/3), pp. 01-21. doi:10.5121/ijscai.2016.5301. Cited by ~1

Author(s): Doreswamy; Umme Salma M

Keyword(s): Breast Cancer, Breast Cancer Data, Cancer Data

Benchmarking joint multi-omics dimensionality reduction approaches for cancer study

2020. doi:10.1101/2020.01.14.905760. Cited by ~3

Author(s): Laura Cantini; Pooya Zakeri; Celine Hernandez; Aurelien Naldi; Denis Thieffry; ...

Keyword(s): Dimensionality Reduction, Ground Truth, Systematic Evaluation, Omics Data, Biological Processes, Cancer Data, Practical Guidelines, Cell Data, Omics Data Integration

Abstract. High-dimensional multi-omics data are now standard in biology. When effectively integrated, they can greatly enhance our understanding of biological systems. Joint Dimensionality Reduction (jDR) methods are among the most efficient approaches for this multi-omics data integration. However, the large number of available jDR methods calls for a comprehensive benchmark with practical guidelines. We performed a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluated their performance in retrieving the ground-truth sample clustering from simulated multi-omics datasets. Second, we used TCGA cancer data to assess their strengths in predicting survival, clinical annotations, and known pathways/biological processes. Finally, we assessed their classification of multi-omics single-cell data. From these in-depth comparisons, we observed that intNMF performs best in clustering, while MCIA offers consistent and effective behavior across many contexts. The full code of this benchmark is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility and to support data producers, users, and future developers.
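The abstract benchmarks jDR methods without describing how any of them works. As a hedged illustration of the general idea, the following sketch reduces several omics blocks measured on the same samples to one shared factor. Each block is z-scored per feature and scaled by 1/sqrt(number of features) so that every block carries equal total variance. The blocks are then concatenated, and the leading vector of sample scores is found by power iteration. This is a simple concatenation-based baseline written for illustration only. It is not intNMF, MCIA, or necessarily any of the nine benchmarked methods, and the toy data are made up.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = features


def scale_block(block: Matrix) -> Matrix:
    """Z-score every feature, then divide by sqrt(#features) so that each
    omics block has total variance 1 and no block dominates the others."""
    n, p = len(block), len(block[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in block]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0  # guard constant features
        for i in range(n):
            out[i][j] = (block[i][j] - mean) / (sd * math.sqrt(p))
    return out


def joint_factor(blocks: List[Matrix], iters: int = 200, seed: int = 0) -> List[float]:
    """Leading shared factor (one score per sample) of the concatenated,
    scaled blocks, computed by power iteration on X X^T."""
    scaled = [scale_block(b) for b in blocks]
    n = len(blocks[0])
    X = [sum((b[i] for b in scaled), []) for i in range(n)]  # concatenate features per sample
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(X[i][j] * v[i] for i in range(n)) for j in range(len(X[0]))]  # w = X^T v
        v = [sum(x * wj for x, wj in zip(row, w)) for row in X]  # v = X w
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Made-up toy data: 6 samples (two groups of 3) measured in two omics blocks.
rna = [[5, 1, 2], [6, 1, 3], [5, 2, 2], [1, 5, 2], [1, 6, 3], [2, 5, 3]]
meth = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.2], [0.1, 0.9], [0.2, 0.8], [0.1, 0.8]]
scores = joint_factor([rna, meth])
print([round(s, 3) for s in scores])
```

On this toy input the shared factor gives the first three samples one sign and the last three the opposite sign. The factor therefore recovers the group structure that both blocks agree on. Dedicated jDR methods differ in how they weight blocks, impose sparsity or non-negativity, and separate shared from block-specific variation.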

Ensemble learning based classifier to predict depression caused due to pandemic

Journal of Physics: Conference Series, 2021, Vol 2089 (1), pp. 012026. doi:10.1088/1742-6596/2089/1/012026

Author(s): P Vaishali; P L S Kumari

Keyword(s): Neural Network, Ensemble Learning, Text Classification, Virus Disease, Model Accuracy, Research Article, The World, Social Gathering, Second Wave

Abstract. The pandemic caused by Coronavirus Disease 2019 (COVID-19) affected people's lives throughout the world. The first wave of COVID-19, followed by a second wave, heightened the panic. Government-declared lockdowns imposed strict prohibitions on social gatherings, unnecessary outings, travel, and education. During home quarantine, people shared their opinions and expressed their views and feelings on social media. Home isolation and quarantine affected people's mental health, which may lead to depression. Hence, this research article predicts depression using a neural-network-based model. At the first level, this model performs text classification of COVID-19-related tweets, with an accuracy of 86.85%. At the next level, an ensemble-learning-based model is constructed using the same tweet dataset as input. This model uses the boosting technique known as AdaBoost. The model is executed with varying train-test-validation ratios, and its accuracy is observed to improve, reaching 99.33% in every execution. The obtained results are compared with previous work in the same area.
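The abstract names AdaBoost but not its base learner or the tweet features. The following is a minimal discrete AdaBoost sketch under several assumptions. It uses decision stumps as base learners, numeric feature vectors (for example, word counts extracted from tweets), and labels in {-1, +1}. The feature values and labels in the toy example are hypothetical. Each round fits the stump with the lowest weighted error and gives it the vote weight alpha = 0.5 ln((1 - err) / err). It then increases the weights of the examples that stump misclassified.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict +polarity if feature j exceeds the threshold, else -polarity."""
    j, threshold, polarity = stump
    return polarity if x[j] > threshold else -polarity


def best_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively search for the stump with the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples (y * h(x) = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical tweet features: [negative-word count, positive-word count];
# label +1 = depressive, -1 = not depressive (made-up data).
X_toy = [[3, 0], [2, 1], [4, 0], [0, 2], [1, 3], [0, 1]]
y_toy = [1, 1, 1, -1, -1, -1]
model = adaboost(X_toy, y_toy, rounds=5)
print([predict(model, x) for x in X_toy])
```

This toy set is separable by a single stump, so every round picks the same stump. On harder data, the reweighting steers later rounds toward the examples earlier stumps got wrong. For real experiments, a tested library implementation such as scikit-learn's `AdaBoostClassifier` would normally be preferred over this sketch.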
