Classification of Multi-class Microarray Cancer Data Using Ensemble Learning Method

Author(s):  
B. H. Shekar ◽  
Guesh Dagnew
2021 ◽  
Vol 13 (19) ◽  
pp. 3945
Author(s):  
Bin Wang ◽  
Linghui Xia ◽  
Dongmei Song ◽  
Zhongwei Li ◽  
Ning Wang

Sea ice information in the Arctic region is essential for climate change monitoring and ship navigation. Although many sea ice classification methods have been put forward, the accuracy and usability of classification systems can still be improved. In this paper, a two-round weighted voting ensemble learning method is proposed to refine sea ice classification. The proposed method includes three main steps. (1) A preferred feature set for sea ice is assembled from polarization features (HH, HV, HH/HV) and the top six GLCM-derived texture features selected via a random forest. (2) Initial classification maps are then generated by an ensemble of six base classifiers (NB, DT, KNN, LR, ANN, and SVM). Voting weights tuned by a genetic algorithm are used to compute a category score matrix, which yields the first, coarse classification result. (3) Some pixels may be misclassified because their top category scores are numerically close. By introducing an empirical score threshold, each pixel is identified as either fuzzy or explicit. The fuzzy pixels are then rectified based on the local similarity of neighboring explicit pixels, yielding the final, refined classification result. The proposed method was evaluated on 18 Sentinel-1 EW images captured over the Northeast Passage from November 2019 to April 2020. The experiments show that the proposed method effectively preserves the edge profile of sea ice and suppresses speckle noise in the SAR imagery. It outperforms current mainstream ensemble learning algorithms, with an overall accuracy of 97%. The main contribution of this study is a superior weighted voting strategy for ensemble-based sea ice classification of Sentinel-1 imagery, which is of great significance for guiding safe ship navigation and ice hazard forecasting in winter.
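The core of steps (2) and (3) above — combining weighted classifier scores into a category score matrix, then flagging pixels whose top two scores are too close as "fuzzy" — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the weights, the margin threshold, and the toy probabilities are all assumptions, and the second-round neighborhood rectification is only indicated in a comment.

```python
import numpy as np

def weighted_score_matrix(probs, weights):
    """Combine per-classifier class probabilities with voting weights.
    probs: (n_classifiers, n_pixels, n_classes); weights: (n_classifiers,)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize voting weights
    return np.tensordot(w, probs, axes=1)  # -> (n_pixels, n_classes)

def first_round_labels(scores, threshold=0.15):
    """Label each pixel; flag it as fuzzy when the top-two scores are close.
    Fuzzy pixels would then be re-decided from neighboring explicit pixels."""
    order = np.argsort(scores, axis=1)
    top, second = order[:, -1], order[:, -2]
    idx = np.arange(len(scores))
    margin = scores[idx, top] - scores[idx, second]
    return top, margin < threshold         # labels, fuzzy mask

# Toy example: 3 base classifiers, 4 pixels, 2 ice categories.
probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45], [0.2, 0.8]],
    [[0.8, 0.2], [0.45, 0.55], [0.5, 0.5], [0.3, 0.7]],
    [[0.85, 0.15], [0.5, 0.5], [0.52, 0.48], [0.25, 0.75]],
])
scores = weighted_score_matrix(probs, weights=[1.0, 1.0, 1.0])
labels, fuzzy = first_round_labels(scores, threshold=0.15)
```

With equal weights the score matrix is a plain probability average; the genetic algorithm in the paper would instead search the weight vector to maximize classification accuracy on reference data.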


Author(s):  
Adem Doganer

In this study, different models were created to reduce bias using ensemble learning methods, since reducing bias error improves classification performance. To increase classification performance, the most appropriate ensemble learning method and the ideal sample size were investigated, and the bias values and learning performances of different ensemble learning methods were compared. The AdaBoost ensemble learning method yielded the lowest bias at the n = 250 sample size, while the Stacking ensemble learning method yielded the lowest bias at the n = 500, 750, 1000, 2000, 4000, 6000, 8000, 10000, and 20000 sample sizes. When learning performances were compared, the AdaBoost ensemble learning method with an RBF classifier achieved the best performance at n = 250 (ACC = 0.956, AUC = 0.987), and the AdaBoost ensemble learning method with a REPTree classifier achieved the best performance at n = 20000 (ACC = 0.990, AUC = 0.999). In conclusion, for bias reduction, methods based on stacking displayed higher performance compared to the other methods.
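A minimal scikit-learn sketch of the kind of comparison described above — AdaBoost versus Stacking at a given sample size — might look like the following. The synthetic dataset, base learners, and cross-validation setup are assumptions standing in for the study's actual data and WEKA-style classifiers (RBF, REPTree).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the study's data at the n = 250 sample size.
X, y = make_classification(n_samples=250, n_features=10, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)

ada_acc = cross_val_score(ada, X, y, cv=5).mean()
stack_acc = cross_val_score(stack, X, y, cv=5).mean()
```

Repeating this loop over increasing values of `n_samples` reproduces the study's design: tracking how each ensemble's accuracy (and, with a bias-variance decomposition, its bias term) changes with sample size.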


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features are more distinguishable. Soft voting is applied to produce the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that classification performance was significantly improved by our ensemble model. In addition, metric learning improved the word embedding representations, and the focal loss reduced the impact of data imbalance on model performance.
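Two of the components named above, focal loss and soft voting, are standard and compact enough to sketch. This is a generic numpy illustration under assumed hyperparameters (γ = 2, α = 0.25 are common defaults, not values from the paper), operating on softmax probabilities rather than the paper's transformer outputs.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean multi-class focal loss: -alpha * (1 - p_t)^gamma * log(p_t).
    Down-weights easy examples (high p_t) to counter class imbalance.
    probs: (n, n_classes) softmax outputs; targets: (n,) class indices."""
    pt = probs[np.arange(len(targets)), targets]
    return float(np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt)))

def soft_vote(prob_list):
    """Average class probabilities from several base models (soft voting)."""
    return np.mean(prob_list, axis=0)

# Soft voting over two hypothetical base models for one example.
model_a = np.array([[0.7, 0.3]])
model_b = np.array([[0.2, 0.8]])
ensemble_probs = soft_vote([model_a, model_b])   # [[0.45, 0.55]]
predicted = int(np.argmax(ensemble_probs, axis=1)[0])
```

The (1 - p_t)^γ factor is what distinguishes focal loss from plain cross-entropy: a confidently correct prediction (p_t near 1) contributes almost nothing, so training gradients concentrate on the hard, typically minority-class, examples.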


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Xianfang Tang ◽  
Lijun Cai ◽  
Yajie Meng ◽  
Changlong Gu ◽  
Jialiang Yang ◽  
...  

2021 ◽  
pp. 107949
Author(s):  
Yifan Fan ◽  
Xiaotian Ding ◽  
Jindong Wu ◽  
Jian Ge ◽  
Yuguo Li
