TWO-WAY METRIC LEARNING WITH MAJORITY AND MINORITY SUBSETS FOR CLASSIFICATION OF LARGE EXTREMELY IMBALANCED FACE DATASET

Author(s):  
Ashu Kaushik ◽  
Seba Susan
Keyword(s):  
2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract. Background: Eligibility criteria are the primary means of screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text with machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. However, existing methods suffer from poor classification performance because eligibility criteria text data are complex and imbalanced. Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embeddings of each base model so that features are easier to distinguish. Soft Voting is applied to obtain the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
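The two generic building blocks named in this abstract, focal loss and soft voting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the focusing parameter `gamma=2.0`, and the absence of a class-weighting term are assumptions made only for illustration.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs  : predicted class probabilities (sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed value; gamma = 0 gives cross-entropy)
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def soft_vote(model_probs):
    """Average the class probabilities of several models.

    Returns (index of the class with the highest averaged probability,
    list of averaged probabilities).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg
```

With `gamma` above zero, a confidently correct prediction contributes far less loss than a misclassified one, which is how the focal loss shifts training effort toward hard, often minority-class, examples.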


2020 ◽  
Vol 12 (10) ◽  
pp. 1593
Author(s):  
Hongying Liu ◽  
Ruyi Luo ◽  
Fanhua Shang ◽  
Xuechun Meng ◽  
Shuiping Gou ◽  
...  

Recently, classification methods based on deep learning have attained sound results for the classification of polarimetric synthetic aperture radar (PolSAR) data. However, they generally require a great deal of labeled data to train their models, which limits their potential real-world applications. This paper proposes a novel semi-supervised deep metric learning network (SSDMLN) for feature learning and classification of PolSAR data. Inspired by distance metric learning, we construct a network that transforms the linear mapping of metric learning into a non-linear projection through layer-by-layer learning. With prior knowledge of the sample categories, the network also learns a distance metric under which pairs of similarly labeled samples are closer and dissimilar samples have larger relative distances. Moreover, we introduce a new manifold regularization to reduce the distance between neighboring samples, since they are more likely to be homogeneous. Classification is then performed by a simple classifier. Experiments on both synthetic and real-world PolSAR data from different sensors demonstrate the effectiveness of SSDMLN with limited labeled samples and show that it outperforms state-of-the-art methods.
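The idea of a learned mapping under which same-label pairs are close and different-label pairs are far apart can be sketched with the classic contrastive pair loss over a linear map. This is a swapped-in textbook formulation, not the SSDMLN objective: the matrix `L`, the `margin` value, and all function names are assumptions, and the manifold regularization term is omitted.

```python
import math

def linear_map(L, x):
    """Apply matrix L (a list of rows) to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in L]

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def pair_loss(L, x1, x2, same_label, margin=1.0):
    """Contrastive loss on one pair under the mapping x -> L x.

    Same-label pairs are penalized by their squared distance;
    different-label pairs are penalized only if closer than `margin`.
    """
    d = euclidean(linear_map(L, x1), linear_map(L, x2))
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

A deep metric learning network replaces the single matrix `L` with a stack of non-linear layers, but the pairwise objective keeps the same shape.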


2021 ◽  
Vol 11 (15) ◽  
pp. 6959
Author(s):  
Zaky Dzulfikri ◽  
Pin-Wei Su ◽  
Chih-Yung Huang

Stamping processes remain crucial in manufacturing; therefore, diagnosing the condition of stamping tools is critical. One of the challenges in diagnosing stamping tool conditions is that, traditionally, the tools need to be visually checked, and the production processes thus need to be halted. With the development of Industry 4.0, intelligent monitoring systems have been developed that use accelerometers and algorithms to classify the wear of stamping tools. Although several deep learning models such as the convolutional neural network (CNN), autoencoder (AE), and recurrent neural network (RNN) models have demonstrated promising results for classifying complex signals including accelerometer signals, the practicality of those methods is restricted by limited flexibility for adding new classes and by low accuracy when only a small number of samples per class is available. In this study, we applied deep metric learning (DML) methods to overcome these problems. DML involves extracting meaningful features using feature extraction modules to map inputs into embedding features. We compared the probability method, the contrastive method, and a triplet network to determine which method was most suitable for our case. The experimental results revealed that, compared with other models, a triplet network can be more effectively trained with limited training data. The triplet network demonstrated the best test results of the compared methods on the noisy test data. Finally, when tested using an unseen class, the triplet network and the probability method demonstrated similar results.
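A triplet network is trained on (anchor, positive, negative) embeddings, and a new class can later be recognized by comparing an embedding against stored reference embeddings. The sketch below shows the standard triplet margin loss and a nearest-reference classifier. The squared Euclidean distance, the `margin` value, and the names `triplet_loss` and `nearest_class` are assumptions, not details taken from the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss.

    Zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared distance).
    """
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )

def nearest_class(embedding, references):
    """Label of the closest reference embedding.

    references: dict mapping class label -> reference embedding.
    Supporting a new class only requires adding one more entry.
    """
    return min(
        references,
        key=lambda label: squared_distance(embedding, references[label]),
    )
```

Because classification reduces to a distance comparison in the embedding space, adding a class does not require changing an output layer, which is the flexibility the abstract refers to.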


2020 ◽  
Author(s):  
Hwejin Jung ◽  
Bogyu Park ◽  
Sangmun Lee ◽  
Seungwoo Hyun ◽  
Jinah Lee ◽  
...  

Abstract. In karyotyping, the classification of chromosomes is a tedious, complicated, and time-consuming process. It requires extremely careful analysis of chromosomes by well-trained cytogeneticists. To assist cytogeneticists in karyotyping, we introduce Proxy-ResNeXt-CBAM, a metric learning-based network that uses proxies with a convolutional block attention module (CBAM) and is designed for chromosome classification. ResNeXt-50 is used as the backbone network. To apply metric learning, the fully connected linear layer of the backbone network (ResNeXt-50) is removed and replaced with CBAM. The similarity between the embeddings, which are the outputs of the metric learning network, and the proxies is measured for network training. Proxy-ResNeXt-CBAM is validated on a public chromosome image dataset, and it achieves an accuracy of 95.86%, a precision of 95.87%, a recall of 95.9%, and an F1-score of 95.79%. Proxy-ResNeXt-CBAM, the metric learning network using proxies, outperforms the baseline networks. In addition, the results of our embedding analysis demonstrate the effectiveness of using proxies in metric learning for optimizing deep convolutional neural networks. As the embedding analysis results show, Proxy-ResNeXt-CBAM obtains a 94.78% Recall@1 in image retrieval, and the embeddings of each chromosome are well clustered according to their similarity.
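Proxy-based metric learning compares each embedding with one learnable proxy vector per class instead of with other samples. The abstract does not state which proxy loss is used, so the sketch below assumes a simple softmax cross-entropy over scaled cosine similarities. The `scale` value and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def proxy_loss(embedding, proxies, label, scale=10.0):
    """Cross-entropy over scaled cosine similarities to class proxies.

    proxies : one vector per class (learned jointly with the network)
    label   : index of the true class
    scale   : temperature-like factor (assumed value)
    """
    logits = [scale * cosine(embedding, p) for p in proxies]
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

Minimizing this loss pulls an embedding toward the proxy of its own class and away from the others, so training needs no pair or triplet sampling.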


2021 ◽  
Author(s):  
D N S Ravi Kumar ◽  
G T Sundarrajan ◽  
S D Sundarsingh Jebaseelan ◽  
M. Pushpavalli ◽  
A Rameshbabu ◽  
...  

2012 ◽  
Vol 24 (11) ◽  
pp. 2825-2851 ◽  
Author(s):  
Shereen Fouad ◽  
Peter Tino

Many pattern analysis problems require classification of examples into naturally ordered classes. In such cases, nominal classification schemes will ignore the class order relationships, which can have a detrimental effect on classification accuracy. This article introduces two novel ordinal learning vector quantization (LVQ) schemes, with metric learning, specifically designed for classifying data items into ordered classes. In ordinal LVQ, unlike in nominal LVQ, the class order information is used during training in selecting the class prototypes to be adapted, as well as in determining the exact manner in which the prototypes get updated. Prototype-based models in general are more amenable to interpretation and can often be constructed at a smaller computational cost than alternative nonlinear classification models. Experiments demonstrate that the proposed ordinal LVQ formulations compare favorably with their nominal counterparts. Moreover, our methods achieve competitive performance against existing benchmark ordinal regression models.
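For orientation, the nominal baseline that the ordinal schemes depart from can be sketched as a single LVQ1 update step. This is the textbook nominal rule with plain Euclidean distance, not either of the ordinal formulations proposed in the article, and it includes no metric learning. The function name and learning rate are assumptions.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """One nominal LVQ1 update.

    prototypes : list of prototype vectors
    labels     : class label of each prototype
    x, y       : training example and its class label
    Returns (index of the winning prototype, updated prototype list).
    """
    dists = [sum((p - v) ** 2 for p, v in zip(proto, x)) for proto in prototypes]
    k = min(range(len(prototypes)), key=dists.__getitem__)
    # Attract the winner if its class matches, otherwise repel it.
    sign = 1.0 if labels[k] == y else -1.0
    updated = [list(proto) for proto in prototypes]
    updated[k] = [p + sign * lr * (v - p) for p, v in zip(prototypes[k], x)]
    return k, updated
```

The rule treats every wrong class identically. The ordinal schemes in the article instead use the class order when choosing which prototypes to adapt and how strongly to update them.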

