Instance Hardness: Latest Research Papers

Using instance hardness measures in curriculum learning

10.5753/eniac.2021.18251 ◽ 2021

Author(s): Gustavo H. Nunes ◽ Gustavo O. Martins ◽ Carlos H. Q. Forster ◽ Ana C. Lorena

Keyword(s): Machine Learning ◽ Convergence Speed ◽ Machine Learning Techniques ◽ Difficulty Level ◽ Learning Techniques ◽ Training Strategies ◽ Instance Hardness ◽ Difficult Cases

Curriculum learning consists of training strategies for machine learning techniques in which the easiest observations are presented first, progressing to more difficult cases as training proceeds. To assemble the curriculum, the observations in a dataset must be ordered according to their difficulty. This work investigates how instance hardness measures, which assess the difficulty level of each observation in a dataset from different perspectives, can be used to assemble a curriculum. Experiments with four CIFAR-100 sub-problems demonstrate the feasibility of using instance hardness measures. The main advantage is in convergence speed, and accuracy gains can also be verified on some datasets.
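A minimal sketch of how a hardness-ordered curriculum might be assembled. The abstract does not name the instance hardness measures used, so this sketch assumes k-Disagreeing Neighbors (kDN), one measure from the instance hardness literature: the fraction of a point's k nearest neighbors that carry a different label. The function names `kdn_hardness` and `curriculum_order` and the toy data are illustrative and do not come from the paper.

```python
import math

def kdn_hardness(points, labels, k=3):
    """k-Disagreeing Neighbors: fraction of each point's k nearest
    neighbors (Euclidean distance) whose label differs from its own."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in others[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbors)
        scores.append(disagree / len(neighbors))
    return scores

def curriculum_order(points, labels, k=3):
    """Indices sorted from easiest (lowest hardness) to hardest."""
    scores = kdn_hardness(points, labels, k)
    return sorted(range(len(points)), key=lambda i: scores[i])

# Toy data (assumed): two clusters, plus one class-1 point (index 4)
# sitting in the middle of the class-0 cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

order = curriculum_order(points, labels)
print(order)  # the misplaced point (index 4) comes last
```

A training loop could then feed examples in `order`, for instance by growing the training subset from the front of the list as epochs proceed. How the curriculum is paced is a separate design choice that this sketch leaves open.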

Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance

Scientific Reports ◽ 10.1038/s41598-021-98458-y ◽ 2021 ◽ Vol 11 (1)

Author(s): Sabit Ahmed ◽ Afrida Rahman ◽ Md. Al Mehedi Hasan ◽ Shamim Ahmad ◽ S. M. Shovan

Keyword(s): Cell Biology ◽ Molecular Mechanisms ◽ Feature Representation ◽ Computational Method ◽ Post Translational Modifications ◽ Redundant Data ◽ Feature Selection Approach ◽ Instance Hardness ◽ User Friendly ◽ Better Than

Identification of post-translational modifications (PTMs) is significant in the study of computational proteomics, cell biology, pathogenesis, and drug development due to their role in many bio-molecular mechanisms. Though there are several computational tools to identify individual PTMs, only three predictors have been established to predict multiple PTMs at the same lysine residue. Furthermore, detailed analysis and assessment of dataset balancing and of the significance of different feature encoding techniques for a suitable multi-PTM prediction model are still lacking. This study introduces a computational method named 'iMul-kSite' for predicting acetylation, crotonylation, methylation, succinylation, and glutarylation from an unrecognized peptide sample with one, multiple, or no modifications. After the redundant data samples are eliminated from the majority class by analyzing the hardness of the sequence-coupling information, the feature representation is optimized by combining the ANOVA F-test with an incremental feature selection approach. The proposed predictor predicts multi-label PTM sites with 92.83% accuracy using the top 100 features. It also achieves a 93.36% aiming rate and a 96.23% coverage rate, which are much better than those of the existing state-of-the-art predictors on the validation test. This performance indicates that 'iMul-kSite' can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, 'iMul-kSite' has been deployed as a user-friendly web server at http://103.99.176.239/iMul-kSite.
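The combination of ANOVA F-test ranking and incremental feature selection mentioned in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the iMul-kSite pipeline: the evaluator here is a leave-one-out 1-nearest-neighbor classifier chosen only for brevity (the abstract does not name the classifier), the data is a made-up single-label toy set, and the function names are hypothetical.

```python
import math

def anova_f(column, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(column), len(groups)
    grand = sum(column) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups.values() for x in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-NN restricted to the given features
    (an assumed stand-in evaluator)."""
    hits = 0
    for i, r in enumerate(rows):
        best = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((r[f] - rows[j][f]) ** 2 for f in features),
        )
        hits += labels[best] == labels[i]
    return hits / len(rows)

def incremental_feature_selection(rows, labels):
    """Rank features by F score, then grow the subset one feature at a
    time and keep the smallest prefix with the best accuracy."""
    n_features = len(rows[0])
    scores = [anova_f([r[f] for r in rows], labels) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    best_subset, best_acc = [], -1.0
    for m in range(1, n_features + 1):
        acc = loo_1nn_accuracy(rows, labels, ranked[:m])
        if acc > best_acc:  # strict: ties keep the smaller subset
            best_subset, best_acc = ranked[:m], acc
    return ranked, best_subset, best_acc

# Toy data (assumed): feature 1 separates the classes, feature 2 helps
# a little, feature 0 carries no class information.
rows = [[3.0, 0.1, 1.0], [1.0, 0.2, 2.0], [4.0, 0.0, 1.5], [2.0, 0.1, 2.5],
        [2.0, 5.0, 2.0], [4.0, 5.1, 3.0], [1.0, 4.9, 2.5], [3.0, 5.2, 3.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

ranked, best_subset, best_acc = incremental_feature_selection(rows, labels)
print(ranked, best_subset, best_acc)
```

On real data the ranked list would be much longer, and the subset size would be read off the accuracy curve, which is presumably how a cutoff such as "top 100 features" is reached.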

A novel ensemble method for classification in imbalanced datasets using split balancing technique based on instance hardness (sBal_IH)

Neural Computing and Applications ◽ 10.1007/s00521-020-05570-7 ◽ 2021

Author(s): Halimu Chongomweru ◽ Asem Kasem

Keyword(s): Ensemble Method ◽ Imbalanced Datasets ◽ Instance Hardness

Measuring Instance Hardness Using Data Complexity Measures

Intelligent Systems - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-61380-8_33 ◽ 2020 ◽ pp. 483-497

Author(s): José L. M. Arruda ◽ Ricardo B. C. Prudêncio ◽ Ana C. Lorena

Keyword(s): Data Complexity ◽ Complexity Measures ◽ Using Data ◽ Instance Hardness

Cost Sensitive Evaluation of Instance Hardness in Machine Learning

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-46147-8_6 ◽ 2020 ◽ pp. 86-102

Author(s): Ricardo B. C. Prudêncio

Keyword(s): Machine Learning ◽ Instance Hardness

Instance Hardness as a Decision Criterion on Dynamic Ensemble Structure

2019 8th Brazilian Conference on Intelligent Systems (BRACIS) ◽ 10.1109/bracis.2019.00028 ◽ 2019 ◽ Cited By ~ 1

Author(s): Carine Dantas ◽ Romulo Nunes ◽ Anne Canuto ◽ Joao Xavier-Junior

Keyword(s): Decision Criterion ◽ Ensemble Structure ◽ Instance Hardness

Study of Undersampling Method: Instance Hardness Threshold with Various Estimators for Hate Speech Classification

IJITEE (International Journal of Information Technology and Electrical Engineering) ◽ 10.22146/ijitee.42152 ◽ 2018 ◽ Vol 2 (2) ◽ Cited By ~ 3

Author(s): Naufal Azmi Verdikha ◽ Teguh Bharata Adji ◽ Adhistya Erna Permanasari

Keyword(s): Social Media ◽ Hate Speech ◽ Imbalanced Data ◽ Poor Performance ◽ Training Data ◽ Weighting Method ◽ Imbalanced Data Classification ◽ Data Problem ◽ Speech Classification ◽ Instance Hardness

A text classification system is needed to address the problem of hate speech in social media. However, hate speech texts are very hard to find in social media, which makes the distribution of the training data imbalanced. Classification with imbalanced data performs poorly. There are several methods for solving the problem of classification with imbalanced data. One of them is undersampling with the Instance Hardness Threshold (IHT) method. IHT balances the dataset by eliminating data that are frequently misclassified. To find those data, IHT requires an estimator, which is a classifier. This research compares estimators for the IHT method in order to solve the imbalanced data problem in hate speech classification using the TF-IDF weighting method. It uses the class ratio of the dataset after undersampling, the time taken by the undersampling process, and the Index of Balanced Accuracy (IBA) to determine the best IHT method. The results show that IHT with Logistic Regression (IHT(LR)) has the fastest undersampling process (1.91 s), produces a perfectly balanced dataset with a class ratio of 1:1, and has the best IBA evaluation in all estimation processes. These results make IHT(LR) the best method for solving the imbalanced data problem in hate speech classification.
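The idea behind IHT undersampling can be sketched in a few lines. This is a simplified illustration, not the procedure evaluated in the paper: it assumes a leave-one-out k-nearest-neighbor vote as the estimator (the paper compares classifiers such as Logistic Regression), uses a low estimated probability of the true class as a proxy for "frequently misclassified", handles only two classes, and simply drops the hardest majority-class samples until the classes are balanced. The function names and toy data are hypothetical.

```python
import math

def true_class_probability(points, labels, i, k=3):
    """Leave-one-out k-NN estimate of P(true label | point i)."""
    nearest = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )[:k]
    agree = sum(labels[j] == labels[i] for _, j in nearest)
    return agree / len(nearest)

def iht_undersample(points, labels, majority, k=3):
    """Return indices to keep after dropping the hardest majority-class
    samples until both classes have the same size (binary case only)."""
    major_idx = [i for i, y in enumerate(labels) if y == majority]
    n_minor = len(labels) - len(major_idx)
    n_drop = max(len(major_idx) - n_minor, 0)
    # Lowest true-class probability first, i.e. hardest first.
    major_idx.sort(key=lambda i: true_class_probability(points, labels, i, k))
    dropped = set(major_idx[:n_drop])
    return [i for i in range(len(labels)) if i not in dropped]

# Toy data (assumed): six majority points (class 0), two of which
# (indices 4 and 5) sit inside the minority cluster (class 1).
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.1, 5.1), (4.9, 5.1),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

keep = iht_undersample(points, labels, majority=0)
print(keep)  # indices 4 and 5 are removed, leaving a 4:4 class ratio
```

Swapping the estimator changes which samples count as hard, which is the variable the paper studies. With text data the points would be TF-IDF vectors rather than 2-D coordinates.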
