Lazy Learning Associative Classification with WkNN and DWkNN Algorithm

2021 ◽  
Vol 37 ◽  
pp. 01023
Author(s):  
Preeti Tamrakar ◽  
S. P. Syed Ibrahim

One algorithm that consistently yields better outcomes than traditional associative classification systems is lazy learning associative classification (LLAC), in which the processing of training data is delayed until a test instance is received; in eager learning, by contrast, the system begins processing training data before any query arrives. Traditional methods assume that all items within a transaction carry equal importance, which is not always true. This paper proposes a new framework, lazy learning associative classification with WkNN (LLAC_WkNN), which combines the weighted kNN method with LLAC: applying LLAC to the dataset yields a subset of rules, and the weighted kNN (WkNN) algorithm is then applied to this subset to predict the class label of the unseen test case, improving the accuracy of the classifier. WkNN, however, can give an outlier too much weight. Applying a dual distance weight to LLAC, in a variant named LLAC_DWkNN, resolves this limitation of WkNN: LLAC_DWkNN assigns less weight to outliers, which further improves the accuracy of the classifier. The algorithm has been applied to several datasets, and the experimental results demonstrate that the proposed method is efficient compared with traditional and other existing systems.
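The abstract does not include pseudocode, so the following is only a minimal Python sketch of the distance-weighted voting idea behind DWkNN, not the authors' implementation; the dual weight shown is one common formulation that pushes the weight of far-away (outlier-like) neighbours towards zero.

```python
import numpy as np
from collections import defaultdict

def dwknn_predict(X_train, y_train, x_query, k=5):
    """Sketch of a dual-distance-weighted kNN vote: near neighbours get
    large weights, distant (potential outlier) neighbours get weights
    close to zero before the class votes are tallied."""
    d = np.linalg.norm(X_train - x_query, axis=1)       # Euclidean distances
    idx = np.argsort(d)[:k]                              # k nearest neighbours
    d_near, d_far = d[idx][0], d[idx][-1]                # nearest / farthest of the k
    votes = defaultdict(float)
    for i in idx:
        if d_far == d_near:                              # all neighbours equidistant
            w = 1.0
        else:
            # one common dual weight: linear term damped by a harmonic-style term
            w = (d_far - d[i]) / (d_far - d_near) * (d_far + d_near) / (d_far + d[i])
        votes[y_train[i]] += w
    return max(votes, key=votes.get)

# toy usage
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0], [5.0, 5.0]])
y = np.array(["a", "a", "b", "b", "b"])
print(dwknn_predict(X, y, np.array([0.2, 0.1]), k=3))    # expected "a"
```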

2012 ◽  
Vol 24 (06) ◽  
pp. 513-524
Author(s):  
Mohsen Alavash Shooshtari ◽  
Keivan Maghooli ◽  
Kambiz Badie

One of the main objectives of data mining, a promising multidisciplinary field in computer science, is to provide classification models for decision support. In the medical imaging domain, mammogram classification is a difficult diagnostic task that calls for the development of automated classification systems. Associative classification, a special case of association rule mining, has been adopted in classification problems for years. In this paper, an associative classification framework based on parallel mining of image blocks is proposed for mammogram discrimination. Association rule mining is applied to a commonly used mammography image database to classify digital mammograms into three categories: normal, benign, and malignant. To do so, images are first preprocessed; features are then extracted from non-overlapping image blocks and discretized for rule discovery. Association rules are discovered through parallel mining of the transactional databases that correspond to the image blocks and are finally used within a single decision-making scheme to predict the class of unknown samples. Experiments are conducted to assess the effectiveness of the proposed framework. The results show that the framework is successful in terms of accuracy, precision, and recall, and suggest that it could serve as the core of a future associative classifier supporting mammogram discrimination.
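The framework itself is not spelled out in the abstract; the snippet below is only a rough sketch, under the assumption that each non-overlapping block contributes a simple statistical feature that is binned into discrete items, which together form the transaction later fed to an association-rule miner.

```python
import numpy as np

def image_to_transaction(img, block=32, bins=4):
    """Sketch: split a grayscale image into non-overlapping blocks, take the
    mean intensity of each block, discretize it into `bins` levels, and emit
    one item per block such as 'blk2_1_lvl3'."""
    h, w = img.shape
    items = []
    for bi, r in enumerate(range(0, h - block + 1, block)):
        for bj, c in enumerate(range(0, w - block + 1, block)):
            mean = img[r:r + block, c:c + block].mean()
            level = min(int(mean / 256 * bins), bins - 1)  # 0..255 -> discrete level
            items.append(f"blk{bi}_{bj}_lvl{level}")
    return items

# toy usage with a random 128x128 stand-in "image"
rng = np.random.default_rng(0)
print(image_to_transaction(rng.integers(0, 256, (128, 128)), block=64)[:4])
```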


2007 ◽  
Vol 22 (1) ◽  
pp. 37-65 ◽  
Author(s):  
FADI THABTAH

Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, such as CPAR, CMAR, MCAR, and MMAC. These algorithms employ different rule discovery, rule ranking, rule pruning, rule prediction, and rule evaluation methods. This paper surveys and compares state-of-the-art associative classification techniques with regard to these criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are highlighted.
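As a generic illustration of the rule-ranking step that most of the surveyed algorithms share (the exact tie-breaking order differs from algorithm to algorithm and is not taken from the survey itself), rules can be sorted by confidence, then support, then antecedent length:

```python
# Generic rule-ranking sketch: higher confidence first, then higher support,
# then shorter antecedent. Individual associative classifiers vary in the
# precise ordering criteria they apply.
rules = [
    {"antecedent": {"outlook=sunny", "humidity=high"}, "label": "no",
     "support": 0.20, "confidence": 0.90},
    {"antecedent": {"outlook=overcast"}, "label": "yes",
     "support": 0.25, "confidence": 0.90},
    {"antecedent": {"windy=false"}, "label": "yes",
     "support": 0.40, "confidence": 0.75},
]

ranked = sorted(rules, key=lambda r: (-r["confidence"], -r["support"],
                                      len(r["antecedent"])))
for r in ranked:
    print(r["label"], r["confidence"], r["support"])
```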


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Zhongmei Zhou

A good classifier can correctly predict new data for which the class label is unknown, so it is important to construct a high-accuracy classifier; classification techniques are therefore very useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches, but it has two major deficiencies. First, it generates a very large number of association classification rules, especially when the minimum support is set low, which makes it difficult to select a high-quality rule set for classification. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. In comparison, some improved traditional rule-based classification approaches produce a classification rule set that plays an important role in prediction, and thus achieve not only better efficiency than associative classification but also higher accuracy. In this paper, we put forward a new classification approach called CMR (classification based on multiple classification rules), which combines the advantages of both associative classification and rule-based classification. Our experimental results show that CMR achieves higher accuracy than some traditional rule-based classification methods.
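The abstract does not describe CMR's prediction step in detail; the following is only a hypothetical sketch of the general idea of letting several matching rules cast confidence-weighted votes instead of firing a single best rule.

```python
def predict_with_multiple_rules(instance, rules, default="unknown"):
    """Sketch: collect every rule whose antecedent is satisfied by the
    instance (a set of attribute=value items) and let the matching rules
    vote, weighted by their confidence."""
    votes = {}
    for r in rules:
        if r["antecedent"] <= instance:                  # rule body satisfied
            votes[r["label"]] = votes.get(r["label"], 0.0) + r["confidence"]
    return max(votes, key=votes.get) if votes else default

instance = {"outlook=sunny", "humidity=high", "windy=false"}
rules = [
    {"antecedent": {"outlook=sunny", "humidity=high"}, "label": "no", "confidence": 0.9},
    {"antecedent": {"windy=false"}, "label": "yes", "confidence": 0.7},
]
print(predict_with_multiple_rules(instance, rules))      # "no" wins 0.9 vs 0.7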


2020 ◽  
Author(s):  
Damiano Piovesan ◽  
Andras Hatos ◽  
Giovanni Minervini ◽  
Federica Quaglia ◽  
Alexander Miguel Monzon ◽  
...  

Post-translational modification (PTM) sites have become popular targets for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and their sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and to evaluate their performance on newly determined experimental sites. To serve as a proxy for effective experimental design, predictors require both high specificity and high sensitivity. However, self-reported performance is often not indicative of prediction quality, and detection of new sites is not guaranteed. We benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance widely overestimates the real accuracy measured on independent data. No predictor performs better than random on new examples, indicating that the models are not sufficiently general to detect new sites. The number of false positives is high and precision is low, in particular for non-collagen proteins, whose motifs are not conserved. In short, existing predictors for hydroxylation sites do not appear to generalize to new data. Caution is advised when using PTM predictors in the absence of independent evaluations, in particular for unique, specific sites such as those involved in signalling.

Author Summary: Machine learning methods are extensively used by biologists to design and interpret experiments. Predictors that take only the sequence as input are of particular interest, given the large amount of sequence data available, and their self-reported performance is often very high. In this work, we evaluated post-translational modification (PTM) predictors for hydroxylation sites and found that they perform no better than random, in strong contrast to the performance reported in the original publications. PTMs are chemical alterations of amino acids that provide the cell with conditional mechanisms to fine-tune protein function, thereby regulating complex biological processes such as signalling and the cell cycle. Hydroxylation sites are a good PTM test case because a range of predictors and an abundance of newly detected experimental modification sites are available. The poor performance in our results highlights the overlooked problem of predicting PTMs when best practices are not followed and training data are likely incomplete. Experimentalists should be careful when using PTM predictors blindly, and more independent assessments are needed to separate the wheat from the chaff in the field.
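As a reminder of what such an independent benchmark computes, here is a minimal sketch (inputs are placeholders, not the datasets used in the study) of scoring per-residue site predictions against newly determined hydroxylation sites:

```python
def site_metrics(y_true, y_pred):
    """Sketch: confusion counts and the derived scores typically reported
    when benchmarking a PTM site predictor on an independent dataset.
    y_true / y_pred are equal-length 0/1 lists (1 = hydroxylated site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on true sites
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0     # collapses when FPs dominate
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision}

print(site_metrics([1, 0, 0, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```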


Improving training sets is an area of active research within Artificial Intelligence. It is of particular interest in supervised classification systems, where the quality of the training data is crucial. This paper presents a new method for improving training sets, based on approximate sets and artificial ant colonies. An experimental study carried out on international benchmark databases confirms the quality of the new algorithm, which is also highly efficient.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Alexander K. Seewald

Handwritten digit recognition is an important benchmark task in computer vision. Learning algorithms and feature representations that offer excellent performance for this task have been known for some time. Here, we focus on two major practical considerations: the relationship between the amount of training data and the error rate (corresponding to the effort needed to collect training data to build a model with a given maximum error rate), and the transferability of models' expertise between different datasets (corresponding to their usefulness for general handwritten digit recognition). While the relationship between the amount of training data and the error rate is very stable and to some extent independent of the specific dataset used (only the classifier and feature representation have a significant effect), it has proven impossible to transfer low error rates on one or two pooled datasets to similarly low error rates on another dataset. We call this weakness brittleness, inspired by an old Artificial Intelligence term that means the same thing. This weakness may be a general weakness of trained image classification systems.
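As a minimal illustration of the first consideration (not the authors' experimental setup), a learning curve can be traced by training the same classifier on growing subsets of the data and recording the test error; scikit-learn's bundled 8x8 digits data is used here purely as a stand-in dataset.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Sketch of an error-rate vs. training-size curve on a small stand-in dataset.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for n in (100, 300, 600, len(X_tr)):
    clf = LogisticRegression(max_iter=2000).fit(X_tr[:n], y_tr[:n])
    err = 1.0 - clf.score(X_te, y_te)                  # test error rate
    print(f"train size {n:4d}  test error {err:.3f}")
```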


Author(s):  
Wee-Beng Tay ◽  
Murali Damodaran ◽  
Zhi-Da Teh ◽  
Rahul Halder

The application of physics-informed neural networks to the test case of flow past a converging-diverging (CD) nozzle is investigated. Both an artificial neural network (ANN) and a physics-informed neural network (PINN) are used for training and prediction. Results show that the ANN by itself already gives relatively good predictions. With the addition of the physics-informed loss, the error is reduced further, although only by a relatively small amount, perhaps because the baseline prediction is already good. The effects of batch size, training iterations, and number of epochs on the prediction accuracy have also been tested. Increasing the batch size improves the prediction, whereas increasing the training iterations may give poorer predictions due to overfitting. In general, increasing the number of epochs reduces the error. Future work should aim to further reduce the error while using less training data, include more complicated cases with time-varying results, and test extrapolation of the results using the PINN.
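The abstract does not give the network architecture or governing equations, so the fragment below is only a schematic of the usual PINN recipe: a supervised data-fitting loss plus a PDE-residual penalty evaluated at collocation points via automatic differentiation. The residual used here (du/dx = 0, a trivially conserved quantity) is a placeholder, not the nozzle-flow equations of the paper; PyTorch is used for concreteness.

```python
import torch

# Schematic PINN training loop: data loss (ANN part) + physics residual (PINN part).
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

x_data = torch.linspace(0, 1, 20).reshape(-1, 1)       # labelled sample points
u_data = torch.ones_like(x_data)                        # placeholder targets
x_phys = torch.rand(200, 1, requires_grad=True)         # collocation points

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    loss_data = ((net(x_data) - u_data) ** 2).mean()    # fit the available data
    u = net(x_phys)
    du_dx = torch.autograd.grad(u.sum(), x_phys, create_graph=True)[0]
    loss_phys = (du_dx ** 2).mean()                     # placeholder PDE residual
    loss = loss_data + loss_phys
    loss.backward()
    opt.step()

print(float(loss_data), float(loss_phys))
```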

