ProLSFEO-LDL: Prototype Selection and Label-Specific Feature Evolutionary Optimization for Label Distribution Learning

2020 · Vol 10 (9) · pp. 3089 · Author(s): Manuel González, José-Ramón Cano, Salvador García

Label Distribution Learning (LDL) is a general learning framework that assigns to each instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have proven their effectiveness in many real-life machine learning applications. In LDL problems, instance-based algorithms, and in particular the adapted k-nearest neighbors method for LDL (AA-kNN), have proven to be very competitive, achieving acceptable results and yielding an explainable model. However, AA-kNN suffers from several handicaps: it has large storage requirements, prediction is inefficient, and it has low tolerance to noise. The purpose of this paper is to mitigate these effects by adding a data reduction stage. The technique devised, called Prototype Selection and Label-Specific Feature Evolutionary Optimization for LDL (ProLSFEO-LDL), is a novel method that simultaneously addresses prototype selection and label-specific feature selection as pre-processing steps. Together, the two techniques pose a complex optimization problem with a huge search space, so we propose a search method based on evolutionary algorithms that obtains a solution to both problems in reasonable time. The effectiveness of the proposed ProLSFEO-LDL method is verified on several real-world LDL datasets, showing significant improvements in comparison with using the raw datasets.
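Below is a minimal, illustrative sketch (not the authors' implementation) of how an evolutionary search might jointly encode prototype selection and feature selection for LDL: a binary chromosome concatenates a prototype mask and a feature mask, and fitness is the mean Chebyshev distance of AA-kNN-style predictions on a validation split. For brevity the feature mask is global rather than label-specific, and all names, data, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def aa_knn_predict(X_tr, D_tr, X_te, k=5):
    """AA-kNN-style prediction: average the label distributions of the k nearest prototypes."""
    preds = []
    for x in X_te:
        nearest = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
        preds.append(D_tr[nearest].mean(axis=0))
    return np.array(preds)

def fitness(chrom, X, D, X_val, D_val):
    n = X.shape[0]
    proto_mask = chrom[:n].astype(bool)      # which training instances are kept as prototypes
    feat_mask = chrom[n:].astype(bool)       # which features are kept (global mask for simplicity)
    if proto_mask.sum() < 5 or feat_mask.sum() == 0:
        return np.inf                        # penalize degenerate solutions
    pred = aa_knn_predict(X[proto_mask][:, feat_mask], D[proto_mask], X_val[:, feat_mask])
    return np.abs(pred - D_val).max(axis=1).mean()   # mean Chebyshev distance (lower is better)

# Toy data: 100 instances, 10 features, 4-label distributions (rows sum to 1).
X, D = rng.normal(size=(100, 10)), rng.dirichlet(np.ones(4), size=100)
X_val, D_val = rng.normal(size=(30, 10)), rng.dirichlet(np.ones(4), size=30)

pop = rng.integers(0, 2, size=(20, 110))     # chromosome = 100 prototype bits + 10 feature bits
for generation in range(30):
    scores = np.array([fitness(c, X, D, X_val, D_val) for c in pop])
    parents = pop[np.argsort(scores)[:10]]   # truncation selection
    children = parents.copy()
    children[rng.random(children.shape) < 0.02] ^= 1   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(c, X, D, X_val, D_val) for c in pop])]
print("selected prototypes:", best[:100].sum(), "selected features:", best[100:].sum())
```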

2018 · Vol 2018 · pp. 1-11 · Author(s): Wenyuan Yang, Chan Li, Hong Zhao

Multilabel learning, which marks each label as either related or unrelated to an instance, can solve many ambiguity problems. Label distribution learning (LDL) instead reflects how important each related label is to an instance and thus offers a more general learning framework than multilabel learning. However, current LDL algorithms ignore the linear relationship between the label distribution and the features. In this paper, we propose a regularized sample self-representation (RSSR) approach for LDL. First, the label distribution problem is formalized by sample self-representation, whereby each label distribution can be represented as a linear combination of its relevant features. Second, the LDL problem is solved with L2-norm and L2,1-norm least-squares methods to reduce the effects of outliers and overfitting; the corresponding algorithms are named RSSR-LDL2 and RSSR-LDL21. Third, the proposed algorithms are compared with four state-of-the-art LDL algorithms on 12 public datasets using five evaluation metrics. The results demonstrate that the proposed algorithms effectively identify the predictive label distribution and exhibit good performance in terms of distance and similarity evaluations.
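As a rough illustration of the L2-regularized variant (RSSR-LDL2-style), the label distribution matrix D can be approximated as a linear combination of features, D ≈ XW, with a closed-form ridge solution; the L2,1-norm variant would require an iteratively reweighted solver and is not shown. This is a hedged sketch with illustrative data and names, not the authors' code.

```python
import numpy as np

def rssr_ldl2(X, D, lam=1.0):
    """Closed-form L2-regularized least squares: W = (X^T X + lam I)^(-1) X^T D."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ D)

def predict(X, W):
    """Map features to label distributions: clip negatives, renormalize rows to sum to 1."""
    P = np.clip(X @ W, 0, None)
    return P / (P.sum(axis=1, keepdims=True) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))               # 200 instances, 15 features
D = rng.dirichlet(np.ones(5), size=200)      # ground-truth 5-label distributions
W = rssr_ldl2(X, D, lam=0.1)
print(predict(X[:3], W))                     # predicted distributions for the first 3 instances
```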


2021 · Vol 16 (2) · pp. 1-31 · Author(s): Chunkai Zhang, Zilin Du, Yuting Yang, Wensheng Gan, Philip S. Yu

Utility mining has emerged as an important and interesting topic owing to its wide applications and considerable popularity. However, conventional utility mining methods are biased toward items that have a longer on-shelf time, as such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on the task of OSUM over sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Substantial experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applicability owing to its high efficiency.
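To make the on-shelf utility notion concrete, the toy sketch below computes the relative utility of a pattern counted only over the time periods in which all of its items are on shelf, normalized by the total utility of those periods. It illustrates the measure, not the OSUMS/OSUMS+ algorithms or their TPEU/TRSU pruning; the data layout and names are assumptions.

```python
# Each transaction: (time_period, {item: quantity}); profit gives per-unit utility.
transactions = [
    (1, {"a": 2, "b": 1}),
    (1, {"b": 3}),
    (2, {"a": 1, "c": 2}),
    (3, {"c": 4, "b": 1}),
]
profit = {"a": 5, "b": 2, "c": 3}
on_shelf = {"a": {1, 2}, "b": {1, 3}, "c": {2, 3}}   # periods in which each item is on shelf

def on_shelf_relative_utility(pattern):
    # Only the periods in which EVERY item of the pattern is on shelf are counted.
    periods = set.intersection(*(on_shelf[i] for i in pattern))
    pattern_util = sum(
        sum(qty * profit[i] for i, qty in items.items() if i in pattern)
        for t, items in transactions
        if t in periods and all(i in items for i in pattern)
    )
    total_util = sum(                                   # normalizer: all utility in those periods
        sum(qty * profit[i] for i, qty in items.items())
        for t, items in transactions if t in periods
    )
    return pattern_util / total_util if total_util else 0.0

print(on_shelf_relative_utility({"a", "b"}))   # {a, b} only shares period 1 -> 12 / 18 ≈ 0.667
```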


2021 · Vol 436 · pp. 12-21 · Author(s): Xinyue Dong, Shilin Gu, Wenzhang Zhuge, Tingjin Luo, Chenping Hou

2021 · pp. 1-13 · Author(s): Jenish Dhanani, Rupa Mehta, Dipti Rana

Legal practitioners analyze relevant previous judgments to prepare favorable and advantageous arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because of the enormous number of available judgments, an RS needs to compute pairwise similarity scores for all unique judgment pairs in advance in order to minimize recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed grows quadratically with the number of judgments, i.e., O(n²). However, only a limited number of pairs exhibit strong relevance among the judgments, so computing similarities for the many pairs with only trivial relevance is wasteful. To address the scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and finds semantically relevant judgments within those clusters. Pairwise similarity scores are then computed per cluster, restricting the search space to a single cluster instead of the entire corpus. The proposed LDRS thus drastically reduces the number of similarity computations, enabling large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed on a large real-life collection of judgments of the Supreme Court of India. The experimental results demonstrate the encouraging performance of the proposed LDRS in terms of accuracy, F1-scores, MCC scores, and computational complexity, which validates its applicability to scalable recommender systems.
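A hedged sketch of the two-stage idea follows: cluster the judgment citation graph with Louvain, then compute Doc2Vec similarities only within the query judgment's cluster. The networkx and gensim calls are real library APIs, but the toy graph, texts, and the recommend helper are illustrative assumptions, not the authors' pipeline.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy citation graph: nodes are judgment ids, edges are citation links.
G = nx.Graph([("J1", "J2"), ("J2", "J3"), ("J4", "J5"), ("J5", "J6")])
clusters = louvain_communities(G, seed=42)   # Louvain clustering of the citation network

# Toy judgment texts keyed by id; real texts would be tokenized full judgments.
texts = {jid: f"full text of judgment {jid}".split() for jid in G.nodes}
model = Doc2Vec([TaggedDocument(words, [jid]) for jid, words in texts.items()],
                vector_size=32, min_count=1, epochs=20)

def recommend(query_id, top_n=3):
    """Rank only the judgments in the query's own cluster by Doc2Vec cosine similarity."""
    cluster = next(c for c in clusters if query_id in c)
    candidates = [j for j in cluster if j != query_id]
    return sorted(candidates,
                  key=lambda j: model.dv.similarity(query_id, j),
                  reverse=True)[:top_n]

print(recommend("J1"))
```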


2014 · Vol 23 (02) · pp. 1450001 · Author(s): T. Hamrouni, S. Ben Yahia, E. Mephu Nguifo

In many real-life datasets, the number of extracted frequent patterns has been shown to be huge, hampering the effective exploitation of this amount of knowledge by human experts. To overcome this limitation, exact condensed representations were introduced in order to offer a small set of elements from which all frequent patterns can be faithfully retrieved. In this paper, we introduce a new exact condensed representation based only on particular elements from the disjunctive search space. In this space, a pattern is characterized by its disjunctive support, i.e., the frequency of complementary occurrences of its items, instead of the ubiquitous co-occurrence link. For several benchmark datasets, this representation has been shown to offer interesting compactness gains compared to the pioneering approaches in the literature. Here, we mainly focus on proposing an efficient tool for mining this representation. For this purpose, we introduce an algorithm, called DSSRM, dedicated to this task. We also propose several techniques to optimize its mining time as well as its memory consumption. The empirical study carried out on benchmark datasets shows that DSSRM is faster than the MEP algorithm by several orders of magnitude.
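The sketch below contrasts disjunctive support (transactions containing at least one item of the pattern) with the usual conjunctive support (transactions containing all of them), the measure on which this condensed representation is built; the transactions are illustrative.

```python
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"c", "d"},
]

def conjunctive_support(pattern):
    """Usual co-occurrence support: transactions containing every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)

def disjunctive_support(pattern):
    """Disjunctive support: transactions containing at least one item of the pattern."""
    return sum(1 for t in transactions if pattern & t)

p = {"a", "b"}
print(conjunctive_support(p))   # 1: only the first transaction contains both a and b
print(disjunctive_support(p))   # 3: three transactions contain a or b
# Inclusion-exclusion links the two: supp(a or b) = supp(a) + supp(b) - supp(a and b).
```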


Author(s): Marlene Arangú, Miguel Salido

A fine-grained arc-consistency algorithm for non-normalized constraint satisfaction problems

Constraint programming is a powerful software technology for solving numerous real-life problems. Many of these problems can be modeled as Constraint Satisfaction Problems (CSPs) and solved using constraint programming techniques. However, solving a CSP is NP-complete, so filtering techniques that reduce the search space remain necessary. Arc-consistency algorithms are widely used to prune the search space. The concept of arc-consistency is bidirectional, i.e., it must be ensured in both directions of the constraint (direct and inverse constraints). Two of the best-known and most frequently used arc-consistency algorithms for filtering CSPs are AC3 and AC4. These algorithms repeatedly carry out revisions and require support checks to identify and delete all unsupported values from the domains. Nevertheless, many revisions are ineffective: they cannot delete any value, yet they consume many checks and much time. In this paper, we present AC4-OP, an optimized version of AC4 that handles binary, non-normalized constraints in only one direction, storing the supports found for the inverse direction for later evaluation. It thus shortens the propagation phase by avoiding unnecessary or ineffective checks. The use of AC4-OP reduces the number of constraint checks by 50% while pruning the same search space as AC4. The evaluation section shows the improvement of AC4-OP over AC4, AC6 and AC7 on random and non-normalized instances.
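For context, the following is a compact sketch of the textbook AC-4 scheme (support counters plus a propagation queue) on a binary CSP; it is not the AC4-OP variant described above, and the data structures and the toy constraint are illustrative.

```python
from collections import defaultdict, deque

def ac4(domains, constraints):
    """domains: {var: set of values}; constraints: {(xi, xj): predicate(a, b)}."""
    # Make every constraint usable in both directions (direct and inverse arcs).
    arcs = dict(constraints)
    for (xi, xj), pred in constraints.items():
        arcs.setdefault((xj, xi), lambda b, a, p=pred: p(a, b))

    counter = defaultdict(int)      # counter[(xi, a, xj)] = number of supports of (xi, a) in D(xj)
    supported = defaultdict(list)   # supported[(xj, b)] = values (xi, a) that b supports
    queue = deque()

    # Initialization: count supports and delete values that have none.
    for (xi, xj), pred in arcs.items():
        for a in list(domains[xi]):
            total = 0
            for b in domains[xj]:
                if pred(a, b):
                    total += 1
                    supported[(xj, b)].append((xi, a))
            counter[(xi, a, xj)] = total
            if total == 0 and a in domains[xi]:
                domains[xi].discard(a)
                queue.append((xi, a))

    # Propagation: removing (xj, b) may leave some (xi, a) without support.
    while queue:
        xj, b = queue.popleft()
        for xi, a in supported[(xj, b)]:
            counter[(xi, a, xj)] -= 1
            if counter[(xi, a, xj)] == 0 and a in domains[xi]:
                domains[xi].discard(a)
                queue.append((xi, a))
    return domains

# x < y with domains {1..3}: 3 is pruned from x and 1 is pruned from y.
print(ac4({"x": {1, 2, 3}, "y": {1, 2, 3}}, {("x", "y"): lambda a, b: a < b}))
```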


Author(s): Xiuyi Jia, Xiaoxia Shen, Weiwei Li, Yunan Lu, Jihua Zhu

2008 · Vol 16 (4) · pp. 483-507 · Author(s): Leonardo Trujillo, Gustavo Olague

This work describes how evolutionary computation can be used to synthesize low-level image operators that detect interesting points on digital images. Interest point detection is an essential part of many modern computer vision systems that solve tasks such as object recognition, stereo correspondence, and image indexing, to name but a few. The design of the specialized operators is posed as an optimization/search problem that is solved with genetic programming (GP), a strategy still mostly unexplored by the computer vision community. The proposed approach automatically synthesizes operators that are competitive with state-of-the-art designs, taking into account an operator's geometric stability and the global separability of detected points during fitness evaluation. The GP search space is defined using simple primitive operations that are commonly found in point detectors proposed by the vision community. The experiments described in this paper extend previous results (Trujillo and Olague, 2006a,b) by presenting 15 new operators that were synthesized through the GP-based search. Some of the synthesized operators can be regarded as improved manmade designs because they employ well-known image processing techniques and achieve highly competitive performance. On the other hand, since the GP search also generates what can be considered as unconventional operators for point detection, these results provide a new perspective to feature extraction research.
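As a rough illustration of this search space, the sketch below composes a candidate interest-point operator from the kind of primitives the GP search draws on (Gaussian smoothing, image derivatives, pointwise arithmetic) and extracts local maxima of its response map. The specific composition is hand-written for illustration, not one of the paper's evolved operators.

```python
import numpy as np
from scipy import ndimage

def candidate_operator(img, sigma=1.0):
    """Example operator tree: |smoothed d2/dx2| + |smoothed d2/dy2| (a Laplacian-like response)."""
    dxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))   # second derivative along x, smoothed
    dyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))   # second derivative along y, smoothed
    return np.abs(dxx) + np.abs(dyy)

def detect_points(response, threshold=0.1, size=5):
    """Keep local maxima of the response map that exceed a fraction of the global maximum."""
    local_max = response == ndimage.maximum_filter(response, size=size)
    ys, xs = np.nonzero(local_max & (response > threshold * response.max()))
    return list(zip(xs, ys))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                       # stand-in for a grayscale image
print(len(detect_points(candidate_operator(img))))
```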


2017 · Vol 16 (06) · pp. 1549-1579 · Author(s): Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, Han-Chieh Chao

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights or importance; it can thus discover more useful and meaningful itemsets by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas were combined in the HEWI-Uapriori algorithm, which considers both item weights and transaction uncertainty to mine high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper bound proposed in HEWI-Uapriori can reduce the size of the search space, the algorithm still generates a large number of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures, named the element (E)-table, the weighted-probability (WP)-table, and the WP-tree, to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional WFIM and UFIM methods, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
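The toy sketch below computes the kind of expected weighted support measure that HEWI mining is based on: the expected support of an itemset in an uncertain database (summing per-transaction existence probabilities) scaled by the itemset's average item weight. The E-table/WP-table/WP-tree structures are not shown, and the data and names are illustrative assumptions.

```python
import math

transactions = [               # each item appears with an existential probability
    {"a": 0.9, "b": 0.6},
    {"a": 0.5, "c": 0.8},
    {"b": 0.7, "c": 0.4},
]
weight = {"a": 1.0, "b": 0.5, "c": 0.8}

def expected_weighted_support(itemset):
    # Expected support: sum over transactions of the probability that the whole itemset exists,
    # assuming independent item probabilities (0 if any item is missing from the transaction).
    expected_support = sum(
        math.prod(t[i] for i in itemset) if all(i in t for i in itemset) else 0.0
        for t in transactions
    )
    avg_weight = sum(weight[i] for i in itemset) / len(itemset)
    return expected_support * avg_weight

print(expected_weighted_support({"a", "b"}))   # 0.9 * 0.6 * avg(1.0, 0.5) = 0.405
```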

