ProLSFEO-LDL: Prototype Selection and Label-Specific Feature Evolutionary Optimization for Label Distribution Learning

2020 · Vol 10 (9) · pp. 3089 · Author(s): Manuel González, José-Ramón Cano, Salvador García

Label Distribution Learning (LDL) is a general learning framework that assigns to each instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have proven their effectiveness in many real-life machine learning applications. In LDL problems, instance-based algorithms, and in particular the adapted k-nearest neighbors method for LDL (AA-kNN), have proven to be very competitive, achieving acceptable results and yielding an explainable model. However, AA-kNN suffers from several handicaps: it has large storage requirements, prediction is inefficient, and it has low tolerance to noise. The purpose of this paper is to mitigate these effects by adding a data reduction stage. The technique devised, called Prototype Selection and Label-Specific Feature Evolutionary Optimization for LDL (ProLSFEO-LDL), is a novel method that simultaneously addresses prototype selection and label-specific feature selection as pre-processing steps. Together, the two techniques pose a complex optimization problem with a huge search space, so we propose a search method based on evolutionary algorithms that obtains a solution to both problems in reasonable time. The effectiveness of the proposed ProLSFEO-LDL method is verified on several real-world LDL datasets, showing significant improvements in comparison with using the raw datasets.
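Below is a minimal, illustrative sketch (not the authors' implementation) of how an evolutionary search might jointly encode prototype selection and feature selection for LDL: a binary chromosome concatenates a prototype mask and a feature mask, and fitness is the mean Chebyshev distance of AA-kNN-style predictions on a validation split. For brevity the feature mask is global rather than label-specific, and all names, data, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def aa_knn_predict(X_tr, D_tr, X_te, k=5):
    """AA-kNN-style prediction: average the label distributions of the k nearest prototypes."""
    preds = []
    for x in X_te:
        nearest = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
        preds.append(D_tr[nearest].mean(axis=0))
    return np.array(preds)

def fitness(chrom, X, D, X_val, D_val):
    n = X.shape[0]
    proto_mask = chrom[:n].astype(bool)      # which training instances are kept as prototypes
    feat_mask = chrom[n:].astype(bool)       # which features are kept (global mask for simplicity)
    if proto_mask.sum() < 5 or feat_mask.sum() == 0:
        return np.inf                        # penalize degenerate solutions
    pred = aa_knn_predict(X[proto_mask][:, feat_mask], D[proto_mask], X_val[:, feat_mask])
    return np.abs(pred - D_val).max(axis=1).mean()   # mean Chebyshev distance (lower is better)

# Toy data: 100 instances, 10 features, 4-label distributions (rows sum to 1).
X, D = rng.normal(size=(100, 10)), rng.dirichlet(np.ones(4), size=100)
X_val, D_val = rng.normal(size=(30, 10)), rng.dirichlet(np.ones(4), size=30)

pop = rng.integers(0, 2, size=(20, 110))     # chromosome = 100 prototype bits + 10 feature bits
for generation in range(30):
    scores = np.array([fitness(c, X, D, X_val, D_val) for c in pop])
    parents = pop[np.argsort(scores)[:10]]   # truncation selection
    children = parents.copy()
    children[rng.random(children.shape) < 0.02] ^= 1   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(c, X, D, X_val, D_val) for c in pop])]
print("selected prototypes:", best[:100].sum(), "selected features:", best[100:].sum())
```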

2018 · Vol 2018 · pp. 1-11 · Author(s): Wenyuan Yang, Chan Li, Hong Zhao

Multilabel learning, which marks each label as either related or unrelated to an instance, can solve many ambiguity problems. Label distribution learning (LDL) instead reflects how important each related label is to an instance and thus offers a more general learning framework than multilabel learning. However, current LDL algorithms ignore the linear relationship between the label distribution and the features. In this paper, we propose a regularized sample self-representation (RSSR) approach for LDL. First, the label distribution problem is formalized by sample self-representation, whereby each label distribution can be represented as a linear combination of its relevant features. Second, the LDL problem is solved with L2-norm and L2,1-norm least-squares methods to reduce the effects of outliers and overfitting; the corresponding algorithms are named RSSR-LDL2 and RSSR-LDL21. Third, the proposed algorithms are compared with four state-of-the-art LDL algorithms on 12 public datasets using five evaluation metrics. The results demonstrate that the proposed algorithms effectively identify the predictive label distribution and exhibit good performance in terms of distance and similarity evaluations.
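As a rough illustration of the L2-regularized variant (RSSR-LDL2-style), the label distribution matrix D can be approximated as a linear combination of features, D ≈ XW, with a closed-form ridge solution; the L2,1-norm variant would require an iteratively reweighted solver and is not shown. This is a hedged sketch with illustrative data and names, not the authors' code.

```python
import numpy as np

def rssr_ldl2(X, D, lam=1.0):
    """Closed-form L2-regularized least squares: W = (X^T X + lam I)^(-1) X^T D."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ D)

def predict(X, W):
    """Map features to label distributions: clip negatives, renormalize rows to sum to 1."""
    P = np.clip(X @ W, 0, None)
    return P / (P.sum(axis=1, keepdims=True) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))               # 200 instances, 15 features
D = rng.dirichlet(np.ones(5), size=200)      # ground-truth 5-label distributions
W = rssr_ldl2(X, D, lam=0.1)
print(predict(X[:3], W))                     # predicted distributions for the first 3 instances
```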


2021 · Vol 16 (2) · pp. 1-31 · Author(s): Chunkai Zhang, Zilin Du, Yuting Yang, Wensheng Gan, Philip S. Yu

Utility mining has emerged as an important and interesting topic owing to its wide applications and considerable popularity. However, conventional utility mining methods are biased toward items that have a longer on-shelf time, as such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on the task of OSUM over sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Substantial experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applicability owing to its high efficiency.
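To make the on-shelf utility notion concrete, the toy sketch below computes the relative utility of a pattern counted only over the time periods in which all of its items are on shelf, normalized by the total utility of those periods. It illustrates the measure, not the OSUMS/OSUMS+ algorithms or their TPEU/TRSU pruning; the data layout and names are assumptions.

```python
# Each transaction: (time_period, {item: quantity}); profit gives per-unit utility.
transactions = [
    (1, {"a": 2, "b": 1}),
    (1, {"b": 3}),
    (2, {"a": 1, "c": 2}),
    (3, {"c": 4, "b": 1}),
]
profit = {"a": 5, "b": 2, "c": 3}
on_shelf = {"a": {1, 2}, "b": {1, 3}, "c": {2, 3}}   # periods in which each item is on shelf

def on_shelf_relative_utility(pattern):
    # Only the periods in which EVERY item of the pattern is on shelf are counted.
    periods = set.intersection(*(on_shelf[i] for i in pattern))
    pattern_util = sum(
        sum(qty * profit[i] for i, qty in items.items() if i in pattern)
        for t, items in transactions
        if t in periods and all(i in items for i in pattern)
    )
    total_util = sum(                                   # normalizer: all utility in those periods
        sum(qty * profit[i] for i, qty in items.items())
        for t, items in transactions if t in periods
    )
    return pattern_util / total_util if total_util else 0.0

print(on_shelf_relative_utility({"a", "b"}))   # {a, b} only shares period 1 -> 12 / 18 ≈ 0.667
```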


2021 · Vol 436 · pp. 12-21 · Author(s): Xinyue Dong, Shilin Gu, Wenzhang Zhuge, Tingjin Luo, Chenping Hou

2021 · pp. 1-13 · Author(s): Jenish Dhanani, Rupa Mehta, Dipti Rana

Legal practitioners analyze relevant previous judgments to prepare favorable and advantageous arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because of the enormous number of available judgments, an RS needs to compute pairwise similarity scores for all unique judgment pairs in advance in order to minimize recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed grows quadratically with the number of judgments, i.e., O(n²). However, only a limited number of pairs exhibit strong relevance among the judgments, so computing similarities for the many pairs with only trivial relevance is wasteful. To address the scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and finds semantically relevant judgments within those clusters. Pairwise similarity scores are then computed per cluster, restricting the search space to a single cluster instead of the entire corpus. The proposed LDRS thus drastically reduces the number of similarity computations, enabling large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed on a large real-life collection of judgments of the Supreme Court of India. The experimental results demonstrate the encouraging performance of the proposed LDRS in terms of accuracy, F1-scores, MCC scores, and computational complexity, which validates its applicability to scalable recommender systems.
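A hedged sketch of the two-stage idea follows: cluster the judgment citation graph with Louvain, then compute Doc2Vec similarities only within the query judgment's cluster. The networkx and gensim calls are real library APIs, but the toy graph, texts, and the recommend helper are illustrative assumptions, not the authors' pipeline.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy citation graph: nodes are judgment ids, edges are citation links.
G = nx.Graph([("J1", "J2"), ("J2", "J3"), ("J4", "J5"), ("J5", "J6")])
clusters = louvain_communities(G, seed=42)   # Louvain clustering of the citation network

# Toy judgment texts keyed by id; real texts would be tokenized full judgments.
texts = {jid: f"full text of judgment {jid}".split() for jid in G.nodes}
model = Doc2Vec([TaggedDocument(words, [jid]) for jid, words in texts.items()],
                vector_size=32, min_count=1, epochs=20)

def recommend(query_id, top_n=3):
    """Rank only the judgments in the query's own cluster by Doc2Vec cosine similarity."""
    cluster = next(c for c in clusters if query_id in c)
    candidates = [j for j in cluster if j != query_id]
    return sorted(candidates,
                  key=lambda j: model.dv.similarity(query_id, j),
                  reverse=True)[:top_n]

print(recommend("J1"))
```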


2014 · Vol 23 (02) · pp. 1450001 · Author(s): T. Hamrouni, S. Ben Yahia, E. Mephu Nguifo

In many real-life datasets, the number of extracted frequent patterns has been shown to be huge, hampering the effective exploitation of this amount of knowledge by human experts. To overcome this limitation, exact condensed representations were introduced in order to offer a small set of elements from which all frequent patterns can be faithfully retrieved. In this paper, we introduce a new exact condensed representation based only on particular elements from the disjunctive search space. In this space, a pattern is characterized by its disjunctive support, i.e., the frequency of complementary occurrences of its items, instead of the ubiquitous co-occurrence link. For several benchmark datasets, this representation has been shown to offer interesting compactness gains compared to the pioneering approaches in the literature. Here, we mainly focus on proposing an efficient tool for mining this representation. For this purpose, we introduce an algorithm, called DSSRM, dedicated to this task. We also propose several techniques to optimize its mining time as well as its memory consumption. The empirical study carried out on benchmark datasets shows that DSSRM is faster than the MEP algorithm by several orders of magnitude.
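The sketch below contrasts disjunctive support (transactions containing at least one item of the pattern) with the usual conjunctive support (transactions containing all of them), the measure on which this condensed representation is built; the transactions are illustrative.

```python
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"c", "d"},
]

def conjunctive_support(pattern):
    """Usual co-occurrence support: transactions containing every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)

def disjunctive_support(pattern):
    """Disjunctive support: transactions containing at least one item of the pattern."""
    return sum(1 for t in transactions if pattern & t)

p = {"a", "b"}
print(conjunctive_support(p))   # 1: only the first transaction contains both a and b
print(disjunctive_support(p))   # 3: three transactions contain a or b
# Inclusion-exclusion links the two: supp(a or b) = supp(a) + supp(b) - supp(a and b).
```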


Author(s): Marlene Arangú, Miguel Salido

A fine-grained arc-consistency algorithm for non-normalized constraint satisfaction problems

Constraint programming is a powerful software technology for solving numerous real-life problems. Many of these problems can be modeled as Constraint Satisfaction Problems (CSPs) and solved using constraint programming techniques. However, solving a CSP is NP-complete, so filtering techniques that reduce the search space remain necessary. Arc-consistency algorithms are widely used to prune the search space. The concept of arc-consistency is bidirectional, i.e., it must be ensured in both directions of the constraint (direct and inverse constraints). Two of the best-known and most frequently used arc-consistency algorithms for filtering CSPs are AC3 and AC4. These algorithms repeatedly carry out revisions and require support checks to identify and delete all unsupported values from the domains. Nevertheless, many revisions are ineffective: they cannot delete any value, yet they consume many checks and much time. In this paper, we present AC4-OP, an optimized version of AC4 that handles binary, non-normalized constraints in only one direction, storing the supports found for the inverse direction for later evaluation. It thus shortens the propagation phase by avoiding unnecessary or ineffective checks. The use of AC4-OP reduces the number of constraint checks by 50% while pruning the same search space as AC4. The evaluation section shows the improvement of AC4-OP over AC4, AC6 and AC7 on random and non-normalized instances.
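For context, the following is a compact sketch of the textbook AC-4 scheme (support counters plus a propagation queue) on a binary CSP; it is not the AC4-OP variant described above, and the data structures and the toy constraint are illustrative.

```python
from collections import defaultdict, deque

def ac4(domains, constraints):
    """domains: {var: set of values}; constraints: {(xi, xj): predicate(a, b)}."""
    # Make every constraint usable in both directions (direct and inverse arcs).
    arcs = dict(constraints)
    for (xi, xj), pred in constraints.items():
        arcs.setdefault((xj, xi), lambda b, a, p=pred: p(a, b))

    counter = defaultdict(int)      # counter[(xi, a, xj)] = number of supports of (xi, a) in D(xj)
    supported = defaultdict(list)   # supported[(xj, b)] = values (xi, a) that b supports
    queue = deque()

    # Initialization: count supports and delete values that have none.
    for (xi, xj), pred in arcs.items():
        for a in list(domains[xi]):
            total = 0
            for b in domains[xj]:
                if pred(a, b):
                    total += 1
                    supported[(xj, b)].append((xi, a))
            counter[(xi, a, xj)] = total
            if total == 0 and a in domains[xi]:
                domains[xi].discard(a)
                queue.append((xi, a))

    # Propagation: removing (xj, b) may leave some (xi, a) without support.
    while queue:
        xj, b = queue.popleft()
        for xi, a in supported[(xj, b)]:
            counter[(xi, a, xj)] -= 1
            if counter[(xi, a, xj)] == 0 and a in domains[xi]:
                domains[xi].discard(a)
                queue.append((xi, a))
    return domains

# x < y with domains {1..3}: 3 is pruned from x and 1 is pruned from y.
print(ac4({"x": {1, 2, 3}, "y": {1, 2, 3}}, {("x", "y"): lambda a, b: a < b}))
```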


Author(s): Xiuyi Jia, Xiaoxia Shen, Weiwei Li, Yunan Lu, Jihua Zhu

2008 · Vol 16 (4) · pp. 483-507 · Author(s): Leonardo Trujillo, Gustavo Olague

This work describes how evolutionary computation can be used to synthesize low-level image operators that detect interesting points on digital images. Interest point detection is an essential part of many modern computer vision systems that solve tasks such as object recognition, stereo correspondence, and image indexing, to name but a few. The design of the specialized operators is posed as an optimization/search problem that is solved with genetic programming (GP), a strategy still mostly unexplored by the computer vision community. The proposed approach automatically synthesizes operators that are competitive with state-of-the-art designs, taking into account an operator's geometric stability and the global separability of detected points during fitness evaluation. The GP search space is defined using simple primitive operations that are commonly found in point detectors proposed by the vision community. The experiments described in this paper extend previous results (Trujillo and Olague, 2006a,b) by presenting 15 new operators that were synthesized through the GP-based search. Some of the synthesized operators can be regarded as improved manmade designs because they employ well-known image processing techniques and achieve highly competitive performance. On the other hand, since the GP search also generates what can be considered as unconventional operators for point detection, these results provide a new perspective to feature extraction research.
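As a rough illustration of this search space, the sketch below composes a candidate interest-point operator from the kind of primitives the GP search draws on (Gaussian smoothing, image derivatives, pointwise arithmetic) and extracts local maxima of its response map. The specific composition is hand-written for illustration, not one of the paper's evolved operators.

```python
import numpy as np
from scipy import ndimage

def candidate_operator(img, sigma=1.0):
    """Example operator tree: |smoothed d2/dx2| + |smoothed d2/dy2| (a Laplacian-like response)."""
    dxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))   # second derivative along x, smoothed
    dyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))   # second derivative along y, smoothed
    return np.abs(dxx) + np.abs(dyy)

def detect_points(response, threshold=0.1, size=5):
    """Keep local maxima of the response map that exceed a fraction of the global maximum."""
    local_max = response == ndimage.maximum_filter(response, size=size)
    ys, xs = np.nonzero(local_max & (response > threshold * response.max()))
    return list(zip(xs, ys))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                       # stand-in for a grayscale image
print(len(detect_points(candidate_operator(img))))
```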


2017 · Vol 16 (06) · pp. 1549-1579 · Author(s): Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, Han-Chieh Chao

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights or importance; it can thus discover more useful and meaningful itemsets by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas were combined in the HEWI-Uapriori algorithm, which considers both item weights and transaction uncertainty to mine high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper bound proposed in HEWI-Uapriori can reduce the size of the search space, the algorithm still generates a large number of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures, named the element (E)-table, the weighted-probability (WP)-table, and the WP-tree, to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional WFIM and UFIM methods, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
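The toy sketch below computes the kind of expected weighted support measure that HEWI mining is based on: the expected support of an itemset in an uncertain database (summing per-transaction existence probabilities) scaled by the itemset's average item weight. The E-table/WP-table/WP-tree structures are not shown, and the data and names are illustrative assumptions.

```python
import math

transactions = [               # each item appears with an existential probability
    {"a": 0.9, "b": 0.6},
    {"a": 0.5, "c": 0.8},
    {"b": 0.7, "c": 0.4},
]
weight = {"a": 1.0, "b": 0.5, "c": 0.8}

def expected_weighted_support(itemset):
    # Expected support: sum over transactions of the probability that the whole itemset exists,
    # assuming independent item probabilities (0 if any item is missing from the transaction).
    expected_support = sum(
        math.prod(t[i] for i in itemset) if all(i in t for i in itemset) else 0.0
        for t in transactions
    )
    avg_weight = sum(weight[i] for i in itemset) / len(itemset)
    return expected_support * avg_weight

print(expected_weighted_support({"a", "b"}))   # 0.9 * 0.6 * avg(1.0, 0.5) = 0.405
```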

