Legal document recommendation system: A cluster based pairwise similarity computation

2021 ◽  
pp. 1-13
Author(s):  
Jenish Dhanani ◽  
Rupa Mehta ◽  
Dipti Rana

Legal practitioners analyze relevant previous judgments to prepare favorable arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because enormous numbers of judgments are available, an RS must compute pairwise similarity scores for all unique judgment pairs in advance to minimize recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed grows quadratically with the number of judgments, i.e., O(n²). However, only a limited number of pairs exhibit strong relevance between judgments, so computing similarities for weakly relevant pairs is wasteful. To address the scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and finds semantically relevant judgments within those clusters. Pairwise similarity scores are thus computed per cluster, restricting the search space to each cluster instead of the entire corpus. The proposed LDRS therefore drastically reduces the number of similarity computations, enabling large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed on a large real-life corpus of judgments of the Supreme Court of India. The experimental results demonstrate the encouraging performance of the proposed LDRS in terms of accuracy, F1-score, MCC score, and computational complexity, which validates its applicability to scalable recommender systems.
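
A minimal sketch of this cluster-then-compare pipeline, assuming a citation edge list and pre-tokenized judgment texts. It uses networkx's Louvain implementation and gensim's Doc2Vec; all function and variable names are illustrative, not the paper's implementation:

```python
# Cluster the citation network, then score pairs within clusters only.
import itertools
import networkx as nx
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def cluster_and_score(citation_edges, judgment_tokens):
    # 1. Cluster the judgment citation network with Louvain.
    graph = nx.Graph(citation_edges)
    clusters = nx.community.louvain_communities(graph, seed=42)

    # 2. Learn semantic vectors for all judgments with Doc2Vec.
    corpus = [TaggedDocument(tokens, [doc_id])
              for doc_id, tokens in judgment_tokens.items()]
    model = Doc2Vec(corpus, vector_size=100, epochs=20, min_count=2)

    # 3. Compute cosine similarity within each cluster only,
    #    avoiding the O(n^2) blow-up over the whole corpus.
    scores = {}
    for cluster in clusters:
        for a, b in itertools.combinations(sorted(cluster), 2):
            va, vb = model.dv[a], model.dv[b]
            scores[(a, b)] = float(
                np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return scores
```

With c roughly equal-sized clusters, the number of scored pairs drops from about n²/2 to about n²/(2c), which is the source of the claimed speedup.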

Author(s):  
Jenish Dhanani ◽  
Rupa G. Mehta ◽  
Dipti P. Rana ◽  
Rahul Lad ◽  
Amogh Agrawal ◽  
...  

Recently, legal information retrieval has emerged as an essential practice for the legal fraternity. In the legal domain, a judgment is a specific kind of legal document that discusses case-related information and the verdict of a court case. In the common law system, legal professionals exploit relevant judgments to prepare arguments, so an automated system that effectively identifies similar judgments is in vital demand. Judgments can be broadly categorized into civil and criminal cases, and judgments with similar case matters tend to be more strongly relevant to each other than judgments with different case matters. In similar-judgment identification, categorized judgments can therefore significantly prune the search space by restricting the search to a specific case category. This chapter provides a novel methodology that classifies Indian judgments into either case category. Crucial challenges, such as class imbalance and the intrinsic characteristics of legal data, are also highlighted specific to the similarity analysis of Indian judgments, which can motivate further work by the research community.
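
A minimal sketch of this category-based pruning, assuming raw judgment texts with civil/criminal labels. It is an illustration with scikit-learn, not the chapter's own pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_classifier(train_texts, train_labels):
    # class_weight="balanced" counteracts the civil/criminal class
    # imbalance the chapter highlights.
    clf = make_pipeline(
        TfidfVectorizer(sublinear_tf=True, min_df=2),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    clf.fit(train_texts, train_labels)
    return clf

def candidate_pool(query_text, clf, corpus_by_category):
    # Restrict similarity search to judgments of the predicted category.
    category = clf.predict([query_text])[0]
    return corpus_by_category[category]
```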


Author(s):  
S. I. Rodzin ◽  
O. N. Rodzina

The article considers the formulation of the forecasting problem as well as classic problems of recommender systems: data sparsity, cold start, scalability, synonymy, fraud, diversity, and gray-sheep users. Combining the results of collaborative and content-based filtering offers two possibilities: on the one hand, to weight the results according to the content data; on the other, to shift these weights toward collaborative filtering as soon as data about a particular user appears. This, in turn, improves the accuracy of the recommendations. The authors propose a hybrid recommender-system model that combines the characteristics of both collaborative and content-based filtering. A population-based filtering algorithm and a recommendation-system architecture built on it are also described in the article. The algorithm consists of the following steps: explore the search space; synthesize solutions, i.e., points of this space; assess the quality, or "fitness", of these solutions; and use fitness to perform "natural selection". In this way, the algorithm learns which areas of the search space contain the best solutions. The population of user "characteristics" encoded in the population-based algorithm supports a variety of input data in the hybrid model. The authors propose a coding structure for solutions in the population-based algorithm using the example of a movie-recommendation system. Drift analysis establishes the polynomial complexity of the algorithm. The authors demonstrate the results of experimental studies on an array of benchmarks, together with an assessment of filtering accuracy based on the hybrid model and the population-based algorithm, compared with traditional collaborative filtering using the Pearson correlation coefficient. The prediction accuracy of the population-based algorithm is higher than that of the Pearson-based approach.
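
A minimal sketch of the population-based idea: evolve a per-feature weight vector that blends content-based and collaborative scores. The encoding, operators, and fitness interface here are illustrative, not the authors' exact scheme:

```python
import random

def evolve_weights(fitness, dim, pop_size=30, generations=100):
    # Each individual encodes a weight vector in [0, 1]^dim.
    pop = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # "natural selection"
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(dim)               # point mutation
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def hybrid_score(weights, content_scores, collab_scores):
    # Blend the two filtering signals item-wise with the evolved weights.
    return [w * cb + (1 - w) * cf
            for w, cb, cf in zip(weights, content_scores, collab_scores)]
```

Here `fitness` would measure prediction accuracy on held-out ratings, so the evolved weights shift toward collaborative filtering exactly where user data supports it.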


2021 ◽  
Vol 26 (2) ◽  
pp. 35
Author(s):  
Teodoro Macias-Escobar ◽  
Laura Cruz-Reyes ◽  
César Medina-Trejo ◽  
Claudia Gómez-Santillán ◽  
Nelson Rangel-Valdez ◽  
...  

The decision-making process can be complex and is often underestimated; mismanaging it can lead to poor results and excessive spending. This situation arises in highly complex multi-criteria problems such as the project portfolio selection (PPS) problem, where a recommender system becomes crucial to guide the solution search process. To our knowledge, most recommender systems that use argumentation theory are not designed for multi-criteria optimization problems, and most current recommender systems focused on PPS problems do not attempt to justify their recommendations. This work studies the characterization of the cognitive tasks involved in the decision-aiding process and proposes a framework for a Decision Aid Interactive Recommender System (DAIRS). The proposed system focuses on user-system interaction that guides the search toward the best solution with respect to the decision-maker's preferences. The developed framework uses argumentation theory, supported by argumentation schemes, dialogue games, proof standards, and two state transition diagrams (STDs), to generate and explain its recommendations to the user. This work presents a prototype of DAIRS and evaluates the user experience on multiple real-life case simulations through a usability measure. The prototype and both STDs received satisfactory scores and broad acceptance from the test users.
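
A minimal sketch of a state transition diagram driving a recommendation dialogue, assuming a simple move vocabulary; the states and moves are hypothetical illustrations, not DAIRS's actual STDs:

```python
# Dialogue game as a transition table: (state, move) -> next state.
STD = {
    ("await_query", "ask_recommendation"): "recommend",
    ("recommend", "why"): "justify",          # argumentation-backed reply
    ("justify", "accept"): "done",
    ("justify", "challenge"): "recommend",    # refine and recommend again
}

def step(state, move):
    # Reject moves the dialogue game does not allow in this state.
    if (state, move) not in STD:
        raise ValueError(f"move {move!r} not allowed in state {state!r}")
    return STD[(state, move)]

print(step("recommend", "why"))  # -> "justify"
```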


2021 ◽  
Vol 16 (2) ◽  
pp. 1-31
Author(s):  
Chunkai Zhang ◽  
Zilin Du ◽  
Yuting Yang ◽  
Wensheng Gan ◽  
Philip S. Yu

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, as such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation, based on two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of the upper bounds and utilities. Substantial experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applicability owing to its high efficiency.
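
A minimal sketch of the on-shelf idea for the simpler itemset case (the article tackles the harder sequence case): a pattern's utility is measured only against the time periods in which it is on shelf, removing the bias toward long shelf times. Names are illustrative:

```python
def on_shelf_relative_utility(pattern, transactions, on_shelf_periods):
    """transactions: list of (period, {item: utility}) tuples.
    on_shelf_periods: periods in which every item of `pattern` is sold."""
    pattern_utility = 0
    period_total = 0
    for period, items in transactions:
        if period not in on_shelf_periods:
            continue   # ignore periods where the pattern is off shelf
        period_total += sum(items.values())
        if set(pattern) <= set(items):
            pattern_utility += sum(items[i] for i in pattern)
    # Normalizing by on-shelf-period utility removes the shelf-time bias.
    return pattern_utility / period_total if period_total else 0.0
```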


2014 ◽  
Vol 23 (02) ◽  
pp. 1450001
Author(s):  
T. Hamrouni ◽  
S. Ben Yahia ◽  
E. Mephu Nguifo

In many real-life datasets, the number of extracted frequent patterns has been shown to be huge, hampering the effective exploitation of this amount of knowledge by human experts. To overcome this limitation, exact condensed representations were introduced, offering a small set of elements from which all frequent patterns can be faithfully retrieved. In this paper, we introduce a new exact condensed representation based only on particular elements of the disjunctive search space. In this space, a pattern is characterized by its disjunctive support, i.e., the frequency of complementary occurrences of its items, instead of the ubiquitous co-occurrence link. For several benchmark datasets, this representation has been shown to compare favorably in compactness with the pioneering approaches in the literature. Here, we mainly focus on proposing an efficient tool for mining this representation. For this purpose, we introduce an algorithm dedicated to this task, called DSSRM, and propose several techniques to optimize its mining time as well as its memory consumption. The empirical study carried out on benchmark datasets shows that DSSRM is faster than the MEP algorithm by several orders of magnitude.
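
A minimal sketch of the disjunctive support measure: a transaction supports a pattern if it contains at least one of the pattern's items, in contrast to the usual conjunctive count that requires all of them:

```python
def disjunctive_support(pattern, transactions):
    items = set(pattern)
    # Count transactions containing at least one item of the pattern.
    return sum(1 for t in transactions if items & set(t))

# Example: {a, b} has disjunctive support 3 below, although the items
# co-occur in only one transaction (conjunctive support 1).
transactions = [{"a", "c"}, {"b"}, {"a", "b"}, {"c", "d"}]
print(disjunctive_support({"a", "b"}, transactions))  # -> 3
```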


Author(s):  
Marlene Arangú ◽  
Miguel Salido

A fine-grained arc-consistency algorithm for non-normalized constraint satisfaction problems

Constraint programming is a powerful software technology for solving numerous real-life problems. Many of these problems can be modeled as constraint satisfaction problems (CSPs) and solved using constraint programming techniques. However, solving a CSP is NP-complete, so filtering techniques that reduce the search space remain necessary. Arc-consistency algorithms are widely used to prune the search space. The concept of arc-consistency is bidirectional, i.e., it must be ensured in both directions of a constraint (the direct and inverse constraints). Two of the best-known and most frequently used arc-consistency algorithms for filtering CSPs are AC3 and AC4. These algorithms repeatedly carry out revisions and require support checks to identify and delete all unsupported values from the domains. Nevertheless, many revisions are ineffective: they delete no value yet consume many checks and much time. In this paper, we present AC4-OP, an optimized version of AC4 that processes binary, non-normalized constraints in only one direction, storing the supports found in the inverse direction for later evaluation. It thus shortens the propagation phase by avoiding unnecessary or ineffective checks. AC4-OP reduces the number of constraint checks by 50% while pruning the same search space as AC4. The evaluation section shows the improvement of AC4-OP over AC4, AC6, and AC7 on random and non-normalized instances.
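
For context, a minimal AC-3 sketch over binary constraints; AC4-OP's single-direction support bookkeeping is the article's contribution and is not reproduced here:

```python
from collections import deque

def ac3(domains, constraints):
    """domains: {var: set of values}.
    constraints: {(x, y): predicate(vx, vy)} for each directed arc;
    both directions of every constraint must be present."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        revised = False
        for vx in set(domains[x]):
            # Delete vx if it has no supporting value in y's domain.
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].discard(vx)
                revised = True
        if not domains[x]:
            return False          # domain wipe-out: inconsistent
        if revised:
            # Re-examine arcs pointing at x; these revisions are the
            # often-ineffective work that AC4-OP aims to cut down.
            queue.extend((z, w) for (z, w) in constraints if w == x)
    return True
```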


2018 ◽  
Vol 12 (5) ◽  
pp. 1336-1345 ◽  
Author(s):  
Abdulrhman M. Alshareef ◽  
Mohammed F. Alhamid ◽  
Abdulmotaleb El Saddik

2017 ◽  
Vol 16 (06) ◽  
pp. 1549-1579 ◽  
Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Philippe Fournier-Viger ◽  
Tzung-Pei Hong ◽  
Han-Chieh Chao

Frequent itemset mining (FIM) is a fundamental set of techniques for discovering useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights/importance, and can thus discover more useful and meaningful itemsets by ignoring irrelevant itemsets with lower weights. UFIM accounts for the fact that data collected in real-life environments may be inaccurate, imprecise, or incomplete. Recently, these two ideas were combined in the HEWI-Uapriori algorithm, which considers both item weights and transaction uncertainty to mine high expected weighted itemsets (HEWIs) using a two-phase, Apriori-based approach. Although the upper bound proposed in HEWI-Uapriori reduces the size of the search space, the algorithm still generates a large number of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures, the element (E)-table, the weighted-probability (WP)-table, and the WP-tree, to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional WFIM and UFIM methods, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
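
A minimal sketch of one common formulation of the expected weighted support measure underlying HEWIs: per-transaction existence probabilities are multiplied together, and the itemset's average item weight scales the expected support. This illustrates the measure only, not the E-table, WP-table, or WP-tree structures:

```python
from math import prod

def expected_weighted_support(itemset, database, weights):
    """database: list of {item: existence_probability} transactions.
    weights: {item: weight}."""
    exp_sup = sum(
        prod(t[i] for i in itemset)      # P(itemset appears in t)
        for t in database
        if all(i in t for i in itemset)
    )
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * exp_sup
```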


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3011
Author(s):  
Drishti Yadav

This paper introduces a novel population-based bio-inspired meta-heuristic optimization algorithm, called the Blood Coagulation Algorithm (BCA). BCA derives inspiration from the process of blood coagulation in the human body. The underlying concepts behind the proposed algorithm are the cooperative behavior of thrombocytes and their intelligent strategy of clot formation. These behaviors are modeled and utilized to achieve intensification and diversification in a given search space. A comparison with various state-of-the-art meta-heuristic algorithms over a test suite of 23 well-known benchmark functions reflects the efficiency of BCA. An extensive investigation is conducted to analyze the performance, convergence behavior, and computational complexity of BCA. The comparative study and statistical test analysis demonstrate that BCA offers very competitive and statistically significant results compared to other eminent meta-heuristic algorithms. Experimental results also show the consistent performance of BCA in high-dimensional search spaces. Furthermore, we demonstrate the applicability of BCA to real-world applications by solving several real-life engineering problems.
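
The abstract does not spell out BCA's update rules, so the following is only a generic population-based skeleton showing the intensification/diversification split that such metaheuristics balance; it is not BCA itself:

```python
import random

def optimize(objective, dim, bounds, pop_size=30, iters=200):
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(pop_size)]
    best = min(pop, key=objective)[:]    # copy: agents mutate in place
    for _ in range(iters):
        for agent in pop:
            for d in range(dim):
                if random.random() < 0.5:   # intensification: drift to best
                    agent[d] += random.random() * (best[d] - agent[d])
                else:                        # diversification: random jump
                    agent[d] = random.uniform(lo, hi)
        cand = min(pop, key=objective)
        if objective(cand) < objective(best):
            best = cand[:]
    return best

# Example on the sphere function, a classic benchmark from such suites:
print(optimize(lambda x: sum(v * v for v in x), dim=5, bounds=(-10, 10)))
```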


Author(s):  
Vijay Kumar ◽  
Dinesh Kumar

Clustering techniques suffer from cluster-center initialization and local-optima problems. In this chapter, a new metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to solve these problems. SCA explores the search space of a given dataset to find near-optimal cluster centers. A center-based encoding scheme is used to evolve the cluster centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets, and its performance is compared with recently developed clustering techniques. The experimental results reveal that SCA-based clustering gives better values in terms of cluster quality measures.
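
A minimal sketch of SCA with the center-based encoding the chapter describes: each candidate solution is a flat vector of k cluster centers, and fitness is the within-cluster sum of squared errors. Parameter choices follow the standard SCA update and are assumptions, not the chapter's exact settings:

```python
import numpy as np

def sca_clustering(data, k, pop_size=30, iters=200, a=2.0):
    n, d = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)

    def sse(flat):
        centers = flat.reshape(k, d)
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.min(axis=1).sum()   # distance to nearest center

    pop = np.random.uniform(np.tile(lo, k), np.tile(hi, k), (pop_size, k * d))
    best = min(pop, key=sse).copy()
    for t in range(iters):
        r1 = a - t * (a / iters)         # shrinking exploration radius
        for x in pop:
            r2 = np.random.uniform(0, 2 * np.pi, k * d)
            r3 = np.random.uniform(0, 2, k * d)
            # Standard SCA move: oscillate around the best solution.
            wave = np.sin(r2) if np.random.rand() < 0.5 else np.cos(r2)
            x += r1 * wave * np.abs(r3 * best - x)
            np.clip(x, np.tile(lo, k), np.tile(hi, k), out=x)
        cand = min(pop, key=sse)
        if sse(cand) < sse(best):
            best = cand.copy()
    return best.reshape(k, d)            # near-optimal cluster centers
```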

