Clustering by Similarity of Brazilian Legal Documents Using Natural Language Processing Approaches

2021 ◽  
Author(s):  
Raphael Souza de Oliveira ◽  
Erick Giovani Sperandio Nascimento

The Brazilian legal system mandates the expeditious resolution of judicial proceedings. However, legal courts work under budgetary constraints and with reduced staff. To cope with these restrictions, artificial intelligence (AI) has been applied to many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents within an inferred group using unsupervised learning, by applying three NLP techniques, namely term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two specialized with a Brazilian Portuguese corpus. We developed a model for grouping lawsuits in which group cohesion is calculated from the cosine distance between each element of a group and its centroid. The Ordinary Appeal was chosen as the reference document type, since it moves a lawsuit on to the higher court and a substantial contingent of such lawsuits awaits judgment. After the data-processing steps, the content of each document was transformed into a vector representation using the three NLP techniques. We observed that specialized word-embedding models such as Word2Vec perform better, making it possible to advance the state of the art in NLP applied to the legal sector.
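The grouping idea above can be sketched as follows: documents are vectorized, clustered, and each document's cosine distance to its cluster centroid measures how well it fits its group. This is a minimal illustration using TF-IDF and k-means; the toy corpus, the number of clusters, and the use of scikit-learn are assumptions, not the authors' actual data or pipeline (which also uses Word2Vec embeddings).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

# Toy stand-ins for judicial documents (illustrative only).
docs = [
    "appeal against sentence in labor lawsuit",
    "ordinary appeal labor court decision",
    "tax assessment dispute federal revenue",
    "federal tax dispute over assessment notice",
]

vectors = TfidfVectorizer().fit_transform(docs)          # documents -> TF-IDF vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# Cosine distance from each document to its cluster centroid:
# small distance = the document sits close to the "center" of its group.
for i, label in enumerate(km.labels_):
    d = cosine_distances(vectors[i], km.cluster_centers_[label].reshape(1, -1))[0, 0]
    print(f"doc {i}: cluster {label}, distance to centroid {d:.3f}")
```

In the paper's setting, the TF-IDF vectorizer would be swapped for the specialized Word2Vec CBoW or Skip-gram embeddings while the centroid-distance logic stays the same.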

Author(s):  
Gourav Sharma

In this paper, we propose an Automated Brain Tumor Prediction System that predicts brain tumors from symptoms shared across several diseases using natural language processing (NLP). Term frequency-inverse document frequency (TF-IDF) is used to weight the terms in each disease's symptom list. Cosine similarity and Euclidean distance are used to measure the angular and linear distance, respectively, between diseases and symptoms, yielding a rank for brain tumor among the candidate diseases. A novel mathematical strategy is then used to predict the chance of a brain tumor from the queried symptoms: the chance is proportional to the similarity value obtained for brain tumor when the symptoms are queried, and inversely proportional both to the rank of brain tumor among the diseases and to the maximum similarity value of brain tumor, i.e., the value obtained when all of its symptoms are present.
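The ranking idea can be illustrated with a small sketch: disease symptom lists are TF-IDF weighted, a symptom query is scored by cosine similarity and Euclidean distance, and a heuristic "chance" combines similarity, rank, and the maximum attainable similarity. The symptom data, the value of the maximum similarity, and the exact form of the formula are assumptions for illustration, not the paper's actual system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Hypothetical symptom lists (illustrative only).
diseases = {
    "brain tumor": "headache vomiting seizures blurred vision memory loss",
    "migraine": "headache nausea vomiting light sensitivity",
    "meningitis": "fever headache stiff neck vomiting",
}

vec = TfidfVectorizer().fit(diseases.values())
disease_matrix = vec.transform(diseases.values())

query = vec.transform(["headache vomiting seizures"])
sims = cosine_similarity(query, disease_matrix).ravel()     # angular closeness
dists = euclidean_distances(query, disease_matrix).ravel()  # linear distance

names = list(diseases)
ranking = sorted(names, key=lambda n: sims[names.index(n)], reverse=True)
rank = ranking.index("brain tumor") + 1                     # 1-based rank
sim = sims[names.index("brain tumor")]
max_sim = 1.0   # similarity when all brain-tumor symptoms are queried (assumed)

# chance ∝ similarity, and ∝ 1 / (rank × max similarity), per the abstract.
chance = sim / (rank * max_sim)
print(f"rank={rank}, similarity={sim:.3f}, chance={chance:.3f}")
```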


2020 ◽  
Vol 9 (05) ◽  
pp. 25039-25046 ◽  
Author(s):  
Rahul C Kore ◽  
Prachi Ray ◽  
Priyanka Lade ◽  
Amit Nerurkar

Reading legal documents is tedious and sometimes requires domain knowledge related to the document. It is hard to read a full legal document without missing its key sentences, and with the increasing number of legal documents it is convenient to extract the essential information without going through the whole text. The purpose of this study is to make a large legal document understandable within a short time. Summarization gives the reader flexibility and convenience. Using vector representations of words, text-ranking algorithms, and similarity techniques, this study produces the highest-ranked sentences, so that the summary covers the most vital information of the document in a concise manner. The paper shows how different natural language processing concepts can be combined to produce the desired result and spare readers from going through the whole complex document, and it presents the steps required to achieve this aim, elaborating on the algorithms used at each step.
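A minimal extractive-summarization sketch in the spirit described above: sentences are TF-IDF vectorized, a cosine-similarity graph is built over them, and a TextRank-style PageRank score selects the top sentences. The sample sentences, the choice of TF-IDF, and the summary length are illustrative assumptions, not the paper's exact configuration.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for sentences from a legal document.
sentences = [
    "The appellant challenges the lower court ruling on procedural grounds.",
    "Procedural grounds cited include improper notice and missing signatures.",
    "The court scheduled a hearing for next month.",
    "The ruling is challenged because notice to the appellant was improper.",
]

# Similarity graph over sentences; edge weights are cosine similarities.
sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
scores = nx.pagerank(nx.from_numpy_array(sim))   # rank sentences by centrality

top = sorted(scores, key=scores.get, reverse=True)[:2]   # indices of top sentences
summary = " ".join(sentences[i] for i in sorted(top))    # keep original order
print(summary)
```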


2019 ◽  
Vol 7 (1) ◽  
pp. 1831-1840
Author(s):  
Bern Jonathan ◽  
Jay Idoan Sihotang ◽  
Stanley Martin

Introduction: Natural language processing (NLP) is a part of artificial intelligence and machine learning concerned with the interaction between computers and human (natural) languages. Sentiment analysis is a part of NLP often used to analyze text, based on how people write, for positive, negative, or neutral sentiment; it is useful for knowing whether users like something or not. Zomato is an application for rating restaurants, and each rating has a review that can be used for sentiment analysis. On this basis, the writers set out to predict the sentiment of the reviews. Method: Preprocessing consists of lowercasing all words, tokenization, removal of numbers and punctuation, stop-word removal, and lemmatization. Words are then vectorized with term frequency-inverse document frequency (TF-IDF). We processed 150,000 reviews, labeling reviews rated above 3 as positive, those rated below 3 as negative, and those rated exactly 3 as neutral. The authors use a split test with 80% training data and 20% testing data, and evaluate the random forest classifier with precision, recall, and accuracy. Result: The accuracy of this research is 92%. The precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, and the recall is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: "bad", "good", "average", "best", "place", "love", "order", "food", "try", and "nice".
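The pipeline described in the Method section can be sketched as follows: lowercase and clean the reviews, vectorize with TF-IDF, split 80/20, train a random forest, and score with accuracy. The tiny repeated toy dataset stands in for the 150,000 Zomato reviews, and the preprocessing here (lowercasing and stop-word removal inside the vectorizer) is a simplification of the full tokenization-and-lemmatization pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy labeled reviews (illustrative only); "pos"/"neg"/"neu" mirror the
# rating-above-3 / below-3 / exactly-3 labeling in the abstract.
reviews = ["best food ever", "bad place, never again", "average meal",
           "love this place", "nice food, good service", "worst order"] * 20
labels = ["pos", "neg", "neu", "pos", "pos", "neg"] * 20

X_tr, X_te, y_tr, y_te = train_test_split(
    reviews, labels, test_size=0.2, random_state=0)   # 80/20 split test

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # clean + vectorize
    RandomForestClassifier(random_state=0))
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```

On real data one would also report per-class precision and recall (e.g. via `sklearn.metrics.classification_report`) to match the figures quoted in the Result section.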


1990 ◽  
Vol 5 (4) ◽  
pp. 225-249 ◽  
Author(s):  
Ann Copestake ◽  
Karen Sparck Jones

Abstract: This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.


2021 ◽  
pp. 1-20
Author(s):  
Aakanksha Singhal ◽  
Sharma D.K.

Natural language processing (NLP) is the component of artificial intelligence and machine learning concerned with communication between machines and human (natural) languages. Within NLP, sentiment identification interprets terms in text to detect strongly positive or negative emotions from people's writing patterns, and Shannon's entropy is used here to assess such preferences. Zomato is a restaurant-ranking application whose restaurant reviews can be used for this entropy-based assessment; on this basis, the authors seek to predict the sentiment of the reviews. Preprocessing consists of lowercasing all terms, tokenization, removal of numbers and punctuation, stop-word removal, and lemmatization. Term frequency-inverse document frequency (TF-IDF) is then used to construct word vectors. The data gathered comprise 150,000 reviews: reviews rated above 3 are treated as positive, those rated below 3 as negative, and those rated exactly 3 as neutral. The authors use a split evaluation with 80% training data and 20% test data, and accuracy, recall, and precision are the criteria used to evaluate the random forest classifier. The accuracy of this analysis is 92 percent; precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, and recall is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%. "Poor", "great", "fair", "better", "location", "care", "request", "food", "seek" and "pleasant" are the 10 terms that most influence the results.


2019 ◽  
Author(s):  
Matthew J. Lavin

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
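The foundations the lesson covers can be shown in a small from-scratch calculation: term frequency scaled by inverse document frequency, so that terms common across the corpus are down-weighted. The toy corpus and the exact idf variant are illustrative choices; tf-idf has several common weighting schemes.

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    """tf-idf of `term` in `doc`; assumes the term occurs in at least one doc."""
    tf = Counter(doc)[term] / len(doc)       # term frequency within the document
    df = sum(term in d for d in docs)        # number of documents containing the term
    idf = math.log(len(docs) / df)           # inverse document frequency
    return tf * idf

print(tf_idf("cat", corpus[0], corpus))   # distinctive term: higher weight
print(tf_idf("the", corpus[0], corpus))   # corpus-wide term: lower weight
```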


2019 ◽  
Vol 12 (2) ◽  
pp. 181-197 ◽  
Author(s):  
Walter Kehl ◽  
Mike Jackson ◽  
Alessandro Fergnani

Because the input for Futures Studies is to a very high degree formulated as written words and texts, methods which automate the processing of texts can substantially help Futures Studies. At Shaping Tomorrow, we have developed a software system using Natural Language Processing (NLP), a subfield of Artificial Intelligence, which automatically analyzes publicly available texts and extracts future-relevant data from these texts. This process can be used to study the futures. This article discusses this software system, explains how it works with a detailed example, and shows real-life applications and visualizations of the resulting data. The current state of this method is just the first step; a number of technological improvements and their possible benefits are explained. The implications of using this software system for the field of Futures Studies are mostly positive, but there are also a number of caveats.

