Clustering by Similarity of Brazilian Legal Documents Using Natural Language Processing Approaches

2021 ◽  
Author(s):  
Raphael Souza de Oliveira ◽  
Erick Giovani Sperandio Nascimento

The Brazilian legal system mandates the expeditious resolution of judicial proceedings. However, legal courts work under budgetary constraints and with reduced staff. To cope with these restrictions, artificial intelligence (AI) has been applied to many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents within an inferred group using unsupervised learning, by applying three NLP techniques, namely term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two specialized with a Brazilian Portuguese corpus. We developed a model for grouping lawsuits in which group cohesion is calculated from the cosine distance between each element of a group and its centroid. The Ordinary Appeal was chosen as the reference document type, since it moves a lawsuit on to the higher court and a substantial contingent of such lawsuits awaits judgment. After the data-processing steps, the content of each document was transformed into a vector representation using the three NLP techniques. We observed that specialized word-embedding models such as Word2Vec perform better, making it possible to advance the state of the art in NLP applied to the legal sector.
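The grouping idea above can be sketched as follows: documents are vectorized, clustered, and each document's cosine distance to its cluster centroid measures how well it fits its group. This is a minimal illustration using TF-IDF and k-means; the toy corpus, the number of clusters, and the use of scikit-learn are assumptions, not the authors' actual data or pipeline (which also uses Word2Vec embeddings).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

# Toy stand-ins for judicial documents (illustrative only).
docs = [
    "appeal against sentence in labor lawsuit",
    "ordinary appeal labor court decision",
    "tax assessment dispute federal revenue",
    "federal tax dispute over assessment notice",
]

vectors = TfidfVectorizer().fit_transform(docs)          # documents -> TF-IDF vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# Cosine distance from each document to its cluster centroid:
# small distance = the document sits close to the "center" of its group.
for i, label in enumerate(km.labels_):
    d = cosine_distances(vectors[i], km.cluster_centers_[label].reshape(1, -1))[0, 0]
    print(f"doc {i}: cluster {label}, distance to centroid {d:.3f}")
```

In the paper's setting, the TF-IDF vectorizer would be swapped for the specialized Word2Vec CBoW or Skip-gram embeddings while the centroid-distance logic stays the same.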

Author(s):  
Gourav Sharma

In this paper, we propose an Automated Brain Tumor Prediction System that predicts brain tumors from symptoms shared across several diseases using natural language processing (NLP). Term frequency-inverse document frequency (TF-IDF) is used to weight the terms in each disease's symptom list. Cosine similarity and Euclidean distance are used to measure the angular and linear distance, respectively, between diseases and symptoms, yielding a rank for brain tumor among the candidate diseases. A novel mathematical strategy is then used to predict the chance of a brain tumor from the queried symptoms: the chance is proportional to the similarity value obtained for brain tumor when the symptoms are queried, and inversely proportional both to the rank of brain tumor among the diseases and to the maximum similarity value of brain tumor, i.e., the value obtained when all of its symptoms are present.
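The ranking idea can be illustrated with a small sketch: disease symptom lists are TF-IDF weighted, a symptom query is scored by cosine similarity and Euclidean distance, and a heuristic "chance" combines similarity, rank, and the maximum attainable similarity. The symptom data, the value of the maximum similarity, and the exact form of the formula are assumptions for illustration, not the paper's actual system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Hypothetical symptom lists (illustrative only).
diseases = {
    "brain tumor": "headache vomiting seizures blurred vision memory loss",
    "migraine": "headache nausea vomiting light sensitivity",
    "meningitis": "fever headache stiff neck vomiting",
}

vec = TfidfVectorizer().fit(diseases.values())
disease_matrix = vec.transform(diseases.values())

query = vec.transform(["headache vomiting seizures"])
sims = cosine_similarity(query, disease_matrix).ravel()     # angular closeness
dists = euclidean_distances(query, disease_matrix).ravel()  # linear distance

names = list(diseases)
ranking = sorted(names, key=lambda n: sims[names.index(n)], reverse=True)
rank = ranking.index("brain tumor") + 1                     # 1-based rank
sim = sims[names.index("brain tumor")]
max_sim = 1.0   # similarity when all brain-tumor symptoms are queried (assumed)

# chance ∝ similarity, and ∝ 1 / (rank × max similarity), per the abstract.
chance = sim / (rank * max_sim)
print(f"rank={rank}, similarity={sim:.3f}, chance={chance:.3f}")
```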


2020 ◽  
Vol 9 (05) ◽  
pp. 25039-25046 ◽  
Author(s):  
Rahul C Kore ◽  
Prachi Ray ◽  
Priyanka Lade ◽  
Amit Nerurkar

Reading legal documents is tedious and sometimes requires domain knowledge related to the document. It is hard to read a full legal document without missing its key sentences, and with the increasing number of legal documents it is convenient to extract the essential information without going through the whole text. The purpose of this study is to make a large legal document understandable within a short time. Summarization gives the reader flexibility and convenience. Using vector representations of words, text-ranking algorithms, and similarity techniques, this study produces the highest-ranked sentences, so that the summary covers the most vital information of the document in a concise manner. The paper shows how different natural language processing concepts can be combined to produce the desired result and spare readers from going through the whole complex document, and it presents the steps required to achieve this aim, elaborating on the algorithms used at each step.
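A minimal extractive-summarization sketch in the spirit described above: sentences are TF-IDF vectorized, a cosine-similarity graph is built over them, and a TextRank-style PageRank score selects the top sentences. The sample sentences, the choice of TF-IDF, and the summary length are illustrative assumptions, not the paper's exact configuration.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for sentences from a legal document.
sentences = [
    "The appellant challenges the lower court ruling on procedural grounds.",
    "Procedural grounds cited include improper notice and missing signatures.",
    "The court scheduled a hearing for next month.",
    "The ruling is challenged because notice to the appellant was improper.",
]

# Similarity graph over sentences; edge weights are cosine similarities.
sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
scores = nx.pagerank(nx.from_numpy_array(sim))   # rank sentences by centrality

top = sorted(scores, key=scores.get, reverse=True)[:2]   # indices of top sentences
summary = " ".join(sentences[i] for i in sorted(top))    # keep original order
print(summary)
```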


2019 ◽  
Vol 7 (1) ◽  
pp. 1831-1840
Author(s):  
Bern Jonathan ◽  
Jay Idoan Sihotang ◽  
Stanley Martin

Introduction: Natural language processing (NLP) is a part of artificial intelligence and machine learning concerned with the interaction between computers and human (natural) languages. Sentiment analysis is a part of NLP often used to analyze text, based on how people write, for positive, negative, or neutral sentiment; it is useful for knowing whether users like something or not. Zomato is an application for rating restaurants, and each rating has a review that can be used for sentiment analysis. On this basis, the writers set out to predict the sentiment of the reviews. Method: Preprocessing consists of lowercasing all words, tokenization, removal of numbers and punctuation, stop-word removal, and lemmatization. Words are then vectorized with term frequency-inverse document frequency (TF-IDF). We processed 150,000 reviews, labeling reviews rated above 3 as positive, those rated below 3 as negative, and those rated exactly 3 as neutral. The authors use a split test with 80% training data and 20% testing data, and evaluate the random forest classifier with precision, recall, and accuracy. Result: The accuracy of this research is 92%. The precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, and the recall is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: "bad", "good", "average", "best", "place", "love", "order", "food", "try", and "nice".
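The pipeline described in the Method section can be sketched as follows: lowercase and clean the reviews, vectorize with TF-IDF, split 80/20, train a random forest, and score with accuracy. The tiny repeated toy dataset stands in for the 150,000 Zomato reviews, and the preprocessing here (lowercasing and stop-word removal inside the vectorizer) is a simplification of the full tokenization-and-lemmatization pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy labeled reviews (illustrative only); "pos"/"neg"/"neu" mirror the
# rating-above-3 / below-3 / exactly-3 labeling in the abstract.
reviews = ["best food ever", "bad place, never again", "average meal",
           "love this place", "nice food, good service", "worst order"] * 20
labels = ["pos", "neg", "neu", "pos", "pos", "neg"] * 20

X_tr, X_te, y_tr, y_te = train_test_split(
    reviews, labels, test_size=0.2, random_state=0)   # 80/20 split test

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # clean + vectorize
    RandomForestClassifier(random_state=0))
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```

On real data one would also report per-class precision and recall (e.g. via `sklearn.metrics.classification_report`) to match the figures quoted in the Result section.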


1990 ◽  
Vol 5 (4) ◽  
pp. 225-249 ◽  
Author(s):  
Ann Copestake ◽  
Karen Sparck Jones

Abstract: This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.


2021 ◽  
pp. 1-20
Author(s):  
Aakanksha Singhal ◽  
Sharma D.K.

Natural language processing (NLP) is the component of artificial intelligence and machine learning concerned with communication between machines and human (natural) languages. Within NLP, sentiment identification interprets terms in text to detect strongly positive or negative emotions from people's writing patterns, and Shannon's entropy is used here to assess such preferences. Zomato is a restaurant-ranking application whose restaurant reviews can be used for this entropy-based assessment; on this basis, the authors seek to predict the sentiment of the reviews. Preprocessing consists of lowercasing all terms, tokenization, removal of numbers and punctuation, stop-word removal, and lemmatization. Term frequency-inverse document frequency (TF-IDF) is then used to construct word vectors. The data gathered comprise 150,000 reviews: reviews rated above 3 are treated as positive, those rated below 3 as negative, and those rated exactly 3 as neutral. The authors use a split evaluation with 80% training data and 20% test data, and accuracy, recall, and precision are the criteria used to evaluate the random forest classifier. The accuracy of this analysis is 92 percent; precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, and recall is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%. "Poor", "great", "fair", "better", "location", "care", "request", "food", "seek" and "pleasant" are the 10 terms that most influence the results.


2019 ◽  
Author(s):  
Matthew J. Lavin

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
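The foundations the lesson covers can be shown in a small from-scratch calculation: term frequency scaled by inverse document frequency, so that terms common across the corpus are down-weighted. The toy corpus and the exact idf variant are illustrative choices; tf-idf has several common weighting schemes.

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    """tf-idf of `term` in `doc`; assumes the term occurs in at least one doc."""
    tf = Counter(doc)[term] / len(doc)       # term frequency within the document
    df = sum(term in d for d in docs)        # number of documents containing the term
    idf = math.log(len(docs) / df)           # inverse document frequency
    return tf * idf

print(tf_idf("cat", corpus[0], corpus))   # distinctive term: higher weight
print(tf_idf("the", corpus[0], corpus))   # corpus-wide term: lower weight
```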


2019 ◽  
Vol 12 (2) ◽  
pp. 181-197 ◽  
Author(s):  
Walter Kehl ◽  
Mike Jackson ◽  
Alessandro Fergnani

Because the input for Futures Studies is to a very high degree formulated as written words and texts, methods which automate the processing of texts can substantially help Futures Studies. At Shaping Tomorrow, we have developed a software system using Natural Language Processing (NLP), a subfield of Artificial Intelligence, which automatically analyzes publicly available texts and extracts future-relevant data from these texts. This process can be used to study the futures. This article discusses this software system, explains how it works with a detailed example, and shows real-life applications and visualizations of the resulting data. The current state of this method is just the first step; a number of technological improvements and their possible benefits are explained. The implications of using this software system for the field of Futures Studies are mostly positive, but there are also a number of caveats.

