Semantic NLP Technologies in Information Retrieval Systems for Legal Research

Companies that provide legal research services to lawyers, such as LexisNexis and Westlaw, have rapidly incorporated natural language processing (NLP) into their database systems to handle the massive volume of legal text they contain. NLP techniques analyze natural-language text using methods developed in computational linguistics and artificial intelligence. Their potential applications range from text summarization to the prediction of court judgments. A concern with this technology, however, is that professionals will come to depend on these systems as a source of knowledge, even though they have little control over them or understanding of them. Recent advances in AI and deep learning have made NLP techniques more effective, but the decision-making processes of these algorithms have become progressively harder for humans to interpret. Concerns about the interpretability of proprietary legal services such as LexisNexis are therefore more pertinent than ever. The following survey of current NLP techniques shows one potential avenue for making NLP algorithms more explainable: incorporating symbol-based methods that exploit knowledge models built for specific domains. NLP techniques developed to facilitate the retrieval of inventive information from patent applications are an example of this approach.
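The following is a minimal Python sketch of the symbol-based idea. It assumes a tiny hand-built domain knowledge model; the concepts, relations and the function `expand_query` are hypothetical and are not taken from LexisNexis, Westlaw or any other system. Query terms are expanded only through explicit relations, and every added term carries a human-readable reason. This is the kind of traceability the survey links to explainability.

```python
# Hypothetical, hand-built legal knowledge model (illustration only):
# concept -> list of (related term, human-readable relation).
KNOWLEDGE_MODEL = {
    "negligence": [("duty of care", "an element of"), ("tort", "broader than")],
    "contract": [("consideration", "an element of"), ("breach", "related to")],
}


def expand_query(terms, model=KNOWLEDGE_MODEL):
    """Expand query terms through the knowledge model.

    Returns the expanded term list plus one explanation per added term,
    so a user can see exactly why each term entered the query.
    """
    expanded = list(terms)
    explanations = []
    for term in terms:
        for related, relation in model.get(term.lower(), []):
            if related not in expanded:
                expanded.append(related)
                explanations.append(f"{related!r} added because it is {relation} {term!r}")
    return expanded, explanations


terms, reasons = expand_query(["negligence"])
print(terms)    # ['negligence', 'duty of care', 'tort']
for reason in reasons:
    print(reason)
```

Unlike an opaque learned expansion, each output here can be traced back to one entry in the model. A domain expert can inspect or correct that entry.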

Author(s):  
Mans Hulden

Finite-state machines (automata and transducers) are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and of automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, particularly in computational phonology and morphology, as illustrated through common practical use cases.
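The distinction between automata (which accept or reject strings) and transducers (which map input strings to output strings) can be sketched in a few lines of Python. This is a deterministic, non-probabilistic toy. The example machines and the tags `+Sg`/`+Pl` are hypothetical illustrations, not taken from the chapter. Dedicated toolkits such as foma or OpenFst also handle nondeterminism, weights and operations like composition.

```python
def dfa_accepts(transitions, start, finals, symbols):
    """Run a deterministic finite-state automaton; True if it ends in a final state."""
    state = start
    for sym in symbols:
        key = (state, sym)
        if key not in transitions:
            return False  # no transition defined: reject
        state = transitions[key]
    return state in finals


def fst_transduce(transitions, start, finals, symbols):
    """Run a deterministic transducer; return the output string or None if rejected."""
    state, output = start, []
    for sym in symbols:
        key = (state, sym)
        if key not in transitions:
            return None
        state, out = transitions[key]
        output.append(out)
    return "".join(output) if state in finals else None


# Automaton for a regular language: strings over {a, b} with an even number of a's.
EVEN_A = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

# Transducer for a toy regular relation: copy letters, then map a number tag
# to its surface suffix (no spelling rules, so "box+Pl" would give "boxs").
LETTERS = "abcdefghijklmnopqrstuvwxyz"
PLURAL_FST = {(0, ch): (0, ch) for ch in LETTERS}
PLURAL_FST[(0, "+Sg")] = (1, "")
PLURAL_FST[(0, "+Pl")] = (1, "s")

print(dfa_accepts(EVEN_A, "even", {"even"}, "abba"))               # True
print(fst_transduce(PLURAL_FST, 0, {1}, list("cat") + ["+Pl"]))    # cats
```

In a realistic morphological analyzer, a lexicon transducer like `PLURAL_FST` would be composed with spelling-rule transducers, and the result would be inverted for analysis.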


1990 ◽  
Vol 17 (1) ◽  
pp. 21-29
Author(s):  
C. Korycinski ◽  
Alan F. Newell

The task of producing satisfactory indexes by automatic means has been tackled on two fronts: by statistical analysis of text, and by attempting content analysis of the text in much the same way as a human indexer does. Although statistical techniques have much to offer for free-text database systems, neither method has had much success with back-of-the-book indexing. This review examines some problems associated with applying natural-language processing techniques to book texts.
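To make the "statistical" front concrete, here is a minimal Python sketch that assumes the simplest possible approach: rank single words by raw frequency after removing a small, hypothetical stop list. `candidate_index_terms` is an illustrative name, not a method from the review. Its output is only a flat list of frequent words. A back-of-the-book index also needs headings, subheadings, cross-references and page locators, which helps explain why such methods struggle with book texts.

```python
import re
from collections import Counter

# Hypothetical stop list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "with"}


def candidate_index_terms(text, top_n=5):
    """Return the top_n most frequent non-stop words (ties keep first-seen order)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


sample = "Parsing parsing grammar. The parser uses a grammar; parsing is fast."
print(candidate_index_terms(sample, top_n=2))  # ['parsing', 'grammar']
```

Note that "parser" and "parsing" are counted separately. Merging them requires morphological normalization, one of the NLP steps the review discusses applying to book texts.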


Author(s):  
Ayush Srivastav ◽  
Hera Khan ◽  
Amit Kumar Mishra

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. It discusses the most popular models used over time for Natural Language Processing, along with their applications to specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It then provides insight into text preprocessing and its methods, such as stemming and lemmatization and stop-word removal, followed by part-of-speech tagging and named entity recognition. Next, the chapter elaborates on the concept of word embedding, its various types, and common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing follows. The chapter then covers neural networks and their advanced forms, such as Recursive Neural Networks and Seq2seq models, which are used in computational linguistics. It concludes with a brief description of chatbots and Memory Networks.
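The early preprocessing steps named above (regex tokenization, stop-word removal and stemming) can be chained in a short Python sketch. The stop list and the suffix-stripping rules are hypothetical simplifications, not the Porter stemmer or any rule set from the chapter. The output shows why stemming differs from lemmatization: "fences" becomes the non-word "fenc" rather than the lemma "fence".

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}  # hypothetical list
SUFFIXES = ("ing", "ed", "es", "s")  # crude rules, checked in this order


def tokenize(text):
    """Regex tokenization: keep alphabetic runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem each remaining token."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The cats are jumping over the fences"))
# ['cat', 'jump', 'over', 'fenc']
```

A lemmatizer would instead look words up in a lexicon (with part-of-speech information) to return dictionary forms such as "fence".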


1996 ◽  
Vol 16 ◽  
pp. 70-85 ◽  
Author(s):  
Thomas C. Rindflesch

Work in computational linguistics began very soon after the development of the first computers (Booth, Brandwood and Cleave 1958), yet in the intervening four decades there has been a pervasive feeling that progress in computer understanding of natural language has not been commensurate with progress in other computer applications. Recently, a number of prominent researchers in natural language processing met to assess the state of the discipline and discuss future directions (Bates and Weischedel 1993). The consensus of this meeting was that increased attention to large amounts of lexical and domain knowledge was essential for significant progress, and current research efforts in the field reflect this point of view.


2018 ◽  
Vol 11 (3) ◽  
pp. 1-25
Author(s):  
Leonel Figueiredo de Alencar ◽  
Bruno Cuconato ◽  
Alexandre Rademaker

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aimed at building a comprehensive full-form lexicon for the morphological analysis of Portuguese. A first version of the resource is already freely available online under an open-source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps and systematically adding new entries. Compared with the integrated resources, lexical entries in MorphoBr follow a more user-friendly format. This format can be compiled straightforwardly into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were corrected automatically with scripts. However, MorphoBr's main contribution is the expansion of the inventory of nouns and adjectives. This was achieved by systematically modeling diminutive formation in the paradigm of finite-state morphology, which allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative that will directly contribute to the development of more robust natural language processing tools and applications that depend on wide-coverage morphological analysis.
KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.
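The abstract does not give MorphoBr's actual diminutive rules or entry format. The following is therefore only a minimal Python sketch under stated assumptions:
- The function names and the simplified rules are hypothetical.
- The tab-separated `lemma+N+DIM+gender+SG` tag format is also hypothetical.
- The rules are written as a plain function rather than compiled into a finite-state transducer.

The sketch covers a few well-known Portuguese patterns: unstressed final -o/-a is replaced by -inho/-inha, and other endings take the -zinho/-zinha allomorph. It also adjusts spelling before "i" (c → qu, g → gu, ç → c) and drops acute and circumflex accents while keeping the tilde. Nouns in unstressed -e (dente → dentinho) and other irregular cases are not modeled.

```python
# Hypothetical, simplified diminutive rules for Portuguese nouns (NFC input assumed).
ACCENTED = "áéíóúâêô"
_STRIP_ACCENTS = str.maketrans(ACCENTED, "aeiouaeo")  # tilde (ã, õ) is kept


def diminutive(noun, gender):
    """Return a diminutive form; gender is 'm' or 'f' (used only for -zinho/-zinha)."""
    stressed_final = noun[-1] in ACCENTED      # e.g. café, avó, sofá
    base = noun.translate(_STRIP_ACCENTS)      # stress moves to the suffix
    if not stressed_final and base[-1] in "oa" and not base.endswith("ão"):
        stem, vowel = base[:-1], base[-1]      # the suffix keeps the final vowel
        # Orthographic adjustments before the front vowel "i".
        if stem.endswith("c"):
            stem = stem[:-1] + "qu"
        elif stem.endswith("g"):
            stem = stem[:-1] + "gu"
        elif stem.endswith("ç"):
            stem = stem[:-1] + "c"
        return stem + "inh" + vowel
    return base + "zinh" + ("o" if gender == "m" else "a")


def lexicon_entry(noun, gender):
    """Hypothetical full-form lexicon line: surface form, tab, lemma plus tags."""
    return f"{diminutive(noun, gender)}\t{noun}+N+DIM+{gender.upper()}+SG"


for noun, g in [("gato", "m"), ("barco", "m"), ("moça", "f"), ("café", "m"), ("pão", "m")]:
    print(lexicon_entry(noun, g))
```

Expanding a lexicon this way trades coverage for risk: systematic rules add many valid forms at once, but every exception must be listed or blocked explicitly.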


2018 ◽  
Author(s):  
Massimo Stella

This technical report outlines the mechanisms and potential applications of SentiMental, a suite of natural language processing algorithms designed and implemented by Massimo Stella, Complex Science Consulting. SentiMental takes a novel approach to sentiment and emotional analysis: it directly harnesses the whole structure of the mental lexicon rather than relying on affect norms. The report also outlines the direct emotional profiling and the visualisations currently implemented in version 0.1 of SentiMental. It further discusses features under development and current limitations. This technical report is not meant as a publication. The author holds full copyright, and any reproduction of parts of this report must be authorised by the copyright holder. SentiMental is a work in progress, so do not hesitate to contact the author with any feedback.


2021 ◽  
Vol 7 ◽  
pp. e508
Author(s):  
Sara Renjit ◽  
Sumam Idicula

Natural language inference (NLI) is an essential subtask in many natural language processing applications. It concerns the directional relationship from a premise to a hypothesis: a text pair is labeled as entailment if the meaning of the hypothesis can be inferred from the premise. NLI, also known as recognizing textual entailment, identifies entailed and contradictory sentences. It is used in NLP systems such as question answering, summarization, and information retrieval. This paper addresses NLI for Malayalam, a low-resource Indian language and the regional language of Kerala, spoken by more than 30 million people. The paper introduces a Malayalam NLI dataset, MaNLI, and applies several models to it: Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representations from Transformers), and LASER (Language-Agnostic SEntence Representations). We frame NLI in two ways, as binary classification and as multiclass classification. LASER outperformed the other techniques in both settings. For multiclass classification, NLI using LASER-based sentence embeddings outperformed the other techniques by a significant margin of 12% accuracy. For binary classification, the LASER-based NLI system improved accuracy by 9% over the other techniques.
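The abstract treats NLI as classification over sentence embeddings of the premise and hypothesis. It does not say how the two embeddings are combined or which labels the multiclass setting uses. The following minimal Python sketch assumes the widely used pair-feature scheme [u, v, |u − v|, u ⊙ v] (popularized by InferSent) and a three-way label set. Both are assumptions, not details confirmed for MaNLI. `pair_features` and `to_binary` are hypothetical helper names. The embeddings themselves would come from an encoder such as LASER; short hand-written vectors stand in for them here.

```python
from typing import List, Sequence

# Assumed multiclass label set; the binary task collapses it to entailment vs. not.
MULTICLASS_LABELS = ("entailment", "neutral", "contradiction")


def pair_features(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Combine premise embedding u and hypothesis embedding v into one feature vector."""
    if len(u) != len(v):
        raise ValueError("premise and hypothesis embeddings must have the same size")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]   # |u - v|
    product = [a * b for a, b in zip(u, v)]         # element-wise u * v
    return list(u) + list(v) + abs_diff + product


def to_binary(label: str) -> str:
    """Map a multiclass label to the binary setting."""
    return "entailment" if label == "entailment" else "not_entailment"


features = pair_features([1.0, 2.0], [3.0, 0.5])
print(features)                    # [1.0, 2.0, 3.0, 0.5, 2.0, 1.5, 3.0, 1.0]
print(to_binary("contradiction"))  # not_entailment
```

The resulting vector (four times the embedding size) would then be fed to any standard classifier, such as logistic regression or a small feed-forward network. The |u − v| and u ⊙ v terms give the classifier direct access to where the two sentences agree and differ.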

