IDL-Expressions: A Formalism for Representing and Parsing Finite Languages in Natural Language Processing

2004 · Vol 21 · pp. 287-317
Author(s): M. J. Nederhof, G. Satta

We propose a formalism for the representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that have only been considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then filtering that set through a parser. We study several formal properties of IDL-expressions and compare the new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity.
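
A minimal sketch of the semantics, not the authors' algorithm: IDL stands for Interleave, Disjunction, and Lock, and the sketch below encodes IDL-style expressions as nested tuples and naively enumerates the finite language they denote. The lock operator is omitted, the encoding and names are our own, and the paper's parser exists precisely to avoid this kind of explicit enumeration.

```python
from itertools import product

# Illustrative encoding of IDL-style expressions as nested tuples:
#   ("word", w)           -- a single word
#   ("cat", e1, e2, ...)  -- concatenation
#   ("dis", e1, e2, ...)  -- disjunction: choose one alternative
#   ("il",  e1, e2, ...)  -- interleave: any shuffle of the sub-languages

def interleavings(u, v):
    """Yield every interleaving (shuffle) of two word tuples."""
    if not u or not v:
        yield u + v
        return
    for rest in interleavings(u[1:], v):
        yield (u[0],) + rest
    for rest in interleavings(u, v[1:]):
        yield (v[0],) + rest

def language(e):
    """Enumerate the finite language denoted by an IDL-style expression."""
    op = e[0]
    if op == "word":
        return {(e[1],)}
    if op == "cat":
        return {sum(ws, ()) for ws in product(*(language(s) for s in e[1:]))}
    if op == "dis":
        return set().union(*(language(s) for s in e[1:]))
    if op == "il":
        result = {()}
        for sub in e[1:]:
            result = {w for a in result for b in language(sub)
                      for w in interleavings(a, b)}
        return result
    raise ValueError(f"unknown operator: {op}")

# "the (big | small) dog" with "barks" interleaved at any position:
expr = ("il",
        ("cat", ("word", "the"),
                ("dis", ("word", "big"), ("word", "small")),
                ("word", "dog")),
        ("word", "barks"))
for sentence in sorted(language(expr)):
    print(" ".join(sentence))
```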

Author(s): Andrew M. Olney, Natalie K. Person, Arthur C. Graesser

The authors discuss Guru, a conversational expert intelligent tutoring system (ITS). Guru is designed to mimic expert human tutors using advanced applied natural language processing techniques, including natural language understanding, knowledge representation, and natural language generation.


Author(s): Pankaj Kailas Bhole, A. J. Agrawal

Text summarization is an old challenge in text mining but is in dire need of researchers' attention in the areas of computational intelligence, machine learning, and natural language processing. We extract a set of features from each sentence that helps identify its importance in the document. Reading a full text every time is time consuming, and a clustering approach is useful for deciding what kinds of content a document contains. In this paper we introduce k-means clustering for natural-language text processing and word matching, and adopt a data-mining document clustering algorithm to extract meaningful information from a large set of offline documents.
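
A minimal sketch of the clustering step, assuming TF-IDF sentence features and scikit-learn (the paper's feature set is richer): sentences are grouped with k-means and the sentence nearest each centroid is kept as an extractive summary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, k=3):
    """Cluster sentences with k-means; keep the one nearest each centroid."""
    X = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    picked = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # distance of each member sentence to its cluster centroid
        d = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(d)])
    return [sentences[i] for i in sorted(picked)]

doc = ["Text summarization condenses a document.",
       "Clustering groups similar sentences together.",
       "K-means assigns each sentence to the nearest centroid.",
       "A summary keeps one representative sentence per cluster.",
       "Reading full documents is time consuming."]
print("\n".join(summarize(doc, k=3)))
```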


2016 · Vol 22 (1) · pp. 23-42
Author(s): Michael J. Jensen

This paper develops a method for analyzing the structure of campaign communications on Twitter. The structure of communication affordances creates opportunities for a horizontal organization of power within Twitter interactions. However, one cannot infer the structure of interactions as they materialize from the formal properties of the technical environment in which the communications occur. Consequently, the paper identifies three categories of empowering communication operations that can occur on Twitter: campaigns can respond to others, campaigns can retweet others, and campaigns can call for others to become involved in the campaign on their own terms. The paper operationalizes these categories in the context of the 2015 U.K. general election. To determine whether Twitter is used to empower laypersons, the profile of each account retweeted or replied to was retrieved and analyzed using natural language processing to identify whether the account belongs to a political figure, a member of the media, or some other public figure. In addition, tweets and retweets are compared with respect to the manner in which key election issues are discussed. The findings indicate that empowering uses of Twitter are fairly marginal, and that retweets use almost identical policy language to the original campaign tweets.
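
A hypothetical sketch of the profile-classification step, assuming a hand-labelled training set of account bios with the paper's three categories; every bio and label below is invented for illustration, and the paper does not specify this particular classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented Twitter bios labelled with the paper's three categories.
bios = ["MP for Exampleshire, Labour",
        "Political correspondent at the Daily Paper",
        "Dad, runner, tea enthusiast",
        "Conservative candidate for Sampleton",
        "News editor covering Westminster",
        "Photos of my cat"]
labels = ["political", "media", "other",
          "political", "media", "other"]

# TF-IDF features feeding a Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(bios, labels)
print(clf.predict(["Shadow minister and MP", "freelance journalist"]))
```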


2020
Author(s): Michael Prendergast

Abstract – A Verification Cross-Reference Matrix (VCRM) is a table that depicts the verification methods for the requirements in a specification. Usually requirement labels are rows, available verification methods are columns, and an “X” in a cell indicates the use of a verification method for that requirement. Verification methods include Demonstration, Inspection, Analysis, and Test, and sometimes Certification, Similarity, and/or Analogy. VCRMs enable acquirers and stakeholders to quickly understand how a product’s requirements will be tested. Maintaining consistency of very large VCRMs can be challenging, and inconsistent verification methods can result in a large set of uncoordinated “spaghetti tests”. Natural language processing algorithms that can identify similarities between requirements offer promise in addressing this challenge. This paper applies and compares four natural language processing algorithms on the problem of automatically populating VCRMs from natural language requirements: (a) Naïve Bayesian inference, (b) Nearest Neighbor by weighted Dice similarity, (c) Nearest Neighbor with Latent Semantic Analysis similarity, and (d) an ensemble method combining the first three approaches. The VCRMs used for this study are for slot machine technical requirements derived from gaming regulations from the countries of Australia and New Zealand, the province of Nova Scotia (Canada), the state of Michigan (United States), and recommendations from the International Association of Gaming Regulators (IAGR).
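
A minimal sketch of approach (b), Nearest Neighbor by weighted Dice similarity: a new requirement inherits the verification methods of its most similar already-classified neighbour. The IDF-style token weighting and the example requirements are assumptions for illustration.

```python
import math
from collections import Counter

def weighted_dice(a, b, idf):
    """Weighted Dice similarity between two token sets."""
    wa = sum(idf.get(t, 1.0) for t in a)
    wb = sum(idf.get(t, 1.0) for t in b)
    shared = sum(idf.get(t, 1.0) for t in a & b)
    return 2.0 * shared / (wa + wb) if wa + wb else 0.0

def tokens(text):
    return set(text.lower().split())

# (requirement text, verification methods) pairs, invented for illustration.
labelled = [
    ("The machine shall log every credit transaction", {"Test"}),
    ("Cabinet labels shall be legible at one metre", {"Inspection"}),
    ("Return to player shall exceed 85 percent", {"Analysis", "Test"}),
]

# IDF-style weights so that rare tokens count more than boilerplate.
docs = [tokens(t) for t, _ in labelled]
df = Counter(t for d in docs for t in d)
idf = {t: math.log(1 + len(docs) / df[t]) for t in df}

new_req = "The machine shall record every cashless transaction"
best = max(labelled,
           key=lambda kv: weighted_dice(tokens(new_req), tokens(kv[0]), idf))
print("suggested methods:", best[1])
```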


2011 · Vol 4 (3)
Author(s): Treveur Bretaudière, Samuel Cruz-Lara, Lina María Rojas Barahona

We present our current research activities associating automatic natural language processing with serious games and virtual worlds. Several interesting scenarios have been developed: language learning, natural language generation, multilingual information, emotion detection, real-time translation, and non-intrusive access to linguistic information such as definitions or synonyms. Part of our work has contributed to the specification of the Multi Lingual Information Framework (MLIF, ISO FDIS 24616, 2011). Standardization will grant stability, interoperability, and sustainability to an important part of our research activities, in particular in the framework of representing and managing multilingual textual information.


2013 · Vol 846-847 · pp. 1239-1242
Author(s): Yang Yang, Hui Zhang, Yong Qi Wang

This paper presents our recent work towards the development of a voice calculator based on speech error correction and natural language processing. The calculator enhances the accuracy of speech recognition by classifying and summarizing recognition errors in the domain of spoken numerical calculation, constructing a Pinyin-text mapping library and replacement rules, and combining a priority correction mechanism with a memory correction mechanism for the Pinyin-text mapping. Once an expression has been correctly recognized, the calculator uses a recursive-descent parsing algorithm with synthesized-attribute computation to evaluate the final result, which it outputs through a TTS engine. The implementation of this voice calculator makes the calculator more humane and intelligent.
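
A minimal sketch of the evaluation stage, assuming a conventional arithmetic grammar; the parser's procedures return synthesized attributes (the numeric value of each subexpression), while the speech front end and TTS output are omitted.

```python
import re

# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'
def evaluate(text):
    toks = re.findall(r"\d+(?:\.\d+)?|[()+\-*/]", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def expr():
        value = term()                 # synthesized attribute of the subtree
        while peek() in ("+", "-"):
            if eat() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if eat() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        if peek() == "(":
            eat()                      # consume '('
            value = expr()
            eat()                      # consume ')'
            return value
        return float(eat())            # NUMBER

    return expr()

print(evaluate("3 + 4 * (2 - 0.5)"))   # 9.0
```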


2021 · pp. 219256822110269
Author(s): Fabio Galbusera, Andrea Cina, Tito Bassani, Matteo Panico, Luca Maria Sconfienza

Study Design: Retrospective study. Objectives: Huge amounts of images and medical reports are being generated in radiology departments. While these datasets could potentially be employed to train artificial intelligence tools to detect findings on radiological images, the unstructured nature of the reports limits the accessibility of the information. In this study, we tested whether natural language processing (NLP) can be used to generate training data for deep learning models analyzing planar radiographs of the lumbar spine. Methods: NLP classifiers based on the Bidirectional Encoder Representations from Transformers (BERT) model, able to extract structured information from radiological reports, were developed and used to generate annotations for a large set of radiographic images of the lumbar spine (N = 10 287). Deep learning (ResNet-18) models aimed at detecting radiological findings directly from the images were then trained and tested on a set of 204 human-annotated images. Results: The NLP models had accuracies between 0.88 and 0.98 and specificities between 0.84 and 0.99; 7 out of 12 radiological findings had sensitivity >0.90. The ResNet-18 models showed performances dependent on the specific radiological findings, with sensitivities and specificities between 0.53 and 0.93. Conclusions: NLP generates valuable data with which to train deep learning models able to detect radiological findings in spine images. Despite the noisy nature of reports and NLP predictions, this approach effectively mitigates the difficulties associated with the manual annotation of large quantities of data and opens the way to the era of big data for artificial intelligence in musculoskeletal radiology.
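
A schematic sketch of the weak-labelling loop, assuming a predict_findings(report) function that stands in for the paper's BERT classifiers and returns a dict mapping each finding to 0 or 1; the ResNet-18 configuration is illustrative, not the authors' exact setup.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

N_FINDINGS = 12  # the study extracted 12 radiological findings per report

# Multi-label image model: one sigmoid output per finding.
model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, N_FINDINGS)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(images, reports, predict_findings):
    """One step: NLP-predicted findings serve as (noisy) image labels.

    images:  float tensor of shape (batch, 3, H, W)
    reports: list of report strings, aligned with the images
    """
    rows = []
    for report in reports:
        findings = predict_findings(report)   # e.g. {"scoliosis": 1, ...}
        rows.append([findings[k] for k in sorted(findings)])
    targets = torch.tensor(rows, dtype=torch.float32)
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```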


Author(s): Hima Yeldo

Abstract: Natural Language Processing (NLP) is the study of the interplay between computers and human languages. NLP has spread its applications across various fields such as email spam detection, machine translation, summarization, information extraction, and question answering. NLP comprises two parts, Natural Language Generation and Natural Language Understanding, which cover the tasks of generating and understanding text respectively.


2016 · Vol 8 · pp. BII.S38916
Author(s): Yuan Luo, Peter Szolovits

In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs), and of efficiently retrieving such annotations subject to position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks for all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Updating time is kept logarithmic and the space requirement linear. We also discuss interval management in external memory models and in higher dimensions.
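
A minimal sketch of the stand-off data model with a linear-scan stabbing query; the paper's contribution is answering such queries in logarithmic time via augmented interval trees, which this O(n) baseline deliberately does not attempt. The example text and labels are invented.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int      # character offset where the annotation begins
    end: int        # character offset where it ends (exclusive)
    label: str      # annotation content, stored apart from the text

text = "Patient denies chest pain but reports shortness of breath."
annotations = [
    Annotation(15, 25, "symptom:chest pain"),
    Annotation(8, 25, "negation scope"),
    Annotation(38, 57, "symptom:dyspnea"),
]

def stabbing_query(anns, pos):
    """Return all annotations whose interval contains position `pos`."""
    return [a for a in anns if a.start <= pos < a.end]

for a in stabbing_query(annotations, 20):
    print(a.label, "->", repr(text[a.start:a.end]))
```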

