Natural language query handling using extended knowledge provider system

Author(s):  
Prasenjit Mukherjee ◽  
Atanu Chattopadhyay ◽  
Baisakhi Chakraborty ◽  
Debashis Nandi

Extracting knowledge data from a knowledge database using natural language queries is a difficult task. Different natural language processing (NLP) techniques have been developed to handle this extraction task. This paper proposes an automated query-response model, termed the Extended Automated Knowledge Provider System (EAKPS), that can manage various types of natural language queries from users. EAKPS uses a combination-based technique and can handle assertive, interrogative, imperative, compound and complex query sentences. The EAKPS algorithm generates a structured query language (SQL) query for each natural language query to extract knowledge data from the knowledge database resident within EAKPS. Extraction of nouns or noun phrases is another issue in natural language query processing. Most of the time, a determiner, preposition or conjunction is prefixed to a noun or noun phrase, and this prefix makes it difficult to isolate the noun or noun phrase during query processing. The proposed system is able to identify these prefixes and extract the exact noun or noun phrase from natural language queries without any manual intervention.
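The abstract does not give the EAKPS algorithm itself, but the two steps it names (stripping a determiner, preposition or conjunction prefix from a noun phrase, then generating SQL for the query) can be illustrated with a minimal sketch. The prefix word list, the table `knowledge` and its columns `topic` and `description` are hypothetical assumptions, not the authors' schema, and a single parameterized lookup is a simplification of whatever SQL EAKPS actually produces.

```python
import sqlite3

# Hypothetical prefix words (determiners, prepositions, conjunctions);
# this is an illustrative list, not the one used by EAKPS.
PREFIX_WORDS = {"a", "an", "the", "of", "in", "on", "about", "for", "and", "or", "but"}


def strip_prefixes(phrase: str) -> str:
    """Drop leading determiners/prepositions/conjunctions from a noun phrase."""
    tokens = phrase.lower().split()
    while tokens and tokens[0] in PREFIX_WORDS:
        tokens.pop(0)  # remove only leading prefix words
    return " ".join(tokens)


def build_query(noun_phrase: str) -> tuple[str, tuple[str]]:
    """Build a parameterized SQL lookup for the cleaned noun phrase.

    The table knowledge(topic, description) is a hypothetical schema.
    """
    sql = "SELECT description FROM knowledge WHERE topic = ?"
    return sql, (strip_prefixes(noun_phrase),)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knowledge (topic TEXT, description TEXT)")
    conn.execute("INSERT INTO knowledge VALUES (?, ?)",
                 ("noun phrase", "A phrase headed by a noun."))
    sql, params = build_query("about the noun phrase")
    print(conn.execute(sql, params).fetchall())  # [('A phrase headed by a noun.',)]
```

Stripping only leading tokens keeps internal words such as "and" in "salt and pepper" intact; a fuller system would likely need part-of-speech tagging to decide which words really act as prefixes.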

Linguistics ◽  
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Chase Wesley Raymond ◽  
Rebecca Clift ◽  
John Heritage

In this article, we investigate a puzzle for standard accounts of reference in natural language processing, psycholinguistics and pragmatics: occasions where, following an initial reference (e.g., the ice), a subsequent reference is achieved using the same noun phrase (i.e., the ice), as opposed to an anaphoric form (i.e., it). We argue that such non-anaphoric reference can be understood as motivated by a central principle: the expression of agency in interaction. In developing this claim, we draw upon research in what may initially appear to be a wholly unconnected domain: the marking of epistemic and deontic stance, standardly investigated in linguistics as turn-level grammatical phenomena. Examination of naturally occurring talk reveals that to analyze such stances solely through the lens of turn-level resources (e.g., modals) is to address only partially the means by which participants make epistemic and deontic claims in everyday discourse. Speakers’ use of referential expressions illustrates a normative dimension of grammar that incorporates both form and position, thereby affording speakers the ability to actively depart from this form-position norm through the use of a repeated NP, a grammatical practice that we show is associated with the expression of epistemic and deontic authority. It is argued that interactants can thus be seen to be agentively mobilizing the resources of grammar to accommodate the inescapable temporality of interaction.


Author(s):  
Sumathi S. ◽  
Rajkumar S. ◽  
Indumathi S.

Lease abstraction is the process of extracting key data from a lease document. A lease document for a property contains key business, financial, and legal data about that property. A lease abstract report contains details such as the property location and basic lease details, price schedules, key events, terms and conditions, parking arrangements, and landlord and tenant obligations. Abstracting a real estate contract into electronic form provides easy access to key data and replaces the tedious process of reading the whole contract every time. Natural language processing may be used to extract and abstract this information from lease documents.
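As a small illustration of rule-based extraction (one simple alternative to statistical NLP, not a method described in the source), the sketch below pulls a few lease fields out of plain text with regular expressions. The field names, patterns and sample text are hypothetical assumptions and would need adapting to real lease formats.

```python
import re
from typing import Optional

# Hypothetical field patterns; real leases vary widely in wording and layout.
PATTERNS = {
    "landlord": re.compile(r"Landlord:\s*(.+)"),
    "tenant": re.compile(r"Tenant:\s*(.+)"),
    "monthly_rent": re.compile(r"monthly rent of \$([\d,]+(?:\.\d{2})?)", re.IGNORECASE),
    "start_date": re.compile(r"commenc\w* on (\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def abstract_lease(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each field, or None if the field is absent."""
    result: dict[str, Optional[str]] = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    sample = (
        "Landlord: Acme Properties LLC\n"
        "Tenant: Jane Doe\n"
        "The lease commences on 2024-01-01 with a monthly rent of $1,250.00."
    )
    print(abstract_lease(sample))
```

Fixed patterns are brittle; they are shown only because they make the "document in, structured record out" shape of lease abstraction easy to see.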


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
J. Bouaziz ◽  
R. Mashiach ◽  
S. Cohen ◽  
A. Kedem ◽  
A. Baron ◽  
...  

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.
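The abstract does not say how the web-based tool computes its gene scores. As a rough illustration of text-mining-based prioritization only, the sketch below ranks candidate genes by how many abstracts mention them together with the disease term. The gene list, whole-word symbol matching and count-based score are simplifying assumptions; the study's NLP pipeline and false-positive filtering are not reproduced.

```python
import re
from collections import Counter


def rank_genes(abstracts: list[str], genes: list[str], disease: str) -> list[tuple[str, int]]:
    """Rank genes by the number of abstracts mentioning both the gene and the disease.

    Gene symbols are matched as whole words (case-sensitive); this simplification
    misses synonyms and can still produce false positives.
    """
    counts: Counter = Counter()
    for text in abstracts:
        if disease.lower() not in text.lower():
            continue  # only abstracts that mention the disease contribute
        for gene in genes:
            if re.search(rf"\b{re.escape(gene)}\b", text):
                counts[gene] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = [
        "ESR1 and PGR expression in endometriosis lesions.",
        "Endometriosis risk and ESR1 variants.",
        "PGR signaling in breast cancer.",
    ]
    print(rank_genes(docs, ["ESR1", "PGR", "VEGFA"], "endometriosis"))
    # [('ESR1', 2), ('PGR', 1)]
```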


1985 ◽  
Vol 1 (1) ◽  
pp. 3-15 ◽  
Author(s):  
Suranjan De ◽  
Shuh-Shen Pan ◽  
Andrew B. Whinston

2001 ◽  
Vol 23 ◽  
pp. 85-101
Author(s):  
Donka F. Farkas

This paper is concerned with semantic noun phrase typology, focusing on the question of how to draw the fine-grained distinctions necessary for an accurate account of natural language phenomena. In the extensive literature on this topic, the most commonly encountered parameters of classification concern the semantic type of the denotation of the noun phrase, the familiarity or novelty of its referent, the quantificational/nonquantificational distinction (connected to the weak/strong dichotomy), as well as, more recently, the question of whether the noun phrase is choice-functional or not (see Reinhart 1997, Winter 1997, Kratzer 1998, Matthewson 1999). In the discussion that follows I will attempt to make the following general points: (i) phenomena involving the behavior of noun phrases both within and across languages point to the need to establish further distinctions that are too fine-grained to be caught in the net of these typologies; (ii) some of the relevant distinctions can be captured in terms of conditions on assignment functions; (iii) the distributional and scopal peculiarities of noun phrases may result from constraints they impose on the way the variables they introduce are assigned values. Section 2 reviews the typology of definite noun phrases introduced in Farkas 2000 and the way it provides support for the general points above. Section 3 examines some of the problems raised by recognizing the rich variety of 'indefinite' noun phrases found in natural language and by attempting to capture their distribution and interpretation. Common to the typologies discussed in the two sections is the issue of marking different types of variation in the interpretation of a noun phrase. In the light of this discussion, specificity turns out to be an epiphenomenon connected to a family of distinctions that are marked differently in different languages.


Author(s):  
Jisheng Liang ◽  
Thien Nguyen ◽  
Krzysztof Koperski ◽  
Giovanni Marchisio

Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 1267-1267
Author(s):  
Hanahlyn M Park ◽  
Vicky Sandhu ◽  
Paul Fearn ◽  
Kathleen Shannon Dorcy ◽  
Elihu H. Estey ◽  
...  

Background: Clinical research in AML relies heavily on databases. Typically, patients’ medical records are examined manually and the relevant data are entered into the database. This process is slow and subject to human error. Natural language processing (NLP) is a technique used to build and train computer algorithms that automatically extract structured data elements from unstructured text. Although we have not yet used NLP extensively, it has the potential to ease the time- and resource-intensive burden of manual data abstraction. Because many of the data elements needed for clinical and translational research, quality metrics, and operations and analytics are the same throughout FHCRC/UW/SCCA, the biomedical informatics group at FHCRC has decided to invest in an enterprise-wide NLP pipeline to improve the efficiency and quality of data extraction for researchers, clinicians, and administrators throughout FHCRC/UW/SCCA.

Purpose: For this pilot project, an NLP system was trained and tested against a manually curated dataset to determine whether chemotherapeutic regimens were administered within 30 days prior to death in AML patients. The first phase was to train the NLP system on a small sample of patients in order to build rules and logic for finding both a patient’s date of death and evidence that a chemotherapeutic agent had been administered. The second phase was to test the algorithm on unseen data from another set of patients and determine the system’s overall performance in finding the patient’s date of death and determining whether the patient received chemotherapy within the preceding 30 days.

Methods: Inclusion criteria were the following: AML patients who came to FHCRC/SCCA/UWMC between 1/1/2010 and 12/31/2012, aged ≥ 18 years, who received chemotherapeutic agents within 30 days of death. The total sample size was 54 patients: 24 in the training sample and 30 in the testing sample.
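Once the NLP system has extracted a date of death and the dates of chemotherapy administration, the pilot's central question reduces to a date comparison. The sketch below assumes those dates are already available as `datetime.date` values; the function name is hypothetical, treating both ends of the 30-day window as inclusive is an assumption, and the hard part (extracting the dates from clinical notes) is not shown.

```python
from datetime import date


def chemo_within_window(death_date: date, chemo_dates: list[date], window_days: int = 30) -> bool:
    """Return True if any chemotherapy date falls within window_days before death.

    Assumed boundaries: day 0 (the date of death) and exactly window_days
    earlier both count; dates after death are ignored.
    """
    for chemo in chemo_dates:
        days_before = (death_date - chemo).days
        if 0 <= days_before <= window_days:
            return True
    return False


if __name__ == "__main__":
    death = date(2012, 3, 31)
    print(chemo_within_window(death, [date(2012, 3, 1)]))   # True (30 days before)
    print(chemo_within_window(death, [date(2012, 2, 29)]))  # False (31 days before)
```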
To assess the accuracy of the trained NLP system, manually and automatically extracted data sets were compared. The performance of the system was evaluated in two ways: the positive predictive value of a retrieved NLP identification (the number of correctly retrieved results out of all retrieved results) and sensitivity (the number of correctly retrieved results out of all possible correct results in the gold-standard training and testing data). These two metrics will help determine whether NLP can be a useful data extraction aid for expediting real-time access to data analysis and improving outcomes for AML patients.

Results: For the training sample, the positive predictive value of a retrieved NLP result for finding both the date of death and chemotherapeutic agents was 100%, and the sensitivity for both was 92%. For the testing sample, the positive predictive value for finding the date of death and chemotherapeutic agents was 96%, while the sensitivity for both was 73%.

Limitations: Sensitivity in both the training and testing populations was affected primarily by the ubiquitous problem of many patients lacking a concrete record of death. Patients often return to local facilities for continuing care and are lost to follow-up. The precision of finding the date of death in the testing sample was affected by one date of death that was pulled incorrectly from a clinic note because of an error in the NLP algorithm. The recall of finding chemotherapeutic agents in the testing sample was affected by the system's failure to recognize a chemotherapeutic trial name that had not appeared in the training sample.

Conclusion: The results of this pilot give a preliminary indication of the feasibility of the NLP algorithm for future use.
Although the trained NLP tool recalled only 70-80% of the two data elements (date of death and chemotherapeutic agents), this was primarily due to the absence of certain data elements from the electronic health record, and the precision for the defined data elements was nearly perfect. Given these results, we conclude that NLP can be a useful tool for data extraction. It could potentially maximize the leukemia service's ability to gain earlier access to data on symptom management and disease response, which will influence the development of new clinical pathways for optimizing care and possibly improve outcomes for AML patients.

Disclosures: No relevant conflicts of interest to declare.
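The two evaluation metrics in this abstract are standard: positive predictive value (precision) is correct retrievals divided by all retrievals, and sensitivity (recall) is correct retrievals divided by all correct results in the gold standard. A minimal sketch follows; the counts in the usage lines are made-up illustrations, not the study's data.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: correct retrievals / all retrievals."""
    retrieved = true_positives + false_positives
    return true_positives / retrieved if retrieved else 0.0


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall): correct retrievals / all gold-standard positives."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    tp, fp, fn = 9, 1, 3
    print(f"precision={precision(tp, fp):.2f}, sensitivity={sensitivity(tp, fn):.2f}")
    # precision=0.90, sensitivity=0.75
```

Returning 0.0 when nothing is retrieved (or nothing is relevant) is one common convention for avoiding division by zero; other conventions leave the metric undefined.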


Author(s):  
Jesse Thomason ◽  
Raymond J. Mooney

A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multi-modally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.
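The abstract does not specify the clustering procedure. As a hedged illustration of the multi-modal idea only, the sketch below L2-normalizes the textual and visual feature vectors for each noun-phrase sense, concatenates them so neither modality dominates by scale, and groups senses with a greedy cosine-similarity threshold. The feature vectors, the 0.9 threshold and the greedy first-member comparison are all assumptions, not the authors' method.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def induce_synsets(senses: dict[str, tuple[list[float], list[float]]],
                   threshold: float = 0.9) -> list[list[str]]:
    """Greedily group senses whose combined text+visual features are similar.

    Each sense maps to (text_features, visual_features). A sense joins the first
    cluster whose first member is at least `threshold` similar, else starts a new one.
    """
    combined = {name: normalize(t) + normalize(v) for name, (t, v) in senses.items()}
    clusters: list[list[str]] = []
    for name, feats in combined.items():
        for cluster in clusters:
            # Comparing to the first member is a cheap stand-in for a centroid.
            if cosine(feats, combined[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    toy = {
        "couch": ([1.0, 0.0], [0.9, 0.1]),
        "sofa": ([0.95, 0.05], [1.0, 0.0]),
        "riverbank": ([0.0, 1.0], [0.1, 0.9]),
    }
    print(induce_synsets(toy))  # [['couch', 'sofa'], ['riverbank']]
```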

