Design of Relation Extraction Framework to develop Knowledge Base

Author(s):  
Poonam Jatwani ◽  
Pradeep Tomar ◽  
Vandana Dhingra

Web documents present information as natural language text, which machines cannot readily interpret. Searching for specific information in the sea of web documents has become very challenging, because a search returns many irrelevant documents along with the relevant ones. To retrieve relevant information, semantic knowledge can be stored in a domain-specific ontology, which helps in understanding the user’s need. Intensive research has been going on in the field of text processing to develop ontologies using NLP techniques. The proposed technique is another effort in this direction. In this method, we use the Stanford parser to extract syntactic structure; it performs tokenization, parsing and morphological analysis of the text. Semantic rules are defined manually to identify valid concepts and the relations among them. Once concepts, properties and relationships among concepts are identified, the extracted information is visualized in the form of an ontology.
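
The rule-based step described in this abstract can be illustrated with a minimal sketch. The abstract does not list the authors' actual semantic rules, so the single rule used here (a verb with an `nsubj` and a `dobj`/`obj` dependent yields a concept-relation-concept triple) is an assumption, as are the `Token` layout and the hand-written parse, which only stand in for the output of a dependency parser.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One parsed token (hypothetical layout, not the Stanford parser's own format)."""
    idx: int      # 1-based position in the sentence
    word: str     # surface form
    lemma: str    # base form from morphological analysis
    pos: str      # Penn Treebank part-of-speech tag
    head: int     # idx of the governing token, 0 for the root
    deprel: str   # dependency label


def extract_triples(tokens):
    """Apply one illustrative rule: verb + nsubj + dobj/obj -> (subject, relation, object)."""
    triples = []
    for verb in tokens:
        if not verb.pos.startswith("VB"):
            continue
        subjects = [t for t in tokens if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t for t in tokens if t.head == verb.idx and t.deprel in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                triples.append((subj.lemma, verb.lemma, obj.lemma))
    return triples


# Hand-written parse of "Parsers produce trees" (assumed input, for illustration only).
SENTENCE = [
    Token(1, "Parsers", "parser", "NNS", 2, "nsubj"),
    Token(2, "produce", "produce", "VBP", 0, "root"),
    Token(3, "trees", "tree", "NNS", 2, "dobj"),
]

TRIPLES = extract_triples(SENTENCE)
```

Each resulting triple could then be added to an ontology as two concepts linked by a named relation; a real system would need many more rules than this one.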

1996 ◽  
Vol 05 (01n02) ◽  
pp. 229-253 ◽  
Author(s):  
JEFFREY L. GOLDBERG

The Category Discrimination Method (CDM) is a new machine learning algorithm designed specifically for text categorization. The motivation is that there are statistical problems associated with natural language text when it is applied as input to existing machine learning algorithms (too much noise, too many features, skewed distribution). The bases of the CDM are research results about the way that humans learn categories and concepts vis-à-vis contrasting concepts. The essential formula is cue validity, borrowed from cognitive psychology and used to select, from all possible single-word features, the best predictors of a given category. The hypothesis that CDM’s performance will exceed that of two non-domain-specific algorithms, Bayesian classification and decision tree learners, is empirically tested.
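
As a rough illustration of feature selection by cue validity, the sketch below uses one common formulation from cognitive psychology, the conditional probability of a category given a cue, estimated from document counts. The abstract does not give the CDM's exact formula, so this estimate, the function names and the toy corpus are all assumptions.

```python
from collections import Counter, defaultdict


def cue_validities(docs):
    """Estimate P(category | word) from (category, text) pairs using document counts."""
    docs_with_word = Counter()
    docs_with_word_in_cat = defaultdict(Counter)
    for category, text in docs:
        for word in set(text.lower().split()):  # count each word once per document
            docs_with_word[word] += 1
            docs_with_word_in_cat[category][word] += 1
    return {
        category: {w: n / docs_with_word[w] for w, n in counts.items()}
        for category, counts in docs_with_word_in_cat.items()
    }


def best_predictors(validities, category, k):
    """Return the k words with the highest cue validity for a category (ties broken alphabetically)."""
    scores = validities[category]
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy corpus (invented for illustration).
DOCS = [
    ("sports", "ball game"),
    ("sports", "ball team"),
    ("finance", "bank game"),
]

VALIDITIES = cue_validities(DOCS)
TOP_SPORTS = best_predictors(VALIDITIES, "sports", 2)
```

Here "game" appears in both categories, so its validity for "sports" is only 0.5, while "ball" and "team" occur exclusively in sports documents and score 1.0.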


Author(s):  
Logeswari Shanmugam ◽  
Premalatha K.

Biomedical literature is the primary repository of biomedical knowledge, and PubMed is the most comprehensive database for collecting, organizing and analyzing this textual knowledge. The high dimensionality of natural language text makes the text data quite noisy and sparse in the vector space. Hence, data preprocessing and feature selection are important steps in text processing. Ontologies select the meaningful terms semantically associated with the concepts in a document, reducing the dimensionality of the original text. In this chapter, semantic-based indexing approaches with cognitive search are proposed; they use a domain ontology to extract relevant information from big and diverse data sets for users.


2021 ◽  
Vol 13 (2) ◽  
pp. 85-109
Author(s):  
Abduladem Aljamel ◽  
Taha Osman ◽  
Dhavalkumar Thakker

The availability of online documents that describe domain-specific information provides an opportunity to employ a knowledge-based approach to extracting information from web data. This research proposes a novel, comprehensive semantic knowledge-based framework that helps to transform unstructured data so that it can be easily exploited by data scientists. The resultant semantic knowledge base is reasoned upon to infer new facts and classify events that might be of importance to end users. The target use case for the framework implementation was the financial domain, which represents an important class of dynamic applications that require the modelling of non-binary relations. Such complex relations are becoming increasingly common in the era of linked open data. This research in modelling and reasoning upon such relations is a further contribution of the proposed semantic framework, where non-binary relations are semantically modelled by adapting the semantic reasoning axioms to fit the intermediate resources that N-ary relations require.
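
The idea of an intermediate resource for a non-binary relation can be sketched with plain tuples: the relation instance becomes a node of its own, and each participant is attached to it by an ordinary binary triple. This is a generic illustration of the pattern, not the authors' model; the `ex:` names, the `reify` helper and the acquisition example are hypothetical.

```python
def reify(relation_type, node_id, roles):
    """Express one n-ary relation instance as binary triples around an intermediate node."""
    triples = [(node_id, "rdf:type", relation_type)]
    # One binary triple per participant, all hanging off the intermediate node.
    triples += [(node_id, role, value) for role, value in roles.items()]
    return triples


# A three-participant financial event (invented data).
ACQUISITION = reify(
    "ex:Acquisition",
    "ex:acquisition1",
    {
        "ex:buyer": "ex:CompanyA",
        "ex:target": "ex:CompanyB",
        "ex:price": "100",
    },
)
```

Because every participant hangs off `ex:acquisition1`, reasoning rules have to be written against the intermediate node rather than against a single subject-object pair, which is the kind of axiom adaptation the abstract refers to.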


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Basant Agarwal ◽  
Namita Mittal ◽  
Pooja Bansal ◽  
Sonal Garg

Sentiment analysis research has been increasing tremendously in recent times due to its wide range of business and social applications. Sentiment analysis of unstructured natural language text has recently received considerable attention from the research community. In this paper, we propose a novel sentiment analysis model based on common-sense knowledge extracted from a ConceptNet-based ontology and on context information. The ConceptNet-based ontology is used to determine the domain-specific concepts, which in turn produce the important domain-specific features. Further, the polarities of the extracted concepts are determined using a contextual polarity lexicon, which we developed by considering the context information of a word. Finally, the semantic orientations of the domain-specific features of the review document are aggregated based on the importance of each feature with respect to the domain. The importance of a feature is determined by its depth in the ontology. Experimental results show the effectiveness of the proposed methods.
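
The final aggregation step can be sketched as a weighted average of feature polarities. The abstract says only that importance is determined by depth in the ontology; it does not give the weighting function or its direction, so the `1 / depth` weight below (features nearer the root count more) is purely an assumption, as are the function name and the sample features.

```python
def aggregate_polarity(features):
    """Weighted mean polarity of (name, polarity, depth) features.

    Assumes depth >= 1 and an illustrative weight of 1 / depth.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for _name, polarity, depth in features:
        weight = 1.0 / depth  # assumed importance function
        weighted_sum += weight * polarity
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Invented review features: (feature, contextual polarity, depth in the ontology).
REVIEW_FEATURES = [
    ("camera", 1.0, 1),
    ("battery", -1.0, 2),
]

REVIEW_SCORE = aggregate_polarity(REVIEW_FEATURES)
```

With these inputs the positive "camera" feature outweighs the negative "battery" feature because it sits closer to the root; swapping in a different importance function changes only the `weight` line.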


2015 ◽  
Vol 3 ◽  
pp. 117-129 ◽  
Author(s):  
Congle Zhang ◽  
Stephen Soderland ◽  
Daniel S. Weld

Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations (e.g., “person travels to location”). Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge. This paper introduces NewsSpike-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE generates high-quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.
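
The limitation of distant supervision mentioned in this abstract can be seen in a minimal sketch: sentences are labeled only when they mention an entity pair that the knowledge base already relates. This illustrates the baseline idea, not NewsSpike-RE itself; the string-matching heuristic, the tiny `KB` and the example sentences are assumptions made for illustration.

```python
def distant_label(sentences, kb):
    """Label each sentence that mentions both entities of a pair present in the KB."""
    labeled = []
    for sentence in sentences:
        for (entity1, entity2), relation in kb.items():
            # Naive substring matching stands in for real entity linking.
            if entity1 in sentence and entity2 in sentence:
                labeled.append((sentence, entity1, entity2, relation))
    return labeled


# Toy knowledge base with one static relation and no event relations (invented data).
KB = {("Paris", "France"): "capital_of"}

SENTENCES = [
    "Paris is the capital of France.",
    "Alice travels to Berlin.",
]

TRAINING_DATA = distant_label(SENTENCES, KB)
```

Only the first sentence receives a label. The second describes an event ("person travels to location") that the KB does not cover, so it yields no training example.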

