Domain-specific entity extraction from noisy, unstructured data using ontology-guided search

Author(s):  
Sergey Bratus ◽  
Anna Rumshisky ◽  
Alexy Khrabrov ◽  
Rajenda Magar ◽  
Paul Thompson
Author(s):  
Emrah Inan ◽  
Burak Yonyul ◽  
Fatih Tekbacak

Most data on the web is unstructured and must be transformed into a machine-operable structure. It is therefore appropriate to convert unstructured data into a structured form according to the requirements, and to store it in different data models depending on the use case. As requirements grow in number and variety, no single approach can satisfy all of them, so a single storage technology is not suitable for every storage requirement. Managing stores with various types of schemas in a joint and integrated manner is called 'multistore' or 'polystore' in the database literature. In this paper, the Entity Linking task is leveraged to transform texts into well-formed data, and this data is managed by an integrated environment of different data models. Finally, this integrated big data environment is queried and examined to present the method.


2021 ◽  
Vol 13 (2) ◽  
pp. 85-109
Author(s):  
Abduladem Aljamel ◽  
Taha Osman ◽  
Dhavalkumar Thakker

The availability of online documents that describe domain-specific information provides an opportunity to employ a knowledge-based approach to extracting information from web data. This research proposes a novel, comprehensive semantic knowledge-based framework that helps transform unstructured data so that it can be easily exploited by data scientists. The resulting semantic knowledge base is reasoned over to infer new facts and classify events that might be of importance to end users. The target use case for the framework implementation was the financial domain, which represents an important class of dynamic applications that require the modelling of non-binary relations. Such complex relations are becoming increasingly common in the era of linked open data. This research in modelling and reasoning upon such relations is a further contribution of the proposed semantic framework, where non-binary relations are semantically modelled by adapting the semantic reasoning axioms to fit the intermediate resources required by N-ary relations.
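
As a rough illustration of the non-binary (N-ary) relation modelling mentioned in the abstract above, the sketch below encodes one relation with more than two participants as an intermediate resource connected by ordinary binary triples. The `reify_nary` and `participants` helpers, the `ex:` names, and the example values are all hypothetical and are not taken from the paper; this is a minimal plain-Python sketch of the general pattern, not the authors' framework.

```python
from itertools import count

# Counter used to mint identifiers for intermediate (blank-node-like) resources.
_ids = count(1)


def reify_nary(relation_type, roles):
    """Return (node, triples) encoding one N-ary relation as binary triples.

    An intermediate node stands for the relation instance; every
    participant is attached to that node through its own role property.
    """
    node = f"_:rel{next(_ids)}"
    triples = [(node, "rdf:type", relation_type)]
    for role, value in roles.items():
        triples.append((node, role, value))
    return node, triples


def participants(triples, node):
    """Collect the role -> value mapping attached to an intermediate node."""
    return {p: o for s, p, o in triples if s == node and p != "rdf:type"}


# Hypothetical financial event with three participants (illustrative values).
node, triples = reify_nary(
    "ex:Acquisition",
    {"ex:buyer": "ex:CompanyA", "ex:target": "ex:CompanyB", "ex:price": "1.2e9"},
)
```

Because every participant hangs off the intermediate node, a rule or query can treat the whole event as one resource while still using only binary statements.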


2015 ◽  
Vol 24 (02) ◽  
pp. 1540012 ◽  
Author(s):  
Pavlos Fafalios ◽  
Manolis Baritakis ◽  
Yannis Tzitzikas

Named Entity Extraction (NEE) is the process of identifying entities in texts and, very commonly, linking them to related (Web) resources. This task is useful in several applications, e.g. question answering, document annotation, and post-processing of search results. However, existing NEE tools lack an open or easy configuration, although this is very important for building domain-specific applications. For example, supporting a new category of entities, or specifying how to link the detected entities with online resources, is either impossible or very laborious. In this paper, we show how semantic information (Linked Data) can be exploited in real time to conveniently configure a NEE system, and we propose a generic model for configuring such services. To explicitly define the semantics of the proposed model, we introduce an RDF/S vocabulary, called “Open NEE Configuration Model”, which allows a NEE service to describe (and publish as Linked Data) its entity mining capabilities, but also to be dynamically configured. To allow relating the output of a NEE process to an applied configuration, we propose an extension of the Open Annotation Data Model which also enables an application to run advanced queries over the annotated data. As a proof of concept, we present X-Link, a fully configurable NEE framework that realizes this approach. In contrast to existing tools, X-Link allows the user to easily define the categories of entities that are of interest for the application at hand by exploiting one or more semantic Knowledge Bases. The user is also able to update a category and specify how to semantically link and enrich the identified entities. This enhanced configurability allows X-Link to be easily configured for different contexts when building domain-specific applications. To test the approach, we conducted a task-based evaluation with users that demonstrates its usability, and a case study that demonstrates its feasibility.
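
To make the idea of user-defined entity categories more concrete, here is a minimal sketch of a dictionary-based extractor in which each category carries its own term list and link template. It is an assumption-laden toy: the `Category` class, the `extract` function, the term lists, and the `example.org` URLs are hypothetical, and the sketch does not use the Open NEE Configuration Model or X-Link itself.

```python
import re
from dataclasses import dataclass


@dataclass
class Category:
    """A user-defined entity category (hypothetical structure)."""
    name: str
    terms: set            # surface forms that belong to the category
    link_template: str    # URL pattern with an "{entity}" placeholder


def extract(text, categories):
    """Find category terms in text and attach a link to each match."""
    annotations = []
    for cat in categories:
        for term in cat.terms:
            # Whole-word, case-insensitive match of the configured term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                annotations.append({
                    "category": cat.name,
                    "text": m.group(0),
                    "start": m.start(),
                    "link": cat.link_template.format(
                        entity=term.replace(" ", "_")),
                })
    # Order annotations by their position in the text.
    return sorted(annotations, key=lambda a: a["start"])


# Illustrative configuration; a new category is added by appending to the list.
categories = [
    Category("Fish", {"tuna", "swordfish"},
             "https://example.org/resource/{entity}"),
    Category("Country", {"greece"},
             "https://example.org/country/{entity}"),
]
result = extract("Tuna and swordfish are caught near Greece.", categories)
```

Supporting a new kind of entity in this sketch only requires another `Category` entry, which is the sort of configurability the abstract argues for.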


2021 ◽  
Vol 2 (4) ◽  
Author(s):  
Kanishk Verma ◽  
Brian Davis

Mining opinions from reviews has been a field of ever-growing research. This includes mining opinions at the document level, sentence level and even aspect level. While explicitly mentioned aspects in user-generated texts have been widely researched, very little work has been done on gathering opinions on aspects that are implied and not explicitly mentioned. Previous work on identifying implicit aspects and opinions was limited to syntactic-based classifiers or other machine learning methods trained on restaurant data. This paper presents a novel study for extracting and analysing implicit aspects and opinions from airline reviews in English. Through this study, an airline domain-specific aspect-based annotated corpus is devised and developed, together with a novel two-way technique that first augments pre-trained word embeddings for sequential with stochastic gradient descent optimized conditional random fields (CRF) and second uses machine and ensemble learning algorithms to classify the implied aspects. This two-way technique resolves the double-implicit problem most often encountered by previous work in implicit aspect and opinion text mining. Experiments with a hold-out test set on the first level, i.e., entity extraction by the optimized CRF, yield a ROC-AUC score of 96% and an F1 score of 94%, outperforming a few baseline systems. Further experiments with a range of machine and ensemble learning classifier algorithms to classify implied aspects and opinions for each entity yield ROC-AUC scores ranging from 71 to 94.8% for all implied entities. This two-level technique for implicit aspect extraction and classification outperforms many baseline systems in this domain.


2021 ◽  
Author(s):  
Abinaya Govindan ◽  
Gyan Ranjan ◽  
Amit Verma

Question Answering (QA) has been a well-researched NLP problem over the past few years. The ability for users to query information content that is available in a range of formats, both organized and unstructured, has become a requirement. This paper proposes to untangle factoid question answering targeting the Hi-Tech domain. It addresses issues faced during document question answering, such as document parsing, indexing and retrieval (identifying the relevant documents), as well as machine comprehension (extracting spans of correct answers from the context). Our suggested solution provides a comprehensive pipeline comprising document ingestion modules that handle a wide range of unstructured data across various sections of the document, such as textual, image, and tabular content. Our studies on a variety of “real-world” and domain-specific datasets show how current fine-tuned models are insufficient for this challenging task, and how our proposed pipeline is an effective alternative.


2008 ◽  
Vol 67 (2) ◽  
pp. 71-83 ◽  
Author(s):  
Yolanda A. Métrailler ◽  
Ester Reijnen ◽  
Cornelia Kneser ◽  
Klaus Opwis

This study compared individuals with pairs in a scientific problem-solving task. Participants interacted with a virtual psychological laboratory called Virtue to reason about a visual search theory. To this end, they created hypotheses, designed experiments, and analyzed and interpreted the results of their experiments in order to discover which of five possible factors affected the visual search process. Before and after their interaction with Virtue, participants took a test measuring theoretical and methodological knowledge. In addition, process data reflecting participants’ experimental activities and verbal data were collected. The results showed a significant but equal increase in knowledge for both groups. We found differences between individuals and pairs in the evaluation of hypotheses in the process data, and in descriptive and explanatory statements in the verbal data. Interacting with Virtue helped all students improve their domain-specific and domain-general psychological knowledge.


2008 ◽  
Vol 16 (3) ◽  
pp. 112-115 ◽  
Author(s):  
Stephan Bongard ◽  
Volker Hodapp ◽  
Sonja Rohrmann

Abstract. Our unit investigates the relationship between emotional processes (experience, expression, and coping), their physiological correlates, and possible health outcomes. We study domain-specific anger expression behavior and the associated cardiovascular loads and found, e.g., that open anger expression at work in particular is associated with greater blood pressure. Furthermore, we demonstrated that women may be predisposed to the development of certain mental disorders because of their higher disgust sensitivity. We also pointed out that the suppression of negative emotions leads to increased physiological stress responses, which results in a higher risk of cardiovascular diseases. We could show that relaxation as well as musical activity such as singing in a choir causes increases in the local immune parameter immunoglobulin A. Finally, we are investigating connections between migrants’ strategy of acculturation and health and found, e.g., elevated cardiovascular stress responses in migrants when they were highly adapted to the German culture.

