Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

2017
Author(s): Marilena Oita, Antoine Amarilli, Pierre Senellart

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.

2020, Vol 34 (03), pp. 2901-2908
Author(s): Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, ...

Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, incorporating too much knowledge may divert the sentence from its correct meaning, which is called the knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and a visible matrix to limit the impact of knowledge. Because K-BERT can load model parameters from the pre-trained BERT, it can inject domain knowledge into the model simply by being equipped with a KG, without pre-training by itself. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving knowledge-driven problems that require experts.


2016, Vol 34 (3), pp. 435-456
Author(s): Lixin Xia, Zhongyi Wang, Chen Chen, Shanshan Zhai

Purpose: Opinion mining (OM), also known as “sentiment classification”, aims to discover common patterns of user opinions from their textual statements automatically or semi-automatically, and is useful not only for customers but also for manufacturers. However, because of the complexity of natural language, some problems remain, such as the domain dependence of sentiment words and the extraction of implicit features. The purpose of this paper is to propose an OM method based on topic maps to solve these problems.

Design/methodology/approach: Domain-specific knowledge is key to solving problems in feature-based OM. On the one hand, topic maps, as an ontology framework, are composed of topics, associations, occurrences and scopes, and can represent a class of knowledge representation schemes. On the other hand, compared with ontology, topic maps have many advantages. Thus, it is better to integrate domain-specific knowledge into OM based on topic maps. This method can make full use of the semantic relationships among feature words and sentiment words.

Findings: In feature-level OM, most existing research associates product features and opinions by their explicit co-occurrence, or uses syntax parsing to judge the modification relationship between opinion words and product features within a review unit. These approaches are mostly based on the structure of language units and do not consider domain knowledge. Only a few ontology-based methods incorporate domain knowledge into feature-based OM, and they use only the “is-a” relation between concepts. Therefore, this paper proposes feature-based OM using topic maps. The experimental results revealed that this method can improve the accuracy of the OM. The findings of this study not only advance the state of OM research but also shed light on future research directions.

Research limitations/implications: To demonstrate applications of “feature-based OM using topic maps”, this work implements a prototype that helps users find new washing machines.

Originality/value: This paper presents a new method of feature-based OM using topic maps, which can integrate domain-specific knowledge into feature-based OM effectively. This method can improve the accuracy of the OM greatly. The proposed method can be applied across various application domains, such as e-commerce and e-government.


2004, Vol 13 (03), pp. 721-738
Author(s): Xiaoying Gao, Mengjie Zhang

This paper describes a learning/adaptive approach to automatically building knowledge bases for information extraction from text-based Web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular Web pages in the same domain, which greatly reduces the amount of time-consuming manual work. This approach was investigated on ten Web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that domain-specific knowledge bases can be learned from training examples, and that the domain-specific knowledge base can be used for information extraction from flexible text-based semi-structured Web pages on multiple Web sites. The investigation of the knowledge representation on five other domains suggests that this approach can be easily applied to other domains by simply changing the training examples.


2020, pp. 21-32
Author(s): Daphne Leong

This chapter describes the things and people that facilitate collaboration across disciplines: shared items, shared objectives, and shared agents. (These concepts draw from literature on collaboration in the sciences and from research on intercultural communication.) Shared items function differently from discipline to discipline, while being identifiable across disciplines. Shared objectives comprise activity objects, the prospective outcomes of collaboration, and epistemic objects, knowledge sought. Shared agents function within and across two or more disciplines. In this book, shared items are represented primarily by scores (and recordings), activity objects by the book’s chapters, epistemic objects by interpretations of pieces and of analysis-performance relations, and shared agents by scholar-performers or performer-scholars. Mechanisms and processes of collaboration are briefly described: strategies for collaborating when views diverge, and degrees of collaborative convergence (working in parallel, translating or mediating knowledge for mutual influence, transforming domain-specific knowledge into new cross-domain knowledge).


1998, Vol 13 (1), pp. 91-103
Author(s): Wolfgang Schneider, Matthias Schlagmüller, Mechtild Visé

2014, Vol 10 (3), pp. 249-261
Author(s): Tessa Sanderson, Jo Angouri

The active involvement of patients in decision-making and the focus on patient expertise in managing chronic illness constitute a priority in many healthcare systems, including the NHS in the UK. With easier access to health information, patients are almost expected to be (or to present themselves as) an ‘expert patient’ (Ziebland 2004). This paper draws on the meta-analysis of interview data collected for identifying treatment outcomes important to patients with rheumatoid arthritis (RA). Taking a discourse approach to identity, the discussion focuses on the resources used in the negotiation and co-construction of expert identities, including domain-specific knowledge, access to institutional resources, and ability to self-manage. The analysis shows that expertise is both projected (institutionally sanctioned) and claimed by the patient (self-defined). We close the paper by highlighting the limitations of our pilot study and suggesting avenues for further research.


1989, Vol 81 (3), pp. 306-312
Author(s): Wolfgang Schneider, Joachim Körkel, Franz E. Weinert

1998, Vol 10 (1), pp. 1-34
Author(s): Alfonso Caramazza, Jennifer R. Shelton

We claim that the animate and inanimate conceptual categories represent evolutionarily adapted domain-specific knowledge systems that are subserved by distinct neural mechanisms, thereby allowing for their selective impairment in conditions of brain damage. On this view, (some of) the category-specific deficits that have recently been reported in the cognitive neuropsychological literature—for example, the selective damage or sparing of knowledge about animals—are truly categorical effects. Here, we articulate and defend this thesis against the dominant, reductionist theory of category-specific deficits, which holds that the categorical nature of the deficits is the result of selective damage to noncategorically organized visual or functional semantic subsystems. On the latter view, the sensory/functional dimension provides the fundamental organizing principle of the semantic system. Since, according to the latter theory, sensory and functional properties are differentially important in determining the meaning of the members of different semantic categories, selective damage to the visual or the functional semantic subsystem will result in a category-like deficit. A review of the literature and the results of a new case of category-specific deficit will show that the domain-specific knowledge framework provides a better account of category-specific deficits than the sensory/functional dichotomy theory.

