Rapidly building domain-specific entity-centric language models using semantic web knowledge sources

Author(s):  
Murat Akbacak ◽  
Dilek Hakkani-Tür ◽  
Gokhan Tur
Semantic Web ◽  
2020 ◽  
pp. 1-29
Author(s):  
Bettina Klimek ◽  
Markus Ackermann ◽  
Martin Brümmer ◽  
Sebastian Hellmann

Recent years have seen a rapid emergence of lexical resources in the Semantic Web. Whereas most linguistic information is already machine-readable, we found that morphological information is mostly absent or contained only in semi-structured strings. Morphemic data has not yet been integrated because domain-specific ontologies and explicit morphemic data are lacking. In this paper, we present the Multilingual Morpheme Ontology, MMoOn Core, which can be regarded as the first comprehensive ontology for the linguistic domain of morphological language data. We describe how crucial concepts such as morphs, morphemes, word forms and meanings are represented and interrelated, and how language-specific morpheme inventories can be created as a new kind of morphological dataset. The aim of the MMoOn Core ontology is to serve as a shared semantic model for linguists and NLP researchers alike, enabling the creation, conversion, exchange, reuse and enrichment of morphological language data across different data-dependent language sciences. Various use cases illustrate the cross-disciplinary potential that can be realized with the MMoOn Core ontology in the context of the existing Linguistic Linked Data research landscape.


2019 ◽  
Vol 1 (3) ◽  
Author(s):  
A. Aziz Altowayan ◽  
Lixin Tao

We consider the following problem: given neural language models (embeddings), each trained on an unknown data set, how can we determine which model would give a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure by analyzing its impact on word embeddings learned from various datasets and how they perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error), which measures the percentage of missing words in the model. We observe that high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes more to performance than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
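The abstract describes Average Retrieval Error only as the percentage of missing words in the model, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the error is computed per document as the share of tokens with no vector in the embedding vocabulary, and then averaged over documents. The function names `retrieval_error` and `average_retrieval_error` and the toy data are hypothetical.

```python
from typing import Iterable, List, Set


def retrieval_error(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Percentage of tokens that have no entry in the embedding vocabulary."""
    token_list = list(tokens)
    if not token_list:
        return 0.0  # assumption: an empty document contributes no error
    missing = sum(1 for token in token_list if token not in vocabulary)
    return 100.0 * missing / len(token_list)


def average_retrieval_error(documents: List[List[str]], vocabulary: Set[str]) -> float:
    """Mean of the per-document retrieval errors (assumed averaging scheme)."""
    if not documents:
        return 0.0
    errors = [retrieval_error(doc, vocabulary) for doc in documents]
    return sum(errors) / len(errors)


# Toy example: the vocabulary stands in for the words an embedding model knows.
vocabulary = {"the", "cat", "sat"}
documents = [
    ["the", "cat", "sat"],                # every token is covered
    ["the", "dog", "barked", "loudly"],   # three of four tokens are missing
]
print(average_retrieval_error(documents, vocabulary))
```

With a real embedding model, the vocabulary would be the set of words for which the model stores a vector; a lower value means fewer words of the downstream task are missing from the model.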


2011 ◽  
Vol 6 (2) ◽  
pp. 209-221 ◽  
Author(s):  
Huda Khan ◽  
Brian Caruso ◽  
Jon Corson-Rikert ◽  
Dianne Dietrich ◽  
Brian Lowe ◽  
...  

In disciplines as varied as medicine, social sciences, and economics, data and their analyses are essential parts of researchers’ contributions to their respective fields. While sharing research data for review and analysis presents new opportunities for furthering research, capturing these data in digital forms and providing the digital infrastructure for sharing data and metadata pose several challenges. This paper reviews the motivations behind and design of the Data Staging Repository (DataStaR) platform, which targets specific portions of the research data curation lifecycle: data and metadata capture and sharing prior to publication, and publication to permanent archival repositories. The goal of DataStaR is to support both the sharing and publishing of data while enabling metadata creation without imposing additional overhead on researchers and librarians. Furthermore, DataStaR is intended to provide cross-disciplinary support by integrating different domain-specific metadata schemas according to researchers’ needs. DataStaR’s strategy of a usable interface coupled with metadata flexibility allows for a more scalable solution for data sharing, publication, and metadata reuse.


2007 ◽  
Vol 19 (2) ◽  
pp. 297-309 ◽  
Author(s):  
Yuanbo Guo ◽  
Abir Qasem ◽  
Zhengxiang Pan ◽  
Jeff Heflin

Author(s):  
Jose María Alvarez Rodríguez ◽  
José Emilio Labra Gayo ◽  
Patricia Ordoñez de Pablos

The aim of this chapter is to present a proposal and a case study for describing information about organizations in a standard way using the Linked Data approach. Several models and ontologies have been proposed to formalize the data, structure and behaviour of organizations. Nevertheless, these attempts have not been fully accepted for several reasons: (1) missing pieces to define the status of the organization; (2) tangled parts to specify the structure (concepts and relations) between the elements of the organization; (3) lack of text properties, and other factors. As a result, the existing approaches to formalizing data and information about organizations remain incomplete. Given the current trend of applying semantic web technologies and linked data to formalize, aggregate, and share domain-specific information, a new model for organizations that takes advantage of these initiatives is required in order to overcome existing barriers and exploit corporate information in a standard way. This work is especially relevant because it aims to: (1) unify existing models to provide a common specification; (2) apply semantic web technologies and the Linked Data approach; (3) provide access to the information via standard protocols; and (4) offer new services that can exploit this information to trace the evolution and behaviour of the organization over time. Finally, this work can improve the clarity and transparency of scenarios in which organizations play a key role, such as e-procurement, e-health, or financial transactions.


Author(s):  
Jakub Flotyński ◽  
Athanasios G. Malamos ◽  
Don Brutzman ◽  
Felix G. Hamza-Lup ◽  
Nicholas F. Polys ◽  
...  

The implementation of virtual and augmented reality environments on the web requires integration between 3D technologies and web technologies, which are increasingly focused on collaboration, annotation, and semantics. Combining VR and AR with semantics is therefore emerging as a significant trend in the development of the web. The use of the Semantic Web may improve the creation, representation, indexing, searching, and processing of 3D web content by linking the content with formal and expressive descriptions of its meaning. Although several semantic approaches have been developed for 3D content, they are not explicitly linked to the available well-established 3D technologies, cover a limited set of 3D components and properties, and do not combine domain-specific and 3D-specific semantics. In this chapter, the authors present the background, concepts, and development of the Semantic Web3D approach. It enables ontology-based representation of 3D content and introduces a novel framework to provide 3D structures in an RDF semantic-friendly format.

