Methods and Algorithms for Increasing Linked Data Expressiveness (Overview)

2020 ◽  
Vol 23 (4) ◽  
pp. 808-834
Author(s):  
Olga Avenirovna Nevzorova

This review discusses methods and algorithms for increasing the expressiveness of linked data prepared for Web publication. The main approaches to ontology enrichment are considered, and the methods on which they are based, as well as the tools implementing them, are described. The main stage in the general scheme of the linked data life cycle in the Linked Open Data cloud is the construction of a set of linked RDF triples. To improve data classification and the analysis of data quality, various methods are used to increase the expressiveness of linked data. The main idea behind these methods is to enrich existing ontologies (extending the basic knowledge scheme) by adding or improving terminological axioms. Enrichment methods draw on techniques from various fields, such as knowledge representation, machine learning, statistics, natural language processing, formal concept analysis, and game theory.
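A minimal sketch of such an enrichment step, using Python's rdflib (the ontology IRI and class names below are hypothetical placeholders, not from the review): an existing lightweight schema gains a terminological axiom of the kind a learning- or FCA-based method might propose from instance data.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical example ontology; any Linked Open Data schema would do.
EX = Namespace("http://example.org/onto#")

g = Graph()
g.bind("ex", EX)

# Existing lightweight schema: two classes, no axioms relating them.
g.add((EX.River, RDF.type, OWL.Class))
g.add((EX.WaterBody, RDF.type, OWL.Class))

# Enrichment step: add a terminological axiom (rdfs:subClassOf)
# proposed by some enrichment method, making the schema more expressive.
g.add((EX.River, RDFS.subClassOf, EX.WaterBody))

print(g.serialize(format="turtle"))
```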

2020 ◽  
Author(s):  
Alexandr Mansurov ◽  
Olga Majlingova

Linked data is a method for publishing structured data in a way that also expresses its semantics. This semantic description is implemented through vocabularies, which are usually specified by the W3C as web standards. However, anyone can create a vocabulary and register it in an open catalogue such as LOV.

There are many situations where it would be useful to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts. The Data Cube vocabulary provides a means to do this using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations [1].

Given the dispersed nature of linked data, we want to infer relationships between Linked Open Data datasets based on their semantic description. In particular, we are interested in geospatial relationships.

We show a generic approach for relating semantic data cubes through shared taxonomies, related dimensions, and structured geographical datasets. Good results were achieved using structural geographical ontologies in combination with the generic approach for taxonomies.

[1] Cyganiak, Reynolds, Tennison: The RDF Data Cube Vocabulary, W3C Recommendation, 16 January 2014.
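To make the Data Cube model concrete, here is a minimal sketch of a single observation built with Python's rdflib; the dataset IRI, dimensions, and measure are hypothetical stand-ins for a real SDMX-style cube, not examples from the paper.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/stats/")  # hypothetical dataset namespace

g = Graph()
g.bind("qb", QB)
g.bind("ex", EX)

# One qb:Observation: a single cell of the multi-dimensional table,
# tied to its dataset and pinned down by its dimension values.
obs = EX.obs1
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX.populationCube))
g.add((obs, EX.refArea, EX.regionA))                              # spatial dimension
g.add((obs, EX.refPeriod, Literal("2020", datatype=XSD.gYear)))   # time dimension
g.add((obs, EX.population, Literal(52400, datatype=XSD.integer))) # measure

print(g.serialize(format="turtle"))
```

Because dimensions like the spatial one are IRIs, observations from different cubes that reuse the same geographical resources can be joined directly, which is what makes the geospatial linking described above possible.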


2017 ◽  
Vol 108 (1) ◽  
pp. 355-366 ◽  
Author(s):  
Ankit Srivastava ◽  
Georg Rehm ◽  
Felix Sasaki

Abstract: With the ever-increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. Machine Translation is one such case: unknown words and ambiguous translations are among its most common sources of error. In this paper, we attempt to minimise these types of errors by interfacing Statistical Machine Translation (SMT) models with Linked Open Data (LOD) resources such as DBpedia and BabelNet. We perform several experiments based on the SMT system Moses and evaluate multiple strategies for exploiting knowledge from multilingual linked data in automatically translating named entities. We conclude with an analysis of best practices for multilingual linked data sets in order to optimise their benefit to multilingual and cross-lingual applications.
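One of the simpler strategies of this kind can be sketched as follows: look up a source-language named entity in DBpedia and retrieve its target-language label as a candidate translation to feed the SMT system. The sketch queries the public DBpedia SPARQL endpoint; the example entity and the choice of German as target language are illustrative assumptions, not the paper's setup.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def entity_translation(resource_iri, target_lang):
    """Fetch the target-language rdfs:label of a DBpedia resource,
    to use as a candidate translation for a named entity."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?label WHERE {{
          <{resource_iri}> rdfs:label ?label .
          FILTER (lang(?label) = "{target_lang}")
        }}
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0]["label"]["value"] if bindings else None

# e.g. entity_translation("http://dbpedia.org/resource/Vienna", "de") -> "Wien"
```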


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Yueqin Zhu ◽  
Wenwen Zhou ◽  
Yang Xu ◽  
Ji Liu ◽  
Yongjie Tan

Knowledge graph (KG), a popular form of semantic network, has been widely used. It provides an effective way to describe semantic entities and their relationships by extending ontology at the entity level. This article focuses on the application of KGs in the traditional geological field and proposes a novel method to construct a KG. On the basis of natural language processing (NLP) and data mining (DM) algorithms, we analyze the key technologies for designing a KG for geological data, including geological knowledge extraction and semantic association. By extracting a typical geological ontology from a large number of geological documents and open linked data, semantic interconnection is achieved, a KG framework for geological data is designed, an application system based on the KG is constructed, and dynamic updating of the geological information is completed accordingly. Specifically, an unsupervised intelligent learning method using linked open data is incorporated into the geological document preprocessing, which ultimately generates a geological domain vocabulary. Furthermore, some application cases in the KG system are provided to show the effectiveness and efficiency of our proposed intelligent learning approach for KGs.
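As a toy illustration of the vocabulary-generation idea in the preprocessing step (a deliberate simplification; the paper's unsupervised method is not specified here), one can rank tokens that co-occur in sentences with known seed terms as candidates for the domain vocabulary:

```python
import re
from collections import Counter

def domain_vocabulary(documents, seed_terms, top_n=50):
    """Toy preprocessing step: score tokens that co-occur in sentences
    with seed terms, as candidates for a geological domain vocabulary."""
    scores = Counter()
    for doc in documents:
        for sentence in re.split(r"[.!?]", doc):
            tokens = re.findall(r"[a-z]+", sentence.lower())
            if seed_terms & set(tokens):
                scores.update(t for t in tokens if t not in seed_terms)
    return [term for term, _ in scores.most_common(top_n)]

docs = ["Granite is an intrusive igneous rock rich in quartz.",
        "The basalt flow overlies sedimentary strata."]
print(domain_vocabulary(docs, {"granite", "basalt"}))
```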


2021 ◽  
Vol 26 (2) ◽  
pp. 143-149
Author(s):  
Abdelghani Bouziane ◽  
Djelloul Bouchiha ◽  
Redha Rebhi ◽  
Giulio Lorenzini ◽  
Noureddine Doumi ◽  
...  

The evolution of the traditional Web into the Semantic Web makes the machine a first-class citizen on the Web and increases the discoverability and accessibility of unstructured Web-based data. This development makes it possible to use Linked Data technology as the background knowledge base for unstructured data, especially texts, now available in massive quantities on the Web. Given any text, the main challenge is determining the most relevant information in DBpedia with minimal effort and time. However, DBpedia annotation tools, such as DBpedia Spotlight, have mainly targeted the English and Latin-script DBpedia versions. The current situation of the Arabic language is less bright; the Arabic Web content does not reflect the importance of this language. Thus, we have developed an approach to annotate Arabic texts with Linked Open Data, particularly DBpedia. This approach uses natural language processing and machine learning techniques for interlinking Arabic text with Linked Open Data. Despite the high complexity of a domain-independent knowledge base and the limited resources in Arabic natural language processing, the evaluation results of our approach were encouraging.
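For reference, the general annotation pattern looks like the sketch below, which calls the public DBpedia Spotlight REST API. Spotlight's public service covers English rather than Arabic, so this stands in for the annotation step only; the paper's own Arabic pipeline substitutes its NLP and machine learning components for the linker.

```python
import requests

def annotate(text, confidence=0.5):
    """Annotate text with DBpedia entities via the public Spotlight API.
    (English endpoint; an Arabic pipeline would plug in its own linker.)"""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("Resources", [])

for r in annotate("Berlin is the capital of Germany."):
    print(r["@surfaceForm"], "->", r["@URI"])
```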


Author(s):  
Caio Saraiva Coneglian ◽  
José Eduardo Santarem Segundo

The emergence of new technologies has introduced means of disseminating and making information available more efficiently. One initiative, called Europeana, has been promoting this adaptation of informational objects within the Web, and more specifically within Linked Data. This study therefore aims to present a discussion of the relationship between the Digital Humanities and Linked Open Data, as embodied by Europeana. To this end, we use an exploratory methodology that examines questions related to Europeana's data model, EDM, by means of SPARQL. As a result, we gained an understanding of the characteristics of the EDM through the use of SPARQL. We also identified the importance of the concept of Digital Humanities within the context of Europeana.

Keywords: Semantic web. Linked open data. Digital humanities. Europeana. EDM.

Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031
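A minimal SPARQL query of the kind used in such explorations, run from Python against Europeana's public endpoint (the endpoint URL and result limit are our assumptions, not details from the study), lists objects typed with the EDM model:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Europeana's public SPARQL endpoint.
sparql = SPARQLWrapper("http://sparql.europeana.eu/")
sparql.setQuery("""
    PREFIX edm: <http://www.europeana.eu/schemas/edm/>

    # List a handful of cultural heritage objects in the EDM model.
    # In EDM, descriptive metadata (titles, creators) hangs off
    # ore:Proxy nodes rather than the edm:ProvidedCHO itself.
    SELECT ?cho WHERE { ?cho a edm:ProvidedCHO } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cho"]["value"])
```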


Data ◽  
2021 ◽  
Vol 6 (7) ◽  
pp. 71
Author(s):  
Gonçalo Carnaz ◽  
Mário Antunes ◽  
Vitor Beires Nogueira

Criminal investigations collect and analyze the facts related to a crime, from which investigators can deduce evidence to be used in court. Criminal investigation is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they are unable to process criminal reports in the Portuguese language, as no annotated corpus exists for that purpose. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained in the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
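Entity-level scores like those reported above can be computed as in this short sketch; the exact matching criterion used in the paper is not stated here, so the sketch assumes strict span-and-type matches.

```python
def ner_scores(gold, predicted):
    """Entity-level precision/recall/F1, counting an entity as correct
    only on an exact (start, end, type) match."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 4, "PER"), (12, 18, "LOC"), (25, 33, "ORG")}
pred = {(0, 4, "PER"), (12, 18, "LOC"), (40, 45, "ORG")}
print(ner_scores(gold, pred))  # -> (0.667, 0.667, 0.667), approximately
```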


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Olga Majewska ◽  
Charlotte Collins ◽  
Simon Baker ◽  
Jari Björne ◽  
Susan Windisch Brown ◽  
...  

Abstract Background Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The cost and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. Results We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. Conclusion This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.
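The retrofitting step can be sketched as below: a minimal version of the classic iterative update that pulls each verb vector toward the vectors of verbs sharing its class while staying close to its pretrained vector. The uniform weights and hyperparameters are our simplifications, not the paper's exact configuration.

```python
import numpy as np

def retrofit(vectors, classmates, iterations=10, alpha=1.0, beta=1.0):
    """Retrofit embeddings to class-membership knowledge.

    vectors:    dict word -> pretrained np.ndarray
    classmates: dict word -> list of words in the same verb class
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in classmates.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and the current
            # vectors of same-class verbs: same class -> closer vectors.
            total = alpha * vectors[word] + beta * sum(new[n] for n in nbrs)
            new[word] = total / (alpha + beta * len(nbrs))
    return new

vecs = {"inhibit": np.array([1.0, 0.0]),
        "suppress": np.array([0.0, 1.0]),
        "express": np.array([1.0, 1.0])}
classes = {"inhibit": ["suppress"], "suppress": ["inhibit"]}
print(retrofit(vecs, classes))  # inhibit/suppress converge toward each other
```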


2021 ◽  
pp. 002203452110202
Author(s):  
F. Schwendicke ◽  
J. Krois

Data are a key resource for modern societies and are expected to improve the quality, accessibility, affordability, safety, and equity of health care. Dental care and research are currently transforming into what we term data dentistry, with three main applications: 1) Medical data analysis uses deep learning, allowing one to master unprecedented amounts of data (language, speech, imagery) and put them to productive use. 2) Data-enriched clinical care integrates data from the individual (e.g., demographic, social, clinical and omics data, consumer data), setting (e.g., geospatial, environmental, provider-related data), and systems level (payer or regulatory data to characterize input, throughput, output, and outcomes of health care) to provide a comprehensive and continuous real-time assessment of biologic perturbations, individual behaviors, and context. Such care may contribute to a deeper understanding of health and disease and to more precise, personalized, predictive, and preventive care. 3) Data for research include open research data and data sharing, allowing one to appraise, benchmark, pool, replicate, and reuse data. Concerns about and limited confidence in data-driven applications, gaps in stakeholders' and systems' capabilities, and the lack of data standardization and harmonization currently limit the development and implementation of data dentistry. Aspects of bias and of data-user interaction require attention. Action items for the dental community circle around increasing data availability, refinement, and usage; demonstrating the safety, value, and usefulness of applications; educating the dental workforce and consumers; providing performant and standardized infrastructure and processes; and incentivizing and adopting open data and data sharing.


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Oscar Mukasa ◽  
Honorati Masanja ◽  
Don DeSavigny ◽  
Joanna Schellenberg

Abstract Background To illustrate the public health potential of linking individual bedside data with community-based household data in a poor rural setting, we estimated excess pediatric mortality risk after discharge from St Francis Designated District Hospital in Ifakara, Tanzania. Methods Linked data from demographic and clinical surveillance were used to describe post-discharge mortality and survival probability in children aged under 5 years, by age group and cause of admission. Cox regression models were developed to identify risk factors. Results Between March 2003 and March 2007, demographic surveillance included 28,910 children aged 0 to 5 years, of whom 831 (3%) were admitted at least once to the district hospital. Among all children under demographic surveillance, 57,880 person-years and 1381 deaths were observed over 24 months of follow-up. Children aged 0-5 years who survived to hospital discharge were almost twice as likely to die as children of the same age in the community who had not been admitted (RR = 1.9, P < 0.01, 95% CI 1.6-2.4). Amongst children who had been admitted, the mortality rate within a year was highest in infants (93 per 1000 person-years) and amongst those admitted for pneumonia or diarrhoea (97 and 85 per 1000 person-years, respectively). Amongst children who were admitted and survived to discharge, those who lived 75 km or further from the district hospital had a three times greater chance of dying within one year than those living within 25 km (adjusted HR 3.23, 95% CI 1.54-6.75). The probability of surviving the first 30 days post-hospitalization was 94.4% [95% CI 94.4, 94.9], compared to 98.8% [95% CI 97.1, 99.5] in non-hospitalized children of the same age in the community. Conclusion This study illustrates the potential of linking health-related data from facility and household levels. Our results suggest that families may need additional support post-hospitalization.
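A Cox model of the kind used here can be fitted in a few lines; this sketch uses Python's lifelines with a tiny synthetic dataset and made-up column names standing in for the study's covariates.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic post-discharge follow-up records (illustrative only):
# time to death/censoring in days, event indicator, distance from
# home to hospital in km, and age at admission in months.
df = pd.DataFrame({
    "time":     [30, 365, 120, 365, 45, 365],
    "died":     [1,   0,   1,   0,   1,   0],
    "distance": [80,  10,  90,  20,  75,  15],
    "age_m":    [6,   40,   3,  24,   9,  36],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="died")
cph.print_summary()  # hazard ratios for distance and age covariates
```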

