SPedia

2017 ◽  
Vol 13 (1) ◽  
pp. 128-147 ◽  
Author(s):  
Muhammad Ahtisham Aslam ◽  
Naif Radi Aljohani

Producing Linked Open Data (LOD) has great potential for publishing high-quality interlinked data, and publishing such data enables intelligent search over the Web of Data. In the context of scientific publications, data about millions of scientific documents issued by thousands of publishers remains silent: it is not published as open data and is therefore not linked to other datasets. In this paper the authors present SPedia, a semantically enriched knowledge base of data about scientific documents. The SPedia knowledge base provides information on more than nine million scientific documents, comprising more than three hundred million RDF triples. These extracted datasets allow users to pose sophisticated queries using Semantic Web techniques instead of relying on keyword-based searches. The paper also demonstrates the quality of the extracted data by running sample queries through the SPedia SPARQL endpoint and analyzing the results. Finally, the authors describe how SPedia can serve as a central hub for the cloud of LOD of scientific publications.
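
As an illustration of the kind of structured query the abstract refers to, the following is a minimal sketch of querying a publications SPARQL endpoint programmatically with SPARQLWrapper. The endpoint URL, prefixes, and property names are illustrative assumptions; the actual SPedia vocabulary and endpoint address are not given here.

```python
# A minimal sketch of posing a structured query to a publications SPARQL
# endpoint. The endpoint URL and the property names below are illustrative
# assumptions, not the actual SPedia vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/spedia/sparql"  # hypothetical endpoint URL

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?paper ?title WHERE {
  ?paper  dcterms:title   ?title ;
          dcterms:creator ?author .
  ?author rdfs:label      "Naif Radi Aljohani" .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["paper"]["value"], "-", row["title"]["value"])
```

Unlike a keyword search, which can only match the author's name as a string, the query above follows typed links from papers to their creators.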


Author(s):  
JOSEP MARIA BRUNETTI ◽  
ROSA GIL ◽  
JUAN MANUEL GIMENO ◽  
ROBERTO GARCIA

Thanks to Open Data initiatives, the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making both the structure and the semantics of the data explicit. However, from the user experience viewpoint, published datasets continue to be monolithic files that are completely opaque or can only be explored through complex semantic queries. Our objective is to help users grasp what kinds of entities are in the dataset, how they are interrelated, what their main properties and values are, and so on. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that facilitate gaining insight into the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users, who discover a whole new perspective on the Web of Data.
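
The sketch below illustrates the general idea of deriving menu and facet candidates from a dataset by counting instances per class and the properties used to describe them. It is not Rhizomer's actual code; the file name is a placeholder.

```python
# Sketch (not Rhizomer's implementation): derive candidate navigation menus and
# facets from an RDF dataset by counting instances per class and collecting the
# properties used to describe instances of each class.
from rdflib import Graph
from rdflib.namespace import RDF

g = Graph()
g.parse("dataset.ttl", format="turtle")  # placeholder file name

# Navigation menu candidates: classes ranked by number of instances.
class_counts = {}
for s, _, cls in g.triples((None, RDF.type, None)):
    class_counts[cls] = class_counts.get(cls, 0) + 1

# Facet candidates per class: properties used on its instances.
facets = {}
for s, _, cls in g.triples((None, RDF.type, None)):
    for _, p, _ in g.triples((s, None, None)):
        if p != RDF.type:
            facets.setdefault(cls, set()).add(p)

for cls, count in sorted(class_counts.items(), key=lambda kv: -kv[1]):
    print(cls, count, sorted(str(p) for p in facets.get(cls, set()))[:5])
```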


Author(s):  
Ahsan Morshed

In spite of the explosive growth of the Internet, information relevant to users is often unavailable even when using the latest browsers. At the same time, there is an ever-increasing number of documents that vary widely in content, format, and quality. These documents often change in content and location because they are not under any kind of centralized control. On the other hand, there is a huge number of unknown users with extremely diverse needs, skills, education, and cultural and language backgrounds. One solution to these problems is to use standard terms with well-defined meaning, that is, a controlled vocabulary (CV). Although there is no single notion of a CV, we can define it as a set of concepts or preferred terms together with the relations that hold among them. Such vocabularies play a very important role in classifying information. In this chapter, we focus on the role of CVs in publishing data on the Web of Data.
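
A minimal sketch of a controlled vocabulary in the sense defined above, modelled here with SKOS via rdflib. The concept names and relations are illustrative examples, not taken from the chapter.

```python
# A controlled vocabulary as a set of preferred terms plus relations among
# them, expressed with SKOS. Concepts and relations are illustrative only.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)

g.add((EX.LinkedData, RDF.type, SKOS.Concept))
g.add((EX.LinkedData, SKOS.prefLabel, Literal("Linked Data", lang="en")))
g.add((EX.OpenData, RDF.type, SKOS.Concept))
g.add((EX.OpenData, SKOS.prefLabel, Literal("Open Data", lang="en")))
# Relation between preferred terms: Linked Data is narrower than Open Data.
g.add((EX.LinkedData, SKOS.broader, EX.OpenData))

print(g.serialize(format="turtle"))
```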


2018 ◽  
Vol 10 (4) ◽  
pp. 1
Author(s):  
Mileidy Alvarez-Melgarejo ◽  
Martha L. Torres-Barreto

The bibliometric method has proven to be a powerful tool for the analysis of scientific publications, allowing the quality of the knowledge-generating process to be rated, as well as its impact on the firm's environment. This article presents a comparison of two powerful bibliographic databases in terms of their coverage and the usefulness of their content. The comparison starts from a subject associated with the relationship between resources and capabilities. The outcomes show that the search results differ between the two databases. The Web of Science (WoS) has greater coverage than Scopus and a greater impact in terms of most-cited authors and publications. The search in the WoS yields articles from 2001 onwards, while Scopus yields articles from 1976 onwards; however, some of the latter are inconsistent with the topic being searched. The analysis points to a lack of studies regarding resources as foundations of a firm's capabilities; as a result, new research in this field is suggested.


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is created either from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used to build the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
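
A simplified sketch of the idea behind SDType (not the authors' implementation): each property observed for a resource votes for candidate types, based on the distribution of types seen among typed subjects of that property in the data itself. The toy triples below are invented for illustration.

```python
# Toy sketch of statistical type inference in the spirit of SDType:
# properties vote for candidate types using type distributions learned
# from already-typed subjects; no external knowledge is used.
from collections import Counter, defaultdict

# Invented (subject, predicate, object) triples.
triples = [
    ("Berlin", "rdf:type", "City"),
    ("Berlin", "locatedIn", "Germany"),
    ("Paris", "rdf:type", "City"),
    ("Paris", "locatedIn", "France"),
    ("Hamburg", "locatedIn", "Germany"),   # missing rdf:type statement
]

# Known types per subject.
types = defaultdict(set)
for s, p, o in triples:
    if p == "rdf:type":
        types[s].add(o)

# Estimate P(type | subject uses property p) from typed subjects.
dist = defaultdict(Counter)
for s, p, o in triples:
    if p != "rdf:type":
        for t in types[s]:
            dist[p][t] += 1

def predict_types(subject, threshold=0.5):
    """Average the per-property type distributions over the subject's properties."""
    votes = Counter()
    props = [p for s, p, o in triples if s == subject and p != "rdf:type"]
    for p in props:
        total = sum(dist[p].values())
        for t, n in dist[p].items():
            votes[t] += (n / total) / len(props)
    return [t for t, score in votes.items() if score >= threshold]

print(predict_types("Hamburg"))  # -> ['City']
```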


Author(s):  
Axel Polleres ◽  
Simon Steyskal

The World Wide Web Consortium (W3C), as the main standardization body for Web standards, has set a particular focus on publishing and integrating Open Data. In this chapter, the authors explain various standards from the W3C's Semantic Web activity and the potential role they play in the context of Open Data: RDF, as a standard data format for publishing and consuming structured information on the Web; the Linked Data principles, for interlinking RDF data published across the Web and leveraging a Web of Data; and RDFS and OWL, for describing the vocabularies used in RDF and the mappings between such vocabularies. The authors conclude with a review of current deployments of these standards on the Web, particularly within public Open Data initiatives, and discuss potential risks and challenges.
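
A minimal sketch of the standards the chapter covers: an RDF statement about a resource, plus RDFS/OWL mappings between terms of two vocabularies. The vocabularies and terms are illustrative assumptions, not examples from the chapter.

```python
# RDF statements plus RDFS/OWL vocabulary mappings, built with rdflib.
# The two vocabularies (CITY, GOV) and their terms are invented for illustration.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

CITY = Namespace("http://example.org/city-vocab/")
GOV = Namespace("http://example.org/gov-vocab/")

g = Graph()
g.bind("owl", OWL)

# RDF: simple statements about a resource.
g.add((CITY.Vienna, RDF.type, CITY.Municipality))
g.add((CITY.Vienna, RDFS.label, Literal("Vienna", lang="en")))

# RDFS/OWL: mappings between the two vocabularies.
g.add((CITY.Municipality, OWL.equivalentClass, GOV.City))
g.add((CITY.population, RDFS.subPropertyOf, GOV.inhabitants))

print(g.serialize(format="turtle"))
```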


Author(s):  
Albert Meroño-Peñuela ◽  
Ashkan Ashkpour ◽  
Valentijn Gilissen ◽  
Jan Jonker ◽  
Tom Vreugdenhil ◽  
...  

The Dutch Historical Censuses (1795–1971) contain statistics that describe almost two centuries of history in the Netherlands. These censuses were conducted once every 10 years (with some exceptions) from 1795 to 1971. Researchers have used their wealth of demographic, occupational, and housing information to answer fundamental questions in social and economic history. However, accessing these data has traditionally been a time-consuming and knowledge-intensive task. In this paper, we describe the outcomes of the CEDAR project, which make access to the digitized assets of the Dutch Historical Censuses easier, faster, and more reliable. This is achieved by using the Linked Data publishing paradigm from the Semantic Web. We use a digitized sample of 2,288 census tables to produce a linked dataset of more than 6.8 million statistical observations. The dataset is modeled using the RDF Data Cube, Open Annotation, and PROV vocabularies. The contributions of representing this dataset as Linked Data are: (1) a uniform database interface for efficient querying of census data; (2) a standardized and reproducible data harmonization workflow; and (3) an augmentation of the dataset through richer connections to related resources on the Web.
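
In the spirit of the paper's approach, the sketch below models a single census statistic as an RDF Data Cube observation. The dimensions, measure, and values are illustrative assumptions, not taken from the CEDAR dataset itself.

```python
# One census statistic modelled as a qb:Observation with rdflib.
# Dimension/measure names and the figures used are illustrative only.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/census/")

g = Graph()
g.bind("qb", QB)

obs = EX["obs-1899-utrecht-001"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["census-1899"]))
g.add((obs, EX.municipality, EX.Utrecht))                         # dimension (assumed)
g.add((obs, EX.occupation, EX.Carpenter))                         # dimension (assumed)
g.add((obs, EX.population, Literal(432, datatype=XSD.integer)))   # measure (assumed)

print(g.serialize(format="turtle"))
```

Once observations are expressed this way, they can be queried uniformly over SPARQL rather than by reading individual digitized tables.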


2006 ◽  
Vol 6 (1) ◽  
pp. 25-33 ◽  
Author(s):  
E. Glennie ◽  
A. Kirby

Purpose: To establish whether the quantity and quality of information available on the Internet about the career of diagnostic radiography is of a good or satisfactory standard. Methods: Four search engines were used with four different search terms, and the top twenty hits for each combination were read. The applicable sites were scored to determine the quality of each site. Results: Only 12% (37) of the 320 sites read were applicable. Of these 37, four sites gained a good score on the scoring sheet and were therefore classed as high quality, although 21 of the 37 sites gained half marks or more. Conclusions: The quantity and quality of sites about the career of radiography was not of a satisfactory standard, and more attention from both the government and professional bodies is needed if the profession is to gain attention and the staff shortage problem is to be solved.

