Discovering Missing Links in Large-Scale Linked Data

Author(s):  
Nam Hau ◽  
Ryutaro Ichise ◽  
Bac Le

Author(s):
Xiang Zhang ◽  
Erjing Lin ◽  
Yulian Lv

In this article, the authors propose a novel search model: Multi-Target Search (MT search for short). MT search is a keyword-based search model over Semantic Associations in Linked Data. Each search consists of multiple sub-queries, each representing a user need for a particular object in a group relationship. The authors first formalize the problem of association search, and then introduce their approach to discovering Semantic Associations in large-scale Linked Data. Next, they elaborate their novel search model, the notion of Virtual Document they use to extract linguistic features, and the details of the search process. They then discuss how search results are organized and summarized. Quantitative experiments are conducted on DBpedia to validate the effectiveness and efficiency of the approach.
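As a rough illustration of the idea (a sketch, not the authors' implementation), the following decomposes a keyword query into sub-queries and scores a candidate association by how well the virtual documents of its entities cover them; the class names and the Jaccard-overlap scoring are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SubQuery:
    # Keywords describing one target object in the group relationship.
    keywords: set

@dataclass
class Association:
    # A semantic association: connected entities, each represented by its
    # "virtual document" (bag of linguistic features drawn from labels,
    # literals, and surrounding triples).
    virtual_docs: list = field(default_factory=list)

def score(assoc: Association, sub_queries: list) -> float:
    # Match each sub-query to its best-covering entity (Jaccard overlap
    # of keyword sets) and average the per-sub-query scores.
    total = 0.0
    for sq in sub_queries:
        best = max(
            (len(sq.keywords & vd) / len(sq.keywords | vd)
             for vd in assoc.virtual_docs),
            default=0.0,
        )
        total += best
    return total / len(sub_queries) if sub_queries else 0.0

# Example: one sub-query for a film, one for its director.
query = [SubQuery({"science", "fiction", "film"}),
         SubQuery({"director", "british"})]
assoc = Association(virtual_docs=[{"science", "fiction", "film", "1968"},
                                  {"director", "british", "kubrick"}])
print(score(assoc, query))  # ~0.708: both targets are covered fairly well
```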


2015 ◽  
Vol 25 (4) ◽  
pp. 291-298 ◽  
Author(s):  
Makoto GOTO

2019 ◽  
Vol 52 (5) ◽  
pp. 1-40 ◽  
Author(s):  
Michalis Mountantonakis ◽  
Yannis Tzitzikas

Author(s):  
Tavinder Kaur Ark ◽  
Sarah Kesselring ◽  
Brent Hills ◽  
Kim McGrail

Background
Population Data BC (PopData) was established as a multi-university data and education resource to support training and education, data linkage, and access to individual-level, de-identified data for research in a wide variety of areas, including human and community development and well-being.

Approach
A combination of deterministic and probabilistic linkage is conducted, based on the quality and availability of identifiers for data linkage. PopData utilizes a harmonized data request and approval process for data stewards and researchers to increase efficiency and ease of access to linked data. Researchers access linked data through a secure research environment (SRE) equipped with a wide variety of tools for analysis. The SRE also allows for ongoing management and control of data. PopData continues to expand its data holdings and to evolve its services, governance, and data access processes.

Discussion
PopData has provided efficient and cost-effective access to linked data sets for research. After two decades of learning, planned developments for the organization include, but are not limited to, policies to facilitate programs of research, access to reusable datasets, and the evaluation and use of new data linkage techniques such as privacy-preserving record linkage (PPRL).

Conclusion
PopData continues to maintain and grow the number and type of data holdings available for research. Its existing models support a number of large-scale research projects and demonstrate the benefits of having a third-party data linkage and provisioning center for research purposes. Building further connections with existing data holders and governing bodies will be important to ensure ongoing access to data and that policy changes facilitate access for researchers.
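For readers unfamiliar with the two linkage styles mentioned above, a minimal sketch of the general deterministic-then-probabilistic pattern (Fellegi-Sunter style weights) is given below; the field names, m/u probabilities, and threshold are illustrative assumptions, not PopData's actual configuration.

```python
import math

def deterministic_match(a: dict, b: dict) -> bool:
    # Exact agreement on a high-quality identifier links records outright.
    return a.get("health_id") is not None and a["health_id"] == b.get("health_id")

# m = P(field agrees | records match), u = P(field agrees | non-match);
# in practice these are estimated from the data, not hard-coded.
WEIGHTS = {
    "surname":    (0.95, 0.05),
    "birth_date": (0.98, 0.01),
    "postcode":   (0.90, 0.10),
}

def probabilistic_score(a: dict, b: dict) -> float:
    # Sum log-likelihood ratios: agreement adds weight, disagreement subtracts.
    score = 0.0
    for field, (m, u) in WEIGHTS.items():
        if a.get(field) and a.get(field) == b.get(field):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

def link(a: dict, b: dict, threshold: float = 5.0) -> bool:
    # Deterministic pass first; fall back to the probabilistic score.
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold

rec1 = {"health_id": None, "surname": "ng", "birth_date": "1980-02-14", "postcode": "V6T"}
rec2 = {"health_id": None, "surname": "ng", "birth_date": "1980-02-14", "postcode": "V6T"}
print(link(rec1, rec2))  # True: agreement weights (~14 bits) clear the threshold
```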


2021 ◽  
Author(s):  
Ashleigh Hawkins

Mass digitisation and the exponential growth of born-digital archives over the past two decades have resulted in an enormous volume of archives and archival data being available digitally. This has produced a valuable but under-utilised source of large-scale digital data ripe for interrogation by scholars and practitioners in the Digital Humanities. However, current digitisation approaches fall short of the requirements of digital humanists for structured, integrated, interoperable, and interrogable data. Linked Data provides a viable means of producing such data, creating machine-readable archival data suited to analysis using digital humanities research methods. While a growing body of archival scholarship and praxis has explored Linked Data, its potential to open up digitised and born-digital archives to the Digital Humanities is under-examined. This article approaches Archival Linked Data from the perspective of the Digital Humanities, extrapolating from both archival and digital humanities Linked Data scholarship to identify the benefits to digital humanists of the production and provision of access to Archival Linked Data. It then considers some of the current barriers that prevent digital humanists from realising these benefits and from fully utilising archives that have been made available digitally. The article argues for increased collaboration between the two disciplines, challenges individuals and institutions to engage with Linked Data, and suggests the incorporation of AI and low-barrier tools such as Wikidata into the Linked Data production workflow in order to scale up the production of Archival Linked Data as a means of increasing access to and utilisation of digitised and born-digital archives.
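As one concrete illustration of the low-barrier Wikidata tooling the article points to, the hedged sketch below reconciles an archival creator name against Wikidata's public SPARQL endpoint; the query and workflow are illustrative, not a prescribed production pipeline.

```python
import requests

WDQS = "https://query.wikidata.org/sparql"

def reconcile_person(name: str) -> list:
    # Look up humans (wd:Q5) whose English label matches the archival name.
    query = """
    SELECT ?person ?personLabel WHERE {
      ?person wdt:P31 wd:Q5 ;
              rdfs:label "%s"@en .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    } LIMIT 5
    """ % name
    resp = requests.get(WDQS, params={"query": query, "format": "json"},
                        headers={"User-Agent": "archival-ld-demo/0.1"})
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

# Candidate Wikidata IRIs can then be stored alongside the archival
# description, turning a flat name string into a Linked Data reference.
for hit in reconcile_person("Ada Lovelace"):
    print(hit["person"]["value"], hit["personLabel"]["value"])
```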


Author(s):  
Mathias Dillen ◽  
Quentin Groom ◽  
Donat Agosti ◽  
Lars Nielsen

Zenodo (https://zenodo.org) is an open-access repository operated by CERN (European Organization for Nuclear Research), which provides researchers with an easy and stable platform to archive and publish their data and other output, such as software tools, manuals and project reports. In the context of the ICEDIG (Innovation and Consolidation for Large scale Digitisation of Natural Heritage) project, Zenodo was investigated for its usability as a platform where digitized images of collection specimens could be archived and published. In a production digitization pipeline, we foresee the automated archiving of daily image production. If Zenodo could be used for this purpose, such a process would also immediately mean that data and images are published FAIR-ly (Findable, Accessible, Interoperable and Reusable) within hours of their creation. To evaluate the performance of the system, we first used a test dataset of 1800 herbarium specimen images, which was uploaded using Zenodo's API (Application Programming Interface) (Dillen et al. 2019). This dataset includes lossless TIFF images, label-segmented overlays and JSON-LD (JavaScript Object Notation for Linked Data) metadata using DwC (Darwin Core) terminology, constituting over 208 gigabytes of data. In addition, for each individual digital specimen, both the data about the specimen (in DwC) and metadata about its deposition on Zenodo (in Zenodo's internal data model) were available in multiple machine-readable formats. All data in DwC were provided as linked data with their DwC identifiers (e.g. http://rs.tdwg.org/dwc/terms/basisOfRecord). All individual specimens received minted DOIs (Digital Object Identifiers). A second upload of 280,000 herbarium JPEG images from a single institution (ca. 1 terabyte of data) with limited metadata (but using the same approach) was launched as well. In this presentation, the workflow for proper usage of the API will be described, along with performance metrics and the flexibilities and functionalities of the platform. Some outstanding issues, and potential developments to tackle them, will also be discussed. Currently, the rate of ingestion into Zenodo seems only fast enough for small-scale digitization pipelines. However, a modest improvement in transfer rate would make this a realistic proposition for large-volume usage.
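A minimal sketch of one such automated upload, using Zenodo's public REST deposition API, might look as follows; error handling, retries, and the full DwC/JSON-LD metadata are pared down to essentials, and the token and file name are placeholders.

```python
import requests

BASE = "https://zenodo.org/api"
params = {"access_token": "..."}  # placeholder personal access token

# 1. Create an empty deposition.
dep = requests.post(f"{BASE}/deposit/depositions", params=params, json={}).json()

# 2. Stream the image into the deposition's file bucket.
with open("specimen_0001.tif", "rb") as fp:
    requests.put(f"{dep['links']['bucket']}/specimen_0001.tif",
                 params=params, data=fp)

# 3. Attach minimal descriptive metadata (full DwC terms omitted here).
metadata = {"metadata": {
    "title": "Herbarium specimen 0001",
    "upload_type": "image",
    "image_type": "photo",
    "description": "Digitised herbarium sheet (test upload).",
    "creators": [{"name": "Example Herbarium"}],
}}
requests.put(f"{BASE}/deposit/depositions/{dep['id']}",
             params=params, json=metadata)

# 4. Publish: Zenodo mints a DOI for the record.
requests.post(f"{BASE}/deposit/depositions/{dep['id']}/actions/publish",
              params=params)
```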


2019 ◽  
Vol 4 (1) ◽  
pp. 3
Author(s):  
Chen Tao ◽  
Rongrong Shan ◽  
Hui Li ◽  
Dongsheng Wang ◽  
Wei Liu

In recent years, an increasing number of knowledge bases have been built using linked data, and datasets have grown substantially as a result. It is neither reasonable to store a large amount of triple data in a single graph, nor appropriate to store RDF in named graphs keyed by class URIs, because the many joins between graphs can cause performance problems. This paper presents an adapted agglomerative approach for partitioning large-scale graphs through a bottom-up merging process. The proposed algorithm partitions triple data at three levels: blank nodes, associated nodes, and inference nodes. Blank nodes and the classes/nodes involved in reasoning rules are better stored with an optimal neighboring node in the same partition rather than split across separate partitions. The merging of associated nodes starts with the node of smallest cost and repeats until the target number of partitions is reached. Finally, the feasibility and rationality of the merging algorithm are analyzed in detail through bibliographic cases. In summary, the partitioning methods proposed in this paper can be applied to distributed storage, data retrieval, data export, and semantic reasoning over large-scale triple graphs. In the future, we will investigate setting the number of partitions automatically using machine learning algorithms.
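A hedged sketch of the bottom-up merging idea, under the simplifying assumption that a partition's cost is just its size, might look as follows; the data structures and cost function are stand-ins for the paper's actual definitions.

```python
from collections import defaultdict

def agglomerative_partition(edges, k, blank_nodes=frozenset()):
    # Build an undirected adjacency view of the triple graph.
    adj = defaultdict(set)
    for s, o in edges:
        adj[s].add(o)
        adj[o].add(s)

    part_of = {n: n for n in adj}      # node -> partition id
    members = {n: {n} for n in adj}    # partition id -> member nodes

    def merge(a, b):                   # absorb partition b into a
        for n in members[b]:
            part_of[n] = a
        members[a] |= members.pop(b)

    # Level 1: keep each blank node with a neighbouring partition.
    for n in blank_nodes & set(adj):
        nb = next(iter(adj[n]))
        if part_of[nb] != part_of[n]:
            merge(part_of[nb], part_of[n])

    # Level 2: repeatedly merge the smallest-cost partition into the
    # neighbouring partition it shares the most edges with.
    while len(members) > k:
        small = min(members, key=lambda p: len(members[p]))
        counts = defaultdict(int)
        for n in members[small]:
            for nb in adj[n]:
                if part_of[nb] != small:
                    counts[part_of[nb]] += 1
        target = (max(counts, key=counts.get) if counts
                  else next(p for p in members if p != small))
        merge(target, small)

    return list(members.values())

# Toy chain graph split into two partitions; "b" is treated as blank.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
print(agglomerative_partition(edges, k=2, blank_nodes={"b"}))
```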


2014 ◽  
Vol 8 (supplement) ◽  
pp. xv-xx
Author(s):  
Simon C. Lin

The Taiwan e-Learning and Digital Archives Program (TELDAP) is one of the few large-scale national programs in the world focusing on cultural heritage. It is unique in being an inter-disciplinary program that enhances the cultural, academic, socio-economic, and educational value of the Taiwan Digital Archives. TELDAP maintains a repository to preserve cultural heritage and to support innovative research and applications. It is crucial that users be able to access the right content in the right context while the integrity and utility of the data are ensured. How to sustain the accomplishments of TELDAP has become an important issue. Linked Data, a sustainable business model, and international collaboration are the keys to carrying the Taiwan Digital Archives into the next decade. By linking up with new data infrastructures such as DARIAH and RDA, digital archives in Taiwan can continue their work in new ways.


2017 ◽  
Vol 112 ◽  
pp. 854-863 ◽  
Author(s):  
Fatma Ghorbel ◽  
Fayçal Hamdi ◽  
Nebrasse Ellouze ◽  
Elisabeth Métais ◽  
Faiez Gargouri