Digital Repositories and Linked Data: Lessons Learned and Challenges

Author(s):  
Santiago Gonzalez-Toral ◽  
Mauricio Espinoza-Mejia ◽  
Victor Saquicela
Author(s):  
David A. Weir ◽  
Stephen Murray ◽  
Pankaj Bhawnani ◽  
Douglas Rosenberg

Traditionally, business areas within an organization individually manage the data essential to their operation. This data may be held in specialized software applications, spreadsheet and desktop-database files (e.g., MS Excel, MS Access), e-mail filing, and hardcopy documents. These applications and data stores support local decision-making and add to the business area's knowledge. This approach has several problems. Data, knowledge, and decisions are captured only locally within the business area, and in many cases this information is not easily identifiable or available for enterprise-wide sharing. Furthermore, individuals within the business areas often keep "shadow files" of data and information, whose accuracy, completeness, and timeliness are often questionable. Information created and managed at the local business level can be lost when a staff member leaves his or her role, which is especially significant given ongoing changes in today's workforce. Data must be properly managed and maintained to retain its value within the organization. The development and execution of a "single version of the truth", or master data management, requires a partnership between the business areas, records management, legal, and the information technology groups of an organization. Master data management is expected to yield significant gains in staff effectiveness, efficiency, and productivity. In 2011, Enbridge Pipelines applied the principles of master data management and trusted digital data repositories to a widely used, geographically dispersed small database (fewer than 10,000 records) with noted data shortcomings such as incomplete or incorrect data, multiple shadow files, and inconsistent organization-wide usage of the application that stewards the data. This paper provides an overview of best practices in developing an authoritative single source of data and Enbridge's experience in applying these practices to a real-world example. Challenges of the approach used by Enbridge and lessons learned are examined and discussed.
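The "single version of the truth" idea can be illustrated with a small sketch. The following Python snippet is an illustration only, not Enbridge's actual implementation: the record fields (`asset_id`, `updated`) and the last-write-wins rule are assumptions. It merges duplicate "shadow file" records into one master record per asset and flags conflicting values for data stewards to review:

```python
from collections import defaultdict

def merge_records(records):
    """Merge shadow-file records into one master record per asset.

    records: list of dicts, each with an 'asset_id' key, an 'updated'
    timestamp, and arbitrary data fields. The newest value wins per field;
    every disagreement between copies is reported as a conflict.
    """
    by_asset = defaultdict(list)
    for r in records:
        by_asset[r["asset_id"]].append(r)

    master, conflicts = {}, []
    for asset_id, versions in by_asset.items():
        versions.sort(key=lambda r: r["updated"])  # oldest -> newest
        merged = {}
        for v in versions:
            for field, value in v.items():
                if field in ("asset_id", "updated"):
                    continue
                if field in merged and merged[field] != value:
                    # Record what the older copy said vs. the newer copy.
                    conflicts.append((asset_id, field, merged[field], value))
                merged[field] = value  # newest value wins
        master[asset_id] = merged
    return master, conflicts
```

In practice the conflict list, rather than the automatic merge, is the valuable output: it tells data stewards exactly where the shadow copies disagree.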


Author(s):  
William Ulate ◽  
M. Marcela Mora

Annotation (i.e., making comments on a resource) is an important part of the vision for the Semantic Web as defined by the standards of the World Wide Web Consortium (W3C), whose goal is to make information and data published on the Internet machine-readable so they can be better utilized. Despite the important role that annotation plays in the Semantic Web, many cultural heritage institutions have been slow to adopt it. Access to open historical biological literature hosted in digital libraries, like the Biodiversity Heritage Library (BHL), has improved the efficiency of biodiversity research, especially in the field of taxonomy. This wealth of information has even greater potential for research if annotation capabilities are incorporated into those legacy digital repositories. As part of the Consumers as Creators project, developed by the Missouri Botanical Garden (MOBOT) with partners at Saint Louis University (SLU), the Web annotation needs of the botanical community were analyzed. Likewise, the practicality of using existing annotation tools to satisfy this community's particular needs was assessed, including technical and operational considerations. To do so, 15 users of a botanical virtual library from five institutions were interviewed. Their answers were analyzed and classified according to user role and purpose. Desirable functionalities of annotation software were classified into three orders of priority (Must, Should, and Could). Subsequently, six open-source annotation tools were evaluated (Digilib, hypothes.is, Pundit Annotator Pro, Recogito, rerum, and VGG Annotator) to explore whether they fulfilled the annotation needs of botanists. The selected annotation tools were installed (when necessary) and assessed on different functional aspects, and their advantages and disadvantages were identified. Finally, a proof-of-concept prototype was developed to exemplify how those needs could be met within a digital library platform.
Botanicus, a free portal to historic botanical literature from the Peter H. Raven Library at MOBOT, and rerum, functioning as a repository of annotations, were used to explore the implementation of a minimal subset of these requirements. A summary of the assessment results, the lessons learned, and some recommended best practices are presented.
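As a concrete illustration of what such annotation tools exchange, the snippet below builds a minimal annotation following the W3C Web Annotation Data Model (JSON-LD). The page URL, region selector, and comment text are hypothetical placeholders, not actual Botanicus or rerum identifiers:

```python
import json

# A minimal W3C Web Annotation: a user comment anchored to a region of a
# scanned page in a digital library. All identifiers below are illustrative.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {
        "type": "TextualBody",
        "value": "Possible misidentification: compare with the protologue.",
        "format": "text/plain",
    },
    "target": {
        # Hypothetical page identifier; the fragment selector pins the
        # annotation to a rectangular region (x, y, width, height) of it.
        "source": "https://example.org/botanicus/page/12345",
        "selector": {
            "type": "FragmentSelector",
            "value": "xywh=120,80,400,300",
        },
    },
}

print(json.dumps(annotation, indent=2))
```

An annotation repository such as rerum would store and serve documents of this shape, which is what makes annotations portable between viewers and institutions.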


2019 ◽  
Vol 22 (2) ◽  
Author(s):  
Felipe Augusto Arakaki ◽  
Caio Saraiva Coneglian ◽  
Plácida Leopoldina Ventura Amorim da Costa Santos ◽  
José Eduardo Santarem Segundo

Considering the expansion of scientific production in digital information environments and the new forms of data availability following Linked Data principles, the objective is to discuss possibilities for relating datasets and semantically enriching metadata in digital repositories, and to present a model for converting records into RDF. This is a theoretical and exploratory study based on a bibliographic review of digital repositories and Linked Data. Possibilities of the conversion process are demonstrated, identifying the databases, vocabularies, and standards that must be adopted so that the generated data is semantically enriched. The work presents a model reflecting the steps to be taken when making the metadata of a digital repository available as Linked Data.
It is concluded that the integration of digital repositories with Semantic Web technologies enables the publication of data as Linked Data, which provides new means for the dissemination and integration of resources on the Web.
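The conversion step described above can be sketched in a few lines: a flat Dublin Core record from a repository is mapped to RDF triples, here serialized as N-Triples by hand for clarity (a real pipeline would use an RDF library). The item URI and field values are illustrative assumptions:

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace

def to_ntriples(item_uri, record):
    """Map a flat Dublin Core record (field -> value) to N-Triples lines."""
    lines = []
    for field, value in record.items():
        if value.startswith("http"):
            obj = f"<{value}>"   # URI value -> link to an external dataset
        else:
            obj = f'"{value}"'   # plain value -> string literal
        lines.append(f"<{item_uri}> <{DC}{field}> {obj} .")
    return lines

# Hypothetical repository record; the subject URI points at an external
# dataset (DBpedia), which is what makes the result Linked Data rather
# than an isolated record.
record = {
    "title": "Linked Data e repositorios digitais",
    "creator": "Arakaki, F. A.",
    "subject": "http://dbpedia.org/resource/Linked_data",
}
for line in to_ntriples("https://example.org/repository/item/42", record):
    print(line)
```

The choice of vocabulary (here Dublin Core) and of the external datasets to link against is exactly the decision the proposed model formalizes.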


Author(s):  
Akeem Pedro ◽  
Anh-Tuan Pham-Hang ◽  
Phong Thanh Nguyen ◽  
Hai Chien Pham

Accident, injury, and fatality rates remain disproportionately high in the construction industry. Information from past mishaps provides an opportunity to acquire insights, gather lessons learned, and systematically improve safety outcomes. Advances in data science and Industry 4.0 present unprecedented opportunities for the industry to leverage, share, and reuse safety information more efficiently. However, the potential benefits of information sharing are missed because accident data is inconsistently formatted, non-machine-readable, and inaccessible. Hence, learning opportunities and insights cannot be captured and disseminated to proactively prevent accidents. To address these issues, a novel information sharing system is proposed utilizing Linked Data, ontologies, and knowledge graph technologies. An ontological approach is developed to semantically model safety information and formalize knowledge pertaining to accident cases. A multi-algorithmic approach is developed to automatically process and convert accident case data to the Resource Description Framework (RDF), and the SPARQL protocol is deployed to enable query functionality. Trials and test scenarios utilizing a dataset of 200 real accident cases confirm the effectiveness and efficiency of the system in improving information access, retrieval, and reusability. The proposed development facilitates a new "open" information sharing paradigm with major implications for Industry 4.0 and data-driven applications in construction safety management.


2017 ◽  
Vol 3 ◽  
pp. e105 ◽  
Author(s):  
Anastasia Dimou ◽  
Sahar Vahdati ◽  
Angelo Di Iorio ◽  
Christoph Lange ◽  
Ruben Verborgh ◽  
...  

While most challenges organized so far in the Semantic Web domain have focused on comparing tools with respect to different criteria, such as their features and competencies, or on exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, the construction of the input dataset, and the design of the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the tools used, the preferred vocabularies, and the results produced in the three editions of 2014, 2015, and 2016. We compared these lessons to those of other Semantic Web Evaluation Challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as the result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.


2018 ◽  
Vol 9 (2) ◽  
pp. 1-20
Author(s):  
Alia O. Bahanshal ◽  
Hend S. Al-Khalifa

Research in Linked Data has advanced significantly in recent years and now spans an enormous range of applications, such as research publications and datasets. Its flexibility and effectiveness in handling and linking data from numerous sources have made Linked Data increasingly popular. The aim of this article is to present a systematic literature review of Linked Data and its development since 2009. Moreover, cumulative experiences and lessons learned from recent years are highlighted. Findings show that Linked Data has grown in the past five years in terms of the number of datasets, research publications, and domain-specific applications.

