Digital repository as a service: automatic deployment of an Invenio-based repository using TOSCA orchestration and Apache Mesos

2019 ◽  
Vol 214 ◽  
pp. 07023 ◽  
Author(s):  
Marica Antonacci ◽  
Alberto Brigandì ◽  
Miguel Caballer ◽  
Eva Cetinić ◽  
Davor Davidovic ◽  
...  

In the framework of the H2020 INDIGO-DataCloud project, we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, such as Docker and Apache Mesos, and standard specifications for describing application architectures, such as TOSCA, we are able to provide a service that simplifies the process of creating and managing repositories of various digital assets using cloud resources. An Invenio-based repository consists of a set of services (e.g. database, message queue, cache, workers and frontend) that need to be properly installed, configured and linked together. These operations, along with the provisioning of the resources and their monitoring and maintenance, can be challenging for individual researchers or small-to-moderate-sized research groups. For this purpose, the INDIGO-DataCloud platform provides advanced features for orchestrating the deployment of complex virtual infrastructures on distributed cloud environments: it can automatically provision the required resources over heterogeneous and/or hybrid cloud infrastructures and configure them, ensuring dynamic elasticity and resilience. This approach has been successfully adapted to support the needs of researchers and scholars in the domain of the Digital Arts and Humanities.
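The core orchestration problem the abstract describes can be illustrated with a small sketch: given dependency relations among the Invenio services, an orchestrator must compute a start order in which every service comes up after the services it relies on. The service names and dependency edges below are illustrative assumptions, not the project's actual TOSCA template.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for an Invenio-style service stack:
# each service maps to the set of services it must wait for.
dependencies = {
    "database": set(),
    "message_queue": set(),
    "cache": set(),
    "workers": {"database", "message_queue"},
    "frontend": {"database", "cache", "workers"},
}

def deployment_order(deps):
    """Return a start order in which every service follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = deployment_order(dependencies)
print(order)  # e.g. database and message_queue before workers, frontend last
```

A real TOSCA template encodes the same information declaratively (node templates plus relationship types), and the orchestrator derives an equivalent ordering from it.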

2015 ◽  
Vol 39 (5) ◽  
pp. 664-681 ◽  
Author(s):  
Rob Kitchin ◽  
Sandra Collins ◽  
Dermot Frost

Purpose – The purpose of this paper is to examine funding models for Open Access (OA) digital data repositories whose costs are not wholly core funded. Whilst such repositories are free to access, they are not without significant cost to build and maintain, and the lack of both full core costs and a direct funding stream through payment-for-use poses a considerable financial challenge, placing their future and the digital collections they hold at risk. Design/methodology/approach – The authors document 14 different potential funding streams for OA digital data repositories, grouped into six classes (institutional, philanthropy, research, audience, service, volunteer), drawing on the ongoing experience of seeking sustainable funding for the Digital Repository of Ireland (DRI). Findings – There is no straightforward solution to funding OA digital data repositories that are not wholly core funded, with a number of general and specific challenges facing each repository, and each funding model having strengths and weaknesses. The proposed DRI solution is the adoption of a blended approach that seeks to ameliorate cyclical effects across funding streams by generating income from a number of sources rather than overly relying on a single one, though it is still reliant on significant state core funding to be viable. Practical implications – The detailing of potential funding streams offers practical financial solutions to other OA digital data repositories that are seeking a means to become financially sustainable in the absence of full core funding. Originality/value – The review assesses and provides concrete advice with respect to potential funding streams in order to help repository owners address the financing conundrum they face.


2021 ◽  
Author(s):  
Fatema Rashid

With the tremendous growth of available digital data, the use of Cloud Service Providers (CSPs) is gaining popularity, since these services promise to provide convenient and efficient storage to end-users by taking advantage of a new set of benefits and savings offered by cloud technologies in terms of computational, storage, bandwidth, and transmission costs. In order to achieve savings in storage, CSPs often employ data deduplication techniques to eliminate duplicated data. However, the benefits gained through these techniques have to be balanced against users' privacy concerns, as these techniques typically require full access to data. In this thesis, we propose solutions for different data types (text, image and video) for secure data deduplication in cloud environments. Our schemes allow users to upload their data in a secure and efficient manner such that neither a semi-honest CSP nor a malicious user can access or compromise the security of the data. We use different image and video processing techniques, such as data compression, to further improve the efficiency of our proposed schemes. The security of the deduplication schemes is provided by applying suitable encryption schemes and error-correcting codes. Moreover, we propose proof-of-storage protocols, including Proof of Retrievability (POR) and Proof of Ownership (POW), so that users of cloud storage services are able to ensure that their data have been saved in the cloud without tampering or manipulation. Experimental results are provided to validate the effectiveness of the proposed schemes.
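One standard way to reconcile encryption with deduplication, which the abstract's tension between privacy and storage savings points at, is convergent encryption: the key is derived from the content itself, so identical plaintexts produce identical ciphertexts that the CSP can deduplicate without reading the data. The sketch below illustrates only this idea; the toy XOR keystream is NOT a secure cipher, and none of this is the thesis's actual scheme.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    # Key derived from the content itself, so identical plaintexts
    # yield identical ciphertexts and deduplicate server-side.
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Illustrative XOR keystream only -- a real scheme would use an
    # authenticated block cipher under the convergent key.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class DedupStore:
    """Server-side store keeping one copy per distinct ciphertext."""
    def __init__(self):
        self.blobs = {}
    def upload(self, ciphertext: bytes) -> bool:
        tag = hashlib.sha256(ciphertext).hexdigest()
        if tag in self.blobs:
            return False           # duplicate: nothing new stored
        self.blobs[tag] = ciphertext
        return True

store = DedupStore()
doc = b"same report uploaded by two different users"
for _ in range(2):
    store.upload(toy_encrypt(doc, convergent_key(doc)))
print(len(store.blobs))  # one stored copy despite two uploads
```

Note that because the ciphertext is deterministic, convergent encryption leaks equality of files, which is exactly the trade-off secure-deduplication schemes must manage.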


2020 ◽  
Vol 32 ◽  
Author(s):  
Alexandre Ribas SEMELER ◽  
Adilson Luiz PINTO

Abstract Data are generated during all human activities related to digital technology. In recent times, scientific research has increasingly adopted digital data as its primary source, though the definition of data varies across disciplines and researchers. In this context, we study the main characteristics of data librarianship as a specialized field of traditional librarianship concerned with data use in libraries. Our work is organized as follows: First, we present a proposed Venn diagram on the theoretical foundations of data librarianship; then, we point out the core skills needed by data librarians. Based on a non-exhaustive literature review, we identify the main topics of research in data librarianship. We describe the significance of research data, data management, data curatorship, and data repositories. Finally, we list a few certification courses in data librarianship. We conclude that data librarianship plays a dynamic role in the practical application of data technologies in libraries, and that professional development, certification, and training in data librarianship are interdisciplinary tasks linked to digital technologies.


Author(s):  
Fleur Johns

Law and social science scholars have long elucidated ways of governing built around state governance of populations and subjects. Yet many are now grappling with the growing prevalence of practices of governance that depart, to varying degrees, from received models. The profusion of digital data, and the deployment of machine learning in its analysis, are redirecting states’ and international organizations’ attention away from the governance of populations as such and toward the amassing, analysis, and mobilization of hybrid data repositories and real-time data flows for governance. Much of this work does not depend on state data sources or on conventional statistical models. The subjectivities nurtured by these techniques of governance are frequently not those of choosing individuals. Digital objects and mediators are increasingly prevalent at all scales. This article surveys how scholars are beginning to understand the nascent political technologies associated with this shift toward governance by data. Expected final online publication date for the Annual Review of Law and Social Science, Volume 17 is October 2021.


Author(s):  
Rajinder Sandhu ◽  
Adel Nadjaran Toosi ◽  
Rajkumar Buyya

Cloud computing provides resources using a multitenant architecture where infrastructure is created from one or more distributed datacenters. Scheduling of applications in cloud infrastructures is one of the main research areas in cloud computing. Researchers have developed many scheduling algorithms and evaluated them using simulators such as CloudSim, but their performance needs to be validated in real cloud environments to improve their usefulness. Aneka is a prominent PaaS platform that allows users to develop cloud applications using various programming models on the underlying infrastructure. This chapter presents a scheduling API developed for the Aneka software platform. Users can develop their own scheduling algorithms using this API and integrate them with Aneka to test their scheduling algorithms in real cloud environments. The proposed API provides all the required functionality to integrate and schedule private, public, or hybrid clouds with the Aneka software.
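The shape of such a pluggable scheduling API can be sketched as follows. Aneka itself is a .NET platform, so this Python sketch is a hypothetical analogue, not Aneka's actual API: the class names, the `select_node` hook, and the node dictionaries are all illustrative assumptions. The point is the design: user-written algorithms implement one interface, and the platform calls them to place tasks.

```python
from abc import ABC, abstractmethod

class SchedulingAlgorithm(ABC):
    """Interface a user-supplied scheduling algorithm implements."""
    @abstractmethod
    def select_node(self, task, nodes):
        """Pick the node that should run `task`."""

class LeastLoadedFirst(SchedulingAlgorithm):
    def select_node(self, task, nodes):
        # Send each task to the node with the shortest queue.
        return min(nodes, key=lambda n: n["queued"])

class RoundRobin(SchedulingAlgorithm):
    def __init__(self):
        self._next = 0
    def select_node(self, task, nodes):
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node

def dispatch(tasks, nodes, algorithm: SchedulingAlgorithm):
    """Platform-side loop: delegate every placement to the plug-in."""
    plan = []
    for task in tasks:
        node = algorithm.select_node(task, nodes)
        node["queued"] += 1
        plan.append((task, node["name"]))
    return plan

nodes = [{"name": "vm-a", "queued": 0}, {"name": "vm-b", "queued": 2}]
plan = dispatch(["t1", "t2", "t3"], nodes, LeastLoadedFirst())
print(plan)
```

Swapping `LeastLoadedFirst()` for `RoundRobin()` changes the placement policy without touching the dispatch loop, which is the property the chapter's API aims to give researchers testing algorithms in real deployments.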


Author(s):  
Peer Hasselmeyer ◽  
Gregory Katsaros ◽  
Bastian Koller ◽  
Philipp Wieder

The management of the entire service landscape comprising a Cloud environment is a complex and challenging venture. One task of utmost importance is the generation and processing of information about the state, health, and performance of the various services and IT components, something which is generally referred to as monitoring. Such information is the foundation for proper assessment and management of the whole Cloud. This chapter pursues two objectives: first, to provide an overview of monitoring in Cloud environments and, second, to propose a solution for interoperable and vendor-independent Cloud monitoring. Along the way, the authors motivate the necessity of monitoring at the different levels of Cloud infrastructures, introduce selected state-of-the-art approaches, and extract requirements for Cloud monitoring. Based on these requirements, the following sections depict a Cloud monitoring solution and describe current developments towards interoperable, open, and extensible Cloud monitoring frameworks.
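Vendor independence of the kind described here usually hinges on one shared metric schema into which provider-specific probes are adapted. The sketch below assumes such a schema; the record fields, layer names, and `Monitor` class are illustrative, not the chapter's actual framework.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Metric:
    """Vendor-neutral monitoring record shared by all probes."""
    layer: str        # "infrastructure" | "platform" | "application"
    resource: str     # e.g. a VM id or a service name
    name: str         # e.g. "cpu_load", "response_time_ms"
    value: float
    timestamp: float = field(default_factory=time.time)

class Monitor:
    """Collects metrics from pluggable, provider-specific probes."""
    def __init__(self):
        self.records = []

    def ingest(self, probe_output: dict, layer: str, resource: str):
        # Probes may use their own field names; adapters map them into
        # the shared schema so consumers stay vendor-independent.
        for name, value in probe_output.items():
            self.records.append(Metric(layer, resource, name, float(value)))

    def latest(self, name: str):
        matches = [m for m in self.records if m.name == name]
        return max(matches, key=lambda m: m.timestamp) if matches else None

mon = Monitor()
mon.ingest({"cpu_load": 0.72}, layer="infrastructure", resource="vm-01")
mon.ingest({"response_time_ms": 41}, layer="application", resource="frontend")
print(mon.latest("cpu_load").value)
```

The design choice worth noting is that monitoring consumers query only the shared schema, so replacing one provider's probe with another's leaves the assessment layer untouched.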


BMJ Open ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. e026828 ◽  
Author(s):  
Donald J Willison ◽  
Joslyn Trowbridge ◽  
Michelle Greiver ◽  
Karim Keshavjee ◽  
Doug Mumford ◽  
...  

Digital data generated in the course of clinical care are increasingly being leveraged for a wide range of secondary purposes. Researchers need to develop governance policies that can assure the public that their information is being used responsibly. Our aim was to develop a generalisable model for governance of research emanating from health data repositories that will invoke the trust of the patients and the healthcare professionals whose data are being accessed for health research. We developed our governance principles and processes through literature review and iterative consultation with key actors in the research network including: a data governance working group, the lead investigators and patient advisors. We then recruited persons to participate in the governing and advisory bodies. Our governance process is informed by eight principles: (1) transparency; (2) accountability; (3) follow rule of law; (4) integrity; (5) participation and inclusiveness; (6) impartiality and independence; (7) effectiveness, efficiency and responsiveness and (8) reflexivity and continuous quality improvement. We describe the rationale for these principles, as well as their connections to the subsequent policies and procedures we developed. We then describe the function of the Research Governing Committee, the majority of whom are either persons living with diabetes or physicians whose data are being used, and the patient and data provider advisory groups with whom they consult and communicate. In conclusion, we have developed a values-based information governance framework and process for Diabetes Action Canada that adds value over-and-above existing scientific and ethics review processes by adding a strong patient perspective and contextual integrity. This model is adaptable to other secure data repositories.


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

With the advancement of storage, retrieval, and network technologies, the amount of information available to each organization is literally exploding. Although data are widely recognized as an organizational asset, they often become a liability when the cost to acquire and manage them exceeds the value derived from them. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful, actionable knowledge from it. To explore and analyze large data repositories and discover useful actionable knowledge from them, modern organizations use a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, the discovery of hidden patterns has statistical meaning and may often disclose sensitive information. As a result, privacy has become one of the prime concerns of the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaching data privacy happens more often than it does in centralized environments.
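The "combining local models" step can be made concrete with a toy sketch of distributed frequent-itemset counting, a common distributed-mining building block: each site computes local support counts and a coordinator sums them into a global model. The transactions and threshold below are invented for illustration; sharing exact per-site counts, as done here, is precisely what can leak site-level information, which privacy-preserving variants avoid by perturbing or securely aggregating the counts.

```python
from collections import Counter

def local_supports(transactions):
    """Per-site model: how many local transactions contain each item."""
    counts = Counter()
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    return counts

def combine(local_counts, total_transactions, min_support=0.5):
    """Coordinator: sum local models, keep globally frequent items."""
    global_counts = Counter()
    for c in local_counts:
        global_counts.update(c)
    threshold = min_support * total_transactions
    return {item for item, n in global_counts.items() if n >= threshold}

site_a = [["bread", "milk"], ["bread"], ["milk"]]
site_b = [["bread", "eggs"], ["bread", "milk"]]
frequent = combine([local_supports(site_a), local_supports(site_b)],
                   total_transactions=len(site_a) + len(site_b))
print(sorted(frequent))
```

Here "bread" (4 of 5 transactions) and "milk" (3 of 5) clear the 50% support threshold while "eggs" does not, but the coordinator has also learned each site's exact counts, illustrating the privacy exposure the passage describes.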

