Digital repository as a service: automatic deployment of an Invenio-based repository using TOSCA orchestration and Apache Mesos

2019 ◽  
Vol 214 ◽  
pp. 07023 ◽  
Author(s):  
Marica Antonacci ◽  
Alberto Brigandì ◽  
Miguel Caballer ◽  
Eva Cetinić ◽  
Davor Davidovic ◽  
...  

In the framework of the H2020 INDIGO-DataCloud project, we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, such as Docker and Apache Mesos, and standard specifications for describing application architectures, such as TOSCA, we are able to provide a service that simplifies the process of creating and managing repositories of various digital assets using cloud resources. An Invenio-based repository consists of a set of services (e.g. database, message queue, cache, workers and frontend) that need to be properly installed, configured and linked together. These operations, along with the provisioning of the resources and their monitoring and maintenance, can be challenging for individual researchers or small-to-moderate-sized research groups. For this purpose, the INDIGO-DataCloud platform provides advanced features for orchestrating the deployment of complex virtual infrastructures on distributed cloud environments: it can automatically provision the required resources over heterogeneous and/or hybrid cloud infrastructures and configure them, ensuring dynamic elasticity and resilience. This approach has been successfully adapted to support the needs of researchers and scholars in the domain of the Digital Arts and Humanities.
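The core orchestration problem the abstract describes can be illustrated with a small sketch: given dependency relations among the Invenio services, an orchestrator must compute a start order in which every service comes up after the services it relies on. The service names and dependency edges below are illustrative assumptions, not the project's actual TOSCA template.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for an Invenio-style service stack:
# each service maps to the set of services it must wait for.
dependencies = {
    "database": set(),
    "message_queue": set(),
    "cache": set(),
    "workers": {"database", "message_queue"},
    "frontend": {"database", "cache", "workers"},
}

def deployment_order(deps):
    """Return a start order in which every service follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = deployment_order(dependencies)
print(order)  # e.g. database and message_queue before workers, frontend last
```

A real TOSCA template encodes the same information declaratively (node templates plus relationship types), and the orchestrator derives an equivalent ordering from it.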

2015 ◽  
Vol 39 (5) ◽  
pp. 664-681 ◽  
Author(s):  
Rob Kitchin ◽  
Sandra Collins ◽  
Dermot Frost

Purpose – The purpose of this paper is to examine funding models for Open Access (OA) digital data repositories whose costs are not wholly core funded. Whilst such repositories are free to access, they are not without significant cost to build and maintain, and the lack of both full core costs and a direct funding stream through payment-for-use poses a considerable financial challenge, placing their future and the digital collections they hold at risk. Design/methodology/approach – The authors document 14 different potential funding streams for OA digital data repositories, grouped into six classes (institutional, philanthropy, research, audience, service, volunteer), drawing on the ongoing experience of seeking sustainable funding for the Digital Repository of Ireland (DRI). Findings – There is no straightforward solution to funding OA digital data repositories that are not wholly core funded, with a number of general and specific challenges facing each repository, and each funding model having strengths and weaknesses. The proposed DRI solution is the adoption of a blended approach that seeks to ameliorate cyclical effects across funding streams by generating income from a number of sources rather than overly relying on a single one, though it is still reliant on significant state core funding to be viable. Practical implications – The detailing of potential funding streams offers practical financial solutions to other OA digital data repositories that are seeking a means to become financially sustainable in the absence of full core funding. Originality/value – The review assesses and provides concrete advice with respect to potential funding streams in order to help repository owners address the financing conundrum they face.


2021 ◽  
Author(s):  
Fatema Rashid

With the tremendous growth of available digital data, the use of Cloud Service Providers (CSPs) is gaining popularity, since these services promise to provide convenient and efficient storage to end-users by taking advantage of a new set of benefits and savings offered by cloud technologies in terms of computational, storage, bandwidth, and transmission costs. In order to achieve savings in storage, CSPs often employ data deduplication techniques to eliminate duplicated data. However, the benefits gained through these techniques have to be balanced against users' privacy concerns, as these techniques typically require full access to data. In this thesis, we propose solutions for different data types (text, image and video) for secure data deduplication in cloud environments. Our schemes allow users to upload their data in a secure and efficient manner such that neither a semi-honest CSP nor a malicious user can access or compromise the security of the data. We use different image and video processing techniques, such as data compression, to further improve the efficiency of our proposed schemes. The security of the deduplication schemes is provided by applying suitable encryption schemes and error-correcting codes. Moreover, we propose proof-of-storage protocols, including Proof of Retrievability (POR) and Proof of Ownership (POW), so that users of cloud storage services are able to ensure that their data have been saved in the cloud without tampering or manipulation. Experimental results are provided to validate the effectiveness of the proposed schemes.
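One standard way to reconcile encryption with deduplication, which the abstract's tension between privacy and storage savings points at, is convergent encryption: the key is derived from the content itself, so identical plaintexts produce identical ciphertexts that the CSP can deduplicate without reading the data. The sketch below illustrates only this idea; the toy XOR keystream is NOT a secure cipher, and none of this is the thesis's actual scheme.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    # Key derived from the content itself, so identical plaintexts
    # yield identical ciphertexts and deduplicate server-side.
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Illustrative XOR keystream only -- a real scheme would use an
    # authenticated block cipher under the convergent key.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class DedupStore:
    """Server-side store keeping one copy per distinct ciphertext."""
    def __init__(self):
        self.blobs = {}
    def upload(self, ciphertext: bytes) -> bool:
        tag = hashlib.sha256(ciphertext).hexdigest()
        if tag in self.blobs:
            return False           # duplicate: nothing new stored
        self.blobs[tag] = ciphertext
        return True

store = DedupStore()
doc = b"same report uploaded by two different users"
for _ in range(2):
    store.upload(toy_encrypt(doc, convergent_key(doc)))
print(len(store.blobs))  # one stored copy despite two uploads
```

Note that because the ciphertext is deterministic, convergent encryption leaks equality of files, which is exactly the trade-off secure-deduplication schemes must manage.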


2020 ◽  
Vol 32 ◽  
Author(s):  
Alexandre Ribas SEMELER ◽  
Adilson Luiz PINTO

Abstract Data are generated during all human activities related to digital technology. In recent times, scientific research has increasingly adopted digital data as its primary source, though the definition of data varies across disciplines and researchers. In this context, we study the main characteristics of data librarianship as a specialized field of traditional librarianship concerned with data use in libraries. Our work is organized as follows: First, we present a proposed Venn diagram on the theoretical foundations of data librarianship; then, we point out the core skills needed by data librarians. Based on a non-exhaustive literature review, we identify the main topics of research in data librarianship. We describe the significance of research data, data management, data curatorship, and data repositories. Finally, we list a few certification courses in data librarianship. We conclude that data librarianship plays a dynamic role in the practical application of data technologies in libraries, and that professional development, certification, and training in data librarianship are interdisciplinary tasks linked to digital technologies.


Author(s):  
Fleur Johns

Law and social science scholars have long elucidated ways of governing built around state governance of populations and subjects. Yet many are now grappling with the growing prevalence of practices of governance that depart, to varying degrees, from received models. The profusion of digital data, and the deployment of machine learning in its analysis, are redirecting states’ and international organizations’ attention away from the governance of populations as such and toward the amassing, analysis, and mobilization of hybrid data repositories and real-time data flows for governance. Much of this work does not depend on state data sources or on conventional statistical models. The subjectivities nurtured by these techniques of governance are frequently not those of choosing individuals. Digital objects and mediators are increasingly prevalent at all scales. This article surveys how scholars are beginning to understand the nascent political technologies associated with this shift toward governance by data. Expected final online publication date for the Annual Review of Law and Social Science, Volume 17 is October 2021.


Author(s):  
Rajinder Sandhu ◽  
Adel Nadjaran Toosi ◽  
Rajkumar Buyya

Cloud computing provides resources using a multitenant architecture where infrastructure is created from one or more distributed datacenters. Scheduling of applications in cloud infrastructures is one of the main research areas in cloud computing. Researchers have developed many scheduling algorithms and evaluated them using simulators such as CloudSim, but their performance needs to be validated in real cloud environments to improve their usefulness. Aneka is a prominent PaaS platform that allows users to develop cloud applications using various programming models on the underlying infrastructure. This chapter presents a scheduling API developed for the Aneka software platform. Users can develop their own scheduling algorithms using this API and integrate them with Aneka to test their scheduling algorithms in real cloud environments. The proposed API provides all the required functionality to integrate and schedule private, public, or hybrid clouds with the Aneka software.
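The shape of such a pluggable scheduling API can be sketched as follows. Aneka itself is a .NET platform, so this Python sketch is a hypothetical analogue, not Aneka's actual API: the class names, the `select_node` hook, and the node dictionaries are all illustrative assumptions. The point is the design: user-written algorithms implement one interface, and the platform calls them to place tasks.

```python
from abc import ABC, abstractmethod

class SchedulingAlgorithm(ABC):
    """Interface a user-supplied scheduling algorithm implements."""
    @abstractmethod
    def select_node(self, task, nodes):
        """Pick the node that should run `task`."""

class LeastLoadedFirst(SchedulingAlgorithm):
    def select_node(self, task, nodes):
        # Send each task to the node with the shortest queue.
        return min(nodes, key=lambda n: n["queued"])

class RoundRobin(SchedulingAlgorithm):
    def __init__(self):
        self._next = 0
    def select_node(self, task, nodes):
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node

def dispatch(tasks, nodes, algorithm: SchedulingAlgorithm):
    """Platform-side loop: delegate every placement to the plug-in."""
    plan = []
    for task in tasks:
        node = algorithm.select_node(task, nodes)
        node["queued"] += 1
        plan.append((task, node["name"]))
    return plan

nodes = [{"name": "vm-a", "queued": 0}, {"name": "vm-b", "queued": 2}]
plan = dispatch(["t1", "t2", "t3"], nodes, LeastLoadedFirst())
print(plan)
```

Swapping `LeastLoadedFirst()` for `RoundRobin()` changes the placement policy without touching the dispatch loop, which is the property the chapter's API aims to give researchers testing algorithms in real deployments.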


Author(s):  
Peer Hasselmeyer ◽  
Gregory Katsaros ◽  
Bastian Koller ◽  
Philipp Wieder

The management of the entire service landscape comprising a Cloud environment is a complex and challenging venture. One task of utmost importance is the generation and processing of information about the state, health, and performance of the various services and IT components, something which is generally referred to as monitoring. Such information is the foundation for proper assessment and management of the whole Cloud. This chapter pursues two objectives: first, to provide an overview of monitoring in Cloud environments and, second, to propose a solution for interoperable and vendor-independent Cloud monitoring. Along the way, the authors motivate the necessity of monitoring at the different levels of Cloud infrastructures, introduce selected state-of-the-art approaches, and extract requirements for Cloud monitoring. Based on these requirements, the following sections depict a Cloud monitoring solution and describe current developments towards interoperable, open, and extensible Cloud monitoring frameworks.
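Vendor independence of the kind described here usually hinges on one shared metric schema into which provider-specific probes are adapted. The sketch below assumes such a schema; the record fields, layer names, and `Monitor` class are illustrative, not the chapter's actual framework.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Metric:
    """Vendor-neutral monitoring record shared by all probes."""
    layer: str        # "infrastructure" | "platform" | "application"
    resource: str     # e.g. a VM id or a service name
    name: str         # e.g. "cpu_load", "response_time_ms"
    value: float
    timestamp: float = field(default_factory=time.time)

class Monitor:
    """Collects metrics from pluggable, provider-specific probes."""
    def __init__(self):
        self.records = []

    def ingest(self, probe_output: dict, layer: str, resource: str):
        # Probes may use their own field names; adapters map them into
        # the shared schema so consumers stay vendor-independent.
        for name, value in probe_output.items():
            self.records.append(Metric(layer, resource, name, float(value)))

    def latest(self, name: str):
        matches = [m for m in self.records if m.name == name]
        return max(matches, key=lambda m: m.timestamp) if matches else None

mon = Monitor()
mon.ingest({"cpu_load": 0.72}, layer="infrastructure", resource="vm-01")
mon.ingest({"response_time_ms": 41}, layer="application", resource="frontend")
print(mon.latest("cpu_load").value)
```

The design choice worth noting is that monitoring consumers query only the shared schema, so replacing one provider's probe with another's leaves the assessment layer untouched.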


BMJ Open ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. e026828 ◽  
Author(s):  
Donald J Willison ◽  
Joslyn Trowbridge ◽  
Michelle Greiver ◽  
Karim Keshavjee ◽  
Doug Mumford ◽  
...  

Digital data generated in the course of clinical care are increasingly being leveraged for a wide range of secondary purposes. Researchers need to develop governance policies that can assure the public that their information is being used responsibly. Our aim was to develop a generalisable model for governance of research emanating from health data repositories that will invoke the trust of the patients and the healthcare professionals whose data are being accessed for health research. We developed our governance principles and processes through literature review and iterative consultation with key actors in the research network including: a data governance working group, the lead investigators and patient advisors. We then recruited persons to participate in the governing and advisory bodies. Our governance process is informed by eight principles: (1) transparency; (2) accountability; (3) follow rule of law; (4) integrity; (5) participation and inclusiveness; (6) impartiality and independence; (7) effectiveness, efficiency and responsiveness and (8) reflexivity and continuous quality improvement. We describe the rationale for these principles, as well as their connections to the subsequent policies and procedures we developed. We then describe the function of the Research Governing Committee, the majority of whom are either persons living with diabetes or physicians whose data are being used, and the patient and data provider advisory groups with whom they consult and communicate. In conclusion, we have developed a values-based information governance framework and process for Diabetes Action Canada that adds value over-and-above existing scientific and ethics review processes by adding a strong patient perspective and contextual integrity. This model is adaptable to other secure data repositories.


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

With the advancement of storage, retrieval, and network technologies, the amount of information available to each organization is literally exploding. Although data are widely recognized as an organizational asset, they often become a liability when the cost to acquire and manage them exceeds the value derived from them. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful, actionable knowledge from it. To explore and analyze large data repositories and discover useful actionable knowledge from them, modern organizations use a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, the discovery of hidden patterns has statistical meaning and may often disclose sensitive information. As a result, privacy has become one of the prime concerns of the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaching data privacy happens more often than it does in centralized environments.
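The "combining local models" step can be made concrete with a toy sketch of distributed frequent-itemset counting, a common distributed-mining building block: each site computes local support counts and a coordinator sums them into a global model. The transactions and threshold below are invented for illustration; sharing exact per-site counts, as done here, is precisely what can leak site-level information, which privacy-preserving variants avoid by perturbing or securely aggregating the counts.

```python
from collections import Counter

def local_supports(transactions):
    """Per-site model: how many local transactions contain each item."""
    counts = Counter()
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    return counts

def combine(local_counts, total_transactions, min_support=0.5):
    """Coordinator: sum local models, keep globally frequent items."""
    global_counts = Counter()
    for c in local_counts:
        global_counts.update(c)
    threshold = min_support * total_transactions
    return {item for item, n in global_counts.items() if n >= threshold}

site_a = [["bread", "milk"], ["bread"], ["milk"]]
site_b = [["bread", "eggs"], ["bread", "milk"]]
frequent = combine([local_supports(site_a), local_supports(site_b)],
                   total_transactions=len(site_a) + len(site_b))
print(sorted(frequent))
```

Here "bread" (4 of 5 transactions) and "milk" (3 of 5) clear the 50% support threshold while "eggs" does not, but the coordinator has also learned each site's exact counts, illustrating the privacy exposure the passage describes.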

