From Persistent Identifiers to Digital Objects to Make Data Science More Efficient

2019 ◽  
Vol 1 (1) ◽  
pp. 6-21 ◽  
Author(s):  
Peter Wittenburg

Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but because our data practices are inefficient when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since surveys indicate that about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with billions of smart devices producing continuous streams of data: our methods do not scale. Experts worldwide are therefore looking for strategies and methods with potential for the future. The first steps have been taken, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different kinds of metadata can be accessed, we are close to defining Digital Objects (DOs), which could indeed solve some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories using different data models and data organizations. We could also associate with each DO a type and a set of operations permitted on its content, which would open the way to automatic processing, identified as the major step towards scalability in data science and data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
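To make the idea concrete, here is a minimal sketch, in Python, of a Digital Object record that bundles a PID with pointers to bit sequences and metadata, plus a type-driven registry of permitted operations. The field names, the Handle-style identifier and the operations are illustrative assumptions, not the data model used by the actual testbeds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified Digital Object record: a PID resolving to
# bit-sequence locations, metadata pointers and a registered type.
@dataclass
class DigitalObject:
    pid: str                      # persistent identifier (Handle-style, illustrative)
    bit_locations: List[str]      # where the bit sequences can be fetched
    metadata_locations: List[str] # where descriptive/structural metadata live
    do_type: str                  # registered type, drives automatic processing

# A registry mapping DO types to permitted operations is what makes
# processing automatable: a client resolves the PID, reads the type,
# and dispatches without human intervention.
OPERATIONS: Dict[str, Dict[str, Callable[[DigitalObject], str]]] = {
    "time-series": {
        "plot": lambda do: f"plotting {do.pid} from {do.bit_locations[0]}",
        "resample": lambda do: f"resampling {do.pid}",
    },
}

def invoke(do: DigitalObject, operation: str) -> str:
    """Dispatch an operation allowed for this DO's type."""
    return OPERATIONS[do.do_type][operation](do)

if __name__ == "__main__":
    do = DigitalObject(
        pid="21.T11148/abc123",  # illustrative Handle-style PID, not a real registration
        bit_locations=["https://repo.example.org/objects/abc123"],
        metadata_locations=["https://repo.example.org/metadata/abc123"],
        do_type="time-series",
    )
    print(invoke(do, "plot"))
```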

2021 ◽  
Vol 8 (1) ◽  
pp. 205395172199603
Author(s):  
Nathaniel Tkacz ◽  
Mário Henrique da Mata Martins ◽  
João Porto de Albuquerque ◽  
Flávio Horita ◽  
Giovanni Dolif Neto

This article adapts the ethnographic medium of the diary to develop a method for studying data and related data practices. The article focuses on the creation of one data diary, developed iteratively over three years in the context of a national centre for monitoring disasters and natural hazards in Brazil (Cemaden). We describe four points of focus involved in the creation of a data diary – spaces, interfaces, types and situations – before reflecting on the value of this method. We suggest that data diaries (1) are able to capture the informal dimension of data-intensive organisations; (2) enable empirical analysis of the specific ways that data intervene in the unfolding of situations; and (3), as documents, can foster interdisciplinary and inter-expert dialogue by bridging different ways of knowing data.


2019 ◽  
Author(s):  
Mia Partlow ◽  
Karen Ciccone ◽  
Margaret Peak

Presentation given at TRLN Annual Meeting, Durham, North Carolina, July 1, 2019. The Hunt Library Dataspace was launched in August 2018 to provide students with access to the tools and support they need to develop critical data skills and perform data-intensive tasks. It is outfitted with specialized computing hardware and software and staffed by graduate student Data Science Consultants who provide drop-in support for programming, data analysis, statistical analysis, visualization, and other data-related topics. Prior to launching the Dataspace, the Libraries’ Director of Planning and Research worked with the Data & Visualization Services department to develop a plan for assessing the new Dataspace services. The process began with identifying relevant goals based on NC State University and the NC State University Libraries’ strategic priorities. Next we identified measures that would assess our success in relation to those goals. This talk describes the assessment planning process, the measures and methods employed, outcomes, and how this information will be used to improve our services and inform new service development.


2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem, serving the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is to develop and maintain a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy-to-use services and novel scientific approaches based on the re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia and aims to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions, including data for their experimental and theoretical characterisation. This overarching goal is pursued through a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata, as well as open data standards, in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry in order to support the FAIR principles for research data; where standards are lacking, develop them (an illustrative metadata sketch follows this list).
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research, and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness of and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula, and offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
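As a rough illustration of the machine-readable metadata that Key Objective 2 calls for, the sketch below shows a minimal record for a characterisation dataset. The field names, the placeholder identifier and the "minimum information standard" label are invented for illustration and do not represent an official NFDI4Chem schema.

```python
import json

# A minimal sketch of a machine-readable metadata record; field names and the
# standard identifier are assumptions, not an official NFDI4Chem schema.
record = {
    "identifier": "doi:10.xxxx/placeholder",   # placeholder PID, not a real DOI
    "title": "NMR characterisation of methane adsorption samples",
    "molecule": {
        "name": "methane",
        "inchi": "InChI=1S/CH4/h1H4",          # open, machine-readable identifier
    },
    "dataFiles": [
        {"path": "spectra/sample-01.jdx", "format": "JCAMP-DX"},
    ],
    "license": "CC-BY-4.0",
    "conformsTo": "hypothetical-minimum-information-standard-v0.1",
}

# FAIR in practice: the record is findable via its identifier, machine-readable
# as JSON, and states licence and file format so the data can be re-used.
print(json.dumps(record, indent=2))
```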


2021 ◽  
Vol 11 (4) ◽  
pp. 80-99
Author(s):  
Syed Imran Jami ◽  
Siraj Munir

Recent trends in data-intensive experiments require extensive computing and storage resources that are now handled using cloud resources. Industry experts and researchers use cloud-based services and resources to obtain analytics of their data while avoiding organizational issues such as the power overhead on local machines and the cost of maintaining and running infrastructure. This article provides a detailed review of selected metrics for cloud computing according to the requirements of data science and big data: (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is the inclusion of these metrics collectively, which is the first attempt towards evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs further research on its association with data-intensive experiments, with emphasis on resource scheduling.
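To make two of the reviewed metrics concrete, the toy scheduler below assigns each job to the currently least-loaded node and reports the makespan (a common job-scheduling metric) alongside per-node loads (a crude load-balancing view). It is a didactic sketch under simple assumptions, not an algorithm drawn from any of the surveyed systems.

```python
import heapq
from typing import List, Tuple

def schedule(job_costs: List[float], n_nodes: int) -> Tuple[float, List[float]]:
    """Greedy scheduling: each job goes to the currently least-loaded node."""
    heap = [(0.0, node) for node in range(n_nodes)]   # (current load, node id)
    loads = [0.0] * n_nodes
    for cost in job_costs:
        load, node = heapq.heappop(heap)              # pick least-loaded node
        loads[node] = load + cost
        heapq.heappush(heap, (loads[node], node))
    makespan = max(loads)                             # finishing time of the busiest node
    return makespan, loads

if __name__ == "__main__":
    makespan, loads = schedule([5, 3, 8, 2, 7, 4], n_nodes=3)
    print(f"makespan={makespan}, per-node loads={loads}")
```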


2013 ◽  
pp. 74-86
Author(s):  
David Giaretta

Preserving digitally encoded information over the long term, following the OAIS Reference Model, requires that the information remain accessible, understandable and usable by a specified Designated Community. These are significant challenges for repositories. It will be argued that the infrastructure needed to support this preservation must be seen in the context of the broader science data infrastructure which international and national funders seek to put in place. Moreover, aspects of the preservation components of this infrastructure must themselves be preservable, resulting in a recursive system which must also be highly adaptable, loosely coupled and asynchronous. Even more difficult is judging whether any proposal is actually likely to be effective. From the earliest discussions of concerns about the preservability of digital objects, there have been calls for some way of judging the quality of digital repositories. In this chapter, several interrelated efforts which contribute to solutions for these issues will be outlined. Evidence about the challenges which must be overcome, and about the consistency of demands across nations, disciplines and organisations, will be presented, based on extensive surveys carried out by the PARSE.Insight project (http://www.parse-insight.eu). The key points about the revision of the OAIS Reference Model which is underway will be provided; OAIS supplies many of the key concepts which underpin the efforts to judge solutions. In the past few years the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) document has been produced, as well as a number of related checklists. These efforts provide the background for the international effort (the RAC Working Group, http://wiki.digitalrepositoryauditandcertification.org) to produce a full ISO standard on which an accreditation and certification process can be built. If successful, this standard and the associated processes will allow funders to obtain an independent evaluation of the effectiveness of the archives they support, and give data producers a basis for deciding which repository to entrust with their valuable data. It could shape the digital preservation market. The CASPAR project (http://www.casparpreserves.eu) is an EU part-funded project, with a total spend of 16 million euros, which is trying to faithfully implement almost all aspects of the OAIS Reference Model, in particular the Information Model. The latter involves tools for capturing all types of Representation Information (Structure, Semantics and all Other types), and tools for defining the Designated Community. This chapter will describe implementations of tools and infrastructure components to support repositories in their task of long-term preservation of digital resources, including the capture and preservation of digital rights management and evidence of authenticity associated with digital objects. In order to justify their existence, most repositories must also support contemporaneous use of contemporary as well as “historical” resources; the authors will show how the same techniques can support both, and hence link to the fuller science data infrastructure.
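The OAIS concepts referred to above can be sketched in code. The example below models, in a deliberately simplified way, an Archival Information Package holding a Content Data Object, its Representation Information (Structure, Semantics, Other) and fixity evidence supporting authenticity; the class and field names are illustrative assumptions, not the normative OAIS or CASPAR data model.

```python
from dataclasses import dataclass, field
from typing import List
import hashlib

@dataclass
class RepresentationInformation:
    """What a Designated Community needs in order to understand the data object."""
    structure: str        # e.g. "CSV: date, sea-surface temperature"
    semantics: str        # e.g. "temperature in kelvin, daily means"
    other: List[str] = field(default_factory=list)   # software, rights, further RI

@dataclass
class ArchivalInformationPackage:
    content: bytes
    representation_info: RepresentationInformation
    designated_community: str
    fixity_sha256: str = ""

    def record_fixity(self) -> None:
        """Store a checksum as one piece of the evidence of authenticity."""
        self.fixity_sha256 = hashlib.sha256(self.content).hexdigest()

    def verify_fixity(self) -> bool:
        return self.fixity_sha256 == hashlib.sha256(self.content).hexdigest()

if __name__ == "__main__":
    aip = ArchivalInformationPackage(
        content=b"1998-01-01,287.4\n1998-01-02,287.9\n",
        representation_info=RepresentationInformation(
            structure="CSV: date, sea-surface temperature",
            semantics="temperature in kelvin, daily means",
        ),
        designated_community="climate researchers with basic CSV tooling",
    )
    aip.record_fixity()
    print("authentic:", aip.verify_fixity())
```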


Author(s):  
Meghna Babubhai Patel ◽  
Jagruti N. Patel ◽  
Upasana M. Bhilota

An artificial neural network (ANN) can work the way the human brain works and can learn the way we learn. A neural network is not a fixed algorithm; it is a network with weights on its connections, and adjusting those weights is how it learns. It is taught through repeated trials. A neural network can operate and improve its performance after being “taught”, but it must first undergo a learning process to acquire information and become familiar with it. Nowadays, smart devices dominate the technological world, and no one can deny their great value and contributions to mankind. There has been a dramatic rise in platforms, tools, and applications based on machine learning and artificial intelligence. These technologies have impacted not only the software and internet industries but also other verticals such as healthcare, legal, manufacturing, automotive, and agriculture. The chapter shows the importance of the latest technologies used in ANNs and future trends in ANNs.
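As a minimal illustration of "adjusting the weights through trials", the sketch below trains a single perceptron to compute logical AND. It is a toy example of the weight-update idea only, not any of the architectures the chapter surveys.

```python
def step(x: float) -> int:
    """Threshold activation: fire (1) if the weighted sum is non-negative."""
    return 1 if x >= 0 else 0

def train_perceptron(samples, epochs: int = 20, lr: float = 0.1):
    """Learn by nudging weights toward the target after each trial."""
    w = [0.0, 0.0]   # one weight per input
    b = 0.0          # bias
    for _ in range(epochs):                      # repeated "trials"
        for (x1, x2), target in samples:
            y = step(w[0] * x1 + w[1] * x2 + b)  # current answer
            error = target - y                   # how wrong was it?
            w[0] += lr * error * x1              # adjust the weights...
            w[1] += lr * error * x2
            b += lr * error                      # ...and the bias
    return w, b

if __name__ == "__main__":
    and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b = train_perceptron(and_samples)
    for (x1, x2), target in and_samples:
        print((x1, x2), "->", step(w[0] * x1 + w[1] * x2 + b), "expected", target)
```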


2019 ◽  
Vol 11 (1) ◽  
pp. 70-93
Author(s):  
Kody MOODLEY ◽  
Pedro V HERNANDEZ-SERRANO ◽  
Amrapali J ZAVERI ◽  
Marcel GH SCHAPER ◽  
Michel DUMONTIER ◽  
...  

This contribution explores the application of data science and artificial intelligence to legal research, and more specifically an element that has not received much attention: the research infrastructure required to make such analysis possible. In recent years, EU law has become increasingly digitised and published in online databases such as EUR-Lex and HUDOC. However, the main barrier inhibiting legal scholars from analysing this information is a lack of training in data analytics. Legal analytics software can mitigate this problem to an extent, but current systems are dominated by the commercial sector. In addition, most systems focus on searching legal information but do not facilitate advanced visualisation and analytics. Finally, free-to-use systems that do provide such features are either too complex for general legal scholars or not rich enough in their analytics tools. In this paper, we make the case for building a software platform that addresses these limitations. Such software can provide a powerful platform for visualising and exploring connections and correlations in EU case law, helping to unravel the “DNA” behind EU legal systems. It will also serve to train researchers and students in schools and universities to analyse legal information using state-of-the-art methods in data science, without requiring technical proficiency in the underlying methods. We also suggest that the software should be powered by a data infrastructure and management paradigm following the seminal FAIR (Findable, Accessible, Interoperable and Reusable) principles.
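As a hint of the kind of analysis such a platform could expose without requiring scholars to program, the toy sketch below builds a small citation network and ranks the most-cited judgments. The case identifiers and citation links are invented placeholders; real records would be ECLI identifiers harvested from EUR-Lex or HUDOC.

```python
from collections import Counter
from typing import Dict, List

# Toy data: which judgment cites which. "case_A" etc. are placeholders,
# not real cases.
citations: Dict[str, List[str]] = {
    "case_B": ["case_A"],
    "case_C": ["case_A", "case_B"],
    "case_D": ["case_B", "case_C"],
}

# Count incoming citations: a crude "influence" ranking that a visual
# front end could render as a network diagram for non-technical users.
in_degree = Counter(cited for cites in citations.values() for cited in cites)

for case, count in in_degree.most_common():
    print(f"{case} is cited {count} time(s) in this toy corpus")
```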


2018 ◽  
Vol 5 (1) ◽  
pp. 205395171775296 ◽  
Author(s):  
Sarah Wadmann ◽  
Klaus Hoeyer

For years, attempts at ensuring the social sustainability of digital solutions have focused on making sure that they are perceived as helpful and easy to use. A smooth and seamless work experience has been the goal to strive for. Based on document analysis and interviews with 15 stakeholders, we trace the setting up of a data infrastructure in Danish General Practice that had achieved just this goal – only to end in a scandal and subsequent loss of public support. The ease of data access made it possible for data to be extracted, exchanged and used by new actors and for new purposes – without those producing the data fully realizing the expansion of the infrastructure. We suggest that the case has wider relevance for a still more data-intensive healthcare sector and a growing data economy: when those who produce the data are not made aware of new uses of data, it becomes more difficult to resolve potential conflicts along the way. In the Danish case, conflicting views on legitimate data use led to the collapse of the infrastructure. Therefore, while seamlessness may be a solution to the old problem of a poor fit between user and technology, this celebrated virtue may also involve new problems relating to social instability. As digital solutions tend to be integrated still more seamlessly in still more of our activities, we need to develop political mechanisms to define and protect the rights and obligations of both data suppliers and users in order to ensure the long-term sustainability of digital infrastructures.

