Usage Patterns of Open Genomic Data

2013 ◽  
Vol 74 (2) ◽  
pp. 195-207 ◽  
Author(s):  
Jingfeng Xia ◽  
Ying Liu

This paper uses the Gene Expression Omnibus (GEO), a data repository in the biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree to which the value of data reuse is recognized and to understand how e-science has affected large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data to support their independent studies, it finds that free data can support a wealth of high-quality investigations, that the rate of open data use has kept growing over the years, and that scholars in different countries show different rates of compliance with data-sharing policies.

2018 ◽  
Vol 42 (1) ◽  
pp. 124-142 ◽  
Author(s):  
Youngseek Kim ◽  
Seungahn Nah

Purpose The purpose of this paper is to examine how data reuse experience, attitudinal beliefs, social norms, and resource factors influence internet researchers to share data with other researchers outside their teams. Design/methodology/approach An online survey was conducted to examine the extent to which data reuse experience, attitudinal beliefs, social norms, and resource factors predicted internet researchers’ data sharing intentions and behaviors. The theorized model was tested using a structural equation modeling technique to analyze a total of 201 survey responses from the Association of Internet Researchers mailing list. Findings Results show that data reuse experience significantly influenced participants’ perception of benefit from data sharing and participants’ norm of data sharing. Belief structures regarding data sharing, including perceived career benefit and risk, and perceived effort, had significant associations with attitude toward data sharing, leading internet researchers to have greater data sharing intentions and behavior. The results also reveal that researchers’ norms for data sharing had a direct effect on data sharing intention. Furthermore, the results indicate that, while the perceived availability of data repository did not yield a positive impact on data sharing intention, it has a significant, direct, positive impact on researchers’ data sharing behaviors. Research limitations/implications This study validated its novel theorized model based on the theory of planned behavior (TPB). The study showed a holistic picture of how different data sharing factors, including data reuse experience, attitudinal beliefs, social norms, and data repositories, influence internet researchers’ data sharing intentions and behaviors. Practical implications Data reuse experience, attitude toward and norm of data sharing, and the availability of data repository had either direct or indirect influence on internet researchers’ data sharing behaviors. 
Thus, professional associations, funding agencies, and academic institutions alike should promote academic cultures that value data sharing in order to create a virtuous cycle of reciprocity and encourage researchers to have positive attitudes toward/norms of data sharing; these cultures should be strengthened by the strong support of data repositories. Originality/value In line with prior scholarship concerning scientific data sharing, this study of internet researchers offers a map of scientific data sharing intentions and behaviors by examining the impacts of data reuse experience, attitudinal beliefs, social norms, and data repositories together.


Metabolomics ◽  
2019 ◽  
Vol 15 (10) ◽  
Author(s):  
Kevin M. Mendez ◽  
Leighton Pritchard ◽  
Stacey N. Reinke ◽  
David I. Broadhurst

Abstract Background A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. Aim of Review To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.


2019 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Angela P. Murillo

Purpose The purpose of this study is to examine the information needs of earth and environmental scientists regarding how they determine data reusability and relevance. Additionally, this study provides strategies for the development of data collections and recommendations for data management and curation for information professionals working alongside researchers. Design/methodology/approach This study uses a multi-phase mixed-method approach. The test environment is the DataONE data repository. Phase 1 includes a qualitative and quantitative content analysis of deposited data. Phase 2 consists of a quasi-experiment think-aloud study. This paper reports mainly on Phase 2. Findings This study identifies earth and environmental scientists’ information needs to determine data reusability. The findings include a need for information regarding research methods, instruments and data descriptions when determining data reusability, as well as a restructuring of data abstracts. Additional findings include reorganizing of the data record layout and data citation information. Research limitations/implications While this study was limited to earth and environmental science data, the findings provide feedback for scientists in other disciplines, as earth and environmental science is a highly interdisciplinary scientific domain that pulls from many disciplines, including biology, ecology and geology, and additionally there has been a significant increase in interdisciplinary research in many scientific fields. Practical implications The practical implications include concrete feedback to data librarians, data curators and repository managers, as well as other information professionals as to the information needs of scientists reusing data. The suggestions could be implemented to improve consultative practices when working alongside scientists regarding data deposition and data creation. 
These suggestions could improve policies for data repositories through direct feedback from scientists. These suggestions could be implemented to improve how data repositories are created and what should be considered mandatory information and secondary information to improve the reusability of data. Social implications By examining the information needs of earth and environmental scientists reusing data, this study provides feedback that could change current practices in data deposition, which ultimately could improve the potentiality of data reuse. Originality/value While there has been research conducted on data sharing and reuse, this study provides more detailed granularity regarding what information is needed to determine reusability. This study sets itself apart by not focusing on social motivators and demotivators, but by focusing on information provided in a data record.


2018 ◽  
Author(s):  
Kristian Peters ◽  
James Bradbury ◽  
Sven Bergmann ◽  
Marco Capuccini ◽  
Marta Cascante ◽  
...  

Abstract Background Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism’s metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent – and sometimes incompatible – analysis methods that are difficult to connect into a useful and complete data analysis solution. Findings The PhenoMeNal (Phenome and Metabolome aNalysis) e-infrastructure provides a complete, workflow-oriented, interoperable metabolomics data analysis solution for a modern infrastructure-as-a-service (IaaS) cloud platform. PhenoMeNal seamlessly integrates a wide array of existing open source tools which are tested and packaged as Docker containers through the project’s continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm. Conclusions PhenoMeNal constitutes a keystone solution in cloud infrastructures available for metabolomics. It provides scientists with a ready-to-use, workflow-driven, reproducible and shareable data analysis platform harmonizing the software installation and configuration through user-friendly web interfaces. The deployed cloud environments can be dynamically scaled to enable large-scale analyses which are interfaced through standard data formats, versioned, and have been tested for reproducibility and interoperability. The flexible implementation of PhenoMeNal allows easy adaptation of the infrastructure to other application areas and ‘omics research domains.


Author(s):  
Michele Cocca ◽  
Douglas Teixeira ◽  
Luca Vassio ◽  
Marco Mellia ◽  
Jussara M. Almeida ◽  
...  

Free Floating Car Sharing (FFCS) services are a flexible alternative to car ownership. These transportation services show highly dynamic usage both over different hours of the day and across different city areas. In this work, we study the problem of predicting FFCS demand patterns, a problem of great importance to the adequate provisioning of the service. We tackle the prediction of the demand both (i) over time and (ii) over space. We rely on months of real FFCS rides in Vancouver, which constitute our ground truth. We enrich this data with detailed socio-demographic information obtained from large open-data repositories to predict usage patterns. Our aim is to offer a thorough comparison of several machine learning algorithms in terms of accuracy and ease of training, and to assess the effectiveness of current state-of-the-art approaches to address the prediction problem. Our results show that it is possible to predict the future usage with relative errors down to 10%, and the spatial prediction can be estimated with relative errors of about 40%. Our study also uncovers the socio-demographic features that most strongly correlate with FFCS usage, providing interesting insights for providers opening services in new regions.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Youngseek Kim

Purpose This research investigates how the availabilities of both metadata standards and data repositories influence researchers' data reuse intentions either directly or indirectly as mediated by the norms of data reuse and their attitudes toward data reuse. Design/methodology/approach The theory of planned behavior (TPB) was employed to develop the research model of researchers' data reuse intentions, focusing on the roles of metadata standards, data repositories and norms of data reuse. The proposed research model was evaluated using the structural equation modeling (SEM) method based on the survey responses received from 811 STEM (science, technology, engineering and mathematics) researchers in the United States. Findings This research found that the availabilities of both metadata standards and data repositories significantly affect STEM researchers' norm of data reuse, which influences their data reuse intentions as mediated by their attitudes toward data reuse. This research also found that both the availability of data repositories and the norm of data reuse have a direct influence on data reuse intentions and that norm of data reuse significantly increases the effect of attitude toward data reuse on data reuse intention as a moderator. Research limitations/implications The modified model of TPB provides a new perspective in apprehending the roles of resource facilitating conditions such as the availabilities of metadata standards and data repositories in an individual's attitude, norm and their behavioral intention to conduct a certain behavior. Practical implications This study suggests that scientific communities need to develop more supportive metadata standards and data repositories by considering their roles in enhancing the community norm of data reuse, which eventually leads to data reuse behaviors. Originality/value This study sheds light on the mechanism of metadata standard and data repository in researchers' data reuse behaviors through their community norm of data reuse; this can help scientific communities and academic institutions to better support researchers in their data sharing and reuse behaviors. Peer review The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-09-2020-0431


Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 72 ◽  
Author(s):  
Michele Cocca ◽  
Douglas Teixeira ◽  
Luca Vassio ◽  
Marco Mellia ◽  
Jussara M. Almeida ◽  
...  

Free-Floating Car-Sharing (FFCS) services are a flexible alternative to car ownership. These transportation services show highly dynamic usage both over different hours of the day, and across different city areas. In this work, we study the problem of predicting FFCS demand patterns—a problem of great importance to the adequate provisioning of the service. We tackle both the prediction of the demand (i) over time and (ii) over space. We rely on months of real FFCS rides in Vancouver, which constitute our ground truth. We enrich this data with detailed socio-demographic information obtained from large open-data repositories to predict usage patterns. Our aim is to offer a thorough comparison of several machine-learning algorithms in terms of accuracy and ease of training, and to assess the effectiveness of current state-of-the-art approaches to address the prediction problem. Our results show that it is possible to predict the future usage with relative errors down to 10%, while the spatial prediction can be estimated with relative errors of about 40%. Our study also uncovers the socio-demographic features that most strongly correlate with FFCS usage, providing interesting insights for providers interested in offering services in new regions.
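The abstract reports relative errors of about 10% and 40% without stating how the error is computed. As a rough illustration only, the sketch below assumes a mean absolute relative error taken over time bins. The function name and the ride counts are hypothetical and are not taken from the study.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, skipping bins where actual is zero.

    Assumption: this is one common definition of relative error; the paper
    may use a different one.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return sum(abs(p - a) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical hourly ride counts, invented for illustration.
actual_rides = [100, 80, 120, 50]
predicted_rides = [90, 88, 108, 55]

error = mean_relative_error(actual_rides, predicted_rides)
print(f"mean relative error: {error:.1%}")
```

Every bin in this toy input is off by one tenth of its true value, so the mean comes out at 10%, the same order as the temporal figure quoted in the abstract.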


2020 ◽  
Author(s):  
Geoff Boeing

Cities worldwide exhibit a variety of street network patterns and configurations that shape human mobility, equity, health, and livelihoods. This study models and analyzes the street networks of each urban area in the world, using boundaries derived from the Global Human Settlement Layer. Street network data are acquired and modeled from OpenStreetMap with the open-source OSMnx software. In total, this study models over 160 million OpenStreetMap street network nodes and over 320 million edges across 8,914 urban areas in 178 countries, and attaches elevation and grade data. This article presents the study's reproducible computational workflow, introduces two new open data repositories of ready-to-use global street network models and calculated indicators, and discusses summary findings on street network form worldwide. It makes four contributions. First, it reports the methodological advances of this open-source workflow. Second, it produces an open data repository containing street network models for each urban area. Third, it analyzes these models to produce an open data repository containing street network form indicators for each urban area. No such global urban street network indicator dataset has previously existed. Fourth, it presents a summary analysis of urban street network form, reporting the first such worldwide results in the literature.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nushrat Khan ◽  
Mike Thelwall ◽  
Kayvan Kousha

Purpose The purpose of this study is to explore current practices, challenges and technological needs of different data repositories. Design/methodology/approach An online survey was designed for data repository managers, and contact information from re3data, a data repository registry, was collected to disseminate the survey. Findings In total, 189 responses were received, including 47% discipline-specific and 34% institutional data repositories. A total of 71% of the repositories reporting their software used bespoke technical frameworks, with DSpace, EPrints and Dataverse being commonly used by institutional repositories. Of repository managers, 32% reported tracking secondary data reuse while 50% would like to. Among data reuse metrics, citation counts were considered extremely important by the majority, followed by links to the data from other websites and download counts. Despite their perceived usefulness, repository managers struggle to track dataset citations. Most repository managers support dataset and metadata quality checks via librarians, subject specialists or information professionals. A lack of engagement from users and a lack of human resources are the top two challenges, and outreach is the most common motivator mentioned by repositories across all groups. Ensuring findable, accessible, interoperable and reusable (FAIR) data (49%), providing user support for research (36%) and developing best practices (29%) are the top three priorities for repository managers. 
The main recommendations for future repository systems are as follows: integration and interoperability between data and systems (30%), better research data management (RDM) tools (19%), tools that allow computation without downloading datasets (16%) and automated systems (16%). Originality/value This study identifies the current challenges and needs for improving data repository functionalities and user experiences. Peer review The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-04-2021-0204


2017 ◽  
Vol 12 (2) ◽  
pp. 177-195 ◽  
Author(s):  
Alastair Dunning ◽  
Madeleine De Smaele ◽  
Jasmin Böhmer

This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. Simultaneously, it will analyse how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017, and will be underpinned with feedback from the repositories. The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR: Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data. The relevant guidelines (1) are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will have exposure to the guidelines, understanding their viability and suggesting where there may be room for modification and adjustment is of vital importance. This practice paper is connected to a dataset (Dunning et al., 2017) containing the original overview of the sample group statistics and graphs, in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories have been examined, to score each data repository against the FAIR principles and facets. The traffic-light rating system enables colour-coding according to compliance and vagueness. The statistical analysis provides overall, categorised, principle-focused and facet-focused results. The analysis includes the statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, the subject-specific or repository-specific differences, and subsequently what repositories can do to improve their information architecture. (1) H2020 Guidelines on FAIR Data Management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
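The traffic-light scoring described in this abstract can be pictured as a simple tally per FAIR principle. The sketch below is an illustration under stated assumptions, not the authors' spreadsheet: the repository names and ratings are invented, the facet identifiers follow the published FAIR labels (F1, A1.1, R1.1 and so on), and green is assumed to mean that a repository complies with a facet.

```python
from collections import Counter

# Hypothetical ratings: (repository, FAIR facet, traffic-light colour).
# Assumed colour meaning: green = compliant, amber = unclear, red = not compliant.
ratings = [
    ("repo_a", "F1", "green"),
    ("repo_a", "A1.1", "amber"),
    ("repo_a", "R1.1", "red"),
    ("repo_b", "F1", "green"),
    ("repo_b", "A1.1", "green"),
    ("repo_b", "R1.1", "amber"),
]


def compliance_by_principle(rows):
    """Return the share of green ratings for each principle letter (F, A, I, R)."""
    totals = Counter()
    greens = Counter()
    for _repository, facet, colour in rows:
        principle = facet[0]  # the first character of a facet id is its principle
        totals[principle] += 1
        if colour == "green":
            greens[principle] += 1
    return {principle: greens[principle] / totals[principle] for principle in totals}


for principle, share in compliance_by_principle(ratings).items():
    print(f"{principle}: {share:.0%} green")
```

Grouping by the first letter of the facet identifier gives a principle-level view; grouping by the full identifier instead would give a facet-level view.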

