A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository

2018 ◽  
Vol 12 (2) ◽  
Author(s):  
Amy M Pienta ◽  
Dharma Akmon ◽  
Justin Noble ◽  
Lynette Hoelter ◽  
Susan Jekielek

Social scientists are producing an ever-expanding volume of data, raising questions about how to appraise and select content given finite resources for processing data for reuse. We analyze users’ search activity in an established social science data repository to better understand demand for data and to guide collection development more effectively. By applying a data-driven approach, we aim to ensure that curation resources are applied to making the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences, comprising more than 500,000 searches per year in 2014 and 2015, to better understand trends in user search behavior. Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository’s holdings and leveraged this analysis to inform our collection and curation practices and policies. The evaluative technique we propose in this paper will serve as a baseline for future studies of trends in user demand over time at the domain data repository being studied, with broader implications for other data repositories.
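The abstract does not spell out how the search-to-study ratio is computed; a minimal sketch of the idea, with hypothetical topic labels and counts (all names and thresholds here are illustrative assumptions, not the authors' actual method), might look like this:

```python
from collections import Counter

def search_to_study_ratios(search_terms, holdings):
    """For each topic, compare user search volume against the number
    of studies the repository actually holds on that topic."""
    searches = Counter(search_terms)
    return {
        topic: searches[topic] / max(holdings.get(topic, 0), 1)
        for topic in searches
    }

def collection_gaps(ratios, threshold=10.0):
    """Topics searched far more often than they are represented in the
    holdings are candidate acquisition priorities (threshold is arbitrary)."""
    return sorted(topic for topic, ratio in ratios.items() if ratio >= threshold)

# Hypothetical search log and holdings counts.
searches = ["opioid use"] * 120 + ["voting behavior"] * 300 + ["housing"] * 40
holdings = {"opioid use": 3, "voting behavior": 150, "housing": 20}

ratios = search_to_study_ratios(searches, holdings)
print(collection_gaps(ratios))  # → ['opioid use']
```

A high ratio (here, 120 searches against only 3 held studies) flags a topic where user demand outstrips the collection, which is the kind of gap the paper uses to steer acquisition and curation effort.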

2020 ◽  
Vol 4 (2) ◽  
pp. 461-481
Author(s):  
Charles Chang

Abstract This article presents a data-driven approach to the study of the social and political statuses of urban communities in modern Kunming. Such information is lacking in government maps and documents. Drawing on data from a wide variety of sources, many of them unconventional, I subject them to critical evaluation and computational analysis to extract information that can be used to produce a land use map of sufficient detail and accuracy to allow scholars to address, and even answer, questions of a socio-political, economic and, indeed, humanistic nature. My method can also be applied to other Chinese cities and to cities elsewhere that lack accurate information.


2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Kevin Louis Bardosh ◽  
Daniel H. de Vries ◽  
Sharon Abramowitz ◽  
Adama Thorlie ◽  
Lianne Cremers ◽  
...  

Abstract

Background: The importance of integrating the social sciences in epidemic preparedness and response has become a common feature of infectious disease policy and practice debates. However, to date, this integration remains inadequate, fragmented and under-funded, with limited reach and small initial investments. Based on data collected prior to the COVID-19 pandemic, in this paper we analysed the variety of knowledge, infrastructure and funding gaps that hinder the full integration of the social sciences in epidemics and present a strategic framework for addressing them.

Methods: Senior social scientists with expertise in public health emergencies facilitated expert deliberations and conducted 75 key informant interviews, a consultation with 20 expert social scientists from Africa, Asia and Europe, 2 focus groups, and a literature review of 128 identified high-priority peer-reviewed articles. We also analysed 56 interviews from the Ebola 100 project, collected just after the West African Ebola epidemic. Gaps and recommendations were inductively classified according to various themes during two group prioritization exercises. The project was conducted between February and May 2019, and its findings were used to inform strategic prioritization of global investments in social science capacities for health emergencies.

Findings: Our analysis consolidated 12 knowledge and infrastructure gaps and 38 recommendations from an initial list of 600 gaps and 220 recommendations. In developing our framework, we clustered these into three areas:

1) Recommendations to improve core social science response capacities, including investments in human resources within response agencies; the creation of social science data analysis capacities at field and global levels; mechanisms for operationalizing knowledge; and a set of rapid-deployment infrastructures.

2) Recommendations to strengthen applied and basic social sciences, including the need to better define the social science agenda and core competencies; support innovative interdisciplinary science; make concerted investments in developing field-ready tools and building the evidence base; and develop codes of conduct.

3) Recommendations for a supportive social science ecosystem, including essential foundational investments in institutional development; training and capacity building; awareness-raising activities with allied disciplines; and, lastly, support for a community of practice.

Interpretation: Comprehensively integrating social science into the epidemic preparedness and response architecture demands multifaceted investments on par with allied disciplines such as epidemiology and virology. Building core capacities and competencies should occur at multiple levels, grounded in country-led capacity building. Social science should not be a parallel system, nor should it be “siloed” into risk communication and community engagement. Rather, it should be integrated across existing systems and networks, deploying interdisciplinary knowledge “transversally” across all preparedness and response sectors and pillars. Future work should update this framework to account for the impact of the COVID-19 pandemic on the institutional landscape.


Author(s):  
Anthony Scime ◽  
Gregg R. Murray

Social scientists address some of the most pressing issues of society, such as health and wellness, government processes and citizen reactions, individual and collective knowledge, working conditions and socio-economic processes, and societal peace and violence. In an effort to understand these and many other consequential issues, social scientists invest substantial resources to collect large quantities of data, much of which is never fully explored. This chapter proffers the argument that privacy protection and responsible use are not the only ethical considerations related to mining social data. Given (1) the substantial resources allocated and (2) the leverage these “big data” provide on such weighty issues, this chapter suggests social scientists are ethically obligated to conduct comprehensive analyses of their data. Data mining techniques provide pertinent tools for identifying attributes in large data sets that may be useful for addressing important issues in the social sciences. By using these comprehensive analytical processes, a researcher may discover a set of attributes that is useful for making behavioral predictions, validating social science theories, and creating rules for understanding behavior in social domains. Taken together, these attributes and values often reveal previously unknown knowledge that may have important applied and theoretical consequences for a domain, social scientific or otherwise. The chapter concludes with examples of important social problems studied using various data mining methodologies, along with the ethical concerns these methods raise.


2017 ◽  
Vol 35 (4) ◽  
pp. 626-649 ◽  
Author(s):  
Wei Jeng ◽  
Daqing He ◽  
Yu Chi

Purpose: Owing to the recent surge of interest in the age of the data deluge, the importance of researching data infrastructures is increasing. The open archival information system (OAIS) model has been widely adopted as a framework for creating and maintaining digital repositories. Because OAIS is a reference model that requires customization for actual practice, this paper examines how the current practices in a data repository map to the OAIS environment and functional components.

Design/methodology/approach: The authors conducted two focus-group sessions and one individual interview with eight employees at the world’s largest social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). By examining their current actions (activities regarding their work responsibilities) and IT practices, they studied the barriers and challenges of archiving and curating qualitative data at ICPSR.

Findings: The authors observed that the OAIS model is robust and reliable in actual service processes for data curation and archiving, and that a data repository’s workflow resembles that of a digital archive or even a digital library. On the other hand, they find that the cost of preventing disclosure risk and the lack of agreed standards for text data files are the most apparent obstacles for data curation professionals handling qualitative data; the maturation of data metrics seems a promising solution to several challenges in social science data sharing.

Originality/value: The authors evaluated the gap between a research data repository’s current practices and the adoption of the OAIS model. They also identified answers to questions such as how the current technological infrastructure in a leading data repository such as ICPSR supports its daily operations, what the ideal technologies in such repositories would be, and what challenges accompany these ideal technologies. Most importantly, they helped to prioritize challenges and barriers from the data curator’s perspective and drew implications for data sharing and reuse in the social sciences.


Author(s):  
Jianxi Luo ◽  
Serhad Sarica ◽  
Kristin L. Wood

Abstract Traditionally, the ideation of design opportunities and new concepts relies on human expertise or intuition and is fraught with high uncertainty. Inexperienced or specialized designers often fail to explore ideas broadly and become fixated on specific ideas early in the design process. Recent data-driven design methods provide external design stimuli beyond one’s own knowledge, but their uses in rapid ideation are still limited. Intuitive and directed ideation techniques, such as brainstorming, mind mapping, Design-by-Analogy, SCAMPER, TRIZ and Design Heuristics, may empower designers in rapid ideation but are limited to the designer’s own knowledge base. Herein, we harness data-driven design and rapid ideation techniques to introduce a data-driven, computer-aided rapid ideation process using the cloud-based InnoGPS system. InnoGPS integrates an empirical network map of all technology domains, defined by the International Patent Classification and connected according to knowledge distances derived from patent data, with map-based functions to position technologies, explore neighborhoods, and retrieve knowledge, concepts and solutions in near or far fields for design analogies and syntheses. The functions of InnoGPS fuse design science, network science, data science and interactive visualization, making the design ideation process data-driven, theoretically grounded, visually inspiring, and rapid. We demonstrate the procedures of using InnoGPS as a data-driven rapid ideation tool by generating new rolling-toy design concepts.


2017 ◽  
Vol 25 (1) ◽  
pp. 17-24 ◽  
Author(s):  
Hossein Estiri ◽  
Kari A Stephens ◽  
Jeffrey G Klann ◽  
Shawn N Murphy

Abstract

Objective: To provide an open-source, interoperable, and scalable data quality assessment tool for evaluating and visualizing completeness and conformance in electronic health record (EHR) data repositories.

Materials and Methods: This article describes the tool’s design and architecture and gives an overview of its outputs using a sample dataset of 200,000 randomly selected patient records with an encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All the code and instructions to run the tool and interpret its results are provided in the Supplementary Appendix.

Results: DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators.

Discussion: Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions directed us to improve its scalability and interoperability and to find ways to facilitate local setup.

Conclusion: EHR data quality assessment has been hampered by ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
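The completeness test DQe-c reports can be illustrated with a small sketch: per-field, what fraction of records carries a non-missing value. The field names, records, and missing-value markers below are hypothetical, not DQe-c's actual implementation:

```python
def completeness_report(records, fields):
    """Per-field completeness: fraction of records with a non-missing
    value. Mirrors the kind of summary a tool like DQe-c tabulates."""
    total = len(records)
    report = {}
    for field in fields:
        present = sum(
            1 for record in records
            if record.get(field) not in (None, "", "NULL")
        )
        report[field] = present / total if total else 0.0
    return report

# Hypothetical patient records with some missing values.
patients = [
    {"mrn": "001", "sex": "F", "birth_date": "1980-01-02"},
    {"mrn": "002", "sex": "", "birth_date": "1975-06-30"},
    {"mrn": "003", "sex": "M", "birth_date": None},
]

report = completeness_report(patients, ["mrn", "sex", "birth_date"])
# mrn is fully populated; sex and birth_date are each 2/3 complete.
```

A report like this, rendered as tables and graphics and paired with conformance checks against the expected data model, is the shape of output the abstract describes.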


2019 ◽  
Vol 2019 ◽  
pp. 1-15 ◽  
Author(s):  
C. Pommier ◽  
C. Michotey ◽  
G. Cornut ◽  
P. Roumet ◽  
E. Duchêne ◽  
...  

GnpIS is a data repository for plant phenomics that stores whole field and greenhouse experimental data, including environmental measures. It allows long-term access to datasets following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) by using a flexible and original approach: a generic, ontology-driven data model and an innovative software architecture that decouples data integration, storage, and querying. It takes advantage of international standards including the Crop Ontology, MIAPPE, and the Breeding API. GnpIS handles data for a wide range of species and experiment types, including multiannual experimental networks of perennial plants and annual plant trials, with either raw data (direct measures) or computed traits. It also ensures integration and interoperability among phenotyping datasets and with genotyping data. This is achieved through careful curation and annotation of the key resources, conducted in close collaboration with the communities providing data. Our repository follows Open Science data publication principles by ensuring the citability of each dataset. Finally, GnpIS’s compliance with international standards enables its interoperability with other data repositories, allowing data links between phenotypes and other data types. GnpIS can therefore contribute to emerging international federations of information systems.


2016 ◽  
Vol 10 (2) ◽  
pp. 205-224 ◽  
Author(s):  
Ruth Mostern ◽  
Marieka Arksey

Historians and historical quantitative social scientists, motivated by a renewed interest in quantitative history and by sophisticated tools for digital infrastructure, are developing data repositories for global-scale and collaborative analysis. However, their archives have been slow to grow. This article is directed toward historians who are contemplating such projects. Repository development is very valuable. On the other hand, studies show that repository projects that rely upon voluntary contributions from numerous researchers seldom reach critical mass. Our surveys and our study of the Collaborative for Historical Information and Analysis Data Hoover Project confirm this assessment. We conclude that historical data repositories remain poorly aligned with present-day scholarly practices and are unlikely to realize their promise until the social life of data becomes a part of the profession. Because we believe that this is possible, we introduce four strategies, each backed by a successful project, that will help make data sharing a part of professional practice: 1) hiring ‘data hoovers’ to solicit and curate data, 2) appealing to close-knit communities and networking their domain-specific archives, 3) rightsizing crowdsourcing tasks, and 4) incorporating peer review.

