Piloting a Community of Student Data Consultants that Supports and Enhances Research Data Services

Research ecosystems within university environments are continuously evolving and require more resources and domain specialists to assist with the data lifecycle. Academic researchers and professionals are typically overcommitted, making it challenging to stay up to date on recent developments in best practices for data management, curation, transformation, analysis, and visualization. Recently, research groups, university core centers, and libraries have been revitalizing their services to fill these gaps, helping researchers find new tools and approaches that make their work more impactful, sustainable, and replicable. In this paper, we report on a student consultation program built within the University Libraries that takes an innovative, student-centered approach to meeting the research data needs of a university environment while also providing students with experiential learning opportunities. This program, DataBridge, trains students to work in multi-disciplinary teams as student consultants who assist faculty, staff, and students with their real-world, data-intensive research challenges. Centering DataBridge in the Libraries gives students the unique opportunity to work across all disciplines, on problems and in domains that some students might not otherwise encounter during their college careers. To encourage students from multiple disciplines to participate, we developed a scaffolded curriculum that allows students of any discipline and skill level to quickly develop essential data science skill sets and begin contributing their own perspectives and specializations to the research consultations. These students, mentored by Informatics faculty in the Libraries, provide research support that can ultimately impact the entire research process.
Through our pilot phase, we have found that DataBridge enhances the utilization and openness of data created through research, extends the reach and impact of the work beyond the researcher’s specialized community, and creates a network of student “data champions” across the University who see the value in working with the Library. Here, we describe the evolution of the DataBridge program and outline its unique role in both training the data stewards of the future with regard to FAIR data practices, and in contributing significant value to research projects at Virginia Tech. Ultimately, this work highlights the need for innovative, strategic programs that encourage and enable real-world experience of data curation, data analysis, and data publication for current researchers, all while training the next generation of researchers in these best practices.

10.29173/iq12 ◽  
2017 ◽  
Vol 41 (1-4) ◽  
pp. 12
Author(s):  
Bhojaraju Gunjal ◽  
Panorea Gaitanou

This paper presents a brief overview of several Research Data Management (RDM) issues and a detailed literature review of the RDM practices adopted in libraries globally. Furthermore, it describes several tendencies in the management of repository tools for research data, as well as the challenges of implementing RDM. Properly planned training and skill development for all stakeholders, with mentors training both staff and users, are among the issues that must be considered to enhance the RDM process. An effort is also made to present suitable policies and workflows, along with the adoption of best practices in RDM, so as to strengthen the research process in an organisation. The study showcases the implementation of RDM processes in a higher educational institute in India, referring particularly to the Central Library @ NIT Rourkela in Odisha, India, with a proposed framework. Finally, it also identifies areas of opportunity that could boost research activities in the Institute.


2021 ◽  
Vol 4 ◽  
Author(s):  
Bradley Butcher ◽  
Vincent S. Huang ◽  
Christopher Robinson ◽  
Jeremy Reffin ◽  
Sema K. Sgaier ◽  
...  

Developing data-driven solutions that address real-world problems requires understanding those problems' causes and how their interactions affect the outcome, often with only observational data available. Causal Bayesian Networks (BNs) have been proposed as a powerful method for discovering and representing causal relationships in observational data as a Directed Acyclic Graph (DAG). BNs could be especially useful for research in global health in low- and middle-income countries, where there is an increasing abundance of observational data that could be harnessed for policy making, program evaluation, and intervention design. However, BNs have not been widely adopted by global health professionals, and in real-world applications confidence in their results generally remains inadequate. This is partly due to the inability to validate against ground truth, as the true DAG is not available, and is especially problematic when a learned DAG conflicts with pre-existing domain doctrine. Here we conceptualize and demonstrate the idea of a "Causal Datasheet" that approximates and documents BN performance expectations for a given dataset, aiming to provide practitioners with confidence estimates and sample size requirements. To generate results for such a Causal Datasheet, we developed a tool that generates synthetic Bayesian networks and their associated synthetic datasets to mimic real-world datasets. We recorded the results given by well-known structure-learning algorithms and by a novel implementation of the OrderMCMC method using the Quotient Normalized Maximum Likelihood score. These results were used to populate the Causal Datasheet, and recommendations could be made depending on whether expected performance met user-defined thresholds. We present our experience in creating Causal Datasheets to aid analysis decisions at different stages of the research process. First, one datasheet was deployed to help determine the appropriate sample size for a planned study of sexual and reproductive health in Madhya Pradesh, India. Second, a datasheet was created to estimate the performance of an existing maternal health survey we conducted in Uttar Pradesh, India. Third, we validated the generated performance estimates and investigated current limitations using the well-known ALARM dataset. Our experience demonstrates the utility of the Causal Datasheet, which can help global health practitioners gain more confidence when applying BNs.
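The synthetic-data idea at the heart of the Causal Datasheet can be illustrated with a minimal stdlib sketch (hypothetical example, not the authors' tool): ancestral sampling from a chain DAG A → B → C gives a dataset whose ground-truth structure is known, which is exactly what lets one benchmark how well a structure-learning algorithm recovers the true DAG at a given sample size.

```python
import random

def sample_chain(n, seed=0):
    """Ancestral sampling from the binary chain DAG A -> B -> C."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5                  # P(A=1) = 0.5
        b = rng.random() < (0.9 if a else 0.1)  # B strongly tracks A
        c = rng.random() < (0.9 if b else 0.1)  # C strongly tracks B
        rows.append((a, b, c))
    return rows

data = sample_chain(10_000)

# Sanity check of the known structure: A and C are marginally dependent
# through B, so P(C=1 | A=1) should clearly exceed P(C=1 | A=0).
a1 = [r for r in data if r[0]]
a0 = [r for r in data if not r[0]]
p_c_a1 = sum(r[2] for r in a1) / len(a1)  # ~0.82 in expectation
p_c_a0 = sum(r[2] for r in a0) / len(a0)  # ~0.18 in expectation
print(round(p_c_a1, 2), round(p_c_a0, 2))
```

In a real workflow one would feed such synthetic datasets to a structure learner (e.g. a score-based search) and record how often the true edges are recovered, which is the kind of result the datasheet tabulates.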


2010 ◽  
Vol 1 (4) ◽  
pp. 1-9 ◽  
Author(s):  
Stan Lipovetsky

Chaotic systems have been widely studied to describe and explain various observed phenomena. Statistical modeling of messy data can be attempted using the so-called Supercritical Pitchfork Bifurcation (SPB) approach. This work considers the possibility of applying the SPB technique to regression modeling of implicit functions. Theoretical and practical advantages of SPB regression are discussed with an example from marketing research data on advertising in the car industry. The results are very promising and can aid modeling, analysis, and interpretation, leading to a better understanding of real-world data.
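For readers unfamiliar with the bifurcation, the supercritical pitchfork normal form is dx/dt = r·x − x³: for r ≤ 0 the only equilibrium is x = 0, while for r > 0 two stable branches appear at x = ±√r. A minimal sketch of that branch structure (illustrative only; the paper's regression machinery is richer):

```python
import math

def equilibria(r):
    """Equilibria of dx/dt = r*x - x**3, sorted ascending."""
    if r <= 0:
        return [0.0]          # single equilibrium below the bifurcation
    root = math.sqrt(r)
    return [-root, 0.0, root] # pitchfork: two new branches at +/- sqrt(r)

print(equilibria(-1.0))  # [0.0]
print(equilibria(4.0))   # [-2.0, 0.0, 2.0]
```

It is this forked, multi-valued branch shape that makes the SPB form a candidate for regression on implicit (non-single-valued) functions.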


2021 ◽  
Author(s):  
Marlena Duda ◽  
Kelly L Sovacool ◽  
Negar Farzaneh ◽  
Vy Kim Nguyen ◽  
Sarah E Haynes ◽  
...  

We are bioinformatics trainees at the University of Michigan who started a local chapter of Girls Who Code to provide a fun and supportive environment for high school women to learn the power of coding. Our goal was to cover basic coding topics and data science concepts through live coding and hands-on practice. However, we could not find a resource that exactly met our needs. Therefore, over the past three years, we have developed a curriculum and instructional format using Jupyter notebooks to effectively teach introductory Python for data science. This method, inspired by The Carpentries organization, uses bite-sized lessons followed by independent practice time to reinforce coding concepts, and culminates in a data science capstone project using real-world data. We believe our open curriculum is a valuable resource to the wider education community and hope that educators will use and improve our lessons, practice problems, and teaching best practices. Anyone can contribute to our educational material on GitHub (https://github.com/GWC-DCMB).
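A hypothetical cell in the bite-sized, live-coding style the curriculum describes (the variable names and numbers here are invented for illustration): a short lesson demonstrated by the instructor, followed by an independent practice problem, both runnable in a single Jupyter notebook cell.

```python
# Lesson: computing an average from a list, step by step.
temperatures = [68, 71, 70, 74, 69]
average = sum(temperatures) / len(temperatures)
print(average)  # 70.4

# Practice: write a function that returns the index of the warmest day.
def warmest_day(temps):
    return temps.index(max(temps))

print(warmest_day(temperatures))  # 3
```

Each lesson pairs a worked example with a practice prompt like this, which is the pattern the capstone project then scales up to a real-world dataset.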


2021 ◽  
Author(s):  
Prasanta Pal ◽  
Shataneek Banerjee ◽  
Amardip Ghosh ◽  
David R. Vago ◽  
Judson Brewer

Knowingly or unknowingly, digital data is an integral part of our day-to-day lives; realistically, there is probably not a single day when we do not encounter some form of it. Data originates from diverse sources in various formats, among which time series are a special kind that captures information about the time evolution of a system under observation. However, capturing temporal information in the context of data analysis is a highly non-trivial challenge. The Discrete Fourier Transform (DFT) is one of the most widely used methods for capturing the essence of time-series data. While this nearly 200-year-old mathematical transform has survived the test of time, the nature of real-world data sources violates some of the intrinsic properties it presumes. Ad hoc noise and outliers fundamentally alter the true frequency-domain signature of the signal of interest, and as a result the frequency-domain representation becomes corrupted as well. We demonstrate that applying traditional digital filters as-is often fails to reveal an accurate description of the pristine time-series characteristics of the system under study. In this work, we analyze the issues of the DFT with real-world data and propose a method to address them by drawing on insights from modern data-science techniques, particularly our previous work SOCKS. Our results reveal that a dramatic, never-before-seen improvement is possible by re-imagining the DFT in the context of real-world data with appropriate curation protocols. We argue that our proposed transformation, DFT21, would revolutionize the digital world in terms of accuracy, reliability, and information retrievability from raw data.
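The failure mode the paper targets is easy to reproduce with a stdlib sketch (not the authors' DFT21 method): a single outlier sample smears energy across the entire DFT spectrum of an otherwise pure tone.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) Discrete Fourier Transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # pure bin-4 tone
noisy = clean[:]
noisy[10] += 5.0                                               # one outlier sample

mag_clean = [abs(c) for c in dft(clean)]
mag_noisy = [abs(c) for c in dft(noisy)]
print(round(mag_clean[4], 1))  # ~32.0: all energy in bin 4 (and its mirror)
print(round(mag_clean[7], 6))  # ~0.0: off-tone bins are empty
print(round(mag_noisy[7], 1))  # ~5.0: the outlier leaks into every bin
```

The leaked magnitude (≈5, the outlier's amplitude) appears in every bin, which is why the paper argues for curating outliers before transforming rather than filtering the corrupted spectrum afterwards.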


The importance of data science and machine learning is evident in every domain where data is generated. Multi-aspect analysis and visualization help society develop useful solutions and formulate policies. This paper takes live data from the current coronavirus pandemic and presents multi-faceted views of it to help authorities and governments make appropriate decisions to tackle this unprecedented problem. Python and its libraries, along with the Google Colab platform, are used to produce the results. The best available techniques and combinations of modules/libraries are used to present information related to COVID-19.
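One of the standard smoothed views such Python/Colab dashboards present alongside raw daily counts is a 7-day moving average; a small illustrative computation (the numbers are hypothetical, not live COVID-19 data):

```python
# Hypothetical daily case counts for illustration only.
daily_cases = [120, 135, 150, 90, 80, 160, 170, 180, 200, 210]

def moving_average(xs, window=7):
    """Trailing moving average; the first window-1 days have no value."""
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

smoothed = moving_average(daily_cases)
print([round(v, 1) for v in smoothed])  # [129.3, 137.9, 147.1, 155.7]
```

Smoothing like this suppresses day-of-week reporting artifacts, which is why dashboards plot it over the noisy daily bars.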


2015 ◽  
Vol 10 (1) ◽  
pp. 111-122 ◽  
Author(s):  
Liz Lyon ◽  
Aaron Brenner

This paper examines the role, functions and value of the “iSchool” as an agent of change in the data informatics and data curation arena. A brief background to the iSchool movement is given followed by a brief review of the data decade, which highlights key data trends from the iSchool perspective: open data and open science, big data and disciplinary data diversity. The growing emphasis on the shortage of data talent is noted and a family of data science roles identified. The paper moves on to describe three primary functions of iSchools: education, research intelligence and professional practice, which form the foundations of a new Capability Ramp Model. The model is illustrated by mini-case studies from the School of Information Sciences, University of Pittsburgh: the immersive (laboratory-based) component of two new Research Data Management and Research Data Infrastructures graduate courses, a new practice partnership with the University Library System centred on RDM, and the mapping of disciplinary data practice using the Community Capability Model Profile Tool. The paper closes with a look to the future and, based on the assertion that data is mission-critical for iSchools, some steps are proposed for the next data decade: moving data education programs into the mainstream core curriculum, adopting a translational data science perspective and strengthening engagement with the Research Data Alliance.


2021 ◽  
Vol 4 ◽  
pp. 110-120
Author(s):  
Audra Diers-Lawson

Contemporary professional reports and research suggest that in corporate communication and related programs, we are not creating environments for modern students to thrive nor are we meeting the industry’s expectations in a “hypermodern” world. Using personal ethnography, this article analyzes industry-articulated limitations in the knowledge and skill sets of new communication practitioners, reviews contemporary literature identifying the learning needs of today’s students, and proposes a set of best practices based on the literature and the author’s own journey as a higher education practitioner of 20 years. Best practices identified here incorporate elements of entertainment, engagement, and an “open-world” approach that places the student experience at the core of each class and overall course design.

