Computational and Informatic Advances for Reproducible Data Analysis in Neuroimaging

2019 ◽  
Vol 2 (1) ◽  
pp. 119-138 ◽  
Author(s):  
Russell A. Poldrack ◽  
Krzysztof J. Gorgolewski ◽  
Gaël Varoquaux

The reproducibility of scientific research has become a point of critical concern. We argue that openness and transparency are critical for reproducibility, and we outline an ecosystem for open and transparent science that has emerged within the human neuroimaging community. We discuss the range of open data-sharing resources that have been developed for neuroimaging data, as well as the role of data standards (particularly the Brain Imaging Data Structure, BIDS) in enabling the automated sharing, processing, and reuse of large neuroimaging data sets. We outline how the open source Python language has provided the basis for a data science platform that enables reproducible data analysis and visualization. We also discuss how new advances in software engineering, such as containerization, provide the basis for greater reproducibility in data analysis. The emergence of this new ecosystem provides an example for many areas of science that are currently struggling with reproducibility.
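The automation that BIDS enables can be illustrated with a minimal sketch: because the standard prescribes a fixed directory layout and file-naming scheme, a script can discover subjects and scans with a simple glob, with no per-dataset configuration. The tiny dataset below is fabricated for the example.

```python
from pathlib import Path
import tempfile

# Build a minimal, hypothetical BIDS-style dataset:
# sub-<label>/anat/sub-<label>_T1w.nii.gz
root = Path(tempfile.mkdtemp())
for sub in ("sub-01", "sub-02"):
    anat = root / sub / "anat"
    anat.mkdir(parents=True)
    (anat / f"{sub}_T1w.nii.gz").touch()

# Because the layout is standardized, subject discovery is a single glob.
subjects = sorted(p.name for p in root.glob("sub-*"))
t1w_files = sorted(p.name for p in root.glob("sub-*/anat/*_T1w.nii.gz"))
print(subjects)    # → ['sub-01', 'sub-02']
print(t1w_files)   # → ['sub-01_T1w.nii.gz', 'sub-02_T1w.nii.gz']
```

The same two glob patterns work on any BIDS-compliant dataset, which is precisely what makes automated pipelines over shared data possible.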

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S23-S24
Author(s):  
Kendra L Seaman

In concert with broader efforts to increase the reliability of social science research, there are several efforts to increase transparency and reproducibility in neuroimaging. The large-scale nature of neuroimaging data and constantly evolving analysis tools can make transparency challenging. I will describe emerging tools used to document, organize, and share behavioral and neuroimaging data. These tools include: (1) the preregistration of neuroimaging studies, which increases openness and protects researchers from suspicions of p-hacking; (2) the conversion of neuroimaging data into a standardized format (the Brain Imaging Data Structure, BIDS), which enables standardized scripts to process and share neuroimaging data; and (3) the sharing of final neuroimaging results on NeuroVault, which allows the community to perform rapid meta-analyses. Using these tools improves workflows within labs, improves the overall quality of our science, and provides a potential model for other disciplines working with large-scale data.
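A minimal sketch of why the BIDS naming convention supports standardized scripts: filenames encode metadata as underscore-separated key-value pairs, so a single generic parser works across data sets. The filename and parser below are illustrative, not the official BIDS validator.

```python
# BIDS filenames look like sub-01_task-rest_run-02_bold.nii.gz:
# key-value entities separated by underscores, ending in a suffix.
def parse_bids_name(filename: str) -> dict:
    stem = filename.split(".", 1)[0]   # drop the .nii.gz extension
    parts = stem.split("_")
    entities = {}
    for part in parts[:-1]:            # the last part is the suffix
        key, _, value = part.partition("-")
        entities[key] = value
    entities["suffix"] = parts[-1]
    return entities

print(parse_bids_name("sub-01_task-rest_run-02_bold.nii.gz"))
# → {'sub': '01', 'task': 'rest', 'run': '02', 'suffix': 'bold'}
```

Because no dataset-specific logic is needed, the same parser can drive processing scripts over any lab's shared data.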


Author(s):  
Gorgolewski Krzysztof ◽  
Poline Jean-Baptiste ◽  
Keator David ◽  
Nichols B ◽  
Auer Tibor ◽  
...  

Author(s):  
Laura Dipietro ◽  
Seth Elkin-Frankston ◽  
Ciro Ramos-Estebanez ◽  
Timothy Wagner

The history of neuroscience has tracked the evolution of science and technology. Today, neuroscience's trajectory depends heavily on computational systems and the availability of high-performance computing (HPC), which are becoming indispensable for building simulations of the brain, coping with the high computational demands of analyzing brain imaging data sets, and developing treatments for neurological diseases. This chapter briefly reviews the current and potential future use of supercomputers in neuroscience.


Author(s):  
Kirti Raj Bhatele ◽  
Stuti Singhal ◽  
Muktasha R. Mithora ◽  
Sneha Sharma

This chapter guides the reader through modeling, uses, and trends in data analysis and data science. The authors focus on the value of pictorial data as a replacement for numeric data. In most situations, a graphical representation can present information more distinctly, more informatively, and in less space than the same information requires in sentence form. The chapter explains how to represent data in a more understandable form, so that any reader, layperson or not, can grasp it without difficulty. It also covers Tableau, the software used to convert tabular data into graphical form, and contains 11 heat maps on the world economies, with a detailed study of several different topics. Finally, it introduces the basics of the Python language and uses it to compare the world economies by their level of development.


PLoS ONE ◽  
2012 ◽  
Vol 7 (12) ◽  
pp. e50332 ◽  
Author(s):  
Yuanqing Li ◽  
Jinyi Long ◽  
Lin He ◽  
Haidong Lu ◽  
Zhenghui Gu ◽  
...  

The 2017 SIS Conference aims to highlight the crucial role of statistics in data science. In this new domain, where "meaning" is extracted from data, the growing volume of data produced and stored in databases has brought new challenges. These involve several fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of big data, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of statistics on high-dimensional data quality validation, sampling, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.
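One of the listed tasks, dimensionality reduction, can be sketched concretely: principal component analysis projects centered data onto its top singular vectors. A minimal NumPy version on synthetic data (all numbers below are generated for the example):

```python
import numpy as np

# PCA via SVD: center the data, then project onto the top-k right
# singular vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))             # 200 samples, 10 features
X[:, 1] = 2.0 * X[:, 0] + 0.01 * X[:, 1]   # inject a strong correlation

Xc = X - X.mean(axis=0)                    # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                          # reduced 2-D representation

# Fraction of total variance retained by the first k components.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(Z.shape, float(explained))
```

Because two of the ten features are nearly collinear, a large share of the variance concentrates in the leading components, which is exactly the situation in which dimensionality reduction pays off.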


2019 ◽  
Author(s):  
Melanie Christine Föll ◽  
Lennart Moritz ◽  
Thomas Wollmann ◽  
Maren Nicole Stillger ◽  
Niklas Vockert ◽  
...  

Background: Mass spectrometry imaging (MSI) is increasingly used in biological and translational research, as it can determine the spatial distribution of hundreds of analytes in a sample. Being at the interface of proteomics/metabolomics and imaging, the acquired data sets are large and complex and often analyzed with proprietary software or in-house scripts, which hinders reproducibility. Open source software solutions that enable reproducible data analysis often require programming skills and are therefore not accessible to many MSI researchers.

Findings: We have integrated 18 dedicated mass spectrometry imaging tools into the Galaxy framework to allow accessible, reproducible, and transparent data analysis. Our tools are based on Cardinal, MALDIquant, and scikit-image and enable all major MSI analysis steps, such as quality control, visualization, preprocessing, statistical analysis, and image co-registration. Further, we created hands-on training material for use cases in proteomics and metabolomics. To demonstrate the utility of our tools, we re-analyzed a publicly available N-linked glycan imaging dataset. By providing the entire analysis history online, we highlight how the Galaxy framework fosters transparent and reproducible research.

Conclusion: The Galaxy framework has emerged as a powerful platform for the analysis of MSI data, combining ease of use and access with high levels of reproducibility and transparency.
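As a concrete illustration of one common MSI preprocessing step (the Galaxy tools themselves wrap Cardinal and MALDIquant), total-ion-current (TIC) normalization rescales each pixel's spectrum to a constant summed intensity, removing per-pixel acquisition differences. A toy NumPy sketch with made-up intensities:

```python
import numpy as np

# Toy data: 4 pixels x 5 m/z bins; each row is one pixel's spectrum.
spectra = np.array([[10., 20., 30., 0., 40.],
                    [ 1.,  2.,  3., 0.,  4.],
                    [ 5.,  5.,  5., 5.,  5.],
                    [ 0., 50.,  0., 0., 50.]])

# Total ion current per pixel, then divide so every row sums to 1.0.
tic = spectra.sum(axis=1, keepdims=True)
normalized = spectra / tic
print(normalized.sum(axis=1))   # → [1. 1. 1. 1.]
```

After normalization, intensity differences between pixels reflect analyte distribution rather than instrument drift, which is why this step precedes statistical analysis.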


2019 ◽  
Author(s):  
Andrea Blasco ◽  
Michael G. Endres ◽  
Rinat A. Sergeev ◽  
Anup Jonchhe ◽  
Max Macaluso ◽  
...  

Open data science and algorithm development competitions offer a unique avenue for the rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research where the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and the prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.
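To make one of the competition tasks concrete: a naive baseline for imputing missing gene expression values is per-gene mean imputation. The winning entries were far more sophisticated, so this sketch only illustrates the task itself, on a made-up matrix:

```python
import numpy as np

# Toy expression matrix: rows = genes, columns = samples, NaN = missing.
expr = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [np.nan, 2.0, 2.0]])

# Fill each missing value with that gene's mean over observed samples.
gene_means = np.nanmean(expr, axis=1, keepdims=True)
imputed = np.where(np.isnan(expr), gene_means, expr)
print(imputed)
```

Competition entries are typically scored by how much closer their imputations land to held-out true values than a baseline like this one.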


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database costs, long data transfer times, extensive manual effort, and privacy issues for sensitive data. To remove these barriers and enable easier data sharing and analysis, we introduced COINSTAC, a new, decentralized, privacy-enabled infrastructure model for brain imaging data, in 2016. We have continued developing COINSTAC since the model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including the implementation of additional decentralized algorithms, user interface enhancements, decentralized regression statistic calculation, and complete pipeline specifications.
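The idea behind decentralized regression can be sketched in a few lines: each site shares only small aggregate statistics, never raw data, and an aggregator solves the normal equations from the summed aggregates. This is an illustration of the general approach on synthetic data, not COINSTAC's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(42)
beta_true = np.array([1.5, -2.0, 0.5])

def site_data(n):
    """Synthetic private data held at one site."""
    X = rng.normal(size=(n, 3))
    y = X @ beta_true + 0.01 * rng.normal(size=n)
    return X, y

sites = [site_data(n) for n in (50, 80, 120)]

# Each site transmits only a 3x3 matrix and a 3-vector, never raw data.
XtX = sum(X.T @ X for X, _ in sites)
Xty = sum(X.T @ y for X, y in sites)
beta_hat = np.linalg.solve(XtX, Xty)     # aggregator's estimate

# Identical (up to floating point) to pooling all raw data centrally.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.allclose(beta_hat, beta_pooled))   # → True
```

Because the shared aggregates are tiny and the result matches the pooled analysis exactly, this pattern sidesteps both the transfer-time and the privacy barriers described above.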


2020 ◽  
Vol 119 (12) ◽  
pp. 1862-1870 ◽  
Author(s):  
Hsueh-Wen Hsueh ◽  
Yi-Ching Chen ◽  
Chi-Fen Chang ◽  
Tyng-Guey Wang ◽  
Ming-Jang Chiu
