How to share data for collaboration

2017 ◽  
Author(s):  
Shannon E Ellis ◽  
Jeffrey T Leek

Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to the collaborators who generate the data. To bridge this divide, we have established a set of guidelines for sharing data. In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of giving the statistician all essential experimental information and a record of any pre-processing steps carried out. With these guidelines we hope to avoid errors and delays in data analysis.


Universe ◽  
2021 ◽  
Vol 7 (3) ◽  
pp. 64
Author(s):  
Ivan A. Soldatenkov ◽  
Anastasiya A. Yakovenko ◽  
Vitaly B. Svetovoy

Technological progress has made possible precise measurements of the Casimir forces at distances less than 100 nm. It has enabled stronger constraints on non-Newtonian forces at short separations and improved control of micromechanical devices. Experimental information on the forces below 30 nm is sparse and imprecise because of pull-in instability and surface roughness. Recently, an adhered-cantilever method, which does not suffer from the pull-in instability, was proposed to measure the forces at small distances. Deviation of the cantilever from its classic shape carries information on the forces acting near the adhered end. We calculate the force between a flat cantilever and a rough Au plate and demonstrate that the effect of roughness dominates as the bodies approach contact. Short-distance repulsion operating at the contact is included in the analysis. Deviations from the classic shape due to residual stress, inhomogeneous thickness of the cantilever, and finite compliance of the substrate are analysed. It is found that a realistic residual stress gives a negligible contribution to the shape, while the finite compliance and inhomogeneous thickness give measurable contributions that have to be subtracted from the raw data.


2016 ◽  
Vol 49 (1) ◽  
pp. 302-310 ◽  
Author(s):  
Michael Kachala ◽  
John Westbrook ◽  
Dmitri Svergun

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. 
This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.


2020 ◽  
Author(s):  
Alessandra Maciel Paz Milani ◽  
Fernando V. Paulovich ◽  
Isabel Harb Manssour

Analyzing and managing raw data are still a challenging part of the data analysis process, mainly with regard to data preprocessing. Although there are studies proposing design implications or recommendations for visualization solutions in the data analysis scope, they do not focus on challenges during the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. Thus, with this study, we aim to contribute to the discussion of how methods of visualization and data mining can be used and combined to assist data analysts during preprocessing activities. To achieve that, we introduce the Preprocessing Profiling Model for Visual Analytics, which comprises a set of features intended to inspire the implementation of new solutions. These features were designed from a list of insights we obtained during an interview study with thirteen data analysts. Our contributions can be summarized as offering resources to promote a shift to visual preprocessing.


2021 ◽  
Author(s):  
Melanie Christine Föll ◽  
Veronika Volkmann ◽  
Kathrin Enderle-Ammour ◽  
Konrad Wilhelm ◽  
Dan Guo ◽  
...  

Background: Mass spectrometry imaging (MSI) derives spatial molecular distribution maps directly from clinical tissue specimens. This allows for spatial characterization of molecular compositions of different tissue types and tumor subtypes, which bears great potential for assisting pathologists with diagnostic decisions or personalized treatments. Unfortunately, progress in translational MSI is often hindered by insufficient quality control and lack of reproducible data analysis. Raw data and analysis scripts are rarely publicly shared. Here, we demonstrate the application of the Galaxy MSI tool set for the reproducible analysis of a urothelial carcinoma dataset. Methods: Tryptic peptides were imaged in a cohort of 39 formalin-fixed, paraffin-embedded human urothelial cancer tissue cores with a MALDI-TOF/TOF device. The complete data analysis was performed in a fully transparent and reproducible manner on the European Galaxy Server. Annotations of tumor and stroma were performed by a pathologist and transferred to the MSI data to allow for supervised classifications of tumor vs. stroma tissue areas as well as of muscle-infiltrating vs. non-muscle invasive urothelial carcinomas. For putative peptide identifications, m/z features were matched to the MSiMass list. Results: Rigorous quality control in combination with careful pre-processing enabled reduction of m/z shifts and intensity batch effects. High classification accuracy was found for both tumor vs. stroma and muscle-infiltrating vs. non-muscle invasive tumors. Some of the most discriminative m/z features for each condition could be assigned a putative identity: stromal tissue was characterized by collagen type I peptides and tumor tissue by histone and heat shock protein beta-1 peptides. Intermediate filaments such as cytokeratins and vimentin were discriminative between the tumors with different muscle-infiltration status.
To make the study fully reproducible and to advocate the criteria of FAIR (findability, accessibility, interoperability, and reusability) research data, we share the raw data, spectra annotations as well as all Galaxy histories and workflows. Data are available via ProteomeXchange with identifier PXD026459 and Galaxy results via https://github.com/foellmelanie/Bladder_MSI_Manuscript_Galaxy_links. Conclusion: Here, we show that translational MSI data analysis in a fully transparent and reproducible manner is possible and we would like to encourage the community to join our efforts.


2016 ◽  
Vol 44 (3) ◽  
pp. 20150057
Author(s):  
Gokhan Kilic ◽  
Mehmet S. Unluturk
