Data organization in spreadsheets

2018 ◽  
Author(s):  
Karl W Broman ◽  
Kara H. Woo

Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this paper offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, don't leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, don't include calculations in the raw data files, don't use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
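Several of these recommendations (one thing per cell, a single header row, YYYY-MM-DD dates, plain text files) can be sketched with Python's standard `csv` module. The file name, subject IDs, and fields below are invented for illustration only.

```python
import csv
from datetime import date

# Hypothetical subject records; names and fields are illustrative only.
rows = [
    {"subject_id": "S001", "visit_date": date(2018, 3, 14), "weight_kg": 71.2},
    {"subject_id": "S002", "visit_date": date(2018, 3, 15), "weight_kg": 64.8},
]

# One rectangle: subjects as rows, variables as columns, one header row,
# saved as plain text (CSV) with ISO 8601 (YYYY-MM-DD) dates.
with open("measurements.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subject_id", "visit_date", "weight_kg"])
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "subject_id": row["subject_id"],
            "visit_date": row["visit_date"].isoformat(),  # YYYY-MM-DD
            "weight_kg": row["weight_kg"],
        })
```

A plain-text file like this can be read by virtually any analysis tool, which is the point of the final recommendation.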


2021 ◽  
Vol 1 (2) ◽  
pp. 340-364
Author(s):  
Rui Araújo ◽  
António Pinto

Along with the use of cloud-based services, infrastructure, and storage, the use of application logs in business-critical applications is a standard practice. Application logs must be stored in an accessible manner so that they can be used whenever needed, as is commonly the case when debugging these applications. Frequently, part of the information contained in log records is sensitive. In this paper, we evaluate the possibility of storing critical logs in remote storage while maintaining their confidentiality and server-side search capabilities. To the best of our knowledge, the designed search algorithm is the first to support full Boolean searches combined with field searching and nested queries. We demonstrate its feasibility and timely operation with a prototype implementation that never requires the storage provider to access plain-text information. Our solution was able to perform search and decryption operations at a rate of approximately 0.05 ms per line. A comparison with related work demonstrates its feasibility and shows that our solution is also the fastest in indexing, the most frequently performed operation.
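The paper's actual scheme (full Boolean queries with field search and nesting) is more elaborate than can be shown here, but the core idea of server-side search without plaintext access can be sketched with deterministic keyword trapdoors. The key, record fields, and the HMAC-based token construction below are all illustrative assumptions, not the published algorithm.

```python
import hmac
import hashlib

KEY = b"client-secret-key"  # held by the client, never by the storage provider

def token(field: str, word: str) -> str:
    """Deterministic trapdoor for a (field, keyword) pair."""
    return hmac.new(KEY, f"{field}:{word}".encode(), hashlib.sha256).hexdigest()

def index_line(record: dict) -> set:
    """Server-side index entry: the set of trapdoors for one log record."""
    return {token(f, w) for f, v in record.items() for w in str(v).lower().split()}

def search(index, required, optional=()):
    """Return ids of records whose token set contains all `required`
    trapdoors (AND) and, if `optional` is given, at least one (OR)."""
    hits = []
    for rid, toks in index.items():
        if all(t in toks for t in required) and (not optional or any(t in toks for t in optional)):
            hits.append(rid)
    return hits

# The server stores only opaque tokens; the client issues trapdoors.
index = {
    1: index_line({"level": "error", "msg": "disk full"}),
    2: index_line({"level": "info", "msg": "disk check ok"}),
}
matches = search(index, required=[token("level", "error")],
                 optional=[token("msg", "disk")])
```

The server matching trapdoors never sees field names, keywords, or log contents, which is the property the abstract describes.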


2002 ◽  
Vol 1804 (1) ◽  
pp. 144-150
Author(s):  
Kenneth G. Courage ◽  
Scott S. Washburn ◽  
Jin-Tae Kim

The proliferation of traffic software programs on the market has resulted in many highly specialized programs, each intended to analyze one or two specific items within a transportation network. Consequently, traffic engineers use multiple programs on a single project, which, ironically, has created new inefficiencies for the traffic engineer. Most of these programs deal with the same core set of data, for example, physical roadway characteristics, traffic demand levels, and traffic control variables. However, most of them have their own formats for saving data files. Therefore, these programs cannot share information directly or communicate with each other because of incompatible data formats, and the traffic engineer is faced with manually reentering common data from one program into another. Besides being inefficient, this also creates additional opportunities for data entry errors. XML is catching on rapidly as a means for exchanging data between two systems or users who deal with the same data but in different formats. Specific vocabularies have been developed for statistics, mathematics, chemistry, and many other disciplines. The traffic model markup language (TMML) is introduced as a resource for traffic model data representation, storage, rendering, and exchange. TMML structure and vocabulary are described, and examples of their use are presented.
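The general mechanism, several tools reading one shared XML description of the network instead of re-entering the data, can be sketched with Python's standard `xml.etree` module. The element and attribute names below are invented for illustration; they are not the published TMML vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical TMML-like fragment; element and attribute names are
# invented for illustration, not the actual TMML vocabulary.
doc = """
<network>
  <link id="L1" lanes="2" length_m="450"/>
  <demand link="L1" vehicles_per_hour="1200"/>
  <signal link="L1" cycle_s="90" green_s="42"/>
</network>
"""

root = ET.fromstring(doc)
link = root.find("link")
demand = root.find("demand")

# Any analysis program can read the same shared fields directly,
# avoiding manual re-entry and the data entry errors it invites.
lanes = int(link.get("lanes"))
vph = int(demand.get("vehicles_per_hour"))
vph_per_lane = vph / lanes
```

Because the vocabulary is fixed and machine-readable, two programs that each understand it can exchange a network description without a manual translation step.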


2019 ◽  
Vol 214 ◽  
pp. 04010
Author(s):  
Álvaro Fernández Casaní ◽  
Dario Barberis ◽  
Javier Sánchez ◽  
Carlos García Montoro ◽  
Santiago González de la Hoz ◽  
...  

The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at CERN Tier0, and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information, with references to them sent through a messaging system. The final backend of all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database is used for faster access to a subset of this information. In the future of ATLAS, the event, rather than the file, should be the atomic unit of information for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities, possibly aggregating data dynamically and allowing event-level granularity of processing in heavily parallel computing environments. This also simplifies the handling of loss and/or extension of data. In this sense the EventIndex may evolve towards a generalized whiteboard, with the ability to build collections and virtual datasets for end users. This paper describes the current distributed data collection architecture of the ATLAS EventIndex project, with details of the Producer, Consumer, and Supervisor entities, and of the protocol and information temporarily stored in the Object Store. It also shows the data flow rates and performance achieved since the new Object-Store-based temporary store was put in production in July 2017. We review the challenges imposed by the expected increasing rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher, with 100 billion events per year in Run 3 and 300 billion events per year in Run 4. We also outline the challenges we face in order to accommodate future use cases in the EventIndex.
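The Producer/Consumer flow described above, write a batch to a temporary object store, send only a reference through the messaging system, then fetch and ingest on the other side, can be sketched with in-memory stand-ins. The dict, queue, and payload fields below are illustrative assumptions, not the actual EventIndex protocol.

```python
import json
import queue
import uuid

object_store = {}          # stand-in for the temporary Object Store
messages = queue.Queue()   # stand-in for the messaging system

def produce(events):
    """Producer: store a batch of event-index records in the object
    store and send only a reference over the messaging system."""
    key = str(uuid.uuid4())
    object_store[key] = json.dumps(events)
    messages.put(key)

def consume():
    """Consumer: follow the received reference, fetch the batch, and
    hand the records to the final backend (here, simply return them).
    The temporary copy is deleted once it has been ingested."""
    key = messages.get()
    return json.loads(object_store.pop(key))

produce([{"run": 3, "event": 1}, {"run": 3, "event": 2}])
batch = consume()
```

Decoupling the bulk payload (object store) from the notification (message queue) is what lets hundreds of grid sites feed a single central backend without streaming all index data through the messaging layer.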


2020 ◽  
Vol 6 ◽  
pp. 30-34
Author(s):  
Aydar Kadyirov ◽  
Yulia Karaeva ◽  
Ekaterina Vachagina

Ultrasonic treatment of heavy crude oils has been proven to manage oil viscosity and temperature sensitivity. Continuing previously published research (Energy Safety and Energy Economy, iss. 5, 2019), we identified basic principles for predicting the dynamics of crude oil viscosity as a function of the time, power, and frequency of ultrasonic treatment. Viscosity control is essential for crude oil not only after ultrasonic treatment but also during transport and storage, in order to keep the energy efficiency of the entire process at the desired level.


1994 ◽  
Vol 48 (12) ◽  
pp. 1545-1552 ◽  
Author(s):  
Peter Lampen ◽  
Heinrich Hillig ◽  
Antony N. Davies ◽  
Michael Linscheid

JCAMP-DX has, for several years, been the standard form for the exchange of infrared spectral data. More recently, JCAMP-DX protocols have been published for chemical structure data and for nuclear magnetic resonance spectroscopy. This publication presents a new JCAMP-DX data exchange protocol for mass spectrometry, covering the transport of single spectra, spectral series, and raw data files. The protocol can be implemented on any computer system and storage medium, and is completely manufacturer independent. As with previous publications in this series, the aim is to provide reliable data transfer without loss of information, regardless of the hardware or software involved. A comparison with the work on a binary protocol currently being carried out by the Analytical Instrument Association is also presented.
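JCAMP-DX files are plain text built from labelled data records (LDRs) of the form `##LABEL= value`, which is what makes them readable on any system regardless of manufacturer. A minimal reading sketch is below; the sample spectrum header is invented, and real files add compound labels, data tables, and compression schemes that this sketch does not handle.

```python
# Minimal sketch of reading JCAMP-DX labelled data records (LDRs),
# which take the form "##LABEL= value". The sample below is invented
# for illustration; real files carry many more records plus the data.
sample = """\
##TITLE= example mass spectrum
##JCAMP-DX= 5.00
##DATA TYPE= MASS SPECTRUM
##NPOINTS= 2
##END=
"""

def parse_ldrs(text: str) -> dict:
    """Collect label -> value pairs from the ##LABEL= value lines."""
    records = {}
    for line in text.splitlines():
        if line.startswith("##") and "=" in line:
            label, _, value = line[2:].partition("=")
            records[label.strip()] = value.strip()
    return records

ldrs = parse_ldrs(sample)
```

Because the records are self-describing text, a parser this simple can recover the metadata on any platform, which is the manufacturer independence the protocol aims for.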


2020 ◽  
Author(s):  
Walter Henry Gunzburg ◽  
Myo Myint Aung ◽  
Pauline Toa ◽  
Shirelle Ng ◽  
Eliot Read ◽  
...  

Abstract Gut microbiota in humans and animals play an important role in health, aiding in digestion, regulation of the immune system, and protection against pathogens. Changes or imbalances in the gut microbiota (dysbiosis) have been linked to a variety of local and systemic diseases, and there is growing evidence that restoring the balance of the microbiota can restore health. This can be achieved by oral delivery of members of the microbiome (including probiotics) or by fecal microbiome transplantation. In order to provide their health-promoting effects, microbiota must survive (i) transport and storage (i.e. shelf life) and (ii) transit through the highly acidic conditions in the stomach and the bile salts in the small intestine. We have developed a cell encapsulation technology based on the natural polymer cellulose sulphate (CS) that protects members of the microbiota from stomach acid and bile.

