Creating Detailed Metadata for an R Shiny Analysis of Rodent Behavior Sequence Data Detected Along One Light-Dark Cycle

2021 ◽  
Vol 15 ◽  
Author(s):  
Julien Colomb ◽  
York Winter

Automated mouse phenotyping through the high-throughput analysis of home cage behavior has brought hope of a more effective and efficient method for testing rodent models of diseases. Advanced video analysis software is able to derive behavioral sequence data sets from multiple-day recordings. However, no dedicated mechanisms exist for sharing or analyzing these types of data. In this article, we present a free, open-source software application, actionable through a web browser (an R Shiny application), that analyzes home cage behavioral sequence data and is designed to spot differences in circadian activity while preventing p-hacking. The software aligns time-series data to the light/dark cycle, and then uses different time windows to produce up to 162 behavior variables per animal. A principal component analysis strategy detected differences between groups, and the behavioral activity is represented graphically for further exploratory analysis. A machine-learning approach was implemented, but it proved ineffective at separating the experimental groups. The software requires spreadsheets that provide information about the experiment (i.e., metadata), thus promoting a data management strategy that leads to FAIR data production. This encourages the publication of some metadata even when the data are kept private. We tested our software by comparing the behavior of female mice in videos recorded twice, at 3 and 7 months, in a home cage monitoring system. This study demonstrated that combining data management with data analysis leads to a more efficient and effective research process.
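The tool itself is an R Shiny application; as an illustration of the light/dark alignment and windowing step described above only, here is a minimal Python sketch (the column names, event data, and lights-on hour are assumptions, not the authors' code):

```python
import pandas as pd

# Assumed input: one row per detected behavior event, with a timestamp,
# the animal's ID, and a behavior label (column names are hypothetical).
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-03-01 07:15", "2021-03-01 20:40",
                                 "2021-03-02 02:05", "2021-03-01 09:30"]),
    "animal_id": ["m1", "m1", "m1", "m2"],
    "behavior":  ["drink", "run", "run", "groom"],
})

LIGHTS_ON_HOUR = 6  # assumed start of the light phase

# Align each event to the light/dark cycle: hours since lights-on, modulo 24;
# the first 12 of those hours fall in the light phase.
events["zt_hour"] = (events["timestamp"].dt.hour - LIGHTS_ON_HOUR) % 24
events["phase"] = events["zt_hour"].map(lambda h: "light" if h < 12 else "dark")

# One variable per (time window, behavior): here, event counts per phase.
variables = (events.groupby(["animal_id", "phase", "behavior"])
                   .size()
                   .unstack(["phase", "behavior"], fill_value=0))
print(variables)
```

Repeating this aggregation over finer windows (e.g., hourly bins) is how the per-animal variable count can grow to the order of 162.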

2021 ◽  
Author(s):  
York Winter ◽  
Julien Colomb

Automated mouse phenotyping through the high-throughput analysis of home cage behavior has brought hope of a more effective and efficient method for testing rodent models of diseases. Advanced video analysis software is able to derive behavioral sequence data sets from multiple-day recordings. However, no dedicated mechanisms exist for sharing or analyzing these types of data. In this article, we present free, open-source software, actionable through a web browser (an R Shiny application), that can perform state-of-the-art multidimensional analysis of home cage behavioral sequence data. The software aligns time-series data to the light/dark cycle, and then uses different time windows to produce up to 162 behavior variables per animal. It prevents p-hacking by providing an analysis built around a principal component analysis strategy, while also representing the behavior graphically for further exploratory analysis. A machine-learning approach was implemented, but it proved ineffective at separating the experimental groups. The software requires spreadsheets that provide information about the experiment (i.e., metadata), thus promoting a data management strategy that leads to FAIR data production. This encourages the publication of some metadata even when the data are kept private. We tested our software by comparing the behavior of female mice in videos recorded twice, at 3 and 7 months, in a home cage monitoring system. This study demonstrated that combining data management with data analysis leads to a more efficient and effective research process.
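For the group-comparison step, here is a minimal sketch of a PCA-based strategy using scikit-learn (the data below are random placeholders; the published workflow is implemented in R, not Python):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed input: a matrix of up to 162 behavior variables per animal
# (rows = animals) plus a group label per animal; both invented here.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 162))          # placeholder behavior variables
groups = np.array([0] * 10 + [1] * 10)  # placeholder group labels

# Standardize, then project onto the first two principal components.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Group differences can then be inspected on the component scores,
# e.g. by comparing group means along PC1.
print(scores[groups == 0, 0].mean(), scores[groups == 1, 0].mean())
```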


2020 ◽  
Author(s):  
Paolo Oliveri ◽  
Simona Simoncelli ◽  
Pierluigi Di Pietro ◽  
Sara Durante

One of the main challenges for present and future ocean observations is to find best practices for data management: infrastructures like Copernicus and SeaDataCloud already take responsibility for assembling, archiving, updating, and publishing data. Here we present the strengths and weaknesses of a SeaDataCloud temperature and salinity time series data collection, in particular a tool able to recognize the different devices and platforms and to merge them with processed Copernicus platforms.

While Copernicus has the main target of quickly acquiring and publishing data, SeaDataNet aims to publish data with the best quality available. These two data repositories should be considered together, since the originator can ingest the data in both infrastructures, in only one, or partially in both. As a result, data are sometimes only partially available in Copernicus or SeaDataCloud, with great impact on researchers who want to access as much data as possible. The reprocessing of the data should not be loaded onto researchers' shoulders, since only users skilled in every data management plan know how to merge the data.

The SeaDataCloud time series data collection is a Global Ocean soon-to-be-published dataset that will represent a reference for ocean researchers, released in the binary, user-friendly Ocean Data View format. The database management plan was originally designed for profiles but has been adapted for time series, resolving several issues such as the uniqueness of the identifiers (IDs).

Here we present an extension of the SOURCE (Sea Observations Utility for Reprocessing, Calibration and Evaluation) Python package, able to enhance the data quality with redundant, sophisticated methods and to simplify their usage.

SOURCE improves quality control (Q/C) performance on observations using statistical quality-check procedures that follow the ocean best-practices guidelines, addressing the following tasks:

1. Find and aggregate all broken time series using likeness in ID parameter strings;
2. Find and organize in a dictionary all different metadata variables;
3. Convert time series timestamps to simpler measurement units;
4. Filter out devices that lie outside a selected horizontal rectangle;
5. Give some information on the original Q/C scheme of the SeaDataCloud infrastructure;
6. Provide information tables on platforms and on the merged ID-string duplicates, together with an error log file (missing time, depth, or data; wrong Q/C variables; etc.).

In particular, the duplicates table and the log file may help SeaDataCloud partners to update the data collection and make it finally available to users.

The reconstructed SeaDataCloud time series data, divided by parameter and stored in a more flexible dataset, can then be ingested in the main part of the software, allowing comparison with Copernicus time series, identification of the same platform using horizontal and vertical surroundings (without relying on the ID), detection and cleanup of duplicated data, and merging of the two databases to extend the data coverage.

This allows researchers to release the widest and best-quality data possible to final users, and to use these data to calibrate and validate models, in order to obtain a picture of the sea conditions over a whole area.
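As a toy illustration of tasks 1 and 4 from the list above (ID-likeness grouping and bounding-box filtering), here is a minimal Python sketch assuming a simple tabular layout; this is not the SOURCE package API, and the ID heuristic is invented:

```python
import pandas as pd

# Assumed input: one row per time series fragment, with its platform ID
# string and position (column names and values are hypothetical).
series = pd.DataFrame({
    "id":  ["ARGO_6901", " ARGO_6901_b", "MOOR_E2M3A", "MOOR_W1M3A"],
    "lat": [43.1, 43.1, 41.5, 43.8],
    "lon": [7.9, 7.9, 18.1, 9.1],
})

# Task 1: aggregate broken time series by likeness of their ID strings
# (here: a normalized prefix; SOURCE's actual heuristic may differ).
series["base_id"] = (series["id"].str.strip()
                                 .str.extract(r"^([A-Z]+_\w{4})", expand=False))
fragments = series.groupby("base_id")["id"].apply(list)

# Task 4: keep only devices inside a selected horizontal rectangle.
LAT_MIN, LAT_MAX, LON_MIN, LON_MAX = 40.0, 46.0, 5.0, 12.0
inside = series[series["lat"].between(LAT_MIN, LAT_MAX)
                & series["lon"].between(LON_MIN, LON_MAX)]
print(fragments, inside[["id", "lat", "lon"]], sep="\n\n")
```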


2020 ◽  
Author(s):  
Mark Amo-Boateng

Abstract The novel coronavirus disease (COVID-19) pandemic has taken the world by surprise and simultaneously challenged the health infrastructure of every country. Governments have resorted to draconian measures to contain the spread of the disease despite the devastating effect on their economies and education. Tracking the novel coronavirus 2019 disease remains vital, as it influences the executive decisions needed to tighten or ease restrictions meant to curb the pandemic. One-dimensional (1D) convolutional neural networks (CNNs) have been used to classify and predict several kinds of time-series and sequence data. Here, a 1D-CNN is applied to the time-series data of confirmed COVID-19 cases for all reporting countries and territories. The model achieved an accuracy of 90.5%. The model was used to develop an automated AI tracker web app (AI Country Monitor) hosted at https://aicountrymonitor.org. This article also presents a novel concept of pandemic response curves based on cumulative confirmed cases that can be used to classify the stage of the pandemic in a country or reporting territory. It is our firm belief that this artificial intelligence COVID-19 tracker can be extended to other domains, such as the monitoring and tracking of the Sustainable Development Goals (SDGs), in addition to monitoring and tracking pandemics.
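The abstract does not specify the network architecture; the following Keras sketch shows one plausible 1D-CNN for classifying fixed-length windows of case counts (layer sizes, window length, class count, and data are invented for illustration, not the published model):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed setup: fixed-length windows of daily confirmed-case counts,
# with one pandemic-stage label per window (all placeholders).
WINDOW, N_CLASSES = 60, 4
X = np.random.rand(256, WINDOW, 1).astype("float32")  # placeholder series
y = np.random.randint(0, N_CLASSES, size=256)         # placeholder stages

model = keras.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),  # local temporal patterns
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```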


2015 ◽  
Vol 18 (2) ◽  
pp. 198-209 ◽  
Author(s):  
Jeffrey M. Sadler ◽  
Daniel P. Ames ◽  
Shaun J. Livingston

The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) hydrologic information system (HIS) is a widely used service-oriented system for time series data management. While this system is intended to empower the hydrologic sciences community with better data storage and distribution, it lacks support for the kind of ‘Web 2.0’ collaboration and social-networking capabilities being used in other fields. This paper presents the design, development, and testing of a software extension of CUAHSI's newest product, HydroShare. The extension integrates the existing CUAHSI HIS into HydroShare's social hydrology architecture. With this extension, HydroShare provides integrated HIS time series with efficient archiving, discovery, and retrieval of the data; extensive creator and science metadata; scientific discussion and collaboration around the data; and other basic social media features. HydroShare provides the functionality for online social interaction and collaboration, while the existing HIS provides the distributed data management and web services framework. The extension is expected to enable scientists to access and share both national- and laboratory-scale hydrologic time series datasets in a standards-based web services architecture combined with social media functionality developed specifically for the hydrologic sciences.
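The HIS publishes time series through WaterOneFlow SOAP web services; as a minimal retrieval sketch, here is a GetValues call using the Python zeep client (the WSDL URL, site code, and variable code are illustrative examples, not values taken from the paper):

```python
from zeep import Client

# Illustrative WaterOneFlow 1.1 endpoint; any CUAHSI HIS service exposing
# this WSDL should accept the same GetValues call.
WSDL = "https://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL"
client = Client(WSDL)

# GetValues returns a WaterML document with the requested time series.
waterml = client.service.GetValues(
    location="NWISDV:10109000",  # site code (network:site), example only
    variable="NWISDV:00060",     # variable code (discharge), example only
    startDate="2015-01-01",
    endDate="2015-01-31",
    authToken="",
)
print(waterml[:500])  # first part of the WaterML XML response
```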


2016 ◽  
Vol 97 (9) ◽  
pp. 1573-1581 ◽  
Author(s):  
John J. Bates ◽  
Jeffrey L. Privette ◽  
Edward J. Kearns ◽  
Walter Glance ◽  
Xuepeng Zhao

Abstract The key objective of the NOAA Climate Data Record (CDR) program is the sustained production of high-quality, multidecadal time series data describing the global atmosphere, oceans, and land surface that can be used for informed decision-making. The challenges of a long-term program of sustaining CDRs, as contrasted with short-term efforts of traditional 3-yr research programs, are substantial. The sustained production of CDRs requires collaboration between experts in the climate community, data management, and software development and maintenance. It is also informed by scientific application and associated user feedback on the accessibility and usability of the produced CDRs. The CDR program has developed a metric for assessing the maturity of CDRs with respect to data management, software, and user application and applied it to over 30 CDRs. The main lesson learned over the past 7 years is that a rigorous team approach to data management, employing subject matter experts at every step, is critical to open and transparent production. This approach also makes it much easier to support the needs of users who want near-real-time production of CDRs for monitoring and users who want to use CDRs for tailored, derived information, such as a drought index.
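The abstract does not reproduce the metric itself; as a hedged sketch of how a six-level maturity assessment across thematic categories could be encoded (the category names follow the general themes of the published CDR maturity matrix; the scores below are invented, not a real assessment):

```python
from dataclasses import dataclass

@dataclass
class MaturityAssessment:
    """One CDR scored on a 1 (research-grade) .. 6 (fully operational) scale."""
    cdr_name: str
    software_readiness: int
    metadata: int
    documentation: int
    product_validation: int
    public_access: int
    utility: int

    def overall(self) -> float:
        scores = [self.software_readiness, self.metadata, self.documentation,
                  self.product_validation, self.public_access, self.utility]
        return sum(scores) / len(scores)

# Invented example record for illustration only.
example = MaturityAssessment("Example SST CDR", 5, 4, 5, 4, 6, 3)
print(f"{example.cdr_name}: mean maturity {example.overall():.1f}")
```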


2020 ◽  
Author(s):  
Sebastian Drost ◽  
Jan Speckamp ◽  
Carsten Hollmann ◽  
Christian Malewski ◽  
Matthes Rieke ◽  
...  

The collection of hydrological measurement data comprises a broad range of challenges beyond the development and deployment of sensing devices. In particular, the transmission of the collected (raw) data to central data servers may be a challenging task, depending on the available infrastructure.

In our presentation we will discuss the applicability of Internet of Things (IoT) technologies to enable a lightweight data collection workflow relying on the Message Queuing Telemetry Transport (MQTT) protocol as well as the SensorThings API standard of the Open Geospatial Consortium (OGC). These standards are optimised to reduce communication overhead, to remain viable over resource-constrained communication links, and to support a seamless plug-and-play integration of new measurement devices.

As part of this presentation, we will introduce the communication patterns and messages used by the data collection mechanism. This will be combined with a discussion of how these IoT standards can be coupled to existing sensor hardware and which types of communication links can be used. Afterwards, we will discuss the design of a data management server that integrates the collected measurement data. This comprises, on the one hand, connectors to the IoT data streams and, on the other hand, data management and storage functionality as well as interoperable interfaces for sharing the collected data.

For the validation of the presented concept, a pre-operational deployment at the Wupperverband, a regional water management association in Germany, will be shown. This covers not only the practical experience gained during operation but also recommendations on future challenges such as semantic interoperability (e.g. vocabularies) and the efficient management of large amounts of incoming time series data (e.g. via dedicated database concepts).

In summary, we aim to contribute to the discussion on how IoT technologies may help to facilitate the collection of hydrological measurement data and to support the sharing of such data.
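As a minimal sketch of the described data flow, here is one way a station could push a single observation through the SensorThings API's MQTT extension (the broker host, topic, and Datastream ID are assumptions, not details of the Wupperverband deployment):

```python
import json
import paho.mqtt.client as mqtt

# Illustrative endpoint; publishing to this topic creates an Observation in
# the named Datastream, per the SensorThings API MQTT extension.
BROKER = "sensorthings.example.org"
TOPIC = "v1.1/Datastreams(42)/Observations"

observation = {
    "phenomenonTime": "2020-05-04T10:00:00Z",  # time of the measurement
    "result": 1.37,                            # measured value, e.g. gauge level
}

# With paho-mqtt >= 2.0, pass mqtt.CallbackAPIVersion.VERSION2 here.
client = mqtt.Client()
client.connect(BROKER, 1883)
client.publish(TOPIC, json.dumps(observation), qos=1)
client.disconnect()
```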


Author(s):  
Ovidiu Popa ◽  
Ellen Oldenburg ◽  
Oliver Ebenhöh

Today, massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow extracting useful information from the various types of sequence data. Here, we will first discuss the types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time series of biological sequence data have successfully been used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies in the extraction of knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects.
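As a small worked example of the review's central argument, that mechanistic hypotheses are tested by fitting dynamic models to time-series data, here is a Python sketch fitting a logistic growth ODE to synthetic abundance data (the model choice and data are invented for illustration):

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

# Logistic growth: dx/dt = r * x * (1 - x / K), a simple mechanistic model
# of population or abundance dynamics.
def logistic(x, t, r, K):
    return r * x * (1 - x / K)

# Synthetic "observed" time series: logistic curve plus noise.
t_obs = np.linspace(0, 10, 20)
x_obs = 10 / (1 + 9 * np.exp(-0.8 * t_obs))
x_obs += np.random.default_rng(1).normal(0, 0.2, t_obs.size)

# Residuals between the model trajectory and the observations.
def residuals(params):
    r, K = params
    x_model = odeint(logistic, x_obs[0], t_obs, args=(r, K)).ravel()
    return x_model - x_obs

# Estimate the mechanistic parameters (growth rate r, capacity K) from data.
fit = least_squares(residuals, x0=[0.5, 5.0])
print("estimated r = %.2f, K = %.2f" % tuple(fit.x))
```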

