Extending HydroShare to enable hydrologic time series data as social media

2015 ◽  
Vol 18 (2) ◽  
pp. 198-209 ◽  
Author(s):  
Jeffrey M. Sadler ◽  
Daniel P. Ames ◽  
Shaun J. Livingston

The Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) hydrologic information system (HIS) is a widely used service-oriented system for time series data management. While this system is intended to empower the hydrologic sciences community with better data storage and distribution, it lacks support for the kind of ‘Web 2.0’ collaboration and social-networking capabilities being used in other fields. This paper presents the design, development, and testing of a software extension of CUAHSI's newest product, HydroShare. The extension integrates the existing CUAHSI HIS into HydroShare's social hydrology architecture. With this extension, HydroShare provides integrated HIS time series with efficient archiving, discovery, and retrieval of the data; extensive creator and science metadata; scientific discussion and collaboration around the data; and other basic social media features. HydroShare provides functionality for online social interaction and collaboration, while the existing HIS provides the distributed data management and web services framework. The extension is expected to enable scientists to access and share both national- and laboratory-scale hydrologic time series datasets in a standards-based web services architecture combined with social media functionality developed specifically for the hydrologic sciences.

Hydrology ◽  
2018 ◽  
Vol 5 (4) ◽  
pp. 66 ◽  
Author(s):  
Wade Roberts ◽  
Gustavious P. Williams ◽  
Elise Jackson ◽  
E. James Nelson ◽  
Daniel P. Ames

Hydrologists use a number of tools to compare model results to observed flows. These include tools to pre-process the data, data frames to store and access data, visualization and plotting routines, error metrics for single realizations, and ensemble metrics for stochastic realizations to calibrate and evaluate hydrologic models. We present an open-source Python package to help characterize predicted and observed hydrologic time series data called hydrostats which has three main capabilities: Data storage and retrieval based on the Python Data Analysis Library (pandas), visualization and plotting routines using Matplotlib, and a metrics library that currently contains routines to compute over 70 different error metrics and routines for ensemble forecast skill scores. Hydrostats data storage and retrieval functions allow hydrologists to easily compare all, or portions of, a time series. For example, it makes it easy to compare observed and modeled data only during April over a 30-year period. The package includes literature references, explanations, examples, and source code. In this note, we introduce the hydrostats package, provide short examples of the various capabilities, and provide some background on programming issues and practices. The hydrostats package provides a range of tools to make characterizing and analyzing model data easy and efficient. The electronic supplement provides working hydrostats examples.
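The April-only comparison mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the hydrostats API: the data are synthetic, and the names `rmse`, `observed`, and `simulated` are assumptions made for this example. hydrostats itself builds on pandas, where the same subset would typically be selected with a month mask on a datetime index.

```python
import math
from datetime import date, timedelta


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(squared / len(observed))


# Synthetic daily flows for 1990-2019 (made-up data, for illustration only).
start, end = date(1990, 1, 1), date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range((end - start).days)]
observed = [
    10.0 + 5.0 * math.sin(2 * math.pi * d.timetuple().tm_yday / 365.25)
    for d in days
]
# The hypothetical model overpredicts by 1.0 in April and is exact otherwise.
simulated = [o + (1.0 if d.month == 4 else 0.0) for d, o in zip(days, observed)]

# Keep only the paired values that fall in April of any year.
april = [(s, o) for d, s, o in zip(days, simulated, observed) if d.month == 4]
april_sim, april_obs = zip(*april)

rmse_all = rmse(simulated, observed)
rmse_april = rmse(april_sim, april_obs)
print(f"RMSE, full record: {rmse_all:.3f}")
print(f"RMSE, April only:  {rmse_april:.3f}")
```

Restricting the metric to one month exposes a seasonal bias that the full-record score dilutes, which is the kind of comparison the abstract describes.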


2020 ◽  
Author(s):  
Paolo Oliveri ◽  
Simona Simoncelli ◽  
Pierluigi Di Pietro ◽  
Sara Durante

One of the main challenges for present and future ocean observations is finding best practices for data management: infrastructures such as Copernicus and SeaDataCloud already take responsibility for assembling, archiving, updating, and publishing data. Here we present the strengths and weaknesses of the SeaDataCloud temperature and salinity time series data collections, and in particular a tool able to recognize the different devices and platforms and to merge them with processed Copernicus platforms.

While Copernicus mainly aims to acquire and publish data quickly, SeaDataNet aims to publish data of the best available quality. These two data repositories should be considered together, since the originator can ingest the data into both infrastructures, into only one, or partially into both. As a result, data are sometimes only partially available in Copernicus or SeaDataCloud, which greatly affects researchers who want to access as much data as possible. The burden of data reprocessing should not fall on researchers, since only users skilled in every data management plan know how to merge the data.

The SeaDataCloud time series data collection is a soon-to-be-published Global Ocean dataset that will serve as a reference for ocean researchers, released in the binary, user-friendly Ocean Data View format. The database management plan was originally designed for profiles but has been adapted for time series, resolving several issues such as the uniqueness of the identifiers (IDs).

Here we present an extension of the SOURCE (Sea Observations Utility for Reprocessing, Calibration and Evaluation) Python package, able to enhance data quality with redundant, sophisticated methods and to simplify their usage.

SOURCE increases quality control (Q/C) performance on observations using statistical quality check procedures that follow the ocean best practices guidelines, addressing the following tasks:

1. Find and aggregate all broken time series using likeness in ID parameter strings;
2. Find and organize in a dictionary all different metadata variables;
3. Correct the time variable of the time series to match simpler units of measure;
4. Filter out devices that are outside a selected horizontal rectangle;
5. Give some information on the original Q/C scheme applied by the SeaDataCloud infrastructure;
6. Give information tables on platforms and on the merged ID string duplicates, together with an error log file (missing time, depth, data, wrong Q/C variables, etc.).

In particular, the duplicates table and the log file may help SeaDataCloud partners update the data collection and finally make it available to users.

The reconstructed SeaDataCloud time series data, divided by parameter and stored in a more flexible dataset, can then be ingested into the main part of the software, which makes it possible to compare them with Copernicus time series, find the same platform using horizontal and vertical surroundings (without looking at the ID), find and clean up duplicated data, and merge the two databases to extend the data coverage.

This allows researchers to obtain the widest and best-quality data possible for the final user release and to use these data to calibrate and validate models, in order to gain a picture of sea conditions over a whole area.
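Two of the tasks listed above, filtering devices by a horizontal rectangle and aggregating broken time series by ID likeness, can be sketched as below. This is only an illustration under stated assumptions: it does not use the SOURCE package, the platform records and IDs are invented, and the helper names `in_box` and `group_similar_ids`, the 0.8 similarity threshold, and the box limits are hypothetical choices.

```python
from difflib import SequenceMatcher


def in_box(lat, lon, south, north, west, east):
    """True when a position lies inside the selected horizontal rectangle."""
    return south <= lat <= north and west <= lon <= east


def group_similar_ids(ids, threshold=0.8):
    """Greedily group ID strings whose similarity to a group's first ID
    reaches the threshold (hypothetical rule, not SOURCE's own)."""
    groups = []
    for id_ in ids:
        for group in groups:
            if SequenceMatcher(None, id_, group[0]).ratio() >= threshold:
                group.append(id_)
                break
        else:
            groups.append([id_])
    return groups


# Invented platform records: (ID string, latitude, longitude).
platforms = [
    ("MOOR_A1_part1", 40.0, 10.0),
    ("MOOR_A1_part2", 40.0, 10.0),
    ("BUOY_Z9", 43.0, 15.0),
    ("DRIFT_77", 55.0, -20.0),  # lies outside the example rectangle
]

# Example rectangle (assumed limits, in degrees).
box = dict(south=30.0, north=46.0, west=-6.0, east=36.5)

kept_ids = [pid for pid, lat, lon in platforms if in_box(lat, lon, **box)]
groups = group_similar_ids(kept_ids)
print(kept_ids)
print(groups)
```

Here the two `MOOR_A1` fragments end up in one group, which is the kind of candidate merge that a duplicates table would report for manual review.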


2017 ◽  
Author(s):  
Marco T. Bastos ◽  
Dan Mercea ◽  
Arthur Charpentier

Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying a Gaussianization procedure to the data, we found that contentious communication on Twitter and Facebook forecasted onsite protest during the Indignados and Occupy protests, with bidirectional Granger-causality between online and onsite protest in the Occupy series. Conversely, the Vinegar demonstrations presented Granger-causality between Facebook and Twitter communication, and separately between protestors and injuries/arrests onsite. We conclude that the effective forecasting of protest activity likely varies across different instances of political unrest.
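The idea behind a Granger-causality test can be sketched with a lag-1 F-statistic: one series "Granger-causes" another if its past values improve a prediction that already uses the other series' own past. The code below is a minimal illustration on synthetic data, not the authors' procedure; it omits the Gaussianization step, and the names `ols_rss`, `granger_f`, `online`, and `onsite` are assumptions made for this example.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [v - factor * p for v, p in zip(a[idx], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(cause, effect, lag=1):
    """F-statistic for whether lagged `cause` helps predict `effect`."""
    steps = range(lag, len(effect))
    targets = [effect[t] for t in steps]
    # Restricted model: intercept plus the effect's own lags.
    restricted = [[1.0] + [effect[t - j] for j in range(1, lag + 1)] for t in steps]
    # Unrestricted model: additionally uses lags of the candidate cause.
    unrestricted = [
        row + [cause[t - j] for j in range(1, lag + 1)]
        for row, t in zip(restricted, steps)
    ]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic series: onsite activity follows online activity by one step.
rng = random.Random(0)
online = [rng.gauss(0, 1) for _ in range(200)]
onsite = [0.0] + [0.8 * online[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]

f_online_to_onsite = granger_f(online, onsite)
f_onsite_to_online = granger_f(onsite, online)
print(f"online -> onsite F = {f_online_to_onsite:.1f}")
print(f"onsite -> online F = {f_onsite_to_online:.1f}")
```

A large F in one direction and a small one in the other corresponds to unidirectional Granger-causality; large values in both directions would correspond to the bidirectional case reported for the Occupy series.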


2021 ◽  
Author(s):  
Shoko Wakamiya ◽  
Osamu Morimoto ◽  
Katsuhiro Omichi ◽  
Hideyuki Hara ◽  
Ichiro Kawase ◽  
...  

BACKGROUND Health-related social media data are increasingly being used in disease surveillance studies. In particular, surveillance of infectious diseases such as influenza has demonstrated high correlations between the number of social media posts mentioning the disease and the number of patients who went to the hospital and were diagnosed with the disease. However, the prevalence of some diseases, such as allergic rhinitis, cannot be estimated based on the number of patients alone. Specifically, patients with allergic rhinitis self-medicate by taking over-the-counter (OTC) medications without going to the hospital. Although allergic rhinitis is not a life-threatening disease, it is a major social problem because it reduces patients’ quality of life, making it essential to understand its prevalence and the motives for self-medication behavior. OBJECTIVE To help understand the prevalence of allergic rhinitis and the motives for self-care treatment using social media data, this study investigated the relationship between the number of social media posts mentioning the main symptoms of allergic rhinitis and the sales volume of OTC rhinitis medications in Japan. METHODS We collected tweets over four years from 2017 to 2020 that included keywords corresponding to the main nasal symptoms of allergic rhinitis: “sneezing,” “runny nose,” and “stuffy nose.” We also obtained the sales volume of OTC drugs, including oral medications and nasal sprays, for the same period. We then calculated the Pearson correlation coefficient between time series data on the number of tweets per week and time series data on the sales volume of OTC drugs per week. RESULTS The results showed a much higher correlation (0.8432) between the time series data on the number of tweets mentioning “stuffy nose” and the time series data on the sales volume of nasal sprays than for the other two symptoms. There was also a high correlation (0.9317) between the seasonal components of these time series data. 
CONCLUSIONS We investigated the relationships between social media data and behavioral patterns, such as OTC drug sales volume. Exploring these relationships would be useful as a marketing indicator to predict sales volume using social media data. In future, in-depth investigations are required to cover other diseases and countries.
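The correlation step described in the methods can be sketched as follows. The weekly counts are invented for illustration and are not the study's data; the function name `pearson` and the variable names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly series: tweets mentioning a symptom and OTC units sold.
weekly_tweets = [120, 150, 310, 480, 400, 220, 130, 110]
weekly_sales = [30, 35, 70, 95, 90, 50, 33, 28]

r = pearson(weekly_tweets, weekly_sales)
print(f"Pearson r = {r:.4f}")
```

The study applies the same coefficient both to the raw weekly series and to their seasonal components; extracting the seasonal component is a separate decomposition step not shown here.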

