When Data Management Meets Project Management

Author(s):  
Evgeniy Meyke

Complex projects that collect, curate, and analyse biodiversity data are often presented with the challenge of accommodating diverse data types, various curation and output workflows, and evolving project logistics that require rapid changes in applications and data structures. At the same time, sustainability concerns and maintenance overheads pose a risk to the long-term viability of such projects. We advocate the use of flexible, multiplatform tools that adapt to operational, day-to-day challenges while providing a robust, cost-efficient, and maintainable framework that serves the needs of data collectors, managers, and users. EarthCape is a highly versatile platform for managing biodiversity research and collections data, associated molecular laboratory data (Fig. 1), multimedia, structured ecological surveys and monitoring schemes, and more. The platform includes a fully functional Windows client as well as a web application. The data are stored in the cloud or on premises and can be accessed by users with various access and editing rights. Ease of customization (making changes to the user interface and functionality) is critical for most environments that deal with operational research processes. For active researchers and curators, there is rarely time to wait for a development cycle that follows a change or feature request. In EarthCape, most changes to the default setup can be implemented by end users with minimum effort and no programming skills. High flexibility and a range of customisation options are complemented by mapping to the Darwin Core standard and integration with the GBIF, GEOLocate, GenBank, and Biodiversity Heritage Library APIs. The system is currently used daily for rapid data entry, digitization, and sample tracking by organisations such as Imperial College, the University of Cambridge, the University of Helsinki, and the University of Oxford.
As an operational data entry and retrieval tool, EarthCape sits at the bottom of the Virtual Research Environments ecosystem. It is not a platform for building data repositories, but rather a focused tool in the "back office" software category. Routine label printing, laboratory notebook maintenance, rapid data entry setup, and other relatively data-heavy user interfaces can make use of any industry-standard relational database back end. This opens a wide scope for IT designers to implement the desired integrations within their institutional infrastructure. APIs and developer access to core EarthCape libraries, for building custom applications and modules, are under development. Basic data visualisation (charts, pivots, dashboards), mapping (a full-featured desktop GIS module), and data outputs (a report and label designer) are tailored not only to research analyses but also to managing logistics and communication when working on (data) papers. The presentation will focus on the software platform, featuring prominent use cases from two areas: ecological research (managing a complex network data digitization project) and museum collections management (herbarium and insect collections).
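To make the Darwin Core mapping mentioned above concrete, here is a minimal sketch in Python. The internal field names (`taxon`, `collected_on`, and so on) are invented for illustration and are not EarthCape's actual schema; the Darwin Core term names come from the published standard.

```python
# Hypothetical internal-field -> Darwin Core term mapping.
# Internal names are invented; Darwin Core terms are standard.
DWC_MAPPING = {
    "taxon": "scientificName",
    "collected_on": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collector": "recordedBy",
}

def to_darwin_core(record: dict) -> dict:
    """Translate an internal specimen record into Darwin Core terms."""
    return {dwc: record[field] for field, dwc in DWC_MAPPING.items() if field in record}

specimen = {
    "taxon": "Pieris brassicae",
    "collected_on": "2019-06-14",
    "lat": 60.17,
    "lon": 24.94,
    "collector": "E. Meyke",
}
print(to_darwin_core(specimen))
```

A declarative mapping table like this is what lets end users reconfigure exports without programming: only the dictionary changes, not the translation code.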

Author(s):  
Dmitry Dmitriev

TaxonWorks (http://taxonworks.org) is an integrated, open-source, cybertaxonomic web application serving taxonomists and biodiversity scientists. It is designed to facilitate efficient data capture, storage, manipulation, and retrieval. It integrates a wide variety of data types used by biodiversity scientists, including, but not limited to, taxonomy (with validation based on the codes of zoological, botanical, bacterial, and viral nomenclature), specimen data, bibliographies, media (images, PDFs, sounds, videos), morphology (character/trait matrices), distribution, and biological associations. TaxonWorks web interfaces currently provide various data entry forms and support simple and advanced querying of the database. TaxonWorks also includes integrated batch-upload functionality, but for larger datasets specialized migration scripts have been used. Several projects, historically built in 3i (http://dmitriev.speciesfile.org), MX (http://mx.phenomix.org), SpeciesFile (http://software.speciesfile.org), and other databases, have been or are being migrated into TaxonWorks. Notable examples include the 3i World Auchenorrhyncha Database, LepIndex, the Universal Chalcidoidea Database, Orthoptera SpeciesFile, Plecoptera SpeciesFile, and the Illinois Natural History Survey Insect Collection database, among others. Experiences from these data migrations will be shared during the presentation.


2021 ◽  
Author(s):  
Shohre Masoumi ◽  
Maxwell W. Libbrecht ◽  
Kay C. Wiese

Motivation: With the advancement of sequencing technologies, genomic data sets are constantly being expanded with high volumes of different data types. One recently introduced data type in genomic science is genomic signals, which are usually short-read coverage measurements over the genome. An example of genomic signals is epigenomic marks, which are used to locate functional and non-functional elements in genome annotation studies. To understand and evaluate the results of such studies, one needs to understand and analyze the characteristics of the input data.
Results: SigTools is an R-based genomic signal visualization package developed with two objectives: (1) to facilitate the exploration of genomic signals, in order to uncover insights for later model training, refinement, and development, through distribution and autocorrelation plots; and (2) to enable the interpretation of genomic signals through correlation and aggregation plots. Moreover, SigTools provides text-based descriptive statistics of the given signals, which can be practical when developing and evaluating learning models. We also include results from two case studies. The first examines several previously studied genomic signals, histone modifications; this use case demonstrates how SigTools can help satisfy scientists' curiosity in exploring established datasets. The second examines a dataset of novel chromatin state features, genomic signals generated by a learning model, and demonstrates how SigTools can assist in exploring the characteristics and behavior of novel signals towards their interpretation. In addition, our corresponding web application, SigTools-Shiny, extends the accessibility of these modules to people who are more comfortable working with graphical user interfaces than with command-line tools.
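The autocorrelation plots mentioned above summarise how similar a coverage signal is to a shifted copy of itself. A minimal sketch of the underlying computation, in plain Python rather than SigTools' actual R implementation, with a toy coverage track standing in for real data:

```python
# Sketch of lag-k autocorrelation of a genomic signal: a Pearson
# correlation between the signal and itself shifted by `lag` bins.
def autocorrelation(signal, lag):
    """Correlation of signal[:-lag] with signal[lag:] (lag=0 gives 1.0)."""
    n = len(signal) - lag
    x, y = signal[:n], signal[lag:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy coverage track: a smoothly rising and falling read-depth pattern.
coverage = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1] * 5
print(autocorrelation(coverage, 1))
```

A smooth signal like this toy track keeps high autocorrelation at small lags; a noisy signal decays quickly, which is the kind of characteristic the package helps expose before model training.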


2021 ◽  
pp. 016555152199863
Author(s):  
Ismael Vázquez ◽  
María Novo-Lourés ◽  
Reyes Pavón ◽  
Rosalía Laza ◽  
José Ramón Méndez ◽  
...  

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure the possibility of reproducing those results and comparing them with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following demanded functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques that use dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository's own functionality, and (4) providing protection mechanisms for licensing issues and user rights. To demonstrate this functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to facilitate understanding of this study.
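Functionalities (1) and (3) above amount to selecting a task-specific subset of a corpus and declaring the pre-processing chain as shareable code that can be re-run outside the repository. A minimal sketch under that reading, with all names invented (this is not the actual STRep API):

```python
# Hypothetical pre-processing steps, declared once so they can be
# shipped alongside the data set and reproduced independently.
def lowercase(text):
    return text.lower()

def strip_punctuation(text):
    return "".join(c for c in text if c.isalnum() or c.isspace())

PIPELINE = [lowercase, strip_punctuation]

def build_dataset(records, label_filter=None):
    """Select (text, label) records by label and apply the declared pipeline."""
    out = []
    for text, label in records:
        if label_filter and label not in label_filter:
            continue
        for step in PIPELINE:
            text = step(text)
        out.append((text, label))
    return out

raw = [("WIN a FREE prize!!!", "spam"), ("Meeting at 10am.", "ham")]
print(build_dataset(raw, label_filter={"spam"}))
```

Because the pipeline is ordinary code rather than a repository-internal setting, two research groups applying it to the same raw records obtain byte-identical inputs, which is the comparison guarantee functionality (2) asks for.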


2021 ◽  
Vol 16 (1) ◽  
pp. 21
Author(s):  
Chung-Yi Hou ◽  
Matthew S. Mayernik

For research data repositories, web interfaces are usually the primary, if not the only, method that data users have to interact with repository systems. Data users often search, discover, understand, access, and sometimes use data directly through repository web interfaces. Given that sub-par user interfaces can reduce the ability of users to locate, obtain, and use data, it is important to consider how repositories' web interfaces can be evaluated and improved in order to ensure useful and successful user interactions. This paper discusses how usability assessment techniques are being applied to improve the functioning of data repository interfaces at the National Center for Atmospheric Research (NCAR). At NCAR, a new suite of data system tools is being developed, collectively called the NCAR Digital Asset Services Hub (DASH). Usability evaluation techniques have been used throughout the NCAR DASH design and implementation cycles in order to ensure that the systems work well together for the intended user base. By applying user studies, paper prototyping, competitive analysis, journey mapping, and heuristic evaluation, the NCAR DASH Search and Repository experiences provide examples of how data systems can benefit from usability principles and techniques. Integrating usability principles and techniques into repository system design and implementation workflows helps to optimize the systems' overall user experience.


2017 ◽  
Author(s):  
James Hadfield ◽  
Colin Megill ◽  
Sidney M. Bell ◽  
John Huddleston ◽  
Barney Potter ◽  
...  

Summary: Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamic analysis, and an interactive visualisation platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualization integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, publicly available for use by health professionals, epidemiologists, virologists, and the public alike.
Availability and implementation: All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain, and the web application is available at nextstrain.org.


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Emily Roberts ◽  
Theron Jeppson ◽  
Rachelle Boulton ◽  
Josh Ridderhoff

Objective: To illustrate how the Utah Department of Health processes a high volume of electronic data by translating what reporters send within an HL7 message into "epidemiologist" language for consumption by our disease surveillance system.
Introduction: In 2013, the Utah Department of Health (UDOH) began working with hospital and reference laboratories to implement electronic laboratory reporting (ELR) of reportable communicable disease data. Laboratories utilize the HL7 message structure and standard terminologies such as LOINC and SNOMED to send data to UDOH. These messages must be evaluated for validity, translated, and entered into Utah's communicable disease surveillance system (UT-NEDSS), where they can be accessed by local and state investigators and epidemiologists. Despite the development and use of standardized terminologies, reporters may use different, outdated versions of these terminologies, may not use the appropriate codes, or may send local, home-grown terminologies. These variations cause problems when trying to interpret test results and automate data processing. UDOH has developed a two-step translation process that allows us first to standardize and clean incoming messages, and then to translate them for consumption by UT-NEDSS. These processes allow us to efficiently manage several different terminologies, and help to standardize incoming data, maintain data quality, and streamline the data entry process.
Methods: UDOH uses the Electronic Message Staging Area (EMSA) to receive ELR messages, manage terminologies such as LOINC and SNOMED, translate messages, and automatically enter laboratory data into UT-NEDSS. LOINCs and other terms, such as facility names, sent by reporting facilities in an HL7 message are considered child terms. All child terms are mapped to a master LOINC or term, and each master LOINC or term is mapped to a specific value within UT-NEDSS. In EMSA, the rules engine used for automated processing of electronic data runs at the master level, and these rules determine how a message is processed. No rules are set up or run on child terms.
Results: As of 09/20/2017, EMSA contains 2,613 unique child LOINCs that are mapped to 906 master LOINCs. Those 906 master LOINCs are mapped to 179 UT-NEDSS test types, and 2,003 child facility names are mapped to 1,043 master facility names.
Conclusions: Mapping child terminologies from an HL7 message to a master vocabulary helps us to standardize incoming data and allows us to accept non-standard terminologies and correct reporting errors. Translating these data into a format that is understandable to epidemiologists and investigators enables UT-NEDSS to work effectively in identifying outbreaks and improving health outcomes. This framework is working for ELR and will continue to grow to accept more data and the different terminologies that come with it.
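The two-step child-to-master-to-UT-NEDSS translation can be sketched as a pair of lookup tables. The codes and test-type names below are invented for illustration, not EMSA's actual vocabulary:

```python
# Step 1: many child terms (standard, local, or outdated codes from
# reporters) normalise to one master LOINC.
CHILD_TO_MASTER = {
    "600-7": "600-7",      # reporter already sent the master code
    "LAB600": "600-7",     # hypothetical local, home-grown code
    "600-7-OLD": "600-7",  # hypothetical outdated terminology version
}

# Step 2: each master LOINC maps to one UT-NEDSS test type.
MASTER_TO_NEDSS = {
    "600-7": "Blood culture, bacteria identified",  # illustrative label
}

def translate(child_loinc):
    """Two-step translation; unmapped child terms fall out for manual review."""
    master = CHILD_TO_MASTER.get(child_loinc)
    if master is None:
        return None
    return MASTER_TO_NEDSS.get(master)

print(translate("LAB600"))
```

Running the rules engine only at the master level, as described above, is what keeps the rule count proportional to the 906 masters rather than the 2,613 children.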


2019 ◽  
Author(s):  
Wenlong Jia ◽  
Hechen Li ◽  
Shiying Li ◽  
Shuaicheng Li

Summary: Visualizing integrated-level data from genomic research remains a challenge, as it requires sufficient coding skills and experience. Here, we present LandScape, a web-based application for interactive and real-time visualization of summarized genetic information. LandScape utilizes a well-designed file format that is capable of handling various data types, and offers a series of built-in functions to customize the appearance, explore results, and export high-quality diagrams suitable for publication.
Availability and implementation: LandScape is deployed at bio.oviz.org/demo-project/analyses/landscape for online use. Documentation and demo data are freely available on this website and on GitHub (github.com/Nobel-Justin/Oviz-Bio-demo).
Contact: [email protected]


2020 ◽  
Author(s):  
Annika Tjuka ◽  
Robert Forkel ◽  
Johann-Mattis List

Psychologists and linguists have collected a great diversity of data on word and concept properties. In psychology, many studies accumulate norms and ratings, such as word frequency or age of acquisition, often for large numbers of words. Linguistics, on the other hand, provides valuable insights into the relations between word meanings. We present 'NoRaRe,' a collection of such data sets of norms, ratings, and relations covering different languages. To enable comparison between the diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org), which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied to the study of word properties across multiple languages. The data can be used by psychologists and linguists to benefit from knowledge rooted in both research disciplines.
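The cross-linguistic comparison described above works by joining data sets through a shared Concepticon concept set identifier rather than through language-specific word forms. A minimal sketch of that join; the concept IDs, ratings, and structure here are invented for illustration, not NoRaRe's actual format:

```python
# Two hypothetical word-property data sets keyed by word form, each
# carrying a (concepticon_id, value) pair per word.
english_frequency = {
    "dog": (101, 4.8),   # invented concept IDs and ratings
    "tree": (102, 4.1),
}
german_aoa = {  # age-of-acquisition norms for German words
    "Hund": (101, 2.3),
    "Baum": (102, 3.0),
}

def link_by_concept(a, b):
    """Pair entries from two data sets that share a Concepticon concept set."""
    by_concept = {cid: (word, val) for word, (cid, val) in a.items()}
    return {
        cid: (by_concept[cid], (word, val))
        for word, (cid, val) in b.items()
        if cid in by_concept
    }

print(link_by_concept(english_frequency, german_aoa))
```

The join key is the concept, so an English frequency norm and a German age-of-acquisition norm become directly comparable even though no word form is shared.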


Author(s):  
Romulo de Almeida Neves ◽  
Willian Massami Watanabe ◽  
Rafael Oliveira

Context: Widgets are reusable User Interface (UI) components frequently delivered in web applications. In a web application, widgets implement different interaction scenarios, such as buttons, menus, and text input.
Problem: Tests are performed manually, so the cost associated with preparing and executing test cases is high.
Objective: Automate the process of generating functional test cases for web applications, using intermediate artifacts of the web development process that structure the widgets in the web application. The goal is to ensure the quality of the software and to reduce overall software lifecycle time and the costs associated with testing.
Method: We elaborated a test generation strategy and implemented it in a tool, Morpheus Web Testing. Morpheus Web Testing extracts widget information from Java Server Faces artifacts to generate test cases for JSF web applications. We conducted a case study comparing Morpheus Web Testing with a state-of-the-art tool (CrawlJax).
Results: The results indicate evidence that Morpheus Web Testing reached greater code coverage than CrawlJax.
Conclusion: The achieved coverage values represent evidence that the results obtained with the proposed approach contribute to the process of automated software testing in industry.
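The core idea of driving test generation from widget metadata can be sketched as follows. The widget list, IDs, and emitted step names below are invented stand-ins for what the tool parses out of JSF artifacts, not Morpheus Web Testing's actual output:

```python
# Hypothetical widget descriptions, standing in for information
# extracted from JSF page artifacts.
widgets = [
    {"id": "loginForm:username", "type": "inputText"},
    {"id": "loginForm:submit", "type": "commandButton"},
]

def generate_test_cases(widget_list):
    """Emit one functional test step per widget, based on its interaction type."""
    steps = []
    for w in widget_list:
        if w["type"] == "inputText":
            steps.append(f'type_into("{w["id"]}", sample_value())')
        elif w["type"] == "commandButton":
            steps.append(f'click("{w["id"]}")')
    return steps

for step in generate_test_cases(widgets):
    print(step)
```

Because the widget inventory comes from development artifacts rather than from crawling the rendered page, every declared widget is covered, which is one plausible reason for the coverage advantage over a crawler reported above.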

