Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication

2017 ◽  
Vol 12 (1) ◽  
pp. 72-87
Author(s):  
Cerys Willoughby ◽  
Jeremy G. Frey

Workflows processing data from research activities and driving in silico experiments are becoming an increasingly important method for conducting scientific research. Workflows have the advantage that not only can they be automated and used to process data repeatedly, but they can also be reused, in part or in whole, enabling them to be evolved for use in new experiments. A number of studies have investigated strategies for storing and sharing workflows for the benefit of reuse. These have revealed that simply storing workflows in repositories without additional context does not enable successful reuse. The same studies have investigated what additional resources are needed to support workflow users, in particular provenance traces and machine-readable representations of workflows and their resources. Other additions include metadata for curation, annotations for comprehension, and data sets that give the workflow additional context. Ultimately, though, these mechanisms still rely on researchers having access to the software needed to view and run the workflows. We argue that researchers may want an understanding of a workflow that goes beyond what provenance traces provide, without having to run the workflow directly; in many situations it can be difficult or impossible to run the original workflow. To that end, we have investigated the creation of an interactive workflow visualisation that captures the flow-chart element of the workflow together with additional context, including annotations, descriptions, parameters, metadata, and input, intermediate, and results data. Such a visualisation can be added to the record of a workflow experiment both to enhance curation and to add value that enables reuse. We have created interactive workflow visualisations for the popular workflow-creation tool KNIME, which provides no in-built function to extract provenance information; that information can otherwise be viewed only through the tool itself. Making use of KNIME's strengths for adding documentation and user-defined metadata, we can extract this information and create a visualisation and curation package that encourages and enhances curation@source, facilitating effective communication, collaboration, and reuse of workflows.
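By way of illustration of the extraction step described above: KNIME saves each workflow as a directory of XML files (a workflow.knime file for the workflow plus a settings.xml per node), so documentation and user-defined metadata can be harvested from disk without running KNIME. The Python sketch below is hypothetical and not the authors' tool; the XML keys it matches on vary across KNIME versions.

```python
# A minimal sketch of harvesting annotation- and name-like entries from
# a KNIME workflow directory. Assumes KNIME's on-disk layout of nested
# <config>/<entry key="..." value="..."/> XML; exact keys vary by version.
import xml.etree.ElementTree as ET
from pathlib import Path

def harvest(workflow_dir):
    """Collect (file, key, value) triples that look like documentation."""
    records = []
    root = Path(workflow_dir)
    for xml_file in list(root.rglob("*.knime")) + list(root.rglob("settings.xml")):
        for elem in ET.parse(xml_file).iter():
            key = elem.attrib.get("key", "")
            if any(w in key.lower() for w in ("annotation", "text", "name")):
                records.append((str(xml_file), key, elem.attrib.get("value", "")))
    return records

# Hypothetical usage: feed the records into a visualisation/curation package.
for source, key, value in harvest("MyWorkflow"):
    print(f"{source}: {key} = {value}")
```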

Author(s):  
Mark Ellisman ◽  
Maryann Martone ◽  
Gabriel Soto ◽  
Eleizer Masliah ◽  
David Hessler ◽  
...  

Structurally oriented biologists examine cells, tissues, organelles and macromolecules in order to gain insight into cellular and molecular physiology by relating structure to function. The understanding of these structures can be greatly enhanced by the use of techniques for the visualization and quantitative analysis of three-dimensional structure. Three projects from current research activities will be presented in order to illustrate both the present capabilities of computer aided techniques as well as their limitations and future possibilities. The first project concerns the three-dimensional reconstruction of the neuritic plaques found in the brains of patients with Alzheimer's disease. We have developed a software package "Synu" for investigation of 3D data sets, which has been used in conjunction with laser confocal light microscopy to study the structure of the neuritic plaque. Tissue sections of autopsy samples from patients with Alzheimer's disease were double-labeled for tau, a cytoskeletal marker for abnormal neurites, and synaptophysin, a marker of presynaptic terminals.


1994 ◽  
Vol 14 (3) ◽  
pp. 144-156
Author(s):  
Marcel P. J. M. Dijkers ◽  
Cynthia L. Creighton

Errors in processing data prior to analysis can cause significant distortion of research findings. General principles and specific techniques for cleaning data sets are presented. Strategies are suggested for preventing errors in transcribing, coding, and keying research data.
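By way of illustration only (the article's specific techniques are not reproduced here), two classic safeguards in this spirit are range checks on coded values and double-entry comparison of independently keyed copies. A minimal Python sketch, with invented variable names and valid ranges:

```python
# Hypothetical data-cleaning checks: range checks on coded values and
# double-entry comparison to catch keying errors.
import pandas as pd

RULES = {"age": (0, 110), "sex": (1, 2), "score": (0, 100)}  # invented ranges

def range_check(df):
    """Report values that fall outside each variable's allowed range."""
    for col, (lo, hi) in RULES.items():
        bad = df[(df[col] < lo) | (df[col] > hi)]
        for idx in bad.index:
            print(f"row {idx}: {col}={df.at[idx, col]} outside [{lo}, {hi}]")

def double_entry_check(first, second):
    """Compare two independently keyed copies of the same data set."""
    rows, cols = first.ne(second).values.nonzero()
    for r, c in zip(rows, cols):
        print(f"row {r}, {first.columns[c]}: "
              f"{first.iat[r, c]!r} vs {second.iat[r, c]!r}")
```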


2021 ◽  
pp. 089443932110415
Author(s):  
Vanessa Russo ◽  
Emiliano del Gobbo

The object of this research is to exploit Twitter's trending topic (TT) algorithm and identify the elements capable of guiding public opinion in the Italian panorama. The hypotheses that guide the article, confirmed by the research results, concern the existence of (a) a limited number of elements with very high viral power at the base of each popular hashtag and (b) hashtags that cut across the themes detected by the Twitter algorithm and define specific opinion polls. Through computational techniques, we extracted and processed data sets from six specific hashtags highlighted by TT. In a first step, using social network analysis, we analyzed the hashtag semantic network to identify the hashtags that cut across the six TTs. Subsequently, we selected for each data set the contents with high sharing power and created a "potential opinion leader" index to identify users with influencer characteristics. Finally, a cross section of social actors able to guide public opinion in the Twittersphere emerged from the intersection between potentially influential users and the viral content.
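The article's index construction is not reproduced here; as a hedged sketch of the general recipe (a hashtag co-occurrence network for the semantic analysis, then a user score combining resonance and reach), the following Python fragment uses invented tweet fields and an invented scoring formula:

```python
# Sketch of the two analysis steps: (1) a hashtag co-occurrence network,
# (2) a toy "potential opinion leader" score. All names are assumptions.
import itertools
import networkx as nx

def hashtag_network(tweets):
    """Nodes are hashtags; edge weights count co-occurrences in a tweet."""
    g = nx.Graph()
    for t in tweets:
        for a, b in itertools.combinations(sorted(set(t["hashtags"])), 2):
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1
            else:
                g.add_edge(a, b, weight=1)
    return g

def opinion_leader_scores(tweets):
    """Rank users by retweets earned, damped by raw follower count."""
    score = {}
    for t in tweets:
        u = t["user"]
        score[u] = score.get(u, 0.0) + t["retweets"] / (1 + t["followers"]) ** 0.5
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)
```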


2013 ◽  
Vol 765-767 ◽  
pp. 2590-2594
Author(s):  
Qian Jin Wang

Multi-core processors have become a hot topic because they improve processing speed, yet it is not easy to design efficient parallel data-processing algorithms, and poorly designed ones waste hardware resources. In this paper, a novel multitask parallel algorithm for finding the common substring of two strings is described, with the aim of improving the data-handling capacity of multi-core processors. The algorithm is implemented with the Task Parallel Library (TPL) in VS.NET, which schedules the tasks that process the data. The algorithm was tested on real data. The results demonstrate that it avoids wasting hardware resources and takes full advantage of multi-core parallel processing, enhancing the parallel speedup and greatly improving the efficiency of data processing.
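The paper's .NET TPL implementation is not reproduced here. As a rough Python illustration of the same multitask idea, the sketch below partitions the start positions of one string among worker processes, each finding the longest substring from its slice that also occurs in the second string:

```python
# Parallel common-substring search: each task handles a slice of start
# positions in `a`. For a fixed start i, the longest prefix of a[i:]
# occurring in b can be binary-searched, since every prefix of a match
# is itself a match.
from concurrent.futures import ProcessPoolExecutor

def longest_from_slice(args):
    a, b, start, stop = args
    best = ""
    for i in range(start, stop):
        lo, hi = 0, len(a) - i
        while lo < hi:                     # invariant: a[i:i+lo] occurs in b
            mid = (lo + hi + 1) // 2
            if a[i:i + mid] in b:
                lo = mid
            else:
                hi = mid - 1
        if lo > len(best):
            best = a[i:i + lo]
    return best

def parallel_common_substring(a, b, workers=4):
    step = max(1, len(a) // workers)
    chunks = [(a, b, s, min(s + step, len(a))) for s in range(0, len(a), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(longest_from_slice, chunks), key=len)

if __name__ == "__main__":
    print(parallel_common_substring("parallel data processing",
                                    "data processing algorithms"))
```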


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Rick Conzemius ◽  
Michaela Hendling ◽  
Stephan Pabinger ◽  
Ivan Barišić

The development of multiplex polymerase chain reaction and microarray assays is challenging due to primer dimer formation, unspecific hybridization events, the generation of unspecific by-products, primer depletion, and thus lower amplification efficiencies. We have developed a software workflow with three underlying algorithms that differ in their use case and specificity, allowing the complete in silico evaluation of such assays on user-derived data sets. We experimentally evaluated the method for the prediction of oligonucleotide hybridization events, including the resulting products and probes, self-dimers, cross-dimers, and hairpins at different experimental conditions. The developed method makes it possible to explain the observed artefacts through in silico WGS data and thermodynamic predictions. PRIMEval is publicly available at https://primeval.ait.ac.at.
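PRIMEval's own algorithms are not shown here; purely to illustrate one of the named problems, the sketch below screens primer pairs for 3'-end cross-dimers by counting the complementary run at their 3' termini (real tools also score internal alignments and thermodynamics):

```python
# Simplified cross-dimer screen: count how many bases, starting from the
# 3' ends of two primers (each written 5'->3'), can pair antiparallel.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def three_prime_overlap(p, q):
    """Contiguous complementary run counted from both 3' ends."""
    n = 0
    for x, y in zip(reversed(p), reversed(q)):
        if COMP.get(x) == y:
            n += 1
        else:
            break
    return n

def screen_pairs(primers, threshold=4):
    """Flag pairs (including self-pairs) with long 3' complementary runs."""
    names = sorted(primers)
    hits = []
    for i, a in enumerate(names):
        for b in names[i:]:
            k = three_prime_overlap(primers[a], primers[b])
            if k >= threshold:
                hits.append((a, b, k))
    return hits

if __name__ == "__main__":
    print(screen_pairs({"fwd": "CTGAAGCGC", "rev": "ATACCGCG"}))
    # -> [('fwd', 'rev', 4)]
```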


Metabolites ◽  
2019 ◽  
Vol 9 (7) ◽  
pp. 144 ◽  
Author(s):  
Madeleine Ernst ◽  
Kyo Bin Kang ◽  
Andrés Mauricio Caraballo-Rodríguez ◽  
Louis-Felix Nothias ◽  
Joe Wandy ◽  
...  

Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the chemical insight one can obtain from a data set. However, integration is currently hampered because each tool has its own output format, and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data while also illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome by combining multiple independent in silico pipelines.
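MolNetEnhancer's concrete file formats are not described in this abstract; the sketch below illustrates only the integration step it addresses, joining per-spectrum outputs of several tools on a shared scan identifier (file and column names are hypothetical):

```python
# Hypothetical integration of per-spectrum tool outputs on a shared
# "scan" identifier, the kind of matching the abstract says is lacking.
import pandas as pd

network = pd.read_csv("molecular_network_nodes.tsv", sep="\t")   # scan, family
motifs = pd.read_csv("ms2lda_motifs.tsv", sep="\t")              # scan, motif
candidates = pd.read_csv("insilico_candidates.tsv", sep="\t")    # scan, candidate
classes = pd.read_csv("classyfire_classes.tsv", sep="\t")        # scan, class

merged = network
for table in (motifs, candidates, classes):
    merged = merged.merge(table, on="scan", how="left")          # keep all nodes
merged.to_csv("enhanced_network_nodes.tsv", sep="\t", index=False)
```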


2019 ◽  
Vol 7 (2) ◽  
pp. 127-132
Author(s):  
Brigitte Sarah Renyoet ◽  
Hildagardis Meliyani Erista Nai

Background: Wasting among children under five is currently increasing; the high risk of malnutrition continues to grow, raising the prevalence of nutritional problems and resulting in decreased productivity. Objectives: To estimate the potential economic losses due to wasting in children under five. Methods: Descriptive research, processing data from various related agencies, all in the form of secondary data. Calculations used the Konig (1995) formula and a correction factor from Horton's (1999) study. The research was carried out from July 2018 to September 2018. Results: Based on the national prevalence of wasting in children under five in 2013, the potential economic loss amounts to IDR 1.042 billion to IDR 4.687 billion, or 0.01% to 0.06% of Indonesia's total GDP. Conclusion: A high prevalence of wasting can increase potential economic losses and affect a country's economy, especially in developing countries, one of which is Indonesia.


2019 ◽  
Vol 2 (2) ◽  
pp. 169-187 ◽  
Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely properly indexed in search engines. This means that rich data sets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes and just three clicks. Over time, its use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both its users and the scientific community as a whole.
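The codebook package itself is written in R; purely to illustrate the kind of metadata such a codebook captures (variable labels, types, descriptives, missingness), here is a rough Python analogue:

```python
# Toy codebook generator: one row of metadata per variable in a data set.
import pandas as pd

def make_codebook(df, labels=None):
    """Summarize each column: label, type, missingness, basic descriptives."""
    labels = labels or {}
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "variable": col,
            "label": labels.get(col, ""),
            "type": str(s.dtype),
            "n_missing": int(s.isna().sum()),
            "n_unique": int(s.nunique()),
            "mean": s.mean() if pd.api.types.is_numeric_dtype(s) else None,
        })
    return pd.DataFrame(rows)
```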


2014 ◽  
Vol 704 ◽  
pp. 233-238
Author(s):  
Laura Niendorf ◽  
Markus Grosse Boeckmann ◽  
Robert Schmitt

The research on and practical use of data and data mining in production environments is still at an early stage. Although almost every manufacturing company collects a great deal of process- and product-related data, companies often neither use nor deploy these data to optimize or even analyze their production processes. The acquisition of process data brings several advantages: implicit knowledge is permanently stored, and it becomes possible to learn from previous process failures. The acquired knowledge can then be applied to all future production tasks. Although many research activities have been undertaken since the late 1990s, none of them has managed the transfer to practical use. To encourage the practical transfer of data mining to production environments, this paper presents a metrology-based test set-up and the challenges that arise when consistently acquiring and processing inhomogeneous process, product, and machine data. For the experimental set-up, on-machine metrology systems were developed and integrated into a 5-axis milling machine to obtain highly meaningful data.
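One concrete instance of the challenge named above, aligning inhomogeneous data sources sampled at different rates, can be sketched as a nearest-timestamp join; the column names below are invented for illustration:

```python
# Align on-machine metrology readings with faster-sampled machine data
# by joining each reading to the nearest earlier machine record.
import pandas as pd

machine = pd.DataFrame({
    "time": pd.to_datetime(["2014-01-01 10:00:00.00", "2014-01-01 10:00:00.10"]),
    "spindle_load": [0.42, 0.47],
})
metrology = pd.DataFrame({
    "time": pd.to_datetime(["2014-01-01 10:00:00.05"]),
    "surface_deviation_um": [1.8],
})

# merge_asof requires both frames to be sorted on the key column
aligned = pd.merge_asof(metrology.sort_values("time"),
                        machine.sort_values("time"),
                        on="time", direction="backward")
print(aligned)   # metrology reading paired with spindle_load 0.42
```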

