There is more to transparency in research than open-source codes of data processing

2016 ◽  
Author(s):  
Olaf Arie Cirpka
2020 ◽  
Author(s):  
K. Thirumalesh ◽  
Salgeri Puttaswamy Raju ◽  
Hiriyur Mallaiah Somashekarappa ◽  
Kumaraswamy Swaroop

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Daming Yang ◽  
Yongjian Huang ◽  
Zongyang Chen ◽  
Qinghua Huang ◽  
Yanguang Ren ◽  
...  

Abstract: Fischer plots are widely used in paleoenvironmental research as graphical representations of sea- and lake-level change, mapping the linearly corrected variation of cumulative cycle thickness against cycle number or stratum depth. Several kinds of paleoenvironmental proxy data (especially subsurface data such as natural gamma-ray logs), which preserve continuous cyclic signals and have been collected in large volumes, are potential material for constructing Fischer plots. However, manually counting the cycles preserved in these proxy data and mapping Fischer plots from them is laborious. In this paper, we introduce an original open-source Python code, "PyFISCHERPLOT", for constructing Fischer plots in batches from paleoenvironmental proxy data series. We present the principle of constructing Fischer plots from proxy data, the data processing and usage of the PyFISCHERPLOT code, and application cases, and we compare the code with existing methods for constructing Fischer plots.
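As a rough illustration of the construction described above, the sketch below computes a Fischer curve as the cumulative departure of each cycle's thickness from the mean thickness, plotted against cycle number. It is a minimal stand-in, not PyFISCHERPLOT itself, and the cycle thicknesses are hypothetical; the real code derives cycles from proxy series such as gamma-ray logs.

```python
# Minimal Fischer-plot sketch: cumulative deviation of cycle thickness
# from the mean thickness, stepped against cycle number.
import numpy as np
import matplotlib.pyplot as plt

def fischer_curve(thicknesses):
    """Cumulative departure of each cycle's thickness from the mean."""
    t = np.asarray(thicknesses, dtype=float)
    return np.cumsum(t - t.mean())

cycles = [1.2, 0.9, 1.5, 2.1, 1.8, 0.7, 0.5, 1.0]  # hypothetical thicknesses (m)
deviation = fischer_curve(cycles)

plt.step(np.arange(1, len(cycles) + 1), deviation, where="post")
plt.xlabel("Cycle number")
plt.ylabel("Cumulative thickness deviation (m)")
plt.title("Fischer plot (illustrative sketch)")
plt.show()
```

A rising limb of the curve records cycles thicker than average (conventionally read as accommodation gain), while a falling limb records cycles thinner than average.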


2012 ◽  
Vol 51 (05) ◽  
pp. 441-448 ◽  
Author(s):  
P. F. Neher ◽  
I. Reicht ◽  
T. van Bruggen ◽  
C. Goch ◽  
M. Reisert ◽  
...  

Summary. Background: Diffusion MRI provides a unique window on brain anatomy and insights into aspects of tissue structure in living humans that could not be studied previously. A major effort in this rapidly evolving field of research is the development of the algorithmic tools necessary to cope with the complexity of the datasets.

Objectives: This work illustrates our strategy, which encompasses the development of a modularized, open software tool for data processing, visualization, and interactive exploration in diffusion imaging research, and aims at reinforcing sustainable evaluation and progress in the field.

Methods: In this paper, the usability and capabilities of a new application and toolkit component of the Medical Imaging Interaction Toolkit (MITK, www.mitk.org), MITK-DI, are demonstrated using in-vivo datasets.

Results: MITK-DI provides a comprehensive software framework for high-performance data processing, analysis, and interactive data exploration, designed in a modular, extensible fashion (using CTK) and in adherence to widely accepted coding standards (e.g. ITK, VTK). MITK-DI is available both as an open-source software development toolkit and as a ready-to-use installable application.

Conclusions: The open-source release of the modular MITK-DI tools will increase verifiability and comparability within the research community and is an important step towards bringing many current techniques to clinical application.
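The toolkit itself is written in C++, but as a language-neutral illustration of one core processing step in diffusion imaging, the sketch below fits a diffusion tensor to diffusion-weighted signals by log-linear least squares. This is the generic textbook method, not MITK-DI's implementation; the b-values, gradient directions, and signals are synthetic.

```python
# Log-linear least-squares diffusion tensor fit:
# ln S = ln S0 - b * g^T D g, solved for the 6 unique tensor elements and ln S0.
import numpy as np

def fit_tensor(signals, bvals, bvecs):
    """Fit Dxx, Dyy, Dzz, Dxy, Dxz, Dyz and S0 from diffusion-weighted signals."""
    g = np.asarray(bvecs, float)   # (N, 3) unit gradient directions
    b = np.asarray(bvals, float)   # (N,) b-values in s/mm^2
    B = -b[:, None] * np.column_stack([
        g[:, 0]**2, g[:, 1]**2, g[:, 2]**2,
        2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2], 2 * g[:, 1] * g[:, 2],
    ])
    A = np.column_stack([B, np.ones(len(b))])
    x, *_ = np.linalg.lstsq(A, np.log(signals), rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz, ln_s0 = x
    D = np.array([[dxx, dxy, dxz],
                  [dxy, dyy, dyz],
                  [dxz, dyz, dzz]])
    return D, np.exp(ln_s0)

# Synthetic acquisition: three b=0 volumes plus 30 directions at b=1000,
# signals generated from a known anisotropic tensor and then recovered.
rng = np.random.default_rng(0)
bvecs = rng.normal(size=(30, 3))
bvecs /= np.linalg.norm(bvecs, axis=1, keepdims=True)
bvecs = np.vstack([np.tile([1.0, 0.0, 0.0], (3, 1)), bvecs])  # b=0 rows first
bvals = np.r_[np.zeros(3), np.full(30, 1000.0)]
D_true = np.diag([1.7e-3, 0.3e-3, 0.3e-3])                    # mm^2/s
S = 100.0 * np.exp(-bvals * np.einsum("ni,ij,nj->n", bvecs, D_true, bvecs))
D_est, s0 = fit_tensor(S, bvals, bvecs)                       # D_est ~ D_true
```

The b=0 acquisitions are needed here: with a single non-zero shell and unit gradients, the design matrix would otherwise be rank-deficient.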


2019 ◽  
Author(s):  
H. Soon Gweon ◽  
Liam P. Shaw ◽  
Jeremy Swann ◽  
Nicola De Maio ◽  
Manal AbuOun ◽  
...  

Abstract. Background: Shotgun metagenomics is increasingly used to characterise microbial communities, particularly for the investigation of antimicrobial resistance (AMR) in different animal and environmental contexts. There are many different approaches for inferring the taxonomic composition and AMR gene content of complex community samples from shotgun metagenomic data, but there has been little work establishing the optimum sequencing depth, data processing, and analysis methods for these samples. In this study we used shotgun metagenomics and sequencing of cultured isolates from the same samples to address these issues. We sampled three potential environmental AMR gene reservoirs (pig caeca, river sediment, effluent) and sequenced samples with shotgun metagenomics at high depth (~200 million reads per sample). Alongside this, we cultured single-colony isolates of Enterobacteriaceae from the same samples and used hybrid sequencing (short and long reads) to create high-quality assemblies for comparison to the metagenomic data. To automate data processing, we developed an open-source software pipeline, 'ResPipe'.

Results: Taxonomic profiling was much more stable to sequencing depth than AMR gene content. One million reads per sample was sufficient to achieve <1% dissimilarity to the full taxonomic composition. However, at least 80 million reads per sample were required to recover the full richness of the different AMR gene families present in the sample, and additional allelic diversity of AMR genes was still being discovered in effluent at 200 million reads per sample. Normalising the number of reads mapping to AMR genes using gene length and an exogenous spike of Thermus thermophilus DNA substantially changed the estimated gene abundance distributions. While the majority of the genomic content of cultured isolates from effluent was recoverable using shotgun metagenomics, this was not the case for pig caeca or river sediment.

Conclusions: Sequencing depth and profiling method can critically affect the profiling of polymicrobial animal and environmental samples with shotgun metagenomics. Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified by the other method. Particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database. ResPipe, the open-source software pipeline we have developed, is freely available (https://gitlab.com/hsgweon/ResPipe).
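As a sketch of the normalisation step the abstract mentions, the snippet below scales an AMR gene's read count by gene length and by the coverage of the exogenous spike-in. The exact formula implemented in ResPipe may differ, and all counts and lengths here are hypothetical.

```python
# Length- and spike-normalised abundance: per-base coverage of the gene
# divided by per-base coverage of the exogenous spike-in.
def normalise(gene_reads, gene_len_bp, spike_reads, spike_len_bp):
    """Gene coverage relative to spike-in coverage (unitless)."""
    gene_cov = gene_reads / gene_len_bp    # mean per-base coverage of the gene
    spike_cov = spike_reads / spike_len_bp # mean per-base coverage of the spike
    return gene_cov / spike_cov

# Hypothetical sample: 1,500 reads on a 1,200 bp beta-lactamase gene,
# 80,000 reads on a ~2 Mbp T. thermophilus spike-in genome.
print(normalise(1500, 1200, 80000, 2.0e6))
```

Dividing by gene length removes the bias whereby longer genes attract more reads; dividing by spike-in coverage makes abundances comparable across samples with different sequencing depths.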


2020 ◽  
Vol 6 (5) ◽  
pp. 0585-0593
Author(s):  
Bruna Couto Molinar Henrique ◽  
Leonardo Couto Molinar Henrique ◽  
Humberto Molinar Henrique

This work deals with the implementation of an experimental flow-rate control unit using free, low-cost hardware and software. The open-source software Processing was used to develop the source code and the graphical user interface, and the open-source electronic prototyping platform Arduino was used to acquire data from the experimental unit. The work describes the experimental setup, the real-time PID controllers used, and theoretical and conceptual aspects of Arduino. PID controllers based on internal model control, minimization of the integral of time-weighted absolute error, Ziegler-Nichols, and other tuning rules were tuned for setpoint and load changes, and real-time runs were carried out to exercise, in real time, the control theory taught in the classroom. The results show that the developed platform is suitable for experimental setups, allowing users to compare their ideas and expectations with experimental evidence in a realistic, low-cost fashion. In addition, the instrumentation is simple to configure, has an acceptable noise level, and is particularly useful for teaching control and automation.
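For illustration, a minimal discrete form of the PID law implemented by such a unit is sketched below in Python (the authors implemented theirs in Processing/Arduino). The gains and sampling period are hypothetical placeholders, not the tunings reported in the work.

```python
# Discrete PID controller with output clamping and simple anti-windup:
# u = Kp*e + Ki*integral(e) + Kd*de/dt, evaluated every dt seconds.
class PID:
    def __init__(self, kp, ki, kd, dt, out_min=0.0, out_max=100.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Anti-windup: undo this step's integration while saturated, then clamp.
        if u > self.out_max:
            self.integral -= error * self.dt
            u = self.out_max
        elif u < self.out_min:
            self.integral -= error * self.dt
            u = self.out_min
        return u

pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)            # hypothetical gains, 100 ms loop
u = pid.update(setpoint=10.0, measurement=7.5)        # e.g. % valve opening
```

In the hardware loop, `measurement` would come from the Arduino's flow sensor reading and `u` would drive the actuator once per sampling period.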


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. They offer many advanced features in addition to conventional RDBMS features, which is why "NoSQL" databases are popularly read as "Not only SQL" databases. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available as both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.


Author(s):  
Utku Köse

Using open-source software in e-learning applications is one of the most popular ways of improving the effectiveness of e-learning processes without incurring additional costs, while retaining the freedom to modify the software as needed. It is therefore important to understand what is required when using an e-learning-oriented open-source system and how to work with its source code. A good option at this point is to add features and functions that make the open-source software more intelligent and practical, improving both the teaching and the learning experience. In this context, the objective of this chapter is to discuss possible applications of artificial intelligence for embedding optimization processes within open-source software systems used in e-learning activities. In particular, the chapter focuses on using swarm intelligence and machine learning techniques for this purpose and presents theoretical views on improving the effectiveness of such software for a better e-learning experience.
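As a concrete example of the kind of swarm-intelligence optimizer the chapter has in mind, the sketch below implements a bare-bones particle swarm optimization (PSO) loop. The quadratic objective is a stand-in; in an e-learning system it might instead score a candidate ordering of learning materials or a set of content-recommendation weights.

```python
# Bare-bones particle swarm optimization: particles track their personal best
# and the swarm's global best, and velocities blend inertia with both pulls.
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(42)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))  # positions
    v = np.zeros_like(x)                            # velocities
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, score = pso(lambda p: np.sum(p**2), dim=3)  # toy objective; minimum at 0
```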


Author(s):  
Poonam Nandal ◽  
Deepa Bura ◽  
Meeta Singh

In today's world, where data accumulates at an ever-increasing rate, processing this big data has become a necessity rather than an option. It requires tools for processing and analysing the data so that meaningful results or outcomes can be obtained from it. Many tools for processing big data are available in the market, but the main focus of this chapter is Apache Hadoop, an open-source software framework that can be efficiently deployed for processing, storing, and analysing large data sets and for producing meaningful insights from them. If the exponential growth of data is the processing challenge, Hadoop can be considered one of the most effective solutions for processing, managing, analysing, and storing this big data. Hadoop versions and components are illustrated in a later section of the chapter. The chapter focuses on the techniques, components, and methodology adopted by the Apache Hadoop software framework for big data processing.
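To make the MapReduce model behind Hadoop concrete, the sketch below simulates the classic word count in plain Python: a map phase emitting (word, 1) pairs, a shuffle-and-sort phase grouping by key, and a reduce phase summing the counts. In a real deployment the mapper and reducer run as distributed tasks (for example via Hadoop Streaming); the input lines here are made up.

```python
# Local simulation of MapReduce word count: map -> shuffle/sort -> reduce.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum all counts emitted for one word.
    return word, sum(counts)

lines = ["hadoop stores big data", "hadoop processes big data"]  # made-up input
pairs = [kv for line in lines for kv in mapper(line)]
pairs.sort(key=itemgetter(0))  # shuffle-and-sort: bring identical keys together
for word, group in groupby(pairs, key=itemgetter(0)):
    print(reducer(word, (count for _, count in group)))
```

The appeal of the model is that the map and reduce functions are pure and per-key, so Hadoop can parallelise them across a cluster and rerun failed tasks without coordination.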

