Neuroscience Cloud Analysis As a Service

Abstract. A major goal of computational neuroscience is to develop powerful analysis tools that operate on large datasets. These methods provide an essential toolset to unlock scientific insights from new experiments. Unfortunately, a major obstacle currently impedes progress: while existing analysis methods are frequently shared as open source software, the infrastructure needed to deploy these methods (at scale, reproducibly, cheaply, and quickly) remains totally inaccessible to all but a minority of expert users. As a result, many users cannot fully exploit these tools, due to constrained computational resources (limited or costly compute hardware) and/or mismatches in expertise (experimentalists vs. large-scale computing experts). In this work we develop Neuroscience Cloud Analysis As a Service (NeuroCAAS): a fully managed infrastructure platform, based on modern large-scale computing advances, that makes state-of-the-art data analysis tools accessible to the neuroscience community. We offer NeuroCAAS as an open source service with a drag-and-drop interface, entirely removing the burden of infrastructure expertise, purchasing, maintenance, and deployment. NeuroCAAS is enabled by three key contributions. First, NeuroCAAS cleanly separates tool implementation from usage, allowing cutting-edge methods to be served directly to the end user with no need to read or install any analysis software. Second, NeuroCAAS automatically scales as needed, providing reliable, highly elastic computational resources that are more efficient than personal or lab-supported hardware, without management overhead. Finally, we show that many popular data analysis tools offered through NeuroCAAS outperform typical analysis solutions (in terms of speed and cost) while improving ease of use and maintenance, dispelling the myth that cloud compute is prohibitively expensive and technically inaccessible. By removing barriers to fast, efficient cloud computation, NeuroCAAS can dramatically accelerate both the dissemination and the effective use of cutting-edge analysis tools for neuroscientific discovery.


Open Plot Project: an open-source toolkit for 3-D structural data analysis

Solid Earth ◽

10.5194/se-2-53-2011 ◽

2011 ◽

Vol 2 (1) ◽

pp. 53-63 ◽

Cited By ~ 18

Author(s):

S. Tavani ◽

P. Arbues ◽

M. Snidero ◽

N. Carrera ◽

J. A. Muñoz

Keyword(s):

Spatial Distribution ◽

Data Analysis ◽

Open Source ◽

Open Source Software ◽

Source Code ◽

Structural Data ◽

Geological Modelling ◽

Analysis Tools ◽

Transect Analysis ◽

Selection Of

Abstract. In this work we present the Open Plot Project, an open-source software package for structural data analysis that includes a 3-D environment. The software offers many classical functionalities of structural data analysis tools, such as stereoplots, contouring, tensorial regression, scatterplots, histograms and transect analysis. In addition, efficient filtering tools allow the selection of data according to their attributes, including spatial distribution and orientation. This first alpha release represents a stand-alone toolkit for structural data analysis. The presence of a 3-D environment with digitising tools allows the integration of structural data with information extracted from georeferenced images to produce structurally validated dip domains. This, coupled with many import/export facilities, allows easy incorporation of structural analyses in workflows for 3-D geological modelling. Accordingly, Open Plot Project is also a candidate structural add-on for 3-D geological modelling software. The software (for both Windows and Linux O.S.), the User Manual, a set of example movies (complementary to the User Manual), and the source code are provided as a Supplement. We intend the publication of the source code to set the foundation for free, public software that, hopefully, the structural geologists' community will use, modify, and implement. The creation of additional public controls/tools is strongly encouraged.


Neuroscience Cloud Analysis as a Service: An Open Source Platform for Scalable, Reproducible Data Analysis

SSRN Electronic Journal ◽

10.2139/ssrn.3877562 ◽

2021 ◽

Author(s):

Taiga Abe ◽

Ian Kinsella ◽

Shreya Saxena ◽

E. Kelly Buchanan ◽

Joao Couto ◽

...

Keyword(s):

Data Analysis ◽

Open Source ◽

Cloud Analysis


Interoperable and scalable data analysis with microservices: Applications in Metabolomics

10.1101/213603 ◽

2017 ◽

Cited By ~ 2

Author(s):

Payam Emami Khoonsari ◽

Pablo Moreno ◽

Sven Bergmann ◽

Joachim Burman ◽

Marco Capuccini ◽

...

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Large Scale ◽

Metabolite Identification ◽

Access Point ◽

Scientific Discipline ◽

Resonance Spectroscopy ◽

Magnetic Resonance Spectroscopy Study ◽

Analysis Workflow ◽

Computational Resources

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites, resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, multivariate statistics, and metabolite identification. The microservice approach is a generic methodology that can serve any scientific discipline and opens the way for new types of large-scale integrative science.


MS-PyCloud: An open-source, cloud computing-based pipeline for LC-MS/MS data analysis

10.1101/320887 ◽

2018 ◽

Cited By ~ 2

Author(s):

Li Chen ◽

Bai Zhang ◽

Michael Schnaubelt ◽

Punit Shah ◽

Paul Aiyetan ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Open Source ◽

High Performance ◽

Large Scale ◽

Rapid Development ◽

Data File ◽

Data Sets ◽

Proteomics Data ◽

Amazon Web Services

Abstract. Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing, and in the long processing time that discourages the on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/


Cloud Computing for BioLabs

Cloud Technology ◽

10.4018/978-1-4666-6539-2.ch058 ◽

2015 ◽

pp. 1272-1293

Author(s):

Abraham Pouliakis ◽

Aris Spathis ◽

Christine Kottaridi ◽

Antonia Mourtzikou ◽

Marilena Stamouli ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Drug Design ◽

Large Scale ◽

Future Research ◽

New Paradigm ◽

Computing Power ◽

The Everyday ◽

Potential Applications ◽

Computational Resources

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and the sudden need for hardware and computational resources make large-scale bio-data analysis an ideal use case for cloud computing. Cloud computing is already applied in the fields of biology and biochemistry, via numerous paradigms providing novel ideas that stimulate future research. The concept of BioCloud has rapidly emerged with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous applications related to biology and biochemistry. In this chapter, the authors present research results related to biology-related laboratories (BioLabs) as well as potential applications for the everyday clinical routine.


Large Scale Participatory Acoustic Sensor Data Analysis: Tools and Reputation Models to Enhance Effectiveness

2011 IEEE Seventh International Conference on eScience ◽

10.1109/escience.2011.29 ◽

2011 ◽

Cited By ~ 8

Author(s):

Anthony Truskinger ◽

Haofan Yang ◽

Jason Wimmer ◽

Jinglan Zhang ◽

Ian Williamson ◽

...

Keyword(s):

Data Analysis ◽

Large Scale ◽

Sensor Data ◽

Acoustic Sensor ◽

Analysis Tools


NeuDATool: An open source neutron data analysis tools, supporting GPU hardware acceleration, and across-computer cluster nodes parallel

Chinese Journal of Chemical Physics ◽

10.1063/1674-0068/cjcp2005077 ◽

2020 ◽

Vol 33 (6) ◽

pp. 727-732 ◽

Cited By ~ 1

Author(s):

Chang-li Ma ◽

He Cheng ◽

Tai-sen Zuo ◽

Gui-sheng Jiao ◽

Ze-hua Han ◽

...

Keyword(s):

Data Analysis ◽

Open Source ◽

Hardware Acceleration ◽

Neutron Data ◽

Computer Cluster ◽

Analysis Tools ◽

Source Neutron


Rabix: an open-source workflow executor supporting recomputability and interoperability of workflow descriptions

10.1101/074708 ◽

2016 ◽

Cited By ~ 4

Author(s):

Gaurav Kaushik ◽

Sinisa Ivkovic ◽

Janko Simonovic ◽

Nebojsa Tijanic ◽

Brandi Davis-Dusenbery ◽

...

Keyword(s):

Data Analysis ◽

Open Source ◽

Job Scheduling ◽

Workflow Engine ◽

Biomedical Data ◽

File Organization ◽

Analysis Tools ◽

Flexible Framework ◽

The Common

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
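
To make the idea of a workflow engine concrete, the following is a minimal sketch of an executor that runs steps in dependency order and logs failures. It is a toy illustration only: it does not use the Rabix Executor or parse the Common Workflow Language, and the names `run_workflow`, `steps`, and `deps` are hypothetical. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toy_executor")


def run_workflow(steps, deps):
    """Run callables in dependency order (toy example, not Rabix or CWL).

    steps: dict mapping step name -> function taking a dict of upstream results
    deps:  dict mapping step name -> set of step names it depends on
    """
    results = {}
    # Map every step to its predecessors; steps without entries have none.
    graph = {name: deps.get(name, set()) for name in steps}
    # static_order() yields each step only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {d: results[d] for d in graph[name]}
        try:
            results[name] = steps[name](inputs)
            log.info("step %s finished", name)
        except Exception:
            # Record the failing step with its traceback, then stop the run.
            log.exception("step %s failed", name)
            raise
    return results


# Hypothetical three-step workflow: load -> sort -> summarize.
steps = {
    "load": lambda _: [3, 1, 2],
    "sort": lambda inp: sorted(inp["load"]),
    "summarize": lambda inp: {"n": len(inp["sort"]), "max": inp["sort"][-1]},
}
deps = {"sort": {"load"}, "summarize": {"sort"}}
outputs = run_workflow(steps, deps)
print(outputs["summarize"])
```

A real engine would add the features the abstract lists, such as file organization and job scheduling across machines; this sketch covers only ordering and error logging.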


High-performance computing service for bioinformatics and data science

Journal of the Medical Library Association JMLA ◽

10.5195/jmla.2018.512 ◽

2018 ◽

Vol 106 (4) ◽

Author(s):

Jean-Paul Courneya ◽

Alexa Mayo

Keyword(s):

Open Source ◽

High Throughput ◽

High Performance ◽

Large Scale ◽

Data Science ◽

Wet Work ◽

High Throughput Data ◽

Guided Learning ◽

Computational Resources ◽

Performance Computing

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the magnitude of data that result from “-omics” experiments. In this innovative project, the library supports analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open source software along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university’s large scale, CORE computational resources is not warranted because these resources exceed the needs of smaller projects. In the library’s approach, users are empowered to analyze high-throughput data that they otherwise would not be able to on their own computers. To develop the project, the library’s bioinformationist identified the ideal computing hardware and a group of open source bioinformatics software to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and workshops or consultations on topics such as the National Center for Biotechnology Information’s BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers apply the data analysis techniques that they learned in the classroom in an ideal computing environment.


Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies

ISRN Bioinformatics ◽

10.1155/2013/481545 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 9

Author(s):

Shanrong Zhao ◽

Kurt Prenger ◽

Lance Smith

Keyword(s):

Data Analysis ◽

Large Scale ◽

Scale Up ◽

Local Environment ◽

Transcriptome Profiling ◽

Cost Effective ◽

Rna Seq ◽

Practical Challenge ◽

Amazon Web Services ◽

Computational Resources

RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to the analysis of 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, and open-source-based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets.
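
As a rough illustration of what the reported figures imply for the whole test set, the sketch below scales the per-sample numbers across the 178 samples. It assumes that every sample costs the reported average and takes 6 to 8 hours, which the abstract does not state for each sample; the function name `stormbow_estimate` is hypothetical.

```python
def stormbow_estimate(n_samples, cost_per_sample, hours_low, hours_high):
    """Scale per-sample figures to a whole study (assumes uniform samples)."""
    total_cost = n_samples * cost_per_sample
    # Summed processing time across samples; running samples in parallel
    # shortens wall-clock time but not this aggregate.
    total_hours = (n_samples * hours_low, n_samples * hours_high)
    return total_cost, total_hours


cost, hours = stormbow_estimate(178, 3.50, 6, 8)
print(f"estimated total cost: ${cost:.2f}")
print(f"aggregate processing time: {hours[0]} to {hours[1]} hours")
```

Under these assumptions the estimate comes to $623.00 and between 1068 and 1424 aggregate processing hours for the 178 samples.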
