Two useful Python tools - dimpy and tablefile for data analysis applications

Mapping Intimacies ◽

10.21203/rs.3.rs-1004075/v2 ◽

2021 ◽

Author(s):

Dwaipayan Deb

Keyword(s):

Data Analysis ◽

Data File ◽

List Type ◽

Python Package ◽

Mathematics And Physics

Abstract Python’s list array is more powerful than arrays in other languages like C, C++, Fortran, or Java. However, in some cases it becomes tedious and complicated to construct a multidimensional ‘list’ type array in Python. A Python tool namely ‘dimpy’ is discussed in this paper which can easily generate any multidimensional ‘list’ type array in python. Another Python package called tablefile for reading and analysing column-wise data from a data-file is also discussed. How these two tools may be useful and reduce steps of programming is shown by using some mathematics and physics related problems.

Download Full-text

OceanSpy: A Python package to facilitate ocean model data analysis and visualization

The Journal of Open Source Software ◽

10.21105/joss.01506 ◽

2019 ◽

Vol 4 (39) ◽

pp. 1506 ◽

Cited By ~ 1

Author(s):

Mattia Almansi ◽

Renske Gelderloos ◽

Thomas Haine ◽

Atousa Saberi ◽

Ali Siddiqui

Keyword(s):

Data Analysis ◽

Ocean Model ◽

Model Data ◽

Python Package

Download Full-text

MS-PyCloud: An open-source, cloud computing-based pipeline for LC-MS/MS data analysis

10.1101/320887 ◽

2018 ◽

Cited By ~ 2

Author(s):

Li Chen ◽

Bai Zhang ◽

Michael Schnaubelt ◽

Punit Shah ◽

Paul Aiyetan ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Open Source ◽

High Performance ◽

Large Scale ◽

Rapid Development ◽

Data File ◽

Data Sets ◽

Proteomics Data ◽

Amazon Web Services

ABSTRACTRapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost for proprietary software and high-performance computing, and the long processing time that discourages on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translation modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/

Download Full-text

Facilitating open-science with realistic fMRI simulation: validation and application

10.1101/532424 ◽

2019 ◽

Cited By ~ 3

Author(s):

Cameron T. Ellis ◽

Christopher Baldassano ◽

Anna C. Schapiro ◽

Ming Bo Cai ◽

Jonathan D. Cohen

Keyword(s):

Optimal Design ◽

Open Source ◽

Real Data ◽

Open Science ◽

Synthetic Dataset ◽

Fmri Data ◽

List Type ◽

Simulation Validation ◽

Python Package

AbstractBackgroundWith advances in methods for collecting and analyzing fMRI data, there is a concurrent need to understand how to reliably evaluate and optimally use these methods. Simulations of fMRI data can aid in both the evaluation of complex designs and the analysis of data.New MethodWe present fmrisim, a new Python package for standardized, realistic simulation of fMRI data. This package is part of BrainIAK: a recently released open-source Python toolbox for advanced neuroimaging analyses. We describe how to use fmrisim to extract noise properties from real fMRI data and then create a synthetic dataset with matched noise properties and a user-specified signal.ResultsWe validate the noise generated by fmrisim to show that it can approximate the noise properties of real data. We further show how fmrisim can help researchers find the optimal design in terms of power.Comparison with other methodsfmrisim ports the functionality of other packages to the Python platform while extending what is available in order to make it seamless to simulate realistic fMRI data.ConclusionsThe fmrisim package holds promise for improving the design of fMRI experiments, which may facilitate both the pre-registration of such experiments as well as the analysis of fMRI data.Highlightsfmrisim can simulate fMRI data matched to the noise properties of real fMRI.This can help researchers investigate the power of their fMRI designs.This also facilitates open science by making it easy to pre-register analysis pipelines.

Download Full-text

Impact framework: A python package for writing data analysis workflows to interpret microbial physiology

Metabolic Engineering Communications ◽

10.1016/j.mec.2019.e00089 ◽

2019 ◽

Vol 9 ◽

pp. e00089 ◽

Cited By ~ 1

Author(s):

Naveen Venayak ◽

Kaushik Raj ◽

Radhakrishnan Mahadevan

Keyword(s):

Data Analysis ◽

Microbial Physiology ◽

Impact Framework ◽

Python Package

Download Full-text

JMASM 53: MicceriRD

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1604190720 ◽

2020 ◽

Vol 18 (2) ◽

pp. 2-4

Author(s):

Michael Lance

Keyword(s):

Monte Carlo ◽

Data Analysis ◽

Monte Carlo Simulations ◽

Python Package ◽

Fortran 77

Fortran 77 and 90 modules (REALPOPS.lib) exist for invoking the 8 distributions estimated by Micceri (1989). These respective modules were created by Sawilowsky et al. (1990) and Sawilowsky and Fahoome (2003). The MicceriRD (Micceri’s Real Distributions) Python package was created because Python is increasingly used for data analysis and, in some cases, Monte Carlo simulations.

Download Full-text

DeepForest: A Python package for RGB deep learning tree crown delineation

10.1101/2020.07.07.191551 ◽

2020 ◽

Cited By ~ 2

Author(s):

Ben. G. Weinstein ◽

Sergio Marconi ◽

Mélaine Aubry-Kientz ◽

Gregoire Vincent ◽

Henry Senyondo ◽

...

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Training Data ◽

Fine Tuning ◽

List Type ◽

Crown Delineation ◽

Using Data ◽

Individual Trees ◽

Tree Crown Delineation ◽

Python Package

AbstractRemote sensing of forested landscapes can transform the speed, scale, and cost of forest research. The delineation of individual trees in remote sensing images is an essential task in forest analysis. Here we introduce a new Python package, DeepForest, that detects individual trees in high resolution RGB imagery using deep learning.While deep learning has proven highly effective in a range of computer vision tasks, it requires large amounts of training data that are typically difficult to obtain in ecological studies. DeepForest overcomes this limitation by including a model pre-trained on over 30 million algorithmically generated crowns from 22 forests and fine-tuned using 10,000 hand-labeled crowns from 6 forests.The package supports the application of this general model to new data, fine tuning the model to new datasets with user labeled crowns, training new models, and evaluating model predictions. This simplifies the process of using and retraining deep learning models for a range of forests, sensors, and spatial resolutions.We illustrate the workflow of DeepForest using data from the National Ecological Observatory Network, a tropical forest in French Guiana, and street trees from Portland, Oregon.

Download Full-text

Utilizing Scp for the analysis and replication of single-cell proteomics data

10.1101/2021.04.12.439408 ◽

2021 ◽

Cited By ~ 1

Author(s):

Christophe Vanderaa ◽

Laurent Gatto

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Single Cell ◽

List Type ◽

Batch Effects ◽

Proteomics Data ◽

Batch Correction ◽

Data Analyses ◽

Technological Advances ◽

Landmark Data

AbstractIntroductionMass spectrometry-based proteomics is actively embracing quantitative, single cell-level analyses. Indeed, recent advances in sample preparation and mass spectrometry (MS) have enabled the emergence of quantitative MS-based single-cell proteomics (SCP). While exciting and promising, SCP still has many rough edges. The current analysis workflows are custom and build from scratch. The field is therefore craving for standardized software that promotes principled and reproducible SCP data analyses.Areas coveredThis special report represents the first step toward the formalization of standard SCP data analysis. Scp, the software that accompanies this work can successfully reproduces one of the landmark data in the field of SCP. We created a repository containing the reproduction workflow with comprehensive documentation in order to favor further dissemination and improvement of SCP data analyses.Expert opinionReproducing SCP data analyses uncovers important challenges in SCP data analysis. We describe two such challenges in detail: batch correction and data missingness. We provide the current state-of-the-art and illustrate the associated limitations. We also highlights the intimate dependence that exists between batch effects and data missingness and provides future tracks for dealing with these exciting challenges.1Article highlightsSingle-cell proteomics (SCP) is emerging thanks to several recent technological advances, but further progress is lagging due to principled and systematic data analysis.This work offers a standardized solution for the processing of SCP data demonstrated by the reproduction of a landmark SCP work.Two important challenges remain: batch effects and data missingness. Furthermore, these challenges are not independent and therefore need to be modeled simultaneously.

Download Full-text

Scedar: a scalable Python package for single-cell RNA-seq exploratory data analysis

10.1101/375196 ◽

2018 ◽

Author(s):

Yuanchao Zhang ◽

Man S. Kim ◽

Erin R. Reichenberger ◽

Ben Stear ◽

Deanne M. Taylor

Keyword(s):

Data Analysis ◽

Single Cell ◽

Large Scale ◽

Exploratory Data Analysis ◽

Statistical Distributions ◽

Rna Seq ◽

Exploratory Data ◽

Further Development ◽

Python Package ◽

Open Source Development

AbstractIn single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.

Download Full-text

ordpy: A Python package for data analysis with permutation entropy and ordinal network methods

Chaos An Interdisciplinary Journal of Nonlinear Science ◽

10.1063/5.0049901 ◽

2021 ◽

Vol 31 (6) ◽

pp. 063110

Author(s):

Arthur A. B. Pessa ◽

Haroldo V. Ribeiro

Keyword(s):

Data Analysis ◽

Permutation Entropy ◽

Network Methods ◽

Python Package

Download Full-text

USING SPSS FOR RESEARCH AND DATA ANALYSIS

Knowledge International Journal ◽

10.35120/kij3203363a ◽

2019 ◽

Vol 32 (3) ◽

pp. 363-368

Author(s):

Agron Alili ◽

Dejan Krstev

Keyword(s):

Data Analysis ◽

Software Package ◽

Science Research ◽

Data File ◽

Health Science ◽

Environmental Research ◽

Statistical Software ◽

Health Science Research ◽

The Social ◽

Different Types

There is no question that business, education, and all fields of science have come to rely heavily on the computer. This dependence has become so great that it is no longer possible to understand social and health science research without substantial knowledge of statistics and without at least some rudimentary understanding of statistical software. The number and types of statistical software packages that are available continue to grow each year. In this paper we have chosen to work with SPSS, or the Statistical Package for the Social Sciences. SPSS was chosen because of its popularity within both academic and business circles, making it the most widely used package of its type. SPSS is also a versatile package that allows many different types of analyses. transformations, and forms of output - in short, it will more than adequately serve our purposes. The SPSS software package is continually being updated and improved, and so with each major revision comes a new version of that package. In this paper, we will describe and use the most recent version of SPSS, called SPSS for Windows, in order to use this text for data analysis, your must have access to the SPSS for Windows software package.The capability of SPSS is truly astounding. The package enables you to obtain statistics ranging from simple descriptive numbers to complex analyses of multivariate matrices. You can plot the data in histograms, scatterplots, and other ways. You can combine files, split files, and sort files. You can modify existing variables and create new ones. In short, you can do just about anything you'd ever want with a set of data using this software package. A number of specific SPSS procedures are relevant to the kinds of statistical analyses covered in an introductory level statistics or research methods course typically found in the social and health sciences, natural sciences, or business. Yet, we will touch on just a fraction of the many things that SPSS can do. Our aim is to help то become familiar with SPSS, and we hope that this introduction will both reinforce our understanding of statistics and lead us to see what a powerful tool SPSS is, how it can actually help you better understand your data, how it can enable you to test hypotheses that were once too difficult to consider, and how it can save our incredible amounts of time as well as reduce the likelihood of making errors in data analyses. We show how to create a data file and generate an output file. We also discuss how to name and save the different types of files created in the three main SPSS windows. This paper will present a software presentation from a survey on socio-economic and environmental research.

Download Full-text