anndata: Annotated data

AMAS: a fast tool for alignment manipulation and computing of summary statistics

10.7287/peerj.preprints.1355v1 ◽

2015 ◽

Author(s):

Marek L Borowiec

Keyword(s):

Amino Acid ◽

Source Code ◽

Data Sets ◽

Command Line ◽

Summary Statistics ◽

Computationally Efficient ◽

Python Package ◽

Alignment Length ◽

Amino Acid Alphabet ◽

GC Contents

The amount of data used in phylogenetics has grown explosively in the recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, and creation of replicate data sets. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It performs better at concatenation and summarizing alignments than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/
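The concatenation step named in the abstract can be illustrated with a minimal sketch. This is not AMAS code: the function name `concatenate`, the dictionary-based input, and the choice to pad taxa absent from a locus with `?` are assumptions made for illustration. Like AMAS, it uses only Python's core language features.

```python
def concatenate(loci):
    """Concatenate single-locus alignments into one supermatrix.

    loci: dict mapping locus name -> dict mapping taxon -> aligned sequence.
    Returns (merged, partitions), where partitions maps each locus name to
    its 1-based, inclusive (start, end) columns in the merged alignment.
    Hypothetical helper for illustration; not part of AMAS.
    """
    # Every taxon seen in any locus gets a row in the supermatrix.
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    merged = {taxon: [] for taxon in taxa}
    partitions = {}
    start = 1
    for name, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name!r} is not a valid alignment")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from this locus is padded with '?'.
            merged[taxon].append(aln.get(taxon, "?" * length))
        partitions[name] = (start, start + length - 1)
        start += length
    return {taxon: "".join(parts) for taxon, parts in merged.items()}, partitions


if __name__ == "__main__":
    example = {
        "locusA": {"t1": "ACGT", "t2": "ACGA"},
        "locusB": {"t1": "GG", "t3": "GC"},
    }
    supermatrix, parts = concatenate(example)
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    print(parts)
```

Recording the column range of each locus alongside the merged sequences is what lets a concatenated alignment later be split again by a partitioning scheme.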

Estimating Variance Components from Sparse Data Matrices in Large-Scale Educational Assessments

Applied Measurement in Education ◽

10.1080/08957347.2014.973562 ◽

2015 ◽

Vol 28 (1) ◽

pp. 1-13 ◽

Cited By ~ 5

Author(s):

Christine DeMars

Keyword(s):

Variance Components ◽

Large Scale ◽

Sparse Data ◽

Educational Assessments ◽

Data Matrices

AMAS: a fast tool for alignment manipulation and computing of summary statistics

10.7287/peerj.preprints.1355 ◽

2015 ◽

Author(s):

Marek L Borowiec

Keyword(s):

Amino Acid ◽

Source Code ◽

Data Sets ◽

Command Line ◽

Summary Statistics ◽

Computationally Efficient ◽

Python Package ◽

Alignment Length ◽

Amino Acid Alphabet ◽

GC Contents

The amount of data used in phylogenetics has grown explosively in the recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, and creation of replicate data sets. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It performs better at concatenation and summarizing alignments than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/

Monet: An open-source Python package for analyzing and integrating scRNA-Seq data using PCA-based latent spaces

10.1101/2020.06.08.140673 ◽

2020 ◽

Cited By ~ 1

Author(s):

Florian Wagner

Keyword(s):

Data Analysis ◽

Open Source ◽

Method Development ◽

Efficient Solutions ◽

Computationally Efficient ◽

Batch Correction ◽

Ongoing Work ◽

Latent Space ◽

Advanced Analysis ◽

Python Package

Single-cell RNA-Seq is a powerful technology that enables the transcriptomic profiling of the different cell populations that make up complex tissues. However, the noisy and high-dimensional nature of the generated data poses significant challenges for its analysis and integration. Here, I describe Monet, an open-source Python package designed to provide effective and computationally efficient solutions to some of the most common challenges encountered in scRNA-Seq data analysis, and to serve as a toolkit for scRNA-Seq method development. At its core, Monet implements algorithms to infer the dimensionality and construct a PCA-based latent space from a given dataset. This latent space, represented by a MonetModel object, then forms the basis for data analysis and integration. In addition to validating these core algorithms, I provide demonstrations of some more advanced analysis tasks currently supported, such as batch correction and label transfer, which are useful for analyzing multiple datasets from the same tissue. Monet is available at https://github.com/flo-compbio/monet. Ongoing work is focused on providing electronic notebooks with tutorials for individual analysis tasks, and on developing interoperability with other Python scRNA-Seq software. The author welcomes suggestions for future improvements.

A Method for Analyzing Sparse Data Matrices in the Generalizability Theory Framework

Applied Psychological Measurement ◽

10.1177/0146621602026003006 ◽

2002 ◽

Vol 26 (3) ◽

pp. 321-338 ◽

Cited By ~ 22

Author(s):

Christopher W. T. Chiu ◽

Edward W. Wolfe

Keyword(s):

Generalizability Theory ◽

Sparse Data ◽

Theory Framework ◽

Data Matrices

AMAS: a fast tool for alignment manipulation and computing of summary statistics

PeerJ ◽

10.7717/peerj.1660 ◽

2016 ◽

Vol 4 ◽

pp. e1660 ◽

Cited By ~ 163

Author(s):

Marek L. Borowiec

Keyword(s):

Amino Acid ◽

Source Code ◽

Data Sets ◽

Command Line ◽

Summary Statistics ◽

Computationally Efficient ◽

Python Package ◽

Alignment Length ◽

Amino Acid Alphabet ◽

GC Contents

The amount of data used in phylogenetics has grown explosively in the recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
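Several of the statistics listed in the abstract can be sketched in a few lines of standard-library Python. The code below is an illustration, not AMAS itself: the function name `summarize`, the treatment of `?` and `-` as undetermined characters, and the GC denominator (A, C, G, and T only) are assumptions. A site is counted as parsimony informative when at least two character states each occur in at least two taxa, which is the usual textbook definition.

```python
from collections import Counter

# Assumption: these characters count as undetermined (missing) data.
MISSING = frozenset("?-")


def summarize(alignment):
    """Compute basic summary statistics for a DNA alignment.

    alignment: dict mapping taxon name -> aligned sequence of equal length.
    Hypothetical helper for illustration; not part of AMAS.
    """
    seqs = [seq.upper() for seq in alignment.values()]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")

    cells = len(seqs) * length
    counts = Counter("".join(seqs))
    missing = sum(counts[char] for char in MISSING)

    variable = informative = 0
    for column in zip(*seqs):
        # Tally the determined character states at this site.
        states = Counter(char for char in column if char not in MISSING)
        if len(states) > 1:
            variable += 1
        # Informative: two or more states, each present in two or more taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1

    acgt = sum(counts[base] for base in "ACGT")
    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "missing": missing,
        "missing_percent": 100.0 * missing / cells,
        "gc_content": (counts["G"] + counts["C"]) / acgt if acgt else 0.0,
        "variable_sites": variable,
        "parsimony_informative_sites": informative,
    }


if __name__ == "__main__":
    stats = summarize({
        "t1": "ACGT-A",
        "t2": "ACGA?A",
        "t3": "ATGA-A",
        "t4": "ATCT-A",
    })
    for key, value in stats.items():
        print(key, value)
```

In the example alignment the third site is variable but not parsimony informative, because its second state occurs in only one taxon.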

Discrimination of monomer from multimer DNA disk-shaped toruses by freeze-etch TEM

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100111161 ◽

1984 ◽

Vol 42 ◽

pp. 210-211

Author(s):

George C. Ruben ◽

Kenneth A. Marx

Keyword(s):

Biochemical Data ◽

DNA Structures ◽

Ice Surface ◽

Double Stranded DNA ◽

Data Support ◽

DNA Organization ◽

Trivalent Cations

In vitro collapse of DNA by trivalent cations like spermidine produces torus (donut) shaped DNA structures thought to have a DNA organization similar to certain double stranded DNA bacteriophage and viruses. This has prompted our studies of these structures using freeze-etch low Pt-C metal (9Å) replica TEM. With a variety of DNAs the TEM and biochemical data support a circumferential DNA winding model for hydrated DNA torus organization. Since toruses are almost invariably oriented nearly horizontal to the ice surface one of the most accessible parameters of a torus population is annulus (ring) thickness. We have tabulated this parameter for populations of both nicked, circular (Fig. 1: n=63) and linear (n=40: data not shown) ϕX-174 DNA toruses. In both cases, as can be noted in Fig. 1, there appears to be a compact grouping of toruses possessing smaller dimensions separated from a dispersed population possessing considerably larger dimensions.
