PyEI: A Python package for ecological inference

Abstract Background: Bisulfite sequencing is commonly used to measure DNA methylation. Processing bisulfite sequencing data is often challenging owing to the computational demands of mapping a low-complexity, asymmetrical library and the lack of a unified processing toolset to produce an analysis-ready methylation matrix from read alignments. To address these shortcomings, we have developed BiSulfite Bolt (BSBolt), a fast and scalable bisulfite sequencing analysis platform. BSBolt performs a pre-alignment sequencing read assessment step to improve efficiency when handling asymmetrical bisulfite sequencing libraries. Findings: We evaluated BSBolt against simulated and real bisulfite sequencing libraries. We found that BSBolt provides accurate and fast bisulfite sequencing alignments and methylation calls. We also compared BSBolt to several existing bisulfite alignment tools and found that BSBolt outperforms Bismark, BSSeeker2, BISCUIT, and BWA-Meth in alignment accuracy and methylation calling accuracy. Conclusion: BSBolt streamlines the processing of bisulfite sequencing data through an integrated toolset that supports simulation, alignment, methylation calling, and data aggregation. BSBolt is implemented as a Python package and command line utility for flexibility when building informatics pipelines. BSBolt is available at https://github.com/NuttyLogic/BSBolt under an MIT license.

SynBiopython: an open-source software library for Synthetic Biology

Synthetic Biology ◽ 10.1093/synbio/ysab001 ◽ 2021 ◽ Vol 6 (1)

Author(s): Jing Wui Yeoh ◽ Neil Swainston ◽ Peter Vegh ◽ Valentin Zulkower ◽ Pablo Carbonell ◽ ...

Keyword(s): Synthetic Biology ◽ Open Source ◽ Open Source Software ◽ Development Projects ◽ Software Library ◽ Current State ◽ Starting Point ◽ Common Problems ◽ Data Tracking ◽ Python Package

Abstract Advances in hardware automation in synthetic biology laboratories are not yet fully matched by those of their software counterparts. Such automated laboratories, now commonly called biofoundries, require software solutions that would help with many specialized tasks such as batch DNA design, sample and data tracking, and data analysis, among others. Typically, many of the challenges facing biofoundries are shared, yet there is frequent wheel-reinvention, with many labs developing similar software solutions in parallel. In this article, we present the first attempt at creating a standardized, open-source Python package. A number of tools will be integrated and developed that we envisage will become the obvious starting point for software development projects within biofoundries globally. Specifically, we describe the current state of available software, present usage scenarios and case studies for common problems, and finally describe plans for future development. SynBiopython is publicly available at the following address: http://synbiopython.org.

TAILOR-MS, a Python Package that Deciphers Complex Triacylglycerol Fatty Acyl Structures: Applications for Bovine Milk and Infant Formulas

Analytical Chemistry ◽ 10.1021/acs.analchem.0c04373 ◽ 2021

Author(s): Kang-Yu Peng ◽ Malinda Salim ◽ Joseph Pelle ◽ Gisela Ramirez ◽ Ben J. Boyd

Keyword(s): Bovine Milk ◽ Fatty Acyl ◽ Infant Formulas ◽ Python Package

A flexible framework for anomaly detection via dimensionality reduction

Neural Computing and Applications ◽ 10.1007/s00521-021-05839-5 ◽ 2021

Author(s): Alireza Vafaei Sadr ◽ Bruce A. Bassett ◽ M. Kunz

Keyword(s): Anomaly Detection ◽ Dimensionality Reduction ◽ Dimensional Space ◽ High Dimensions ◽ Detection Algorithms ◽ Latent Space ◽ Wide Range ◽ Flexible Framework ◽ Online Anomaly Detection ◽ Python Package

Abstract Anomaly detection is challenging, especially for large datasets in high dimensions. Here, we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. DRAMA is released as a general Python package that implements the framework with a wide range of built-in options. This approach identifies the primary prototypes in the data, with anomalies detected by their large distances from the prototypes, either in the latent space or in the original, high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. DRAMA also naturally provides clustering of outliers for subsequent analysis.
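The general idea in this abstract (reduce dimensionality, find prototypes by clustering, score points by distance from the nearest prototype) can be illustrated with a minimal standard-library sketch. This is not DRAMA's API or its actual algorithms: the function names, the one-dimensional latent space obtained by power iteration, the simple k-means, and the toy data are all assumptions made for illustration.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def top_component(rows, iters=100, seed=0):
    """Leading principal direction of centered rows, found by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    for _ in range(iters):
        # Apply X^T X to v: project every row on v, then recombine the rows.
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd iterations in one dimension; returns k prototype positions."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation at evenly spaced quantiles.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for z in values:
            nearest = min(range(k), key=lambda c: abs(z - centers[c]))
            groups[nearest].append(z)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def anomaly_scores(rows, k=2):
    """Score each row by its latent-space distance to the nearest prototype."""
    centered = center(rows)
    direction = top_component(centered)
    latent = [sum(a * b for a, b in zip(row, direction)) for row in centered]
    prototypes = kmeans_1d(latent, k)
    return [min(abs(z - c) for c in prototypes) for z in latent]


# Toy data (hypothetical): two tight groups and one point between them.
data = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.1],
    [10.0, 10.0, 0.0], [10.1, 10.0, 0.0], [10.0, 10.1, 0.0], [10.1, 10.1, 0.1],
    [5.0, 5.0, 0.0],
]
scores = anomaly_scores(data, k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The sketch scores points only in the latent space; the abstract notes that distances can also be measured in the original high-dimensional space.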

Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records

Political Analysis ◽ 10.1093/pan/mpw001 ◽ 2016 ◽ Vol 24 (2) ◽ pp. 263-272 ◽ Cited By ~ 29

Author(s): Kosuke Imai ◽ Kabir Khanna

Keyword(s): Mean Squared Error ◽ False Positive Rate ◽ True Positive Rate ◽ Voter Registration ◽ Racial Groups ◽ Ecological Inference ◽ Inference Problem ◽ Individual Level ◽ Positive Rate ◽ Election Results

In both political behavior research and voting rights litigation, turnout and vote choice for different racial groups are often inferred using aggregate election results and racial composition. Over the past several decades, many statistical methods have been proposed to address this ecological inference problem. We propose an alternative method to reduce aggregation bias by predicting individual-level ethnicity from voter registration records. Building on the existing methodological literature, we use Bayes's rule to combine the Census Bureau's Surname List with various information from geocoded voter registration records. We evaluate the performance of the proposed methodology using approximately nine million voter registration records from Florida, where self-reported ethnicity is available. We find that it is possible to reduce the false positive rate among Black and Latino voters to 6% and 3%, respectively, while maintaining the true positive rate above 80%. Moreover, we use our predictions to estimate turnout by race and find that our estimates yield substantially less bias and lower root mean squared error than standard ecological inference estimates. We provide open-source software to implement the proposed methodology.
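As a rough illustration of combining a surname-based probability with location information through Bayes's rule, the following sketch may help. It is not the authors' software: the function name and the numbers are hypothetical, and it assumes that surname and location are conditionally independent given group membership, so that P(group | surname, location) is proportional to P(group | surname) × P(location | group).

```python
def posterior_by_group(p_group_given_surname, p_location_given_group):
    """Combine two sources with Bayes's rule and normalise over groups.

    Assumes surname and location are independent given the group
    (an assumption of this sketch).
    """
    unnormalized = {
        group: p_group_given_surname[group] * p_location_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all groups have zero joint probability")
    return {group: weight / total for group, weight in unnormalized.items()}


# Hypothetical numbers for illustration only; not taken from any real list.
surname_probs = {"group_a": 0.5, "group_b": 0.5}
location_probs = {"group_a": 0.2, "group_b": 0.6}
posterior = posterior_by_group(surname_probs, location_probs)
```

With an uninformative surname, the location term alone drives the posterior, which shows how geocoded information can sharpen a surname-only prediction.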
