Easy-Prime: a machine learning–based prime editor design tool

Yichao Li; Jingjing Chen; Shengdar Q. Tsai; Yong Cheng

doi:10.1186/s13059-021-02458-0

Increasing workflow development speed and reproducibility with Vectools

F1000Research ◽

10.12688/f1000research.16301.2 ◽

2018 ◽

Vol 7 ◽

pp. 1499

Author(s):

Tyler Weirick ◽

Raphael Müller ◽

Shizuka Uchida

Keyword(s):

Machine Learning ◽

Command Line ◽

Simple Machine ◽

Wide Range ◽

Command Line Tool ◽

Speed Up

Despite advances in bioinformatics, custom scripts remain a source of difficulty, slowing workflow development and hampering reproducibility. Here, we introduce Vectools, a command-line tool suite that reduces reliance on custom scripts and improves reproducibility by offering a wide range of common, easy-to-use functions for table and vector manipulation. Vectools also offers a number of vector-related functions to speed up workflow development, such as simple machine learning and common statistics functions.


Increasing workflow development speed and reproducibility with Vectools

F1000Research ◽

10.12688/f1000research.16301.1 ◽

2018 ◽

Vol 7 ◽

pp. 1499

Author(s):

Tyler Weirick ◽

Raphael Müller ◽

Shizuka Uchida

Keyword(s):

Machine Learning ◽

Command Line ◽

Simple Machine ◽

Wide Range ◽

Command Line Tool ◽

Speed Up

Despite advances in bioinformatics, custom scripts remain a source of difficulty, slowing workflow development and hampering reproducibility. Here, we introduce Vectools, a command-line tool suite that reduces reliance on custom scripts and improves reproducibility by offering a wide range of common, easy-to-use functions for table and vector manipulation. Vectools also offers a number of vector-related functions to speed up workflow development, such as simple machine learning and common statistics functions.


UniverSC: a flexible cross-platform single-cell data processing pipeline

10.21203/rs.3.rs-244461/v1 ◽

2021 ◽

Author(s):

S. Kelly ◽

Kai Battenberg ◽

Nicola Hetherington ◽

Makoto Hayashi ◽

Aki Minoda

Keyword(s):

Single Cell ◽

Sequencing Analysis ◽

Command Line ◽

Rna Molecules ◽

Processing Pipeline ◽

Wide Range ◽

Single Cell Rna Sequencing ◽

Command Line Tool ◽

Cross Platform ◽

Cell Data

Single-cell RNA-sequencing analysis to quantify RNA molecules in individual cells has become popular owing to the large amount of information one can obtain from each experiment. We have developed UniverSC (https://github.com/minoda-lab/universc), a universal single-cell processing tool that supports any UMI-based platform. Our command-line tool enables consistent and comprehensive integration, comparison, and evaluation across data generated from a wide range of platforms.


UniverSC: a flexible cross-platform single-cell data processing pipeline

10.1101/2021.01.19.427209 ◽

2021 ◽

Author(s):

S. Thomas Kelly ◽

Kai Battenberg ◽

Nicola A. Hetherington ◽

Makoto Hayashi ◽

Aki Minoda

Keyword(s):

Single Cell ◽

Sequencing Analysis ◽

Command Line ◽

Rna Molecules ◽

Processing Pipeline ◽

Wide Range ◽

Single Cell Rna Sequencing ◽

Command Line Tool ◽

Cross Platform ◽

Cell Data

Single-cell RNA-sequencing analysis to quantify RNA molecules in individual cells has become popular owing to the large amount of information one can obtain from each experiment. We have developed UniverSC (https://github.com/minoda-lab/universc), a universal single-cell processing tool that supports any UMI-based platform. Our command-line tool enables consistent and comprehensive integration, comparison, and evaluation across data generated from a wide range of platforms.


Prediction of prokaryotic transposases from protein features with machine learning approaches

Microbial Genomics ◽

10.1099/mgen.0.000611 ◽

2021 ◽

Vol 7 (7) ◽

Author(s):

Qian Wang ◽

Jun Ye ◽

Teng Xu ◽

Ning Zhou ◽

Zhongqiu Lu ◽

...

Keyword(s):

Machine Learning ◽

Antibiotic Resistance ◽

Mutual Information ◽

Ensemble Classifier ◽

Training Dataset ◽

Command Line ◽

Learning Approaches ◽

Command Line Tool ◽

Selection Operator ◽

Insight Into

Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from the training dataset, which included 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures using the combined mutual information and least absolute shrinkage and selection operator algorithms. By aggregating these signatures, an ensemble classifier that integrated a collection of individual ML-based classifiers was developed to identify Tnps. Further validation revealed that this classifier achieved good performance, with an average AUC of 0.955, and met or exceeded other common methods. Based on this ensemble classifier, a stand-alone command-line tool designated TnpDiscovery was established to maximize convenience for bioinformaticians and experimental researchers in Tnp prediction. This study demonstrates the effectiveness of ML approaches in identifying Tnps, facilitating the discovery of novel Tnps in the future.


PyBDA: a command line tool for automated analysis of big biological data sets

BMC Bioinformatics ◽

10.1186/s12859-019-3087-8 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Simon Dirmeier ◽

Mario Emmenlauer ◽

Christoph Dehio ◽

Niko Beerenwinkel

Keyword(s):

Machine Learning ◽

High Performance ◽

Single Cells ◽

Automated Analysis ◽

Biological Data ◽

Machine Learning Algorithms ◽

Data Sets ◽

Command Line ◽

Command Line Tool ◽

High Performance Computing Cluster

Background: Analysing large and high-dimensional biological data sets poses significant computational difficulties for bioinformaticians due to the lack of accessible tools that scale to hundreds of millions of data points. Results: We developed a novel machine learning command line tool called PyBDA for automated, distributed analysis of big biological data sets. By using Apache Spark in the backend, PyBDA scales to data sets beyond the size of current applications. It uses Snakemake to automatically schedule jobs to a high-performance computing cluster. We demonstrate the utility of the software by analyzing image-based RNA interference data of 150 million single cells. Conclusion: PyBDA allows automated, easy-to-use data analysis using common statistical methods and machine learning algorithms. It can be used entirely through simple command line calls, making it accessible to a broad user base. PyBDA is available at https://pybda.rtfd.io.


nQuire: A Statistical Framework For Ploidy Estimation Using Next Generation Sequencing

10.1101/143537 ◽

2017 ◽

Cited By ~ 1

Author(s):

Clemens L. Weiß ◽

Marina Pais ◽

Liliana M. Cano ◽

Sophien Kamoun ◽

Hernán A. Burbano

Keyword(s):

Next Generation Sequencing ◽

Intraspecific Variation ◽

Ploidy Level ◽

Three Dimensions ◽

Command Line ◽

Next Generation ◽

Statistical Framework ◽

Wide Range ◽

Command Line Tool ◽

Generation Sequencing

Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly, without measuring DNA content, from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. We demonstrate the utility of nQuire by analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the baker's yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at github.com/clwgg/nQuire under the MIT license.
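The model-selection idea in this abstract can be illustrated with a minimal Python sketch. It is not nQuire's implementation (which is written in C): the fixed component means, equal mixture weights, the shared standard deviation `sigma`, and all function names are assumptions chosen for illustration. Each ploidy level is represented by a Gaussian mixture over base frequencies at variable sites, and the model with the highest log-likelihood is selected.

```python
import math

# Idealised base-frequency peaks per ploidy level (an assumption for this sketch).
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1.0 / 3.0, 2.0 / 3.0],
    "tetraploid": [0.25, 0.5, 0.75],
}


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(freqs, means, sigma=0.05):
    """Log-likelihood of the frequencies under an equal-weight Gaussian mixture."""
    weight = 1.0 / len(means)
    total = 0.0
    for f in freqs:
        density = sum(weight * gaussian_pdf(f, m, sigma) for m in means)
        # Floor the density so that log() never receives zero.
        total += math.log(max(density, 1e-300))
    return total


def most_plausible_ploidy(freqs, sigma=0.05):
    """Return the ploidy label with the highest log-likelihood, plus all scores."""
    scores = {
        name: log_likelihood(freqs, means, sigma)
        for name, means in PLOIDY_MEANS.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical base frequencies clustered around 1/3 and 2/3.
    label, scores = most_plausible_ploidy([0.33, 0.67, 0.34, 0.66])
    print(label, scores)
```

With fixed parameters, as here, the comparison reduces to scoring each candidate mixture on the same data; a fuller treatment would estimate the mixture parameters by maximum likelihood before comparing models.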


CRISPR/ Cas9 Off-Targets: Computational Analysis of Causes, Prediction, Detection and Overcoming Strategies

Current Bioinformatics ◽

10.2174/1574893616666210708150439 ◽

2021 ◽

Vol 16 ◽

Author(s):

Roshan Kumar Roy ◽

Ipsita Debashree ◽

Sonal Srivastava ◽

Narayan Rishi ◽

Ashish Srivastava

Keyword(s):

Machine Learning ◽

Gene Therapy ◽

Genome Editing ◽

Gene Editing ◽

Computational Analysis ◽

Regulatory Mechanisms ◽

Learning Approaches ◽

Cellular Mechanisms ◽

Wide Range ◽

Target Activity

CRISPR/Cas9 technology is a highly flexible RNA-guided endonuclease (RGEN) based gene-editing tool that has transformed the fields of genomics, gene therapy, and genome/epigenome imaging. Its wide range of applications provides immense scope for understanding as well as manipulating genetic/epigenetic elements. However, the RGEN is prone to off-target mutagenesis that leads to deleterious effects. This review details the molecular and cellular mechanisms underlying off-target activity, the available detection and prediction methods, ranging from sequencing to machine learning approaches, and the strategies to overcome or minimise off-targets. A coherent and concise method for increasing target precision would prove indispensable to concrete manipulation and interpretation of genome editing results that can revolutionise therapeutics, including clarity in genome regulatory mechanisms during development.


Platform for Analysing and Encouraging Student Activity on Contest and E-learning Systems

OLYMPIADS IN INFORMATICS ◽

10.15388/ioi.2018.07 ◽

2018 ◽

Vol 12 ◽

pp. 85-98

Author(s):

Bojan Kostadinov ◽

Mile Jovanov ◽

Emil Stankov

Keyword(s):

Machine Learning ◽

Data Collection ◽

Educational Policy ◽

Learning Systems ◽

Data Sources ◽

Or Education ◽

Student Activity ◽

The World ◽

E Learning ◽

Analyse Data

Data collection and machine learning are changing the world. Whether it is medicine, sports or education, companies and institutions are investing a lot of time and money in systems that gather, process and analyse data. Likewise, to improve competitiveness, many countries are changing their educational policy to support STEM disciplines. Therefore, it is important to put effort into using various data sources to help students succeed in STEM. In this paper, we present a platform that can analyse students' activity on various contest and e-learning systems, combine and process the data, and then present it in various ways that are easy to understand. This in turn enables teachers and organizers to recognize talented and hardworking students, identify issues, and/or motivate students to practice and work on areas where they are weaker.


Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using DFT and Machine Learning

10.26434/chemrxiv.6254756.v1 ◽

2018 ◽

Author(s):

Sherif Tawfik ◽

Olexandr Isayev ◽

Catherine Stampfl ◽

Joseph Shapter ◽

David Winkler ◽

...

Keyword(s):

Machine Learning ◽

Band Gap ◽

Density Functional ◽

2D Materials ◽

Van Der Waals ◽

Building Blocks ◽

Machine Learning Techniques ◽

Interlayer Distance ◽

Computational Screening ◽

Wide Range

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.
