Crisflash: open-source software to generate CRISPR guide RNAs against genomes annotated with individual variation

Adrien L S Jacquin; Duncan T Odom; Margus Lukk

doi:10.1093/bioinformatics/btz019

Bioinformatics ◽

10.1093/bioinformatics/btz019 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3146-3147 ◽

Cited By ~ 10

Author(s):

Adrien L S Jacquin ◽

Duncan T Odom ◽

Margus Lukk

Keyword(s):

Open Source Software ◽

Software Tool ◽

Supplementary Information ◽

Small Scale ◽

Genome Sequences ◽

Guide RNAs ◽

Genome Modification ◽

Order Of Magnitude ◽

sgRNA Design ◽

Reference Genomes

Abstract Summary The CRISPR/Cas9 system requires short guide RNAs (sgRNAs) to direct genome modification. Most currently available tools for sgRNA design operate only with standard reference genomes, and are best suited for small-scale projects. To address these limitations, we developed Crisflash, a software tool for fast sgRNA design and potential off-target discovery, built for performance and flexibility. Crisflash can rapidly design CRISPR guides against any sequenced genome or genome sequences, and can optimize guide accuracy by incorporating user-supplied variant data. Crisflash is over an order of magnitude faster than comparable tools, even using a single CPU core, and efficiently and robustly scores the potential off-targeting of all possible candidate CRISPR guide oligonucleotides. Availability and implementation https://github.com/crisflash Supplementary information Supplementary data are available at Bioinformatics online.
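To make the idea of variant-aware guide design concrete, here is a minimal Python sketch. It is not Crisflash's implementation or interface: the function names (`apply_snvs`, `find_guides`), the 20-nt protospacer length and the NGG PAM are illustrative assumptions (NGG being the PAM commonly used with SpCas9). The sketch only enumerates candidate sites; it does no off-target scoring.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def apply_snvs(seq, snvs):
    """Hypothetical helper: apply {0-based position: alternate base} substitutions."""
    bases = list(seq)
    for pos, alt in snvs.items():
        bases[pos] = alt
    return "".join(bases)

def find_guides(seq, guide_len=20):
    """Yield (strand, start, protospacer, pam) for each protospacer followed by NGG.

    For the "-" strand, start is a coordinate on the reverse-complemented sequence.
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            yield strand, m.start(), m.group(1), m.group(2)

# Toy sequence (made up): the reference has no NGG site, but a single A>G
# substitution at position 22 creates one, so the guide exists only in the
# individual carrying the variant.
reference = "A" * 20 + "TGA"
individual = apply_snvs(reference, {22: "G"})
print(list(find_guides(reference)))
print(list(find_guides(individual)))
```

The toy case shows why designing against a reference genome alone can miss, or wrongly propose, guides for a specific individual.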

CellTracker (not only) for dummies

Bioinformatics ◽

10.1093/bioinformatics/btv686 ◽

2015 ◽

Vol 32 (6) ◽

pp. 955-957 ◽

Cited By ~ 46

Author(s):

Filippo Piccinini ◽

Alexa Kiss ◽

Peter Horvath

Keyword(s):

Graphical User Interface ◽

Open Source Software ◽

Phase Contrast ◽

Cell Tracking ◽

Source Code ◽

Software Tool ◽

Time Lapse ◽

Supplementary Information ◽

Differential Interference Contrast ◽

User Friendly

Abstract Motivation: Time-lapse experiments play a key role in studying the dynamic behavior of cells. Single-cell tracking is one of the fundamental tools for such analyses. The vast majority of the recently introduced cell tracking methods are limited to fluorescently labeled cells. An equally important limitation is that most software cannot be effectively used by biologists without reasonable expertise in image processing. Here we present CellTracker, a user-friendly open-source software tool for tracking cells imaged with various imaging modalities, including fluorescent, phase contrast and differential interference contrast (DIC) techniques. Availability and implementation: CellTracker is written in MATLAB (The MathWorks, Inc., USA). It works with Windows, Macintosh and UNIX-based systems. Source code and graphical user interface (GUI) are freely available at: http://celltracker.website/. Supplementary information: Supplementary data are available at Bioinformatics online.

CRISPR-Local: a local single-guide RNA (sgRNA) design tool for non-reference plant genomes

Bioinformatics ◽

10.1093/bioinformatics/bty970 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2501-2503 ◽

Cited By ~ 9

Author(s):

Jiamin Sun ◽

Hao Liu ◽

Jianxiao Liu ◽

Shikun Cheng ◽

Yong Peng ◽

...

Keyword(s):

Design Tool ◽

Supplementary Information ◽

Guide RNA ◽

Plant Genomes ◽

Guide RNAs ◽

Genome Wide ◽

Reference Plant ◽

sgRNA Design ◽

Computational Resources ◽

Local Tool

Abstract Summary CRISPR-Local is a high-throughput local tool for designing single-guide RNAs (sgRNAs) in plants and other organisms that factors in genetic variation and is optimized to generate genome-wide sgRNAs. CRISPR-Local outperforms other sgRNA design tools in the following respects: (i) designing sgRNAs suitable for non-reference varieties; (ii) screening for sgRNAs that are capable of simultaneously targeting multiple genes; (iii) saving computational resources by avoiding repeated calculations from multiple submissions; and (iv) running offline, with both command-line and graphical user interface modes and the ability to export multiple formats for further batch analysis or visualization. We have applied CRISPR-Local to 71 public plant genomes, using both CRISPR/Cas9 and CRISPR/Cpf1 systems. Availability and implementation CRISPR-Local can be freely downloaded from http://crispr.hzau.edu.cn/CRISPR-Local/. Supplementary information Supplementary data are available at Bioinformatics online.

PAVOOC: designing CRISPR sgRNAs using 3D protein structures and functional domain annotations

Bioinformatics ◽

10.1093/bioinformatics/bty935 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2309-2310 ◽

Cited By ~ 3

Author(s):

Moritz Schaefer ◽

Djork-Arné Clevert ◽

Bertram Weiss ◽

Andreas Steffen

Keyword(s):

Protein Structures ◽

Design Tool ◽

Protein Crystal ◽

Supplementary Information ◽

Functional Domain ◽

Web Based ◽

Homology Directed Repair ◽

Guide RNAs ◽

sgRNA Design ◽

Repair Template

Abstract Summary Single-guide RNAs (sgRNAs) targeting the same gene can vary significantly in efficacy and specificity. PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR) is a web-based CRISPR sgRNA design tool that employs state-of-the-art machine learning models to prioritize the most effective candidate sgRNAs. In contrast to other tools, it maps sgRNAs to functional domains and protein structures and visualizes cut sites on corresponding protein crystal structures. Furthermore, PAVOOC supports homology-directed repair template generation for genome editing experiments and the visualization of the mutated amino acids in 3D. Availability and implementation PAVOOC is available at https://pavooc.me and accessible using modern browsers (Chrome/Chromium recommended). The source code is hosted at github.com/moritzschaefer/pavooc under the MIT License. The backend, including data processing steps, and the frontend are implemented in Python 3 and ReactJS, respectively. All components run in a simple Docker environment. Supplementary information Supplementary data are available at Bioinformatics online.

Tibanna: software for scalable execution of portable pipelines on the cloud

Bioinformatics ◽

10.1093/bioinformatics/btz379 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4424-4426 ◽

Cited By ~ 1

Author(s):

Soohyun Lee ◽

Jeremy Johnson ◽

Carl Vitzthum ◽

Koray Kırlı ◽

Burak H Alver ◽

...

Keyword(s):

Open Source Software ◽

Source Code ◽

Software Tool ◽

Application Programming Interface ◽

Supplementary Information ◽

Supplementary Data ◽

Description Language ◽

Amazon Web Services ◽

Application Programming ◽

Programming Interface

Abstract Summary We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability and implementation Source code is available on GitHub at https://github.com/4dn-dcic/tibanna. Supplementary information Supplementary data are available at Bioinformatics online.

Large scale microbiome profiling in the cloud

Bioinformatics ◽

10.1093/bioinformatics/btz356 ◽

2019 ◽

Vol 35 (14) ◽

pp. i13-i22 ◽

Cited By ~ 1

Author(s):

Camilo Valdes ◽

Vitalii Stebliankin ◽

Giri Narasimhan

Keyword(s):

Large Scale ◽

Bacterial Population ◽

Reference Genome ◽

Supplementary Information ◽

Bacterial Genomes ◽

Reference Collection ◽

Order Of Magnitude ◽

Spark Framework ◽

Reference Genomes ◽

Microbiome Profiling

Abstract Motivation Bacterial metagenomics profiling for metagenomic whole genome sequencing (mWGS) usually starts by aligning sequencing reads to a collection of reference genomes. Current profiling tools are designed to work against a small representative collection of genomes, and do not scale very well to larger reference genome collections. However, large reference genome collections are capable of providing a more complete and accurate profile of the bacterial population in a metagenomics dataset. In this paper, we discuss a scalable, efficient and affordable approach to this problem, bringing big data solutions within the reach of laboratories with modest resources. Results We developed Flint, a metagenomics profiling pipeline that is built on top of the Apache Spark framework, and is designed for fast real-time profiling of metagenomic samples against a large collection of reference genomes. Flint takes advantage of Spark’s built-in parallelism and streaming engine architecture to quickly map reads against a large (170 GB) reference collection of 43 552 bacterial genomes from Ensembl. Flint runs on Amazon’s Elastic MapReduce service, and is able to profile 1 million Illumina paired-end reads against over 40 K genomes on 64 machines in 67 s, an order of magnitude faster than the state of the art, while using a much larger reference collection. Streaming the sequencing reads allows this approach to sustain mapping rates of 55 million reads per hour, at an hourly cluster cost of $8.00 USD, while avoiding the necessity of storing large quantities of intermediate alignments. Availability and implementation Flint is open source software, available under the MIT License (MIT). Source code is available at https://github.com/camilo-v/flint. Supplementary information Supplementary data are available at Bioinformatics online.
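The throughput and cost figures quoted in the abstract can be related with a little arithmetic. The Python sketch below uses only the numbers stated above (1 million reads in 67 s, 55 million reads per hour, $8.00 per hour); the function names are illustrative, and the cost per million reads is a derived quantity that the abstract itself does not report.

```python
def reads_per_hour(reads, seconds):
    """Extrapolate a batch measurement to an hourly rate."""
    return reads * 3600 / seconds

def cost_per_million_reads(hourly_cost_usd, hourly_rate):
    """Cluster cost attributable to one million reads at a given hourly rate."""
    return hourly_cost_usd / (hourly_rate / 1e6)

batch_rate = reads_per_hour(1_000_000, 67)          # from "1 million reads in 67 s"
unit_cost = cost_per_million_reads(8.00, 55e6)      # from "55 million reads per hour"
print(f"{batch_rate / 1e6:.1f} million reads per hour")
print(f"${unit_cost:.2f} per million reads")
```

The batch figure extrapolates to roughly 54 million reads per hour, consistent with the quoted streaming rate of 55 million, and the stated hourly cost works out to about $0.15 per million reads.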

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

Bioinformatics ◽

10.1093/bioinformatics/btaa045 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2690-2696

Author(s):

Jarkko Toivonen ◽

Pratyush K Das ◽

Jussi Taipale ◽

Esko Ukkonen

Keyword(s):

Markov Models ◽

Expectation Maximization Algorithm ◽

Software Tool ◽

Specific Weight ◽

Training Data ◽

Supplementary Information ◽

Markov Modeling ◽

Binding Motifs ◽

The Difference ◽

Probability Matrices

Abstract Motivation Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Availability and implementation Software implementation is available from https://github.com/jttoivon/moder2. Supplementary information Supplementary data are available at Bioinformatics online.
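The difference between a PPM and an ADM can be illustrated with a small Python sketch. This is not MODER2's expectation maximization algorithm or its mixture model; it only scores a site under the two model types. The toy width-2 motif and all probabilities are made up for illustration, and the function names are assumptions.

```python
import math

BASES = "ACGT"

def ppm_log_prob(site, ppm):
    """Log-probability under a PPM: every position is independent."""
    return sum(math.log(ppm[i][base]) for i, base in enumerate(site))

def adm_log_prob(site, initial, transitions):
    """Log-probability under a first-order model: each base is conditioned
    on the base immediately before it."""
    logp = math.log(initial[site[0]])
    for i in range(1, len(site)):
        logp += math.log(transitions[i - 1][site[i - 1]][site[i]])
    return logp

def marginal_ppm(initial, transitions):
    """Build the PPM with the same per-position base frequencies as the ADM."""
    columns = [dict(initial)]
    for step in transitions:
        prev = columns[-1]
        columns.append({b: sum(prev[a] * step[a][b] for a in BASES) for b in BASES})
    return columns

# Made-up motif whose sites are almost always "AT" or "TA".
uniform = {b: 0.25 for b in BASES}
initial = {"A": 0.49, "C": 0.01, "G": 0.01, "T": 0.49}
transitions = [{
    "A": {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    "C": uniform,
    "G": uniform,
    "T": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
}]
ppm = marginal_ppm(initial, transitions)

for site in ("AT", "AA"):
    print(site, adm_log_prob(site, initial, transitions), ppm_log_prob(site, ppm))
```

Because the PPM sees only per-position frequencies, it rates "AA" about as highly as "AT", whereas the first-order model captures the dependency between adjacent bases and penalizes "AA".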

A two-dimensional glacier–fjord coupled model applied to estimate submarine melt rates and front position changes of Hansbreen, Svalbard

Journal of Glaciology ◽

10.1017/jog.2018.61 ◽

2018 ◽

Vol 64 (247) ◽

pp. 745-758 ◽

Cited By ~ 2

Author(s):

E. DE ANDRÉS ◽

J. OTERO ◽

F. NAVARRO ◽

A. PROMIŃSKA ◽

J. LAPAZARAN ◽

...

Keyword(s):

Coupled Model ◽

Small Scale ◽

Two Dimensional ◽

Time Step ◽

Software Packages ◽

Front Position ◽

Circulation Patterns ◽

Glacier Front ◽

Order Of Magnitude ◽

Frontal Ablation

Abstract We have developed a two-dimensional coupled glacier–fjord model, which runs automatically using Elmer/Ice and MITgcm software packages, to investigate the magnitude of submarine melting along a vertical glacier front and its potential influence on glacier calving and front position changes. We apply this model to simulate the Hansbreen glacier–Hansbukta proglacial–fjord system, Southwestern Svalbard, during the summer of 2010. The limited size of this system allows us to resolve some of the small-scale processes occurring at the ice–ocean interface in the fjord model, using a 0.5 s time step and a 1 m grid resolution near the glacier front. We use a rich set of field data spanning the period April–August 2010 to constrain, calibrate and validate the model. We adjust circulation patterns in the fjord by tuning subglacial discharge inputs that best match observed temperature while maintaining a compromise with observed salinity, suggesting a convectively driven circulation in Hansbukta. The results of our model simulations suggest that both submarine melting and crevasse hydrofracturing exert important controls on seasonal frontal ablation, with submarine melting alone not being sufficient for reproducing the observed patterns of seasonal retreat. Both submarine melt and calving rates accumulated along the entire simulation period are of the same order of magnitude, ~100 m. The model results also indicate that changes in submarine melting lag meltwater production by 4–5 weeks, which suggests that it may take up to a month for meltwater to traverse the englacial and subglacial drainage network.

Small-scale spatio-temporal characteristics of accumulation rates in western Dronning Maud Land, Antarctica

Journal of Glaciology ◽

10.3189/002214308784886243 ◽

2008 ◽

Vol 54 (185) ◽

pp. 315-323 ◽

Cited By ~ 10

Author(s):

Helgard Anschütz ◽

Daniel Steinhage ◽

Olaf Eisen ◽

Hans Oerter ◽

Martin Horwath ◽

...

Keyword(s):

Accumulation Rate ◽

Temporal Variations ◽

Spatial Variations ◽

Small Scale ◽

Accumulation Rates ◽

Order Of Magnitude ◽

Dronning Maud Land ◽

Spatio Temporal ◽

Ground Penetrating ◽

Firn Core

Abstract Spatio-temporal variations of the recently determined accumulation rate are investigated using ground-penetrating radar (GPR) measurements and firn-core studies. The study area is located on Ritscherflya in western Dronning Maud Land, Antarctica, at an elevation range 1400–1560 m. Accumulation rates are derived from internal reflection horizons (IRHs), tracked with GPR, which are connected to a dated firn core. GPR-derived internal layer depths show small relief along a 22 km profile on an ice flowline. Average accumulation rates are about 190 kg m⁻² a⁻¹ (1980–2005) with spatial variability (1σ) of 5% along the GPR profile. The interannual variability obtained from four dated firn cores is one order of magnitude higher, showing 1σ standard deviations around 30%. Mean temporal variations of GPR-derived accumulation rates are of the same magnitude or even higher than spatial variations. Temporal differences between 1980–90 and 1990–2005, obtained from two dated IRHs along the GPR profile, indicate temporally non-stationary processes, linked to spatial variations. Comparison with similarly obtained accumulation data from another coastal area in central Dronning Maud Land confirms this observation. Our results contribute to understanding spatio-temporal variations of the accumulation processes, necessary for the validation of satellite data (e.g. altimetry studies and gravity missions such as Gravity Recovery and Climate Experiment (GRACE)).

EARRINGS: an efficient and accurate adapter trimmer entails no a priori adapter sequences

Bioinformatics ◽

10.1093/bioinformatics/btab025 ◽

2021 ◽

Author(s):

Ting-Hsuan Wang ◽

Cheng-Ching Huang ◽

Jui-Hung Hung

Keyword(s):

Open Source Software ◽

Large Scale ◽

A Priori ◽

Supplementary Information ◽

Supplementary Data ◽

Comparable Accuracy ◽

Meta Analyses ◽

Next Generation Sequencing (NGS) ◽

Adapter Trimming ◽

Generation Sequencing

Abstract Motivation Cross-sample comparisons or large-scale meta-analyses based on next-generation sequencing (NGS) involve replicable and universal data preprocessing, including removing adapter fragments in contaminated reads (i.e. adapter trimming). Modern adapter trimmers require users to provide candidate adapter sequences for each sample, but these are sometimes unavailable or falsely documented in the repositories (such as GEO or SRA), so large-scale meta-analyses are jeopardized by suboptimal adapter trimming. Results Here we introduce a set of fast and accurate adapter detection and trimming algorithms that entail no a priori adapter sequences. These algorithms were implemented in modern C++ with SIMD and multithreading for speed. Our experiments and benchmarks show that the implementation (i.e. EARRINGS), without being given any hint of adapter sequences, can reach accuracy comparable to, and throughput higher than, that of existing adapter trimmers. EARRINGS is particularly useful in meta-analyses of a large batch of datasets and can be incorporated into sequence analysis pipelines at any scale. Availability and implementation EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS. Supplementary information Supplementary data are available at Bioinformatics online.

TriPOINT: a software tool to prioritize important genes in pathways and their non-coding regulators

Bioinformatics ◽

10.1093/bioinformatics/bty998 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2686-2689

Author(s):

Asa Thibodeau ◽

Dong-Guk Shin

Keyword(s):

Gene Expression ◽

Software Tool ◽

Supplementary Information ◽

Analysis Tool ◽

Graph Representations ◽

Expression Levels ◽

Conducting Pathway ◽

Pathway Analysis Tool ◽

Pathway Analyses ◽

Gene Expression Levels

Abstract Summary Current approaches for pathway analyses focus on representing gene expression levels on graph representations of pathways and conducting pathway enrichment among differentially expressed genes. However, gene expression levels by themselves do not reflect the overall picture, as non-coding factors play an important role in regulating gene expression. To incorporate these non-coding factors into pathway analyses and to systematically prioritize genes in a pathway, we introduce a new software tool: Triangulation of Perturbation Origins and Identification of Non-Coding Targets (TriPOINT). TriPOINT is a pathway analysis tool, implemented in Java, that identifies the significance of a gene under a condition (e.g. a disease phenotype) by studying graph representations of pathways, analyzing upstream and downstream gene interactions and integrating non-coding regions that may be regulating gene expression levels. Availability and implementation The TriPOINT open source software is freely available at https://github.uconn.edu/ajt06004/TriPOINT under the GPL v3.0 license. Supplementary information Supplementary data are available at Bioinformatics online.
