Large-scale discovery of recombinases for integrating DNA into the human genome

Recent microbial genome sequencing efforts have revealed a vast reservoir of mobile genetic elements containing integrases that could be useful genome engineering tools. Large serine recombinases (LSRs), such as Bxb1 and PhiC31, are bacteriophage-encoded integrases that can facilitate the insertion of phage DNA into bacterial genomes. However, only a few LSRs have been previously characterized and they have limited efficiency in human cells. Here, we developed a systematic computational discovery workflow that searches across the bacterial tree of life to expand the diversity of known LSRs and their cognate DNA attachment sites by >100-fold. We validated this approach via experimental characterization of LSRs, leading to three classes of LSRs distinguished from one another by their efficiency and specificity. We identify landing pad LSRs that efficiently integrate into native attachment sites in a human cell context, human genome-targeting LSRs with computationally predictable pseudosites, and multi-targeting LSRs that can unidirectionally integrate cargos with similar efficiency and superior specificity to commonly used transposases. LSRs from each category were functionally characterized in human cells, overall achieving up to 7-fold higher plasmid recombination than Bxb1 and genome insertion efficiencies of 40-70% with cargo sizes over 7 kb. Overall, we establish a paradigm for the large-scale discovery of microbial recombinases directly from sequencing data and the reconstruction of their target sites. This strategy provided a rich resource of over 60 experimentally characterized LSRs that can function in human cells and thousands of additional candidates for large-payload genome editing without double-stranded DNA breaks.

A heterodimer of evolved designer-recombinases precisely excises a human genomic DNA locus

Nucleic Acids Research ◽

10.1093/nar/gkz1078 ◽

2019 ◽

Vol 48 (1) ◽

pp. 472-485 ◽

Cited By ~ 5

Author(s):

Felix Lansing ◽

Maciej Paszkowski-Rogacz ◽

Lukas Theo Schmitt ◽

Paul Martin Schneider ◽

Teresa Rojo Romanos ◽

...

Keyword(s):

Human Genome ◽

Genome Editing ◽

Large Scale ◽

Genome Engineering ◽

Genomic Sequence ◽

Safe Alternative ◽

Protein Coding ◽

Dna Binding Specificity ◽

Human Genomic ◽

Human Genomic Sequence

Site-specific recombinases (SSRs) such as the Cre/loxP system are useful genome engineering tools that can be repurposed by altering their DNA-binding specificity. However, SSRs that delete a natural sequence from the human genome have not been reported thus far. Here, we describe the generation of an SSR system that precisely excises a 1.4 kb fragment from the human genome. Through a streamlined process of substrate-linked directed evolution we generated two separate recombinases that, when expressed together, act as a heterodimer to delete a human genomic sequence from chromosome 7. Our data indicate that designer-recombinases can be generated in a manageable timeframe for precision genome editing. A large-scale bioinformatics analysis suggests that around 13% of all human protein-coding genes could be targetable by dual designer-recombinase induced genomic deletion (dDRiGD). We propose that heterospecific designer-recombinases, which work independently of the host DNA repair machinery, represent an efficient and safe alternative to nuclease-based genome editing technologies.

Multiplex base editing to convert TAG into TAA codons in the human genome

10.1101/2021.07.13.452007 ◽

2021 ◽

Author(s):

Yuting Chen ◽

Eriona Hysolli ◽

Anlu Chen ◽

Stephen Casper ◽

Songlei Liu ◽

...

Keyword(s):

Amino Acids ◽

Human Genome ◽

Mammalian Cells ◽

Large Scale ◽

Essential Gene ◽

Human Cells ◽

Viral Resistance ◽

Computational Tool ◽

Base Editing ◽

Genome Wide

Large-scale recoding has so far been shown to enable novel amino acids, biocontainment and viral resistance only in bacteria. Here we extend this to human cells, demonstrating exceptional base editing to convert TAG to TAA for 33 essential genes via a single transfection, and examine base editing genome-wide (observing ~40 C-to-T off-target events in essential gene exons). We also introduce GRIT, a computational tool for recoding. This demonstrates the feasibility of recoding and multiplex editing in mammalian cells.
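
As a rough illustration of what TAG-to-TAA recoding means at the sequence level, the sketch below swaps a terminal TAG stop codon for TAA in toy coding sequences. It is not the GRIT tool and not the authors' method; the function names and the toy gene sequences are hypothetical. The swap is a G-to-A change on the coding strand, which corresponds to a C-to-T change on the complementary strand.

```python
from typing import Mapping

AMBER_STOP = "TAG"  # stop codon to be replaced
OCHRE_STOP = "TAA"  # replacement stop codon


def recode_amber_stop(cds: str) -> str:
    """Return the coding sequence with a terminal TAG replaced by TAA.

    Sequences that do not end in TAG are returned unchanged (upper-cased).
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if seq[-3:] == AMBER_STOP:
        # Only the third base of the stop codon differs (G -> A).
        return seq[:-3] + OCHRE_STOP
    return seq


def count_amber_genes(genes: Mapping[str, str]) -> int:
    """Count genes whose coding sequence ends in a TAG stop codon."""
    return sum(1 for cds in genes.values() if cds.upper().endswith(AMBER_STOP))


# Hypothetical toy sequences, for illustration only.
toy_genes = {
    "geneA": "ATGGCCTAG",
    "geneB": "ATGAAATAA",
    "geneC": "atgtgctag",
}
recoded = {name: recode_amber_stop(cds) for name, cds in toy_genes.items()}
```

Here `count_amber_genes(toy_genes)` reports how many toy genes would need editing, and `recoded` holds the sequences after the stop-codon swap. A real recoding design would also have to consider guide placement and bystander edits, which this sketch ignores.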

Practical guide for managing large-scale human genome data in research

Journal of Human Genetics ◽

10.1038/s10038-020-00862-1 ◽

2020 ◽

Vol 66 (1) ◽

pp. 39-52

Author(s):

Tomoya Tanjo ◽

Yosuke Kawai ◽

Katsushi Tokunaga ◽

Osamu Ogasawara ◽

Masao Nagasaki

Keyword(s):

Data Processing ◽

Whole Genome Sequencing ◽

Human Genome ◽

Genome Sequencing ◽

Large Scale ◽

Genomic Data ◽

Whole Genome ◽

Sequencing Data ◽

Genome Data ◽

Human Genome Data

Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available on public domains. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing downstream data are essential. This review aims to guide struggling researchers to process and analyze these large-scale genomic data to extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects that could be integrated into any data for improved analysis. Obtaining human whole-genome sequencing data from both data stores and processes is costly; therefore, we focus on the development of data formats and software that manipulate whole-genome sequencing data. Once the sequencing is complete and its format and data processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances cost, performance, and customizability. Good-quality published research relies on data reproducibility to ensure quality results, reusability for applications to other datasets, and scalability for the future increase of datasets. To address these needs, we describe several key technologies developed in computer science, including workflow engines. We also discuss the ethical guidelines that are indispensable for human genomic data analysis and that differ from those for model organisms. Finally, the ideal future perspective of data processing and analysis is summarized.

Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery

eLife ◽

10.7554/elife.04766 ◽

2014 ◽

Vol 3 ◽

Cited By ~ 554

Author(s):

Steven Lin ◽

Brett T Staahl ◽

Ravi K Alla ◽

Jennifer A Doudna

Keyword(s):

Genome Engineering ◽

High Efficiency ◽

Embryonic Stem ◽

Human Cells ◽

Dna Breaks ◽

Site Specific ◽

Guide Rna ◽

Homology Directed Repair ◽

Double Strand Dna Breaks ◽

Cell Mortality

The CRISPR/Cas9 system is a robust genome editing technology that works in human cells, animals and plants based on the RNA-programmed DNA cleaving activity of the Cas9 enzyme. Building on previous work (Jinek et al., 2013), we show here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes. Cas9 RNP-mediated HDR in HEK293T, human primary neonatal fibroblast and human embryonic stem cells was increased dramatically relative to experiments in unsynchronized cells, with rates of HDR up to 38% observed in HEK293T cells. Sequencing of on- and potential off-target sites showed that editing occurred with high fidelity, while cell mortality was minimized. This approach provides a simple and highly effective strategy for enhancing site-specific genome engineering in both transformed and primary human cells.

SpeedSeq: Ultra-fast personal genome analysis and interpretation

10.1101/012179 ◽

2014 ◽

Cited By ~ 1

Author(s):

Colby Chiang ◽

Ryan M Layer ◽

Gregory G Faust ◽

Michael R Lindberg ◽

David B Rose ◽

...

Keyword(s):

Human Genome ◽

Genome Sequencing ◽

Genome Analysis ◽

Large Scale ◽

Fusion Gene ◽

Computing Time ◽

Disease Diagnosis ◽

Superior Performance ◽

Personal Genome ◽

Sequencing Data

Comprehensive interpretation of human genome sequencing data is a challenging bioinformatic problem that typically requires weeks of analysis, with extensive hands-on expert involvement. This informatics bottleneck inflates genome sequencing costs, poses a computational burden for large-scale projects, and impedes the adoption of time-critical clinical applications such as personalized cancer profiling and newborn disease diagnosis, where the actionable timeframe can measure in hours or days. We developed SpeedSeq, an open-source genome analysis platform that vastly reduces computing time. SpeedSeq accomplishes read alignment, duplicate removal, variant detection and functional annotation of a 50X human genome in <24 hours, even using one low-cost server. SpeedSeq offers competitive or superior performance to current methods for detecting germline and somatic single nucleotide variants (SNVs), indels, and structural variants (SVs) and includes novel functionality for SV genotyping, SV annotation, fusion gene detection, and rapid identification of actionable mutations. SpeedSeq will help bring timely genome analysis into the clinical realm.

SARS-CoV-2 variants reveal features critical for replication in primary human cells

PLoS Biology ◽

10.1371/journal.pbio.3001006 ◽

2021 ◽

Vol 19 (3) ◽

pp. e3001006

Author(s):

Marie O. Pohl ◽

Idoia Busnadiego ◽

Verena Kufner ◽

Irina Glas ◽

Umut Karakus ◽

...

Keyword(s):

Large Scale ◽

Human Cells ◽

Sequencing Data ◽

Furin Cleavage ◽

Bronchial Epithelial ◽

Naturally Occurring ◽

Furin Cleavage Site ◽

Low Passage ◽

Viral Sequencing ◽

Primary Human Cells

Since entering the human population, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2; the causative agent of Coronavirus Disease 2019 [COVID-19]) has spread worldwide, causing >100 million infections and >2 million deaths. While large-scale sequencing efforts have identified numerous genetic variants in SARS-CoV-2 during its circulation, it remains largely unclear whether many of these changes impact adaptation, replication, or transmission of the virus. Here, we characterized 14 different low-passage replication-competent human SARS-CoV-2 isolates representing all major European clades observed during the first pandemic wave in early 2020. By integrating viral sequencing data from patient material, virus stocks, and passaging experiments, together with kinetic virus replication data from nonhuman Vero-CCL81 cells and primary differentiated human bronchial epithelial cells (BEpCs), we observed several SARS-CoV-2 features that associate with distinct phenotypes. Notably, naturally occurring variants in Orf3a (Q57H) and nsp2 (T85I) were associated with poor replication in Vero-CCL81 cells but not in BEpCs, while SARS-CoV-2 isolates expressing the Spike D614G variant generally exhibited enhanced replication abilities in BEpCs. Strikingly, low-passage Vero-derived stock preparation of 3 SARS-CoV-2 isolates selected for substitutions at positions 5/6 of E and were highly attenuated in BEpCs, revealing a key cell-specific function to this region. Rare isolate-specific deletions were also observed in the Spike furin cleavage site during Vero-CCL81 passage, but these were rapidly selected against in BEpCs, underscoring the importance of this site for SARS-CoV-2 replication in primary human cells. Overall, our study uncovers sequence features in SARS-CoV-2 variants that determine cell-specific replication and highlights the need to monitor SARS-CoV-2 stocks carefully when phenotyping newly emerging variants or potential variants of concern.

Plasmids or no plasmids? A comparison between the Agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

Prime-editors (nickases), hRad51–Cas9 nickase fusions and dCas9 have the same problem as conventional CRISPR-Cas9 of plasmid/Cas9 integration after making a double stranded break

10.31219/osf.io/jf6pe ◽

2019 ◽

Author(s):

Sandeep Chakraborty

Keyword(s):

Cellular Response ◽

Sequencing Data ◽

Dna Breaks ◽

Precise Integration ◽

Guide Rna ◽

Double Strand Dna Breaks ◽

Human Therapy ◽

P53 Activation ◽

Programmable Nucleases

‘Prime-editing’ proposes to replace traditional programmable nucleases (CRISPR-Cas9) using a catalytically impaired Cas9 (dCas9) connected to an engineered reverse transcriptase, and a guide RNA encoding both the target site and the desired change. With just a ‘nick’ on one strand, it is hypothesized, the negative, uncontrollable effects arising from double-strand DNA breaks (DSBs) - translocations, complex proteins, integrations and p53 activation - will be eliminated. However, sequencing data provided (Accid:PRJNA565979) reveal plasmid integration, indicating that DSBs occur. Also, looking at only 16 off-targets is inadequate to assert that Prime-editing is more precise. Integration of plasmid occurs in all three versions (PE1/2/3). Interestingly, dCas9, which is known to be toxic in E. coli and yeast, is shown to have residual endonuclease activity. This also affects studies that use dCas9, like base-editors and de/methylation systems. Previous work using hRad51–Cas9 nickases also shows significant integration in on-targets, as well as off-target integration [1]. Thus, we show that cellular response to nicking involves DSBs, and subsequent plasmid/Cas9 integration. This is an unacceptable outcome for any in vivo application in human therapy.
