De novo assembly, delivery and expression of a 101 kb human gene in mouse cells

Design and large-scale synthesis of DNA have been applied to the functional study of viral and microbial genomes. New and expanded technology development is required to unlock the transformative potential of such bottom-up approaches to the study of larger mammalian genomes. Two major challenges are assembling and delivering long DNA sequences. Here we describe a pipeline for de novo DNA assembly and delivery that enables functional evaluation of mammalian genes on the length scale of 100 kb. The DNA assembly step is supported by an integrated robotic workcell. We assembled the 101 kb human HPRT1 gene in yeast, delivered it to mouse embryonic stem cells, and showed expression of the human protein from its full-length gene. This pipeline provides a framework for producing systematic, designer variants of any mammalian gene locus for functional evaluation in cells.

Significance Statement: Mammalian genomes consist of a tiny proportion of relatively well-characterized coding regions and vast swaths of poorly characterized “dark matter” containing critical but much less well-defined regulatory sequences. Given the dominant role of noncoding DNA in common human diseases and traits, the interconnectivity of regulatory elements, and the importance of genomic context, de novo design, assembly, and delivery can enable large-scale manipulation of these elements on a locus scale. Here we outline a pipeline for de novo assembly, delivery and expression of mammalian genes replete with native regulatory sequences. We expect this pipeline will be useful for dissecting the function of non-coding sequence variation in mammalian genomes.

De novo assembly and delivery to mouse cells of a 101 kb functional human gene

Genetics, 2021. DOI: 10.1093/genetics/iyab038

Author(s): Leslie A Mitchell, Laura H McCulloch, Sudarshan Pinglay, Henri Berger, Nazario Bosco, ...

Keyword(s): DNA Sequences; Large Scale; De Novo; Human Gene; Embryonic Stem; Building Blocks; Functional Study; DNA Assembly; Functional Evaluation; Mouse Cells

GimmeMotifs: an analysis framework for transcription factor motif analysis

2018. DOI: 10.1101/474403. Cited by ~8

Author(s): Niklas Bruse, Simon J. van Heeringen

Keyword(s): DNA Sequences; Motif Discovery; High Throughput Sequencing; Performance Metrics; De Novo; Expression Patterns; Regulatory Elements; Ensemble Method; Regulatory Sequences; Motif Analysis

Background: Transcription factors (TFs) bind to specific DNA sequences (TF motifs) in cis-regulatory sequences and control the expression of the diverse transcriptional programs encoded in the genome. The concerted action of TFs within the chromatin context enables precise temporal and spatial expression patterns. To understand how TFs control gene expression it is essential to model TF binding. TF motif information can help to interpret the exact role of individual regulatory elements, for instance to predict the functional impact of non-coding variants.

Findings: Here we present GimmeMotifs, a comprehensive computational framework for TF motif analysis. Compared to the previously published version, this release adds a wide range of new functionality and analysis methods. It now includes tools for de novo motif discovery, motif scanning and sequence analysis, motif clustering, calculation of performance metrics and visualization. Included with GimmeMotifs is a non-redundant database of clustered motifs. Compared to other motif databases, this collection of motifs shows competitive performance in discriminating bound from unbound sequences. Using our de novo motif discovery pipeline we find large differences in performance between de novo motif finders on ChIP-seq data. Using an ensemble method such as the one implemented in GimmeMotifs will generally result in improved motif identification compared to a single motif finder. Finally, we demonstrate maelstrom, a new ensemble method that enables comparative analysis of TF motifs between multiple high-throughput sequencing experiments, such as ChIP-seq or ATAC-seq. Using a collection of ~200 H3K27ac ChIP-seq data sets we identify TFs that play a role in hematopoietic differentiation and lineage commitment.

Conclusion: GimmeMotifs is a fully-featured and flexible framework for TF motif analysis. It contains both command-line tools and a Python API and is freely available at: https://github.com/vanheeringen-lab/gimmemotifs.
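Motif scanning, one of the tools listed above, comes down to sliding a position weight matrix (PWM) along a sequence and scoring every window on both strands. The Python sketch below illustrates only that general idea and does not use the GimmeMotifs API; the toy count matrix, the uniform background, the 0.5 pseudocount and the score threshold are all illustrative assumptions.

```python
import math

# Hypothetical position frequency matrix (PFM) for a 4-bp motif: one dict of
# base counts per motif position. Toy values, not from any motif database.
PFM = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_log_odds(pfm, background=None, pseudocount=0.5):
    """Convert counts to log2(P(base | motif position) / P(base | background))."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}  # assumed uniform background
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background[b])
            for b in "ACGT"
        })
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, threshold):
    """Return (forward-strand position, strand, score) for windows >= threshold."""
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in "ACGT" for b in window):
                continue  # skip windows with N or other ambiguity codes
            score = sum(col[b] for col, b in zip(pwm, window))
            if score >= threshold:
                # Map minus-strand offsets back to forward-strand coordinates.
                pos = i if strand == "+" else len(s) - i - width
                hits.append((pos, strand, round(score, 2)))
    return hits


pwm = to_log_odds(PFM)
print(scan("TTACGTAA", pwm, threshold=4.0))  # → [(2, '+', 6.49), (2, '-', 6.49)]
```

Because the toy sequence is its own reverse complement, the same ACGT site is reported once per strand; production scanners typically also calibrate thresholds against background sequences rather than using a fixed cutoff.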

ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter

2016. DOI: 10.1101/068338. Cited by ~4

Author(s): Shaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, ...

Keyword(s): Human Genome; DNA Sequences; Message Passing; Large Scale; De Novo; Bloom Filter; Genomic Variation; De Bruijn Graph; Single Individual; Probabilistic Data Structure

The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps towards elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species and between or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically. Coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, this makes the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes timely.

With ABySS 1.0, we originally showed that assembling the human genome using short 50 bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements.

We present assembly benchmarks of human Genome in a Bottle 250 bp Illumina paired-end and 6 kbp mate-pair libraries from a single individual, yielding an NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using less than 35 GB of RAM, a modest memory requirement by today’s standards that is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics’ Chromium data to further improve the scaffold contiguity of this assembly to 42 (15) Mbp.
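One common way a Bloom filter can represent a de Bruijn graph is to store only the k-mers (the nodes) and leave edges implicit: a node's successors are found by querying its four possible one-base extensions. The Python sketch below illustrates that general technique; it is not ABySS code. The salted SHA-256 hashing, the filter size, k and the toy read are our assumptions, and it ignores details a real assembler needs, such as treating a k-mer and its reverse complement as one node and pruning false-positive branches.

```python
import hashlib


class BloomFilter:
    """Bit array plus `num_hashes` hash functions. Membership queries can give
    false positives but never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive each hash function by salting SHA-256 with its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def insert_kmers(reads, k, bloom):
    """Store every k-mer of every read; these k-mers are the graph's nodes."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer, bloom):
    """Edges are implicit: try the four one-base extensions of the
    (k-1)-base suffix and keep those the filter reports as present."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in bloom]


bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)
insert_kmers(["ACGTTGCA"], k=4, bloom=bloom)
print(successors("ACGT", bloom))
```

With only five k-mers in a 65,536-bit filter a false positive is very unlikely, so the call above should return just `['CGTT']`. The memory saving comes from storing a few bits per k-mer instead of the k-mer strings and an explicit edge list.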

The KAP1 Corepressor Functions To Coordinate the Assembly of De Novo HP1-Demarcated Microenvironments of Heterochromatin Required for KRAB Zinc Finger Protein-Mediated Transcriptional Repression

Molecular and Cellular Biology, 2006, Vol 26 (22), pp. 8623-8638. DOI: 10.1128/mcb.00487-06. Cited by ~195

Author(s): Smitha P. Sripathy, Jessica Stevens, David C. Schultz

Keyword(s): RNA Polymerase II; Zinc Finger; Histone Modifications; DNA Sequences; Transcriptional Repression; Zinc Finger Protein; De Novo; Regulatory Sequences; PHD Finger; Finger Protein

KAP1/TIF1β is proposed to be a universal corepressor protein for the KRAB zinc finger protein (KRAB-zfp) superfamily of transcriptional repressors. To characterize the role of KAP1 and KAP1-interacting proteins in transcriptional repression, we investigated the regulation of stably integrated reporter transgenes by hormone-responsive KRAB and KAP1 repressor proteins. Here, we demonstrate that depletion of endogenous KAP1 levels by small interfering RNA (siRNA) significantly inhibited KRAB-mediated transcriptional repression of a chromatin template. Similarly, reduction in cellular levels of HP1α/β/γ and SETDB1 by siRNA attenuated KRAB-KAP1 repression. We also found that direct tethering of KAP1 to DNA was sufficient to repress transcription of an integrated transgene. This activity is absolutely dependent upon the interaction of KAP1 with HP1 and on an intact PHD finger and bromodomain of KAP1, suggesting that these domains function cooperatively in transcriptional corepression. The achievement of the repressed state by wild-type KAP1 involves decreased recruitment of RNA polymerase II, reduced levels of histone H3 K9 acetylation and H3K4 methylation, an increase in histone occupancy, enrichment of trimethyl histone H3K9, H3K36, and histone H4K20, and HP1 deposition at proximal regulatory sequences of the transgene. A KAP1 protein containing a mutation of the HP1 binding domain failed to induce any change in the histone modifications associated with DNA sequences of the transgene, implying that HP1-directed nuclear compartmentalization is required for transcriptional repression by the KRAB/KAP1 repression complex. The combination of these data suggests that KAP1 functions to coordinate activities that dynamically regulate changes in histone modifications and deposition of HP1 to establish a de novo microenvironment of heterochromatin, which is required for repression of gene transcription by KRAB-zfps.

Origin and evolution of developmental enhancers in the mammalian neocortex

Proceedings of the National Academy of Sciences, 2016, Vol 113 (19), pp. E2617-E2626. DOI: 10.1073/pnas.1603718113. Cited by ~49

Author(s): Deena Emera, Jun Yin, Steven K. Reilly, Jake Gockley, James P. Noonan

Keyword(s): De Novo; Regulatory Elements; Evolutionary Constraint; Regulatory Sequences; En Bloc; Origin And Evolution; Stem Lineage; Coexpressed Genes; Human And Mouse; Insight Into

Morphological innovations such as the mammalian neocortex may involve the evolution of novel regulatory sequences. However, de novo birth of regulatory elements active during morphogenesis has not been extensively studied in mammals. Here, we use H3K27ac-defined regulatory elements active during human and mouse corticogenesis to identify enhancers that were likely active in the ancient mammalian forebrain. We infer the phylogenetic origins of these enhancers and find that ∼20% arose in the mammalian stem lineage, coincident with the emergence of the neocortex. Implementing a permutation strategy that controls for the nonrandom variation in the ages of background genomic sequences, we find that mammal-specific enhancers are overrepresented near genes involved in cell migration, cell signaling, and axon guidance. Mammal-specific enhancers are also overrepresented in modules of coexpressed genes in the cortex that are associated with these pathways, notably ephrin and semaphorin signaling. Our results also provide insight into the mechanisms of regulatory innovation in mammals. We find that most neocortical enhancers did not originate by en bloc exaptation of transposons. Young neocortical enhancers exhibit smaller H3K27ac footprints and weaker evolutionary constraint in eutherian mammals than older neocortical enhancers. Based on these observations, we present a model of the enhancer life cycle in which neocortical enhancers initially emerge from genomic background as short, weakly constrained “proto-enhancers.” Many proto-enhancers are likely lost, but some may serve as nucleation points for complex enhancers to evolve.

Interaction between two different regulatory elements activates the murine alpha A-crystallin gene promoter in explanted lens epithelia.

Molecular and Cellular Biology, 1987, Vol 7 (5), pp. 1807-1814. DOI: 10.1128/mcb.7.5.1807. Cited by ~32

Author(s): A B Chepelinsky, B Sommer, J Piatigorsky

Keyword(s): DNA Sequences; Gene Promoter; Regulatory Elements; Regulatory Sequences; Base Pairs; Hybrid Gene; Promoter Sequences; Hybrid Genes; CAT Gene; CAT Expression

Previous experiments have indicated that 5' flanking DNA sequences (nucleotides -366 to +46) are capable of regulating the lens-specific transcription of the murine alpha A-crystallin gene. Here we have analyzed these 5' regulatory sequences by transfecting explanted embryonic chicken lens epithelia with different alpha A-crystallin-CAT (chloramphenicol acetyltransferase) hybrid genes (alpha A-crystallin promoter sequences fused to the bacterial CAT gene in the pSVO-CAT expression vector). The results indicated the presence of a proximal (-88 to +46) and a distal (-111 to -88) domain that must interact for promoter function. Deletion experiments showed that the sequence between -88 and -60 was essential for function of the proximal domain in the explanted epithelia. A synthetic oligonucleotide containing the sequence between -111 and -84 activated the proximal domain when placed in either orientation 57 base pairs upstream from position -88 of the alpha A-crystallin-CAT hybrid gene.

Paired CRISPR/Cas9 guide-RNAs enable high-throughput deletion scanning (ScanDel) of a Mendelian disease locus for functionally critical non-coding elements

2016. DOI: 10.1101/092445. Cited by ~2

Author(s): Molly Gasperini, Gregory M. Findlay, Aaron McKenna, Jennifer H. Milbank, Choli Lee, ...

Keyword(s): Large Scale; Transcriptional Start Site; Regulatory Elements; Mendelian Disease; Regulatory Sequences; Coding Region; Endogenous Gene; Distal Regulatory Elements

The extent to which distal non-coding mutations contribute to Mendelian disease remains a major unknown in human genetics. Given that a gene’s in vivo function can be appropriately modeled in vitro, CRISPR/Cas9 genome editing enables the large-scale perturbation of distal non-coding regions to identify functional elements in their native context. However, early attempts at such screens have relied on one individual guide RNA (gRNA) per cell, resulting in sparse mutagenesis with minimal redundancy across regions of interest. To address this, we developed a system that uses pairs of gRNAs to program thousands of kilobase-scale deletions that scan across a targeted region in a tiling fashion (“ScanDel”). As a proof of concept, we applied ScanDel to program 4,342 overlapping 1- and 2-kilobase (kb) deletions that tile a 206 kb region centered on HPRT1, the gene underlying Lesch-Nyhan syndrome, with a median 27-fold redundancy per base. Programmed deletions were functionally assayed by selecting for loss of HPRT1 function with 6-thioguanine. HPRT1 exons served as positive controls, and all were successfully identified as functionally critical by the screen. Remarkably, HPRT1 function appeared robust to deletion of any intergenic or deeply intronic non-coding region across the 206 kb locus, indicating that proximal regulatory sequences are sufficient for its expression. A sparser mutagenesis screen of the same 206 kb with individual gRNAs also failed to identify critical distal regulatory elements. Although our screen did find programmed deletions and individual gRNAs with putative functional consequences that targeted exon-proximal non-coding sequences (e.g. the promoter), long-read sequencing revealed that this signal was driven almost entirely by rare, unexpected deletions that extended into exonic sequence.
These targeted validation experiments defined a small region surrounding the transcriptional start site as the only non-coding sequence essential to HPRT1 function. Overall, our results suggest that distal regulatory elements are not critical for HPRT1 expression, and underscore the necessity of comprehensive edited-locus genotyping for validating the results of CRISPR screens. The application of ScanDel to additional loci will enable more insight into the extent to which the disruption of distal non-coding elements contributes to Mendelian diseases. In addition, dense, redundant, large-scale deletion scanning with gRNA pairs will facilitate a deeper understanding of endogenous gene regulation in the human genome.
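The “median 27-fold redundancy per base” quoted above is a per-base coverage statistic: for each base in the targeted window, count how many programmed deletions span it, then take the median over all bases. The Python sketch below shows that calculation on a hypothetical, evenly spaced tiling of a 10 kb toy region. In the actual screen, deletion endpoints are fixed by where paired gRNAs can target, so this toy layout and its numbers are assumptions, not the paper's design or results.

```python
from statistics import median


def tiled_deletions(region_start, region_end, sizes, step):
    """Hypothetical, evenly spaced tiling: one deletion of each size starting
    every `step` bp. Returns half-open intervals [start, end)."""
    deletions = []
    for size in sizes:
        for start in range(region_start, region_end - size + 1, step):
            deletions.append((start, start + size))
    return deletions


def per_base_redundancy(deletions, region_start, region_end):
    """Number of deletions covering each base, via a difference array
    (O(number of deletions + region length))."""
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in deletions:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    coverage, running = [], 0
    for i in range(length):
        running += diff[i]
        coverage.append(running)
    return coverage


dels = tiled_deletions(0, 10_000, sizes=(1_000, 2_000), step=100)
cov = per_base_redundancy(dels, 0, 10_000)
print(len(dels), cov[5_000], median(cov))  # → 172 30 30.0
```

In this toy layout the interior reaches 30-fold coverage while the two edge bases are covered only twice, so the median and the minimum redundancy can differ substantially.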

Hemoglobins from bacteria to man: evolution of different patterns of gene expression.

Journal of Experimental Biology, 1998, Vol 201 (8), pp. 1099-1117. DOI: 10.1242/jeb.201.8.1099. Cited by ~2

Author(s): R Hardison

Keyword(s): DNA Sequences; CpG Island; Globin Gene; Gene Clusters; Regulatory Elements; Chromatin Domain; Regulatory Sequences; Beta Globin; Ancestral Gene; Beta Globin Gene

The discovery of hemoglobins in virtually all kingdoms of organisms has shown (1) that the ancestral gene for hemoglobin is ancient, and (2) that hemoglobins can serve additional functions besides transport of oxygen between tissues, ranging from intracellular oxygen transport to catalysis of redox reactions. These different functions of the hemoglobins illustrate the acquisition of new roles by a pre-existing structural gene, which requires changes not only in the coding regions but also in the regulatory elements of the genes. The evolution of different regulated functions within an ancient gene family allows an examination of the types of biosequence data that are informative for various types of issues. Alignment of amino acid sequences is informative for the phylogenetic relationships among the hemoglobins in bacteria, fungi, protists, plants and animals. Although many of these diverse hemoglobins are induced by low oxygen concentrations, to date none of the molecular mechanisms for their hypoxic induction shows common regulatory proteins; hence, a search for matches in non-coding DNA sequences would not be expected to be fruitful. Indeed, alignments of non-coding DNA sequences do not reveal significant matches even between mammalian alpha- and beta-globin gene clusters, which diverged approximately 450 million years ago and are still expressed in a coordinated and balanced manner. They are in very different genomic contexts that show pronounced differences in regulatory mechanisms. The alpha-globin gene is in constitutively active chromatin and is encompassed by a CpG island, which is a dominant determinant of its regulation, whereas the beta-globin gene is in A+T-rich genomic DNA. 
Non-coding sequence matches are not seen between avian and mammalian beta-globin gene clusters, which diverged approximately 250 million years ago, despite the fact that regulation of both gene clusters requires tissue-specific activation of a chromatin domain regulated by a locus control region. The cis-regulatory sequences needed for domain opening and enhancement do show common binding sites for transcription factors. In contrast, alignments of non-coding sequences from species representing multiple eutherian mammalian orders, some of which diverged as long as 135 million years ago, are reliable predictors of novel cis-regulatory elements, both proximal and distal to the genes. Examples include a potential target for the hematopoietic transcription factor TAL1.

Novel generation of human satellite DNA-based artificial chromosomes in mammalian cells

Journal of Cell Science, 2000, Vol 113 (18), pp. 3207-3216. DOI: 10.1242/jcs.113.18.3207. Cited by ~1

Author(s): E. Csonka, I. Cserpan, K. Fodor, G. Hollo, R. Katona, ...

Keyword(s): DNA Sequences; Satellite DNA; Mammalian Cells; Large Scale; De Novo; Mammalian Species; Genetic Material; Exogenous DNA; Artificial Chromosomes; Acrocentric Chromosomes

An in vivo approach has been developed for the generation of artificial chromosomes, based on the induction of intrinsic, large-scale amplification mechanisms of mammalian cells. Here, we describe the successful generation of prototype human satellite DNA-based artificial chromosomes via amplification-dependent de novo chromosome formations induced by integration of exogenous DNA sequences into the centromeric/rDNA regions of human acrocentric chromosomes. Subclones with mitotically stable de novo chromosomes were established, which allowed the initial characterization and purification of these artificial chromosomes. Because of the low complexity of their DNA content, they may serve as a useful tool to study the structure and function of higher eukaryotic chromosomes. Human satellite DNA-based artificial chromosomes containing amplified satellite DNA, rDNA, and exogenous DNA sequences were heterochromatic; however, they provided a suitable chromosomal environment for the expression of the integrated exogenous genetic material. We demonstrate that induced de novo chromosome formation is a reproducible and effective methodology for generating artificial chromosomes from predictable sequences of different mammalian species. Satellite DNA-based artificial chromosomes formed by induced large-scale amplifications on the short arm of human acrocentric chromosomes may become safe or low-risk vectors in gene therapy.
