Cis-regulatory elements and human evolution

Mapping Intimacies ◽

10.1101/005652 ◽

2014 ◽

Author(s):

Adam Siepel ◽

Leonardo Arbiza

Keyword(s):

Transcriptional Regulation ◽

Human Evolution ◽

Large Scale ◽

Regulatory Elements ◽

Data Sets ◽

Regulatory Evolution ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Human Polymorphism

Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider "new frontiers" in this field stemming from recent research on transcriptional regulation.

Integrative analysis of epigenetics data identifies gene-specific regulatory elements

10.1101/585125 ◽

2019 ◽

Cited By ~ 2

Author(s):

Florian Schmidt ◽

Alexander Marx ◽

Marie Hebel ◽

Martin Wegner ◽

Nina Baumgarten ◽

...

Keyword(s):

Transcriptional Regulation ◽

Cell Types ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Genome Wide ◽

A Genome ◽

Gene Level ◽

Regulatory Landscape ◽

Regulatory Sites ◽

Validation Experiments

Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimental linkage of regulatory sites to genes is challenging, computational methods considering epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for the variations of the regulatory landscape between cell types. To overcome these drawbacks, we developed a new method called STITCHIT, which identifies and links putative regulatory sites to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions exhibiting a signal variation related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene level.

Interrogating CD8+ T cell reactivity on a genome-wide scale

Science Translational Medicine ◽

10.1126/scitranslmed.aaz0302 ◽

2019 ◽

Vol 11 (506) ◽

pp. eaaz0302

Author(s):

Kamila Naxerova

Keyword(s):

T Cell ◽

Large Scale ◽

Cd8 T Cell ◽

Cell Reactivity ◽

Human T Cell ◽

Genome Wide ◽

A Genome ◽

T Cell Antigens ◽

Wide Scale ◽

T Cell Reactivity

A new method enables large-scale identification of human T cell antigens.

Technical Note: Efficient and accurate estimation of genotype odds ratios in biobank-based unbalanced case-control studies

10.1101/646018 ◽

2019 ◽

Author(s):

Rounak Dey ◽

Seunggeun Lee

Keyword(s):

Large Scale ◽

Association Studies ◽

Case Control ◽

Accurate Estimation ◽

Genome Wide Association Studies ◽

Odds Ratios ◽

Case Control Studies ◽

Genome Wide ◽

A Genome ◽

Wide Scale

In genome-wide association studies (GWASs), genotype log-odds ratios (LORs) quantify the effects of the variants on the binary phenotypes, and calculating the genotype LORs for all of the markers is required for several downstream analyses. Calculating genotype LORs at a genome-wide scale is computationally challenging, especially when analyzing large-scale biobank data, which involves performing thousands of GWASs phenome-wide. Since most of the binary phenotypes in biobank-based studies have unbalanced (case : control = 1 : 10) or often extremely unbalanced (case : control = 1 : 100) case-control ratios, the existing methods cannot provide a scalable and accurate way to estimate the genotype LORs. The traditional logistic regression provides biased LOR estimates in such situations. Although the Firth bias correction method can provide unbiased LOR estimates, it is not scalable for genome-wide or phenome-wide scale association analyses typically used in biobank-based studies, especially when the number of non-genetic covariates is large. On the other hand, the saddlepoint approximation-based test (fastSPA), which can provide accurate p values and is scalable to analyse large-scale biobank data, does not provide the genotype LOR estimates as it is a score-based test. Here, we propose a scalable method based on score statistics to accurately estimate the genotype LORs, adjusting for non-genetic covariates. Compared with the Firth method, our proposed method reduces the computational complexity from O(nK² + K³) to O(n), where n is the sample size and K is the number of non-genetic covariates. Our method is ~10x faster than the Firth method when 15 covariates are being adjusted for. Through extensive numerical simulations, we show that the proposed method is both scalable and accurate in estimating the genotype ORs in genome-wide or phenome-wide scale.
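
For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of an unadjusted log-odds ratio computed from a 2x2 table of allele counts. It is not the score-statistic method proposed in the abstract and it ignores non-genetic covariates. The function name `log_odds_ratio`, its parameters, and the default 0.5 continuity correction (the Haldane-Anscombe correction) are assumptions chosen for illustration only.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, correction=0.5):
    """Unadjusted log-odds ratio and Wald standard error from a 2x2 table.

    Illustrative only: `correction` is added to every cell
    (Haldane-Anscombe) so that a zero count, which is common when
    cases are rare, does not produce an infinite estimate.
    """
    a = case_alt + correction   # alternate alleles in cases
    b = case_ref + correction   # reference alleles in cases
    c = ctrl_alt + correction   # alternate alleles in controls
    d = ctrl_ref + correction   # reference alleles in controls

    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


if __name__ == "__main__":
    # Hypothetical counts with a roughly 1:100 case-control ratio.
    lor, se = log_odds_ratio(case_alt=3, case_ref=197,
                             ctrl_alt=150, ctrl_ref=19850)
    print(f"LOR = {lor:.3f}, SE = {se:.3f}")
```

With very few cases, the standard error is dominated by the small case-allele counts, which is one intuition for why estimation is harder in the unbalanced settings the abstract describes.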

Filtering the Junk: Assigning Function to the Mosquito Non-Coding Genome

Insects ◽

10.3390/insects12020186 ◽

2021 ◽

Vol 12 (2) ◽

pp. 186

Author(s):

Elise J. Farley ◽

Heather Eggleston ◽

Michelle M. Riehle

Keyword(s):

Regulatory Elements ◽

Protein Coding ◽

Apicomplexan Parasites ◽

Susceptibility To Infection ◽

Genome Wide ◽

A Genome ◽

Gene And Protein Expression ◽

Wide Scale ◽

Differential Gene ◽

Analytical Approaches

The portion of the mosquito genome that does not code for proteins contains regulatory elements that likely underlie variation for important phenotypes including resistance and susceptibility to infection with arboviruses and Apicomplexan parasites. Filtering the non-coding genome to uncover these functional elements is an expanding area of research, though identification of non-coding regulatory elements is challenging due to the lack of an amino acid-like code for the non-coding genome and a lack of sequence conservation across species. This review focuses on three types of non-coding regulatory elements: (1) microRNAs (miRNAs), (2) long non-coding RNAs (lncRNAs), and (3) enhancers, and summarizes current advances in technical and analytical approaches for measurement of each of these elements on a genome-wide scale. The review also summarizes and highlights novel findings following application of these techniques in mosquito-borne disease research. Looking beyond the protein-coding genome is essential for understanding the complexities that underlie differential gene expression in response to arboviral or parasite infection in mosquito disease vectors. A comprehensive understanding of the regulation of gene and protein expression will inform transgenic and other vector control methods rooted in naturally segregating genetic variation.

Large-Scale Gene Expression Data Analysis: A New Challenge to Computational Biologists

Genome Research ◽

10.1101/gr.9.8.681 ◽

1999 ◽

Vol 9 (8) ◽

pp. 681-688 ◽

Cited By ~ 3

Author(s):

Michael Q. Zhang

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Yeast Genome ◽

Expression Data ◽

Dna Arrays ◽

Gene Expression Data Analysis ◽

Genome Wide ◽

A Genome ◽

Wide Scale

The use of high-density DNA arrays to monitor gene expression at a genome-wide scale constitutes a fundamental advance in biology. In particular, the expression pattern of all genes in Saccharomyces cerevisiae can be interrogated using microarray analysis where cDNAs are hybridized to an array of each of the ∼6000 genes in the yeast genome. In this survey I review three recent experiments related to transcriptional regulation and discuss the great challenge for computational biologists trying to extract functional information from such large-scale gene expression data.

CRISPRi-Seq for the Identification and Characterisation of Essential Mycobacterial Genes and Transcriptional Units

10.1101/358275 ◽

2018 ◽

Cited By ~ 6

Author(s):

Timothy J. de Wet ◽

Irene Gobe ◽

Musa M. Mhlanga ◽

Digby F. Warner

Keyword(s):

High Throughput ◽

Large Scale ◽

Essential Gene ◽

Experimental Models ◽

Bacterial Gene ◽

Growth And Survival ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Transcriptional Units

High-throughput essentiality screens have enabled genome-wide assessments of the genetic requirements for growth and survival of a variety of bacteria in different experimental models. The reliance in many of these studies on transposon (Tn)-based gene inactivation has, however, limited the ability to probe essential gene function or design targeted screens. We interrogated the potential of targeted, large-scale, pooled CRISPR interference (CRISPRi)-based screens to extend conventional Tn approaches in mycobacteria through the capacity for positionally regulable gene repression. Here, we report the utility of the “CRISPRi-Seq” method for targeted, pooled essentiality screening, confirming strong overlap with Tn-Seq datasets. In addition, we exploit this high-throughput approach to provide insight into CRISPRi functionality. By interrogating polar effects and combining image-based phenotyping with CRISPRi-mediated depletion of selected essential genes, we demonstrate that CRISPRi-Seq can functionally validate Transcriptional Units within operons. Together, these observations suggest the utility of CRISPRi-Seq to provide insights into (myco)bacterial gene regulation and expression on a genome-wide scale.

ATAC-STARR-seq v2

10.17504/protocols.io.b2nuqdew ◽

2021 ◽

Author(s):

Tyler Hansen ◽

Emily Hodges

Keyword(s):

Regulatory Region ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

High Signal ◽

Regulatory Pathways ◽

Information Accessibility ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Accessible Chromatin

Massively parallel reporter assays test the capacity of putative cis-regulatory elements (CREs) to drive transcription on a genome-wide scale. In nearly all cases, chromatin accessibility is necessary to drive activity, so most CREs are inactive due to chromatin context rather than intrinsic DNA sequence properties. Here, we combined assay for transposase-accessible chromatin (ATAC-seq) with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of nucleosome-free DNA genome-wide. Our approach enabled high-resolution testing of ~50 million unique DNA fragments tiling ~101,000 accessible chromatin regions in human lymphoblastoid cells. To illustrate the application of our approach, we show that 30% of all accessible regions contain an activator, a silencer or both. Benchmarking against standard ATAC-seq, our approach faithfully captures chromatin accessibility and transcription factor (TF) footprints with high signal-to-noise. Integrating three layers of genomic information (accessibility, TF occupancy, and activity) provided by ATAC-STARR-seq, we stratified active and silent CREs by the presence of several TF footprints and show that CREs with specific TF combinations are associated with distinct gene regulatory pathways. Altogether, these data highlight the power of ATAC-STARR-seq to comprehensively investigate the regulatory landscape of the human genome from a single DNA source.

A genome-wide map of regulatory elements in zebrafish

Lab Animal ◽

10.1038/s41684-020-00695-7 ◽

2020 ◽

Vol 50 (1) ◽

pp. 17-17

Author(s):

Alexandra Le Bras

Keyword(s):

Regulatory Elements ◽

Genome Wide ◽

A Genome

A Genome-Wide Analysis of Pathogenesis-Related Protein-1 (PR-1) Genes from Piper nigrum Reveals Its Critical Role during Phytophthora capsici Infection

Genes ◽

10.3390/genes12071007 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1007

Author(s):

Divya Kattupalli ◽

Asha Sreenivasan ◽

Eppurathu Vasudevan Soniya

Keyword(s):

Phytophthora Capsici ◽

Regulatory Elements ◽

Piper Nigrum ◽

Binding Motif ◽

Altered Expression ◽

Genome Wide ◽

A Genome ◽

Pathogenesis Related Protein ◽

Pathogenesis Related

Black pepper (Piper nigrum L.) is a prominent spice that is an indispensable ingredient in cuisine and traditional medicine. Phytophthora capsici, the causative agent of footrot disease, causes a drastic constraint in P. nigrum cultivation and productivity. To counterattack various biotic and abiotic stresses, plants employ a broad array of mechanisms that includes the accumulation of pathogenesis-related (PR) proteins. Through a genome-wide survey, eleven PR-1 genes that belong to a CAP superfamily protein with a caveolin-binding motif (CBM) and a CAP-derived peptide (CAPE) were identified from P. nigrum. Despite the critical functional domains, PnPR-1 homologs differ in their signal peptide motifs and core amino acid composition in the functional protein domains. The conserved motifs of PnPR-1 proteins were identified using MEME. Most of the PnPR-1 proteins were basic in nature. Secondary and 3D structure analyses of the PnPR-1 proteins were also predicted, which may be linked to a functional role in P. nigrum. The GO and KEGG functional annotations predicted their function in the defense responses of plant-pathogen interactions. Furthermore, a transcriptome-assisted FPKM analysis revealed PnPR-1 genes mapped to the P. nigrum-P. capsici interaction pathway. An altered expression pattern was detected for PnPR-1 transcripts among which a significant upregulation was noted for basic PnPR-1 genes such as CL10113.C1 and Unigene17664. The drastic variation in the transcript levels of CL10113.C1 was further validated through qRT-PCR and it showed a significant upregulation in infected leaf samples compared with the control. A subsequent analysis revealed the structural details, phylogenetic relationships, conserved sequence motifs and critical cis-regulatory elements of PnPR-1 genes. This is the first genome-wide study that identified the role of PR-1 genes during P. nigrum-P. capsici interactions. The detailed in silico experimental analysis revealed the vital role of PnPR-1 genes in regulating the first layer of defense towards a P. capsici infection in Panniyur-1 plants.

F-Box Genes in the Wheat Genome and Expression Profiling in Wheat at Different Developmental Stages

Genes ◽

10.3390/genes11101154 ◽

2020 ◽

Vol 11 (10) ◽

pp. 1154

Author(s):

Min Jeong Hong ◽

Jin-Baek Kim ◽

Yong Weon Seo ◽

Dae Yeon Kim

Keyword(s):

Developmental Stages ◽

Brachypodium Distachyon ◽

Triticum Aestivum L ◽

Wheat Genome ◽

Post Translational Modification ◽

Genome Wide ◽

A Genome ◽

High Sequence Homology ◽

Wide Scale

Genes of the F-box family play specific roles in protein degradation by post-translational modification in several biological processes, including flowering, the regulation of circadian rhythms, photomorphogenesis, seed development, leaf senescence, and hormone signaling. F-box genes have not been previously investigated on a genome-wide scale; however, the establishment of the wheat (Triticum aestivum L.) reference genome sequence enabled a genome-based examination of the F-box genes to be conducted in the present study. In total, 1796 F-box genes were detected in the wheat genome and classified into various subgroups based on their functional C-terminal domain. The F-box genes were distributed among 21 chromosomes and most showed high sequence homology with F-box genes located on the homoeologous chromosomes because of allohexaploidy in the wheat genome. Additionally, a synteny analysis of wheat F-box genes was conducted in rice and Brachypodium distachyon. Transcriptome analysis during various wheat developmental stages and expression analysis by quantitative real-time PCR revealed that some F-box genes were specifically expressed in the vegetative and/or seed developmental stages. A genome-based examination and classification of F-box genes provide an opportunity to elucidate the biological functions of F-box genes in wheat.