sequence motif Latest Research Papers

Single-cell multi-omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations perturb early progenitor states through selective hypomethylation

Somatic mutations in cancer genes have been ubiquitously detected in clonal expansions across healthy human tissue, including in clonal hematopoiesis. However, mutated and wildtype cells are morphologically and phenotypically similar, limiting the ability to link genotypes with cellular phenotypes. To overcome this limitation, we leveraged multi-modality single-cell sequencing, capturing the mutation with transcriptomes and methylomes in stem and progenitor cells from individuals with DNMT3A R882 mutated clonal hematopoiesis. DNMT3A mutations resulted in myeloid over lymphoid bias, and in expansion of immature myeloid progenitors primed toward megakaryocytic-erythroid fate. We observed dysregulated expression of lineage and leukemia stem cell markers. DNMT3A R882 led to preferential hypomethylation of polycomb repressive complex 2 targets and a specific sequence motif. Notably, the hypomethylation motif is enriched in binding motifs of key hematopoietic transcription factors, serving as a potential mechanistic link between DNMT3A R882 mutations and aberrant transcriptional phenotypes. Thus, single-cell multi-omics pave the road to defining the downstream consequences of mutations that drive human clonal mosaicism.

A framework for mutational signature analysis based on DNA shape parameters

PLoS ONE ◽

10.1371/journal.pone.0262495 ◽

2022 ◽

Vol 17 (1) ◽

pp. e0262495

Author(s):

Aleksandra Karolak ◽

Jurica Levatić ◽

Fran Supek

Keyword(s):

Dna Repair ◽

Base Pair ◽

Mutation Frequency ◽

Structural Features ◽

Sequence Motif ◽

Signature Analysis ◽

Mutational Signatures ◽

Radiation Chemical ◽

Mutational Signature ◽

Dna Shape

The mutation risk of a DNA locus depends on its oligonucleotide context. In turn, mutability of oligonucleotides varies across individuals, due to exposure to mutagenic agents or due to variable efficiency and/or accuracy of DNA repair. Such variability is captured by mutational signatures, a mathematical construct obtained by a deconvolution of mutation frequency spectra across individuals. There is a need to enhance methods for inferring mutational signatures to make better use of sparse mutation data (e.g., resulting from exome sequencing of cancers), to facilitate insight into underlying biological mechanisms, and to provide more accurate mutation rate baselines for inferring positive and negative selection. We propose a conceptualization of mutational signatures that represents oligonucleotides via descriptors of DNA conformation: base pair, base pair step, and minor groove width parameters. We demonstrate how such DNA structural parameters can accurately predict mutation occurrence due to DNA repair failures or due to exposure to diverse mutagens such as radiation, chemical exposure, and the APOBEC cytosine deaminase enzymes. Furthermore, the mutation frequency of DNA oligomers classed by structural features can accurately capture systematic variability in mutagenesis of >1,000 tumors originating from diverse human tissues. A nonnegative matrix factorization was applied to mutation spectra stratified by DNA structural features, thereby extracting novel mutational signatures. Moreover, many of the known trinucleotide signatures were associated with an additional spectrum in the DNA structural descriptor space, which may aid interpretation and provide mechanistic insight. Overall, we suggest that the power of DNA sequence motif-based mutational signature analysis can be enhanced by drawing on DNA shape features.
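The deconvolution step described in this abstract can be illustrated with a minimal sketch of nonnegative matrix factorization using Lee-Seung multiplicative updates. This is not the authors' pipeline: the toy count matrix, the rank of 2, and the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) are assumptions chosen for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (classes x samples) into W (classes x rank) and H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start keeps every later update nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: rows are mutation classes, columns are tumours.
spectra = [
    [10, 2, 6],
    [8, 1, 5],
    [1, 9, 5],
    [2, 12, 7],
]
signatures, exposures = nmf(spectra, rank=2)
print("reconstruction error:", frobenius_error(spectra, signatures, exposures))
```

In this sketch the columns of `signatures` play the role of mutational signatures and the rows of `exposures` give each signature's contribution per tumour. The same factorization would apply whether the rows are trinucleotide classes or classes defined by DNA shape features.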

Frequent co-regulation of splicing and polyadenylation by RNA-binding proteins inferred with MAPP

10.1101/2022.01.09.475576 ◽

2022 ◽

Author(s):

Maciej Bak ◽

Erik van Nimwegen ◽

Ralf Schmidt ◽

Mihaela Zavolan ◽

Andreas J Gruber

Keyword(s):

Binding Proteins ◽

Rna Binding ◽

Rna Binding Proteins ◽

Cell Types ◽

Mrna Processing ◽

Sequence Motif ◽

Rna Seq ◽

Global Regulators ◽

Polypyrimidine Tract Binding Protein ◽

A Site

Maturation of eukaryotic pre-mRNAs via splicing, 3' end cleavage and polyadenylation is modulated across cell types and conditions by a variety of RNA-binding proteins (RBPs). Although over 1,500 proteins are associated with RNAs in human cells, their binding motifs, targets and functions still remain to be elucidated, especially in the complex environment of human tissues and in the context of diseases. To overcome the lack of methods for systematic and automated detection of sequence motif-guided changes in pre-mRNA processing based on RNA sequencing (RNA-seq) data, we have developed MAPP (Motif Activity on Pre-mRNA Processing). We demonstrate MAPP's functionality by applying it to RNA-seq data from 284 RBP knock-down experiments in the ENCODE project, from which MAPP not only infers position-dependent impact profiles of known regulators, but also reveals RBPs that modulate both the inclusion of cassette exons and the poly(A) site choice. Among these, the Polypyrimidine Tract Binding Protein 1 (PTBP1) has a similar activity in glioblastoma samples. This highlights the ability of MAPP to unveil global regulators of mRNA processing under physiological and pathological conditions.

Transcriptome-wide analysis of microRNA-mRNA correlations in unperturbed tissue transcriptomes identifies microRNA targeting determinants.

10.1101/2021.12.22.473932 ◽

2021 ◽

Author(s):

Juan Manuel Trinidad ◽

Rafael Sebastian Fort ◽

Guillermo Trinidad ◽

Beatriz Garat ◽

Maria A Duhagon

Keyword(s):

Gene Expression ◽

Small Rna ◽

Current Knowledge ◽

Gc Content ◽

Sequence Motif ◽

Z Score ◽

Rna Seq ◽

Microrna Target ◽

Candidate Sequence ◽

Interaction Sites

MicroRNAs are small RNAs that regulate gene expression through complementary base pairing with their target mRNAs. Given the small size of the pairing region and the large number of mRNAs that each microRNA can control, the identification of biologically relevant targets is difficult. Since current knowledge of target recognition and repression has mainly relied on in vitro studies, we sought to determine if the interrogation of gene expression data of unperturbed tissues could yield new insight into these processes. The transcriptome-wide repression at the microRNA-mRNA canonical interaction sites (seed and 3'-supplementary region, identified by sole base complementarity) was calculated as a normalized Spearman correlation (Z-score) between the abundance of the transcripts in the PRAD-TCGA tissues (RNA-seq and small RNA-seq data of 546 samples). Using the repression values obtained, we confirmed established properties of microRNA targeting efficacy, such as the preference for gene regions (3'UTR>CDS>5'UTR), the proportionality between repression and seed length (6mer<7mer<8mer) and the contribution to the repression exerted by the supplementary pairing at 13-16nt of the microRNA. Our results suggest that the 7mer-m8 seed could be more repressive than the 7mer-A1, while they have similar efficacy when they interact using the 3'-supplementary pairing. Strikingly, the 6mer+suppl sites yielded a normalized Z-score of repression similar to the sole 7mer-m8 or 7mer-A1 seeds, which raises awareness of their potential biological relevance. We then used the approach to further characterize the 3'-supplementary pairing, using 39 microRNAs that hold repressive 3'-supplementary interactions. The analysis of the bridge between seed and 3'-supplementary pairing site confirmed the optimum +1 offset previously evidenced, but higher offsets appear to hold similar repressive strength. In addition, they show a low GC content at positions 13-16, and base preferences that allow the selection of a candidate sequence motif. Overall, our study demonstrates that transcriptome-wide analysis of microRNA-mRNA correlations in large, matched RNA-seq and small-RNA-seq data has the power to uncover hints of microRNA targeting determinants operating in the in vivo unperturbed set. Finally, we made available a bioinformatic tool to analyze microRNA-target mRNA interactions using our approach.
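The repression measure described in this abstract, a Spearman correlation expressed as a Z-score, can be sketched as follows. The abstract does not say how the normalization was carried out, so scoring a correlation against a background set of correlations is an assumption here, as are the function names and the toy abundance values; this is not the authors' tool.

```python
from statistics import mean, pstdev

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def z_score(value, background):
    """Standardize a correlation against a background distribution (assumed scheme)."""
    return (value - mean(background)) / pstdev(background)

# Hypothetical abundances of one microRNA and one target mRNA across five samples.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [10.0, 8.0, 7.0, 3.0, 1.0]
rho = spearman(mirna, target)

# Hypothetical background: correlations of the same microRNA with non-target mRNAs.
background = [0.1, -0.1, 0.0, 0.2, -0.2]
print(rho, z_score(rho, background))
```

A strongly negative Z-score in this sketch would correspond to a transcript whose abundance falls as the microRNA rises, more than expected from the background.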

Extension of Mitogenome Enrichment Based on Single Long-Range PCR: mtDNAs and Putative Mitochondrial-Derived Peptides of Five Rodent Hibernators

Frontiers in Genetics ◽

10.3389/fgene.2021.685806 ◽

2021 ◽

Vol 12 ◽

Author(s):

Sarah V. Emser ◽

Helmut Schaschl ◽

Eva Millesi ◽

Ralf Steinborn

Keyword(s):

Long Range ◽

Mammalian Species ◽

Consensus Sequence ◽

Ground Squirrels ◽

Sequence Motif ◽

Diverse Range ◽

Oxphos System ◽

Future Experimentation ◽

Long Range Pcr ◽

Alpine Marmot

Enriching mitochondrial DNA (mtDNA) for sequencing entire mitochondrial genomes (mitogenomes) can be achieved by single long-range PCR. This avoids interference from the omnipresent nuclear mtDNA sequences (NUMTs). The approach is currently restricted to the use of samples collected from humans and ray-finned fishes. Here, we extended the use of single long-range PCR by introducing back-to-back oligonucleotides that target a sequence of extraordinary homology across vertebrates. The assay was applied to five hibernating rodents, namely alpine marmot, Arctic and European ground squirrels, and common and garden dormice, four of which have not been fully sequenced before. Analysis of the novel mitogenomes focussed on the prediction of mitochondrial-derived peptides (MDPs) providing another level of information encoded by mtDNA. The comparison of MOTS-c, SHLP4 and SHLP6 sequences across vertebrate species identified segments of high homology that argue for future experimentation. In addition, we evaluated four candidate polymorphisms replacing an amino acid in mitochondrially encoded subunits of the oxidative phosphorylation (OXPHOS) system that were reported in relation to cold-adaptation. No obvious pattern was found for the diverse sets of mammalian species that either apply daily or multiday torpor or otherwise cope with cold. In summary, our single long-range PCR assay applying a pair of back-to-back primers that target a consensus sequence motif of Vertebrata has potential to amplify (intact) mitochondrial rings present in templates from a taxonomically diverse range of vertebrates. It could be promising for studying novel mitogenomes, mitotypes of a population and mitochondrial heteroplasmy in a sensitive, straightforward and flexible manner.

Global proteome response of human cancer cell lines to low dose eIF4E/eIF4G inhibition

10.26686/wgtn.17142047 ◽

2021 ◽

Author(s):

Rory Nicholas Besaans

Keyword(s):

Cell Lines ◽

Translation Initiation ◽

Muscle Wasting ◽

Scaffolding Protein ◽

Initiation Factor ◽

Chronic Obstructive ◽

Sequence Motif ◽

Initiation Complex ◽

Co Morbidity ◽

Pateamine A

Cachexia is a debilitating muscle wasting disease and co-morbidity strongly associated with chronic inflammatory conditions such as cancer, chronic heart failure, chronic obstructive pulmonary disease and sepsis. Cachexia has a strong negative impact on quality of life and research suggests that 20% of cancer patients will die of cachexia. Translation initiation is the most highly regulated step of protein synthesis and the eukaryotic initiation factor 4F (eIF4F) translation initiation complex is the gatekeeper of this process; the eIF4F complex is composed of eIF4G, a scaffolding protein, eIF4E, an mRNA cap-recognition protein and eIF4A, an RNA helicase. Inhibition of eIF4A by pateamine A has been shown to rescue muscle wasting in vitro and in vivo, and this result has been reproduced with other eIF4A inhibitors. Pateamine A is a sponge-derived natural product with nanomolar toxicity to cancer cells. Surprisingly, at doses well below its anti-neoplastic activity it exerts distinct effects on cachexia. The research in this thesis follows on from previous work in our laboratory with pateamine A in human cell lines. Work on the effects of pateamine A on the proteome suggests that not all the proteins changing in expression are explainable by stressing the translation initiation complex. A model by which motifs in the 5’ UTRs of transcripts are recognised and removed from the system in a selective manner could help explain these effects. We aimed to target eIF4E, another component of the eIF4F system, with two compounds to see if a comparable dose of eIF4E inhibitors could elicit a pateamine-like response. DMSO, a solvent used extensively in this thesis, had unexpected effects on translation. We conclude that 4E1RCat, a compound developed as a selective inhibitor of eIF4E, is not likely to be useable in further work, due to its window of activity coinciding with an unacceptable concentration of DMSO. Ribavirin, our second compound, showed a proteomic response consistent with its classification as an eIF4E translation initiation inhibitor. The proteome response seen with our eIF4E inhibitors is consistent with disruption of translation initiation. However, the data for 4E1RCat was deemed untrustworthy in the wake of revelations that DMSO, the vehicle in which it is dissolved, exerts an almost identical response. From the results obtained, it was not possible to confidently test whether protein downregulation occurred in response to a 5’UTR sequence motif, as seen for inhibitors of eIF4A. Coupled with the uncertainty associated with the 4E1RCat results, there were relatively few downregulated proteins from the treatments, and many of these could be explained by the direct biological response to the function of the compound in the treatment. All in all, we have obtained new insights into the effects of DMSO on the proteome which will aid further experimentation. This thesis has laid the groundwork for further investigation of the effects of eIF4F inhibition in the context of better understanding the remediation of cachexia through the eIF4F system.

Cell free extrachromosomal circular DNA is common in human urine

10.1101/2021.12.02.471038 ◽

2021 ◽

Author(s):

Wei Lv ◽

Xiaoguang Pan ◽

Peng Han ◽

Ziyu Wang ◽

Hao Yuan ◽

...

Keyword(s):

Cpg Islands ◽

Gc Content ◽

Disease Diagnosis ◽

Sequence Motif ◽

Sequencing Analysis ◽

Direct Repeats ◽

Genomic Distribution ◽

Circular Dna ◽

Comprehensive Characterization

Cell free extrachromosomal circular DNA (eccDNA) is evolving as a potential biomarker in liquid biopsies for disease diagnosis. In this study, an optimized next generation sequencing-based Circle-Seq method was developed to investigate urinary cell free eccDNA (ucf-eccDNA) from 28 adult healthy volunteers (mean age = 28, 19 males/9 females). The genomic distributions and sequence compositions of ucf-eccDNAs were comprehensively characterized. Approximately 1.2 million unique ucf-eccDNAs are identified, covering 14.9% of the human genome. Comprehensive characterization of ucf-eccDNAs shows that ucf-eccDNAs contain higher GC content than flanking genomic regions. Most eccDNAs are less than 1000 bp and present four pronounced peaks at 203, 361, 550 and 728 bp, indicating the association between eccDNAs and the numbers of intact nucleosomes. Analysis of the genomic distribution of ucf-eccDNAs shows that eccDNAs are found in all chromosomes but are enriched in chromosomes (i.e., chr. 17, 19 and 20) with a high density of protein-coding genes, CpG islands, SINE and simple repeat elements. Lastly, analysis of sequence motif signatures at eccDNA junction sites reveals that direct repeats (DRs) are commonly found, indicating a potential role of DRs in eccDNA biogenesis. This work underscores the deep sequencing analysis of ucf-eccDNAs and provides a valuable reference resource for exploring potential applications of ucf-eccDNA as diagnostic biomarkers of urogenital disorders in the future.

Significance Statement: Extrachromosomal circular DNA (eccDNA) is an important genetic element and a biomarker for disease diagnosis and treatment. In this study, we conduct a comprehensive characterization of urinary cell free eccDNA (ucf-eccDNA) in 28 healthy subjects. Over one million ucf-eccDNAs are identified. Ucf-eccDNAs are characterized by high GC content. The size of most ucf-eccDNAs is less than 1000 bp and enriched in four peaks resembling the size of single, double, triple, and quadruple nucleosomes. The genomic distribution of ucf-eccDNAs is enriched in genic regions, protein-coding genes, Alu, CpG islands, SINE and simple repeats. Sequence motif analysis of ucf-eccDNA junctions identified simple direct repeats (DRs) commonly present in most eccDNAs, suggesting potential roles of DRs in eccDNA biogenesis.
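Two of the sequence features reported in this abstract, GC content relative to flanking regions and direct repeats at junction sites, can be sketched on a toy sequence. The abstract does not give the definitions used, so the definition of a junction direct repeat below (a run shared by the start of the circle and the sequence just past its end) is a simplifying assumption, and the function names, coordinates, and sequence are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def direct_repeat_length(genome, start, end, max_len=20):
    """Length of the run shared by genome[start:] and genome[end:].

    The circle is assumed to come from the half-open interval [start, end)
    of the linear reference, so a shared run here means the same bases occur
    at the circle's start and immediately downstream of its end.
    """
    length = 0
    while (length < max_len
           and start + length < end
           and end + length < len(genome)
           and genome[start + length] == genome[end + length]):
        length += 1
    return length

# Hypothetical reference: left flank + circle-forming segment + right flank.
left_flank = "ATATAT"
circle = "GCGCCGGATCCG"
right_flank = "GCGTATATA"
genome = left_flank + circle + right_flank
start, end = len(left_flank), len(left_flank) + len(circle)

print("circle GC:", gc_content(genome[start:end]))
print("flank GC:", gc_content(left_flank + right_flank))
print("direct repeat length:", direct_repeat_length(genome, start, end))
```

Real junction analysis would also need to handle mapping ambiguity and repeats located a few bases off the breakpoint, which this sketch ignores.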

Molecular Characterization of HOXA2 and HOXA3 Binding Properties

Journal of Developmental Biology ◽

10.3390/jdb9040055 ◽

2021 ◽

Vol 9 (4) ◽

pp. 55

Author(s):

Joshua Mallen ◽

Manisha Kalsan ◽

Peyman Zarrineh ◽

Laure Bridoux ◽

Shandar Ahmad ◽

...

Keyword(s):

Transcription Factors ◽

Molecular Characterization ◽

Sequence Motif ◽

Body Parts ◽

Binding Affinities ◽

Binding Properties ◽

Functional Consequences

The highly conserved HOX homeodomain (HD) transcription factors (TFs) establish the identity of different body parts along the antero–posterior axis of bilaterian animals. Segment diversification and the morphogenesis of different structures is achieved by generating precise patterns of HOX expression along the antero–posterior axis and by the ability of different HOX TFs to instruct unique and specific transcriptional programs. However, HOX binding properties in vitro, characterised by the recognition of similar AT-rich binding sequences, do not account for the ability of different HOX to instruct segment-specific transcriptional programs. To address this problem, we previously compared HOXA2 and HOXA3 binding in vivo. Here, we explore if sequence motif enrichments observed in vivo are explained by binding affinities in vitro. Unexpectedly, we found that the highest enriched motif in HOXA2 peaks was not recognised by HOXA2 in vitro, highlighting the importance of investigating HOX binding in its physiological context. We also report the ability of HOXA2 and HOXA3 to heterodimerise, which may have functional consequences for the HOX patterning function in vivo.

Constructing gene regulatory networks using epigenetic data

npj Systems Biology and Applications ◽

10.1038/s41540-021-00208-3 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Abhijeet Rajendra Sonawane ◽

Dawn L. DeMeo ◽

John Quackenbush ◽

Kimberly Glass

Keyword(s):

Transcription Factor ◽

Gene Regulatory Networks ◽

Regulatory Network ◽

Message Passing ◽

Regulatory Networks ◽

Transcription Factor Binding ◽

Network Reconstruction ◽

Sequence Motif ◽

Factor Binding ◽

Gene Regulatory

The biological processes that drive cellular function can be represented by a complex network of interactions between regulators (transcription factors) and their targets (genes). A cell’s epigenetic state plays an important role in mediating these interactions, primarily by influencing chromatin accessibility. However, how to effectively use epigenetic data when constructing a gene regulatory network remains an open question. Almost all existing network reconstruction approaches focus on estimating transcription factor to gene connections using transcriptomic data. In contrast, computational approaches for analyzing epigenetic data generally focus on improving transcription factor binding site predictions rather than deducing regulatory network relationships. We bridged this gap by developing SPIDER, a network reconstruction approach that incorporates epigenetic data into a message-passing framework to estimate gene regulatory networks. We validated SPIDER’s predictions using ChIP-seq data from ENCODE and found that SPIDER networks are both highly accurate and include cell-line-specific regulatory interactions. Notably, SPIDER can recover ChIP-seq verified transcription factor binding events in the regulatory regions of genes that do not have a corresponding sequence motif. The networks estimated by SPIDER have the potential to identify novel hypotheses that will allow us to better characterize cell-type and phenotype specific regulatory mechanisms.

sequence motif
Recently Published Documents

Single-cell multi-omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations perturb early progenitor states through selective hypomethylation

A framework for mutational signature analysis based on DNA shape parameters

Frequent co-regulation of splicing and polyadenylation by RNA-binding proteins inferred with MAPP

Transcriptome-wide analysis of microRNA-mRNA correlations in unperturbed tissue transcriptomes identifies microRNA targeting determinants.

Extension of Mitogenome Enrichment Based on Single Long-Range PCR: mtDNAs and Putative Mitochondrial-Derived Peptides of Five Rodent Hibernators

Global proteome response of human cancer cell lines to low dose eIF4E/eIF4G inhibition

Global proteome response of human cancer cell lines to low dose eIF4E/eIF4G inhibition

Cell free extrachromosomal circular DNA is common in human urine

Molecular Characterization of HOXA2 and HOXA3 Binding Properties

Constructing gene regulatory networks using epigenetic data
