HeT-A, a transposable element specifically involved in "healing" broken chromosome ends in Drosophila melanogaster.

Eight terminally deleted Drosophila melanogaster chromosomes have now been found to be "healed." In each case, the healed chromosome end had acquired sequence from the HeT DNA family, a complex family of repeated sequences found only in telomeric and pericentric heterochromatin. The sequences were apparently added by transposition events involving no sequence homology. We now report that the sequences transposed in healing these chromosomes identify a novel transposable element, HeT-A, which makes up a subset of the HeT DNA family. Addition of HeT-A elements to broken chromosome ends appears to be polar. The proximal junction between each element and the broken chromosome end is an oligo(A) tract beginning 54 nucleotides downstream from a conserved AATAAA sequence on the strand running 5' to 3' from the chromosome end. The distal (telomeric) ends of HeT-A elements are variably truncated; however, we have not yet been able to determine the extreme distal sequence of a complete element. Our analysis covers approximately 2,600 nucleotides of the HeT-A element, beginning with the oligo(A) tract at one end. Sequence homology is strong (greater than 75% between all elements studied). Sequence may be conserved for DNA structure rather than for protein coding; even the most recently transposed HeT-A elements lack significant open reading frames in the region studied. Instead, the elements exhibit conserved short-range sequence repeats and periodic long-range variation in base composition. These conserved features suggest that HeT-A elements, although transposable elements, may have a structural role in telomere organization or maintenance.
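The junction geometry described above (an oligo(A) tract beginning 54 nucleotides downstream from a conserved AATAAA) can be checked on a candidate sequence with a short scan. The following is a minimal sketch, not the authors' method. It assumes the 54-nucleotide offset is counted from the first base of AATAAA, and the function name `find_signal_and_tract` and the minimum run length `min_a` are hypothetical choices made for illustration.

```python
import re


def find_signal_and_tract(seq, signal="AATAAA", offset=54, min_a=4):
    """Find each `signal` occurrence followed by an A-run at a fixed offset.

    Returns a list of (signal_pos, tract_start, tract_len) tuples using
    0-based coordinates. Assumption: `offset` is measured from the first
    base of the signal; the abstract does not specify the reference base.
    """
    seq = seq.upper()
    hits = []
    # The lookahead lets overlapping signal occurrences be reported too.
    for m in re.finditer("(?=%s)" % re.escape(signal), seq):
        tract_start = m.start() + offset
        run = re.match("A+", seq[tract_start:])
        if run and len(run.group()) >= min_a:
            hits.append((m.start(), tract_start, len(run.group())))
    return hits


# Synthetic example: signal at index 2, six A's starting 54 nt after it.
example = "GG" + "AATAAA" + "C" * 48 + "AAAAAA" + "G"
hits = find_signal_and_tract(example)
```

The sketch only tests for a run of A's at the expected position; it does not require the run to start exactly there (a preceding A would extend the tract upstream), which a stricter check would need to handle.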

Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

Genome Biology ◽

10.1186/s13059-021-02369-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Robin-Lee Troskie ◽

Yohaann Jafrani ◽

Tim R. Mercer ◽

Adam D. Ewing ◽

Geoffrey J. Faulkner ◽

...

Keyword(s):

Cultured Cells ◽

Open Reading Frames ◽

cDNA Sequencing ◽

Protein Coding ◽

Dynamic Component ◽

Gene Copies ◽

Long Read ◽

Normal Human ◽

Reading Frames ◽

Transcriptional Landscape

Abstract: Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we CRISPR-Cas9 delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.

Heterochromatic Stellate Gene Cluster in Drosophila melanogaster: Structure and Molecular Evolution

Genetics ◽

10.1093/genetics/146.1.253 ◽

1997 ◽

Vol 146 (1) ◽

pp. 253-262 ◽

Cited By ~ 2

Author(s):

Alexei V Tulin ◽

Galina L Kogan ◽

Dominik Filipp ◽

Maria D Balakireva ◽

Vladimir A Gvozdev

Keyword(s):

Drosophila Melanogaster ◽

Molecular Evolution ◽

Protein Kinase CK2 ◽

Unknown Origin ◽

Open Reading Frames ◽

β Subunit ◽

Retrotransposon Insertion ◽

High Level ◽

Kinase CK2 ◽

Reading Frames

The 30-kb cluster comprising close to 20 copies of tandemly repeated Stellate genes was localized in the distal heterochromatin of the X chromosome. Of 10 sequenced genes, nine contain undamaged open reading frames with extensive similarity to protein kinase CK2 β-subunit; one gene is interrupted by an insertion. The heterochromatic array of Stellate repeats is divided into three regions by a 4.5-kb DNA segment of unknown origin and a retrotransposon insertion: the A region (∼14 Stellate genes), the adjacent B region (approximately three Stellate genes), and the C region (about four Stellate genes). The sequencing of Stellate copies located along the discontinuous cluster revealed a complex pattern of diversification. The lowest level of divergence was detected in nearby Stellate repeats. The marginal copies of the A region, truncated or interrupted by an insertion, escaped homogenization and demonstrated high levels of divergence. Comparison of copies in the B and C regions, which are separated by a retrotransposon insertion, revealed a high level of diversification. These observations suggest that homogenization takes place in the Stellate cluster, but that inserted sequences may impede this process.

Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications ◽

10.1038/s41467-021-21812-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

David S. M. Lee ◽

Joseph Park ◽

Andrew Kromer ◽

Aris Baras ◽

Daniel J. Rader ◽

...

Keyword(s):

Protein Expression ◽

Biological Significance ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Stop Codons ◽

Human Genes ◽

Strong Negative Selection ◽

Disease Associations ◽

Reading Frames

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
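To make the notion of a uORF variant "introducing a new stop codon" concrete, the check can be sketched as a single-codon comparison. This is an illustrative sketch only, not the study's pipeline. It assumes the uORF is supplied as an in-frame DNA string starting at its start codon, that the variant is a single-base substitution at a 0-based position within that string, and that the function name `introduces_stop` is hypothetical. It does not model the "strengthening existing stop codons" class mentioned in the abstract.

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def introduces_stop(uorf, pos, alt):
    """Return True if substituting `alt` at 0-based `pos` creates a stop codon.

    `uorf` is assumed to be in frame from its first base. A codon that is
    already a stop in the reference is not counted as a new stop.
    """
    uorf = uorf.upper()
    frame_pos = pos % 3                  # position of the variant within its codon
    codon_start = pos - frame_pos
    ref_codon = uorf[codon_start:codon_start + 3]
    if len(ref_codon) < 3:               # incomplete trailing codon
        return False
    alt_codon = ref_codon[:frame_pos] + alt.upper() + ref_codon[frame_pos + 1:]
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


# Toy uORF: ATG CAA GGC TGA. Changing C>T at index 3 turns CAA into TAA.
toy_uorf = "ATGCAAGGCTGA"
creates_stop = introduces_stop(toy_uorf, 3, "T")
```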

A repetitive DNA element, associated with telomeric sequences in Drosophila melanogaster, contains open reading frames

Chromosoma ◽

10.1007/bf00352288 ◽

1992 ◽

Vol 102 (1) ◽

pp. 32-40 ◽

Cited By ~ 26

Author(s):

Olga N. Danilevskaya ◽

Dmitri A. Petrov ◽

Maria N. Pavlova ◽

Akihiko Koga ◽

Elena V. Kurenova ◽

...

Keyword(s):

Drosophila Melanogaster ◽

Repetitive DNA ◽

Open Reading Frames ◽

Telomeric Sequences ◽

Reading Frames

Overexpression-based detection of translatable circular RNAs is vulnerable to coexistent linear RNA byproducts

10.1101/2021.03.23.433163 ◽

2021 ◽

Author(s):

Yanyi Jiang ◽

Xiaofan Chen ◽

Wei Zhang

Keyword(s):

Open Reading Frames ◽

Systematic Evaluation ◽

Circular RNAs ◽

Protein Coding ◽

Rolling Circle ◽

Functional Investigation ◽

Overexpression System ◽

Translation Signals ◽

Coding Potential ◽

Reading Frames

Abstract: In the RNA field, the demarcation between coding and non-coding has been renegotiated by the recent discovery of occasionally translated circular RNAs (circRNAs). Although they lack a 5′ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most widely used methodologies for circRNA research, but it has drawn persistent skepticism for its poorly defined mechanism and latent coexistent side products. In this study, leveraging such a circRNA overexpression system, we have interrogated the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, pervasive translation signals are detected by immunoblotting. However, intensive mutagenesis reveals that numerous translation signals are generated independently of circRNA synthesis. We have developed a dual tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling circle translation signals, and can contain the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on the heterogeneous translational outputs of circRNA overexpression vectors and comes with a caveat: ectopic overexpression techniques necessitate extremely rigorous control setups in studies of circRNA translation and function.

A micropeptide encoded by lncRNA MIR155HG suppresses autoimmune inflammation via modulating antigen presentation

Science Advances ◽

10.1126/sciadv.aaz2059 ◽

2020 ◽

Vol 6 (21) ◽

pp. eaaz2059 ◽

Cited By ~ 4

Author(s):

Liman Niu ◽

Fangzhou Lou ◽

Yang Sun ◽

Libo Sun ◽

Xiaojie Cai ◽

...

Keyword(s):

Antigen Presentation ◽

Inflammatory Diseases ◽

Open Reading Frames ◽

Protein Coding ◽

Histocompatibility Complex ◽

Antigen Trafficking ◽

Heat Shock Cognate Protein ◽

Antigen Presenting ◽

Cognate Protein ◽

Reading Frames

Many annotated long noncoding RNAs (lncRNAs) harbor predicted short open reading frames (sORFs), but the coding capacities of these sORFs and the functions of the resulting micropeptides remain elusive. Here, we report that human lncRNA MIR155HG encodes a 17–amino acid micropeptide, which we termed miPEP155 (P155). MIR155HG is highly expressed by inflamed antigen-presenting cells, leading to the discovery that P155 interacts with the adenosine 5′-triphosphate binding domain of heat shock cognate protein 70 (HSC70), a chaperone required for antigen trafficking and presentation in dendritic cells (DCs). P155 modulates major histocompatibility complex class II–mediated antigen presentation and T cell priming by disrupting the HSC70-HSP90 machinery. Exogenously injected P155 improves two classical mouse models of DC-driven auto inflammation. Collectively, we demonstrate the endogenous existence of a micropeptide encoded by a transcript annotated as “non-protein coding” and characterize a micropeptide as a regulator of antigen presentation and a suppressor of inflammatory diseases.

When Long Noncoding Becomes Protein Coding

Molecular and Cellular Biology ◽

10.1128/mcb.00528-19 ◽

2020 ◽

Vol 40 (6) ◽

Cited By ~ 14

Author(s):

Corrine Corrina R. Hartford ◽

Ashish Lal

Keyword(s):

Cell Division ◽

Cell Signaling ◽

Transcription Regulation ◽

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Open Reading Frames ◽

Protein Coding ◽

Small Proteins ◽

Coding Potential ◽

Reading Frames

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.
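The point that sORFs are overlooked because of their small size can be illustrated with a naive ORF scan in which the length cutoff is an explicit parameter. This is a minimal sketch under stated assumptions, not a method from the review: it considers only ATG starts on the forward strand, uses the standard stop codons, and the function name `find_sorfs` and the default bounds of 10 to 100 codons are hypothetical values chosen for the example.

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(transcript, min_codons=10, max_codons=100):
    """Report short ORFs as (start, end, n_codons) tuples.

    Coordinates are 0-based with an exclusive end that includes the stop
    codon. n_codons counts the start codon but not the stop. Every ATG is
    treated as a candidate start, so nested ORFs sharing a stop are all listed.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, n_codons))
                break
    return orfs


# Toy transcript with one 11-codon ORF (ATG + 10 x GCT) ending in TAA.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
found = find_sorfs(toy)
```

Raising `min_codons` above the length of the toy ORF makes the scan return nothing, which mirrors how a conventional minimum-length threshold would hide such a candidate.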

Comparative genomic analysis of novel conserved peptide upstream open reading frames in Drosophila melanogaster and other dipteran species

BMC Genomics ◽

10.1186/1471-2164-9-61 ◽

2008 ◽

Vol 9 (1) ◽

pp. 61 ◽

Cited By ~ 40

Author(s):

Celine A Hayden ◽

Giovanni Bosco

Keyword(s):

Drosophila Melanogaster ◽

Genomic Analysis ◽

Comparative Genomic Analysis ◽

Open Reading Frames ◽

Comparative Genomic ◽

Upstream Open Reading Frames ◽

Dipteran Species ◽

Reading Frames

Identification of Proteins Associated with Murine Cytomegalovirus Virions

Journal of Virology ◽

10.1128/jvi.78.20.11187-11197.2004 ◽

2004 ◽

Vol 78 (20) ◽

pp. 11187-11197 ◽

Cited By ~ 105

Author(s):

Lisa M. Kattenhorn ◽

Ryan Mills ◽

Markus Wagner ◽

Alexandre Lomsadze ◽

Vsevolod Makeev ◽

...

Keyword(s):

Gene Prediction ◽

Polyacrylamide Gel Electrophoresis ◽

Sodium Dodecyl ◽

Open Reading Frames ◽

Murine Cytomegalovirus ◽

Prediction Algorithm ◽

Sequencing Analysis ◽

Protein Coding ◽

Coding Potential ◽

Reading Frames

ABSTRACT Proteins associated with the murine cytomegalovirus (MCMV) viral particle were identified by a combined approach of proteomic and genomic methods. Purified MCMV virions were dissociated by complete denaturation and either separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and digested in-gel or treated directly by in-solution tryptic digestion. Peptides were separated by nanoflow liquid chromatography and analyzed by tandem mass spectrometry (LC-MS/MS). The MS/MS spectra obtained were searched against a database of MCMV open reading frames (ORFs) predicted to be protein coding by an MCMV-specific version of the gene prediction algorithm GeneMarkS. We identified 38 proteins from the capsid, tegument, glycoprotein, replication, and immunomodulatory protein families, as well as 20 genes of unknown function. Observed irregularities in coding potential suggested possible sequence errors in the 3′-proximal ends of m20 and M31. These errors were experimentally confirmed by sequencing analysis. The MS data further indicated the presence of peptides derived from the unannotated ORFs ORFc225441-226898 (m166.5) and ORF105932-106072. Immunoblot experiments confirmed expression of m166.5 during viral infection.
