Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

Pseudogenes are gene copies presumed to be mainly functionless relics of evolution owing to acquired deleterious mutations or transcriptional silencing. When transcribed, pseudogenes may encode proteins or enact RNA-intrinsic regulatory mechanisms. However, the extent, characteristics and functional relevance of the human pseudogene transcriptome are unclear. Short-read sequencing platforms have limited power to resolve and accurately quantify pseudogene transcripts owing to the high sequence similarity of pseudogenes and their parent genes. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, here we identify hundreds of novel transcribed pseudogenes. Pseudogene transcripts are expressed in tissue-specific patterns, exhibit complex splicing patterns and contribute to the coding sequences of known genes. We survey pseudogene transcripts encoding intact open reading frames (ORFs), representing potential unannotated protein-coding genes, and demonstrate their efficient translation in cultured cells. To assess the impact of noncoding pseudogenes on the cellular transcriptome, we delete the nucleus-enriched pseudogene PDCL3P4 transcript from HAP1 cells and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the transcriptional landscape underpinning human biology and disease.
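Why high sequence similarity between a pseudogene and its parent gene hampers short reads can be illustrated with a toy calculation. The Python sketch below is not the authors' pipeline: the sequences are hypothetical, and `unique_window_fraction` is an illustrative helper name. It treats every length-k window of a "pseudogene" as an error-free read and asks what fraction of those windows occur nowhere in the "parent" gene, assuming exact matching only; real aligners tolerate mismatches and model sequencing errors, so this is only a rough picture.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_window_fraction(target: str, paralog: str, k: int) -> float:
    """Fraction of length-k windows of `target` that occur nowhere in `paralog`.

    Exact matching only: a toy stand-in for asking whether an error-free
    read of length k drawn from `target` could be assigned unambiguously.
    """
    windows = [target[i:i + k] for i in range(len(target) - k + 1)]
    if not windows:
        return 0.0
    shared = kmer_set(paralog, k)
    # Count windows absent from the paralog, then normalise.
    return sum(w not in shared for w in windows) / len(windows)


# Hypothetical sequences: a "pseudogene" differing from its "parent" at one base.
parent = "ACGTACGGATCCTTAGGCATCGAT"
pseudogene = parent[:12] + "A" + parent[13:]  # single T -> A substitution

for k in (5, 12, len(pseudogene)):
    print(k, round(unique_window_fraction(pseudogene, parent, k), 2))
```

Short windows that do not span the single differing base are shared with the parent and cannot be assigned unambiguously, whereas the full-length window is always unique, mirroring the advantage of full-length reads over short reads.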

Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-021-02369-0

Author(s): Robin-Lee Troskie, Yohaann Jafrani, Tim R. Mercer, Adam D. Ewing, Geoffrey J. Faulkner, ...

Keyword(s): Cultured Cells; Open Reading Frames; cDNA Sequencing; Protein Coding; Dynamic Component; Gene Copies; Long Read; Normal Human; Reading Frames; Transcriptional Landscape

Abstract: Pseudogenes are gene copies presumed to be mainly functionless relics of evolution owing to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, here we identify hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we use CRISPR-Cas9 to delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.

Two Unrelated 8-Vinyl Reductases Ensure Production of Mature Chlorophylls in Acaryochloris marina

Journal of Bacteriology, 2016, Vol 198 (9), pp. 1393-1400. DOI: 10.1128/jb.00925-15. Cited by ~8

Author(s): Guangyu E. Chen, Andrew Hitchcock, Philip J. Jackson, Roy R. Chaudhuri, Mark J. Dickman, ...

Keyword(s): Transcriptional Control; Sequence Similarity; Open Reading Frames; High Sequence Similarity; Content Type; Ethyl Group; Acaryochloris Marina; Vinyl Group; Vinyl Reductase; Reading Frames

ABSTRACT The major photopigment of the cyanobacterium Acaryochloris marina is chlorophyll d, while its direct biosynthetic precursor, chlorophyll a, is also present in the cell. These pigments, along with the majority of chlorophylls utilized by oxygenic phototrophs, carry an ethyl group at the C-8 position of the molecule, having undergone reduction of a vinyl group during biosynthesis. Two unrelated classes of 8-vinyl reductase involved in the biosynthesis of chlorophylls are known to exist, BciA and BciB. The genome of Acaryochloris marina contains open reading frames (ORFs) encoding proteins displaying high sequence similarity to BciA or BciB, although they are annotated as genes involved in transcriptional control (nmrA) and methanogenesis (frhB), respectively. These genes were introduced into an 8-vinyl chlorophyll a-producing ΔbciB strain of Synechocystis sp. strain PCC 6803, and both were shown to restore synthesis of the pigment with an ethyl group at C-8, demonstrating their activities as 8-vinyl reductases. We propose that nmrA and frhB be reassigned as bciA and bciB, respectively; transcript and proteomic analysis of Acaryochloris marina reveal that both bciA and bciB are expressed and their encoded proteins are present in the cell, possibly in order to ensure that all synthesized chlorophyll pigment carries an ethyl group at C-8. Potential reasons for the presence of two 8-vinyl reductases in this strain, which is unique for cyanobacteria, are discussed.

IMPORTANCE The cyanobacterium Acaryochloris marina is the best-studied phototrophic organism that uses chlorophyll d for photosynthesis. Unique among cyanobacteria sequenced to date, its genome contains ORFs encoding two unrelated enzymes that catalyze the reduction of the C-8 vinyl group of a precursor molecule to an ethyl group. Carrying a reduced C-8 group may be of particular importance to organisms containing chlorophyll d. Plant genomes also contain orthologs of both of these genes; thus, the bacterial progenitor of the chloroplast may also have contained both bciA and bciB.

The Genome of Gryllus bimaculatus Nudivirus Indicates an Ancient Diversification of Baculovirus-Related Nonoccluded Nudiviruses of Insects

Journal of Virology, 2007, Vol 81 (10), pp. 5395-5406. DOI: 10.1128/jvi.02781-06. Cited by ~53

Author(s): Yongjie Wang, Regina G. Kleespies, Alois M. Huger, Johannes A. Jehle

Keyword(s): Sequence Similarity; Heliothis Zea; Direct Repeat; Sister Group; Open Reading Frames; Gryllus Bimaculatus; Structural Genomic; Protein Coding; DNA Viruses; Core Genes

ABSTRACT The Gryllus bimaculatus nudivirus (GbNV) infects nymphs and adults of the cricket Gryllus bimaculatus (Orthoptera: Gryllidae). GbNV and other nudiviruses such as Heliothis zea nudivirus 1 (HzNV-1) and Oryctes rhinoceros nudivirus (OrNV) were previously called “nonoccluded baculoviruses” as they share some structural, genomic, and replication features with members of the family Baculoviridae. Their relationships to each other and to baculoviruses are elucidated by the complete genome sequence of GbNV, which is 96,944 bp long, has an AT content of 72%, and potentially contains 98 predicted protein-coding open reading frames (ORFs). Forty-one ORFs of GbNV share sequence similarities with ORFs found in OrNV, HzNV-1, baculoviruses, and bacteria. Most notably, 15 GbNV ORFs are homologous to the baculovirus core genes, which are associated with transcription (lef-8, lef-9, lef-4, vlf-1, and lef-5), replication (dnapol), structural proteins (p74, pif-1, pif-2, pif-3, vp91, and odv-e56), and proteins of unknown function (38K, ac81, and 19kda). Homologues of these baculovirus core genes have been predicted in HzNV-1 as well. Six GbNV ORFs are homologous to the nonconserved baculovirus genes dnaligase, helicase 2, rr1, rr2, iap-3, and desmoplakin. The remaining 57 ORFs showed no homology, or only poor similarity, to sequences in current gene databases. No homologous repeat (hr) sequences were detected in the GbNV genome, but fourteen short direct repeat (dr) regions were. Gene content and sequence similarity suggest that the nudiviruses GbNV, HzNV-1, and OrNV form a monophyletic group of nonoccluded double-stranded DNA viruses, which separated from the baculovirus lineage before that lineage radiated into dipteran-, hymenopteran-, and lepidopteran-specific clades of occluded nucleopolyhedroviruses and granuloviruses.
The accumulated information on the GbNV genome suggests that nudiviruses form a highly diverse and phylogenetically ancient sister group of the baculoviruses, which have evolved in a variety of highly divergent host orders.

The SARS-CoV-2 ORF10 is not essential in vitro or in vivo in humans

2020. DOI: 10.1101/2020.08.29.257360. Cited by ~4

Author(s): Katarzyna Pancer, Aleksandra Milewska, Katarzyna Owczarek, Agnieszka Dabrowska, Wojciech Branicki, ...

Keyword(s): Sequence Similarity; Hypothetical Protein; Open Reading Frames; N Gene; Coding Region; Protein Coding; Share Sequence Similarity; Genome Annotations

Abstract: SARS-CoV-2 genome annotation revealed the presence of 10 open reading frames (ORFs), of which the last one (ORF10) is positioned downstream of the N gene. It is a hypothetical gene, which was speculated to encode a 38 aa protein. This hypothetical protein does not share sequence similarity with any other known protein and cannot be associated with a function. While a role for ORF10 has been proposed, there is growing evidence that ORF10 is not a coding region. Here, we identified SARS-CoV-2 variants in which the ORF10 gene was prematurely terminated. The disease was not attenuated, and transmissibility between humans was not hampered. In vitro, too, these strains replicated similarly to related viruses with an intact ORF10. Altogether, based on clinical observation and laboratory analyses, it appears that the ORF10 protein is not essential in humans. This observation further supports the view that ORF10 should not be treated as a protein-coding gene and that the genome annotations should be amended.
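What "prematurely terminated" means for an ORF can be shown with a minimal Python sketch. It scans a single forward-strand reading frame from an ATG to the first in-frame stop codon of the standard genetic code (TAA, TAG, TGA). The sequences are hypothetical placeholders, not the SARS-CoV-2 ORF10 sequence, and `orf_length_codons` is an illustrative helper rather than a function from any annotation tool.

```python
from typing import Optional

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_codons(seq: str, start: int = 0) -> Optional[int]:
    """Count sense codons from the ATG at `start` up to the first in-frame stop.

    Returns None if there is no ATG at `start` or no in-frame stop codon follows.
    """
    if seq[start:start + 3] != "ATG":
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


# Hypothetical ORF: ATG, ten CAA (Gln) codons, then a TGA stop.
intact = "ATG" + "CAA" * 10 + "TGA"
# A single C->T change at the first base of codon 4 (counting the ATG as
# codon 0) turns CAA into a TAA stop codon.
truncated = intact[:12] + "T" + intact[13:]

print(orf_length_codons(intact))     # 11 codons before the stop
print(orf_length_codons(truncated))  # 4 codons before the premature stop
```

A single nucleotide substitution is enough to shorten the encoded product drastically; the clinical and in vitro observations in the abstract concern variants of this kind in ORF10.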

Characterization of Novel Erwinia amylovora Jumbo Bacteriophages from Eneladusvirus Genus

Viruses, 2020, Vol 12 (12), pp. 1373. DOI: 10.3390/v12121373

Author(s): Sang Guen Kim, Sung Bin Lee, Sib Sankar Giri, Hyoun Joong Kim, Sang Wha Kim, ...

Keyword(s): Genome Size; Sequence Similarity; Phylogenetic Analyses; GC Content; Genomic Analysis; Open Reading Frames; Comparative Genomic; Limited Information; High Sequence Similarity; A Genome

Jumbo phages, which have a genome size of more than 200 kb, have recently been reported for the first time. However, limited information is available regarding their characteristics because few jumbo phages have been isolated. In this study, we therefore aimed to isolate and characterize additional jumbo phages. We performed comparative genomic analysis of three Erwinia phages (pEa_SNUABM_12, pEa_SNUABM_47, and pEa_SNUABM_50), each of which had a genome size of approximately 360 kb (32.5% GC content). These phages were predicted to harbor 546, 540, and 540 open reading frames with 32, 34, and 35 tRNAs, respectively. Almost none of the genes in these phages could be functionally annotated, but they showed high sequence similarity to genes encoded in Serratia phage BF, a member of Eneladusvirus. The detailed comparative and phylogenetic analyses presented in this study contribute to our understanding of the diversity and evolution of Erwinia phages and the genus Eneladusvirus.

Cloning, Biochemical Properties, and Distribution of Mycobacterial Haloalkane Dehalogenases

Applied and Environmental Microbiology, 2005, Vol 71 (11), pp. 6736-6745. DOI: 10.1128/aem.71.11.6736-6745.2005. Cited by ~41

Author(s): Andrea Jesenská, Martina Pavlová, Michal Strouhal, Radka Chaloupková, Iva Těšínská, ...

Keyword(s): Halogen Bond; Sequence Similarity; Bacterial Species; Maximal Activity; Biochemical Properties; Open Reading Frames; Haloalkane Dehalogenase; pH Optimum; High Sequence Similarity; PCR Screening

ABSTRACT Haloalkane dehalogenases are enzymes that catalyze the cleavage of the carbon-halogen bond by a hydrolytic mechanism. Genomes of Mycobacterium tuberculosis and M. bovis contain at least two open reading frames coding for polypeptides showing high sequence similarity to biochemically characterized haloalkane dehalogenases. We describe here the cloning of the haloalkane dehalogenase genes dmbA and dmbB from M. bovis 5033/66 and demonstrate the dehalogenase activity of their translation products. Both of these genes are widely distributed among species of the M. tuberculosis complex, including M. bovis, M. bovis BCG, M. africanum, M. caprae, M. microti, and M. pinnipedii, as shown by PCR screening of 48 isolates from various hosts. DmbA and DmbB proteins were heterologously expressed in Escherichia coli and purified to homogeneity. The DmbB protein had to be expressed as a fusion with thioredoxin to obtain a soluble protein sample. The temperature optimum of both DmbA and DmbB, determined with 1,2-dibromoethane, is 45°C. The melting temperatures of DmbA and DmbB, assessed by circular dichroism spectroscopy, are 47°C and 57°C, respectively. The pH optimum of DmbA depends on the composition of the buffer, with maximal activity at pH 9.0. DmbB had a single pH optimum at pH 6.5. Mycobacterium is currently the only genus known to carry more than one haloalkane dehalogenase gene, although putative haloalkane dehalogenases can be inferred in more than 20 different bacterial species by comparative genomics. The evolution and distribution of haloalkane dehalogenases among mycobacteria are discussed.

An Alternative Succinate (2-Oxoglutarate) Transport System in Rhizobium tropici Is Induced in Nodules of Phaseolus vulgaris

Journal of Bacteriology, 2009, Vol 191 (16), pp. 5057-5067. DOI: 10.1128/jb.00252-09. Cited by ~10

Author(s): Silvia Batista, Eduardo J. Patriarca, Rosarita Tatè, Gloria Martínez-Drets, Paul R. Gill

Keyword(s): Phaseolus Vulgaris; Carbon Source; Sequence Similarity; Optimal Growth; Open Reading Frames; Uptake System; Rhizobium Tropici; High Sequence Similarity; Encoding Gene; Reading Frames

ABSTRACT The rhizobial DctA permease is essential for the development of effective nitrogen-fixing bacteroids, which was correlated with its requirement for growth on C4-dicarboxylates. A previously described dctA mutant of Rhizobium tropici CIAT899, strain GA1 (dctA), however, was unexpectedly still able to grow on succinate as a sole carbon source but less efficiently than CIAT899. Like other rhizobial dctA mutants, GA1 was unable to grow on fumarate or malate as a carbon source and induced the formation of ineffective nodules. We report an alternative succinate uptake system identified by Tn5 mutagenesis of strain GA1 that was required for the remaining ability to transport and utilize succinate. The alternative uptake system required a three-gene cluster that is highly characteristic of a dctABD locus. The predicted permease-encoding gene had high sequence similarity with open reading frames encoding putative 2-oxoglutarate permeases (KgtP) of Ralstonia solanacearum and Agrobacterium tumefaciens. This analysis was in agreement with the requirement for this gene for optimal growth on and induction by 2-oxoglutarate. The permease-encoding gene of the alternative system was also designated kgtP in R. tropici. The dctBD-like genes in this cluster were found to be required for kgtP expression and were designated kgtSR. Analysis of a kgtP::lacZ transcriptional fusion indicated that a kgtSR-dependent promoter of kgtP was specifically induced by 2-oxoglutarate. The expression of kgtPp was found in bacteroids of nodules formed with either CIAT899 or GA1 on roots of Phaseolus vulgaris. Results suggested that 2-oxoglutarate might be transported or conceivably exported in nodules induced by R. tropici on roots of P. vulgaris.

The size, shape and specificity of the sugar-binding site of the jacalin-related lectins is profoundly affected by the proteolytic cleavage of the subunits

Biochemical Journal, 2002, Vol 367 (3), pp. 817-824. DOI: 10.1042/bj20020856. Cited by ~21

Author(s): Corinne Houlès Astoul, Willy J. Peumans, Els J.M. van Damme, Annick Barre, Yves Bourne, ...

Keyword(s): Binding Site; Plant Pathogens; Higher Plants; Sequence Similarity; Binding Specificity; Proteolytic Cleavage; Carbohydrate Binding; High Sequence Similarity; Intracellular Location; The Impact

Mannose-specific lectins with high sequence similarity to jacalin and the Maclura pomifera agglutinin have been isolated from species belonging to the families Moraceae, Convolvulaceae, Brassicaceae, Asteraceae, Poaceae and Musaceae. Although these novel mannose-specific lectins are undoubtedly related to the galactose-specific Moraceae lectins, there are several important differences. Apart from the obvious difference in specificity, the mannose- and galactose-specific jacalin-related lectins differ with respect to their biosynthesis and processing, intracellular location and degree of oligomerization of the protomers. Given that the mannose-specific lectins are widely distributed in higher plants, whereas their galactose-specific counterparts are confined to a subgroup of the Moraceae, one can reasonably assume that the galactose-specific Moraceae lectins are a small side group of the main family. The major change that took place in the structure of the binding site of the diverging Moraceae lectins concerns a proteolytic cleavage close to the N-terminus of the protomer. To assess the impact of this change, the specificity of jacalin was re-investigated using surface plasmon resonance analysis. This approach revealed that, in addition to galactose and N-acetylgalactosamine, the carbohydrate-binding specificity of jacalin extends to mannose, glucose, N-acetylmuramic acid and N-acetylneuraminic acid. Owing to this broad carbohydrate-binding specificity, jacalin is capable of recognizing complex glycans from plant pathogens or predators.

Annotating high-impact 5′ untranslated region variants with the UTRannotator

2020. DOI: 10.1101/2020.06.03.132266. Cited by ~1

Author(s): Xiaolei Zhang, Matthew Wakeling, James Ware, Nicola Whiffin

Keyword(s): Open Reading Frames; Supplementary Information; Untranslated Regions; Protein Coding; Pathogenic Variants; Uncertain Significance; Upstream Open Reading Frames; The Impact; Reading Frames

Abstract

Summary: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5′ untranslated regions (5′UTRs) that create or disrupt upstream open reading frames (uORFs). We investigate the utility of this tool using the ClinVar database, providing an annotation for 30.8% of all 5′UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs.

Availability and implementation: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator

Supplementary information: Supplementary data are available at bioRxiv.
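The core idea of flagging variants that create or disrupt upstream start codons can be sketched in a few lines of Python. This is not UTRannotator's code or interface: the helper names (`upstream_atgs`, `apply_snv`, `uatg_effect`) and the toy 5′ UTR are hypothetical, and only upstream ATGs are tracked. A fuller uORF annotation would also need to locate the matching in-frame stop codon and the frame relative to the coding sequence.

```python
def upstream_atgs(utr: str) -> set:
    """0-based positions of every ATG triplet in a 5' UTR sequence (any frame)."""
    return {i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"}


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at `pos`, checking that the reference allele matches."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def uatg_effect(utr: str, pos: int, ref: str, alt: str) -> dict:
    """Report upstream ATG positions created or lost by a single-nucleotide variant."""
    before = upstream_atgs(utr)
    after = upstream_atgs(apply_snv(utr, pos, ref, alt))
    return {"created": after - before, "lost": before - after}


utr = "CCGACGCCTTATGCC"                 # hypothetical 5' UTR with one ATG at position 10
print(uatg_effect(utr, 4, "C", "T"))   # C->T at position 4 creates an ATG at position 3
print(uatg_effect(utr, 11, "T", "C"))  # T->C at position 11 destroys the ATG at position 10
```

Comparing the sets of upstream ATG positions before and after the substitution separates created from lost start codons, which corresponds to the distinction the abstract draws between variants that create and variants that disrupt uORFs.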

HLA-dependent variation in SARS-CoV-2 CD8+ T cell cross-reactivity with human coronaviruses

2021. DOI: 10.1101/2021.07.17.452778

Author(s): Paul Buckley, Chloe Hyun-jung Lee, Mariana Pereira Pinho, Rosana Ottakandathil Babu, Jeongmin Woo, ...

Keyword(s): T Cell; Sequence Similarity; Cross Reactivity; Skewed Distribution; Prior Exposure; High Sequence Similarity; Protein Coding; Cell Immunity; Immunogenic Peptides; Human Coronaviruses

Pre-existing T cell immunity to SARS-CoV-2 in individuals without prior exposure to SARS-CoV-2 has been reported in several studies. While emerging evidence points toward prior exposure to common-cold human coronaviruses (HCoVs), the extent of, and conditions for, cross-protective immunity between SARS-CoV-2 and HCoVs remain open. Here, by leveraging a comprehensive pool of publicly available, functionally evaluated SARS-CoV-2 peptides, we report 126 immunogenic SARS-CoV-2 peptides with high sequence similarity to 285 MHC-presented target peptides from at least one of four HCoVs, thus providing a map of the landscape of shared and private SARS-CoV-2 immunogenic peptides with functionally validated T cell responses. Using this map, we show that while SARS-CoV-2 immunogenic peptides in general exhibit a higher level of dissimilarity to both the self-proteome and microbiomes, several SARS-CoV-2 immunogenic peptides have high similarity to various human protein-coding genes, some of which have been reported to have elevated expression in severe COVID-19 patients. We then combine our map with SARS-CoV-2-specific TCR repertoire data from COVID-19 patients and healthy controls and show that, whereas the public repertoire of the majority of convalescent patients is dominated by TCRs cognate to private SARS-CoV-2 peptides, for a subset of patients more than 50% of the public repertoire that shows reactivity to SARS-CoV-2 consists of TCRs cognate to shared SARS-CoV-2-HCoV peptides. Further analyses suggest that the skewed distribution of TCRs cognate to shared and private peptides in COVID-19 patients is likely to be HLA-dependent. Finally, by utilising the global prevalence of HLA alleles, we provide 10 peptides with known cognate TCRs that are conserved across SARS-CoV-2 and multiple human coronaviruses and are predicted to be recognised by a high proportion of the global population. Overall, our work indicates the potential for HCoV-SARS-CoV-2 cross-reactive CD8+ T cells, which is likely dependent on differences in HLA-coding genes among individuals. These findings may have important implications for COVID-19 heterogeneity and vaccine-induced immune responses, as well as for the robustness of immunity to SARS-CoV-2 and its variants.
