Proteins encoded by Novel ORFs have increased disorder but can be biochemically regulated and harbour pathogenic mutations

Abstract Recent evidence has suggested that protein or protein-like products can be encoded by previously uncharacterized Open Reading Frames (ORFs) that we define as Novel Open Reading Frames (nORFs)1,2. These nORFs are present in both coding and non-coding regions of the human genome, and the novel proteins that they encode have increased the number and complexity of the cellular proteome from bacteria to humans. Whether these protein or protein-like products play any significant biological role remains an open question, but hopes have been raised of targeting them for anticancer and antimicrobial therapy3,4. To infer whether these novel proteins can perform biological functions, we used computational predictions to systematically investigate whether their amino acid sequences can form ordered or disordered structures. Our results indicated that these novel proteins have significantly higher predicted structural disorder than all known proteins, yet we do not find any correlation between the pathogenicity of the mutations and whether they are present in the ordered or disordered regions of these novel proteins. This study reveals that we should investigate these novel proteins more systematically, as they may be important for understanding complex diseases.

Isolation and Characterization of Genes Responsible for Naphthalene Degradation from Thermophilic Naphthalene Degrader, Geobacillus sp. JF8

Microorganisms ◽

10.3390/microorganisms8010044 ◽

2019 ◽

Vol 8 (1) ◽

pp. 44 ◽

Cited By ~ 1

Author(s):

Daisuke Miyazawa ◽

Le Thi Ha Thanh ◽

Akio Tani ◽

Masaki Shintani ◽

Nguyen Hoang Loc ◽

...

Keyword(s):

Amino Acid ◽

Thermophilic Bacterium ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Pcr Analysis ◽

Naphthalene Degradation ◽

Isolation And Characterization ◽

Dihydrodiol Dehydrogenase ◽

Reading Frames

Geobacillus sp. JF8 is a thermophilic biphenyl and naphthalene degrader. To identify the naphthalene degradation genes, cis-naphthalene dihydrodiol dehydrogenase was purified from naphthalene-grown cells, and its N-terminal amino acid sequence was determined. Using a DNA probe encoding the N-terminal region of the dehydrogenase, a 10-kb DNA fragment was isolated. Upstream of nahB, a gene for dehydrogenase, there were two open reading frames, which were designated nahAc and nahAd. The products of nahAc and nahAd were predicted to be the alpha and beta subunits of ring-hydroxylating dioxygenases, respectively. Phylogenetic analysis of amino acid sequences of NahB indicated that it did not belong to the cis-dihydrodiol dehydrogenase group that includes those of classical naphthalene degradation pathways. Downstream of nahB, four open reading frames were found, and their products were predicted to be meta-cleavage product hydrolase, monooxygenase, dehydrogenase, and gentisate 1,2-dioxygenase, respectively. A reverse transcriptase-PCR analysis showed that transcription of nahAcAd was induced by naphthalene. These findings indicate that we successfully identified genes involved in the upper pathway of naphthalene degradation from a thermophilic bacterium.

Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

10.1101/031617 ◽

2015 ◽

Author(s):

Anil Raj ◽

Sidney H. Wang ◽

Heejung Shim ◽

Arbel Harpak ◽

Yang I. Li ◽

...

Keyword(s):

Significant Negative Correlation ◽

Selective Constraint ◽

Open Reading Frames ◽

Sequence Information ◽

The Novel ◽

Protein Coding ◽

Coding Sequences ◽

Coding Regions ◽

Human Lymphoblastoid Cell ◽

Reading Frames

Abstract Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into biological functions. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,273 previously unannotated coding sequences, including 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. The novel sequences exhibit significant signatures of selective constraint in the reading frames of the inferred proteins, suggesting that many of these are functional. Nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Our work significantly expands the set of known coding regions in humans.

Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

eLife ◽

10.7554/elife.13328 ◽

2016 ◽

Vol 5 ◽

Cited By ~ 66

Author(s):

Anil Raj ◽

Sidney H Wang ◽

Heejung Shim ◽

Arbel Harpak ◽

Yang I Li ◽

...

Keyword(s):

Selective Constraint ◽

Open Reading Frames ◽

The Novel ◽

Drug Induced ◽

Protein Coding ◽

Coding Sequences ◽

Coding Regions ◽

Human Lymphoblastoid Cell ◽

Validation Rate ◽

Reading Frames

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.

Relapsing Fever Spirochetes Contain Chromosomal Genes with Unique Direct Tandemly Repeated Sequences

Infection and Immunity ◽

10.1128/iai.73.5.3025-3037.2005 ◽

2005 ◽

Vol 73 (5) ◽

pp. 3025-3037 ◽

Cited By ~ 9

Author(s):

Cyril Guyard ◽

Earl M. Chester ◽

Sandra J. Raffel ◽

Merry E. Schrumpf ◽

Paul F. Policastro ◽

...

Keyword(s):

Amino Acid ◽

Human Infection ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Relapsing Fever ◽

Serum Samples ◽

Borrelia Hermsii ◽

Reading Frames ◽

Antigenic Heterogeneity

ABSTRACT Genome sequencing of the relapsing fever spirochetes Borrelia hermsii and Borrelia turicatae identified three open reading frames (ORFs) on the chromosomes that contained internal, tandemly repeated amino acid sequences that were absent in the Lyme disease spirochete Borrelia burgdorferi. The predicted amino acid sequences of these genes (BH0209, BH0512, and BH0553) have hydrophobic N termini, indicating that these proteins may be secreted. B. hermsii transcribed the three ORFs in vitro, and the BH0512- and BH0553-encoded proteins (PBH-512 and PBH-553) were produced in vitro and in experimentally infected mice. PBH-512 and PBH-553 were on the spirochete's outer surface, and antiserum to these proteins reduced the adherence of B. hermsii to red blood cells. PCR analyses of 28 isolates of B. hermsii and 8 isolates of B. turicatae demonstrated polymorphism in each gene correlated with the number of repeats. Serum samples from relapsing fever patients reacted with recombinant PBH-512 and PBH-553, suggesting that these proteins are produced during human infection. These polymorphic proteins may be involved in the pathogenicity of these relapsing fever spirochetes and provide a mechanism for antigenic heterogeneity within their populations.

Incorporation of iron-sulphur clusters in membrane-bound proteins

Biochemical Society Transactions ◽

10.1042/bst0290418 ◽

2001 ◽

Vol 29 (4) ◽

pp. 418-421 ◽

Cited By ~ 16

Author(s):

A. Seidler ◽

K. Jaschkowitz ◽

M. Wollenberg

Keyword(s):

Amino Acid ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Synechocystis Pcc 6803 ◽

Membrane Bound ◽

Nif Proteins ◽

Reading Frames ◽

Pcc 6803

The completely sequenced genome of the cyanobacterium Synechocystis PCC 6803 contains several open reading frames whose deduced amino acid sequences show similarities to proteins known to be involved in FeS cluster synthesis of nitrogenase (Nif proteins) and other FeS proteins (Isc proteins). In this article, the results of our studies on these proteins are summarized and discussed with respect to their relevance in FeS cluster incorporation in chloroplasts. In cyanobacteria, there appear to be several pathways for FeS cluster synthesis.

BAIUCAS: a novel BLAST-based algorithm for the identification of upstream open reading frames with conserved amino acid sequences and its application to the Arabidopsis thaliana genome

Bioinformatics ◽

10.1093/bioinformatics/bts303 ◽

2012 ◽

Vol 28 (17) ◽

pp. 2231-2241 ◽

Cited By ~ 38

Author(s):

Hiro Takahashi ◽

Anna Takahashi ◽

Satoshi Naito ◽

Hitoshi Onouchi

Keyword(s):

Arabidopsis Thaliana ◽

Amino Acid ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Upstream Open Reading Frames ◽

Arabidopsis Thaliana Genome ◽

Reading Frames

Characterization of pRGO1, a Plasmid from Propionibacterium acidipropionici, and Its Use for Development of a Host-Vector System in Propionibacteria

Applied and Environmental Microbiology ◽

10.1128/aem.66.11.4688-4695.2000 ◽

2000 ◽

Vol 66 (11) ◽

pp. 4688-4695 ◽

Cited By ~ 43

Author(s):

Pornpimon Kiatpapan ◽

Yoshiteru Hashimoto ◽

Hisako Nakamura ◽

Yong-Zhe Piao ◽

Hisayo Ono ◽

...

Keyword(s):

Amino Acid ◽

Shuttle Vector ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Vector System ◽

Propionibacterium Acidipropionici ◽

Rep Protein ◽

Gram Positive Bacteria ◽

High Degree ◽

Reading Frames

ABSTRACT The complete nucleotide sequence of pRGO1, a cryptic plasmid from Propionibacterium acidipropionici E214, was determined. pRGO1 is 6,868 bp long, and its G+C content is 65.0%. Frame analysis of the sequence revealed six open reading frames, which were designated Orf1 to Orf6. The deduced amino acid sequences of Orf1 and Orf2 showed extensive similarities to an initiator of plasmid replication, the Rep protein, of various plasmids of gram-positive bacteria. The amino acid sequence of the putative translation product of orf3 exhibited a high degree of similarity to the amino acid sequences of DNA invertase in several bacteria. For the putative translation products of orf4, orf5, and orf6, on the other hand, no homologous sequences were found. The function of these open reading frames was studied by deletion analysis. A shuttle vector, pPK705, was constructed for shuttling between Escherichia coli and a Propionibacterium strain containing orf1 (repA), orf2 (repB), orf5, and orf6 from pRGO1, pUC18, and the hygromycin B-resistant gene as a drug marker. Shuttle vector pPK705 successfully transformed Propionibacterium freudenreichii subsp. shermanii IFO12426 by electroporation at an efficiency of 8 × 10^6 CFU/μg of DNA under optimized conditions. Transformation of various species of propionibacteria with pPK705 was also performed at efficiencies of about 10^4 to 10^7 CFU/μg of DNA. The vector was stably maintained in strains of P. freudenreichii subsp. shermanii, P. freudenreichii, P. pentosaceum, and P. freudenreichii subsp. freudenreichii grown under nonselective conditions. Successful manipulation of a host-vector system in propionibacteria should facilitate genetic studies and lead to creation of genes that are useful industrially.

Molecular and Phylogenetic Characterisation of a Highly Divergent Novel Parvovirus (Psittaciform Chaphamaparvovirus 2) in Australian Neophema Parrots

Pathogens ◽

10.3390/pathogens10121559 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1559

Author(s):

Subir Sarker

Keyword(s):

Amino Acid ◽

Phylogenetic Analyses ◽

Systematic Investigation ◽

Open Reading Frames ◽

Replicase Gene ◽

The Novel ◽

Sequence Identity ◽

Avian Origin ◽

Rainbow Lorikeet ◽

Reading Frames

Parvoviruses under the genus Chaphamaparvovirus (subfamily Hamaparvovirinae) are highly divergent and have recently been identified in many animals. However, the detection and characterisation of parvoviruses in psittacine birds are limited. Therefore, this study reports a novel parvovirus, tentatively named psittaciform chaphamaparvovirus 2 (PsChPV-2) under the genus Chaphamaparvovirus, which was identified in Australian Neophema birds. The PsChPV-2 genome is 4371 bp in length and encompasses four predicted open reading frames, including two major genes, a nonstructural replicase gene (NS1) and a structural capsid gene (VP1). The NS1 and VP1 genes showed the closest amino acid identities of 56.2% and 47.7%, respectively, with a recently sequenced psittaciform chaphamaparvovirus 1 from a rainbow lorikeet (Trichoglossus moluccanus). Subsequent phylogenetic analyses showed that the novel PsChPV-2 is most closely related to other chaphamaparvoviruses of avian origin and has the greatest sequence identity with PsChPV-1 (60.6%). Further systematic investigation is warranted to explore this diversity, as many avian-associated parvoviruses are likely still to be discovered.

Complete nucleotide sequence and genome organization of a single-stranded RNA virus infecting the marine fungoid protist Schizochytrium sp.

Journal of General Virology ◽

10.1099/vir.0.81204-0 ◽

2006 ◽

Vol 87 (3) ◽

pp. 723-733 ◽

Cited By ~ 28

Author(s):

Yoshitake Takao ◽

Kazuyuki Mise ◽

Keizo Nagasaki ◽

Tetsuro Okuno ◽

Daiske Honda

Keyword(s):

Amino Acid ◽

Nucleotide Sequence ◽

Complete Nucleotide Sequence ◽

Rna Virus ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Genome Database ◽

Virus Family ◽

Translation Mechanism ◽

Reading Frames

The complete nucleotide sequence of the genomic RNA of a marine fungoid protist-infecting virus (Schizochytrium single-stranded RNA virus; SssRNAV) has been determined. The viral RNA is single-stranded with a positive sense and is 9018 nt in length [excluding the 3′ poly(A) tail]. It contains two long open reading frames (ORFs), which are separated by an intergenic region of 92 nt. The 5′ ORF (ORF1) is preceded by an untranslated leader sequence of 554 nt. The 3′ large ORF (ORF2) and an additional ORF (ORF3) overlap ORF2 by 431 nt and are followed by an untranslated region of 70 nt [excluding the 3′ poly(A) tail]. The deduced amino acid sequences of ORF1 and ORF2 products show similarity to non-structural and structural proteins of dicistroviruses, respectively. However, Northern blot analysis suggests that SssRNAV synthesizes subgenomic RNAs to translate ORF2 and ORF3, showing that the translation mechanism of downstream ORFs is distinct from that of dicistroviruses. Furthermore, although considerable similarities were detected by using a BLAST genome database search, phylogenetic analysis based on both the nucleotide and amino acid sequences of the putative RNA-dependent RNA polymerase (RdRp) and the RNA helicase suggests that SssRNAV is phylogenetically distinct from other virus families. Therefore, it is concluded that SssRNAV is not a member of any currently defined virus family and belongs to a novel, unrecognized virus group.

Comparison of genomic and predicted amino acid sequences of respiratory and enteric bovine coronaviruses isolated from the same animal with fatal shipping pneumonia

Journal of General Virology ◽

10.1099/0022-1317-82-12-2927 ◽

2001 ◽

Vol 82 (12) ◽

pp. 2927-2933 ◽

Cited By ~ 54

Author(s):

Vladimir N. Chouljenko ◽

X. Q. Lin ◽

J. Storz ◽

Konstantin G. Kousoulas ◽

Alexander E. Gorbalenya

Keyword(s):

Amino Acid ◽

Hepatitis Virus ◽

Mouse Hepatitis Virus ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Single Amino Acid ◽

Untranslated Regions ◽

Genome Sequences ◽

Shipping Fever ◽

Reading Frames

The complete genome sequences are reported here of two field isolates of bovine coronavirus (BCoV), which were isolated from respiratory and intestinal samples of the same animal experiencing fatal pneumonia during a bovine shipping fever epizootic. Both genomes contained 31028 nucleotides and included 13 open reading frames (ORFs) flanked by 5′- and 3′-untranslated regions (UTRs). ORF1a and ORF1b encode replicative polyproteins pp1a and pp1ab, respectively, that contain all of the putative functional domains documented previously for the closest relative, mouse hepatitis virus. The genomes of the BCoV isolates differed in 107 positions, scattered throughout the genome except the 5′-UTR. Differences in 25 positions were non-synonymous and were located in all proteins except pp1b. Six replicase mutations were identified within or immediately downstream of the predicted largest pp1a-derived protein, p195/p210. Single amino acid changes within p195/p210 as well as within the S glycoprotein might contribute to the different phenotypes of the BCoV isolates.
