Novel open reading frames in human accelerated regions and transposable elements reveal new leads to understand schizophrenia and bipolar disorder

Author(s):  
Chaitanya Erady ◽  
Krishna Amin ◽  
Temiloluwa O. A. E. Onilogbo ◽  
Jakub Tomasik ◽  
Rebekah Jukes-Jones ◽  
...  

Abstract Schizophrenia (SCZ) and bipolar disorder are debilitating neuropsychiatric disorders arising from a combination of environmental and genetic factors. Novel open reading frames (nORFs) are genomic loci that give rise to previously uncharacterized transcripts and protein products. In our previous work, we have shown that nORFs can be biologically regulated and that they may play a role in cancer and rare diseases. More importantly, we have shown that nORFs may emerge in accelerated regions of the genome, giving rise to species-specific functions. We hypothesize that nORFs represent a potentially important group of biological factors that may contribute to SCZ and bipolar disorder pathophysiology. Human accelerated regions (HARs) are genomic features showing human-lineage-specific rapid evolution that may be involved in biological regulation and have additionally been found to associate with SCZ genes. Transposable elements (TEs) are another set of genomic features that have been shown to regulate gene expression. As with HARs, their relevance to SCZ has also been suggested. Here, nORFs are investigated in the context of HARs and TEs. This work shows that nORFs whose expression is disrupted in SCZ and bipolar disorder are in close proximity to HARs and TEs and that some of them are significantly associated with SCZ and bipolar disorder genomic hotspots. We also show that nORF-encoded proteins can form structures and potentially constitute novel drug targets.

2021 ◽  
Author(s):  
Juan F Cornejo-Franco ◽  
Francisco Flores ◽  
Dimitre Mollov ◽  
Diego Fernando Quito-Avila

Abstract The complete sequence of a new viral RNA from babaco (Vasconcellea x heilbornii) was determined. The genome consisted of 4,584 nucleotides organized in two non-overlapping open reading frames (ORFs 1 and 2), a 9-nt-long noncoding region (NCR) at the 5’ terminus and a 1,843-nt-long NCR at the 3’ terminus. Sequence comparisons of ORF 2 revealed homology to the RNA-dependent RNA polymerase (RdRp) of several umbra- and umbra-related viruses. Phylogenetic analysis of the RdRp placed the new virus in a well-supported and cohesive clade that includes umbra-like viruses reported from papaya, citrus, opuntia, maize and sugarcane hosts. This clade shares a most recent ancestor with the umbraviruses but has different genomic features. The creation of a new genus, within the Tombusviridae, is proposed for the classification of these novel viruses.


Reproduction ◽  
2016 ◽  
Vol 152 (6) ◽  
pp. 727-739 ◽  
Author(s):  
Andrea Miccoli ◽  
Ike Olivotto ◽  
Andrea De Felice ◽  
Iole Leonori ◽  
Oliana Carnevali

The European anchovy Engraulis encrasicolus, a member of the order Clupeiformes, is of great biological and economic importance. In the past, this species was mostly investigated with the aim of assessing its reproductive biology, trophic ecology, population dynamics and its relations with the physical environment. At present, though, almost nothing is known about its neuroendocrinology and reproductive physiology. Here, the highest levels of the hypothalamic–pituitary–gonadal (HPG) axis were investigated. In this study, we cloned the gonadotropin-releasing hormone (GnRH), a neuropeptide underlying many reproduction-related processes, the most critical of which is the stimulation of gonadotropin synthesis and secretion from the pituitary gland. Three forms (salmon GnRH, chicken-II GnRH and the species-specific type) were characterized in their full-length open reading frames and, in accordance with other Clupeiformes species, the distinctive one was found to be the herring-type GnRH. We qualitatively and semiquantitatively evaluated the localization of expression and the temporal transcription patterns of the three GnRH forms in male and female specimens throughout their reproductive cycle, and described their phylogeny with regard to teleost GnRH lineages and, specifically, to other Clupeiformes species.


2010 ◽  
Vol 192 (20) ◽  
pp. 5289-5303 ◽  
Author(s):  
C. Peter Wolk ◽  
Sigal Lechno-Yossef ◽  
Karin M. Jäger

ABSTRACT Anabaena sp. strain PCC 7120, widely studied, has 145 annotated transposase genes that are part of transposable elements called insertion sequences (ISs). To determine the entirety of the ISs, we aligned transposase genes and their flanking regions; identified the ISs' possible terminal inverted repeats, usually flanked by direct repeats; and compared IS-interrupted sequences with homologous sequences. We thereby determined both ends of 87 ISs bearing 110 transposase genes in eight IS families (http://www-is.biotoul.fr/ ) and in a cluster of unclassified ISs, and of hitherto unknown miniature inverted-repeat transposable elements. Open reading frames were then identified to which ISs contributed and others—some encoding proteins of predictable function, including protein kinases, and restriction endonucleases—that were interrupted by ISs. Anabaena sp. ISs were often more closely related to exogenous than to other endogenous ISs, suggesting that numerous variant ISs were not degraded within PCC 7120 but transferred from without. This observation leads to the expectation that further sequencing projects will extend this and similar analyses. We also propose an adaptive role for poly(A) sequences in ISs.
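The abstract above describes delimiting insertion sequences by their terminal inverted repeats. As a rough illustration only, and not the authors' procedure, the following sketch measures how far the start of a candidate element matches the reverse complement of its end. The function names and the toy sequence are hypothetical, and only exact matches are counted, whereas real inverted repeats are often imperfect.

```python
# Illustrative sketch: measure an exact terminal inverted repeat (TIR).
# Function names and the toy sequence are hypothetical, not from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str) -> int:
    """Count how many leading bases pair with the reverse complement of the tail.

    Only exact matches are counted, and the scan stops at the midpoint so the
    two arms of the repeat cannot overlap.
    """
    seq = element.upper()
    rc = reverse_complement(seq)  # rc starts with the reverse complement of the tail
    length = 0
    for left, right in zip(seq[: len(seq) // 2], rc):
        if left != right:
            break
        length += 1
    return length


# Toy element: 6-nt arms (GGCCAT ... ATGGCC) around an 8-nt core.
toy_element = "GGCCAT" + "CCCCCCCC" + "ATGGCC"
tir_length = terminal_inverted_repeat_length(toy_element)
```

In practice the element boundaries are not known in advance, which is why the abstract also relies on flanking direct repeats and on comparison with uninterrupted homologous sequences.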


Author(s):  
Sunil Thomas

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), responsible for the disease COVID-19, has wreaked havoc on the health and economy of humanity. In addition, the disease is observed in domestic and wild animals. The disease has impacted directly and indirectly every corner of the planet. Currently, there are no vaccines or effective therapies for COVID-19. SARS-CoV-2 is an enveloped virus with a single-stranded RNA genome of 29.8 kb. More than two-thirds of the genome comprises Orf1ab encoding 16 non-structural proteins (nsps), followed by mRNAs encoding the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N). These genes are interspersed with several accessory genes (open reading frames [Orf] 3a, 3b, 6, 7a, 7b, 8, 9b, 9c and 10). The functions of these proteins are of particular interest for understanding the pathogenesis of SARS-CoV-2. Several of the nsps (nsp3, nsp4, nsp6) and Orf3a are transmembrane proteins involved in regulating host immunity and modifying host cell organelles for viral replication and escape, and are hence considered drug targets. In this paper we report mapping the transmembrane structure of the non-structural proteins of SARS-CoV-2.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
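The abstract above mentions partitioning a co-expression matrix by Markov clustering. The sketch below is a minimal, textbook-style version of that idea (alternating expansion and inflation of a column-stochastic matrix), written in plain Python for a tiny toy matrix. It is not the authors' pipeline; the function names, the inflation value of 2.0, and the toy weights are all assumptions made for illustration.

```python
# Minimal Markov clustering (MCL) sketch on a small symmetric weight matrix.
# Names, parameters and the toy data are illustrative assumptions.


def _normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(weights, inflation=2.0, max_iterations=100, tolerance=1e-9):
    """Cluster nodes of a non-negative, symmetric weight matrix.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(weights)
    # Add self-loops, then turn the weights into transition probabilities.
    matrix = [[float(weights[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    matrix = _normalise_columns(matrix)
    for _ in range(max_iterations):
        expanded = _matmul(matrix, matrix)                 # expansion step
        inflated = _normalise_columns(                     # inflation step
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - matrix[i][j])
                     for i in range(n) for j in range(n))
        matrix = inflated
        if change < tolerance:
            break
    # Each row that retains probability mass defines one cluster: the columns
    # (nodes) whose flow ends up in that row.
    clusters = {frozenset(j for j in range(n) if matrix[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(cluster) for cluster in clusters if cluster)


# Toy "co-expression" graph: two triangles with no edges between them.
toy_weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_clusters = markov_cluster(toy_weights)
```

On this toy input the two disconnected triangles come back as two clusters. A matrix covering tens of thousands of ORFs would need a sparse, optimized implementation rather than nested Python lists.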


Mobile DNA ◽  
2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Alicja Macko-Podgórni ◽  
Katarzyna Stelmach ◽  
Kornelia Kwolek ◽  
Dariusz Grzebelus

Abstract Background Miniature inverted repeat transposable elements (MITEs) are small non-autonomous DNA transposons that are ubiquitous in plant genomes, and are mobilised by their autonomous relatives. Stowaway MITEs are derived from and mobilised by elements from the mariner superfamily. Those elements constitute a significant portion of the carrot genome; however, the variation caused by Daucus carota Stowaway MITEs (DcStos), their association with genes and their putative impact on genome evolution have not been comprehensively analysed. Results Fourteen families of Stowaway elements (DcStos) occupy about 0.5% of the carrot genome. We systematically analysed 31 genomes of wild and cultivated Daucus carota, yielding 18.5 thousand copies of these elements, showing remarkable insertion site polymorphism. DcSto element demography differed based on the origin of the host populations, and corresponded with the four major groups of D. carota: wild European, wild Asian, eastern cultivated and western cultivated. The DcSto elements were associated with genes, and most frequently occurred in 5′ and 3′ untranslated regions (UTRs). Individual families differed in their propensity to reside in particular segments of genes. Most importantly, DcSto copies in the 2 kb regions up- and downstream of genes were more frequently associated with open reading frames encoding transcription factors, suggesting their possible functional impact. More than 1.5% of all DcSto insertion sites in different host genomes contained different copies in exactly the same position, indicating the existence of insertional hotspots. The DcSto7b family was much more polymorphic than the other families in cultivated carrot. A line of evidence pointed at its activity in the course of carrot domestication, and identified Dcmar1 as an active carrot mariner element and a possible source of the transposition machinery for DcSto7b.
Conclusion Stowaway MITEs have made a substantial contribution to the structural and functional variability of the carrot genome.


2009 ◽  
Vol 71-73 ◽  
pp. 203-206
Author(s):  
F.J. Ossandón ◽  
G. Rivera ◽  
F. Lazo ◽  
David S. Holmes

A particularly challenging problem in genome annotation is to attribute function to genes annotated as “hypothetical, no known function”. These typically account for about 40% of all genes regardless of the genome. Some of these are “orphan” genes and are not found in any other genome. Some of these could encode species specific proteins and so are particularly interesting for evaluating novel metabolic potential and for understanding the evolution of genes and genomes. Several similarity and non-similarity bioinformatics tools exist that help predict function of hypotheticals, but none are able to suggest function for more than a few percent and the annotation of the others remains a formidable task. We have developed a bioinformatics tool called AlterORF (www.AlterORF.cl) that is able to identify alternate open reading frames (ORFs) embedded within annotated genes. Analysis of over 2 million genes in over 700 completely sequenced genomes reveals that alternate ORFs of substantial length (potentially encoding 70 amino acids or more) are surprisingly common, especially in G+C rich genomes. During our examination of these alternate ORFs, we uncovered hundreds of examples where the alternate ORF has a significant hit with databases of motifs and domains (e.g. CDD, Pfam) and where the actual annotated gene is described as hypothetical and has no database match. This strongly suggests that the annotated gene has been incorrectly identified and that the alternate ORF is the real gene. We describe the evaluation of the following genomes of bioleaching microorganisms and others that reside in similar ecological niches using AlterORF: Acidithiobacillus ferrooxidans (2 strains), Leptospirillum type II, Methylacidiphilum infernorum, Picrophilus torridus, Sulfolobus acidocaldarius, S. solfataricus, S. tokodaii, Thermodesulfovibrio yellowstonii, Thermoplasma acidophilum and T. volcanium. 
Examples of novel genes from these microorganisms and their suggested roles in metabolism will be described.
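The abstract above refers to alternate ORFs, potentially encoding 70 amino acids or more, embedded within annotated genes. As a hedged sketch of the general idea, and not the AlterORF implementation, the code below scans all six reading frames of a nucleotide sequence for ATG-to-stop ORFs of at least a given number of codons. The function names, the choice of ATG as the only start codon, and the standard stop codons are assumptions made for illustration.

```python
# Six-frame ORF scan; a simplified illustration, not the AlterORF algorithm.
# Assumes ATG starts and the standard stop codons TAA, TAG and TGA.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(sequence: str, min_codons: int = 70):
    """Return (strand, frame, start, end) tuples for ORFs of >= min_codons codons.

    Coordinates are 0-based, end-exclusive, and relative to the strand that
    was scanned; the stop codon is included in the span but not in the count.
    """
    seq = sequence.upper()
    orfs = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(template) - 2, 3):
                codon = template[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # look for the next ORF
    return orfs


# Toy gene: ATG, 70 alanine codons, then a stop codon (71 coding codons).
toy_gene = "ATG" + "GCT" * 70 + "TAA"
toy_orfs = find_orfs(toy_gene)
```

To flag alternate ORFs specifically, one would discard hits that coincide with the annotated frame and strand and then search the remainder against motif and domain databases such as those named in the abstract.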


2008 ◽  
Vol 75 (3) ◽  
pp. 811-822 ◽  
Author(s):  
Ralf Rosenstein ◽  
Christiane Nerz ◽  
Lalitha Biswas ◽  
Alexandra Resch ◽  
Guenter Raddatz ◽  
...  

ABSTRACT The Staphylococcus carnosus genome has the highest GC content of all sequenced staphylococcal genomes, with 34.6%, and therefore represents a species that is set apart from S. aureus, S. epidermidis, S. saprophyticus, and S. haemolyticus. With only 2.56 Mbp, the genome belongs to a family of smaller staphylococcal genomes, and the ori and ter regions are asymmetrically arranged with the replichores I (1.05 Mbp) and II (1.5 Mbp). The events leading up to this asymmetry probably occurred not that long ago in evolution, as there was not enough time to approach the natural tendency of a physical balance. Unlike the genomes of pathogenic species, the TM300 genome does not contain mobile elements such as plasmids, insertion sequences, transposons, or STAR elements; also, the number of repeat sequences is markedly decreased, suggesting a comparatively high stability of the genome. While most S. aureus genomes contain several prophages and genomic islands, the TM300 genome contains only one prophage, ΦTM300, and one genomic island, νSCA1, which is characterized by a mosaic structure mainly composed of species-specific genes. Most of the metabolic core pathways are present in the genome. Some open reading frames are truncated, which reflects the nutrient-rich environment of the meat starter culture, making some functions dispensable. The genome is well equipped with all functions necessary for the starter culture, such as nitrate/nitrite reduction, various sugar degradation pathways, two catalases, and nine osmoprotection systems. The genome lacks most of the toxins typical of S. aureus as well as genes involved in biofilm formation, underscoring the nonpathogenic status.
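GC content, quoted above as a percentage, is the share of G and C bases among the unambiguous bases of a sequence. A minimal sketch of that calculation follows; the function name is hypothetical, and ignoring ambiguous symbols such as N is an assumption rather than something stated in the abstract.

```python
# GC content as a percentage of unambiguous bases (A, C, G, T).
# The function name and the handling of ambiguous symbols are assumptions.


def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    gc = sum(1 for base in seq if base in "GC")
    counted = sum(1 for base in seq if base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / counted


toy_gc = gc_content("ATGCNN")  # the two N symbols are ignored
```

Applied in sliding windows along a genome or plasmid, the same function gives the kind of regional GC profile often used to spot horizontally acquired regions.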


Author(s):  
Camilla Borges Gazolla ◽  
Adriana Ludwig ◽  
Joana de Moura Gama ◽  
Daniel Pacheco Bruschi

Abstract Anuran genomes have a large number and diversity of transposable elements, but these remain little explored, mainly in relation to their molecular structure and evolutionary dynamics. Here, we investigated the retrotransposons containing tyrosine recombinase (YR) (order DIRS) in the genomes of Xenopus tropicalis and Xenopus laevis. These anurans have 2n = 20 and 2n = 36 karyotypes, respectively. They diverged about 48 million years ago (mya) and X. laevis had an allotetraploid origin (around 17-18 mya). Our investigation is based on the analysis of the molecular structure and the phylogenetic relationships of 95 DIRS families of Xenopus belonging to DIRS-like and Ngaro-like superfamilies. We were able to identify molecular signatures in the 5' and 3' non-coding terminal regions, preserved open reading frames (ORFs) and conserved domains that distinguish each superfamily. We recognize two ancient amplification waves of DIRS-like elements that occurred in the ancestor of both species and a higher density of the old/degenerate copies detected in both subgenomes of X. laevis. More recent amplification waves are seen in X. tropicalis (less than 3.2 mya) and X. laevis (around 10 mya), consistent with evidence of transcriptional activity. All DIRS-like families were found in both X. laevis subgenomes, while a few were more abundant in the L subgenome. Ngaro-like elements were less diverse and less abundant in X. tropicalis and X. laevis genomes, although potentially active copies were found in both species, which is consistent with a recent amplification wave seen in the evolutionary landscape. Our findings highlight differing levels of diversity and evolutionary dynamics of the YR retrotransposons in X. tropicalis and X. laevis, expanding our comprehension of the behavior of these elements in both genomes during the diversification process.


1998 ◽  
Vol 66 (10) ◽  
pp. 4611-4623 ◽  
Author(s):  
Robert D. Perry ◽  
Susan C. Straley ◽  
Jacqueline D. Fetherston ◽  
Debra J. Rose ◽  
Jason Gregor ◽  
...  

ABSTRACT The low-Ca2+-response (LCR) plasmid pCD1 of the plague agent Yersinia pestis KIM5 was sequenced and analyzed for its genetic structure. pCD1 (70,509 bp) has an IncFIIA-like replicon and a SopABC-like partition region. We have assigned 60 apparently intact open reading frames (ORFs) that are not contained within transposable elements. Of these, 47 are proven or possible members of the LCR, a major virulence property of human-pathogenic Yersinia spp., that had been identified previously in one or more of Y. pestis or the enteropathogenic yersiniae Yersinia enterocolitica and Yersinia pseudotuberculosis. Of these 47 LCR-related ORFs, 35 constitute a continuous LCR cluster. The other LCR-related ORFs are interspersed among three intact insertion sequence (IS) elements (IS100 and two new IS elements, IS1616 and IS1617) and numerous defective or partial transposable elements. Regional variations in percent GC content and among ORFs encoding effector proteins of the LCR are additional evidence of a complex history for this plasmid. Our analysis suggested the possible addition of a new Syc- and Yop-encoding operon to the LCR-related pCD1 genes and gave no support for the existence of YopL. YadA likely is not expressed, as was the case for Y. pestis EV76, and the gene for the lipoprotein YlpA found in Y. enterocolitica likely is a pseudogene in Y. pestis. The yopM gene is longer than previously thought (by a sequence encoding two leucine-rich repeats), the ORF upstream of ypkA-yopJ is discussed as a potential Syc gene, and a previously undescribed ORF downstream of yopE was identified as being potentially significant. Eight other ORFs not associated with IS elements were identified and deserve future investigation into their functions.

