LongGeneDB: a data hub for long genes

2020 ◽  
Author(s):  
Yura Kim ◽  
Mariam Naghavi ◽  
Ying-Tao Zhao

ABSTRACT The human genome contains more than 4000 genes that are longer than 100 kb. These long genes require more time and resources to make a transcript than shorter genes do. Long genes have also been linked to various human diseases. Specific mechanisms are utilized by long genes to facilitate their transcription and co-transcriptional processes. This results in unique features in their multi-omics profiles. Although these unique profiles are important to understand long genes, a database that provides an integrated view and easy access to the multi-omics profiles of long genes does not exist. We leveraged the publicly accessible multi-omics data and systematically analyzed the genomic conservation, histone modifications, chromatin organization, tissue-specific transcriptome, and single cell transcriptome of 992 protein-coding genes that are longer than 200 kb in the mouse genome. We also examined the evolutionary history of their gene lengths in 15 species that belong to six Classes and 11 Orders. To share the multi-omics profiles of long genes, we developed a user-friendly and easy-to-use database, LongGeneDB (https://longgenedb.com), for users to search, browse, and download these profiles. LongGeneDB will be a useful data hub for the biomedical research community to understand long genes.
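The selection criterion described above (protein-coding genes longer than 200 kb) can be illustrated with a minimal sketch. The `Gene` record, the `long_genes` helper, and the toy coordinates are hypothetical and are not taken from LongGeneDB; gene length is assumed here to be the genomic span from start to end, 1-based and inclusive.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record (not the LongGeneDB schema)."""
    name: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    biotype: str


def long_genes(genes: List[Gene], min_length_bp: int = 200_000) -> List[Gene]:
    """Keep protein-coding genes whose genomic span exceeds min_length_bp."""
    return [
        g for g in genes
        if g.biotype == "protein_coding" and (g.end - g.start + 1) > min_length_bp
    ]


# Toy records for illustration only; names and coordinates are made up.
toy = [
    Gene("geneA", 1_000, 350_000, "protein_coding"),   # 349,001 bp: kept
    Gene("geneB", 5_000, 60_000, "protein_coding"),    # 55,001 bp: too short
    Gene("geneC", 10_000, 900_000, "lncRNA"),          # long, but not protein-coding
]
selected = [g.name for g in long_genes(toy)]
```

In practice the records would be parsed from a genome annotation file; the threshold is a parameter so the same filter covers the 100 kb cutoff mentioned for the human genome.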

2020 ◽  
Vol 49 (D1) ◽  
pp. D962-D968 ◽  
Author(s):  
Zhao Li ◽  
Lin Liu ◽  
Shuai Jiang ◽  
Qianpeng Li ◽  
Changrui Feng ◽  
...  

Abstract Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, a comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions is lacking. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes’ expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.


2017 ◽  
Vol 114 (34) ◽  
pp. 9158-9163 ◽  
Author(s):  
Steven Timmermans ◽  
Marc Van Montagu ◽  
Claude Libert

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.


2021 ◽  
Author(s):  
Fredrik Salmen ◽  
Joachim De Jonghe ◽  
Tomasz S. Kaminski ◽  
Anna Alemany ◽  
Guillermo Parada ◽  
...  

In recent years, single-cell transcriptome sequencing has revolutionized biology, allowing for the unbiased characterization of cellular subpopulations. However, most methods amplify the termini of polyadenylated transcripts, capturing only a small fraction of the total cellular transcriptome. This precludes the detection of many long non-coding, short non-coding and non-polyadenylated protein-coding transcripts. Additionally, most workflows do not sequence the full transcript, hindering the analysis of alternative splicing. We therefore developed VASA-seq to detect the total transcriptome in single cells. VASA-seq is compatible with both plate-based formats and droplet microfluidics. We applied VASA-seq to over 30,000 single cells in the developing mouse embryo during gastrulation and early organogenesis. The dynamics of the total single-cell transcriptome result in the discovery of novel cell type markers, many based on non-coding RNA, an in vivo cell cycle analysis and an improved RNA velocity characterization. Moreover, it provides the first comprehensive analysis of alternative splicing during mammalian development.


2021 ◽  
Author(s):  
Marc-André Legault ◽  
Louis-Philippe Lemieux Perreault ◽  
Marie-Pierre Dubé

Structured Abstract Motivation The relationship between protein coding genes and phenotypes has the potential to inform on the underlying molecular function in disease etiology. We conducted a phenome-wide association study (pheWAS) of protein coding genes using a principal components analysis-based approach in the UK Biobank. Results We tested the association between 19,114 protein coding gene regions and 1,210 phenotypes including anthropometric measurements, laboratory biomarkers, cancer registry data, hospitalization and death record codes and algorithmically-defined cardiovascular outcomes. We report the pheWAS results in a user-friendly web-based browser. Taking atrial fibrillation, a common cardiac arrhythmia, as an example, ExPheWas identified genes that are known drug targets for the treatment of arrhythmias and genes involved in biological processes implicated in cardiac muscle function. We also identified MYOT as a possible atrial fibrillation gene. Availability and implementation The ExPheWas browser and API are available at http://exphewas.statgen.org/


2018 ◽  
Author(s):  
Tom R. Booker ◽  
Peter D. Keightley

Abstract A major goal of population genetics has been to determine the extent to which selection at linked sites influences patterns of neutral nucleotide diversity in the genome. Multiple lines of evidence suggest that diversity is influenced by both positive and negative selection. For example, in many species there are troughs in diversity surrounding functional genomic elements, consistent with the action of either background selection (BGS) or selective sweeps. In this study, we investigated the causes of the diversity troughs that are observed in the wild house mouse genome. Using the unfolded site frequency spectrum (uSFS), we estimated the strength and frequencies of deleterious and advantageous mutations occurring in different functional elements in the genome. We then used these estimates to parameterize forward-in-time simulations of chromosomes, using realistic distributions of functional elements and recombination rate variation in order to determine if selection at linked sites can explain the observed patterns of nucleotide diversity. The simulations suggest that BGS alone cannot explain the dips in diversity around either exons or conserved non-coding elements (CNEs). A combination of BGS and selective sweeps, however, can explain the troughs in diversity around CNEs. This is not the case for protein-coding exons, where observed dips in diversity cannot be explained by parameter estimates obtained from the uSFS. We discuss the extent to which our results provide evidence of sweeps playing a role in shaping patterns of nucleotide diversity and the limitations of using the uSFS for obtaining inferences of the frequency and effects of advantageous mutations. Author Summary We present a study examining the causes of variation in nucleotide diversity across the mouse genome. The status of mice as a model organism in the life sciences makes them an excellent model system for studying molecular evolution in mammals.
In our study, we analyse how natural selection acting on new mutations can affect levels of nucleotide diversity through the processes of background selection and selective sweeps. To perform our analyses, we first estimated the rate and strengths of selected mutations from a sample of wild mice and then used our estimates in realistic population genetic simulations. Analysing simulations, we find that both harmful and beneficial mutations are required to explain patterns of nucleotide diversity in regions of the genome close to gene regulatory elements. For protein-coding genes, however, our approach is not able to fully explain observed patterns and we think that this is because there are strongly advantageous mutations that occur in protein-coding genes that we were not able to detect.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Xiaoling Zhang ◽  
Jeroen G. J. van Rooij ◽  
Yoshiyuki Wakabayashi ◽  
Shih-Jen Hwang ◽  
...  

Abstract Background Coronary artery calcification (CAC) is a noninvasive measure of coronary atherosclerosis, the proximal pathophysiology underlying most cases of myocardial infarction (MI). We sought to identify expression signatures of early MI and subclinical atherosclerosis in the Framingham Heart Study (FHS). In this study, we conducted paired-end RNA sequencing on whole blood collected from 198 FHS participants (55 with a history of early MI, 72 with high CAC without prior MI, and 71 controls free of elevated CAC levels or history of MI). We applied DESeq2 to identify coding-genes and long intergenic noncoding RNAs (lincRNAs) differentially expressed in MI and high CAC, respectively, compared with the control. Results On average, 150 million paired-end reads were obtained for each sample. At the false discovery rate (FDR) < 0.1, we found 68 coding genes and 2 lincRNAs that were differentially expressed in early MI versus controls. Among them, 60 coding genes were detectable and thus tested in an independent RNA-Seq data of 807 individuals from the Rotterdam Study, and 8 genes were supported by p value and direction of the effect. Immune response, lipid metabolic process, and interferon regulatory factor were enriched in these 68 genes. By contrast, only 3 coding genes and 1 lincRNA were differentially expressed in high CAC versus controls. APOD, encoding a component of high-density lipoprotein, was significantly downregulated in both early MI (FDR = 0.007) and high CAC (FDR = 0.01) compared with controls. Conclusions We identified transcriptomic signatures of early MI that include differentially expressed protein-coding genes and lincRNAs, suggesting important roles for protein-coding genes and lincRNAs in the pathogenesis of MI.
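The FDR < 0.1 threshold reported above can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment, the procedure DESeq2 applies by default to produce adjusted p-values. This is an illustrative reimplementation under that assumption, not the study's pipeline; the function name and the toy p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for illustration only; not data from the study.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(pvals)
n_significant = sum(p < 0.1 for p in padj)
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen FDR cutoff.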


Zootaxa ◽  
2008 ◽  
Vol 1945 (1) ◽  
pp. 51-66 ◽  
Author(s):  
NICOLAS VIDAL ◽  
WILLIAM R. BRANCH ◽  
OLIVIER S. G. PAUWELS ◽  
S. BLAIR HEDGES ◽  
DONALD G. BROADLEY ◽  
...  

The Elapoidea includes the Elapidae and a large (~60 genera, 280 sp.) and mostly African (including Madagascar) radiation termed Lamprophiidae by Vidal et al. (2007), that includes at least four major groups: the psammophiines, atractaspidines, lamprophiines and pseudoxyrhophiines. In this work, we reviewed the recent taxonomic history of the lamprophiids, and built a data set including two nuclear protein-coding genes (c-mos and RAG2), two mitochondrial rRNA genes (12S and 16S rRNA) and two mitochondrial protein-coding genes (cytochrome b and ND4) for 85 species belonging to 45 genera (thus representing about 75% of the generic diversity and 30% of the specific diversity of the radiation), in order to clarify the phylogenetic relationships of this large and neglected group at the subfamilial and generic levels. To this aim, 480 new sequences were produced. The vast majority of the investigated genera fall into four main monophyletic clusters, that correspond to the four subfamilies mentioned above, although the content of atractaspidines, lamprophiines and pseudoxyrhophiines is revised. We confirm the polyphyly of the genus Stenophis, and the relegation of the genus name Dromophis to the synonymy of the genus name Psammophis. Gonionotophis brussauxi is nested within Mehelya. The genus Lamprophis Fitzinger, 1843 is paraphyletic with respect to Lycodonomorphus Fitzinger, 1843. Lamprophis swazicus is the sister-group to Hormonotus modestus, and may warrant generic recognition. Molecular data do not support the traditional placement of Micrelaps within the Atractaspidinae, but its phylogenetic position, along with that of Oxyrhabdium (previously considered to belong to the Xenodermatidae), requires additional molecular data and they are both treated as Elapoidea incertae sedis. The interrelationships of Psammophiinae, Atractaspidinae, Lamprophiinae, Pseudoxyrhophiinae, Prosymna (13 sp.), Pseudaspis (1 sp.) and Pythonodipsas (1 sp.), Buhoma (2 species), and Psammodynastes (1 sp.) remain unresolved. Finally, the genus Lycognathophis, endemic to the Seychelles, does not belong to the African radiation, but to the Natricidae.


1999 ◽  
Vol 15 (4) ◽  
pp. 673-684 ◽  
Author(s):  
Jamie R. Stevens ◽  
Wendy C. Gibson

In the absence of a fossil record, the evolution of protozoa has until recently largely remained a matter for speculation. However, advances in molecular methods and phylogenetic analysis are now allowing interpretation of the "history written in the genes". This review focuses on recent progress in reconstruction of trypanosome phylogeny based on molecular data from ribosomal RNA, the miniexon and protein-coding genes. Sufficient data have now been gathered to demonstrate unequivocally that trypanosomes are monophyletic; the phylogenetic trees derived can serve as a framework to reinterpret the biology, taxonomy and present day distribution of trypanosome species, providing insights into the coevolution of trypanosomes with their vertebrate hosts and vectors. Different methods of dating the divergence of trypanosome lineages give rise to radically different evolutionary scenarios and these are reviewed. In particular, the use of one such biogeographically based approach provides new insights into the coevolution of the pathogens, Trypanosoma brucei and Trypanosoma cruzi, with their human hosts and the history of the diseases with which they are associated.


2019 ◽  
Author(s):  
Tyler Alioto ◽  
Konstantinos Alexiou ◽  
Amélie Bardil ◽  
Fabio Barteri ◽  
Raúl Castanera ◽  
...  

Abstract Combining both short and long-read sequencing, we have estimated the almond Prunus dulcis cv. Texas genome size at 235 Mbp and assembled 227.6 Mb of its sequence. The highly heterozygous compact genome of Texas comprises eight chromosomes, to which we have anchored over 91% of the assembly. We annotated 27,042 protein-coding genes and 6,800 non-coding transcripts. High levels of genetic variability were characterized after resequencing a collection of ten almond accessions. Phylogenomic comparison with the genomes of 16 other close and distant species allowed estimating that almond and peach diverged around 5.88 Mya. Comparison between peach and almond genomes confirmed the high synteny between these close relatives, but also revealed high numbers of presence-absence variants, many attributable to the movement of transposable elements (TEs). The number and distribution of TEs between peach and almond was similar, but the history of TE movement was distinct, with peach having a larger proportion of recent transpositions and almond preserving a higher level of polymorphism in the older TEs. When focusing on specific genes involved in key characters such as the bitter vs. sweet kernel taste and the formation of a fleshy mesocarp, we found that for one gene associated with the biosynthesis of amygdalin that confers the bitter kernel taste, several TEs were inserted in its vicinity only in sweet almond cultivars but not in bitter cultivars and Prunus bitter kernel relatives, including P. webbii, P. mume, and other species like peach and cherry. TE insertions likely to produce effects in the expression of six more genes involved in the formation of the fleshy mesocarp were also identified. Altogether, our results suggest a key role of TEs in the recent history and diversification of almond with respect to peach.

