gene annotations

2022 ◽  
Author(s):  
Caroline M. Weisman ◽  
Andrew M. Murray ◽  
Sean R Eddy

Comparisons of genomes of different species are used to identify lineage-specific genes, those genes that appear unique to one species or clade. Lineage-specific genes are often thought to represent genetic novelty that underlies unique adaptations. Identification of these genes depends not only on genome sequences, but also on inferred gene annotations. Comparative analyses typically use available genomes that have been annotated using different methods, increasing the risk that orthologous DNA sequences may be erroneously annotated as a gene in one species but not another, appearing lineage-specific as a result. To evaluate the impact of such 'annotation heterogeneity', we identified four clades of species with sequenced genomes with more than one publicly available gene annotation, allowing us to compare the number of lineage-specific genes inferred when differing annotation methods are used to those resulting when annotation method is uniform across the clade. In these case studies, annotation heterogeneity increases the apparent number of lineage-specific genes by up to 15-fold, suggesting that annotation heterogeneity is a substantial source of potential artifact.
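The effect described in this abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the species names, locus identifiers, and the `lineage_specific` helper are all hypothetical, and it assumes that orthologous loci have already been matched across species and that a "permissive" annotation method calls one extra locus (`g5`) a gene where a "strict" method does not.

```python
def lineage_specific(focal, annotations):
    """Return loci annotated as genes in `focal` and in no other species.

    `annotations` maps a species name to the set of orthologous locus IDs
    that its annotation calls genes (hypothetical toy representation).
    """
    others = set()
    for species, loci in annotations.items():
        if species != focal:
            others |= loci
    return annotations[focal] - others


# Uniform annotation: the same (permissive) method is applied to every species.
uniform = {
    "sp1": {"g1", "g2", "g3", "g4", "g5"},
    "sp2": {"g1", "g2", "g3", "g5"},
    "sp3": {"g1", "g2", "g3", "g5"},
}

# Heterogeneous annotation: sp1 keeps the permissive method, while sp2 and sp3
# use a stricter method that does not call the shared locus g5 a gene.
heterogeneous = {
    "sp1": {"g1", "g2", "g3", "g4", "g5"},
    "sp2": {"g1", "g2", "g3"},
    "sp3": {"g1", "g2", "g3"},
}

uniform_hits = lineage_specific("sp1", uniform)
heterogeneous_hits = lineage_specific("sp1", heterogeneous)

print(sorted(uniform_hits))
print(sorted(heterogeneous_hits))
```

In this toy setup, `g5` is present at an orthologous locus in all three species, yet it is counted as lineage-specific to `sp1` only when the annotation methods differ, which is the kind of artifact the abstract attributes to annotation heterogeneity.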


2022 ◽  
Author(s):  
Andrew J Harris ◽  
Nicole M Foley ◽  
Tiffani L Williams ◽  
William J Murphy

Tree House Explorer (THEx) is a genome browser that integrates phylogenomic data and genomic annotations into a single interactive platform for combined analysis. THEx allows users to visualize genome-wide variation in evolutionary histories and genetic divergence on a chromosome-by-chromosome basis, with continuous sliding window comparisons to gene annotations, recombination rates, and other user-specified, highly customizable feature annotations. THEx provides a new platform for interactive phylogenomic data visualization to analyze and interpret the diverse evolutionary histories woven throughout genomes. Hosted on Conda, THEx integrates seamlessly into new or pre-existing workflows.


2021 ◽  
Author(s):  
Nita Parekh ◽  
Mayank Musaddi ◽  
Sanchari Sircar

The recent focus on transcriptomic studies in food crops such as rice, wheat and maize provides new opportunities to address issues related to agriculture and climate change. Re-analysis of such publicly available data, supplemented with annotations across the molecular hierarchy, can be of immense help to the plant research community, particularly in the form of co-expression networks representing transcriptionally coordinated genes that are often part of the same biological process. With this objective we have developed NetREx, a Network based Rice Expression Analysis Server, which hosts ranked co-expression networks of Oryza sativa built from publicly available mRNA-seq data across uniform experimental conditions. It provides a range of interactive data viewers and modules for analysing user-queried genes across different stress conditions (drought, flood, cold and osmosis), hormonal treatments (abscisic and jasmonic acid) and tissues (root and shoot). Subnetworks of user-defined genes can be queried in preconstructed tissue-specific networks, allowing users to view the fold-change, module memberships and gene annotations of these genes, and to analyse their neighborhood genes and associated pathways. The webserver also allows querying of orthologous genes from Arabidopsis, wheat, maize, barley, and sorghum. Here we demonstrate that NetREx can be used to identify novel candidate genes and tissue-specific interactions under stress conditions and can aid in the analysis and understanding of complex phenotypes linked to stress response in rice. Available at: https://bioinf.iiit.ac.in/netrex/index.html


2021 ◽  
Vol 118 (52) ◽  
pp. e2109019118
Author(s):  
Scott Hotaling ◽  
Joanna L. Kelley ◽  
Paul B. Frandsen

In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth’s eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline’s future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world’s most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.


Animals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 3264
Author(s):  
Duo Xie ◽  
Guangji Chen ◽  
Xiaoyu Meng ◽  
Haotian Wang ◽  
Xupeng Bi ◽  
...  

Alleles that cause advantageous phenotypes under positive selection contribute to adaptive evolution. Investigations of positive selection in protein-coding genes rely on the accuracy of orthology, models, the quality of assemblies, and alignment. Here, based on the latest genome assemblies and gene annotations, we present a comparative analysis of positive selection in four great ape species and identify 211 high-confidence positively selected genes (PSGs). Although differences in population size among these closely related great apes have resulted in differences in their ability to remove deleterious alleles and to adapt to changing environments, we found that they experienced comparable amounts of positive selection. We also found that more than half of multigene families exhibited signals of positive selection, suggesting that imbalanced positive selection resulted in the functional divergence of duplicates. Moreover, at the expression level, although positive selection led to a more non-uniform pattern across tissues, the correlation between positive selection and expression patterns is diverse. Overall, this updated list of PSGs is of great significance for the further study of phenotypic evolution in great apes.


2021 ◽  
Author(s):  
Chuanyi Zhang ◽  
Palash Sashittal ◽  
Mohammed El-Kebir

Genes in coronaviruses are preceded by transcription regulatory sequences (TRSs), which play a critical role in gene expression mediated by the viral RNA-dependent RNA-polymerase via the process of discontinuous transcription. In addition to being crucial for our understanding of the regulation and expression of coronavirus genes, we demonstrate for the first time how TRSs can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS AND GENE IDENTIFICATION (TRS-GENE-ID) problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID (CORe Sequence IDentifier), a computational tool to solve this problem. We also present CORSID-A, which solves a constrained version of the TRS-GENE-ID problem, the TRS IDENTIFICATION (TRS-ID) problem, identifying TRS sites in a coronavirus genome with specified gene annotations. We show that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses and that CORSID outperforms state-of-the-art gene finding methods in finding genes in coronavirus genomes. We demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronaviruses. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Hamaguchi ◽  
Chao Zeng ◽  
Michiaki Hamada

Background: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear. Results: Using “mappability”, a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability was significantly different among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis. Conclusions: We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.


Author(s):  
Almut Heinken ◽  
Stefanía Magnúsdóttir ◽  
Ronan M T Fleming ◽  
Ines Thiele

Motivation: Manual curation of genome-scale reconstructions is laborious, yet existing automated curation tools do not typically take species-specific experimental and curated genomic data into account. Results: We developed DEMETER, a COBRA Toolbox extension that enables the efficient, simultaneous refinement of thousands of draft genome-scale reconstructions, while ensuring adherence to the quality standards in the field, agreement with available experimental data, and refinement of pathways based on manually refined genome annotations. Availability: DEMETER and tutorials are freely available at https://github.com/opencobra.


2021 ◽  
Vol 10 (35) ◽  
Author(s):  
Wesley C. Warren ◽  
Natalia S. Akopyants ◽  
Deborah E. Dobson ◽  
Christiane Hertz-Fowler ◽  
Lon-Fye Lye ◽  
...  

We report high-quality draft assemblies and gene annotations for 13 species and/or strains of the protozoan parasite genera Leishmania, Endotrypanum, and Crithidia, which span the phylogenetic diversity of the subfamily Leishmaniinae within the kinetoplastid order of the phylum Euglenozoa. These resources will support studies on the origins of parasitism.


2021 ◽  
Author(s):  
Scott Hotaling ◽  
Joanna L Kelley ◽  
Paul B Frandsen

In less than 25 years, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth's eukaryotic diversity (1). As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline's future. In this perspective, we provide a contemporary, quantitative perspective on animal genome sequencing. We identified the best available genome assemblies on GenBank, the world's most extensive genetic database, for 3,278 unique animals across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity and, on average, gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for how we can collectively improve genomic resource availability and value while also broadening representation worldwide.

