The Absence of Universally-Conserved Protein-coding Genes

2019 ◽  
Author(s):  
Change Laura Tan

Abstract: Public access to thousands of completely sequenced and annotated genomes provides a great opportunity to address the relationships of different organisms, at the molecular level and on a genome-wide scale. By comparing the phylogenetic profiles of all protein-coding genes in 317 model species described in the OrthoInspector3.0 database, we found that approximately 29.8% of the total protein-coding genes were orphan genes (genes unique to a specific species) while < 0.01% were universal genes (genes with homologs in each of the 317 species analyzed). When weighted by potential birth event, the orphan genes comprised 82% of the total, while the universal genes accounted for less than 0.00008%. Strikingly, as the number of analyzed genomes increased, the sum total of universal and nearly-universal genes plateaued while that of orphan and nearly-orphan genes grew continuously. When the set of compared species was expanded to include 3863 bacteria, 711 eukaryotes, and 179 archaea, not one of the universal genes remained. The results speak to a previously unappreciated degree of genetic biodiversity, which we propose to quantify using the birth-event-weighted gene count method.
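The orphan and universal categories defined in this abstract can be illustrated with a minimal sketch. It assumes each gene family is represented as the set of species in which a homolog was found; the function name `classify_genes`, the gene labels, and the species names are hypothetical, and the sketch does not reproduce the OrthoInspector pipeline or the birth-event weighting, which the abstract does not specify.

```python
from typing import Dict, Set


def classify_genes(profiles: Dict[str, Set[str]], n_species: int) -> Dict[str, str]:
    """Label each gene family by how many species carry a homolog.

    profiles maps a (hypothetical) gene family ID to the set of species
    in which a homolog was detected. n_species is the number of species
    compared (317 in the abstract; 3 in the toy data below).
    """
    labels = {}
    for gene, species in profiles.items():
        if len(species) == n_species:
            labels[gene] = "universal"      # homolog in every species analyzed
        elif len(species) == 1:
            labels[gene] = "orphan"         # unique to a single species
        else:
            labels[gene] = "intermediate"   # anything in between
    return labels


# Toy data, invented purely for illustration.
toy_profiles = {
    "geneA": {"species1"},
    "geneB": {"species1", "species2", "species3"},
    "geneC": {"species2", "species3"},
}
toy_labels = classify_genes(toy_profiles, n_species=3)
```

Adding a fourth species to the comparison without a `geneB` homolog would move `geneB` out of the universal class, which is the effect the abstract reports when the species set is expanded.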

2019 ◽  
Author(s):  
Leeban Yusuf ◽  
Matthew C. Heatley ◽  
Joseph P.G. Palmer ◽  
Henry J. Barton ◽  
Christopher R. Cooney ◽  
...  

Abstract: Recent progress has been made in identifying genomic regions implicated in trait evolution on a microevolutionary scale in many species, but whether these are relevant over macroevolutionary time remains unclear. Here, we directly address this fundamental question using bird beak shape, a key evolutionary innovation linked to patterns of resource use, divergence and speciation, as a model trait. We integrate class-wide geometric-morphometric analyses with evolutionary sequence analyses of 10,322 protein coding genes as well as 229,001 genomic regions spanning 72 species. We identify 1,434 protein coding genes and 39,806 noncoding regions for which molecular rates were significantly related to rates of bill shape evolution. We show that homologs of the identified protein coding genes as well as genes in close proximity to the identified noncoding regions are involved in craniofacial embryo development in mammals. They are associated with embryonic stem cell pathways, including BMP and Wnt signalling, both of which have repeatedly been implicated in the morphological development of avian beaks. This suggests that identifying genotype-phenotype associations on a genome-wide scale over macroevolutionary time is feasible. While the coding and noncoding gene sets are associated with similar pathways, the actual genes are highly distinct, with significantly reduced overlap between them and bill-related phenotype associations specific to noncoding loci. Evidence for signatures of recent diversifying selection on our identified noncoding loci in Darwin finch populations further suggests that regulatory rather than coding changes are major drivers of morphological diversification over macroevolutionary times.


2021 ◽  
Vol 6 ◽  
pp. 258
Author(s):  
Konrad Lohse ◽  
Alexander Mackintosh ◽  
Roger Vila ◽  
...  

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein coding genes.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jeanne Wilbrandt ◽  
Bernhard Misof ◽  
Kristen A. Panfilio ◽  
Oliver Niehuis

Abstract: Background: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative. Results: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from either the automatically predicted or the manually annotated gene models. Conversely, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities. Conclusions: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.


2018 ◽  
Vol 6 (3) ◽  
pp. e01443-17 ◽  
Author(s):  
Vivek Kumar Ranjan ◽  
Tilak Saha ◽  
Shriparna Mukherjee ◽  
Ranadhir Chakraborty

Abstract: The draft genome sequence of a novel strain, Pseudomonas sp. MR 02, a pyomelanin-producing bacterium isolated from the Mahananda River at Siliguri, West Bengal, India, is reported here. This strain has a genome size of 5.94 Mb, with an overall G+C content of 62.6%. The draft genome reports 5,799 genes (mean gene length, 923 bp), among which 5,503 are protein-coding genes, including the genes required for the catabolism of tyrosine or phenylalanine for the characteristic production of homogentisic acid (HGA). Excess HGA, on excretion, auto-oxidizes and polymerizes to form pyomelanin.


2021 ◽  
Vol 6 ◽  
pp. 266
Author(s):  
Roger Vila ◽  
Alex Hayward ◽  
Konrad Lohse ◽  
Charlotte Wright ◽  
...  

We present a genome assembly from an individual male Melitaea cinxia (the Glanville fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 499 megabases in span. The complete assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 13,666 protein coding genes.


2019 ◽  
Author(s):  
Haley Wight ◽  
Junhui Zhou ◽  
Muzi Li ◽  
Sridhar Hannenhalli ◽  
Stephen M. Mount ◽  
...  

Abstract: The red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste and its high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequence and transcriptome will be of exceptional value to improve research and breeding of this high value crop. Using a hybrid sequence assembly approach including data from both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds with a genome completeness of 95.3% and an N50 score of 638 KB. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNAseq reads, we annotated 35,566 protein coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.


2021 ◽  
Vol 6 ◽  
pp. 304
Author(s):  
Alex Hayward ◽  
Roger Vila ◽  
Dominik R. Laetsch ◽  
Konrad Lohse ◽  
Tobias Baril ◽  
...  

We present a genome assembly from an individual female Melitaea athalia (also known as Mellicta athalia; the heath fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 610 megabases in span. In total, 99.98% of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,824 protein coding genes.


2019 ◽  
Author(s):  
Xiaoyun Huang ◽  
Yue Song ◽  
Suyu Zhang ◽  
A Yunga ◽  
Mengqi Zhang ◽  
...  

Abstract: Chelmon rostratus (Teleostei, Perciformes, Chaetodontidae) is the copperband butterflyfish, an ornamental fish. Genome information for this species might help in understanding the genome evolution of Chaetodontidae and the adaptation and evolution of coral reef fish. In this study, using stLFR co-barcoded read data, we assembled a genome of 638.70 Mb in size with contig and scaffold N50 sizes of 294.41 kb and 2.61 Mb, respectively. In total, 94.40% of scaffold sequences were assigned to 24 chromosomes using Hi-C data, and BUSCO analysis showed that 97.3% (2,579) of core genes were found in our assembly. Up to 21.47% of the genome was found to be repetitive sequences, and 21,375 protein-coding genes were annotated. Among these annotated protein-coding genes, 20,163 (94.33%) proteins were assigned possible functions. As the first genome for the Chaetodontidae family, these data should help improve the understanding and exploration of symbiosis with coral in the marine ecological environment, and of the genomic innovations and molecular mechanisms contributing to the species' unique morphology and physiological features.
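For readers unfamiliar with the N50 statistic quoted in this and other assembly abstracts, the following is a minimal sketch of the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions and are not taken from any of the assemblies listed here.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("n50 requires at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings coverage to >= 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Invented example lengths (arbitrary units), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

In this toy case the total length is 54, and the three longest sequences (10, 9, 8) sum to 27, exactly half, so the N50 is 8.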


2021 ◽  
Vol 17 ◽  
pp. 117693432110389
Author(s):  
Olubukola Oluranti Babalola ◽  
Bartholomew Saanu Adeleke ◽  
Ayansina Segun Ayangbenro

In recent times, diverse agriculturally important endophytic bacteria colonizing the plant endosphere have been identified. Harnessing the potential of Bacillus species from sunflower could reveal their biotechnological and agricultural importance. Here, we present genomic insights into B. cereus T4S isolated from sunflower sourced from Lichtenburg, South Africa. Genome analysis revealed a sequence read count of 7 255 762, a genome size of 5 945 881 bp, and a G + C content of 34.8%. The genome contains protein-coding genes involved in various metabolic pathways. The detection of genes involved in the metabolism of organic substrates and chemotaxis could enhance plant-microbe interactions in the synthesis of biological products with biotechnological and agricultural importance.


2016 ◽  
Vol 4 (4) ◽  
Author(s):  
Welkin H. Pope ◽  
Anshika Bandyopadhyay ◽  
Meghan L. Carlton ◽  
Meghan T. Kane ◽  
Niyati J. Panchal ◽  
...  

Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes.

