Chromosome-level genome assembly of a butterflyfish, Chelmon rostratus

2019 ◽  
Author(s):  
Xiaoyun Huang ◽  
Yue Song ◽  
Suyu Zhang ◽  
A Yunga ◽  
Mengqi Zhang ◽  
...  

Abstract Chelmon rostratus (Teleostei, Perciformes, Chaetodontidae) is the copperband butterflyfish. It is an ornamental fish, and genome information for this species might help in understanding the genome evolution of Chaetodontidae and the adaptation and evolution of coral reef fish. In this study, using stLFR co-barcode read data, we assembled a genome of 638.70 Mb in size with contig and scaffold N50 sizes of 294.41 kb and 2.61 Mb, respectively. Using Hi-C data, 94.40% of scaffold sequences were assigned to 24 chromosomes, and BUSCO analysis showed that 97.3% (2,579) of core genes were found in our assembly. Up to 21.47% of the genome was found to be repetitive sequences, and 21,375 protein-coding genes were annotated. Among these annotated protein-coding genes, 20,163 (94.33%) were assigned possible functions. As the first genome for the Chaetodontidae family, these data should help improve further understanding and exploration of the marine ecological environment, symbiosis with coral, and the genomic innovations and molecular mechanisms contributing to the species' unique morphology and physiological features.
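
The contig and scaffold N50 sizes reported in this abstract follow a standard definition: sort the sequence lengths from longest to shortest, and the N50 is the length at which the running total first reaches half of the total assembly length. The snippet below is only a minimal sketch of that definition. The function name `n50` and the toy lengths are hypothetical, and the abstract does not say which tool the authors used to compute their values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: the length L such that sequences
    of length >= L together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not data from the paper.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

With real data, the same function would be applied once to the contig lengths and once to the scaffold lengths, which is why the two N50 values in an assembly report can differ by an order of magnitude.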

GigaScience ◽  
2021 ◽  
Vol 10 (4) ◽  
Author(s):  
Tiantian Zhao ◽  
Wenxu Ma ◽  
Zhen Yang ◽  
Lisong Liang ◽  
Xin Chen ◽  
...  

Abstract Background Corylus heterophylla Fisch. is a species of the Betulaceae family native to China. As an economically and ecologically important nut tree, C. heterophylla can survive in extremely low temperatures (–30 to –40 °C). To deepen our knowledge of the Betulaceae species and facilitate the use of C. heterophylla for breeding and its genetic improvement, we have sequenced the whole genome of C. heterophylla. Findings Based on >64.99 Gb (∼175.30×) of Nanopore long reads, we assembled a 370.75-Mb C. heterophylla genome with contig N50 and scaffold N50 sizes of 2.07 and 31.33 Mb, respectively, accounting for 99.23% of the estimated genome size (373.61 Mb). Furthermore, 361.90 Mb of contigs were anchored to 11 chromosomes using Hi-C link data, representing 97.61% of the assembled genome sequences. Transcriptomes representing 4 different tissues were sequenced to assist protein-coding gene prediction. A total of 27,591 protein-coding genes were identified, of which 92.02% (25,389) were functionally annotated. The phylogenetic analysis showed that C. heterophylla is close to Ostrya japonica, and they diverged from their common ancestor ∼52.79 million years ago. Conclusions We generated a high-quality chromosome-level genome of C. heterophylla. This genome resource will promote research on the molecular mechanisms of how the hazelnut responds to environmental stresses and will serve as an important resource for genome-assisted improvement in cold and drought resistance of the Corylus genus.


2019 ◽  
Author(s):  
Ryan Bracewell ◽  
Anita Tran ◽  
Kamalakar Chatla ◽  
Doris Bachtrog

ABSTRACT The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


2021 ◽  
Vol 6 ◽  
pp. 258
Author(s):  
Konrad Lohse ◽  
Alexander Mackintosh ◽  
Roger Vila ◽  
...  

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein-coding genes.


2019 ◽  
Author(s):  
Change Laura Tan

Abstract Public access to thousands of completely sequenced and annotated genomes provides a great opportunity to address the relationships of different organisms, at the molecular level and on a genome-wide scale. By comparing the phylogenetic profiles of all protein-coding genes in 317 model species described in the OrthoInspector3.0 database, we found that approximately 29.8% of the total protein-coding genes were orphan genes (genes unique to a specific species) while < 0.01% were universal genes (genes with homologs in each of the 317 species analyzed). When weighted by potential birth event, the orphan genes comprised 82% of the total, while the universal genes accounted for less than 0.00008%. Strikingly, as the number of analyzed genomes increased, the sum total of universal and nearly-universal genes plateaued while that of orphan and nearly-orphan genes grew continuously. When the set of compared species was expanded to include 3863 bacteria, 711 eukaryotes, and 179 archaea, not one of the universal genes remained. The results speak to a previously unappreciated degree of genetic biodiversity, which we propose to quantify using the birth-event-weighted gene count method.


2020 ◽  
Author(s):  
Jinrong Huang ◽  
Lin Lin ◽  
Zhanying Dong ◽  
Ling Yang ◽  
Tianyu Zheng ◽  
...  

Abstract Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is an essential post-transcriptional modification. Although hundreds of thousands of RNA editing sites have been reported in mammals, brain-wide analysis of RNA editing in the mammalian brain remains rare. Here, a genome-wide RNA editing investigation is performed in 119 samples, representing 30 anatomically defined subregions in the pig brain. We identify a total of 682,037 A-to-I RNA editing sites, of which 97% have not been identified before. Within the pig brain, the cerebellum and olfactory bulb are the regions with the most edited transcripts. The editing levels of sites residing in protein-coding regions are similar across brain regions, whereas region-distinct editing is observed in repetitive sequences. Highly edited conserved recoding events in pig and human brain are found in neurotransmitter receptors, demonstrating the evolutionary importance of RNA editing in neurotransmission functions. The porcine brain-wide RNA landscape provides a rich resource to better understand the evolutionary importance of post-transcriptional RNA editing.


2020 ◽  
Vol 10 (3) ◽  
pp. 891-897 ◽  
Author(s):  
Ryan Bracewell ◽  
Anita Tran ◽  
Kamalakar Chatla ◽  
Doris Bachtrog

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jeanne Wilbrandt ◽  
Bernhard Misof ◽  
Kristen A. Panfilio ◽  
Oliver Niehuis

Abstract Background The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative. Results Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from either the automatically predicted or the manually annotated gene models. Conversely, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities. Conclusions In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.


Genome ◽  
2009 ◽  
Vol 52 (12) ◽  
pp. 975-984 ◽  
Author(s):  
Xiaoyu Kong ◽  
Xiaoli Dong ◽  
Yanchun Zhang ◽  
Wei Shi ◽  
Zhongming Wang ◽  
...  

The organization of fish mitochondrial genomes (mitogenomes) is quite conserved, usually with the heavy strand encoding 12 of 13 protein-coding genes and 14 of 22 tRNA genes, and the light strand encoding ND6 and the remaining 8 tRNA genes. Currently, there are only a few reports on gene reorganization of fish mitogenomes, with only two types of rearrangements (shuffling and translocation) observed. No gene inversion has been detected in approximately 420 complete fish mitogenomes available so far. Here we report a novel rearrangement in the mitogenome of Cynoglossus semilaevis (Cynoglossinae, Cynoglossidae, Pleuronectiformes). The genome is 16 371 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 2 main noncoding regions, the putative control region and the light-strand replication origin. A striking finding of this study is that the tRNAGln gene is translocated from the light to the heavy strand (Q inversion). This is accompanied by shuffling of the tRNAIle gene and long-range translocation of the putative control region downstream to a site between ND1 and the tRNAGln gene. The remaining gene order is identical to that of typical fish mitogenomes. Additionally, unique characters of this mitogenome, including a high A+T content and length variations of 8 protein-coding genes, were found through comparison of the mitogenome sequence with those from other flatfishes. All the features detected and their relationships with the rearrangements, as well as a possible rearrangement pathway, are discussed. These data provide interesting information for better understanding the molecular mechanisms of gene reorganization in fish mitogenomes.


Zootaxa ◽  
2020 ◽  
Vol 4890 (4) ◽  
pp. 451-472
Author(s):  
NERIVANIA NUNES GODEIRO ◽  
FENG ZHANG ◽  
NIKOLAS GIOIA CIPOLA

A new species of Seira from Koh Rong Sanloem Island, Cambodia, as well as its mitochondrial genome information, are herein described. Seira sanloemensis sp. nov. has a colour pattern similar to that of nine other species of Seira distributed worldwide, but the dorsal chaetotaxy is more similar to S. arunachala Mitra from India, S. camgiangensis Nguyễn from Vietnam, and S. gobalezai Christiansen & Bellinger from Hawaii. However, the new species differs from these species by dorsal chaetotaxy of head, Th II–III and Abd II, collophore chaetotaxy, and morphology of the empodial complex. This is the third Collembola species described from Cambodia. Its incomplete mitogenome, assembled from MGI reads, has a length of 13,953 bp and contains all protein-coding genes, with three tRNAs missing; the gene order is the same as the Pancrustacean ancestral gene order. Based on the alignment of the 13 coding genes, a maximum likelihood phylogenetic tree with medium bootstrap values suggested that the Asian Seira species can represent a different lineage from the Neotropical Seirinae, but further biogeographic and divergence estimation analyses plus the inclusion of more Asian taxa are necessary to test this hypothesis.


2018 ◽  
Vol 6 (3) ◽  
pp. e01443-17 ◽  
Author(s):  
Vivek Kumar Ranjan ◽  
Tilak Saha ◽  
Shriparna Mukherjee ◽  
Ranadhir Chakraborty

ABSTRACT The draft genome sequence of a novel strain, Pseudomonas sp. MR 02, a pyomelanin-producing bacterium isolated from the Mahananda River at Siliguri, West Bengal, India, is reported here. This strain has a genome size of 5.94 Mb, with an overall G+C content of 62.6%. The draft genome reports 5,799 genes (mean gene length, 923 bp), among which 5,503 are protein-coding genes, including the genes required for the catabolism of tyrosine or phenylalanine for the characteristic production of homogentisic acid (HGA). Excess HGA, on excretion, auto-oxidizes and polymerizes to form pyomelanin.

