Computational Prediction of De Novo Emerged Protein-Coding Genes

Draft genome assembly data of Anoxybacillus sp. strain MB8 isolated from Tattapani hot springs, India

10.1101/2021.06.09.447659 ◽ 2021

Author(s): VISHNU PRASOODANAN P K ◽ Shruti S. Menon ◽ Rituja Saxena ◽ Prashant Waiker ◽ Vineet K Sharma

Keyword(s): Hot Springs ◽ De Novo ◽ Draft Genome ◽ GC Content ◽ Central India ◽ Glycoside Hydrolases ◽ rRNA Gene ◽ Aerobic Bacterium ◽ Protein Coding ◽ Protein Coding Genes

The discovery of novel thermophiles has shown promising applications in biotechnology. Because of their thermal stability, these organisms can survive harsh industrial processes, which makes them important to characterize and study. Members of Anoxybacillus are alkaline-tolerant thermophiles and have been extensively isolated from manure, dairy-processing plants, and geothermal hot springs. This article reports the assembled data of an aerobic bacterium, Anoxybacillus sp. strain MB8, isolated from the Tattapani hot springs in Central India, whose 16S rRNA gene shares an identity of 97% (99% coverage) with that of Anoxybacillus kamchatkensis strain G10. The de novo assembly of the Anoxybacillus sp. strain MB8 genome comprises 2,898,780 bp (in 190 contigs) with a GC content of 41.8%, and its annotation includes 2,976 protein-coding genes, 1 rRNA operon, 73 tRNAs, 1 tmRNA, and 10 CRISPR arrays. The predicted protein-coding genes have been classified into 21 eggNOG categories. The KEGG Automated Annotation Server (KAAS) analysis indicated the presence of an assimilatory sulfate reduction pathway, a nitrate-reducing pathway, and genes for glycoside hydrolases (GHs) and glycosyltransferases (GTs). GHs and GTs have widespread applications in the baking and food industry for bread manufacturing and in the paper, detergent, and cosmetic industries. Hence, Anoxybacillus sp. strain MB8 has the potential to be screened and characterized for such commercially relevant enzymes.

Integrating healthcare and research genetic data empowers the discovery of 28 novel developmental disorders

10.1101/797787 ◽ 2019 ◽ Cited By ~ 14

Author(s): Joanna Kaplanis ◽ Kaitlin E. Samocha ◽ Laurens Wiel ◽ Zhancheng Zhang ◽ Kevin J. Arvai ◽ ...

Keyword(s): Developmental Disorders ◽ De Novo ◽ Genetic Data ◽ Statistical Test ◽ Integrated Healthcare ◽ Protein Coding ◽ Protein Coding Genes ◽ Clinical Diagnostic ◽ Simulation Based

Summary De novo mutations (DNMs) in protein-coding genes are a well-established cause of developmental disorders (DD). However, known DD-associated genes account for only a minority of the observed excess of such DNMs. To identify novel DD-associated genes, we integrated healthcare and research exome sequences on 31,058 DD parent-offspring trios, and developed a simulation-based statistical test to identify gene-specific enrichments of DNMs. We identified 285 significantly DD-associated genes, including 28 not previously robustly associated with DDs. Despite detecting more DD-associated genes than in any previous study, much of the excess of DNMs in protein-coding genes remains unaccounted for. Modelling suggests that over 1,000 novel DD-associated genes await discovery, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of dominant DDs.
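
The general idea of a simulation-based enrichment test can be sketched as follows. This is a minimal illustration and not the authors' test: it assumes a simple Poisson null model for the number of DNMs in one gene, and the function names, the per-chromosome mutation rate, and the observed count are all hypothetical. Only the cohort size of 31,058 trios comes from the abstract.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dnm_enrichment_pvalue(observed, per_chrom_rate, n_trios, n_sims=10_000, seed=0):
    """Empirical p-value for seeing at least `observed` DNMs in one gene.

    Assumption: under the null, the DNM count is Poisson with mean
    2 * n_trios * per_chrom_rate (two transmitted chromosomes per child).
    """
    rng = random.Random(seed)
    expected = 2 * n_trios * per_chrom_rate
    hits = sum(sample_poisson(expected, rng) >= observed for _ in range(n_sims))
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical gene: rate of 1e-5 per chromosome, 8 DNMs observed.
p_value = dnm_enrichment_pvalue(observed=8, per_chrom_rate=1e-5, n_trios=31_058)
print(f"empirical p-value: {p_value:.2e}")
```

With 10,000 simulations the smallest reportable p-value is about 1e-4, so a genome-wide analysis would need far more simulations or an analytic tail probability.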

Phylogenetic relationships and taxonomic position of genus Hyperacrius (Rodentia: Arvicolinae) from Kashmir based on evidences from analysis of mitochondrial genome and study of skull morphology

PeerJ ◽ 10.7717/peerj.10364 ◽ 2020 ◽ Vol 8 ◽ pp. e10364

Author(s): Natalia I. Abramson ◽ Fedor N. Golenishchev ◽ Semen Yu. Bodrov ◽ Olga V. Bondareva ◽ Evgeny A. Genelt-Yanovskiy ◽ ...

Keyword(s): Mitochondrial Genome ◽ De Novo ◽ Phylogenetic Analyses ◽ Complete Mitochondrial Genome ◽ Morphological Characters ◽ Molecular Data ◽ Phylogenetic Position ◽ Skull Morphology ◽ Protein Coding ◽ Protein Coding Genes

In this article, we present the nearly complete mitochondrial genome of the Subalpine Kashmir vole Hyperacrius fertilis (Arvicolinae, Cricetidae, Rodentia), assembled using data from Illumina next-generation sequencing (NGS) of DNA from a century-old museum specimen. The de novo assembly consisted of 16,341 bp and included all mitogenome protein-coding genes as well as the 12S and 16S rRNAs, tRNAs, and the D-loop. Using the alignment of protein-coding genes of 14 previously published Arvicolini tribe mitogenomes, seven Clethrionomyini mitogenomes, and the Ondatra and Dicrostonyx outgroups, we conducted phylogenetic reconstructions based on a dataset of 13 protein-coding genes (PCGs) under maximum likelihood and Bayesian inference. Phylogenetic analyses robustly supported the placement of this species within the tribe Arvicolini. Among the Arvicolini, Hyperacrius represents one of the early-diverged lineages. This result altered the conventional view of the phylogenetic relatedness between Hyperacrius and Alticola and prompted a revision of the morphological characters underlying the former assumption. The morphological analysis performed here confirmed the molecular data and provided additional evidence for transferring the genus Hyperacrius from the tribe Clethrionomyini to the tribe Arvicolini.

The complete chloroplast genome of Saxifraga sinomontana (Saxifragaceae) and comparative analysis with other Saxifragaceae species

Revista Brasileira de Botânica ◽ 10.1007/s40415-019-00561-y ◽ 2019 ◽ Vol 42 (4) ◽ pp. 601-611 ◽ Cited By ~ 1

Author(s): Yan Li ◽ Liukun Jia ◽ Zhihua Wang ◽ Rui Xing ◽ Xiaofeng Chi ◽ ...

Keyword(s): Comparative Analysis ◽ Chloroplast Genome ◽ Phylogenetic Relationships ◽ De Novo ◽ Single Copy ◽ Bootstrap Support ◽ Protein Coding ◽ Complete Chloroplast Genome ◽ Protein Coding Genes ◽ Chloroplast Genomes

Abstract Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was de novo sequenced, assembled, and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs, and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to those of other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as in the coding genes matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD, and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section of Saxifraga, Ciliatae.
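
The region lengths and gene counts reported in this abstract can be checked against each other with a few lines of arithmetic. The sketch below is only a consistency check; the variable and function names are illustrative, and all numbers are taken from the abstract.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 79_310   # large single-copy region
SSC = 16_874   # small single-copy region
IR = 25_528    # one inverted repeat; the genome carries two copies
REPORTED_TOTAL = 147_240


def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)

# Unique gene count: protein-coding genes + rRNAs + tRNAs.
unique_genes = 79 + 4 + 30

print(f"computed total: {total} bp, matches reported: {total == REPORTED_TOTAL}")
print(f"unique genes: {unique_genes}")
```

Both figures agree with the abstract: the four regions sum to 147,240 bp and the three gene classes sum to 113 unique genes.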

Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome

G3 Genes|Genome|Genetics ◽ 10.1534/g3.119.400922 ◽ 2020 ◽ Vol 10 (3) ◽ pp. 891-897 ◽ Cited By ~ 3

Author(s): Ryan Bracewell ◽ Anita Tran ◽ Kamalakar Chatla ◽ Doris Bachtrog

Keyword(s): X Chromosome ◽ Genome Assembly ◽ De Novo ◽ Pericentromeric Region ◽ Species Group ◽ Chromosome 15 ◽ Protein Coding ◽ Protein Coding Genes ◽ Long Read ◽ Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species that represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which are largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and its pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D, creating the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Draft Genome of the Macadamia Husk Spot Pathogen, Pseudocercospora macadamiae

Phytopathology ◽ 10.1094/phyto-12-19-0460-a ◽ 2020 ◽ Vol 110 (9) ◽ pp. 1503-1506

Author(s): Olufemi A. Akinsanmi ◽ Lilia C. Carvalhais

Keyword(s): Plant Disease Resistance ◽ Plant Disease ◽ De Novo ◽ Draft Genome ◽ GC Content ◽ Disease Development ◽ Closely Related Species ◽ Protein Coding ◽ Protein Coding Genes ◽ The Family

Pseudocercospora macadamiae causes husk spot in macadamia in Australia. A lack of genomic resources for this pathogen has limited understanding of the mechanisms of disease development and spread and of its role in fruit abscission. To address this gap, we sequenced the genome of P. macadamiae. The sequence was de novo assembled into a draft genome of 40 Mb, which is comparable in size to those of closely related species in the family Mycosphaerellaceae. The draft genome comprises 212 scaffolds, of which 99 are over 50 kb. The genome has a 49% GC content and is predicted to contain 15,430 protein-coding genes. This draft genome sequence is the first for P. macadamiae and represents a valuable resource for understanding genome evolution and plant disease resistance.

Quantifying gene selection in cancer through protein functional alteration bias

Nucleic Acids Research ◽ 10.1093/nar/gkz546 ◽ 2019 ◽ Vol 47 (13) ◽ pp. 6642-6655 ◽ Cited By ~ 7

Author(s): Nadav Brandes ◽ Nathan Linial ◽ Michal Linial

Keyword(s): Somatic Mutations ◽ Gene Selection ◽ De Novo ◽ Cancer Genes ◽ Driver Genes ◽ Protein Coding ◽ Protein Coding Genes ◽ Machine Learning Model ◽ Implicit And Explicit ◽ False Discoveries

Abstract Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods has been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10,000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing a statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
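
The idea of comparing a gene's observed mutations to that gene's own background can be illustrated with a simple z-score. This is a hedged sketch, not FABRIC's actual model or interface: the function name is hypothetical, the effect scores are made-up numbers, and the convention that higher scores mean more damaging mutations is an assumption made here for illustration.

```python
import math
import statistics


def alteration_bias_z(observed_scores, background_scores):
    """Z-score of the mean observed effect score against a per-gene background.

    Assumptions: each score rates one mutation (higher = more damaging), and
    `background_scores` covers every possible single-nucleotide mutation in
    the gene, treated here as equally likely.
    """
    n = len(observed_scores)
    mu = statistics.fmean(background_scores)
    sigma = statistics.pstdev(background_scores)
    if n == 0 or sigma == 0:
        raise ValueError("need observed mutations and a non-constant background")
    # Standard error of the mean of n draws from the background distribution.
    return (statistics.fmean(observed_scores) - mu) / (sigma / math.sqrt(n))


# Made-up scores for one hypothetical gene.
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
observed = [0.8, 0.9, 0.9, 0.7]
z = alteration_bias_z(observed, background)
print(f"z = {z:.2f}")
```

A large positive z indicates that the observed mutations are more damaging than the gene's background would predict. Because each gene is scored only against its own background, no assumption about mutation rates across genes is needed in this simplified version.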

De Novo Whole-Genome Sequencing of the Wood Rot Fungus Polyporus brumalis, Which Exhibits Potential Terpenoid Metabolism

Genome Announcements ◽ 10.1128/genomea.00586-17 ◽ 2017 ◽ Vol 5 (28)

Author(s): Su-Yeon Lee ◽ Ji-eun An ◽ Sun-Hwa Ryu ◽ Myungkil Kim

Keyword(s): Single Molecule ◽ De Novo ◽ Gene Annotation ◽ Draft Genome ◽ Fungal Growth ◽ Protein Coding ◽ Sequencing Platform ◽ Protein Coding Genes ◽ Polyporus Brumalis ◽ Terpenoid Metabolism

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis.
