A high-quality assembly of the nine-spined stickleback (Pungitius pungitius) genome

Author(s):  
Srinidhi Varadharajan ◽  
Pasi Rastas ◽  
Ari Löytynoja ◽  
Michael Matschiner ◽  
Federico C F Calboli ◽  
...  

Abstract The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.

2019 ◽  
Author(s):  
Srinidhi Varadharajan ◽  
Pasi Rastas ◽  
Ari Löytynoja ◽  
Michael Matschiner ◽  
Federico C. F. Calboli ◽  
...  

Abstract The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data have been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.


Author(s):  
Alexandra Pavlova ◽  
Katherine Harrisson ◽  
Rustam Turakulov ◽  
Yin Peng Lee ◽  
Brett Ingram ◽  
...  

Sex-specific ecology has management implications, but rapid sex-chromosome turnover in fishes hinders development of markers to sex monomorphic species. Here, we use annotated genomes and reduced-representation sequencing data for two Australian percichthyids, the Macquarie perch Macquaria australasica and the golden perch M. ambigua, and whole genome resequencing data for 50 Macquarie perch of each sex, to detect sex-linked loci, identify a candidate sex-determining gene and develop an affordable sexing assay. In-silico pool-seq tests of 1,492,004 Macquarie perch SNP loci revealed that a 275-Kb scaffold, containing the transcription factor SOX1b gene, was enriched for gametologous loci. Within this scaffold, 22 loci were sex-linked in a predominantly XY system, with females being homozygous at all 22, and males being heterozygous at two or more. Seven XY-gametologous loci were within a 146-bp region. Being ~38 Kb upstream of SOX1b, it might act as an enhancer controlling SOX1b transcription in the bipotential gonad that drives gonad differentiation. A PCR-RFLP sexing assay, targeting one of the Y-linked SNPs, tested in 66 known-sex Macquarie perch and two individuals of each sex of three confamilial species, and amplicon sequencing of 400 bp encompassing the 146-bp region, revealed that the few sex-linked positions differ between species and between Macquarie perch populations. This indicates sex-chromosome lability in Percichthyidae, also supported by non-homologous scaffolds containing sex-linked loci for Macquarie- and golden perches. The resources developed here will facilitate genomic research in Percichthyidae. Sex-linked markers will be useful for determining genetic sex in some populations and studying sex chromosome turnover.


2021 ◽  
Author(s):  
Xier Luo ◽  
Kuiqing Cui ◽  
Zhiqiang Wang ◽  
Lijuan Yin ◽  
Zhipeng Li ◽  
...  

Abstract Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying selection. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.


2021 ◽  
Vol 15 (10) ◽  
pp. e0009750
Author(s):  
Xier Luo ◽  
Kuiqing Cui ◽  
Zhiqiang Wang ◽  
Zhipeng Li ◽  
Zhengjiao Wu ◽  
...  

Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying effect. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 predicted excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.


2020 ◽  
Author(s):  
Mikko Kivikoski ◽  
Pasi Rastas ◽  
Ari Löytynoja ◽  
Juha Merilä

Abstract The utility of genome-wide sequencing data in biological research depends heavily on the quality of the reference genome. Although the reference genomes have improved, it is evident that the assemblies could still be refined, especially in non-model study organisms. Here, we describe an integrative approach to improve contiguity and haploidy of a reference genome assembly. With two novel features of Lep-Anchor software and a combination of dense linkage maps, overlap detection and bridging long reads we generated an improved assembly of the nine-spined stickleback (Pungitius pungitius) reference genome. We were able to remove a significant number of haplotypic contigs, detect more genetic variation and improve the contiguity of the genome, especially that of the X chromosome. However, improved scaffolding cannot correct for mosaicism of erroneously assembled contigs, demonstrated by a de novo assembly of a 1.7 Mbp inversion. Qualitatively similar gains were obtained with the genome of three-spined stickleback (Gasterosteus aculeatus).


2021 ◽  
Author(s):  
Lu Zhao ◽  
Hang Wang ◽  
Le Wu ◽  
Kuo Sun ◽  
De-Long Guan ◽  
...  

Abstract Sphingonotus Fieber, 1852 (Orthoptera: Acrididae) is a species-rich grasshopper genus with ~146 species. All species of this genus prefer dry environments, such as: desert, steppe, sand, and stony benchland. This genomic study aimed to understand the evolution and ecology of these grasshopper species. Here, the genome size of Sphingonotus tsinlingensis was estimated using flow cytometry and the first high-quality full-length transcriptome of this species is presented, which may serve as a reference genetic resource for the drought-adapted grasshopper species of Sphingonotus Fieber. The genome size of Sphingonotus tsinlingensis was ~12.8 Gb. Based on the 146.98 Gb PacBio isoform sequencing data, 221.47 Mb full-length transcripts were assembled. Among these transcripts, 88,693 non-redundant isoforms were identified with an average length of 2,497 bp and an N50 value of 2,726 bp, which was much longer than the formal grasshopper transcriptome assemblies. A total of 48,502 protein coding sequences were determined, and 37,569 were annotated in public gene function databases. A total of 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were also identified. According to gene functions, 70 heat shock proteins and 61 P450 genes that may correspond to drought adaptation of S. tsinlingensis were identified. The genome of Sphingonotus tsinlingensis is an ultra-large and complex genome. Full-length transcriptome sequencing is an ideal strategy for genomic research. This is the first full-length transcriptome of the genus Sphingonotus. The assembly parameters were better than all known grasshopper transcriptomes. This full-length transcriptome may be used to understand its genetic background and the evolution and ecology of grasshoppers.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Caitlin M. Singleton ◽  
Francesca Petriglieri ◽  
Jannie M. Kristensen ◽  
Rasmus H. Kirkegaard ◽  
Thomas Y. Michaelsen ◽  
...  

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.


2021 ◽  
Vol 11 (2) ◽  
pp. 131
Author(s):  
Laura B. Scheinfeldt ◽  
Andrew Brangan ◽  
Dara M. Kusic ◽  
Sudhir Kumar ◽  
Neda Gharani

Pharmacogenomics holds the promise of personalized drug efficacy optimization and drug toxicity minimization. Much of the research conducted to date, however, suffers from an ascertainment bias towards European participants. Here, we leverage publicly available, whole genome sequencing data collected from global populations, evolutionary characteristics, and annotated protein features to construct a new in silico machine learning pharmacogenetic identification method called XGB-PGX. When applied to pharmacogenetic data, XGB-PGX outperformed all existing prediction methods and identified over 2000 new pharmacogenetic variants. While there are modest pharmacogenetic allele frequency distribution differences across global population samples, the most striking distinction is between the relatively rare putatively neutral pharmacogene variants and the relatively common established and newly predicted functional pharmacogenetic variants. Our findings therefore support a focus on individual patient pharmacogenetic testing rather than on clinical presumptions about patient race, ethnicity, or ancestral geographic residence. We further encourage more attention be given to the impact of common variation on drug response and propose a new ‘common treatment, common variant’ perspective for pharmacogenetic prediction that is distinct from the types of variation that underlie complex and Mendelian disease. XGB-PGX has identified many new pharmacovariants that are present across all global communities; however, communities that have been underrepresented in genomic research are likely to benefit the most from XGB-PGX’s in silico predictions.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 1001
Author(s):  
Jiyoon Han ◽  
Joonhong Park

A simultaneous analysis of nucleotide changes and copy number variations (CNVs) based on exome sequencing data was demonstrated as a potential new first-tier diagnosis strategy for rare neuropsychiatric disorders. In this report, using depth-of-coverage analysis from exome sequencing data, we described variable phenotypes of epilepsy, intellectual disability (ID), and schizophrenia caused by 12p13.33–p13.32 terminal microdeletion in a Korean family. We hypothesized that CACNA1C and KDM5A genes of the six candidate genes located in this region were the best candidates for explaining epilepsy, ID, and schizophrenia and may be responsible for clinical features reported in cases with monosomy of the 12p13.33 subtelomeric region. On the background of microdeletion syndrome, which was described in clinical cases with mild, moderate, and severe neurodevelopmental manifestations as well as impairments, the clinician may determine whether the patient will end up with a more severe or milder end‐phenotype, which in turn determines disease prognosis. In our case, the 12p13.33–p13.32 terminal microdeletion may explain the variable expressivity in the same family. However, further comprehensive studies with larger cohorts focusing on careful phenotyping across the lifespan are required to clearly elucidate the possible contribution of genetic modifiers and the environmental influence on the expressivity of 12p13.33 microdeletion and associated characteristics.


Author(s):  
Pu Liu ◽  
Wang Xiaojie ◽  
Dong Hongjie ◽  
Jianbin Lan ◽  
Kuan Liang ◽  
...  

Diaporthe spp. are critical plant pathogens that cause wood cankers, wilt, dieback, and fruit rot in a wide variety of economic plant hosts and are regarded as one of the most acute threats faced by kiwifruit industry worldwide. Diaporthe phragmitis strain NJD1 is a highly pathogenic isolate of soft rot of kiwifruit. Here, we present a high-quality genome-wide sequence of D. phragmitis NJD1 that was assembled into 28 contigs containing a total size of 58.33 Mb and N50 length of 3.55 Mb. These results lay a solid foundation for understanding host–pathogen interaction and improving disease management strategies.

