A high-quality assembly of the nine-spined stickleback (Pungitius pungitius) genome

Abstract The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.
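
The contig N50 quoted above (1.2 Mbp) is a standard contiguity metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. As a minimal sketch of that definition, assuming nothing about the authors' actual tooling, the Python below computes N50 from a list of contig lengths. The function name `n50` and the toy lengths are hypothetical and used only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (illustrative only, not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA or its index rather than typed in by hand.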

Genome sequencing of the nine-spined stickleback (Pungitius pungitius) provides insights into chromosome evolution

10.1101/741751 ◽

2019 ◽

Cited By ~ 1

Author(s):

Srinidhi Varadharajan ◽

Pasi Rastas ◽

Ari Löytynoja ◽

Michael Matschiner ◽

Federico C. F. Calboli ◽

...

Keyword(s):

Gasterosteus Aculeatus ◽

Tandem Repeats ◽

Copy Number Variations ◽

Repetitive Elements ◽

Genomic Research ◽

Sequencing Data ◽

Pungitius Pungitius ◽

High Quality ◽

Total Size ◽

Fish Family

Abstract The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data have been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.

Labile sex chromosomes and a novel candidate sex-determination gene in the Australian freshwater fish family Percichthyidae

10.22541/au.162696809.95212733/v1 ◽

2021 ◽

Author(s):

Alexandra Pavlova ◽

Katherine Harrisson ◽

Rustam Turakulov ◽

Yin Peng Lee ◽

Brett Ingram ◽

...

Keyword(s):

Sex Chromosome ◽

Amplicon Sequencing ◽

Genomic Research ◽

Sequencing Data ◽

Fish Family ◽

Reduced Representation ◽

Gonad Differentiation ◽

PCR-RFLP ◽

Whole Genome Resequencing ◽

Australian Freshwater

Sex-specific ecology has management implications, but rapid sex-chromosome turnover in fishes hinders development of markers to sex monomorphic species. Here, we use annotated genomes and reduced-representation sequencing data for two Australian percichthyids, the Macquarie perch Macquaria australasica and the golden perch M. ambigua, and whole genome resequencing data for 50 Macquarie perch of each sex, to detect sex-linked loci, identify a candidate sex-determining gene and develop an affordable sexing assay. In-silico pool-seq tests of 1,492,004 Macquarie perch SNP loci revealed that a 275-Kb scaffold, containing the transcription factor SOX1b gene, was enriched for gametologous loci. Within this scaffold, 22 loci were sex-linked in a predominantly XY system, with females being homozygous at all 22, and males being heterozygous at two or more. Seven XY-gametologous loci were within a 146-bp region. Being ~38 Kb upstream of SOX1b, it might act as an enhancer controlling SOX1b transcription in the bipotential gonad that drives gonad differentiation. A PCR-RFLP sexing assay, targeting one of the Y-linked SNPs, tested in 66 known-sex Macquarie perch and two individuals of each sex of three confamilial species, and amplicon sequencing of 400 bp encompassing the 146-bp region, revealed that the few sex-linked positions differ between species and between Macquarie perch populations. This indicates sex-chromosome lability in Percichthyidae, also supported by non-homologous scaffolds containing sex-linked loci for Macquarie- and golden perches. The resources developed here will facilitate genomic research in Percichthyidae. Sex-linked markers will be useful for determining genetic sex in some populations and studying sex chromosome turnover.

High-quality reference genome of Fasciola gigantica: Insights into the genomic signatures of transposon-mediated evolution and specific parasitic adaption in tropical regions

10.1101/2021.04.09.439143 ◽

2021 ◽

Author(s):

Xier Luo ◽

Kuiqing Cui ◽

Zhiqiang Wang ◽

Lijuan Yin ◽

Zhipeng Li ◽

...

Keyword(s):

Reference Genome ◽

Fasciola Hepatica ◽

Process Analysis ◽

Large Body ◽

Copy Number Variations ◽

Fasciola Gigantica ◽

Genomic Research ◽

Gene Copy ◽

Specific Gene ◽

High Quality

Abstract Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying selection. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.

High-quality reference genome of Fasciola gigantica: Insights into the genomic signatures of transposon-mediated evolution and specific parasitic adaption in tropical regions

PLoS Neglected Tropical Diseases ◽

10.1371/journal.pntd.0009750 ◽

2021 ◽

Vol 15 (10) ◽

pp. e0009750

Author(s):

Xier Luo ◽

Kuiqing Cui ◽

Zhiqiang Wang ◽

Zhipeng Li ◽

Zhengjiao Wu ◽

...

Keyword(s):

Reference Genome ◽

Fasciola Hepatica ◽

Process Analysis ◽

Large Body ◽

Copy Number Variations ◽

Fasciola Gigantica ◽

Genomic Research ◽

Gene Copy ◽

Specific Gene ◽

High Quality

Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying effect. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 predicted excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.

New gains with a slimmer genome – An automated approach to improve reference genome assemblies

10.1101/2020.08.18.255596 ◽

2020 ◽

Author(s):

Mikko Kivikoski ◽

Pasi Rastas ◽

Ari Löytynoja ◽

Juha Merilä

Keyword(s):

Gasterosteus Aculeatus ◽

Reference Genome ◽

De Novo ◽

Biological Research ◽

Integrative Approach ◽

Sequencing Data ◽

Pungitius Pungitius ◽

Model Study ◽

Long Reads ◽

Genome Assemblies

Abstract The utility of genome-wide sequencing data in biological research depends heavily on the quality of the reference genome. Although reference genomes have improved, it is evident that the assemblies could still be refined, especially in non-model study organisms. Here, we describe an integrative approach to improve the contiguity and haploidy of a reference genome assembly. With two novel features of the Lep-Anchor software and a combination of dense linkage maps, overlap detection and bridging long reads, we generated an improved assembly of the nine-spined stickleback (Pungitius pungitius) reference genome. We were able to remove a significant number of haplotypic contigs, detect more genetic variation and improve the contiguity of the genome, especially that of the X chromosome. However, improved scaffolding cannot correct for mosaicism of erroneously assembled contigs, as demonstrated by a de novo assembly of a 1.7 Mbp inversion. Qualitatively similar gains were obtained with the genome of the three-spined stickleback (Gasterosteus aculeatus).

Genome Size Estimation and Full-length Transcriptome of Sphingonotus tsinlingensis: Genetic Background for the Drought-Adapted Grasshopper

10.1101/2021.03.08.434347 ◽

2021 ◽

Author(s):

Lu Zhao ◽

Hang Wang ◽

Le Wu ◽

Kuo Sun ◽

De-Long Guan ◽

...

Keyword(s):

Genome Size ◽

Genetic Background ◽

Tandem Repeats ◽

Average Length ◽

Full Length ◽

Genomic Research ◽

Size Estimation ◽

Sequencing Data ◽

Grasshopper Species ◽

Genomic Study

Abstract Sphingonotus Fieber, 1852 (Orthoptera: Acrididae) is a species-rich grasshopper genus with ~146 species. All species of this genus prefer dry environments, such as desert, steppe, sand, and stony benchland. This genomic study aimed to understand the evolution and ecology of these grasshopper species. Here, the genome size of Sphingonotus tsinlingensis was estimated using flow cytometry and the first high-quality full-length transcriptome of this species is presented, which may serve as a reference genetic resource for the drought-adapted grasshopper species of Sphingonotus Fieber. The genome size of Sphingonotus tsinlingensis was ~12.8 Gb. Based on the 146.98 Gb of PacBio isoform sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these transcripts, 88,693 non-redundant isoforms were identified with an average length of 2,497 bp and an N50 value of 2,726 bp, which was much longer than in previous grasshopper transcriptome assemblies. A total of 48,502 protein coding sequences were determined, and 37,569 were annotated in public gene function databases. A total of 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were also identified. According to gene functions, 70 heat shock proteins and 61 P450 genes that may correspond to drought adaptation of S. tsinlingensis were identified. The genome of Sphingonotus tsinlingensis is an ultra-large and complex genome. Full-length transcriptome sequencing is an ideal strategy for genomic research. This is the first full-length transcriptome of the genus Sphingonotus. The assembly parameters were better than all known grasshopper transcriptomes. This full-length transcriptome may be used to understand its genetic background and the evolution and ecology of grasshoppers.

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Nature Communications ◽

10.1038/s41467-021-22203-2 ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 2

Author(s):

Caitlin M. Singleton ◽

Francesca Petriglieri ◽

Jannie M. Kristensen ◽

Rasmus H. Kirkegaard ◽

Thomas Y. Michaelsen ◽

...

Keyword(s):

16S rRNA ◽

Wastewater Treatment Plants ◽

In Situ Hybridisation ◽

Amplicon Sequencing ◽

rRNA Genes ◽

Fluorescence In Situ Hybridisation ◽

Sequencing Data ◽

High Quality ◽

16S rRNA Amplicon Sequencing ◽

Long Read

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.

Common Treatment, Common Variant: Evolutionary Prediction of Functional Pharmacogenomic Variants

Journal of Personalized Medicine ◽

10.3390/jpm11020131 ◽

2021 ◽

Vol 11 (2) ◽

pp. 131

Author(s):

Laura B. Scheinfeldt ◽

Andrew Brangan ◽

Dara M. Kusic ◽

Sudhir Kumar ◽

Neda Gharani

Keyword(s):

In Silico ◽

Drug Efficacy ◽

Common Variant ◽

Genomic Research ◽

Whole Genome Sequencing Data ◽

Mendelian Disease ◽

Allele Frequency Distribution ◽

Sequencing Data ◽

Patient Race ◽

The Impact

Pharmacogenomics holds the promise of personalized drug efficacy optimization and drug toxicity minimization. Much of the research conducted to date, however, suffers from an ascertainment bias towards European participants. Here, we leverage publicly available, whole genome sequencing data collected from global populations, evolutionary characteristics, and annotated protein features to construct a new in silico machine learning pharmacogenetic identification method called XGB-PGX. When applied to pharmacogenetic data, XGB-PGX outperformed all existing prediction methods and identified over 2000 new pharmacogenetic variants. While there are modest pharmacogenetic allele frequency distribution differences across global population samples, the most striking distinction is between the relatively rare putatively neutral pharmacogene variants and the relatively common established and newly predicted functional pharmacogenetic variants. Our findings therefore support a focus on individual patient pharmacogenetic testing rather than on clinical presumptions about patient race, ethnicity, or ancestral geographic residence. We further encourage more attention be given to the impact of common variation on drug response and propose a new ‘common treatment, common variant’ perspective for pharmacogenetic prediction that is distinct from the types of variation that underlie complex and Mendelian disease. XGB-PGX has identified many new pharmacovariants that are present across all global communities; however, communities that have been underrepresented in genomic research are likely to benefit the most from XGB-PGX’s in silico predictions.

Variable Phenotypes of Epilepsy, Intellectual Disability, and Schizophrenia Caused by 12p13.33–p13.32 Terminal Microdeletion in a Korean Family: A Case Report and Literature Review

Genes ◽

10.3390/genes12071001 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1001

Author(s):

Jiyoon Han ◽

Joonhong Park

Keyword(s):

Intellectual Disability ◽

Exome Sequencing ◽

Environmental Influence ◽

Copy Number Variations ◽

Genetic Modifiers ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Korean Family ◽

Coverage Analysis ◽

Patient Will

A simultaneous analysis of nucleotide changes and copy number variations (CNVs) based on exome sequencing data was demonstrated as a potential new first-tier diagnosis strategy for rare neuropsychiatric disorders. In this report, using depth-of-coverage analysis from exome sequencing data, we described variable phenotypes of epilepsy, intellectual disability (ID), and schizophrenia caused by 12p13.33–p13.32 terminal microdeletion in a Korean family. We hypothesized that CACNA1C and KDM5A genes of the six candidate genes located in this region were the best candidates for explaining epilepsy, ID, and schizophrenia and may be responsible for clinical features reported in cases with monosomy of the 12p13.33 subtelomeric region. On the background of microdeletion syndrome, which was described in clinical cases with mild, moderate, and severe neurodevelopmental manifestations as well as impairments, the clinician may determine whether the patient will end up with a more severe or milder end‐phenotype, which in turn determines disease prognosis. In our case, the 12p13.33–p13.32 terminal microdeletion may explain the variable expressivity in the same family. However, further comprehensive studies with larger cohorts focusing on careful phenotyping across the lifespan are required to clearly elucidate the possible contribution of genetic modifiers and the environmental influence on the expressivity of 12p13.33 microdeletion and associated characteristics.

High-Quality Genome Resource of the Pathogen of Diaporthe (Phomopsis) phragmitis Causing Kiwifruit Soft Rot

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-08-20-0236-a ◽

2020 ◽

Cited By ~ 1

Author(s):

Pu Liu ◽

Wang Xiaojie ◽

Dong Hongjie ◽

Jianbin Lan ◽

Kuan Liang ◽

...

Keyword(s):

Plant Pathogens ◽

Management Strategies ◽

Soft Rot ◽

Fruit Rot ◽

Economic Plant ◽

High Quality ◽

Total Size ◽

Genome Wide ◽

Solid Foundation ◽

High Quality Genome

Diaporthe spp. are critical plant pathogens that cause wood cankers, wilt, dieback, and fruit rot in a wide variety of economic plant hosts and are regarded as one of the most acute threats faced by the kiwifruit industry worldwide. Diaporthe phragmitis strain NJD1 is a highly pathogenic isolate causing soft rot of kiwifruit. Here, we present a high-quality genome-wide sequence of D. phragmitis NJD1 that was assembled into 28 contigs with a total size of 58.33 Mb and an N50 length of 3.55 Mb. These results lay a solid foundation for understanding host–pathogen interaction and improving disease management strategies.
