The genome of New Zealand trevally (Carangidae: Pseudocaranx georgianus) uncovers a XY sex determination locus

Abstract Background The genetic control of sex determination in teleost species is poorly understood. This is partly because of the diversity of mechanisms that determine sex in this large group of vertebrates, including constitutive genes linked to sex chromosomes, polygenic constitutive mechanisms, environmental factors, hermaphroditism, and unisexuality. Here we use a de novo genome assembly of New Zealand silver trevally (Pseudocaranx georgianus) together with sex-specific whole genome sequencing data to detect sexually divergent genomic regions, identify candidate genes and develop molecular markers. Results The de novo assembly of an unsexed trevally (Trevally_v1) resulted in a final assembly of 579.4 Mb in length, with an N50 of 25.2 Mb. Of the assembled scaffolds, 24 were of chromosome scale, ranging from 11 to 31 Mb in length. A total of 28,416 genes were annotated after 12.8% of the assembly was masked with repetitive elements. Whole genome re-sequencing of 13 wild sexed trevally (seven males and six females) identified two sexually divergent regions located on two scaffolds, including a 6 kb region at the proximal end of chromosome 21. BLAST analyses revealed similarity between one region and the aromatase genes cyp19 (a1a/b) (E-value < 1.00E-25, identity > 78.8%). Males contained higher numbers of heterozygous variants in both regions, while females showed regions of very low read-depth, indicative of male-specificity of this genomic region. Molecular markers were developed and subsequently tested on 96 histologically-sexed fish (42 males and 54 females). Three markers amplified in absolute correspondence with sex (positive in males, negative in females). Conclusions The higher number of heterozygous variants in males combined with the absence of these regions in females supports an XY sex-determination model, indicating that the Trevally_v1 genome assembly was developed from a male specimen. This sex system contrasts with the ZW sex-determination model documented in closely related carangid species. Our results indicate a sex-determining function of a cyp19a1a-like gene, suggesting the molecular pathway of sex determination is somewhat conserved in this family. The genomic resources developed here will facilitate future comparative work, and enable improved insights into the varied sex determination pathways in teleosts. The sex marker developed in this study will be a valuable resource for aquaculture selective breeding programmes, and for determining sex ratios in wild populations.
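The read-depth signal described in the abstract (regions covered in males but nearly absent in females) can be illustrated with a minimal sketch. The abstract does not describe the authors' actual pipeline, so the function name `male_specific_windows`, the thresholds, and the toy depth values below are all hypothetical and serve only to show the idea of comparing mean per-window depth between sexes.

```python
from statistics import mean


def male_specific_windows(male_depth, female_depth,
                          max_female_ratio=0.1, min_male_depth=5.0):
    """Return indices of windows covered in males but nearly absent in females.

    male_depth / female_depth: one list per sample, one depth value per window.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    n_windows = len(male_depth[0])
    flagged = []
    for w in range(n_windows):
        m = mean(sample[w] for sample in male_depth)
        f = mean(sample[w] for sample in female_depth)
        # Flag windows with usable male coverage and near-zero female coverage.
        if m >= min_male_depth and f <= max_female_ratio * m:
            flagged.append(w)
    return flagged


# Invented toy data: 3 windows, 2 males, 2 females.
males = [[30.0, 14.0, 29.0], [28.0, 16.0, 31.0]]
females = [[31.0, 0.5, 30.0], [29.0, 0.0, 28.0]]
flagged = male_specific_windows(males, females)
print(flagged)
```

In this toy input only the middle window is flagged: males show roughly half the autosomal depth there, as might be expected for a single-copy Y-linked region, while females show almost none.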

The genome of New Zealand trevally (Carangidae: Pseudocaranx georgianus) uncovers a XY sex determination locus

10.1101/2021.04.25.441282 ◽

2021 ◽

Author(s):

Mike Ruigrok ◽

Andrew Catanach ◽

Deepa Bowatte ◽

Marcus Davey ◽

Roy Storey ◽

...

Keyword(s):

New Zealand ◽

Sex Determination ◽

Genome Assembly ◽

De Novo ◽

Read Depth ◽

Chromosome 21 ◽

Whole Genome ◽

De Novo Genome Assembly ◽

Teleost Species ◽

Sex Marker

Background: The genetic control of sex determination in teleost species is poorly understood. This is partly because of the diversity of sex determining mechanisms in this large group, including constitutive genes linked to sex chromosomes, polygenic constitutive mechanisms, environmental factors, hermaphroditism, and unisexuality. Here we use a de novo genome assembly of New Zealand silver trevally (Pseudocaranx georgianus) together with whole genome sequencing to detect sexually divergent regions, identify candidate genes and develop molecular markers. Results: The de novo assembly of an unsexed trevally (Trevally_v1) resulted in an assembly of 579.4 Mb in length, with an N50 of 25.2 Mb. Of the assembled scaffolds, 24 were of chromosome scale, ranging from 11 to 31 Mb. A total of 28416 genes were annotated after 12.8% of the assembly was masked with repetitive elements. Whole genome re-sequencing of 13 sexed trevally (7 males, 6 females) identified sexually divergent regions located on two scaffolds, including a 6 kb region at the proximal end of chromosome 21. BLAST analyses revealed similarity between one region and the aromatase genes cyp19 (a1a/b). Males contained higher numbers of heterozygous variants in both regions, while females showed regions of very low read-depth, indicative of deletions. Molecular markers were tested on 96 histologically-sexed fish (42 males, 54 females). Three markers amplified in absolute correspondence with sex. Conclusions: The higher number of heterozygous variants in males combined with deletions in females supports an XY sex-determination model, indicating the Trevally_v1 genome assembly was based on a male. This sex system contrasts with the ZW-type sex system documented in closely related species. Our results indicate a likely sex-determining function of the cyp19b-like gene, suggesting the molecular pathway of sex determination is somewhat conserved in this family. Our genomic resources will facilitate future comparative genomics work in teleost species, and enable improved insights into the varied sex determination pathways in this group of vertebrates. The sex marker will be a valuable resource for aquaculture breeding programmes, and for determining sex ratios and sex-specific impacts in wild fisheries stocks of this species.

Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-017-1927-y ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Kosai Al-Nakeeb ◽

Thomas Nordahl Petersen ◽

Thomas Sicheritz-Pontén

Keyword(s):

Mitochondrial Dna ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

An emergent clade of SARS-CoV-2 linked to returned travellers from Iran

10.1101/2020.03.15.992818 ◽

2020 ◽

Cited By ~ 20

Author(s):

John-Sebastian Eden ◽

Rebecca Rockett ◽

Ian Carter ◽

Hossinur Rahman ◽

Joep de Ligt ◽

...

Keyword(s):

New Zealand ◽

Infectious Diseases ◽

Genome Sequencing ◽

Phylogenetic Analyses ◽

Emerging Infectious Diseases ◽

Whole Genome Sequencing Data ◽

Viral Diversity ◽

Whole Genome ◽

Sequencing Data ◽

Public Data

Abstract The SARS-CoV-2 epidemic has rapidly spread outside China with major outbreaks occurring in Italy, South Korea and Iran. Phylogenetic analyses of whole genome sequencing data identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. This study highlights potential viral diversity driving the epidemic in Iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases.

De novo indels within introns contribute to ASD incidence

10.1101/137471 ◽

2017 ◽

Cited By ~ 2

Author(s):

Adriana Munoz ◽

Boris Yamrom ◽

Yoon-ha Lee ◽

Peter Andrews ◽

Steven Marks ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Target Genes ◽

De Novo ◽

Whole Genome Sequencing Data ◽

P Value ◽

Whole Genome ◽

Sequencing Data ◽

Control Sets ◽

The Difference

Abstract Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant, and knowing which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collections quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict to the half of the autism targets that are intolerant to damaging variants in the normal human population, a subset we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
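The comparison of de novo indel counts between affected and unaffected siblings can be sketched as an exact binomial test. The abstract does not state which statistical test the authors used, so treating each event as equally likely to fall in either sibling (p = 0.5) is an assumption made here for illustration, and the helper `binomial_tail` is a hypothetical name.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Counts quoted in the abstract: (affected, unaffected) de novo intronic indels.
all_targets = (63, 37)          # 546 target genes
intolerant_targets = (43, 20)   # 273 variant-intolerant target genes

difference = all_targets[0] - all_targets[1]

# One-sided tail under the assumed null of equal rates in both siblings.
p_all = binomial_tail(all_targets[0], sum(all_targets))
p_intolerant = binomial_tail(intolerant_targets[0], sum(intolerant_targets))

print(difference, p_all, p_intolerant)
```

Under this assumed null both tails are small, which is in line with the abstract's statement that the excess in affected siblings is significant. The exact values depend on the test chosen and need not match the reported p-values.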

An emergent clade of SARS-CoV-2 linked to returned travellers from Iran

Virus Evolution ◽

10.1093/ve/veaa027 ◽

2020 ◽

Vol 6 (1) ◽

Cited By ~ 43

Author(s):

John-Sebastian Eden ◽

Rebecca Rockett ◽

Ian Carter ◽

Hossinur Rahman ◽

Joep de Ligt ◽

...

Keyword(s):

New Zealand ◽

Infectious Diseases ◽

Genome Sequencing ◽

Phylogenetic Analyses ◽

Emerging Infectious Diseases ◽

Whole Genome Sequencing Data ◽

Viral Diversity ◽

Whole Genome ◽

Sequencing Data ◽

Public Data

Abstract The SARS-CoV-2 epidemic has rapidly spread outside China with major outbreaks occurring in Italy, South Korea, and Iran. Phylogenetic analyses of whole-genome sequencing data identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. This study highlights potential viral diversity driving the epidemic in Iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases.

De novo whole genome sequencing data of two mangrove-isolated microalgae from Terengganu coastal waters

Data in Brief ◽

10.1016/j.dib.2019.104680 ◽

2019 ◽

Vol 27 ◽

pp. 104680 ◽

Cited By ~ 2

Author(s):

Kit Yinn Teh ◽

C.L.Wan Afifudeen ◽

Ahmad Aziz ◽

Li Lian Wong ◽

Saw Hong Loh ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Coastal Waters ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

De novo ZIC2 frameshift variant associated with frontonasal dysplasia in a Limousin calf

BMC Genomics ◽

10.1186/s12864-020-07350-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Marina Braun ◽

Annika Lehmbecker ◽

Deborah Eikelberg ◽

Maren Hellige ◽

Andreas Beineke ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

De Novo Mutation ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Craniofacial Malformations ◽

Frontonasal Dysplasia ◽

Affected Calf

Abstract Background Bovine frontonasal dysplasias like arhinencephaly, synophthalmia, cyclopia and anophthalmia are sporadic congenital facial malformations. In this study, computed tomography, necropsy, histopathological examinations and whole genome sequencing on an Illumina NextSeq500 were performed to characterize a stillborn Limousin calf with frontonasal dysplasia. In order to identify private genetic and structural variants, we screened whole genome sequencing data of the affected calf and unaffected relatives, including the parents and a maternal and a paternal half-sibling. Results The stillborn calf exhibited severe craniofacial malformations. Nose and maxilla were absent, mandibles were upwardly curved and a median cleft palate was evident. Eyes, optic nerve and orbital cavities were not developed and the rudimentary orbita showed hypotelorism. A defect centrally in the front skull covered with a membrane extended into the intracranial cavity. Aprosencephaly affected telencephalic and diencephalic structures and cerebellum. In addition, a shortened tail was seen. Filtering whole genome sequencing data revealed a private frameshift variant within the candidate gene ZIC2 in the affected calf. This variant was heterozygous mutant in this case and homozygous wild type in parents, half-siblings and controls. Conclusions We found a novel ZIC2 frameshift mutation in an aprosencephalic Limousin calf. The origin of this variant is most likely due to a de novo mutation in the germline of one parent or during very early embryonic development. To the authors’ best knowledge, this is the first identified mutation in cattle associated with bovine frontonasal dysplasia.

NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data

GigaScience ◽

10.1093/gigascience/giab046 ◽

2021 ◽

Vol 10 (7) ◽

Author(s):

Michael D Linderman ◽

Crystal Paudyal ◽

Musab Shakeel ◽

William Kelley ◽

Ali Bashir ◽

...

Keyword(s):

Next Generation Sequencing ◽

De Novo ◽

Training Data ◽

Next Generation Sequencing Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Next Generation ◽

Structural Variants ◽

Sequencing Data ◽

Generation Sequencing

Abstract Background Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV or use existing SV call sets as training data can only partially account for variant and sample-specific biases. Results We introduce NPSV, a machine learning–based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints. Conclusions Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a “black box” that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.

First de novo whole genome sequencing and assembly of the bar-headed goose

PeerJ ◽

10.7717/peerj.8914 ◽

2020 ◽

Vol 8 ◽

pp. e8914 ◽

Cited By ~ 1

Author(s):

Wen Wang ◽

Fang Wang ◽

Rongkai Hao ◽

Aizhen Wang ◽

Kirill Sharshov ◽

...

Keyword(s):

High Altitude ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Gene Prediction ◽

Repetitive Sequences ◽

Gene Families ◽

Whole Genome ◽

Sequencing Data

Background The bar-headed goose (Anser indicus) mainly inhabits the plateau wetlands of Asia. As a specialized high-altitude species, bar-headed geese can migrate between South and Central Asia and annually fly twice over the Himalayan mountains along the central Asian flyway. The physiological, biochemical and behavioral adaptations of bar-headed geese to high-altitude living and flying have raised much interest. However, to date, there is still no genome assembly information publicly available for bar-headed geese. Methods In this study, we present the first de novo whole genome sequencing and assembly of the bar-headed goose, along with gene prediction and annotation. Results 10X Genomics sequencing produced a total of 124 Gb of sequencing data, covering the estimated genome size of the bar-headed goose approximately 103 times (average coverage). The genome assembly comprised 10,528 scaffolds, with a total length of 1.143 Gb and a scaffold N50 of 10.09 Mb. Annotation of the bar-headed goose genome assembly identified a total of 102 Mb (8.9%) of repetitive sequences, 16,428 protein-coding genes, and 282 tRNAs. In total, we determined that there were 63 expanded and 20 contracted gene families in the bar-headed goose compared with the other 15 vertebrates. We also performed a positive selection analysis between the bar-headed goose and the closely related low-altitude goose, swan goose (Anser cygnoides), to uncover its genetic adaptations to the Qinghai-Tibetan Plateau. Conclusion We report the most complete genome sequence of the bar-headed goose currently available. Our assembly will provide a valuable resource to enhance further studies of the gene functions of the bar-headed goose. The data will also be valuable for facilitating studies of the evolution, population genetics and high-altitude adaptations of bar-headed geese at the genomic level.
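The coverage and repeat-content figures quoted in this abstract are related by simple arithmetic, sketched below. Only the four input numbers come from the abstract; the variable names are illustrative, and the implied genome size is a back-calculation rather than a value the abstract reports.

```python
# Figures quoted in the abstract.
sequencing_gb = 124.0   # total 10X Genomics sequencing data
coverage = 103.0        # stated average coverage of the estimated genome
assembly_gb = 1.143     # total assembly length
repeats_mb = 102.0      # annotated repetitive sequence

# Average coverage = total bases sequenced / genome size,
# so the estimated genome size implied by the abstract is:
implied_genome_gb = sequencing_gb / coverage

# Repeat content as a percentage of the assembly length.
repeat_pct = repeats_mb / (assembly_gb * 1000) * 100

print(round(implied_genome_gb, 3), round(repeat_pct, 1))
```

The repeat percentage recovers the 8.9% stated in the abstract, and the implied genome size of about 1.2 Gb is slightly larger than the 1.143 Gb assembly.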

LDscaff: LD-based scaffolding of de novo genome assemblies

BMC Bioinformatics ◽

10.1186/s12859-020-03895-7 ◽

2020 ◽

Vol 21 (S21) ◽

Author(s):

Zicheng Zhao ◽

Yingxiao Zhou ◽

Shuai Wang ◽

Xiuqing Zhang ◽

Changfa Wang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Genetic Recombination ◽

Draft Genome ◽

Simulated Data ◽

Population Data ◽

Real Data ◽

Physical Distance ◽

Whole Genome Sequencing Data ◽

Sequencing Data

Abstract Background Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, utilizing various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized to resolve the task of scaffolding. Genetic recombination patterns in population data indicate non-random association among alleles at different loci, and can provide physical distance signals to guide scaffolding. Results In this paper, we propose LDscaff for draft genome assembly incorporating linkage disequilibrium information in population data. We evaluated the performance of our method with both simulated data and real data. We simulated scaffolds by splitting the pig reference genome and reassembled them. Gaps between scaffolds were introduced ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. Then we implemented our method to refine the Giant Panda genome and the donkey genome, which are purely assembled by NGS data. After LDscaff treatment, the resulting Panda assembly has scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The re-assembled donkey assembly has an improved N50 length of 32.1 MB from 23.8 MB. Conclusions Our method effectively improves the assemblies with existing re-sequencing data, and is a potential alternative to the existing assemblers that require the collection of new data.
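The scaffold N50 statistic used to report these improvements can be computed as in the minimal sketch below. This is the standard definition of N50 and not code from LDscaff; the function name `n50` and the toy scaffold lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one scaffold length")


# Invented toy example: joining scaffolds raises N50 without adding bases.
before = [2, 3, 4, 5, 6]        # five scaffolds, 20 units in total
after = [2 + 3, 4, 5 + 6]       # the same bases after two joins
print(n50(before), n50(after))
```

Because scaffolding only joins existing sequences, the total length is unchanged while the N50 rises, which is the kind of gain the abstract reports for the panda and donkey assemblies.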
