Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers

Abstract: Wild-derived mouse inbred strains are becoming increasingly popular for complex traits analysis, evolutionary studies, and systems genetics. Here, we report the whole-genome sequencing of two wild-derived mouse inbred strains, LEWES/EiJ and ZALENDE/EiJ, of Mus musculus domesticus origin. These two inbred strains were selected based on their geographic origin, karyotype, and use in ongoing research. We generated 14× and 18× coverage sequence, respectively, and discovered over 1.1 million novel variants, most of which are private to one of these strains. This report expands the number of wild-derived inbred genomes in the Mus genus from six to eight. The sequence variation can be accessed via an online query tool; variant calls (VCF format) and alignments (BAM format) are available for download from a dedicated ftp site. Finally, the sequencing data have also been stored in a lossless, compressed, and indexed format using the multi-string Burrows-Wheeler transform. All data can be used without restriction.
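The abstract's claim that a Burrows-Wheeler transform gives a lossless representation can be illustrated with a minimal sketch. This is a naive single-string version built from sorted rotations; it is not the multi-string implementation the authors used, and the function names and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # the sentinel marks the end of the original string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(char + row for char, row in zip(transformed, table))
    # The row ending in the sentinel is the original string plus the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


read = "GATTACA"  # hypothetical read, for illustration only
encoded = bwt(read)
decoded = inverse_bwt(encoded)
```

Because `inverse_bwt(bwt(read))` returns the input unchanged, no information is lost. Production tools use suffix-array-based construction rather than sorting full rotations, which would be far too slow at genome scale.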

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽ 10.1038/s41598-021-86871-2 ◽ 2021 ◽ Vol 11 (1)

Author(s): Chao-Yu Guo ◽ Reng-Hong Wang ◽ Hsin-Chou Yang

Keyword(s): Complex Traits ◽ Association Studies ◽ Association Test ◽ Whole Genome Sequence ◽ Environment Interaction ◽ Genome Wide Association Studies ◽ Whole Genome ◽ Sequence Kernel Association Test ◽ Gene Environment ◽ Family Based

Abstract: After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to the methods without considering gene by environment interactions.

Ethnically diverse urban transmission networks of Neisseria gonorrhoeae without evidence of HIV serosorting

Sexually Transmitted Infections ◽ 10.1136/sextrans-2019-054025 ◽ 2019 ◽ Vol 96 (2) ◽ pp. 106-109

Author(s): Jayshree Dave ◽ John Paul ◽ Thomas Joshua Pasvol ◽ Andy Williams ◽ Fiona Warburton ◽ ...

Keyword(s): Neisseria Gonorrhoeae ◽ Ethnic Groups ◽ Antimicrobial Susceptibility ◽ Sequence Data ◽ Small Sample ◽ Whole Genome Sequence ◽ Whole Genome ◽ Sequencing Data ◽ Transmission Networks ◽ Hiv Serosorting

Objective: We aimed to characterise gonorrhoea transmission patterns in a diverse urban population by linking genomic, epidemiological and antimicrobial susceptibility data. Methods: Neisseria gonorrhoeae isolates from patients attending sexual health clinics at Barts Health NHS Trust, London, UK, during an 11-month period underwent whole-genome sequencing and antimicrobial susceptibility testing. We combined laboratory and patient data to investigate the transmission network structure. Results: One hundred and fifty-eight isolates from 158 patients were available with associated descriptive data. One hundred and twenty-nine (82%) patients identified as male and 25 (16%) as female; four (3%) records lacked gender information. Self-described ethnicities were: 51 (32%) English/Welsh/Scottish; 33 (21%) white, other; 23 (15%) black British/black African/black, other; 12 (8%) Caribbean; 9 (6%) South Asian; 6 (4%) mixed ethnicity; and 10 (6%) other; data were missing for 14 (9%). Self-reported sexual orientations were 82 (52%) men who have sex with men (MSM); 49 (31%) heterosexual; 2 (1%) bisexual; data were missing for 25 individuals. Twenty-two (14%) patients were HIV positive. Whole-genome sequence data were generated for 151 isolates, which linked 75 (50%) patients to at least one other case. Using sequencing data, we found no evidence of transmission networks related to specific ethnic groups (p=0.64) or of HIV serosorting (p=0.35). Of 82 MSM/bisexual patients with sequencing data, 45 (55%) belonged to clusters of ≥2 cases, compared with 16/44 (36%) heterosexuals with sequencing data (p=0.06). Conclusion: We demonstrate links between 50% of patients in transmission networks using a relatively small sample in a large cosmopolitan city. We found no evidence of HIV serosorting. Our results do not support assortative selectivity as an explanation for differences in gonorrhoea incidence between ethnic groups.
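The cluster-membership comparison in the results (45 of 82 MSM/bisexual patients versus 16 of 44 heterosexual patients) can be checked with a 2×2 contingency test. The abstract does not say which test produced p=0.06; the sketch below assumes a two-sided Fisher's exact test, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probabilities = [table_probability(x) for x in range(lowest, highest + 1)]
    # Sum all tables at least as unlikely as the observed one
    # (small tolerance for floating-point ties).
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Rows: MSM/bisexual, heterosexual. Columns: in a cluster, not in a cluster.
p_value = fisher_exact_two_sided(45, 82 - 45, 16, 44 - 16)
```

With these counts the test gives a p-value close to the conventional 0.05 threshold. A chi-squared test with or without continuity correction would give a somewhat different figure, so the choice of test matters here.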

MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates

PeerJ ◽ 10.7717/peerj.5895 ◽ 2018 ◽ Vol 6 ◽ pp. e5895 ◽ Cited By ~ 35

Author(s): Thomas Andreas Kohl ◽ Christian Utpatel ◽ Viola Schleusener ◽ Maria Rosaria De Filippo ◽ Patrick Beckert ◽ ...

Keyword(s): Antibiotic Resistance ◽ Mycobacterium Tuberculosis ◽ Genome Sequence ◽ Sequence Data ◽ Whole Genome Sequence ◽ Whole Genome Sequencing Data ◽ Phylogenomic Analysis ◽ Whole Genome ◽ Sequencing Data ◽ Desktop Computer

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.

Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site

Scientific Reports ◽ 10.1038/s41598-019-50302-0 ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 2

Author(s): Matthew J. Meier ◽ Marc A. Beal ◽ Andrew Schoenrock ◽ Carole L. Yauk ◽ Francesco Marchetti

Keyword(s): Integration Site ◽ Whole Genome Sequence ◽ Comparative Genomic ◽ Whole Genome ◽ Missense Mutations ◽ Sequencing Data ◽ Single Nucleotide Variants ◽ High Coverage ◽ Deletion Event ◽ Transgene Integration

Abstract: The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, each colony accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequence technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight on the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.

Transmission distortion and genetic incompatibilities between alleles in a multigenerational mouse advanced intercross line

10.1101/2021.06.09.447720 ◽ 2021

Author(s): Danny Arends ◽ Stefan Kärst ◽ Sebastian Heise ◽ Paula Korkuc ◽ Deike Hesse ◽ ...

Keyword(s): Protein Interactions ◽ Complex Traits ◽ Genetic Background ◽ Inbred Strains ◽ Parental Origin ◽ Protein Protein Interactions ◽ Sequencing Data ◽ Synonymous Snps ◽ Advanced Intercross Line ◽ Overrepresentation Analysis

Background/Objectives: While direct additive and dominance effects on complex traits have been mapped repeatedly, additional genetic factors contributing to the heterogeneity of complex traits have been scarcely investigated. To assess genetic background effects, we investigated transmission ratio distortions (TRDs) of alleles from parent to offspring using an advanced intercross line (AIL) of an initial cross between the mouse inbred strains C57BL/6NCrl (B6N) and BFMI860-12 (BFMI). Subjects/Methods: 341 males of generation 28 and their respective 61 parents and 66 grandparents were genotyped using Mega Mouse Universal Genotyping Arrays (MegaMUGA). TRDs were investigated using allele transmission asymmetry tests, and pathway overrepresentation analysis was performed. Sequencing data were used to test for overrepresentation of non-synonymous SNPs in TRD regions. Genetic incompatibilities were tested using the Bateson-Dobzhansky-Muller two-locus model. Results: 62 TRD regions were detected, many in close proximity to the telocentric centromere. TRD regions contained 44.5% more non-synonymous SNPs than randomly selected regions (182 vs. 125.9 ± 17.0, P < 1 × 10⁻⁴). Testing for genetic incompatibilities between TRD regions identified 29 genome-wide significant incompatibilities (P(BF) < 0.05). Pathway overrepresentation analysis of genes in TRD regions showed that DNA methylation, epigenetic regulation of RNA, and meiotic/meiosis regulation pathways were affected independent of the parental origin of the TRD. Paternal BFMI TRD regions showed overrepresentation in the small interfering RNA (siRNA) biogenesis and in the metabolism of lipids and lipoproteins. Maternal B6N TRD regions harbored genes involved in meiotic recombination, cell death, and apoptosis pathways. The analysis of genes in TRD regions suggests the potential distortion of protein-protein interactions accounting for obesity and diabetic retinopathy as a result of disadvantageous combinations of allelic variants in Aass, Pgx6 and Nme8. Conclusions: Since genes in TRD regions showed a significant increase in the number of non-synonymous SNPs, these loci likely co-evolved to ensure protein-protein interaction compatibility, survival and optimal adaptation to the genetic background environment. Genes in these regions provide new targets for investigating genetic adaptation, protein-protein interactions, and determinants of complex traits such as obesity.

Population-level genome-wide STR typing in Plasmodium species reveals higher resolution population structure and genetic diversity relative to SNP typing

10.1101/2021.05.19.444768 ◽ 2021

Author(s): Jiru Han ◽ Jacob E Munro ◽ Anthony Kocoski ◽ Alyssa E Barry ◽ Melanie Bahlo

Keyword(s): Genetic Diversity ◽ Large Scale ◽ Tandem Repeats ◽ Plasmodium Species ◽ Whole Genome Sequence ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Sequencing Data ◽ Genome Wide ◽ Field Samples

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been made available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).

Whole-genome sequence and methylome profiling of the almond (Prunus dulcis [Mill.] D.A.Webb) cultivar Nonpareil

10.1101/2021.10.27.466198 ◽ 2021

Author(s): Katherine M. D'Amico-Willman ◽ Wilberforce Z. Ouma ◽ Tea Meulia ◽ Gina M. Sideli ◽ Thomas M. Gradziel ◽ ...

Keyword(s): Genome Sequence ◽ Cytosine Methylation ◽ Gene Prediction ◽ Prunus Dulcis ◽ The United States ◽ Whole Genome Sequence ◽ Whole Genome ◽ Sequencing Data ◽ Oxford Nanopore ◽ Epigenetic Signatures

Almond (Prunus dulcis [Mill.] D.A. Webb) is an economically important, specialty nut crop grown almost exclusively in the United States. Breeding and improvement efforts worldwide have led to the development of key, productive cultivars, including Nonpareil, which is the most widely grown almond cultivar. Thus far, genomic resources for this species have been limited, and a whole-genome assembly for Nonpareil is not currently available despite its economic importance and use in almond breeding worldwide. We generated a 615.89X coverage genome sequence using Illumina, PacBio, and optical mapping technologies. Gene prediction revealed 27,487 genes using MinION Oxford nanopore and Illumina RNA sequencing, and genome annotation found that 68% of predicted models are associated with at least one biological function. Further, epigenetic signatures of almond, namely DNA cytosine methylation, have been implicated in a variety of phenotypes including self-compatibility, bud dormancy, and development of non-infectious bud failure. In addition to the genome sequence and annotation, this report also provides the complete methylome of several key almond tissues, including leaf, flower, endocarp, mesocarp, fruit skin, and seed coat. Comparisons between methylation profiles in these tissues revealed differences in genome-wide weighted percent methylation and chromosome-level methylation enrichment. The raw sequencing data are available on NCBI Sequence Read Archive, and the complete genome sequence and annotation files are available on NCBI Genbank. All data can be used without restriction.

PHARP: A pig haplotype reference panel for genotype imputation

10.1101/2021.06.03.446888 ◽ 2021

Author(s): Zhen Wang ◽ Zhenyang Zhang ◽ Zitao Chen ◽ Jiabao Sun ◽ Caiyun Cao ◽ ...

Keyword(s): Complex Traits ◽ Sequence Data ◽ Genotype Imputation ◽ Reference Panel ◽ Whole Genome Sequence ◽ Sequencing Data ◽ Large White ◽ Downstream Analysis ◽ Low Coverage ◽ Analytical Tools

Pigs not only function as a major meat source worldwide but also are commonly used as an animal model for studying human complex traits. A large haplotype reference panel has been used to facilitate efficient phasing and imputation of relatively sparse genome-wide microarray chips and low-coverage sequencing data. Using the imputed genotypes in the downstream analysis, such as GWASs, TWASs, eQTL mapping and genomic prediction (GS), is beneficial for obtaining novel findings. However, currently, there is still a lack of publicly available and high-quality pig reference panels with large sample sizes and high diversity, which greatly limits the application of genotype imputation in pigs. In response, we built the pig Haplotype Reference Panel (PHARP) database. PHARP provides a reference panel of 2,012 pig haplotypes at 34 million SNPs constructed using whole-genome sequence data from more than 49 studies of 71 pig breeds. It also provides Web-based analytical tools that allow researchers to carry out phasing and imputation consistently and efficiently. PHARP is freely accessible at http://alphaindex.zju.edu.cn/PHARP/index.php. We demonstrate its applicability for pig commercial 50K SNP arrays, by accurately imputing 2.6 billion genotypes at a concordance rate of 0.971 in 81 Large White pigs (~17x sequencing coverage). We also applied our reference panel to impute the low-density SNP chip into the high-density data for three GWASs and found novel significantly associated SNPs that might be causal variants.
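The concordance rate reported above is a fraction of imputed genotypes that agree with genotypes called from sequencing. The abstract does not give PHARP's exact definition, so the sketch below assumes the simplest one: genotypes coded as alternate-allele counts (0, 1, 2), with sites missing in either set skipped. The function name and example data are hypothetical.

```python
from typing import Optional, Sequence


def concordance_rate(
    imputed: Sequence[Optional[int]],
    truth: Sequence[Optional[int]],
) -> float:
    """Fraction of jointly called genotypes on which the two sets agree."""
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must have the same length")
    # Keep only sites where both the imputed and the reference call exist.
    compared = [
        (i, t) for i, t in zip(imputed, truth) if i is not None and t is not None
    ]
    if not compared:
        raise ValueError("no jointly called genotypes to compare")
    matches = sum(1 for i, t in compared if i == t)
    return matches / len(compared)


# Hypothetical genotypes at five sites for one animal (None = missing call).
imputed_calls = [0, 1, 2, 1, None]
sequenced_calls = [0, 1, 2, 2, 1]
rate = concordance_rate(imputed_calls, sequenced_calls)
```

In this toy example three of the four comparable sites agree. Other definitions (for instance, restricting to non-reference sites or using squared correlation of dosages) are also common and would give different values for the same data.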

The future of genomics for developmentalists

Development and Psychopathology ◽ 10.1017/s0954579413000606 ◽ 2013 ◽ Vol 25 (4pt2) ◽ pp. 1263-1278 ◽ Cited By ~ 23

Author(s): Robert Plomin ◽ Michael A. Simpson

Keyword(s): Complex Traits ◽ Rare Variants ◽ Behavioral Development ◽ Genomic Research ◽ Whole Genome Sequence ◽ Whole Genome ◽ Risk And Resilience ◽ Polygenic Scores ◽ Dna Variants ◽ The Future

Abstract: The momentum of genomic science will carry it far into the future and into the heart of research on typical and atypical behavioral development. The purpose of this paper is to focus on a few implications and applications of these advances for understanding behavioral development. Quantitative genetics is genomic and will chart the course for molecular genomic research now that these two worlds of genetics are merging in the search for many genes of small effect. Although current attempts to identify specific genes have had limited success, known as the missing heritability problem, whole-genome sequencing will improve this situation by identifying all DNA sequence variations, including rare variants. Because the heritability of complex traits is caused by many DNA variants of small effect in the population, polygenic scores that are composites of hundreds or thousands of DNA variants will be used by developmentalists to predict children's genetic risk and resilience. The most far-reaching advance will be the widespread availability of whole-genome sequence for children, which means that developmentalists would no longer need to obtain DNA or to genotype children in order to use genomic information in research or in the clinic.
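A polygenic score of the kind described above is commonly computed as a weighted sum of allele counts across variants. The sketch below assumes that additive form; the variant identifiers, effect sizes, and function name are hypothetical and do not come from the paper.

```python
def polygenic_score(dosages: dict, weights: dict) -> float:
    """Additive polygenic score: sum of effect size x allele dosage per variant.

    Variants that have a weight but no genotype for this individual are skipped.
    """
    return sum(
        weights[variant] * dosages[variant]
        for variant in weights
        if variant in dosages
    )


# Hypothetical per-variant effect sizes (e.g. taken from an association study).
effect_sizes = {"variant_a": 0.02, "variant_b": -0.01, "variant_c": 0.005}
# Hypothetical allele counts (0, 1 or 2 copies) for one individual.
child_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(child_dosages, effect_sizes)
```

Real scores aggregate hundreds or thousands of variants, as the abstract notes, and are usually standardised against a reference population before being interpreted.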

Predicting causal variants affecting expression using whole genome sequence and RNA-seq from multiple human tissues

10.1101/088872 ◽ 2016 ◽ Cited By ~ 2

Author(s): Andrew Anand Brown ◽ Ana Viñuela ◽ Olivier Delaneau ◽ Tim Spector ◽ Kerrin Small ◽ ...

Keyword(s): Genome Sequence ◽ Complex Traits ◽ Causal Variant ◽ Whole Genome Sequence ◽ Open Chromatin ◽ Whole Genome ◽ Rna Seq ◽ Derived Properties ◽ Causal Variants ◽ Genomic Regions

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well powered GWAS studies. To explore the advantages of WGS in a well powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% - 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.
