Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Matthew J. Meier ◽  
Marc A. Beal ◽  
Andrew Schoenrock ◽  
Carole L. Yauk ◽  
Francesco Marchetti

Abstract The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, the two colonies accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants, respectively. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequencing technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight into the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.

2017 ◽  
Author(s):  
Fatemeh Dorri ◽  
Sean Jewell ◽  
Alexandre Bouchard-Côté ◽  
Sohrab P. Shah

Abstract Accurate detection and classification of somatic single nucleotide variants (SNVs) is important in defining the clonal composition of human cancers. Existing tools are prone to miss low-prevalence mutations, and methods for classifying mutations into clonal groups across the whole genome are underdeveloped. Increasing interest in deciphering clonal population dynamics over multiple samples in time or anatomic space from the same patient is resulting in whole genome sequence (WGS) data from phylogenetically related samples. With access to these data, we posited that injecting clonal structure information into the inference of mutations from multiple samples would improve mutation detection. We developed MuClone: a novel statistical framework for simultaneous detection and classification of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. The key advance lies in incorporating prior knowledge about the cellular prevalences of clones to improve the performance of detecting mutations, particularly low-prevalence mutations. We evaluated MuClone through synthetic and real data from spatially sampled ovarian cancers. Results support the hypothesis that clonal information improves sensitivity in detecting somatic mutations without compromising specificity. In addition, MuClone classifies mutations across whole genomes of multiple samples into biologically meaningful groups, providing additional phylogenetic insights and enhancing the study of WGS-derived clonal dynamics.


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S299-S299
Author(s):  
Ahmed M Moustafa ◽  
Paul J Planet

Abstract Background As the pandemic SARS-CoV-2 virus has spread globally, its genome has diversified, and distinct clones can now be recognized, tracked, and traced. Identifying clonal groups allows for assessment of geographic spread, transmission events, and identification of more virulent or transmissible emerging strains. Methods All complete, high-coverage SARS-CoV-2 genomes (n=17,504) were downloaded from GISAID on May 17, 2020. We developed a GNU-based Virus IDentification (GNUVID) tool that implements a whole genome multilocus sequence typing (wgMLST) scheme composed of all ten ORFs in the SARS-CoV-2 genome. The 10,422 genomes that passed our quality check were fed to the GNUVID tool, which assigned a sequence type (ST) profile to each genome. Global optimum eBURST was then used to cluster the STs into clonal complexes (CCs). Results Our ST/CC analysis uncovered strong associations of ST/CCs with certain geographical regions but also dynamic local changes in ST/CC prevalence. We also identified several unexpected putative global transmission events (e.g., from the US to the Middle East and reintroduction to China later in the pandemic). We have made our tool (GNUVID) available so that new whole genome sequences can be rapidly assigned to an ST/CC (https://github.com/ahmedmagds/GNUVID). Conclusion Our sequence typing system uncovered previously unappreciated transmission events and waves of expansion and replacement of SARS-CoV-2 STs and CCs in different geographical locations, suggesting complex dynamics in viral populations that previously seemed monomorphic. Because our tool can be rapidly updated with new sequencing data, it can track emerging clones and identify new hotspots. Disclosures All Authors: No reported disclosures
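
The typing step described in the Methods above can be illustrated with a minimal sketch. It is not GNUVID's implementation: the class `TypingScheme`, its methods, and the three example loci are hypothetical, and the sketch assumes only the general wgMLST idea that each distinct ORF sequence receives an allele number and each distinct combination of allele numbers receives an ST.

```python
from typing import Dict, List, Tuple


class TypingScheme:
    """Toy allele/ST database (hypothetical, not GNUVID's data model).

    Each previously unseen ORF sequence gets the next allele number for its
    locus; each previously unseen allele profile gets the next ST number.
    """

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def allele_id(self, locus: str, sequence: str) -> int:
        """Return the allele number for a sequence, registering it if new."""
        known = self.alleles[locus]
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1
        return known[seq]

    def assign_st(self, orfs: Dict[str, str]) -> int:
        """Return the ST for a genome given its ORF sequences keyed by locus."""
        profile = tuple(self.allele_id(locus, orfs[locus]) for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile]


# Made-up sequences for illustration only.
scheme = TypingScheme(["ORF1ab", "S", "N"])
genome_a = {"ORF1ab": "ATGGAG", "S": "ATGTTT", "N": "ATGTCT"}
genome_b = {"ORF1ab": "ATGGAG", "S": "ATGTTC", "N": "ATGTCT"}  # one S change
st_a = scheme.assign_st(genome_a)
st_b = scheme.assign_st(genome_b)
```

Here `genome_a` and `genome_b` differ at a single locus and therefore receive different STs; grouping such closely related STs into clonal complexes is the separate eBURST step and is not sketched here.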


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2003 ◽  
Author(s):  
Michael P. Heaton ◽  
Timothy P.L. Smith ◽  
Jacky K. Carnahan ◽  
Veronica Basnayake ◽  
Jiansheng Qiu ◽  
...  

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in global beef cattle. Thus, our first aim was to use 96 beef sires, sharing minimal pedigree relationships, to create a searchable and publicly viewable set of mapped genomes relevant for 19 popular breeds of U.S. cattle. Our second aim was to identify protein variants encoded by the bovine endothelial PAS domain-containing protein 1 gene (EPAS1), a gene associated with pulmonary hypertension in Angus cattle. The identity and quality of genomic sequences were verified by comparing WGS genotypes to those derived from other methods. The average read depth, genotype scoring rate, and genotype accuracy exceeded 14, 99%, and 99%, respectively. The 96 genomes were used to discover four amino acid variants encoded by EPAS1 (E270Q, P362L, A671G, and L701F) and confirm two variants previously associated with disease (A606T and G610S). The six EPAS1 missense mutations were verified with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assays, and their frequencies were estimated in a separate collection of 1154 U.S. cattle representing 46 breeds. A rooted phylogenetic tree of eight polypeptide sequences provided a framework for evaluating the likely order of mutations and potential impact of EPAS1 alleles on the adaptive response to chronic hypoxia in U.S. cattle. This public, whole genome resource facilitates in silico identification of protein variants in diverse types of U.S. beef cattle, and provides a means of translating WGS data into a practical biological and evolutionary context for generating and testing hypotheses.


2019 ◽  
Vol 96 (2) ◽  
pp. 106-109
Author(s):  
Jayshree Dave ◽  
John Paul ◽  
Thomas Joshua Pasvol ◽  
Andy Williams ◽  
Fiona Warburton ◽  
...  

Objective We aimed to characterise gonorrhoea transmission patterns in a diverse urban population by linking genomic, epidemiological and antimicrobial susceptibility data. Methods Neisseria gonorrhoeae isolates from patients attending sexual health clinics at Barts Health NHS Trust, London, UK, during an 11-month period underwent whole-genome sequencing and antimicrobial susceptibility testing. We combined laboratory and patient data to investigate the transmission network structure. Results One hundred and fifty-eight isolates from 158 patients were available with associated descriptive data. One hundred and twenty-nine (82%) patients identified as male and 25 (16%) as female; four (3%) records lacked gender information. Self-described ethnicities were: 51 (32%) English/Welsh/Scottish; 33 (21%) white, other; 23 (15%) black British/black African/black, other; 12 (8%) Caribbean; 9 (6%) South Asian; 6 (4%) mixed ethnicity; and 10 (6%) other; data were missing for 14 (9%). Self-reported sexual orientations were 82 (52%) men who have sex with men (MSM); 49 (31%) heterosexual; 2 (1%) bisexual; data were missing for 25 individuals. Twenty-two (14%) patients were HIV positive. Whole-genome sequence data were generated for 151 isolates, which linked 75 (50%) patients to at least one other case. Using sequencing data, we found no evidence of transmission networks related to specific ethnic groups (p=0.64) or of HIV serosorting (p=0.35). Of 82 MSM/bisexual patients with sequencing data, 45 (55%) belonged to clusters of ≥2 cases, compared with 16/44 (36%) heterosexuals with sequencing data (p=0.06). Conclusion We demonstrate links between 50% of patients in transmission networks using a relatively small sample in a large cosmopolitan city. We found no evidence of HIV serosorting. Our results do not support assortative selectivity as an explanation for differences in gonorrhoea incidence between ethnic groups.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5895 ◽  
Author(s):  
Thomas Andreas Kohl ◽  
Christian Utpatel ◽  
Viola Schleusener ◽  
Maria Rosaria De Filippo ◽  
Patrick Beckert ◽  
...  

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.
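
One common way to identify groups of related isolates from an alignment of SNP positions is single-linkage clustering on pairwise SNP distance. The sketch below illustrates that general technique only; the abstract does not specify MTBseq's grouping method, and the function names, the `max_distance` threshold, and the example sequences are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with anything other than A/C/G/T (gaps, N) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_isolates(alignment: Dict[str, str], max_distance: int) -> List[List[str]]:
    """Single-linkage grouping: isolates within max_distance SNPs are linked."""
    parent = {name: name for name in alignment}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in alignment:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Made-up SNP alignment for illustration only.
example = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TTTTACGA",
    "iso4": "TTTTACGA",
}
tight = cluster_isolates(example, max_distance=1)
loose = cluster_isolates(example, max_distance=3)
```

With a threshold of 1 SNP the example splits into two pairs, while a threshold of 3 merges all four isolates, which shows how strongly the chosen threshold shapes the inferred groups.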


2021 ◽  
Author(s):  
Jiru Han ◽  
Jacob E Munro ◽  
Anthony Kocoski ◽  
Alyssa E Barry ◽  
Melanie Bahlo

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


2020 ◽  
Author(s):  
Xueping LI ◽  
Jianhong Li ◽  
Yonghong Qi ◽  
Yonggang Liu ◽  
Minquan Li

Abstract Background Fusarium equiseti is a plant pathogen with a wide range of hosts and diverse effects, including probiotic activity. However, the underlying molecular mechanisms remain unclear, hindering its effective control and utilization. In this study, the Illumina HiSeq 4000 and PacBio platforms were used to sequence and assemble the whole genome of Fusarium equiseti D25-1. Results The assembly included 16 fragments with a GC content of 48.01%, gap number of zero, and size of 40,776,005 bp. There were 40,110 exons and 26,281 introns having a total size of 19,787,286 bp and 2,290,434 bp, respectively. The genome had an average copy number of 333, 71, 69, 31, and 108 for tRNAs, rRNAs, sRNAs, snRNAs, and miRNAs, respectively. The total repetitive sequence length was 1,713,918 bp, accounting for 4.2033% of the genome. In total, 13,134 functional genes were annotated, accounting for 94.97% of the total gene number. Toxin-related genes, including two related to zearalenone and 23 related to trichothecene, were identified. A comparative genomic analysis supported the high quality of the F. equiseti assembly, exhibiting good collinearity with the reference strains, 3,483 species-specific genes, and 1,805 core genes. A gene family analysis revealed more than 2,500 single-copy orthologs. F. equiseti was most closely related to Fusarium pseudograminearum based on a phylogenetic analysis at the whole-genome level. Conclusions Our comprehensive analysis of the whole genome of F. equiseti provides basic data for studies of gene expression, regulatory and functional mechanisms, evolutionary processes, as well as disease prevention and control.


2021 ◽  
Author(s):  
Katherine M. D'Amico-Willman ◽  
Wilberforce Z. Ouma ◽  
Tea Meulia ◽  
Gina M. Sideli ◽  
Thomas M. Gradziel ◽  
...  

Almond (Prunus dulcis [Mill.] D.A. Webb) is an economically important, specialty nut crop grown almost exclusively in the United States. Breeding and improvement efforts worldwide have led to the development of key, productive cultivars, including Nonpareil, which is the most widely grown almond cultivar. Thus far, genomic resources for this species have been limited, and a whole-genome assembly for Nonpareil is not currently available despite its economic importance and use in almond breeding worldwide. We generated a 615.89X coverage genome sequence using Illumina, PacBio, and optical mapping technologies. Gene prediction revealed 27,487 genes using MinION Oxford nanopore and Illumina RNA sequencing, and genome annotation found that 68% of predicted models are associated with at least one biological function. Further, epigenetic signatures of almond, namely DNA cytosine methylation, have been implicated in a variety of phenotypes including self-compatibility, bud dormancy, and development of non-infectious bud failure. In addition to the genome sequence and annotation, this report also provides the complete methylome of several key almond tissues, including leaf, flower, endocarp, mesocarp, fruit skin, and seed coat. Comparisons between methylation profiles in these tissues revealed differences in genome-wide weighted percent methylation and chromosome-level methylation enrichment. The raw sequencing data are available on NCBI Sequence Read Archive, and the complete genome sequence and annotation files are available on NCBI Genbank. All data can be used without restriction.


2021 ◽  
Author(s):  
Daniel DiCorpo ◽  
Sheila M Gaynor ◽  
Emily M Russell ◽  
Kenneth E Westerman ◽  
Laura M Raffield ◽  
...  

ABSTRACT The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome and exome arrays, resulting in over 100 associated variants. We extended this work with a high-coverage whole genome sequencing (WGS) analysis from fifteen cohorts in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. More than 23,000 non-diabetic individuals from five self-reported race/ethnicities (African, Asian, European, Hispanic and Samoan) were included for each trait. We analyzed 60M variants in race/ethnicity-specific and pooled single variant and rare variant aggregate tests. Twenty-two variants across sixteen gene regions were found significantly associated with FG or FI, eight of which were rare (Minor Allele Frequency, MAF<0.05). Functional annotations from resources including the Diabetes Epigenome Atlas were compiled for each signal (chromatin states, annotation principal components, and others) to elucidate variant-to-function hypotheses. Near the G6PC2 locus we identified a distinct FG signal at rare variant rs2232326 (MAF=0.01) after conditioning on known common variants. Functional annotations show rs2232326 to be disruptive and likely damaging while being weakly transcribed in islets. A pair of FG-associated variants was identified near the SLC30A8 locus. These variants, one of which was rare (MAF=0.001) and Asian race/ethnicity-specific, were shown to be in islet-specific active enhancer regions. Other associated regions include rare variants near ROBO1 and PTPRT, and common variants near MTNR1B, GCK, GCKR, FOXA2, APOB, TCF7L2, and ADCY5. We provide a catalog of nucleotide-resolution genomic variation spanning intergenic and intronic regions down to a minor allele count of 20, creating a foundation for future sequencing-based investigation of glycemic traits.
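
The abstract classifies variants by minor allele frequency (MAF) and minor allele count. As a minimal sketch of how these quantities are conventionally derived from genotype counts, assuming diploid autosomal genotypes at a biallelic site (the function names and the example counts are hypothetical and do not come from the study):

```python
def minor_allele_count(n_hom_ref: int, n_het: int, n_hom_alt: int) -> int:
    """Number of copies of the less frequent allele at a biallelic site."""
    alt_count = n_het + 2 * n_hom_alt
    ref_count = n_het + 2 * n_hom_ref
    return min(alt_count, ref_count)


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele count divided by the total number of alleles observed."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return minor_allele_count(n_hom_ref, n_het, n_hom_alt) / total_alleles


def is_rare(maf: float, threshold: float = 0.05) -> bool:
    """Apply the rare-variant cutoff quoted in the abstract (MAF < 0.05)."""
    return maf < threshold


# Illustrative counts: 1,000 individuals, 20 heterozygous carriers.
example_mac = minor_allele_count(980, 20, 0)
example_maf = minor_allele_frequency(980, 20, 0)
```

In this made-up example, 20 heterozygous carriers among 1,000 individuals give a minor allele count of 20 and a MAF of 0.01, which falls below the 0.05 cutoff.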


2021 ◽  
Author(s):  
Masako Ichikawa ◽  
Norio Kato ◽  
Erika Toda ◽  
Masakazu Kashihara ◽  
Yuji Ishida ◽  
...  

Abstract Somaclonal variation was studied by whole-genome sequencing in rice plants (Oryza sativa L., ‘Nipponbare’) regenerated from the zygotes, mature embryos, and immature embryos of a single mother plant. The mother plant and its seed-propagated progeny were also sequenced. A total of 338 variants relative to the mother plant sequence were detected in the progeny, and mean values ranged from 9.0 in the seed-propagated plants to 37.4 in regenerants from mature embryos. The ratio of single nucleotide variants among the variants was 74.3%, and the natural mutation rate calculated using the variants in the seed-propagated plants was 1.2 × 10⁻⁸. The percentage and the mutation rate were consistent with the values reported previously. Plants regenerated from mature embryos had significantly more variants than the other progeny types. Therefore, using zygotes and immature embryos can reduce somaclonal variation during the genetic manipulation of rice.

