Genomic diversity revealed by whole-genome sequencing in three Danish commercial pig breeds

Abstract Whole-genome sequencing of 217 animals from three Danish commercial pig breeds (Duroc, Landrace [LL], and Yorkshire [YY]) was performed. Twenty-six million single-nucleotide polymorphisms (SNPs) and 8 million insertions or deletions (indels) were uncovered. Among the SNPs, 493,099 variants were located in coding sequences, and 29,430 were predicted to have a high functional impact such as gain or loss of stop codon. Using the whole-genome sequence dataset as the reference, the imputation accuracy for pigs genotyped with high-density SNP chips was examined. The overall average imputation accuracy for all biallelic variants (SNP and indel) was 0.69, while it was 0.83 for variants with minor allele frequency > 0.1. This study provides whole-genome reference data to impute SNP chip-genotyped animals for further studies to fine map quantitative trait loci as well as improving the prediction accuracy in genomic selection. Signatures of selection were identified both through analyses of fixation and differentiation to reveal selective sweeps that may have had prominent roles during breed development or subsequent divergent selection. However, the fixation indices did not indicate a strong divergence among these three breeds. In LL and YY, the integrated haplotype score identified genomic regions under recent selection. These regions contained genes for olfactory receptors and oxidoreductases. Olfactory receptor genes that might have played a major role in the domestication were previously reported to have been under selection in several species including cattle and swine.
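The abstract reports imputation accuracy overall and for variants with minor allele frequency > 0.1, but it does not say how accuracy was scored. The following is a minimal Python sketch, assuming accuracy is the per-variant Pearson correlation between sequenced and imputed allele dosages (a common convention that the source does not confirm). The function names and the toy genotypes are hypothetical.

```python
import math


def minor_allele_frequency(dosages):
    """MAF from diploid dosages coded 0, 1, 2 (copies of the alternate allele)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mean_accuracy(variants, maf_threshold=0.0):
    """Average per-variant accuracy over variants with MAF above the threshold.

    `variants` is a list of (true_dosages, imputed_dosages) pairs.
    Variants with undefined correlation are skipped (an assumption of this sketch).
    """
    scores = []
    for true_dosages, imputed_dosages in variants:
        if minor_allele_frequency(true_dosages) <= maf_threshold:
            continue
        r = pearson(true_dosages, imputed_dosages)
        if not math.isnan(r):
            scores.append(r)
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data (hypothetical): one common variant and one rare variant.
variants = [
    ([0, 1, 2, 1, 0, 2], [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]),  # MAF = 0.5
    ([0, 0, 0, 0, 0, 1], [0.0, 0.0, 0.1, 0.0, 0.0, 0.4]),  # MAF = 1/12
]

print("all variants:", mean_accuracy(variants))
print("MAF > 0.1:   ", mean_accuracy(variants, maf_threshold=0.1))
```

Restricting to common variants drops the rare site from the average, which is the kind of stratification behind the two accuracy figures quoted in the abstract.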


Optimizing Genomic Selection in Dezhou Donkey Using Low Coverage Whole Genome Sequencing

10.21203/rs.3.rs-607740/v1 ◽ 2021

Author(s): Changheng Zhao ◽ Jun Teng ◽ Xinhao Zhang ◽ Dan Wang ◽ Xinyi Zhang ◽ ...

Keyword(s): Whole Genome Sequencing ◽ Genomic Selection ◽ Genome Sequencing ◽ Sequence Data ◽ Low Cost ◽ Imputation Accuracy ◽ Genotype Imputation ◽ Whole Genome Sequence ◽ Whole Genome ◽ Low Coverage

Abstract Background Low coverage whole genome sequencing is a low-cost genotyping technology. Combining with genotype imputation approaches, it is likely to become a critical component of cost-efficient genomic selection programs in agricultural livestock. Here, we used the low-coverage sequence data of 617 Dezhou donkeys to investigate the performance of genotype imputation for low coverage whole genome sequence data and genomic selection based on the imputed genotype data. The specific aims were: (i) to measure the accuracy of genotype imputation under different sequencing depths, sample sizes, MAFs, and imputation pipelines; and (ii) to assess the accuracy of genomic selection under different marker densities derived from the imputed sequence data, different strategies for constructing the genomic relationship matrixes, and single- vs multi-trait models. Results We found that a high imputation accuracy (> 0.95) can be achieved for sequence data with sequencing depth as low as 1x and the number of sequenced individuals equal to 400. For genomic selection, the best performance was obtained by using a marker density of 410K and a G matrix constructed using marker dosage information. Multi-trait GBLUP performed better than single-trait GBLUP. Conclusions Our study demonstrates that low coverage whole genome sequencing would be a cost-effective method for genomic selection in Dezhou Donkey.


Salmonella enterica Phylogeny Based on Whole-Genome Sequencing Reveals Two New Clades and Novel Patterns of Horizontally Acquired Genetic Elements

mBio ◽ 10.1128/mbio.02303-18 ◽ 2018 ◽ Vol 9 (6) ◽ Cited By ~ 24

Author(s): Jay Worley ◽ Jianghong Meng ◽ Marc W. Allard ◽ Eric W. Brown ◽ Ruth E. Timme

Keyword(s): Whole Genome Sequencing ◽ Genome Sequencing ◽ Foodborne Pathogens ◽ Bacterial Species ◽ Whole Genome Sequence ◽ Nucleotide Sequencing ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Content Type ◽ Genetic Elements

ABSTRACT Using whole-genome sequence (WGS) data from the GenomeTrakr network, a globally distributed network of laboratories sequencing foodborne pathogens, we present a new phylogeny of Salmonella enterica comprising 445 isolates from 266 distinct serovars and originating from 52 countries. This phylogeny includes two previously unidentified S. enterica subsp. enterica clades. Serovar Typhi is shown to be nested within clade A. Our findings are supported by both phylogenetic support, based on a core genome alignment, and Bayesian approaches, based on single-nucleotide polymorphisms. Serovar assignments were refined by in silico analysis using SeqSero. More than 10% of serovars were either polyphyletic or paraphyletic. We found variable genetic content in these isolates relating to gene mobilization and virulence factors which have different distributions within clades. Gifsy-1- and Gifsy-2-like phages appear more prevalent in clade A; other viruses are more evenly distributed. Our analyses reveal IncFII is the predominant plasmid replicon in S. enterica. Few core or clade-defining virulence genes are observed, and their distributions appear probabilistic in nature. Together, these patterns demonstrate that genetic exchange within S. enterica is more extensive and frequent than previously realized, which significantly alters how we view the genetic structure of the bacterial species. IMPORTANCE Rapid improvements in nucleotide sequencing access and affordability have led to a drastic increase in availability of genetic information. This information will improve the accuracy of molecular descriptions, including serovars, within S. enterica. Although the concept of serovars continues to be useful, it may have more significant limitations than previously understood. Furthermore, the discrete absence or presence of specific genes can be an unstable indicator of phylogenetic identity. Whole-genome sequencing provides more rigorous tools for assessing the distributions of these genes.
Our phylogenetic and genetic content analyses reveal how active genetic elements are dynamically distributed within a species, allowing us to better understand genetic reservoirs and underlying bacterial evolution.


Assessing genomic diversity and signatures of selection in Jiaxian Red cattle using whole-genome sequencing data

BMC Genomics ◽ 10.1186/s12864-020-07340-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Xiaoting Xia ◽ Shunjin Zhang ◽ Huaju Zhang ◽ Zijing Zhang ◽ Ningbo Chen ◽ ...

Keyword(s): Population Structure ◽ Whole Genome Sequencing ◽ Genome Sequencing ◽ Genomic Variation ◽ Genomic Diversity ◽ System Response ◽ Whole Genome ◽ Population Structure Analysis ◽ Native Cattle ◽ Genomic Regions

Abstract Background Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results The population structure analysis revealed that Jiaxian Red cattle harboured the ancestry with East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied the relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for the candidate signatures of positive selection in Jiaxian Red cattle. A total number of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. 
These results provide a basis for further resource protection and breeding improvement of this breed.


Whole-genome sequencing reveals rare off-target mutations in CRISPR/Cas9-edited grapevine

Horticulture Research ◽ 10.1038/s41438-021-00549-4 ◽ 2021 ◽ Vol 8 (1)

Author(s): Xianhang Wang ◽ Mingxing Tu ◽ Ya Wang ◽ Wuchen Yin ◽ Yu Zhang ◽ ...

Keyword(s): Whole Genome Sequencing ◽ Genome Editing ◽ Genome Sequencing ◽ Plant Biotechnology ◽ High Specificity ◽ Fruit Trees ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Indel Mutation ◽ Target Sites

Abstract The CRISPR (clustered regularly interspaced short palindromic repeats)-associated protein 9 (Cas9) system is a powerful tool for targeted genome editing, with applications that include plant biotechnology and functional genomics research. However, the specificity of Cas9 targeting is poorly investigated in many plant species, including fruit trees. To assess the off-target mutation rate in grapevine (Vitis vinifera), we performed whole-genome sequencing (WGS) of seven Cas9-edited grapevine plants in which one of two genes was targeted by CRISPR/Cas9 and three wild-type (WT) plants. In total, we identified between 202,008 and 272,397 single nucleotide polymorphisms (SNPs) and between 26,391 and 55,414 insertions/deletions (indels) in the seven Cas9-edited grapevine plants compared with the three WT plants. Subsequently, 3272 potential off-target sites were selected for further analysis. Only one off-target indel mutation was identified from the WGS data and validated by Sanger sequencing. In addition, we found 243 newly generated off-target sites caused by genetic variants between the Thompson Seedless cultivar and the grape reference genome (PN40024) but no true off-target mutations. In conclusion, we observed high specificity of CRISPR/Cas9 for genome editing of grapevine.


Risk prediction and marker selection in nonsynonymous single nucleotide polymorphisms using whole genome sequencing data

Animal Cells and Systems ◽ 10.1080/19768354.2020.1860125 ◽ 2020 ◽ Vol 24 (6) ◽ pp. 321-328

Author(s): Young-Sup Lee ◽ KyeongHye Won ◽ Donghyun Shin ◽ Jae-Don Oh

Keyword(s): Single Nucleotide Polymorphisms ◽ Whole Genome Sequencing ◽ Risk Prediction ◽ Genome Sequencing ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Single Nucleotide ◽ Marker Selection


Fast genetic mapping using insertion-deletion polymorphisms in Caenorhabditis elegans

Scientific Reports ◽ 10.1038/s41598-021-90190-x ◽ 2021 ◽ Vol 11 (1)

Author(s): Ho-Yon Hwang ◽ Jiou Wang

Keyword(s): Caenorhabditis Elegans ◽ Genetic Mapping ◽ Whole Genome Sequencing ◽ Genome Sequencing ◽ Genetic Material ◽ Mapping Method ◽ Forward Genetics ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Large Populations

Abstract Genetic mapping is used in forward genetics to narrow the list of candidate mutations and genes corresponding to the mutant phenotype of interest. Even with modern advances in biology such as efficient identification of candidate mutations by whole-genome sequencing, mapping remains critical in pinpointing the responsible mutation. Here we describe a simple, fast, and affordable mapping toolkit that is particularly suitable for mapping in Caenorhabditis elegans. This mapping method uses insertion-deletion polymorphisms or indels that could be easily detected instead of single nucleotide polymorphisms in commonly used Hawaiian CB4856 mapping strain. The materials and methods were optimized so that mapping could be performed using tiny amount of genetic material without growing many large populations of mutants for DNA purification. We performed mapping of previously known and unknown mutations to show strengths and weaknesses of this method and to present examples of completed mapping. For situations where Hawaiian CB4856 is unsuitable, we provide an annotated list of indels as a basis for fast and easy mapping using other wild isolates. Finally, we provide rationale for using this mapping method over other alternatives as a part of a comprehensive strategy also involving whole-genome sequencing and other methods.


Mycobacterium chimaera genomics with regard to epidemiological and clinical investigations conducted for the open-chest post-surgical Mycobacterium chimaera infections outbreak

Open Forum Infectious Diseases ◽ 10.1093/ofid/ofab192 ◽ 2021

Author(s): Emmanuel Lecorche ◽ Côme Daniau ◽ Kevin La ◽ Faiza Mougari ◽ Hanaa Benmansour ◽ ...

Keyword(s): Single Nucleotide Polymorphisms ◽ Whole Genome Sequencing ◽ Genome Sequencing ◽ Clinical Isolates ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Healthcare Facilities ◽ Open Chest

Abstract Background Post-surgical infections due to Mycobacterium chimaera appeared as a novel nosocomial threat in 2015, with a worldwide outbreak due to contaminated heater-cooler units used in open chest surgery. We report the results of investigations conducted in France including whole genome sequencing comparison of patient and HCU isolates. Methods We sought M. chimaera infection cases from 2010 onwards through national epidemiological investigations in healthcare facilities performing cardiopulmonary bypass together with a survey on good practices and systematic heater-cooler unit microbial analyses. Clinical and HCU isolates were subjected to whole genome sequencing analyzed with regards to the reference outbreak strain Zuerich-1. Results Only two clinical cases were shown to be related to the outbreak, although 23% (41/175) heater-cooler units were declared positive for M. avium complex. Specific measures to prevent infection were applied in 89% (50/56) healthcare facilities although only 14% (8/56) of them followed the manufacturer maintenance recommendations. Whole genome sequencing comparison showed that the clinical isolates and 72% (26/36) of heater-cooler unit isolates belonged to the epidemic cluster. Within clinical isolates, 5 to 9 non-synonymous single nucleotide polymorphisms were observed, among which an in vivo mutation in a putative efflux pump gene observed in a clinical isolate obtained for one patient under antimicrobial treatment. Conclusions Cases of post-surgical M. chimaera infections were declared to be rare in France, although heater-cooler units were contaminated as in other countries. Genomic analyses confirmed the connection to the outbreak and identified specific single nucleotide polymorphisms, including one suggesting fitness evolution in vivo.


SureSelect targeted enrichment, a new cost effective method for the whole genome sequencing of Candidatus Liberibacter asiaticus

Scientific Reports ◽ 10.1038/s41598-019-55144-4 ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 2

Author(s): Weili Cai ◽ Schyler Nunziata ◽ John Rascoe ◽ Michael J. Stulberg

Keyword(s): Whole Genome Sequencing ◽ Molecular Characterization ◽ Genome Sequencing ◽ Whole Genome Sequence ◽ Whole Genome ◽ Genome Coverage ◽ Metagenomic Sample ◽ Liberibacter Asiaticus

Abstract Huanglongbing (HLB) is a worldwide deadly citrus disease caused by the phloem-limited bacteria ‘Candidatus Liberibacter asiaticus’ (CLas) vectored by Asian citrus psyllids. In order to effectively manage this disease, it is crucial to understand the relationship among the bacterial isolates from different geographical locations. Whole genome sequencing approaches will provide more precise molecular characterization of the diversity among populations. Due to the lack of in vitro culture, obtaining the whole genome sequence of CLas is still a challenge, especially for medium to low titer samples. Hundreds of millions of sequencing reads are needed to get good coverage of CLas from an HLB positive citrus sample. In order to overcome this limitation, we present here a new method, Agilent SureSelect XT HS target enrichment, which can specifically enrich CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, which provides precise molecular characterization of different strains.


Whole-Genome Sequencing for Bacterial Strain Typing Using the iSeq100 Platform

Infection Control and Hospital Epidemiology ◽ 10.1017/ice.2020.1098 ◽ 2020 ◽ Vol 41 (S1) ◽ pp. s434-s434

Author(s): Grant Vestal ◽ Steven Bruzek ◽ Amanda Lasher ◽ Amorce Lima ◽ Suzane Silbert

Keyword(s): Whole Genome Sequencing ◽ Genome Sequencing ◽ Outbreak Detection ◽ Epidemiological Surveillance ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Dna Libraries ◽ Patient Health ◽ Hospital Acquired ◽ Reference Genomes

Background: Hospital-acquired infections pose a significant threat to patient health. Laboratories are starting to consider whole-genome sequencing (WGS) as a molecular method for outbreak detection and epidemiological surveillance. The objective of this study was to assess the use of the iSeq100 platform (Illumina, San Diego, CA) for accurate sequencing and WGS-based outbreak detection using the bioMérieux EPISEQ CS, a novel cloud-based software for sequence assembly and data analysis. Methods: In total, 25 isolates, including 19 MRSA isolates and 6 ATCC strains were evaluated in this study: A. baumannii ATCC 19606, B. cepacia ATCC 25416, E. faecalis ATCC 29212, E. coli ATCC 25922, P. aeruginosa ATCC 27853 and S. aureus ATCC 25923. DNA extraction of all isolates was performed on the QIAcube (Qiagen, Hilden, Germany) using the DNEasy Ultra Clean Microbial kit extraction protocol. DNA libraries were prepared for WGS using the Nextera DNA Flex Library Prep Kit (Illumina) and sequenced at 2×150-bp on the iSeq100 according to the manufacturer’s instructions. The 19 MRSA isolates were previously characterized by the DiversiLab system (bioMérieux, France). Upon validation of the iSeq100 platform, a new outbreak analysis was performed using WGS analysis using EPISEQ CS. ATCC sequences were compared to assembled reference genomes from the NCBI GenBank to assess the accuracy of the iSeq100 platform. The FASTQ files were aligned via BowTie2 version 2.2.6 software, using default parameters, and FreeBayes version 1.1.0.46-0 was used to call homozygous single-nucleotide polymorphisms (SNPs) with a minimum coverage of 5 and an allele frequency of 0.87 using default parameters. ATCC sequences were analyzed using ResFinder version 3.2 and were compared in silico to the reference genome. 
Results: EPISEQ CS classified 8 MRSA isolates as unrelated and grouped 11 isolates into 2 separate clusters: cluster A (5 isolates) and cluster B (6 isolates) with similarity scores of ≥99.63% and ≥99.50%, respectively. This finding contrasted with the previous characterization by DiversiLab, which identified 3 clusters of 2, 8, and 11 isolates, respectively. The EPISEQ CS resistome data detected the mecA gene in 18 of 19 MRSA isolates. Comparative analysis of the ATCC sequences to the reference genomes showed 99.9986% concordance of SNPs and 100.00% concordance between the resistance genes present. Conclusions: The iSeq100 platform accurately sequenced the bacterial isolates and could be an affordable alternative in conjunction with EPISEQ CS for epidemiological surveillance analysis and infection prevention. Funding: None. Disclosures: None.
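The Methods describe calling homozygous SNPs with a minimum coverage of 5 and an allele frequency of 0.87. The filter logic can be sketched in Python as follows. This is an illustration only, not the FreeBayes implementation: the `Call` record, the inclusive thresholds, and the example sites are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract; treating them as inclusive is an assumption.
MIN_COVERAGE = 5
MIN_ALLELE_FREQUENCY = 0.87


@dataclass
class Call:
    """Hypothetical per-site summary of aligned reads."""
    position: int
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele


def passes(call):
    """Keep a site only if it is deep enough and nearly all reads support the SNP."""
    if call.depth < MIN_COVERAGE:
        return False
    return call.alt_reads / call.depth >= MIN_ALLELE_FREQUENCY


# Made-up example sites.
calls = [
    Call(position=101, depth=4, alt_reads=4),    # too shallow
    Call(position=250, depth=20, alt_reads=18),  # 0.90 alternate fraction
    Call(position=377, depth=30, alt_reads=25),  # about 0.83 alternate fraction
    Call(position=512, depth=8, alt_reads=8),    # 1.00 alternate fraction
]

kept = [call.position for call in calls if passes(call)]
print(kept)
```

Checking depth first avoids a division by zero at uncovered sites and mirrors the two-condition description in the abstract.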


Abstract 343: Bayesian Selection of Modifier Genes in Hypertrophic Cardiomyopathy Through Whole Genome Sequencing

Circulation Research ◽ 10.1161/res.117.suppl_1.343 ◽ 2015 ◽ Vol 117 (suppl_1)

Author(s): Matthew Wheeler ◽ Daryl Waggott ◽ Megan Grove ◽ Frederick Dewey ◽ Cuiping Pan ◽ ...

Keyword(s): Hypertrophic Cardiomyopathy ◽ Whole Genome Sequencing ◽ Genome Sequencing ◽ A Priori ◽ Copy Number Variants ◽ Whole Genome Sequence ◽ Monogenic Disease ◽ Whole Genome ◽ Genetic Modifiers ◽ Structural Variants

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9) while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM.
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.
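The score breakdown quoted in the abstract (9=2, 8=8, 7=17, 6=69) can be checked against the stated total of 96 genes with a short Python tally. The variable names are illustrative; only the counts come from the abstract.

```python
# Number of genes at each modifier score, as listed in the abstract.
score_counts = {9: 2, 8: 8, 7: 17, 6: 69}

# Total genes scoring 6 or better; the abstract states 96.
total_genes = sum(score_counts.values())

# Share of those genes that were a priori candidates (21 out of 96 in the abstract).
a_priori_share = 21 / total_genes

print(total_genes, f"{a_priori_share:.1%}")
```

The per-score counts add up to the reported 96, so roughly one in five of the top-scoring genes had been flagged in advance.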
