Analysis of Structural Variants Reveal Novel Selective Regions in the Genome of Meishan Pigs by Whole Genome Sequencing

Structural variants (SVs) are an essential form of genetic variation and are associated with various phenotypic traits in a wide range of important livestock species. However, the distribution of SVs in the pig genome has not been fully characterized, and the role of SVs in the economic traits of pigs has rarely been studied, especially for most domestic breeds. The Meishan pig is one of the best-known Chinese domestic pig breeds, with excellent reproductive performance. Here, to explore the genomic characteristics of the Meishan pig, we construct a porcine SV map from whole-genome sequencing data and report 33,698 SVs in 305 individuals from 55 globally distributed pig breeds. Selective signature analysis using these SVs identifies a number of candidate variants. In the Meishan pig in particular, 64 novel significant selection regions are detected. A 140-bp deletion in the indoleamine 2,3-dioxygenase 2 (IDO2) gene is shown to be associated with reproduction traits in the Meishan pig. In addition, we detect two duplications that exist only in the Meishan pig, located in the cytochrome P450 family 2 subfamily J member 2 (CYP2J2) gene and the phospholipase A2 group IVA (PLA2G4A) gene, respectively; both genes are related to reproduction traits. Our study provides new insights into the role of selection in SV evolution and into how SVs contribute to phenotypic variation in pigs.
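
The abstract does not state which statistic underlies the selective signature analysis. As one common illustration (an assumption, not necessarily the authors' method), differentiation at a single biallelic SV between breeds can be scored with Wright's F_ST = (H_T - H_S) / H_T, using the SV allele frequency in each population. The helper names `allele_freq` and `wright_fst` are hypothetical, and populations are weighted equally.

```python
def allele_freq(genotypes: list[int]) -> float:
    """SV (alternate) allele frequency from diploid genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def wright_fst(freqs: list[float]) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SV.

    freqs: SV allele frequency in each population (equal weights assumed;
    this is an illustrative estimator, not the study's published pipeline).
    """
    if not freqs:
        raise ValueError("need at least one population")
    # Mean expected heterozygosity within populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SV monomorphic across all populations: no differentiation
    return (h_t - h_s) / h_t


# Hypothetical genotypes at one SV in a focal breed and a reference group
focal = allele_freq([2, 2, 2, 1])      # 0.875
reference = allele_freq([0, 0, 1, 0])  # 0.125
print(wright_fst([focal, reference]))  # 0.5625
```

In genome scans of this kind, SVs or windows in the upper tail of such scores are typically flagged as candidate selection signals; a fixed difference between breeds gives F_ST = 1, and identical frequencies give 0.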

Identification of Pathogenic Structural Variants in Rare Disease Patients through Genome Sequencing

DOI: 10.1101/627661 (2019)

Cited By ~ 2

Author(s): James M. Holt, Camille L. Birch, Donna M. Brown, Manavalan Gajapathy, Nadiya Sosonkina, ...

Keyword(s): Whole Genome Sequencing, Rare Disease, Genome Sequencing, Genomic Analysis, Whole Genome, Structural Variants, Single Nucleotide Variants, Standard Clinical Practice, Wide Range, Genetic Features

Abstract

Purpose: Clinical whole genome sequencing is becoming more common for determining the molecular diagnosis of rare disease. However, standard clinical practice often focuses on small variants such as single nucleotide variants and small insertions/deletions. This leaves a wide range of larger “structural variants” that are not commonly analyzed in patients.

Methods: We developed a pipeline for processing structural variants for patients who received whole genome sequencing through the Undiagnosed Diseases Network (UDN). This pipeline called structural variants, stored them in an internal database, and filtered the variants based on internal frequencies and external annotations. The remaining variants were manually inspected, and notable findings were reported as research variants to clinical sites in the UDN.

Results: Of 477 analyzed UDN cases, 286 cases (≈ 60%) received at least one structural variant as a research finding. The variants in 16 cases (≈ 3.4%) are considered “Certain” or “Highly likely” molecular diagnoses, and another 4 cases are currently in review. Of those 20 cases, at least 13 were originally identified through our pipeline, with one finding leading to the identification of a new disease. As part of this paper, we have also released the collection of variant calls identified in our cohort, along with heterozygous and homozygous call counts. These data are available at https://github.com/HudsonAlpha/UDN_SV_export.

Conclusion: Structural variants are key genetic features that should be analyzed during routine clinical genomic analysis. For our UDN patients, structural variants helped solve ≈ 4% of the total number of cases (≈ 13% of all genome sequencing solves), a success rate we expect to improve with better tools and a greater understanding of the human genome.
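
The abstract does not give the filtering thresholds or the database schema. A minimal sketch, assuming a hypothetical `SVCall` record that stores a carrier count from the internal database and a list of overlapping genes from external annotation, might keep calls that are rare in the cohort (an illustrative 1% cutoff) and overlap at least one gene:

```python
from dataclasses import dataclass, field


@dataclass
class SVCall:
    """Hypothetical SV record; field names are illustrative, not the UDN schema."""
    chrom: str
    start: int
    end: int
    sv_type: str   # e.g. "DEL", "DUP", "INV"
    carriers: int  # samples in the internal database carrying this SV
    genes: list[str] = field(default_factory=list)  # overlapping genes (external annotation)


def filter_candidates(calls, cohort_size, max_freq=0.01):
    """Keep SVs that are rare in the internal cohort and overlap at least one gene."""
    kept = []
    for call in calls:
        internal_freq = call.carriers / cohort_size
        if internal_freq <= max_freq and call.genes:
            kept.append(call)
    return kept


calls = [
    SVCall("chr1", 1000, 5000, "DEL", 1, ["GENE_A"]),   # rare, genic: kept
    SVCall("chr2", 200, 900, "DUP", 50, ["GENE_B"]),    # 10% of cohort: dropped
    SVCall("chr3", 10, 20, "DEL", 2, []),               # intergenic: dropped
]
print([c.chrom for c in filter_candidates(calls, cohort_size=500)])  # ['chr1']
```

Calls that pass would then go to manual inspection. Real pipelines also need a rule for deciding when two samples carry "the same" SV (for example, breakpoint tolerance or reciprocal overlap), which this sketch ignores.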

eSCAN: Scan Regulatory Regions for Aggregate Association Testing using Whole Genome Sequencing Data

DOI: 10.1101/2020.11.30.405266 (2020)

Author(s): Yingxi Yang, Yuchen Yang, Le Huang, Jai G. Broome, Adolfo Correa, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, New Technologies, Real Data, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Association Testing, Wide Range, Sequencing Studies

Abstract

With advances in whole genome sequencing (WGS) technology, multiple statistical methods for aggregate association testing have been developed. Many common approaches aggregate variants within genomic windows of fixed or varying size without relying on existing knowledge to define appropriate test units; as a result, most identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to the genes they affect, can be leveraged to predefine variant sets for aggregate testing in WGS. Therefore, in this paper we propose the eSCAN (Scan the Enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the advantages of dynamic window selection in SCANG with increased incorporation of genomic annotation. eSCAN searches biologically meaningful windows, which increases power and aids biological interpretation, as demonstrated by simulation studies under a wide range of scenarios. We also apply eSCAN to association analysis of blood cell traits using TOPMed WGS data from the Women’s Health Initiative (WHI) and the Jackson Heart Study (JHS). In this real-data example, eSCAN captures more significant signals; these signals are shorter and drive the associations of the larger regions detected by other methods.
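
The abstract does not describe eSCAN's test statistics or window search, so the sketch below illustrates only the underlying idea of predefining variant sets from enhancer annotations, followed by a simple per-individual burden score. It is not eSCAN's or SCANG's test. The enhancer intervals, their gene links (assumed to come from something like a Hi-C-derived enhancer-gene map), and all names are hypothetical, and a single chromosome is assumed.

```python
from bisect import bisect_left, bisect_right


def variant_sets(positions, enhancers):
    """Map each enhancer (name, start, end; inclusive) to indices of variants inside it.

    positions must be sorted ascending (single chromosome assumed).
    """
    sets = {}
    for name, start, end in enhancers:
        lo = bisect_left(positions, start)   # first variant at or after start
        hi = bisect_right(positions, end)    # one past the last variant at or before end
        sets[name] = list(range(lo, hi))
    return sets


def burden_scores(genotypes, idx):
    """Per-individual burden: sum of minor-allele counts over the variants in one set.

    genotypes[i][j] is the minor-allele count (0/1/2) of individual i at variant j.
    """
    return [sum(row[j] for j in idx) for row in genotypes]


positions = [100, 250, 400, 900, 1200]
enhancers = [("enh_GENE_A", 200, 450), ("enh_GENE_B", 1000, 1300)]  # hypothetical links
genotypes = [[0, 1, 2, 0, 0], [1, 0, 0, 0, 1]]

sets = variant_sets(positions, enhancers)
print(sets)                                           # {'enh_GENE_A': [1, 2], 'enh_GENE_B': [4]}
print(burden_scores(genotypes, sets["enh_GENE_A"]))   # [3, 0]
```

In a real analysis, each burden vector would be tested against the phenotype (for example, in a regression with covariates); eSCAN additionally scans windows within and across such regions rather than testing each predefined set once.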

Cyrius: accurate CYP2D6 genotyping using whole genome sequencing data

DOI: 10.1101/2020.05.05.077966 (2020)

Author(s): Xiao Chen, Fei Shen, Nina Gonzaludo, Alka Malhotra, Cande Rogert, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Sequence Similarity, Ethnically Diverse, Haplotype Frequency, Superior Performance, Whole Genome Sequencing Data, Whole Genome, Structural Variants, Sequencing Data

Abstract

Responsible for the metabolism of 25% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging because of its sequence similarity with the pseudogene paralog CYP2D7 and the large number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84-86.8%). After implementing improvements identified through the comparison against the truth data, Cyrius’s accuracy has since increased to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool for incorporating pharmacogenomics into WGS-based precision medicine initiatives.
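
The concordance figures above are presumably the fraction of samples whose called genotype matches the truth genotype, but the abstract does not say how no-calls or partial matches are scored. The sketch below assumes an exact, order-independent match of star-allele diplotypes (so "*2/*1" equals "*1/*2") and counts samples with no call as discordant; the function names, sample IDs, and diplotypes are illustrative.

```python
def normalize(diplotype: str) -> tuple:
    """Order-independent form of a diplotype string such as '*2/*1'."""
    a, b = diplotype.split("/")
    return tuple(sorted((a.strip(), b.strip())))


def concordance(calls: dict, truth: dict) -> float:
    """Fraction of truth samples whose called diplotype matches the truth diplotype.

    Samples missing from `calls` (no-calls) count as discordant.
    """
    if not truth:
        raise ValueError("truth set is empty")
    matches = sum(
        1
        for sample, true_dip in truth.items()
        if sample in calls and normalize(calls[sample]) == normalize(true_dip)
    )
    return matches / len(truth)


truth = {"S1": "*1/*2", "S2": "*4/*5", "S3": "*1/*1", "S4": "*2/*41"}
calls = {"S1": "*2/*1", "S2": "*4/*5", "S3": "*1/*2"}  # S4 is a no-call
print(concordance(calls, truth))  # 0.5
```

Whether no-calls are excluded from the denominator or counted as errors can shift a concordance figure noticeably, so comparisons between tools should use the same convention.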

A portable and scalable workflow for detecting structural variants in whole-genome sequencing data

2018 IEEE 14th International Conference on e-Science (e-Science)

DOI: 10.1109/escience.2018.00064 (2018)

Cited By ~ 1

Author(s): Arnold Kuzniar, Jason Maassen, Stefan Verhoeven, Luca Santuari, Carl Shneider, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome Sequencing Data, Whole Genome, Structural Variants, Sequencing Data

Genome Sequence of the Wolbachia Endosymbiont of Culex quinquefasciatus JHB

Journal of Bacteriology, Vol 191 (5), pp. 1725-1725

DOI: 10.1128/jb.01731-08 (2008)

Cited By ~ 39

Author(s): Steven L. Salzberg, Daniela Puiu, Daniel D. Sommer, Vish Nene, Norman H. Lee

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Genome Sequence, Culex Quinquefasciatus, Cytoplasmic Incompatibility, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Wide Range, Wolbachia Endosymbiont

ABSTRACT Wolbachia species are endosymbionts of a wide range of invertebrates, including mosquitoes, fruit flies, and nematodes. The wPip strains can cause cytoplasmic incompatibility in some strains of the Culex mosquito. Here we describe the genome sequence of a Wolbachia strain that was discovered in the whole-genome sequencing data for the mosquito Culex quinquefasciatus strain JHB.

Predicting Antimicrobial Resistance and Associated Genomic Features from Whole-Genome Sequencing

Journal of Clinical Microbiology, Vol 57 (2)

DOI: 10.1128/jcm.01610-18 (2018)

Cited By ~ 2

Author(s): Jonathan M. Monk

Keyword(s): Antimicrobial Resistance, Whole Genome Sequencing, Genome Sequencing, Error Rate, Pathogenic Bacteria, The United States, Whole Genome Sequencing Data, Whole Genome, Major Error, Wide Range

ABSTRACT Thanks to the genomics revolution, thousands of strain-specific whole-genome sequences are now accessible for a wide range of pathogenic bacteria. This availability enables big data informatics approaches to be used to study the spread and acquisition of antimicrobial resistance (AMR). In this issue of the Journal of Clinical Microbiology, Nguyen et al. (M. Nguyen, S. W. Long, P. F. McDermott, R. J. Olsen, R. Olson, R. L. Stevens, G. H. Tyson, S. Zhao, and J. J. Davis, J Clin Microbiol 57:e01260-18, 2019, https://doi.org/10.1128/JCM.01260-18) report results from machine learning models that use whole-genome sequencing data to predict the minimum inhibitory concentrations (MICs) of antibiotics for 5,728 nontyphoidal Salmonella genomes collected over 15 years in the United States. Their main finding is that MICs can be predicted with an average accuracy of 95% within ±1 two-fold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7%, and an average major error rate of 0.1%. Importantly, these models predict MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains, making it possible to identify AMR determinants and to rapidly diagnose and prioritize antibiotic use directly from the organism's sequence. Employing such tools to diagnose and limit the spread of resistance-conferring mechanisms could help ameliorate the looming antibiotic resistance crisis.
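
To make the headline accuracy metric concrete: because MICs are measured on a two-fold dilution series, a prediction "within ±1 two-fold dilution step" differs from the laboratory MIC by at most 1 on a log2 scale. The sketch below computes that fraction; the function name and MIC values are illustrative, and it does not reproduce the very major and major error rates, which additionally require clinical breakpoints to classify isolates as susceptible or resistant.

```python
import math


def within_one_dilution(predicted, observed):
    """Fraction of isolates whose predicted MIC is within +/-1 two-fold dilution of the lab MIC.

    MICs (e.g. in mg/L) lie on a two-fold series (..., 0.5, 1, 2, 4, ...), so one
    dilution step corresponds to a difference of 1 in log2(MIC).
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty MIC lists")
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        if abs(math.log2(p) - math.log2(o)) <= 1
    )
    return hits / len(observed)


# Hypothetical predictions vs. laboratory MICs for four isolates
print(within_one_dilution([2, 8, 0.25, 32], [1, 8, 0.5, 4]))  # 0.75
```

The last isolate misses by three dilution steps (32 vs. 4), so three of the four predictions count as accurate under this criterion.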
