Identification of High-Confidence Structural Variants in Domesticated Rainbow Trout Using Whole-Genome Sequencing

2021 ◽  
Vol 12 ◽  
Author(s):  
Sixin Liu ◽  
Guangtu Gao ◽  
Ryan M. Layer ◽  
Gary H. Thorgaard ◽  
Gregory D. Wiens ◽  
...  

Genomic structural variants (SVs) are a major source of genetic and phenotypic variation but have not been investigated systematically in rainbow trout (Oncorhynchus mykiss), an important aquaculture species of cold freshwater. The objectives of this study were 1) to identify and validate high-confidence SVs in rainbow trout using whole-genome re-sequencing; and 2) to examine the contribution of transposable elements (TEs) to SVs in rainbow trout. A total of 96 rainbow trout, including 11 homozygous lines and 85 outbred fish from three breeding populations, were whole-genome sequenced with an average genome coverage of 17.2×. Putative SVs were identified using the program Smoove which integrates LUMPY and other associated tools into one package. After rigorous filtering, 13,863 high-confidence SVs were identified. Pacific Biosciences long-reads of Arlee, one of the homozygous lines used for SV detection, validated 98% (3,948 of 4,030) of the high-confidence SVs identified in the Arlee homozygous line. Based on principal component analysis, the 85 outbred fish clustered into three groups consistent with their populations of origin, further indicating that the high-confidence SVs identified in this study are robust. The repetitive DNA content of the high-confidence SV sequences was 86.5%, which is much higher than the 57.1% repetitive DNA content of the reference genome, and is also higher than the repetitive DNA content of Atlantic salmon SVs reported previously. TEs thus contribute substantially to SVs in rainbow trout as TEs make up the majority of repetitive sequences. Hundreds of the high-confidence SVs were annotated as exon-loss or gene-fusion variants, and may have phenotypic effects. The high-confidence SVs reported in this study provide a foundation for further rainbow trout SV studies.
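The two headline figures in this abstract can be checked with simple arithmetic. The sketch below is only an illustration using the numbers quoted above; the function names are hypothetical and are not part of the study's pipeline.

```python
def validation_rate(validated: int, total: int) -> float:
    """Fraction of high-confidence SVs confirmed by long reads."""
    return validated / total


def repeat_enrichment(sv_repeat_pct: float, genome_repeat_pct: float) -> float:
    """Ratio of repetitive DNA content in SV sequences to that of the reference genome."""
    return sv_repeat_pct / genome_repeat_pct


# Figures quoted in the abstract: 3,948 of 4,030 Arlee SVs validated;
# 86.5% repeats in SV sequences versus 57.1% in the reference genome.
rate = validation_rate(3948, 4030)
enrichment = repeat_enrichment(86.5, 57.1)
print(f"validation rate: {rate:.1%}")
print(f"repeat enrichment: {enrichment:.2f}x")
```

The ratio shows that repetitive DNA is roughly one and a half times as frequent in the SV sequences as in the genome overall, which is the basis for the statement that TEs contribute substantially to SVs.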

2014 ◽  
Author(s):  
Maria Avila-Arcos ◽  
Marcela Sandoval-Velasco ◽  
Hannes Schroeder ◽  
Meredith L Carpenter ◽  
Anna-Sapfo Malaspinas ◽  
...  

1. The application of whole genome capture (WGC) methods to ancient DNA (aDNA) promises to increase the efficiency of ancient genome sequencing. 2. We compared the performance of two recently developed WGC methods in enriching human aDNA within Illumina libraries built using both double-stranded (DSL) and single-stranded (SSL) build protocols. Although both methods effectively enriched aDNA, one consistently produced marginally better results, giving us the opportunity to further explore the parameters influencing WGC experiments. 3. Our results suggest that bait length has an important influence on library enrichment. Moreover, we show that WGC biases against the shorter molecules that are enriched in SSL preparation protocols. Therefore application of WGC to such samples is not recommended without future optimization. Lastly, we document the effect of WGC on other features including clonality, GC composition and repetitive DNA content of captured libraries. 4. Our findings provide insights for researchers planning to perform WGC on aDNA, and suggest future tests and optimization to improve WGC efficiency.


Author(s):  
Shangzhe Zhang ◽  
Wenyu Liu ◽  
Xinfeng Liu ◽  
Xin Du ◽  
Ke Zhang ◽  
...  

Abstract: Structural variants (SVs) represent an important genetic resource for both natural and artificial selection. Here we present a chromosome-scale reference genome for domestic yak (Bos grunniens) that has longer contigs and scaffolds (N50 44.72 Mb and 114.39 Mb, respectively) than reported for any other ruminant genome. We further obtained long-read resequencing data for 6 wild and 23 domestic yaks and constructed a genetic SV map of 37,220 SVs that covers the geographic range of the yaks. The majority of the SVs contain repetitive sequences and several are in or near genes. By comparing SVs in domestic and wild yaks, we identified genes that are predominantly related to the nervous system, behavior, immunity and reproduction and may have been targeted by artificial selection during yak domestication. These findings provide new insights into the domestication of animals living at high altitude and highlight the importance of SVs in animal domestication.


2017 ◽  
Vol 95 (suppl_4) ◽  
pp. 103-104
Author(s):  
R. M. O. Silva ◽  
R. L. Vallejo ◽  
J. P. Evenhuis ◽  
T. D. Leeds ◽  
G. Gao ◽  
...  

Animals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1810
Author(s):  
Paul Uiuiu ◽  
Călin Lațiu ◽  
Tudor Păpuc ◽  
Cristina Craioveanu ◽  
Andrada Ihuț ◽  
...  

Blood biochemistry parameters are valuable tools for monitoring fish health. Their baseline values are still undefined for a multitude of farmed fish species. In this study, changes in the blood profile of rainbow trout females (Oncorhynchus mykiss) from three farms were investigated using different biomarkers during the summer season. In the given context, the main water physicochemical parameters were investigated and twelve biochemical parameters were measured from blood samples of rainbow trout reared in the Fiad, Șoimul de Jos, and Strâmba farms. We selected these farms because the genetic background of the rainbow trout is the same, with all studied specimens coming from the Fiad farm, which has an incubation station. Forty-five samples were collected monthly (May to August) throughout summer to observe the changes in the blood profile of rainbow trout. Principal component analysis showed a clear separation both among the studied farms and months. Furthermore, significant correlations (p < 0.05) between the majority of the biochemical parameters were found, indicating that the environmental parameters can influence several blood parameters at the same time. The present study provides several useful norms for assessing the welfare of rainbow trout, indicating that the relationships among different parameters are important factors in interpreting the blood biochemical profiles.


2015 ◽  
Vol 117 (suppl_1) ◽  
Author(s):  
Matthew Wheeler ◽  
Daryl Waggott ◽  
Megan Grove ◽  
Frederick Dewey ◽  
Cuiping Pan ◽  
...  

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals, clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9) while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM.
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.


2009 ◽  
Vol 91 (6) ◽  
pp. 367-371 ◽  
Author(s):  
B. J. HAYES ◽  
I. M. MACLEOD ◽  
M. BARANSKI

Summary: A number of farmed species are characterized by breeding populations of large full-sib families, including aquaculture species and outcrossing plant species. Whole genome association studies in such species must account for stratification arising from the full-sib family structure to avoid high rates of false discovery. Here, we demonstrate the value of selective genotyping strategies which balance the contribution of families across high and low phenotypes to greatly reduce rates of false discovery with a minimal effect on power.
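One possible reading of a family-balanced selective genotyping scheme is sketched below: every full-sib family contributes the same number of animals to the low-phenotype and high-phenotype groups, so that no family dominates either tail. This is an assumption made for illustration, not the authors' published procedure; the function name, the record layout and the `per_tail` parameter are all hypothetical.

```python
from collections import defaultdict


def balanced_selective_sample(records, per_tail=1):
    """Pick `per_tail` lowest and `per_tail` highest animals from each family.

    `records` is an iterable of (family_id, animal_id, phenotype) tuples.
    Families with fewer than 2 * per_tail animals are skipped, because they
    cannot contribute equally to both tails.
    """
    by_family = defaultdict(list)
    for family_id, animal_id, phenotype in records:
        by_family[family_id].append((phenotype, animal_id))

    selected = []
    for family_id, members in by_family.items():
        if len(members) < 2 * per_tail:
            continue
        members.sort()  # ascending by phenotype
        selected.extend((family_id, a, "low") for _, a in members[:per_tail])
        selected.extend((family_id, a, "high") for _, a in members[-per_tail:])
    return selected


# Hypothetical toy data: family F1 has three animals, F2 only one.
toy = [("F1", "a", 1.0), ("F1", "b", 5.0), ("F1", "c", 3.0), ("F2", "d", 2.0)]
print(balanced_selective_sample(toy))
```

Because each family appears equally often in both tails, a marker that merely tags family membership should not differ in frequency between the two groups, which is the intuition behind controlling stratification this way.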


2021 ◽  
Author(s):  
Adéla Nosková ◽  
Meenu Bhati ◽  
Naveen Kumar Kadri ◽  
Danang Crysnanto ◽  
Stefan Neuenschwander ◽  
...  

Abstract. Background: The key-ancestor approach has been frequently applied to prioritize individuals for whole-genome sequencing based on their marginal genetic contribution to current populations. Using this approach, we selected 70 key ancestors from two lines of the Swiss Large White breed that have been selected divergently for fertility and fattening traits and sequenced their genomes with short paired-end reads. Results: Using pedigree records, we estimated the effective population size of the dam and sire lines at 72 and 44, respectively. In order to assess sequence variation in both lines, we sequenced the genomes of 70 boars at an average coverage of 16.69-fold. The boars explained 87.95 and 95.35% of the genetic diversity of the breeding populations of the dam and sire line, respectively. Reference-guided variant discovery using the GATK revealed 26,862,369 polymorphic sites. Principal component, admixture and FST analyses indicated considerable genetic differentiation between the lines. Genomic inbreeding quantified using runs of homozygosity was higher in the sire than in the dam line (0.28 vs 0.26). Using two complementary approaches (CLR and iHS), we detected 51 signatures of selection. However, only six signatures of selection overlapped between both lines. We used the sequenced haplotypes of the 70 key ancestors as a reference panel to call 22,618,811 genotypes in 175 pigs that had been sequenced at very low coverage (1.11-fold) using GLIMPSE. The genotype concordance, non-reference sensitivity and non-reference discrepancy between the thus inferred and Illumina PorcineSNP60 BeadChip-called genotypes were 97.60, 98.73 and 3.24%, respectively. The low-pass sequencing-derived genomic relationship coefficients were highly correlated (r > 0.99) with those obtained from microarray genotyping. Conclusions: We assessed genetic diversity within and between two lines of the Swiss Large White pig breed. Our analyses revealed considerable differentiation, even though the split into two populations occurred only a few generations ago. The sequenced haplotypes of the key ancestor animals enabled us to implement genotyping by low-pass sequencing, which offers an intriguing cost-effective approach to increase the variant density over current array-based genotyping by more than 350-fold.


2020 ◽  
Author(s):  
Christopher W. Whelan ◽  
Robert E. Handsaker ◽  
Giulio Genovese ◽  
Seva Kashin ◽  
Monkol Lek ◽  
...  

Abstract: Two intriguing forms of genome structural variation (SV) – dispersed duplications, and de novo rearrangements of complex, multi-allelic loci – have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of “IBD-discordant” (IBDD) CNVs: loci at which siblings’ CNV measurements and IBD states are mathematically inconsistent. We found that commonly-IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been previously detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more-conventional CNVs, we estimate that segmental mutations larger than 1kb arise in about one per 22 human meioses. These methods are complementary to previous techniques in that they interrogate genomic regions that are home to segmental duplication, high CNV allele frequencies, and multi-allelic CNVs. Author Summary: Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data.
For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect of copy number variation that has been difficult to study is the detection of mutations in the copy number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments which already display common copy number variation in the population. We develop an analytical approach to solving these problems when sequencing data is available for all members of families with at least two children. This method is based on determining the number of parental haplotypes the two siblings share at each location in their genome, and using that information to determine the possible inheritance patterns that might explain the copy numbers we observe in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications which are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
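The consistency check described in the author summary can be illustrated with a brute-force sketch. It assumes a single locus, integer diploid copy numbers for both parents and two siblings, and a known IBD state (0, 1 or 2 shared parental haplotypes); it then asks whether any split of the parental copy numbers across haplotypes, combined with any transmission pattern matching that IBD state, reproduces the siblings' copy numbers. The function names and the simplified model are assumptions for illustration and are not the authors' implementation.

```python
from itertools import product


def haplotype_splits(total):
    """All ways to divide a diploid copy number between two haplotypes."""
    return [(a, total - a) for a in range(total + 1)]


def is_ibd_consistent(father_cn, mother_cn, sib1_cn, sib2_cn, ibd_state):
    """Return True if some Mendelian inheritance pattern explains the data.

    ibd_state is the number of parental haplotypes the siblings share (0, 1, 2).
    """
    for f_haps, m_haps in product(haplotype_splits(father_cn),
                                  haplotype_splits(mother_cn)):
        # i, j: paternal/maternal haplotype passed to sibling 1;
        # k, l: paternal/maternal haplotype passed to sibling 2.
        for i, j, k, l in product((0, 1), repeat=4):
            shared = (i == k) + (j == l)
            if shared != ibd_state:
                continue
            if (f_haps[i] + m_haps[j] == sib1_cn
                    and f_haps[k] + m_haps[l] == sib2_cn):
                return True
    return False


# Siblings sharing both haplotypes (IBD2) must have equal copy number,
# so a 2-versus-3 difference is flagged as IBD-discordant.
print(is_ibd_consistent(2, 2, 2, 3, ibd_state=2))
```

Under this simplified model, a locus for which the function returns False corresponds to what the abstract calls an IBD-discordant CNV: either the extra copy lies elsewhere in the genome (a dispersed duplication) or a de novo change occurred.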


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 3767-3767 ◽  
Author(s):  
Cody Ashby ◽  
Eileen M Boyle ◽  
Brian A Walker ◽  
Michael A Bauer ◽  
Katie Rose Ryan ◽  
...  

Background: Structural variants are key recurrent molecular features of myeloma (MM) with two types of complex rearrangement, chromoplexy and chromothripsis, having been described recently. The contribution of these to MM prognosis, rapid changes in clinical behavior and punctuated evolution is currently unknown as is the mechanism by which they deregulate gene function. Methods: We analyzed two sets of newly diagnosed MM data: 85 cases with phased whole genome sequencing; and 812 cases from CoMMpass where long-insert whole-genome sequencing was available. Patient derived xenografts from five MM cases were used to generate epigenetic maps for the histone marks, BRD4, MED1, H3K27Ac, H3K4me1, H3K4me3, H3K9me3, H3K36me3 and H3K27me3. Results: In the 10X data the median number of structural events per case was 25 (range 1 - 182); with a median of 14 intra-chromosomal events (range 1 - 179; P<0.001) and 7 inter-chromosomal events (range 0 - 29). Structural events were seen most frequently on chromosomes 14 (64%), 8 (53%), 1 (44%) and 6 (42%). Complex chromosomal rearrangements involving 3 or more chromosomal sites were seen in 46%, 4 or more sites in 20%, 5 or more in 10% and 6 or more in 5% of samples. There were significantly more structural events in the t(4;14) subgroup compared to the t(11;14) subgroup. Significantly more events were also seen in the bi-allelically inactivated TP53 cases. Using an elbow test defined cutoff, we identified cases with high structural variant load in 10% of cases. Chromoplexy called by "Chainfinder" was seen in 18% of cases. Chromothripsis called by "Shatterseek" was seen in 9% of cases. Cases with a high structural load alone were not associated with an adverse outcome whereas cases with chromoplexy or chromothripsis were associated with adverse PFS and OS, p=0.001. A new high-risk subgroup comprising approximately 5% of cases was identified with chromoplexy, chromothripsis and a high structural load. 
Gene set enrichment analysis of cases with chromoplexy and chromothripsis showed an excess of MYC, E2F and G2M targets, and a reduction in RAS signaling. Interferon α and γ responses, an excess of TP53 and reduction in TRAF3 mutations were associated predominantly with chromothripsis. How chromoplexy and chromothripsis are tolerated by the cell is unknown and the association with the cGAS/STING response is further being explored. To determine how chromoplexy may deregulate multiple genes we identified the full spectrum of structural variants to the immunoglobulin (Ig) and non-Ig loci. A range of genes are deregulated by Ig loci including MAP3K14 at a frequency of 2% confirming the importance of non-canonical NFkB signaling. A novel intra-chromosomal rearrangement to ZFP36L1 was upregulated in 10% of cases but was not prognostic. Gene upregulation by non-Ig super enhancers is frequent and targets include PAX5, GLI3, CD40, NFKB1, MAP3K14, LRRC37A, LIPG, PHLDA3, ZNF267, CENPF, SLC44A2, MIER1, SOX30, TMEM258, PPIL1, and BUB3. The topologically associating domains (TADs) containing super enhancers bringing about gene deregulation include TXNDC5, FOXO3, FCHSD2, SP2, FAM46C, CACNA1C, TLCD2 and PIK3C2G. These super enhancers frequently contain important MM genes, the coding sequences of which are disrupted by the rearrangement and could contribute to the clinical phenotype. Accurately reconstructing the structure of the complex rearrangements will allow us to identify the mechanism of gene deregulation and to distinguish between either gene stacking, receptor stacking or both. Conclusions: Upregulation of gene expression by super enhancer rearrangement is a major mechanism of gene deregulation in MM and complex structural events contribute significantly to adverse prognosis by a range of mechanisms as well as simple gene overexpression.
Disclosures: Boyle: Amgen, Abbvie, Janssen, Takeda, Celgene Corporation: Honoraria; Amgen, Janssen, Takeda, Celgene Corporation: Other: Travel expenses. Walker: Celgene: Research Funding. Thakurta: Celgene: Employment, Equity Ownership. Flynt: Celgene Corporation: Employment, Equity Ownership. Davies: Amgen, Celgene, Janssen, Oncopeptides, Roche, Takeda: Membership on an entity's Board of Directors or advisory committees, Other: Consultant/Advisor; Janssen, Celgene: Other: Research Grant, Research Funding. Morgan: Amgen, Roche, Abbvie, Takeda, Celgene, Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Celgene: Other: research grant, Research Funding.

