Whole genome sequencing for quantifying germline mutation frequency in humans and model species: Cautious optimism

2012 ◽  
Vol 750 (2) ◽  
pp. 96-106 ◽  
Author(s):  
Marc A. Beal ◽  
Travis C. Glenn ◽  
Christopher M. Somers
Cell ◽  
2012 ◽  
Vol 151 (7) ◽  
pp. 1431-1442 ◽  
Author(s):  
Jacob J. Michaelson ◽  
Yujian Shi ◽  
Madhusudan Gujral ◽  
Hancheng Zheng ◽  
Dheeraj Malhotra ◽  
...  

Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 924-924
Author(s):  
Anna Stengel ◽  
Alexander Höllein ◽  
Wolfgang Kern ◽  
Manja Meggendorfer ◽  
Claudia Haferlach ◽  
...  

Abstract Background: Persistent polyclonal B-cell lymphocytosis (PPBL) is a rare disorder that occurs almost exclusively in smoking women and is characterized by a chronic polyclonal lymphocytosis with circulating binucleated lymphocytes, clonal cytogenetic abnormalities involving chromosome 3, and chromosomal instability. The outcome of PPBL patients is mostly benign, but subsequent malignancies (non-Hodgkin's lymphomas and solid tumors) have been described. The molecular factors that may lead to their development remain unclear. Aims: Detailed molecular genetic characterization of PPBL by whole genome sequencing (WGS) and RNA sequencing (RNAseq) in comparison to the well-characterized lymphoid malignancy CLL. Patient cohorts and methods: The total cohort comprised 27 PPBL (3 male, 24 female) and 250 CLL cases (163 male, 87 female). WGS was performed for all patients: 150 bp paired-end reads were generated on Illumina HiSeq X and NovaSeq 6000 machines (Illumina, San Diego, CA). A mixture of genomic DNA from multiple anonymous donors was used as the normal control. To remove potential germline variants, each variant was queried against the gnomAD database, and variants with global population frequencies >1% were excluded. Final analysis was performed only on protein-altering and splice-site variants. For further analysis, a virtual panel of 355 lymphoid genes was selected. All reported p-values are two-sided and were considered significant at p<0.05. For gene expression analysis, estimated gene counts were normalized using the trimmed mean of M-values (TMM) method, and the resulting log2 counts per million (CPMs) were used as a proxy of gene expression in each sample. Genes were kept if they were expressed (> 5 CPM) in at least 66% of the samples. Genes with FDR (false discovery rate) < 0.05 and an absolute logFC > 1.5 were considered differentially expressed (DE).
Results: Median age was 46 years for PPBL patients (range: 23-67 years) and 67 years for CLL patients (range: 39-94 years). The mean number of mutations per patient was 18 for PPBL and 20 for CLL. For both entities, the majority of mutations were missense mutations (88% in PPBL vs. 81% in CLL), followed by splice-site mutations (7% vs. 10%); other mutation types were only rarely detected. In PPBL, 42 genes were found to be mutated at a frequency of ≥15%, including ATM (22%), CREBBP (19%), NCOR2 (19%), AHNAK2 (15%), JAK3 (15%), NOTCH2 (15%) and TRAF1 (15%), all of which have been associated with a variety of cancers. Moreover, ATM, NOTCH2 and TRAF1 mutations have previously been described in association with lymphomas. In PPBL patients, mutations in TRAF1 and ATM as well as mutations in TRAF1 and NOTCH2 were found to be mutually exclusive. For CLL patients, 29 genes showed a mutation frequency of ≥15%, comprising ATM (26%), KMT2D (23%), NOTCH1 (23%), LRP1B (19%), TP53 (16%) and CREBBP (15%). Comparison of the mutation frequencies between the two entities revealed several genes with significant differences: whereas mutations in CKAP5 (11% vs. 2%, p=0.022), DNMT3A (11% vs. 3%, p=0.033), MAP2 (19% vs. 4%, p=0.009), ROBO1 (15% vs. 4%, p=0.046) and TRAF1 (15% vs. 2%, p=0.006) were more frequent in PPBL cases than in CLL cases, KMT2D (4% vs. 23%, p=0.014), TDRD6 (0% vs. 14%, p=0.032) and TP53 (4% vs. 16%, p=0.048) mutations were detected more often in CLL patients. Moreover, NOTCH1 was mutated more frequently in CLL cases (7% vs. 23%, p=0.082), whereas mutated NOTCH2 (known to be frequently mutated in splenic marginal zone lymphoma) was more abundant in PPBL patients (15% vs. 6%, p=0.116), although neither difference was statistically significant. Gene expression analyses by RNAseq revealed 337 genes to be differentially expressed between the entities. 207 genes were upregulated in PPBL, including PTPRK, CXCR1, BCL11B, CEPBA, CCR4 and MYC, whereas 130 genes were found to be upregulated in CLL cases, comprising ID3, BCL2, FGF2 and FLT1.
Conclusions: 1) WGS analysis identifies high frequencies of cancer/lymphoma-associated gene mutations in PPBL, including mutated ATM, NOTCH2 and TRAF1. 2) Five genes showed a higher mutation frequency in PPBL than in CLL, including TRAF1, DNMT3A, CKAP5 and MAP2. 3) Lymphoma-associated genes (BCL11B and MYC) were overexpressed in PPBL vs. CLL. 4) Taken together, our results question PPBL as a benign entity and identify molecular markers that might contribute to the development of subsequent malignancies.
Disclosures Stengel: MLL Munich Leukemia Laboratory: Employment. Höllein: MLL Munich Leukemia Laboratory: Employment. Kern: MLL Munich Leukemia Laboratory: Employment, Equity Ownership. Meggendorfer: MLL Munich Leukemia Laboratory: Employment. Haferlach: MLL Munich Leukemia Laboratory: Employment, Equity Ownership. Walter: MLL Munich Leukemia Laboratory: Employment. Hutter: MLL Munich Leukemia Laboratory: Employment. Haferlach: MLL Munich Leukemia Laboratory: Employment, Equity Ownership.
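The numeric cut-offs named in the methods of this abstract (gnomAD frequency >1% excluded, > 5 CPM in at least 66% of samples, FDR < 0.05 with absolute logFC > 1.5) can be written as three small predicates. This is only a minimal sketch of the thresholds, not the authors' pipeline: the function names are hypothetical, and the handling of variants absent from gnomAD (kept here) is an assumption the abstract does not address.

```python
from typing import Optional, Sequence

GNOMAD_MAX_AF = 0.01         # variants with global frequency > 1% are excluded
MIN_CPM = 5.0                # a gene counts as expressed in a sample above 5 CPM
MIN_SAMPLE_FRACTION = 0.66   # ... in at least 66% of the samples
MAX_FDR = 0.05               # differential expression: FDR < 0.05
MIN_ABS_LOGFC = 1.5          # ... and |logFC| > 1.5


def keep_variant(gnomad_af: Optional[float]) -> bool:
    """Keep a variant unless its gnomAD global frequency exceeds 1%.

    Assumption: a variant not found in gnomAD (None) is kept.
    """
    return gnomad_af is None or gnomad_af <= GNOMAD_MAX_AF


def is_expressed(cpms: Sequence[float]) -> bool:
    """True if the gene exceeds 5 CPM in at least 66% of the samples."""
    if not cpms:
        return False
    n_expressed = sum(1 for cpm in cpms if cpm > MIN_CPM)
    return n_expressed >= MIN_SAMPLE_FRACTION * len(cpms)


def is_differentially_expressed(fdr: float, log_fc: float) -> bool:
    """True if the gene passes both the FDR and the fold-change cut-off."""
    return fdr < MAX_FDR and abs(log_fc) > MIN_ABS_LOGFC


if __name__ == "__main__":
    # Made-up values, purely to exercise the predicates.
    print(keep_variant(0.02))                        # common variant
    print(is_expressed([6.0, 7.5, 1.0]))             # 2 of 3 samples above 5 CPM
    print(is_differentially_expressed(0.01, -2.0))   # strongly down-regulated
```

Both directions of change pass the fold-change filter because the cut-off is applied to the absolute logFC, which matches the abstract's reporting of genes upregulated in either entity.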


Author(s):  
Runyang Nicolas Lou ◽  
Arne Jacobs ◽  
Aryn Wilder ◽  
Nina Overgaard Therkildsen

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
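The trade-off between number of samples and per-sample depth can be illustrated with a toy simulation. The sketch below is not the authors' simulation and does not use the genotype-likelihood tools the guide surveys: it assumes a single biallelic site, error-free reads, Poisson-distributed depth, and a naive pooled read-count estimator of allele frequency. All function names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def estimate_frequency(rng, true_freq, n_samples, mean_depth):
    """Estimate allele frequency as the pooled fraction of alternate reads."""
    alt_reads = total_reads = 0
    for _sample in range(n_samples):
        # Diploid genotype: number of alternate alleles (0, 1 or 2).
        alt_alleles = (rng.random() < true_freq) + (rng.random() < true_freq)
        depth = poisson(rng, mean_depth)
        for _read in range(depth):
            # Each read samples one of the individual's two alleles.
            alt_reads += rng.random() < alt_alleles / 2
        total_reads += depth
    return alt_reads / total_reads if total_reads else None


def rmse(rng, true_freq, n_samples, mean_depth, replicates=300):
    """Root-mean-square error of the estimator over simulated replicates."""
    errors = []
    for _rep in range(replicates):
        est = estimate_frequency(rng, true_freq, n_samples, mean_depth)
        if est is not None:  # skip the rare replicate with zero reads
            errors.append((est - true_freq) ** 2)
    return math.sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    rng = random.Random(1)
    # Same total effort (about 100 reads per site), split two ways.
    print("10 samples at 10x :", rmse(rng, 0.3, n_samples=10, mean_depth=10))
    print("100 samples at 1x :", rmse(rng, 0.3, n_samples=100, mean_depth=1))
```

Under these simplifying assumptions the many-samples, low-depth design gives the smaller error, because sampling more individuals reduces the variance from drawing chromosomes out of the population, which extra depth per individual cannot do.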


2019 ◽  
Vol 3 (2) ◽  
pp. 7 ◽  
Author(s):  
Yeong Deuk Jo ◽  
Jin-Baek Kim

Mutation breeding and functional genomics studies of mutant populations have made important contributions to plant research involving the application of radiation. The frequency and spectrum of induced mutations have long been regarded as the crucial determinants of the efficiency of the development and use of mutant populations. Systematic studies regarding the mutation frequency and spectrum, including genetic and genomic analyses, have recently resulted in considerable advances. These studies have consistently shown that the mutation frequency and spectrum are affected by diverse factors, including radiation type, linear energy transfer, and radiation dose, as well as the plant tissue type and condition. Moreover, the whole-genome sequencing of mutant individuals based on next-generation sequencing technologies has enabled the genome-wide quantification of mutation frequencies according to DNA mutation types as well as the elucidation of mutation mechanisms based on sequence characteristics. These studies will contribute to the development of a highly efficient and more controlled mutagenesis method relevant for the customized research of plants. We herein review the characteristics of radiation-induced mutations in plants, mainly focusing on recent whole-genome sequencing analyses as well as factors affecting the mutation frequency and spectrum.


Author(s):  
Runyang Nicolas Lou ◽  
Arne Jacobs ◽  
Aryn Wilder ◽  
Nina Overgaard Therkildsen

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate that the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference compared to sequencing fewer samples each at higher depth. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based analysis. With this overview, we hope to make lcWGS more approachable and stimulate broader adoption.


Breast Cancer ◽  
2021 ◽  
Author(s):  
Liang Zhu ◽  
Jia-Ni Pan ◽  
Ziliang Qian ◽  
Wei-Wu Ye ◽  
Xiao-Jia Wang ◽  
...  

Abstract Background: Although BRCA1 mutation is the most important susceptibility factor for breast cancer, its prognostic value is disputed. In this study, we used a novel method based on whole-genome analysis to evaluate chromosome instability (CIN) and examined the potential relationship between CIN and prognosis in breast cancer patients with germline BRCA1 mutations. Materials and methods: Sanger sequencing or a 98-gene panel sequencing assay was used to screen for small germline BRCA1 mutations in 1151 breast cancer patients with high-risk factors. An MLPA assay was used to screen for large BRCA1 genomic rearrangements in familial breast cancer patients who were negative for small BRCA1 mutations. Thirty-two samples with unique BRCA1 germline mutation patterns were further subjected to CIN evaluation by low-pass whole-genome sequencing (LPWGS). Results: First, 113 patients with germline BRCA1 mutations were identified in the cohort. Further CIN analysis by the LPWGS assay indicated that CIN was independent of the location and type of the BRCA1 mutation. Patients with high CIN status had shorter disease-free survival (DFS) (HR = 6.54, 95% CI 1.30–32.98, P = 0.034). TP53 copy loss was also characterized by the LPWGS assay. The rates of TP53 copy loss in the CIN-high and CIN-low groups were 85.71% (12/14) and 16.67% (3/18), respectively. Conclusion: CIN-high status is a prognostic factor correlated with shorter DFS and is independent of the germline BRCA1 mutation pattern. Higher CIN values were significantly correlated with TP53 copy loss in breast cancer patients with germline BRCA1 mutations. Our results reveal a reliable molecular parameter for identifying patients with poor prognosis among BRCA1-mutated breast cancer patients.
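The TP53 copy-loss rates quoted in this abstract follow directly from the counts (12/14 and 3/18). The sketch below recomputes them and, purely as an illustration, runs a two-sided Fisher's exact test on the resulting 2x2 table. The abstract does not state which test the authors used, so the choice of Fisher's exact test and the helper names are assumptions.

```python
from math import comb


def rate_percent(events: int, total: int) -> float:
    """Percentage of events, rounded to two decimals."""
    return round(100 * events / total, 2)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one. The comparison is
    done on integer numerators, so there is no floating-point tie problem.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Numerator of P(X = x); the denominator comb(total, row1) is shared.
        return comb(col1, x) * comb(total - col1, row1 - x)

    lo = max(0, row1 - (total - col1))
    hi = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(total, row1)


if __name__ == "__main__":
    # Rows: CIN-high, CIN-low. Columns: TP53 copy loss, no copy loss.
    print(rate_percent(12, 14), rate_percent(3, 18))
    print(fisher_exact_two_sided(12, 2, 3, 15))
```

With only 32 samples an exact test is a natural choice over a chi-square approximation, but the p-value it returns is not a figure reported by the authors.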

