Noninvasive Detection of Urothelial Carcinoma by Cost-effective Low-coverage Whole-genome Sequencing from Urine-Exfoliated Cell DNA

2020, Vol 26 (21), pp. 5646-5654
Author(s): Shuxiong Zeng, Yidie Ying, Naidong Xing, Baiyun Wang, Ziliang Qian, ...
2020, Vol 38 (15_suppl), pp. 1552-1552
Author(s): Shuxiong Zeng, Dongxing Nai, Yingdie Ye, Jiatao Ji, Zunlin Zhou, ...

1552 Background: Urothelial carcinoma (UC) is a malignancy with frequent chromosomal aberrations. FISH assays are more sensitive than cytology tests. Here we investigated cost-effective whole-genome sequencing, which can detect all chromosomal aberrations, for UC diagnosis. Methods: UC patients and controls were prospectively recruited in trial NCT03998371. First-morning-voided urine was freshly collected before TURBT or cystectomy. Urine-exfoliated cell DNA was sequenced on an Illumina HiSeq X10, followed by genotyping with the bioinformatics workflow UCAD. Results: 195 individuals were prospectively recruited. 121 UC patients and 67 patients with non-tumor diseases were included in this study; 7 patients with other malignancies confirmed by pathological testing were excluded. Frequent chromosome copy number changes were found in cancer patients compared with non-tumor controls, including the chromosome 3 gain, 17 gain, 7 gain and 9p loss used in FISH assays. In addition, chr9q loss, 8q gain, 5q loss, 17p loss, 11p loss, 1q gain, 8p loss, 10q loss, 6q loss, 4q loss and 11q loss were also frequent in cancer patients (AUC > 0.65). Metacentric chromosomes showed better AUC than acrocentric and telocentric chromosomes (P = 1.7e-03). A novel diagnostic model, UCAD, was built by incorporating all the chromosomal changes. The model reached an AUC of 0.933. At the optimal cutoff |Z| ≥ 3.16, the sensitivity, specificity and accuracy were 84.7%, 97.9% and 89.0%, respectively. Prediction positivity correlated with epithelial cells visible on urine microscopy (P = 0.00069), tumor invasiveness (Ta/Tis vs. others, P = 0.0048) and tumor grade (P = 0.0030), but not with microscopy RBC/WBC findings, urine culture findings, or smoking and drinking history. The UCAD model outperformed cytology tests, detecting all 16 cytology-positive tumors and 12 cytology-negative tumors with comparable specificity. The model found 75.0% more tumors. UCAD also identified more upper urinary tract cancers (P = 0.012) and smaller tumors (< 3 cm, P = 5.9e-04). Adding cytology to UCAD did not improve diagnostic sensitivity or specificity. UCAD reproduced the diagnoses across first-morning-void, morning and afternoon urine samples with correlation coefficient R² > 0.98. All urine samples showed high concordance with matched tumor samples (R² > 0.85). Conclusions: UCAD could be a highly specific, robust UC diagnostic method with improved sensitivity compared with cytology tests. Clinical trial information: NCT03998371.
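
The abstract does not describe how UCAD computes its Z statistic, so the following is only a minimal sketch of one common way to turn low-coverage read counts into per-arm Z-scores. It assumes read counts per chromosome arm are already available, normalizes them to fractions of the sample total, and compares each fraction with a panel of non-tumor controls. The function names and the input layout are hypothetical; only the cutoff of 3.16 comes from the abstract.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.16  # cutoff reported in the abstract; everything else is illustrative


def arm_fractions(read_counts):
    """Convert raw read counts per chromosome arm into fractions of the total."""
    total = sum(read_counts.values())
    return {arm: n / total for arm, n in read_counts.items()}


def arm_z_scores(sample_counts, control_counts_list):
    """Z-score of each arm's read fraction against a control panel.

    Assumption: the control panel has at least two samples and non-zero
    spread for every arm, otherwise stdev() fails or divides by zero.
    """
    sample = arm_fractions(sample_counts)
    controls = [arm_fractions(c) for c in control_counts_list]
    scores = {}
    for arm, frac in sample.items():
        ref = [c[arm] for c in controls]
        scores[arm] = (frac - mean(ref)) / stdev(ref)
    return scores


def is_positive(z_scores, cutoff=Z_CUTOFF):
    """Flag the sample if any arm deviates by at least the cutoff."""
    return any(abs(z) >= cutoff for z in z_scores.values())
```

Because the fractions are compositional, a loss on one arm also raises the fractions of the other arms slightly; a real workflow would correct for this and for GC content and mappability, none of which is modeled here.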


Author(s): Runyang Nicolas Lou, Arne Jacobs, Aryn Wilder, Nina Overgaard Therkildsen

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
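
The trade-off between sample number and per-sample depth can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the authors' simulation pipeline: a single biallelic site in Hardy-Weinberg equilibrium, a fixed integer depth per sample, no sequencing error, and a naive estimator (the pooled fraction of alternative-allele reads) rather than a genotype-likelihood method. All names and parameter values are hypothetical.

```python
import random


def simulate_estimate(p, n_samples, depth, rng):
    """One replicate: pooled alt-read fraction across diploid samples."""
    alt_reads = 0
    total_reads = 0
    for _ in range(n_samples):
        # Diploid genotype = number of alt alleles (0, 1 or 2) under HWE
        genotype = (rng.random() < p) + (rng.random() < p)
        for _ in range(depth):
            # Each read samples one of the two chromosome copies at random
            alt_reads += rng.random() < genotype / 2
            total_reads += 1
    return alt_reads / total_reads


def rmse(p, n_samples, depth, replicates=500, seed=1):
    """Root-mean-square error of the estimate over many replicates."""
    rng = random.Random(seed)
    squared_errors = [
        (simulate_estimate(p, n_samples, depth, rng) - p) ** 2
        for _ in range(replicates)
    ]
    return (sum(squared_errors) / replicates) ** 0.5


# Same total sequencing effort (200 reads at the site), split two ways
many_shallow = rmse(0.3, n_samples=200, depth=1)
few_deep = rmse(0.3, n_samples=20, depth=10)
print(f"200 samples at 1x: RMSE = {many_shallow:.3f}")
print(f"20 samples at 10x: RMSE = {few_deep:.3f}")
```

In this setup the error of the deep design is dominated by how few chromosomes were sampled from the population, which extra reads per individual cannot fix; that is the intuition behind spreading effort across more samples.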


Author(s): Runyang Nicolas Lou, Arne Jacobs, Aryn Wilder, Nina Overgaard Therkildsen

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate that the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference compared to sequencing fewer samples each at higher depth. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based analysis. With this overview, we hope to make lcWGS more approachable and stimulate broader adoption.


2020, Vol 36 (12), pp. 3888-3889
Author(s): Alexandre Eeckhoutte, Alexandre Houy, Elodie Manié, Manon Reverdy, Ivan Bièche, ...

Abstract. Summary: We introduce shallowHRD, a software tool to evaluate tumor homologous recombination deficiency (HRD) based on whole genome sequencing (WGS) at low coverage (shallow WGS or sWGS; ∼1X coverage). The tool, based on mining the copy number alteration profile, implements a fast and straightforward procedure that shows 87.5% sensitivity and 90.5% specificity for HRD detection. shallowHRD could be instrumental in predicting response to poly(ADP-ribose) polymerase inhibitors, to which HRD tumors are selectively sensitive. shallowHRD displays efficiency comparable to most state-of-the-art approaches, is cost-effective, generates outputs that require little storage and is also suitable for formalin-fixed paraffin-embedded tissues. Availability and implementation: The shallowHRD R script and documentation are available at https://github.com/aeeckhou/shallowHRD. Supplementary information: Supplementary data are available at Bioinformatics online.


2020, Vol 98 (Supplement_4), pp. 81-82
Author(s): Joaquim Casellas, Melani Martín de Hijas-Villalba, Marta Vázquez-Gómez, Samir Id Lahoucine

Abstract. Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNP per chromosome. Each population evolved during 1,000 non-overlapping generations with effective size 100, mutation rate 10⁻⁴, and recombination by Kosambi's function. Only those populations with 1,000,000 ± 10% polymorphic SNP per chromosome in generation 1,000 were retained for further analyses, and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads and by assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known allele frequencies in the population and a known sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased up to 99.6% and 99.9%, respectively, with 0.1X depth (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNP). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies regarding sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth by 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
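
The abstract does not give the test statistic, so the following is a minimal sketch of how a paternity likelihood ratio could be built from single reads when allele frequencies and the error rate are known. It assumes biallelic SNPs, Hardy-Weinberg proportions, unlinked sites, and at most one read per individual per SNP, which is typical at depths such as 0.05X. Under those assumptions a random allele from the offspring and a random allele from the true father are identical by descent with probability 1/4 (one half that the offspring read is the paternal allele, times one half that the father's read is the transmitted copy). All function names are hypothetical.

```python
import math

PARENT_OFFSPRING_IBD = 0.25  # chance that one read from each is the same inherited copy


def read_pair_likelihood(child_read, father_read, p, error, ibd):
    """P(observed child read, observed candidate read) at one biallelic SNP.

    Alleles are coded 1 (population frequency p) and 0 (frequency 1 - p).
    `error` is the per-SNP probability that a read shows the wrong allele.
    """
    freq = {1: p, 0: 1.0 - p}
    total = 0.0
    for true_child in (0, 1):
        for true_father in (0, 1):
            # Not IBD: the two underlying alleles are independent draws
            joint = (1.0 - ibd) * freq[true_child] * freq[true_father]
            # IBD: both reads come from the same allele copy
            if true_child == true_father:
                joint += ibd * freq[true_child]
            p_child = 1.0 - error if child_read == true_child else error
            p_father = 1.0 - error if father_read == true_father else error
            total += joint * p_child * p_father
    return total


def paternity_log_lr(snps, error):
    """Sum of log likelihood ratios (father vs. unrelated) over SNPs.

    `snps` is an iterable of (child_read, candidate_read, p) tuples for
    SNPs covered in both individuals.
    """
    llr = 0.0
    for child_read, father_read, p in snps:
        related = read_pair_likelihood(
            child_read, father_read, p, error, PARENT_OFFSPRING_IBD
        )
        unrelated = read_pair_likelihood(child_read, father_read, p, error, 0.0)
        llr += math.log(related / unrelated)
    return llr
```

A positive total favors the candidate as the father and a negative total favors an unrelated individual. Because every term depends directly on p, a biased allele frequency shifts each SNP's contribution, which is consistent with the sensitivity to allele-frequency bias reported above.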

