CaMuS: simultaneous fitting and de novo imputation of cancer mutational signature

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Maria Cartolano ◽  
Nima Abedpour ◽  
Viktor Achter ◽  
Tsun-Po Yang ◽  
Sandra Ackermann ◽  
...  

Abstract The identification of the mutational processes operating in tumour cells has implications for cancer diagnosis and therapy. These processes leave mutational patterns on the cancer genomes, which are referred to as mutational signatures. Recently, 81 mutational signatures have been inferred using computational algorithms on sequencing data of 23,879 samples. However, these published signatures may not always offer a comprehensive view of the biological processes underlying tumour types that are not included or underrepresented in the reference studies. To circumvent this problem, we designed CaMuS (Cancer Mutational Signatures) to construct de novo signatures while simultaneously fitting publicly available mutational signatures. Furthermore, we propose to estimate signature similarity by comparing probability distributions using the Hellinger distance. We applied CaMuS to infer signatures of mutational processes in poorly studied cancer types. We used whole genome sequencing data of 56 neuroblastomas, thus providing evidence for the versatility of CaMuS. Using simulated data, we compared the performance of CaMuS to sigfit, a recently developed algorithm with comparable inference functionalities. CaMuS and sigfit reconstructed the simulated datasets with similar accuracy; however, two main features may argue for CaMuS over sigfit: (i) superior computational performance and (ii) a reliable parameter selection method to avoid spurious signatures.
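The Hellinger distance mentioned in the abstract can be illustrated with a minimal sketch. It assumes each signature is given as a discrete probability distribution over the same ordered set of mutation types; the function name and the toy four-category vectors are hypothetical and are not taken from the CaMuS code.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    p and q are sequences of non-negative numbers that each sum to 1 and
    are indexed by the same mutation types. The result lies in [0, 1]:
    0 for identical distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared_diff = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_diff) / math.sqrt(2)


# Toy signatures over four hypothetical mutation categories.
signature_a = [0.40, 0.30, 0.20, 0.10]
signature_b = [0.10, 0.20, 0.30, 0.40]
distance = hellinger_distance(signature_a, signature_b)
```

Because the distance is bounded between 0 and 1, a de novo signature can be compared against each reference signature on a common scale.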

2021 ◽  
Author(s):  
Boas CL van der Putten ◽  
Niek AH Huijsmans ◽  
Daniel R Mende ◽  
Constance Schultsz

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95% average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked seventeen distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and SKA achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or SKESA outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction k-mer alignment methods are relevant alternatives to reference mapping at species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.


2020 ◽  
Vol 21 (S21) ◽  
Author(s):  
Zicheng Zhao ◽  
Yingxiao Zhou ◽  
Shuai Wang ◽  
Xiuqing Zhang ◽  
Changfa Wang ◽  
...  

Abstract Background Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra, costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized to resolve the task of scaffolding. Genetic recombination patterns in population data indicate non-random association among alleles at different loci and can provide physical distance signals to guide scaffolding. Results In this paper, we propose LDscaff for draft genome assembly incorporating linkage disequilibrium information in population data. We evaluated the performance of our method with both simulated data and real data. We simulated scaffolds by splitting the pig reference genome and reassembled them. Gaps between scaffolds were introduced ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. Then we implemented our method to refine the Giant Panda genome and the donkey genome, which are purely assembled by NGS data. After LDscaff treatment, the resulting Panda assembly has scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The re-assembled donkey assembly has an N50 length improved from 23.8 MB to 32.1 MB. Conclusions Our method effectively improves the assemblies with existing re-sequencing data, and is a potential alternative to the existing assemblers that require the collection of new data.
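The scaffold N50 statistic quoted in this abstract can be computed as in the following minimal sketch, which uses the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name and the toy lengths are hypothetical and are not drawn from LDscaff.

```python
def n50(scaffold_lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed length of all scaffolds.
    """
    lengths = sorted(scaffold_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length


# Toy assembly with five scaffolds (arbitrary units).
toy_n50 = n50([2, 3, 4, 5, 6])
```

Joining scaffolds into longer ones leaves the total length roughly unchanged while shifting it into fewer, longer pieces, which is why successful scaffolding raises the N50.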


2018 ◽  
Author(s):  
Yulia Rubanova ◽  
Ruian Shi ◽  
Caitlin F Harrigan ◽  
Roujia Li ◽  
Jeff Wintersinger ◽  
...  

Abstract We present a new method, TrackSig, to estimate the evolutionary trajectories of signatures of different somatic mutational processes from DNA sequencing data from a single, bulk tumour sample. TrackSig uses probability distributions over mutation types, called mutational signatures, to represent different mutational processes and detects the changes in the signature activity using an optimal segmentation algorithm that groups somatic mutations based on their estimated cancer cellular fraction (CCF) and their mutation type (e.g. CAG->CTG). We use two different simulation frameworks to assess both TrackSig’s reconstruction accuracy and its robustness to violations of its assumptions, as well as to compare it to a baseline approach. We find 2-4% median error in reconstructing the signature activities on simulations with varying difficulty with one to three subclones at an average depth of 30x. The size and the direction of the activity change are consistent in 83% and 95% of cases respectively. There were an average of 0.02 missed and 0.12 false positive subclones per sample. In our simulations, grouping mutations by mutation type (TrackSig), rather than by clustering CCF (baseline strategy), performs better at estimating signature activities and at identifying subclonal populations in complex scenarios such as branching, CNA gain, violation of the infinite sites assumption, and the inclusion of neutrally evolving mutations. TrackSig is open source software, freely available at https://github.com/morrislab/TrackSig.


2018 ◽  
Author(s):  
Avantika Lal ◽  
Keli Liu ◽  
Robert Tibshirani ◽  
Arend Sidow ◽  
Daniele Ramazzotti

Abstract Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using standard metrics. We then apply SparseSignatures to whole genome sequences of 147 tumors from pancreatic cancer, discovering 8 signatures in addition to the background.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Lilian J. Gehrke ◽  
Maulik Upadhyay ◽  
Kristin Heidrich ◽  
Elisabeth Kunz ◽  
Daniela Klaus-Halla ◽  
...  

Abstract Polledness in cattle is an autosomal dominant trait. Previous studies have revealed allelic heterogeneity at the polled locus and four different variants were identified, all in intergenic regions. In this study, we report a case of polled bull (FV-Polled1) born to horned parents, indicating a de novo origin of this polled condition. Using 50K genotyping and whole genome sequencing data, we identified on chromosome 2 an 11-bp deletion (AC_000159.1:g.52364063_52364073del; Del11) in the second exon of ZEB2 gene as the causal mutation for this de novo polled condition. We predicted that the deletion would shorten the protein product of ZEB2 by almost 91%. Moreover, we showed that all animals carrying Del11 mutation displayed symptoms similar to Mowat-Wilson syndrome (MWS) in humans, which is also associated with genetic variations in ZEB2. The symptoms in cattle include delayed maturity, small body stature and abnormal shape of skull. This is the first report of a de novo dominant mutation affecting only ZEB2 and associated with a genetic absence of horns. Therefore our results demonstrate undoubtedly that ZEB2 plays an important role in the process of horn ontogenesis as well as in the regulation of overall development and growth of animals.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Clémentine Escouflaire ◽  
Emmanuelle Rebours ◽  
Mathieu Charles ◽  
Sébastien Orellana ◽  
Margarita Cano ◽  
...  

Abstract Background In mammals, hypohidrotic ectodermal dysplasia (HED) is a genetic disorder that is characterized by sparse hair, tooth abnormalities, and defects in cutaneous glands. Only four genes, EDA, EDAR, EDARADD and WNT10A account for more than 90% of HED cases, and EDA, on chromosome X, is involved in 50% of the cases. In this study, we explored an isolated case of a female Holstein calf with symptoms similar to HED. Results Clinical examination confirmed the diagnosis. The affected female showed homogeneous hypotrichosis and oligodontia as previously observed in bovine EDAR homozygous and EDA hemizygous mutants. Under light microscopy, the hair follicles were thinner and located higher in the dermis of the frontal skin in the affected animal than in the control. Moreover, the affected animal showed a five-fold increase in the number of hair follicles and a four-fold decrease in the diameter of the pilary canals. Pedigree analysis revealed that the coefficient of inbreeding of the affected calf (4.58%) was not higher than the average population inbreeding coefficient (4.59%). This animal had ten ancestors in its paternal and maternal lineages. By estimating the number of affected cases that would be expected if any of these common ancestors carried a recessive mutation, we concluded that, if they existed, other cases of HED should have been reported in France, which is not the case. Therefore, we assumed that the causal mutation was dominant and de novo. By analyzing whole-genome sequencing data, we identified a large chromosomal inversion with breakpoints located in the first introns of the EDA and XIST genes. Genotyping by PCR-electrophoresis the case and its parents allowed us to demonstrate the de novo origin of this inversion. Finally, using various sources of information we present a body of evidence that supports the hypothesis that this mutation is responsible for a skewed inactivation of X, and that only the normal X can be inactivated. 
Conclusions In this article, we report a unique case of an X-linked HED-affected Holstein female calf with an assumed full inactivation of the normal X-chromosome, thus leading to a severe phenotype similar to that of hemizygous males.


2017 ◽  
Author(s):  
Adriana Munoz ◽  
Boris Yamrom ◽  
Yoon-ha Lee ◽  
Peter Andrews ◽  
Steven Marks ◽  
...  

Abstract Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but there are major aspects of the genetics that are unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant, and knowing which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collections quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01) and using reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict to the half of the autism targets that are intolerant to damaging variants in the normal human population, which half we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value of 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
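One simple way to gauge an imbalance such as 63 versus 37 events is an exact one-sided binomial test. The abstract does not state which test the authors used, so the sketch below is an illustration only: it assumes that, under the null hypothesis, each de novo indel is equally likely to occur in the affected or the unaffected sibling. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k_min, n):
    """P(X >= k_min) for X ~ Binomial(n, 0.5), computed exactly.

    Assumption (not from the abstract): each event falls in the affected
    sibling with probability 0.5 under the null hypothesis.
    """
    if not 0 <= k_min <= n:
        raise ValueError("k_min must lie between 0 and n")
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n


# Counts quoted in the abstract: affected vs. unaffected siblings.
p_all_targets = binomial_upper_tail(63, 63 + 37)
p_intolerant_targets = binomial_upper_tail(43, 43 + 20)
```

The values obtained this way need not match the reported p-values exactly, since the authors may have used a different test or null model.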


2020 ◽  
Author(s):  
Hokuto Nakayama ◽  
Steven D. Rowland ◽  
Zizhang Cheng ◽  
Kristina Zumstein ◽  
Julie Kang ◽  
...  

Abstract Domesticated plants and animals display tremendous diversity in various phenotypic traits and often this diversity is seen within the same species. Tomato (Solanum lycopersicum; Solanaceae) cultivars show wide variation in leaf morphology, but the influence of breeding efforts in sculpting this diversity is not known. Here, we demonstrate that a single nucleotide deletion in the homeobox motif of BIPINNATA, which is a BEL-LIKE HOMEODOMAIN gene, led to a highly complex leaf phenotype in an heirloom tomato, Silvery Fir Tree (SiFT). Additionally, a comparative gene network analysis revealed that reduced expression of the ortholog of WUSCHEL RELATED HOMEOBOX 1 is also important for the narrow leaflet phenotype seen in SiFT. Phylogenetic and comparative genome analysis using whole-genome sequencing data suggests that the bip mutation in SiFT is likely a de novo mutation, instead of standing genetic variation. These results provide new insights into natural variation in phenotypic traits introduced into crops during improvement processes after domestication.

