Investigating Human Mitochondrial Genomes in Single Cells

Genes ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 534
Author(s):  
Maria Angela Diroma ◽  
Angelo Sante Varvara ◽  
Marcella Attimonelli ◽  
Graziano Pesole ◽  
Ernesto Picardi

Mitochondria host multiple copies of their own small circular genome that has been extensively studied to trace the evolution of the modern eukaryotic cell and discover important mutations linked to inherited diseases. Whole genome and exome sequencing have enabled the study of mtDNA in a large number of samples and experimental conditions at single nucleotide resolution, allowing the deciphering of the relationship between inherited mutations and phenotypes and the identification of acquired mtDNA mutations in classical mitochondrial diseases as well as in chronic disorders, ageing and cancer. By applying an ad hoc computational pipeline based on our MToolBox software, we reconstructed mtDNA genomes in single cells using whole genome and exome sequencing data obtained by different amplification methodologies (eWGA, DOP-PCR, MALBAC, MDA) as well as data from single cell Assay for Transposase Accessible Chromatin with high-throughput sequencing (scATAC-seq) in which mtDNA sequences are expected as a byproduct of the technology. We show that assembled mtDNAs, with the exception of those reconstructed by MALBAC and DOP-PCR methods, are quite uniform and suitable for genomic investigations, enabling the study of various biological processes related to cellular heterogeneity such as tumor evolution, neural somatic mosaicism and embryonic development.

Author(s):  
Zhisong He ◽  
Agnieska Brazovskaja ◽  
Sebastian Ebert ◽  
J. Gray Camp ◽  
Barbara Treutlein

Technologies to sequence the transcriptome, genome or epigenome from thousands of single cells in an experiment provide extraordinary resolution into the molecular states present within a complex biological system at any given moment. However, it is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals. We compare CSS to other integration algorithms and show that it can outperform other methods in certain integration scenarios. We also show that CSS allows projection of single-cell genomic data of different modalities to the CSS-represented reference atlas for visualization and cell type identity prediction. In summary, CSS provides a straightforward and powerful approach to understand and integrate challenging single-cell multi-omic data.
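The core CSS idea is that each cell is represented by its similarities to clusters identified independently in each sample. The sketch below is a minimal illustration built on several assumptions. Cluster labels are taken as given rather than computed. Similarity is Pearson correlation with each cluster's average expression profile. Each cell's similarities are z-scored within each sample's block before the blocks are concatenated. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def standardize_rows(m):
    """Center each row and scale it to unit (population) standard deviation."""
    centered = m - m.mean(axis=1, keepdims=True)
    sd = centered.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant rows at zero instead of dividing by zero
    return centered / sd


def cluster_similarity_spectrum(expr, sample_ids, cluster_ids):
    """Hypothetical CSS-like representation.

    expr: (n_cells, n_genes) normalized expression matrix
    sample_ids: (n_cells,) sample of origin for each cell
    cluster_ids: (n_cells,) cluster label assigned independently within each sample
    Returns an (n_cells, total number of per-sample clusters) matrix.
    """
    expr = np.asarray(expr, dtype=float)
    sample_ids = np.asarray(sample_ids)
    cluster_ids = np.asarray(cluster_ids)
    n_genes = expr.shape[1]
    cells_std = standardize_rows(expr)
    blocks = []
    for s in np.unique(sample_ids):
        in_sample = sample_ids == s
        # Average expression profile of each cluster found within this sample
        centroids = np.vstack([
            expr[in_sample & (cluster_ids == c)].mean(axis=0)
            for c in np.unique(cluster_ids[in_sample])
        ])
        # Pearson correlation of every cell (from all samples) with these centroids
        sim = cells_std @ standardize_rows(centroids).T / n_genes
        # z-score each cell's similarities within this sample's block
        blocks.append(standardize_rows(sim))
    return np.hstack(blocks)
```

Every cell is compared with the same set of centroids, regardless of its sample of origin. The resulting columns are therefore directly comparable across samples, which is what makes this kind of representation usable for integration.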


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

Abstract COVID-19 global cases had climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time, large-scale SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no method has simultaneously achieved high precision, a simple workflow, and low cost. We developed CorvGenSurv (Coronavirus Genomic Surveillance), a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance. CorvGenSurv directly amplified viral RNA from COVID-19 patients’ nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near-full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse within the G clade and carried nine novel amino acid mutations, including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Kelley Paskov ◽  
Jae-Yoon Jung ◽  
Brianna Chrisman ◽  
Nate T. Stockham ◽  
Peter Washington ◽  
...  

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates because they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
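The starting observation of this method is that some genotyping errors show up as Mendelian errors in a family. A Mendelian error is a child genotype that cannot be formed from one allele of each parent. The sketch below only counts Mendelian errors in parent-child trios. It does not reproduce the paper's estimator of per-sample precision and recall. The encoding is an assumption: genotypes are alt-allele counts (0, 1, 2), with `None` for a missing call. The helper names are hypothetical.

```python
from itertools import product

# Alt-allele count (0, 1, 2) -> the two alleles carried (0 = ref, 1 = alt)
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def possible_child_genotypes(mother, father):
    """Child alt-allele counts reachable by inheriting one allele from each parent."""
    return {m + f for m, f in product(ALLELES[mother], ALLELES[father])}


def mendelian_error_rate(trio_calls):
    """Fraction of fully called trio sites whose child genotype is impossible.

    trio_calls: iterable of (mother, father, child) alt-allele counts.
    """
    checked = errors = 0
    for mother, father, child in trio_calls:
        if None in (mother, father, child):
            continue  # skip sites with a missing call in any family member
        checked += 1
        if child not in possible_child_genotypes(mother, father):
            errors += 1
    return errors / checked if checked else 0.0
```

Where both parents are heterozygous, every child genotype is consistent, so an error there cannot be seen. This is why only a fraction of sequencing errors appear as Mendelian errors, and why the observed errors must be extrapolated to genome-wide estimates.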


Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 3995-3995
Author(s):  
Arati V. Rao ◽  
Yuri D. Fedoriw ◽  
Kristy L. Richards ◽  
Zhen Sun ◽  
Cassandra L Love ◽  
...  

Abstract Background Over 90% of patients with Ph-positive chronic myelogenous leukemia (“typical CML”) have breakpoints in the M-bcr, which typically result in b2a2 (e13a2) and/or b3a2 (e14a2) fusion mRNAs, both of which are translated into the p210 BCR-ABL protein. CML patients with the p190 BCR-ABL (m-bcr) or p230 BCR-ABL (μ-bcr) fusion genes have also been reported. Atypical BCR breakpoints outside these cluster regions are extremely rare; for instance, only 8 cases of e6a2 fusion CML have been described. Very little is known about the clinical or biological characteristics of this CML subtype, including the role of collaborating gene mutations in disease development. In this study, we defined the gene mutations that occurred in a rare e6a2 CML case and compared them to those in “typical” chronic phase (CP)-CML cases. To our knowledge, this is the first comparison of the genetic mutations occurring in typical CML and in this rare atypical form of CML. Methodology We identified the index e6a2 CML patient and eight additional typical CML patients for whom bone marrow aspirate, peripheral blood and paired normal tissue were available. We performed whole-exome sequencing on all of these samples using the Agilent solution-based exon capture system, which uses RNA baits to target all protein-coding genes (CCDS database) as well as ∼700 human miRNAs from miRBase (v13). In all, we generated over 3 GB of sequencing data by high-throughput sequencing on the Illumina platform. Results We identified 15 candidate cancer genes that were somatically mutated in our e6a2 CML patient. Biological processes commonly implicated by these genes included transcription (STAT5A, TET2, GTF2F1), cellular differentiation (TP73), and signal transduction (GPR116). Interestingly, most of these mutations also occurred in typical CML, albeit at lower frequency.
Genes mutated in both our atypical case and the typical CMLs included STAT5A, TET2, GTF2F1, ABL1 and CYP2A6. Thus, while atypical e6a2 BCR-ABL fusion CML cases are extremely rare, they appear to share many aspects of their biology with typical CML. Conclusion This study represents an in-depth analysis of a rare e6a2 CML combined with one of the first analyses of gene mutations occurring in typical CML. Our data provide a significant first step toward identifying genes that, along with BCR-ABL, play a role in pathogenesis, may contribute to drug resistance, and ultimately impact overall survival. Disclosures: No relevant conflicts of interest to declare.


2015 ◽  
Vol 25 (3) ◽  
pp. 316-327 ◽  
Author(s):  
Hoon Kim ◽  
Siyuan Zheng ◽  
Seyed S. Amini ◽  
Selene M. Virk ◽  
Tom Mikkelsen ◽  
...  

Author(s):  
Michael I. Love ◽  
Alena Myšičková ◽  
Ruping Sun ◽  
Vera Kalscheuer ◽  
Martin Vingron ◽  
...  

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
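The abstract describes a hidden Markov model that calls CNVs from raw per-target read counts, using control read depth as background. The sketch below is a much-simplified illustration of that kind of model. It assumes Poisson emissions whose mean scales with copy number relative to the control background. It also assumes a single "stay" transition probability and omits covariates such as GC-content. The best copy-number path is decoded with the Viterbi algorithm. The published exomeCopy model uses its own emission distribution and covariate regression, which this sketch does not reproduce. All function names and parameter values here are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, states=(0, 1, 2, 3, 4), normal=2, p_stay=0.99):
    """Most likely copy-number path for per-target read counts.

    counts: observed read counts per target region
    background: expected counts per target at normal copy number (e.g. from controls)
    """
    n = len(states)
    p_switch = (1.0 - p_stay) / (n - 1)
    log_trans = [[math.log(p_stay if i == j else p_switch) for j in range(n)] for i in range(n)]

    def emission(t, s):
        # Expected count scales linearly with copy number; floor avoids log(0) for copy number 0
        lam = max(background[t] * states[s] / normal, 1e-9)
        return poisson_logpmf(counts[t], lam)

    # Start in the normal copy-number state with probability p_stay
    score = [math.log(p_stay if st == normal else p_switch) + emission(0, s)
             for s, st in enumerate(states)]
    backptr = []
    for t in range(1, len(counts)):
        prev_best, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            prev_best.append(i_best)
            new_score.append(score[i_best] + log_trans[i_best][j] + emission(t, j))
        backptr.append(prev_best)
        score = new_score
    # Trace the best path backwards from the highest-scoring final state
    s = max(range(n), key=lambda k: score[k])
    path = [s]
    for prev_best in reversed(backptr):
        s = prev_best[s]
        path.append(s)
    return [states[s] for s in reversed(path)]
```

For example, consider targets whose counts halve relative to a flat background of 100 reads. `viterbi_copy_number([100, 100, 100, 50, 50, 50, 100, 100], [100] * 8)` decodes those targets as a single-copy (heterozygous) deletion. The transition penalty keeps isolated noisy targets from being called as CNVs on their own.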


2021 ◽  
Author(s):  
Víctor García-Olivares ◽  
Adrián Muñoz-Barrera ◽  
José Miguel Lorenzo-Salazar ◽  
Carlos Zaragoza-Trello ◽  
Luis A. Rubio-Rodríguez ◽  
...  

Abstract The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this, and with the advent of high-throughput sequencing (HTS) technology, a diversity of bioinformatic tools for haplogroup classification has emerged. We present a benchmark of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. In addition, given its relevance, we also assess the best-performing tool on third-generation long, noisy-read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers in selecting the most suitable tool for mtDNA analyses from HTS data.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 535-535
Author(s):  
Kenichi Yoshida ◽  
Tsutomu Toki ◽  
Myoung-ja Park ◽  
Yusuke Okuno ◽  
Yuichi Shiraishi ◽  
...  

Abstract 535 Background Transient abnormal myelopoiesis (TAM) is a self-limited proliferation that exclusively affects perinatal infants with Down syndrome (DS). It is morphologically and immunologically characterized by immature blasts indistinguishable from those of acute megakaryoblastic leukemia (AMKL). Although spontaneous regression is the rule in most cases, about 20–30% of surviving infants develop non-self-limited AMKL (DS-AMKL) 3 to 4 years after remission. Regarding the molecular pathogenesis of these DS-related myeloid proliferations, it is well established that GATA1 mutations are detected in virtually all cases of TAM and DS-AMKL. However, several questions remain open: whether a GATA1 mutation is sufficient for the development of TAM, what the cellular origin of the subsequent AMKL is, and whether additional gene mutations are required for progression to AMKL and, if so, which genes they target. Several genes, including JAK2/3, TP53 and FLT3, have been reported to be mutated in occasional AMKL cases. Methods To answer these questions, we identified a comprehensive spectrum of gene mutations in TAM/AMKL cases using whole genome sequencing of three trio samples obtained sequentially at initial presentation of TAM, during remission, and at the subsequent relapse phase as AMKL. Whole exome sequencing was also performed on TAM (N=16) and AMKL (N=15) samples, using SureSelect (Agilent) 50M exome enrichment followed by high-throughput sequencing. The recurrent mutations in the discovery cohort were further screened in an extended cohort of DS-AMKL (N = 35), TAM (N = 26) and other AMKL cases (N = 19) using targeted deep sequencing.
Results TAM samples had significantly fewer somatic mutations than AMKL samples (p=0.001). By whole genome sequencing, the mean numbers of all mutations per sample were 30 (1.0/Mb) in TAM and 180 (6.0/Mb) in AMKL. By whole exome sequencing, the mean numbers of non-silent somatic mutations were 1.73 and 5.71 per sample in TAM and AMKL, respectively. Comprehensive detection of the full spectrum of mutations, together with subsequent deep sequencing of individual mutations, revealed a more complex picture of the clonal evolution leading to AMKL. In every patient, the major AMKL clones were not direct descendants of the dominant TAM clone. Instead, the direct ancestor of the AMKL clones could be traced back to a more upstream branch point of the evolution, before the major TAM clone had appeared, or, as previously reported, to an earlier founder carrying an independent GATA1 mutation. Intratumoral heterogeneity was evident at diagnosis as the presence of major subpopulations in both TAM and AMKL, which were more often than not characterized by RAS pathway mutations. While GATA1 was the only recurrent mutational target in the TAM phase, 8 genes were recurrently mutated in AMKL samples by whole genome/exome sequencing. These included NRAS, TP53 and novel gene targets not previously reported to be mutated in other neoplasms. The recurrent mutations found in the discovery cohort, in addition to known mutational targets in myeloid malignancies, were screened in an extended cohort of DS-associated myeloid disorders (N=61) as well as other AMKL cases, using high-throughput sequencing of SureSelect-captured and/or PCR-amplified targets. Secondary mutations other than GATA1 mutations were found in 3 of 26 TAM, 20 of 35 DS-AMKL and 4 of 19 other AMKL cases.
Conclusion TAM is characterized by a paucity of somatic mutations and is thought to be caused essentially by a GATA1 mutation in combination with constitutive trisomy 21. Subsequent AMKL evolved from a minor independent subclone that acquired additional mutations. Secondary genetic hits other than GATA1 mutations were common, with deregulated epigenetic control and abnormal signaling pathways playing a major role. Disclosures: No relevant conflicts of interest to declare.


2011 ◽  
Vol 09 (02) ◽  
pp. 269-282 ◽  
Author(s):  
HATICE GULCIN OZER ◽  
YI-WEN HUANG ◽  
JIEJUN WU ◽  
JEFFREY D. PARVIN ◽  
TIM HUI-MING HUANG ◽  
...  

New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As the size of the data increases, comparing multiple experiments on different cell lines under different experimental conditions becomes a major challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. We specifically studied epigenetic regulation in breast cancer and the effect of estrogen using 50 ChIP-sequencing datasets from the Illumina Genome Analyzer II. First, we evaluate the correlation among different experiments, focusing on the total number of reads in transcribed and promoter regions of the genome. Then, we adopt the method used to identify the most stable genes in RT-PCR experiments to characterize the background signal across all experiments and to identify the most variable transcribed and promoter regions of the genome. We observed that the most variable genes for transcribed regions and for promoter regions are very distinct. Gene ontology and functional enrichment analysis of these most variable genes demonstrates the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles over multiple experiments, using real data points without any normalization among the samples.
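The abstract does not name the RT-PCR stability method it adopts. A widely used one is the geNorm gene-stability measure M. For each gene, M is the average, over all other genes, of the standard deviation of their log-ratio across samples. The sketch below applies a geNorm-style M to regions × experiments read counts, treating the use of geNorm as an assumption. Because it works on log-ratios between regions within the same experiment, a per-experiment scaling factor such as sequencing depth cancels out. This illustrates why such a measure can be applied without normalization among samples (exactly so with a zero pseudocount). The function name is hypothetical.

```python
import numpy as np


def stability_m(counts, pseudocount=1.0):
    """geNorm-style stability value M for each region (hypothetical helper).

    counts: (n_regions, n_experiments) read counts
    Returns M per region; larger M means more variable relative to the other regions.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    n = log_counts.shape[0]
    m = np.empty(n)
    for j in range(n):
        # log2 ratio of region j to every region k, within each experiment
        ratios = log_counts[j] - log_counts
        # standard deviation of each pairwise log-ratio across experiments
        sds = ratios.std(axis=1, ddof=1)
        # average over all other regions (drop the zero self-comparison)
        m[j] = np.delete(sds, j).mean()
    return m
```

Ranking regions by M then separates the stable background, where M is low, from the most variable regions, where M is high.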


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yifan Wang ◽  
Taejeong Bae ◽  
Jeremy Thorpe ◽  
Maxwell A. Sherman ◽  
...  

Abstract Background Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present in only a small fraction of cells. Results Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs with variant allele fractions ranging from ~ 0.005 to ~ 0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. Conclusions This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
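A quick calculation shows why the lowest variant allele fractions are hard to detect even at 250× depth. At a VAF of 0.005, the expected number of alt-supporting reads is only 250 × 0.005 = 1.25. The sketch below models alt read counts as Binomial(depth, VAF) and ignores sequencing error, mapping artifacts and depth variation. It computes the chance of seeing at least a minimum number of alt reads. Both the model and the `min_alt_reads` threshold are illustrative assumptions, not the calling criteria of this study.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads alt-supporting reads) under Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below
```

With a hypothetical threshold of 3 alt reads, `detection_probability(250, 0.005, 3)` is roughly 0.13. At a VAF of 0.28, detection is essentially certain. Under these assumptions, sensitivity at the low end of the reported VAF range is limited mainly by sampling, not only by the variant caller.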

