Abstract 5173: Identification of single-nucleotide variants by high-throughput RNA sequencing in endemic Burkitt Lymphoma

Demultiplexing methods have facilitated the widespread use of single-cell RNA sequencing (scRNAseq) experiments by lowering costs and reducing technical variation. Here, we present demuxalot: a method for probabilistic genotype inference from aligned reads, with no assumptions about allele ratios and efficient incorporation of prior genotype information from historical experiments in a multi-batch setting. Our method efficiently incorporates additional information across reads originating from the same transcript, enabling up to 3x more calls per read relative to naive approaches. We also propose a novel and highly performant tradeoff between methods that rely on reference genotypes and methods that learn variants from the data, by selecting a small number of highly informative variants that maximize the marginal information with respect to reference single nucleotide variants (SNVs). Our resulting improved SNV-based demultiplexing method is up to 3x faster, 3x more data efficient, and achieves significantly more accurate doublet discrimination than previously published methods. This approach renders scRNAseq feasible for the kind of large multi-batch, multi-donor studies that are required to investigate diseases with heterogeneous genetic backgrounds.
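
The general idea behind SNV-based demultiplexing can be illustrated with a minimal sketch. It is not demuxalot's implementation or API: the function names, the 1% error rate, the uniform prior over donors, and the fixed 50/50 allele ratio for heterozygous sites are all assumptions made for illustration (the abstract states that demuxalot itself makes no assumptions about allele ratios).

```python
import math

def allele_prob(alt_dosage, base_is_alt, error=0.01):
    """P(observed base | genotype), where alt_dosage is 0, 1, or 2 ALT copies.

    Assumption: heterozygous sites express both alleles equally.
    """
    p_alt = alt_dosage / 2.0
    # Mix in a symmetric sequencing-error rate (hypothetical value).
    p_alt = p_alt * (1 - error) + (1 - p_alt) * error
    return p_alt if base_is_alt else 1 - p_alt

def donor_posteriors(reads, genotypes, error=0.01):
    """Posterior over donors for one cell, assuming a uniform prior.

    reads:     list of (snv_id, base_is_alt) observations from the cell
    genotypes: {donor: {snv_id: alt_dosage}} reference genotypes
    """
    log_lik = {}
    for donor, gt in genotypes.items():
        log_lik[donor] = sum(
            math.log(allele_prob(gt[snv], is_alt, error)) for snv, is_alt in reads
        )
    # Subtract the maximum before exponentiating for numerical stability.
    max_ll = max(log_lik.values())
    weights = {d: math.exp(ll - max_ll) for d, ll in log_lik.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Toy data (hypothetical): two donors with opposite homozygous genotypes.
genotypes = {"A": {"s1": 0, "s2": 2}, "B": {"s1": 2, "s2": 0}}
reads = [("s1", False), ("s2", True)]
posteriors = donor_posteriors(reads, genotypes)
```

With these toy reads, both observations match donor A's genotype, so nearly all posterior mass falls on A.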

The Phenolyzer Suite: Prioritizing the Candidate Genes Involved in Microtia

Annals of Otology Rhinology & Laryngology ◽

10.1177/0003489419840052 ◽

2019 ◽

Vol 128 (6) ◽

pp. 556-562 ◽

Cited By ~ 1

Author(s):

Huang Xin ◽

Wang Changchen ◽

Liu Lei ◽

Yang Meirong ◽

Zhang Ye ◽

...

Keyword(s):

Candidate Genes ◽

High Throughput ◽

Potential Candidate ◽

Single Nucleotide Variants ◽

Gene Score ◽

Single Nucleotide ◽

Research Directions ◽

Score System ◽

Pathogenic Genes ◽

First Time

Objective: Microtia is a congenital malformation of the external ear. Great progress has been made in the genetics of microtia in recent years. This article aims to prioritize the potential candidate pathogenic genes of microtia based on existing studies and reports, in order to narrow the scope of subsequent studies quickly and on a scientific basis. Method: A computational tool called Phenolyzer (phenotype-based gene analyzer) was used to prioritize microtia genes. "Microtia" was entered as the query term in the Phenolyzer interface. After several steps, including disease match, gene query, gene score system, seed gene growth, and gene ranking, the final genetic information on microtia was produced. We then reviewed the top 10 genes ranked by Phenolyzer in detail on the basis of previous reports. Results: We detected 10,348 genes associated with microtia or related syndromes, of which 78 were seed genes. Every gene was given a score, and genes with higher scores were more likely to influence microtia. The top 10 ranked genes included HOXA2, CHD7, CDT1, ORC1, ORC4, ORC6, CDC6, MED12, TWIST1, and GLI3. In addition, four gene-gene interactions were displayed. Conclusion: This article prioritized candidate genes of microtia for the first time. High-throughput methods yield tens of thousands of single-nucleotide variants, indels, and structural variants, of which only a handful are relevant to microtia or associated syndromes. Combining the ranked list of potential pathogenic genes from Phenolyzer with the results from samples analyzed by high-throughput methods yields more precise research directions.
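
The final step described in the conclusion, combining a ranked gene list with variants from high-throughput data, might be sketched as follows. This is an illustration only and does not use Phenolyzer's output format: the function name, the variant records, and their identifiers are hypothetical, and treating the order in which the abstract lists the top 10 genes as their rank is an assumption.

```python
def prioritize_variants(variants, ranked_genes):
    """Keep variants in ranked genes and order them by gene rank.

    variants:     list of dicts with at least a "gene" key (hypothetical records)
    ranked_genes: gene symbols, best-ranked first
    """
    rank = {gene: i for i, gene in enumerate(ranked_genes, start=1)}
    hits = [(rank[v["gene"]], v) for v in variants if v["gene"] in rank]
    hits.sort(key=lambda pair: pair[0])
    return [v for _, v in hits]

# Gene order taken from the abstract; assumed here to reflect the ranking.
top_genes = ["HOXA2", "CHD7", "CDT1", "ORC1", "ORC4",
             "ORC6", "CDC6", "MED12", "TWIST1", "GLI3"]

# Hypothetical variant calls from a sequenced sample.
sample_variants = [
    {"id": "v1", "gene": "GLI3"},
    {"id": "v2", "gene": "ABC1"},   # not in the ranked list, so dropped
    {"id": "v3", "gene": "HOXA2"},
]
prioritized = prioritize_variants(sample_variants, top_genes)
```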

Analysis and design of RNA sequencing experiments for identifying RNA editing and other single-nucleotide variants

RNA ◽

10.1261/rna.037903.112 ◽

2013 ◽

Vol 19 (6) ◽

pp. 725-732 ◽

Cited By ~ 42

Author(s):

J.-H. Lee ◽

J. K. Ang ◽

X. Xiao

Keyword(s):

Rna Sequencing ◽

Rna Editing ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Analysis And Design

The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data

BMC Genomics ◽

10.1186/s12864-017-4022-x ◽

2017 ◽

Vol 18 (S6) ◽

Cited By ~ 16

Author(s):

Yan Guo ◽

Shilin Zhao ◽

Quanhu Sheng ◽

David C Samuels ◽

Yu Shyr

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Dna And Rna ◽

High Throughput Sequencing Data

MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data

F1000Research ◽

10.12688/f1000research.2-217.v2 ◽

2014 ◽

Vol 2 ◽

pp. 217 ◽

Cited By ~ 8

Author(s):

Guillermo Barturen ◽

Antonio Rueda ◽

José L. Oliver ◽

Michael Hackenberg

Keyword(s):

High Throughput ◽

Sequence Variation ◽

High Throughput Sequencing ◽

Whole Genome ◽

Single Nucleotide Variants ◽

High Quality ◽

Single Nucleotide ◽

Error Sources ◽

Link Type ◽

Genome Methylation

Whole genome methylation profiling at a single cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and then the profiling of the methylation levels is done from the alignments. A huge effort has been made to quickly and correctly align the reads, and many different algorithms and programs to do this have been created. However, the second step is just as crucial and non-trivial, but much less attention has been paid to the final inference of the methylation states. Important error sources do exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (single nucleotide variants, SNVs) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within high-throughput sequencing experiments of bisulfite-treated DNA at the same time as it generates high quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way.
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.
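
The second step described above, inferring a methylation value from aligned bisulfite reads, rests on the fact that bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C. A minimal sketch of that calculation follows. It is not MethylExtract's code: the function names, the `min_depth` parameter, and the example counts are hypothetical, and it ignores the other error sources the abstract lists (sequencing errors, clonal reads, SNVs).

```python
def methylation_level(c_reads, t_reads, min_depth=1):
    """Fraction of reads supporting methylation at one cytosine.

    c_reads: reads showing C (unconverted, taken as methylated)
    t_reads: reads showing T (converted, taken as unmethylated)
    Returns None when coverage is below min_depth.
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth

def conversion_rate(unconverted, converted):
    """Bisulfite conversion rate, estimated on cytosines assumed unmethylated
    (for example, a spiked-in unmethylated control; an assumption of this sketch)."""
    total = unconverted + converted
    if total == 0:
        raise ValueError("no reads available to estimate the conversion rate")
    return converted / total

level = methylation_level(c_reads=8, t_reads=2)       # 8 of 10 reads unconverted
rate = conversion_rate(unconverted=1, converted=99)   # 99 of 100 reads converted
```

A low conversion rate signals bisulfite failure, which inflates apparent methylation levels because unconverted unmethylated cytosines are counted as methylated.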

Abstract 5296: R2D2: An integrated analysis framework to infer the functional impact of single nucleotide variants (SNVs) using matched germline and tumor DNA and RNA sequencing data

10.1158/1538-7445.am2018-5296 ◽

2018 ◽

Author(s):

Alma Imamovic ◽

Saud H. AlDubayan ◽

Nathanael Moore ◽

Celine G. Han ◽

Brendan Reardon ◽

...

Keyword(s):

Rna Sequencing ◽

Integrated Analysis ◽

Analysis Framework ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Functional Impact ◽

Dna And Rna ◽

Tumor Dna

Harnessing Expressed Single Nucleotide Variation and Single Cell RNA Sequencing To Define Immune Cell Chimerism in the Rejecting Kidney Transplant

Journal of the American Society of Nephrology ◽

10.1681/asn.2020030326 ◽

2020 ◽

Vol 31 (9) ◽

pp. 1977-1986 ◽

Cited By ~ 2

Author(s):

Andrew F. Malone ◽

Haojia Wu ◽

Catrina Fronick ◽

Robert Fulton ◽

Joseph P. Gaut ◽

...

Keyword(s):

T Cells ◽

Single Cell ◽

Rna Sequencing ◽

Kidney Transplant ◽

Immune Cell ◽

Solid Organ ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Transcriptional Profiles ◽

Single Cell Rna Sequencing

Background: In solid organ transplantation, donor-derived immune cells are assumed to decline with time after surgery. Whether donor leukocytes persist within kidney transplants or play any role in rejection is unknown, however, in part because of limited techniques for distinguishing recipient from donor cells. Methods: Whole-exome sequencing of donor and recipient DNA and single-cell RNA sequencing (scRNA-seq) of five human kidney transplant biopsy cores distinguished immune cell contributions from both participants. DNA-sequence comparisons used single nucleotide variants (SNVs) identified in the exome sequences across all samples. Results: Analysis of expressed SNVs in the scRNA-seq data set distinguished recipient versus donor origin for all 81,139 cells examined. The leukocyte donor/recipient ratio varied with rejection status for macrophages and with time post-transplant for lymphocytes. Recipient macrophages displayed inflammatory activation whereas donor macrophages demonstrated antigen presentation and complement signaling. Recipient-origin T cells expressed cytotoxic and proinflammatory genes consistent with an effector cell phenotype, whereas donor-origin T cells appeared quiescent, expressing oxidative phosphorylation genes. Finally, both donor and recipient T cell clones within the rejecting kidney suggested lymphoid aggregation. The results indicate that donor-origin macrophages and T cells have distinct transcriptional profiles compared with their recipient counterparts, and that donor macrophages can persist for years post-transplantation. Conclusions: Analysis of single nucleotide variants and their expression in single cells provides a powerful novel approach to accurately define leukocyte chimerism in a complex organ such as a transplanted kidney, coupled with the ability to examine transcriptional profiles at single-cell resolution. Podcast: This article contains a podcast at https://www.asn-online.org/media/podcast/JASN/2020_08_07_JASN2020030326.mp3

High Throughput Transcriptome Sequencing of Pediatric Relapsed Acute Lymphoblastic Leukemia (ALL) Identifies Relapse Specific Mutations and Expression

Blood ◽

10.1182/blood.v116.21.3233.3233 ◽

2010 ◽

Vol 116 (21) ◽

pp. 3233-3233

Author(s):

Julia A. Meyer ◽

Laura E. Hogan ◽

Jinhua Wang ◽

Jun J. Yang ◽

Jay Patel ◽

...

Keyword(s):

High Throughput ◽

Genetic Variants ◽

Conflicts Of Interest ◽

Lymphoblastic Leukemia ◽

Intensive Therapy ◽

Illumina Genome Analyzer ◽

Sequencing Analysis ◽

Single Nucleotide Variants ◽

Specific Expression ◽

Single Nucleotide

Abstract 3233. Introduction: Relapsed ALL carries a very poor prognosis despite intensive therapy, indicating the need for new insights into disease mechanisms. We have previously used gene expression profiling (Hogan et al. ASH 2009) and copy number analysis (Yang et al. Blood 2008) in paired diagnosis and relapsed ALL samples to better understand the biologic mechanisms leading to recurrent disease. To create an integrated genomic profile of ALL, we have now focused on high throughput RNA sequencing to detect changes in the transcriptome from diagnosis to relapse. Patients/Methods: To date we have sequenced 6 matched diagnosis/relapse pairs (i.e. 12 marrow samples) from B-precursor ALL patients enrolled on Children's Oncology Group (COG) P9906 and AALL0232 trials. RNA libraries were prepared from poly-A selected RNA and sequenced as 54-base-pair single-end reads on the Illumina Genome Analyzer IIx. Each sample was sequenced in at least 7 lanes, generating an average of 100 million reads per sample. BWA (v0.5.8) was used to align the reads to the human genome, producing an average of 53 million mapped reads. Samtools (v0.1.8) was then used to predict genetic variants across the genome, filtering out variants with a low mapping quality (<Q20), sub-optimal alignment (X:1>0), low coverage (<8X), or overlap with known single nucleotide polymorphisms (SNPs) from dbSNP (r131) or the 1000 Genomes Project. Results: We observed a total of 119,000 genetic variants across all samples, with comparable overall mutational burden at relapse and diagnosis. To identify candidate lesions that may indicate a selection for common chemoresistance pathways, we focused our analysis on relapse-enriched, non-synonymous variants. A total of 8,486 non-synonymous variants (insertions/deletions and single nucleotide variants [SNVs]) were identified that occurred more often at relapse than at diagnosis.
Our analysis focused on relapse-enriched SNVs that coded for non-synonymous changes, of which 154 were prioritized for validation. Validation was completed using matched genomic DNA samples, and PCR products were directly sequenced. Mutation calls were made by manual review of tracings using the Mutation Surveyor program from Softgenetics. Thirty-three percent of predicted SNV loci were validated, but upon further sequencing of matched germline samples, five relapse-specific mutations were confirmed. Mutations in COBRA1, FAM120A, RGS12, SND2, and SMEK2 were found in individual patient relapse samples. Validation is currently ongoing to confirm additional SNVs, and an expanded validation of mutations will be completed in an additional 66 matched diagnosis/relapse pairs from COG 9906 and AALL 0232 and 0331 studies. Relapse-specific isoforms indicating alternative exon usage were also detected in 15 genes, all of which were shared amongst multiple patients. In addition, a significant increase (p = 6.7 × 10^-6) was observed in the number of poly-adenylation sites in the genes of the relapse samples. Conclusions: While isoform-specific expression was shared amongst patients at relapse, all relapse-specific mutations were private, and our data to date indicate that a diversity of mechanisms contribute to relapsed disease. Further sequencing analysis of our expanded cohort of samples will determine the mutation and isoform expression prevalence, as well as the functional significance and the potential therapeutic relevance. Disclosures: No relevant conflicts of interest to declare.
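
The variant filters named in the methods (mapping quality below Q20, sub-optimal alignment, coverage below 8X, overlap with known SNPs) can be expressed as a simple predicate. The sketch below is not the authors' pipeline and does not call Samtools: the `Variant` record, its field names, and the example values are hypothetical, and reading "X:1>0" as "at least one sub-optimal alignment hit" is an interpretation, not something the abstract states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    mapq: int             # mapping quality supporting the call
    depth: int            # read coverage at the site
    suboptimal_hits: int  # assumed meaning of the "X:1" value in the abstract

def passes_filters(v, known_snps, min_mapq=20, min_depth=8):
    """Return True if the variant survives all four filters from the abstract."""
    if v.mapq < min_mapq:
        return False      # low mapping quality (<Q20)
    if v.suboptimal_hits > 0:
        return False      # sub-optimal alignment
    if v.depth < min_depth:
        return False      # low coverage (<8X)
    # Drop sites that overlap known polymorphisms.
    return (v.chrom, v.pos) not in known_snps

# Hypothetical example data.
known_snps = {("chr1", 1000)}
candidates = [
    Variant("chr1", 1000, mapq=60, depth=30, suboptimal_hits=0),  # known SNP
    Variant("chr1", 2000, mapq=15, depth=30, suboptimal_hits=0),  # low MAPQ
    Variant("chr2", 3000, mapq=60, depth=5, suboptimal_hits=0),   # low depth
    Variant("chr2", 4000, mapq=60, depth=12, suboptimal_hits=0),  # passes
]
kept = [v for v in candidates if passes_filters(v, known_snps)]
```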

Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data

Genome Biology ◽

10.1186/s13059-019-1863-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 12

Author(s):

Fenglin Liu ◽

Yuanyuan Zhang ◽

Lei Zhang ◽

Ziyi Li ◽

Qiao Fang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Allele Frequencies ◽

Cellular Heterogeneity ◽

Variant Allele ◽

Detection Methods ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Single Cell Rna Sequencing

Background: Systematic interrogation of single-nucleotide variants (SNVs) is one of the most promising approaches to delineate the cellular heterogeneity and phylogenetic relationships at the single-cell level. While SNV detection from abundant single-cell RNA sequencing (scRNA-seq) data is applicable and cost-effective in identifying expressed variants, inferring sub-clones, and deciphering genotype-phenotype linkages, there is a lack of computational methods specifically developed for SNV calling in scRNA-seq. Although variant callers for bulk RNA-seq have been sporadically used in scRNA-seq, the performances of different tools have not been assessed. Results: Here, we perform a systematic comparison of seven tools, namely SAMtools, the GATK pipeline, CTAT, FreeBayes, MuTect2, Strelka2, and VarScan2, using both simulation and scRNA-seq datasets, and identify multiple elements influencing their performance. While the specificities are generally high, with sensitivities exceeding 90% for most tools when calling homozygous SNVs in high-confidence coding regions with sufficient read depths, such sensitivities dramatically decrease when calling SNVs with low read depths, low variant allele frequencies, or in specific genomic contexts. SAMtools shows the highest sensitivity in most cases, especially with low supporting reads, despite the relatively low specificity in introns or high-identity regions. Strelka2 shows consistently good performance when sufficient supporting reads are provided, while FreeBayes shows good performance in the cases of high variant allele frequencies. Conclusions: We recommend SAMtools, Strelka2, FreeBayes, or CTAT, depending on the specific conditions of usage. Our study provides the first benchmarking to evaluate the performances of different SNV detection tools for scRNA-seq data.
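
The sensitivity and specificity figures quoted above follow the standard definitions, which can be sketched for a set of assessed sites. This is a generic illustration and not the benchmarking code of the study: the function name and the toy site sets are hypothetical, and the sketch assumes every called and true variant site lies within the assessed set.

```python
def sensitivity_specificity(called, truth, assessed):
    """Compute (sensitivity, specificity) over a set of assessed sites.

    called:   sites the tool reported as variant
    truth:    sites that truly carry a variant
    assessed: all sites evaluated (assumed to contain called and truth)
    """
    tp = len(called & truth)              # true variants that were called
    fn = len(truth - called)              # true variants that were missed
    fp = len(called - truth)              # calls at non-variant sites
    tn = len(assessed - called - truth)   # non-variant sites left uncalled
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true variant and one non-variant site")
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 10 assessed sites, 4 true variants, 4 calls (3 of them correct).
assessed = set(range(10))
truth = {0, 1, 2, 3}
called = {0, 1, 2, 9}
sens, spec = sensitivity_specificity(called, truth, assessed)
```

In this toy case three of four true variants are recovered and five of six non-variant sites are correctly left uncalled.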

MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data

F1000Research ◽

10.12688/f1000research.2-217.v1 ◽

2013 ◽

Vol 2 ◽

pp. 217 ◽

Cited By ~ 18

Author(s):

Guillermo Barturen ◽

Antonio Rueda ◽

José L. Oliver ◽

Michael Hackenberg

Keyword(s):

High Throughput ◽

Sequence Variation ◽

High Throughput Sequencing ◽

Whole Genome ◽

Single Nucleotide Variants ◽

High Quality ◽

Single Nucleotide ◽

Error Sources ◽

Link Type ◽

Genome Methylation

Whole genome methylation profiling at a single cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and then the profiling of the methylation levels is done from the alignments. A huge effort has been made to quickly and correctly align the reads, and many different algorithms and programs to do this have been created. However, the second step is just as crucial and non-trivial, but much less attention has been paid to the final inference of the methylation states. Important error sources do exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (single nucleotide variants, SNVs) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within high-throughput sequencing experiments of bisulfite-treated DNA at the same time as it generates high quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way.
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.
