Single molecule real time (SMRT) full length RNA-sequencing reveals novel and distinct mRNA isoforms in human bone marrow cell subpopulations

Abstract Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA sequencing. This analysis revealed a ∼5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic for bone marrow subpopulations. A detailed analysis of mRNA isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry, which validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.


Single-Molecule Real-Time (SMRT) Full-Length RNA-Sequencing Reveals Novel and Distinct mRNA Isoforms in Human Bone Marrow Cell Subpopulations

Genes ◽

10.3390/genes10040253 ◽

2019 ◽

Vol 10 (4) ◽

pp. 253 ◽

Cited By ~ 1

Author(s):

Anne Deslattes Mays ◽

Marcel Schmidt ◽

Garrett Graham ◽

Elizabeth Tseng ◽

Primo Baybayan ◽

...

Keyword(s):

Bone Marrow ◽

Real Time ◽

RNA Sequencing ◽

Cell Population ◽

Single Molecule ◽

Full Length ◽

Protein Isoforms ◽

mRNA Isoforms ◽

Transcript Isoforms ◽

Cell Subpopulations

Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA-sequencing. This analysis revealed a ~5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic for bone marrow subpopulations. A detailed analysis of messenger RNA (mRNA) isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry and validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.
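The idea of a "distinct composition of individual transcript isoforms" per locus can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, isoform labels, and read counts below are hypothetical placeholders, and the sketch only assumes that full-length reads have already been assigned to isoforms of a single locus in each cell population.

```python
def isoform_fractions(counts):
    """Convert per-isoform read counts for one locus into relative usage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("locus has no reads")
    return {isoform: n / total for isoform, n in counts.items()}


def usage_shift(counts_a, counts_b):
    """Per-isoform change in relative usage from population A to population B.

    An isoform missing from one population is treated as having zero usage there.
    """
    frac_a = isoform_fractions(counts_a)
    frac_b = isoform_fractions(counts_b)
    isoforms = sorted(set(frac_a) | set(frac_b))
    return {iso: frac_b.get(iso, 0.0) - frac_a.get(iso, 0.0) for iso in isoforms}


# Hypothetical read counts for one locus (placeholder values, not data from the paper).
lineage_negative = {"isoform_1": 60, "isoform_2": 40}
lineage_positive = {"isoform_1": 20, "isoform_2": 50, "isoform_3": 30}

shift = usage_shift(lineage_negative, lineage_positive)
for isoform, delta in shift.items():
    print(f"{isoform}: {delta:+.2f}")
```

Comparing relative usage rather than raw counts keeps the comparison independent of sequencing depth, which is one plausible way to express a qualitative difference in isoform composition between two populations.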


Single Molecule Real-Time Sequencing of the M Protein (SMaRT M-Seq): Toward Personalized Medicine Approaches in Monoclonal Gammopathies

Blood ◽

10.1182/blood-2021-147227 ◽

2021 ◽

Vol 138 (Supplement 1) ◽

pp. 2673-2673

Author(s):

Pasquale Cascino ◽

Alice Nevone ◽

Claudia Scopelliti ◽

Maria Girelli ◽

Giulia Mazzini ◽

...

Keyword(s):

Bone Marrow ◽

Real Time ◽

Single Molecule ◽

Light Chain ◽

Advisory Board ◽

Full Length ◽

M Protein ◽

AL Amyloidosis ◽

Germline Genes ◽

M Proteins

Abstract

Introduction: In patients affected by monoclonal gammopathies, tumoral B cells or plasma cells secrete a monoclonal antibody (termed M protein), which can be used to track the presence of the tumor itself. Moreover, the M protein can directly cause potentially life-threatening organ damage, which is dictated by the patient's unique clonal light and/or heavy chain, as in patients affected by immunoglobulin light chain (AL) amyloidosis. Yet the current paradigm in the diagnosis and management of these conditions treats the M protein as a simple tumor biomarker to be identified and quantified. Patient-specific M protein sequences remain mostly undefined, and the molecular mechanisms underlying M protein-related clinical manifestations are largely obscure.

Methods: By combining the unbiased amplification of expressed immunoglobulin genes with long-read, single-molecule real-time DNA sequencing and bioinformatics analyses, we have established a method to identify the full-length sequence of the variable region of expressed immunoglobulin genes and to rank the obtained sequences by their relative abundance. This enables the identification of the full-length variable sequence of M protein genes from a high number of patients analysed in parallel.

Results: The assay, which we termed Single Molecule Real-Time Sequencing of the M protein (SMaRT M-Seq), has undergone an extensive technical validation. Sequencing of contrived bone marrow samples generated through serial dilutions of plasma cell lines into control bone marrow, as well as sequencing of bona fide bone marrow samples from AL patients and comparison with gold-standard techniques of immunoglobulin gene sequencing, showed 100% sequence accuracy at the individual base-pair level; high repeatability (CV < 0.8% for sequencing of pentaplicates) in defining the molecular clonal size (i.e., the fraction of total immunoglobulin sequences coinciding with the clonal sequence); and high sensitivity in identifying clonal immunoglobulin sequences (10^-3 when employing low-coverage sequencing on multiple, pooled samples). Notably, SMaRT M-Seq was applied to a cohort of 86 consecutive patients with AL amyloidosis (17 κ and 69 λ; median BMPC infiltration 9%, IQR 6-13%; median dFLC 176 mg/L, IQR 75-370 mg/L), including cases with a small clonal burden and an M protein that was undetectable with conventional M protein studies. A full-length sequence of the variable region of the clonal light chain was obtained in all patients (median molecular clonal size 88.3%, IQR 70.7-93%). The most common κ germline genes were IGKV1-33 and IGKV4-01 (24% each of the 17 κ AL patients), and the most common λ germline genes were IGLV6-57 (26% of the 69 λ AL patients), IGLV2-14 (17%), IGLV3-01 (17%) and IGLV1-44 (10%). The most frequent λ and κ germline genes together (IGLV6-57, IGLV2-14, IGLV3-01, IGLV1-44, IGKV1-33 and IGKV4-01) accounted for 66% of all the clones. Germline gene usage correlated with selected clinical features. Sequence information was then exploited to improve mass spectrometry-based amyloid typing on fat pad aspirates and to enable the sensitive detection of clonotypic sequences using short-read DNA sequencing of the involved light chain isotype (up to 10^-7 dilution).

Conclusions: We have established SMaRT M-Seq as a novel, valuable assay to reliably identify the full-length variable sequence of M proteins. SMaRT M-Seq has undergone extensive technical validation, showing high accuracy, repeatability and sensitivity. Sensitivity is determined by the number of reads analyzed per sample, which in turn depends on the sequencing output of the platform employed and on the number of pooled samples analyzed in a given sequencing round; the assay is therefore scalable. Even when analyzing multiple samples on a sequencing platform with low sequencing output, the achieved sensitivity of SMaRT M-Seq significantly exceeds the requirements for the identification of clonal B cells/plasma cells in patients with AL amyloidosis. Sequencing disease-associated M proteins from large cohorts of patients has the potential to uncover molecular mechanisms of M protein-related clinical manifestations, which have remained largely unexplored so far, and could enable personalized medicine approaches for the sensitive detection of patient-specific M proteins at diagnosis and after anti-clonal therapy.

Disclosures: Milani: Celgene: Other: Travel support; Janssen-Cilag: Honoraria. Fazio: Janssen: Honoraria. Petrucci: GSK: Honoraria, Other: Advisory Board; Amgen: Honoraria, Other: Advisory Board; Takeda: Honoraria, Other: Advisory Board; BMS: Honoraria, Other: Advisory Board; Janssen-Cilag: Honoraria, Other: Advisory Board; Celgene: Honoraria, Other: Advisory Board; Karyopharm: Honoraria, Other: Advisory Board. Palladini: Pfizer: Honoraria; Siemens: Honoraria; Janssen Global Services: Honoraria, Other: advisory board fees. Nuvolone: Janssen-Cilag: Honoraria; Oncopeptides, Inc.: Research Funding.
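The abstract defines the molecular clonal size as the fraction of total immunoglobulin sequences coinciding with the clonal sequence, and reports repeatability as a coefficient of variation (CV) across pentaplicates. The sketch below shows one way these two quantities could be computed. It is an illustration under stated assumptions, not the SMaRT M-Seq implementation: the function names are hypothetical, reads are assumed to be already reduced to exact variable-region sequence strings, and all values are made-up placeholders.

```python
from collections import Counter
from statistics import mean, stdev


def rank_sequences(reads):
    """Rank distinct sequences by relative abundance, most abundant first."""
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, n / total) for seq, n in counts.most_common()]


def molecular_clonal_size(reads):
    """Fraction of all reads that match the most abundant (presumed clonal) sequence."""
    return rank_sequences(reads)[0][1]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return stdev(values) / mean(values) * 100


# Hypothetical sample: 88 clonal reads among 100 immunoglobulin reads.
reads = ["CLONAL_SEQ"] * 88 + ["BACKGROUND_A"] * 7 + ["BACKGROUND_B"] * 5
clonal_size = molecular_clonal_size(reads)

# Hypothetical clonal sizes from five replicate runs (a pentaplicate).
replicates = [0.880, 0.881, 0.879, 0.882, 0.878]
cv_percent = coefficient_of_variation(replicates)

print(f"clonal size: {clonal_size:.2f}, CV: {cv_percent:.2f}%")
```

Taking the top-ranked sequence as the clone is a simplification; real data would need error-tolerant grouping of near-identical reads before ranking.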


Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification

eLife ◽

10.7554/elife.49658 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 43

Author(s):

Matthew T Parker ◽

Katarzyna Knop ◽

Anna V Sherwood ◽

Nicholas J Schurch ◽

Katarzyna Mackinnon ◽

...

Keyword(s):

RNA Sequencing ◽

Single Molecule ◽

Tail Length ◽

Transcript Abundance ◽

mRNA Processing ◽

Full Length ◽

Transcription Start Sites ◽

Model Plant Arabidopsis Thaliana ◽

Defective RNA ◽

A Site

Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.


Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing

The Pharmacogenomics Journal ◽

10.1038/s41397-017-0010-4 ◽

2018 ◽

Vol 19 (2) ◽

pp. 136-146 ◽

Cited By ~ 2

Author(s):

Takahiro Mimori ◽

Jun Yasuda ◽

Yoko Kuroki ◽

Tomoko F. Shibata ◽

Fumiki Katsuoka ◽

...

Keyword(s):

Real Time ◽

Single Molecule ◽

Full Length ◽

Reference Panel ◽

Class I ◽

HLA Genes


Full-length transcriptome sequencing of Heliocidaris crassispina using PacBio single-molecule real-time

Fish & Shellfish Immunology ◽

10.1016/j.fsi.2021.12.014 ◽

2021 ◽

Author(s):

Yongyu Huang ◽

Lili Zhang ◽

Shiyu Huang ◽

Guodong Wang

Keyword(s):

Real Time ◽

Single Molecule ◽

Transcriptome Sequencing ◽

Full Length


The complexity of alternative splicing and landscape of tissue-specific expression in lotus (Nelumbo nucifera) unveiled by Illumina- and single-molecule real-time-based RNA-sequencing

DNA Research ◽

10.1093/dnares/dsz010 ◽

2019 ◽

Vol 26 (4) ◽

pp. 301-311 ◽

Cited By ~ 11

Author(s):

Yue Zhang ◽

Tonny Maraga Nyong'A ◽

Tao Shi ◽

Pingfang Yang

Keyword(s):

Alternative Splicing ◽

Real Time ◽

Single Molecule ◽

Intron Retention ◽

Critical Role ◽

Full Length ◽

Specific Expression ◽

Splice Isoforms ◽

Tissue Specific ◽

Genomic Studies

Abstract Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus because full-length isoforms with multiple splice events have rarely been obtained. In this study, we apply a hybrid assembly strategy, combining single-molecule real-time sequencing and Illumina RNA-seq, to gain comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3′ splice site being the predominant type, followed by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and partly fit the ‘ABCE’ model in floral tissues. The large number of AS events and isoform variants identified in our study enhances the understanding of transcriptional diversity in lotus and provides a valuable resource for further functional genomic studies.


Single-molecule real-time sequencing of the full-length transcriptome of loquat under low-temperature stress

PLoS ONE ◽

10.1371/journal.pone.0238942 ◽

2020 ◽

Vol 15 (9) ◽

pp. e0238942

Author(s):

Cuiping Pan ◽

Yongqing Wang ◽

Lian Tao ◽

Hui Zhang ◽

Qunxian Deng ◽

...

Keyword(s):

Low Temperature ◽

Real Time ◽

Temperature Stress ◽

Single Molecule ◽

Low Temperature Stress ◽

Full Length


Single molecule studies reveal that p53 tetramers dynamically bind response elements containing one or two half sites

Scientific Reports ◽

10.1038/s41598-020-73234-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Elina Ly ◽

Jennifer F. Kugel ◽

James A. Goodrich

Keyword(s):

Real Time ◽

Single Molecule ◽

Target Genes ◽

Population Distribution ◽

Kinetic Stability ◽

Full Length ◽

DNA Complexes ◽

Response Elements ◽

Cell Fate Decisions ◽

Half Site

Abstract The tumor suppressor protein p53 is critical for cell fate decisions, including apoptosis, senescence, and cell cycle arrest. p53 is a tetrameric transcription factor that binds DNA response elements to regulate transcription of target genes. p53 response elements consist of two decameric half-sites, and data suggest one p53 dimer in the tetramer binds to each half-site. Despite a broad literature describing p53 binding DNA, unanswered questions remain, due partly to the need for more quantitative and structural studies with full length protein. Here we describe a single molecule fluorescence system to visualize full length p53 tetramers binding DNA in real time. The data revealed a dynamic interaction in which tetrameric p53/DNA complexes assembled and disassembled without a dimer/DNA intermediate. On a wild type DNA containing two half sites, p53/DNA complexes existed in two kinetically distinct populations. p53 tetramers bound response elements containing only one half site to form a single population of complexes with reduced kinetic stability. Altering the spacing and helical phasing between two half sites affected both the population distribution of p53/DNA complexes and their kinetic stability. Our real time single molecule measurements of full length p53 tetramers binding DNA reveal the parameters that define the stability of p53/DNA complexes, and provide insight into the pathways by which those complexes assemble.


Single-Molecule Real-Time Sequencing of the Madhuca pasquieri (Dubard) Lam. Transcriptome Reveals the Diversity of Full-Length Transcripts

Forests ◽

10.3390/f11080866 ◽

2020 ◽

Vol 11 (8) ◽

pp. 866

Author(s):

Lei Kan ◽

Qicong Liao ◽

Zhiyao Su ◽

Yushan Tan ◽

Shuyu Wang ◽

...

Keyword(s):

Seed Germination ◽

Single Molecule ◽

Developmental Stages ◽

De Novo ◽

Full Length ◽

Wild Plant ◽

Transcript Isoforms ◽

Long Read ◽

Full Length Transcript ◽

Generation Sequencing

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature Red List and a national key protected wild plant (II) of China, known for its seed oil and timber. However, the lack of genomic and transcriptome data for this species hampers study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, 25,339 transcript isoforms were detected by PacBio, including 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events, the majority of which were retained introns (RI). A further 1058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with greater length and higher expression levels, whereas a larger number of transcripts (124,405) was captured by de novo assembly of the Illumina data. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 of the PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.


Single-molecule real-time sequencing facilitates the analysis of transcripts and splice isoforms of anthers in Chinese cabbage (Brassica rapa L. ssp. pekinensis)

BMC Plant Biology ◽

10.1186/s12870-019-2133-z ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Chong Tan ◽

Hongxin Liu ◽

Jie Ren ◽

Xueling Ye ◽

Hui Feng ◽

...

Keyword(s):

Real Time ◽

Single Molecule ◽

Chinese Cabbage ◽

Anther Development ◽

Full Length ◽

Transcriptional Level ◽

Systematic Analysis ◽

Splice Isoforms ◽

A Genome ◽

Novel Isoforms

Abstract

Background: Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published. Here, the Pacific Biosciences (PacBio) Sequel platform and next-generation sequencing (NGS) technology were combined to generate full-length sequences and complete structures of transcripts in anthers of Chinese cabbage.

Results: Using single-molecule real-time (SMRT) sequencing, a total of 1,098,119 circular consensus sequences (CCSs) were generated with a mean length of 2664 bp. More than 75% of the CCSs were considered full-length non-chimeric (FLNC) reads. After error correction, 725,731 high-quality FLNC reads were estimated to carry 51,501 isoforms from 19,503 loci, including 38,992 novel isoforms from known genes and 3691 novel isoforms from novel genes. Of the novel isoforms, we identified 407 long non-coding RNAs (lncRNAs) and 37,549 open reading frames (ORFs). Furthermore, a total of 453,270 alternative splicing (AS) events were identified, and the majority of AS models in anthers were determined to be approximate exon skipping (XSKIP) events. Of the key genes regulated during anther development, AS events were mainly identified in the genes SERK1, CALS5, NEF1, and CESA1/3. Additionally, we identified 104 fusion transcripts and 5806 genes that had alternative polyadenylation (APA).

Conclusions: Our work demonstrated the transcriptome diversity and complexity of anther development in Chinese cabbage. The findings provide a basis for further genome annotation and transcriptome research in Chinese cabbage.
