P218 Interrogation of transcriptome data in RA for identifying disease susceptibility loci and predictors of treatment response

Abstract Background Rheumatoid arthritis (RA) is a chronic autoimmune disease affecting approximately 1% of the Caucasian population worldwide and causing significant morbidity. The genetics of disease pathogenesis remain poorly understood despite recent advances in high-throughput genotyping and sequencing. Biological agents (e.g. TNF inhibitors, TNFi) have had a significant impact on disease management; however, 30-40% of RA patients do not respond to this therapy. The aim of this study was to use transcriptome sequencing (RNA sequencing) data from human neutrophils to identify variants in RA that may underpin disease pathogenesis and predict response to biologic therapy. Methods RNA sequencing (RNA-seq) data from peripheral blood neutrophils isolated pre-TNFi treatment were analysed from 27 RA patients and 6 healthy controls. 21 RA patients subsequently responded to TNFi therapy (change in DAS28 >1.2). RNA-seq reads were mapped to the human genome (hg19) using TopHat2 and annotated using Cufflinks. Data were combined, calibrated and filtered using the Genome Analysis Toolkit (GATK) to create a file of identified variants. These variants were subsequently interrogated using the VCFtools program package. Quality control parameters were applied in accordance with guidance and available literature, excluding variants with a PHRED score < 30, a minimum read depth < 4 or a per-locus sequencing success rate < 80%; SNP clusters and indels were also removed. Tajima's D was used as a statistic for identifying regions of interest within the RNA-seq data. Identified variants were annotated and interrogated using the UCSC bioinformatics platform, and pathway analysis of the identified genes was performed using Ingenuity Pathway Analysis (IPA). Results GATK analysis identified 536,668 variants, which were refined to 5230 variants following application of the specified QC parameters, with over 99% of variants excluded. 
RA patients had a mean Tajima's D score of 0.51 vs -0.19 in the controls (p < 0.0001) and, furthermore, had significantly more regions of the transcriptome with extreme positive Tajima's D values (p < 0.0001). Bioinformatics analysis identified the variants with high Tajima's D scores to be within a number of biologically relevant loci, including NCF1, which has been associated with autoimmune diseases including SLE and is a predictor of RA severity in rat models. IPA revealed that a number of the highest-scoring variants were within loci that were linked via a gene network regulated by activation of Fc gamma receptors (FCGR1A/B/C, FCGR2A/B, FCGR3B) and p38 MAPK. Conclusion This study suggests that interrogation of transcriptome data has a role in elucidating the components underpinning RA pathogenesis, identifying a number of interesting loci that may contribute towards its missing heritability. However, such preliminary data will require validation through direct sequencing of variants and investigation in independent data sets, as well as sub-group analysis of treatment response to biological therapy. Disclosures R. Smith None. N. Goodson None. R.J. Moots None. H.L. Wright None.
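The Tajima's D statistic reported in this abstract compares two estimates of genetic diversity: the mean number of pairwise differences and the number of segregating sites. The following is a minimal Python sketch of the standard textbook formula, not the VCFtools implementation used in the study; the function name `tajimas_d` and the toy haplotype strings are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Return Tajima's D for equal-length haplotype strings, or None if undefined."""
    n = len(haplotypes)
    if n < 4:
        return None  # the variance term is degenerate for fewer than four sequences
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        return None  # no variation, so D is undefined
    # pi: mean number of pairwise differences between haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D = (pi - S / a1) / sqrt(estimated variance of that difference).
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

In this sketch, a positive value indicates an excess of intermediate-frequency variants and a negative value an excess of rare variants, which is the sense in which the RA and control means above differ.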


DEBKS: A Tool to Detect Differentially Expressed Circular RNA

doi:10.1101/2020.10.14.336982, 2020

Author(s): Zelin Liu, Huiru Ding, Jianqi She, Chunhua Chen, Weiguang Zhang, ...

Keyword(s): Open Source, RNA Sequencing, Open Source Software, Simulated Data, Circular RNA, Host Gene, Circular RNAs, Biological Processes, RNA Seq, Disease Pathogenesis

Abstract Circular RNAs (circRNAs) are involved in various biological processes and in disease pathogenesis. However, only a small number of functional circRNAs have been identified among hundreds of thousands of circRNA species, partly because most current methods are based on circular junction counts and overlook the fact that circRNA is formed from the host gene by back-splicing (BS). To distinguish between expression originating from BS and that from the host gene, we present DEBKS, a software program to streamline the discovery of differential BS between two rRNA-depleted RNA sequencing (RNA-seq) sample groups. By applying real and simulated data and employing RT-qPCR for validation, we demonstrate that DEBKS is efficient and accurate in detecting circRNAs with differential BS events between paired and unpaired sample groups. DEBKS is available at https://github.com/yangence/DEBKS as open-source software.


Unbiased Detection of Respiratory Viruses by Use of RNA Sequencing-Based Metagenomics: a Systematic Comparison to a Commercial PCR Panel

Journal of Clinical Microbiology, doi:10.1128/jcm.03060-15, 2016, Vol 54 (4), pp. 1000-1007, Cited By ~ 91

Author(s): Erin H. Graf, Keith E. Simmon, Keith D. Tardif, Weston Hymas, Steven Flygare, ...

Keyword(s): Data Analysis, RNA Sequencing, Fungal Pathogens, Respiratory Virus, Sequence Data, Sequence Information, Analysis Tool, RNA Seq, Genome Sequences, Tract Infections

Current infectious disease molecular tests are largely pathogen specific, requiring test selection based on the patient's symptoms. For many syndromes caused by a large number of viral, bacterial, or fungal pathogens, such as respiratory tract infections, this necessitates large panels of tests and has limited yield. In contrast, next-generation sequencing-based metagenomics can be used for unbiased detection of any expected or unexpected pathogen. However, barriers for its diagnostic implementation include incomplete understanding of analytical performance and complexity of sequence data analysis. We compared detection of known respiratory virus-positive (n= 42) and unselected (n= 67) pediatric nasopharyngeal swabs using an RNA sequencing (RNA-seq)-based metagenomics approach and Taxonomer, an ultrarapid, interactive, web-based metagenomics data analysis tool, with an FDA-cleared respiratory virus panel (RVP; GenMark eSensor). Untargeted metagenomics detected 86% of known respiratory virus infections, and additional PCR testing confirmed RVP results for only 2 (33%) of the discordant samples. In unselected samples, untargeted metagenomics had excellent agreement with the RVP (93%). In addition, untargeted metagenomics detected an additional 12 viruses that were either not targeted by the RVP or missed due to highly divergent genome sequences. Normalized viral read counts for untargeted metagenomics correlated with viral burden determined by quantitative PCR and showed high intrarun and interrun reproducibility. Partial or full-length viral genome sequences were generated in 86% of RNA-seq-positive samples, allowing assessment of antiviral resistance, strain-level typing, and phylogenetic relatedness. Overall, untargeted metagenomics had high agreement with a sensitive RVP, detected viruses not targeted by the RVP, and yielded epidemiologically and clinically valuable sequence information.
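Two quantities in this abstract, normalized viral read counts and percent agreement between assays, can be illustrated with a short Python sketch. The abstract does not state how read counts were normalized, so the reads-per-million scheme here is an assumption, and both function names are hypothetical.

```python
def reads_per_million(viral_reads, total_reads):
    """Scale a viral read count by sequencing depth (assumed normalization scheme)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def percent_agreement(calls_a, calls_b):
    """Percentage of samples on which two assays give the same positive/negative call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100 * matches / len(calls_a)
```

Depth normalization of this kind is what makes read counts comparable across runs, which is a precondition for correlating them with quantitative PCR results.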


easyMF: A Web Platform for Matrix Factorization-based Biological Discovery from Large-scale Transcriptome Data

doi:10.1101/2020.12.21.405563, 2020

Author(s): Wenlong Ma, Siyuan Chen, Jingjing Zhai, Yuhong Qi, Shang Xie, ...

Keyword(s): Gene Expression, RNA Sequencing, Matrix Factorization, Large Scale, Biological Knowledge, Transcriptome Data, RNA Seq, Biological Discovery, User Friendly, Web Platform

With the development of high-throughput experimental technologies, large-scale RNA sequencing (RNA-Seq) data have been and continue to be produced, creating challenges in extracting the relevant biological knowledge hidden in the resulting high-dimensional gene expression matrices. Here, we present easyMF, a user-friendly web platform that aims to facilitate biological discovery from large-scale transcriptome data through matrix factorization (MF). The easyMF platform enables users with little bioinformatics experience to streamline transcriptome analysis from raw reads to gene expression and to decompose an expression matrix from thousands of genes into a handful of metagenes. easyMF also offers a series of functional modules for metagene-based exploratory analysis with an emphasis on functional gene discovery. As a modular, containerized and open-source platform, easyMF can be customized to satisfy users' specific demands and deployed as a web server for broad applications. easyMF is freely available at https://github.com/cma2015/easyMF. We demonstrated the application of easyMF with four case studies using 940 RNA sequencing datasets from maize (Zea mays L.).
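To make the idea of decomposing an expression matrix into metagenes concrete, here is a minimal pure-Python sketch of non-negative matrix factorization with multiplicative updates. The abstract does not say which factorization algorithms easyMF offers, so NMF is chosen only as one common MF technique; the function names and the toy matrix are assumptions and do not reflect easyMF's code.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w * h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative genes x samples matrix v into w (genes x k) and h (k x samples)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(k)]
    for _ in range(iterations):
        # Update H element-wise: H * (W^T V) / (W^T W H).
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_cols)] for a in range(k)]
        # Update W element-wise: W * (V H^T) / (W H H^T).
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n_rows)]
    return w, h
```

In this sketch each column of `w` plays the role of a metagene (a weighted combination of genes) and each row of `h` gives that metagene's level across samples; a production analysis would use an optimized numerical library rather than nested lists.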


fRNAkenseq: a fully powered-by-CyVerse cloud integrated RNA-sequencing analysis tool

PeerJ, doi:10.7717/peerj.8592, 2020, Vol 8, pp. e8592

Author(s): Allen Hubbard, Matthew Bomhoff, Carl J. Schmidt

Keyword(s): RNA Sequencing, Data Storage, Cross Talk, Large Scale, Draft Genome, Analysis Tool, Sequencing Analysis, RNA Seq, Cloud Data, Downstream Analysis

Background Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce. Methods One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists. Results We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. 
Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse-associated tool, CoGe. fRNAkenseq offers novel features for the biologist, such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether cloud-based or locally installed. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or upload their own draft genome.


A Streamlined Approach to Pathway Analysis from RNA-Sequencing Data

Methods and Protocols, doi:10.3390/mps4010021, 2021, Vol 4 (1), pp. 21

Author(s): Austin Bow

Keyword(s): RNA Sequencing, Pathway Analysis, Valuable Insight, Sequencing Data, Analytical Technique, Mapping Software, Formidable Challenge, Daunting Task, Software Platforms, Time Required

The reduction in costs associated with performing RNA-sequencing has driven an increase in the application of this analytical technique; however, the restrictive factors associated with this tool have now shifted from budgetary constraints to the time required for data processing. The sheer scale of the raw data produced can present a formidable challenge for researchers aiming to glean vital information about samples. Though many of the companies that perform RNA-sequencing provide a basic report for the submitted samples, this may not adequately capture particular pathways of interest for sample comparisons. To further assess these data, it can therefore be necessary to utilize various enrichment and mapping software platforms to highlight specific relations. Given the wide array of software platforms available, choosing among them can itself be a daunting task. The methodology described herein aims to provide researchers new to handling RNA-sequencing data with a streamlined approach to pathway analysis. Additionally, the implemented software platforms are readily available and free to utilize, making this approach viable even for restrictive budgets. The resulting tables and nodal networks will provide valuable insight into samples and can be used to generate high-quality graphics for publications and presentations.
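Enrichment platforms of the kind mentioned here commonly rest on an over-representation test. The abstract does not name the tools or statistics it uses, so the following Python sketch of a one-sided hypergeometric test is an assumption about the general technique rather than the paper's method; the function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, pathway_size, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `population` genes,
    of which `pathway_size` belong to the pathway (hypergeometric upper tail)."""
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the pathway contains more of the selected (for example, differentially expressed) genes than chance alone would explain; in practice the p-values from many pathways would also need multiple-testing correction.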


Identification of New Transcription Factors that Can Promote Pluripotent Reprogramming

Stem Cell Reviews and Reports, doi:10.1007/s12015-021-10220-z, 2021

Author(s): Ping Huang, Jieying Zhu, Yu Liu, Guihuan Liu, Ran Zhang, ...

Keyword(s): Stem Cells, Transcription Factors, Embryonic Stem Cells, RNA Sequencing, Human Urine, Embryonic Stem, RNA Seq, Reprogramming Efficiency, Yamanaka Factors, Urine Cells

Abstract Background Four transcription factors, Oct4, Sox2, Klf4, and c-Myc (the Yamanaka factors), can reprogram somatic cells to induced pluripotent stem cells (iPSCs). Many studies have provided a number of alternative combinations of non-Yamanaka factors. However, it is clear that many additional transcription factors that can generate iPSCs remain to be discovered. Methods The chromatin accessibility and transcriptional level of human embryonic stem cells and human urine cells were compared by Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) and RNA sequencing (RNA-seq) to identify potential reprogramming factors. Selected transcription factors were employed to reprogram urine cells, and the reprogramming efficiency was measured. Urine-derived iPSCs were assessed for pluripotency by immunofluorescence, quantitative polymerase chain reaction, RNA sequencing and a teratoma formation test. Finally, we assessed the differentiation potential of the new iPSCs to cardiomyocytes in vitro. Results ATAC-seq and RNA-seq datasets predicted TEAD2, TEAD4 and ZIC3 as potential factors involved in urine cell reprogramming. Transfection of TEAD2, TEAD4 and ZIC3 (in the presence of Yamanaka factors) significantly improved the reprogramming efficiency of urine cells. We confirmed that the newly generated iPSCs possessed pluripotency characteristics similar to normal H1 embryonic stem cells. We also confirmed that the new iPSCs could differentiate to functional cardiomyocytes. Conclusions In conclusion, TEAD2, TEAD4 and ZIC3 can increase the efficiency of reprogramming human urine cells into iPSCs, providing a new stem cell source for the clinical application and modeling of cardiovascular disease.


Methyltransferase-directed orthogonal tagging and sequencing of miRNAs and bacterial small RNAs

BMC Biology, doi:10.1186/s12915-021-01053-w, 2021, Vol 19 (1)

Author(s): Milda Mickutė, Kotryna Kvederavičiūtė, Aleksandr Osipenko, Raminta Mineikaitė, Saulius Klimašauskas, ...

Keyword(s): RNA Sequencing, Regulatory Networks, Library Preparation, RNA Seq, Basic Principles, Cofactor Binding, Sequencing Library, Sequencing Library Preparation, Target RNA

Abstract Background Targeted installation of designer chemical moieties on biopolymers provides an orthogonal means for their visualisation, manipulation and sequence analysis. Although high-throughput RNA sequencing is a widely used method for transcriptome analysis, certain steps, such as 3′ adapter ligation in strand-specific RNA sequencing, remain challenging due to structure- and sequence-related biases introduced by RNA ligases, leading to misrepresentation of particular RNA species. Here, we remedy this limitation by adapting two RNA 2′-O-methyltransferases from the Hen1 family for orthogonal chemo-enzymatic click tethering of a 3′ sequencing adapter that supports cDNA production by reverse transcription of the tagged RNA. Results We showed that the ssRNA-specific DmHen1 and dsRNA-specific AtHEN1 can be used to efficiently append an oligonucleotide adapter to the 3′ end of target RNA for sequencing library preparation. Using this new chemo-enzymatic approach, we identified miRNAs and prokaryotic small non-coding sRNAs in probiotic Lactobacillus casei BL23. We found that compared to a reference conventional RNA library preparation, methyltransferase-Directed Orthogonal Tagging and RNA sequencing, mDOT-seq, avoids misdetection of unspecific highly-structured RNA species, thus providing better accuracy in identifying the groups of transcripts analysed. Our results suggest that mDOT-seq has the potential to advance analysis of eukaryotic and prokaryotic ssRNAs. Conclusions Our findings provide a valuable resource for studies of the RNA-centred regulatory networks in Lactobacilli and pave the way to developing novel transcriptome and epitranscriptome profiling approaches in vitro and inside living cells. 
As RNA methyltransferases share the structure of the AdoMet-binding domain and several specific cofactor binding features, the basic principles of our approach could be easily translated to other AdoMet-dependent enzymes for the development of modification-specific RNA-seq techniques.


Mechanism of protective effect of xuan-bai-cheng-qi decoction on LPS-induced acute lung injury based on an integrated network pharmacology and RNA-sequencing approach

Respiratory Research, doi:10.1186/s12931-021-01781-1, 2021, Vol 22 (1)

Author(s): Huahe Zhu, Shun Wang, Cong Shan, Xiaoqian Li, Bo Tan, ...

Keyword(s): Acute Lung Injury, Lung Injury, RNA Sequencing, Target Genes, Mammalian Target Of Rapamycin, Network Pharmacology, Protective Mechanism, RNA Seq, Active Components, Integrated Network

Abstract Xuan-bai-cheng-qi decoction (XCD), a traditional Chinese medicine (TCM) prescription, has been widely used to treat a variety of respiratory diseases in China, especially serious conditions such as acute lung injury (ALI). Due to the complexity of its chemical constituents, however, the underlying pharmacological mechanism of action of XCD is still unclear. To explore its protective mechanism in ALI, a network pharmacology experiment was first conducted to construct a component-target network of XCD, which identified 46 active components and 280 predicted target genes. Then, RNA sequencing (RNA-seq) was used to screen differentially expressed genes (DEGs) between ALI model rats treated with and without XCD, and 753 DEGs were found. By overlapping the target genes identified using network pharmacology with the DEGs identified using RNA-seq, followed by protein–protein interaction (PPI) network analysis, 6 kernel targets, vascular endothelial growth factor (VEGF), mammalian target of rapamycin (mTOR), AKT1, hypoxia-inducible factor-1α (HIF-1α), phosphoinositide 3-kinase (PI3K) and phosphatase and tensin homolog deleted on chromosome ten (PTEN), were screened out as closely relevant to ALI treatment. Verification experiments in LPS-induced ALI model rats showed that XCD could alleviate pathological lung tissue injury by attenuating the release of proinflammatory cytokines such as tumor necrosis factor (TNF)-α, interleukin (IL)-6, and IL-1β. Meanwhile, both the mRNA and protein expression levels of PI3K, mTOR, HIF-1α, and VEGF in the lung tissues were down-regulated with XCD treatment. Therefore, regulation of the PI3K/mTOR/HIF-1α/VEGF signaling pathway by XCD was probably a crucial mechanism underlying its protective effect in ALI.
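The overlap step described in this abstract, intersecting network-pharmacology targets with RNA-seq DEGs, amounts to a set intersection. A minimal Python sketch follows; the function name and the short gene lists in the usage note are hypothetical stand-ins, not the study's 280 targets or 753 DEGs.

```python
def overlap_targets(predicted_targets, degs):
    """Return the sorted gene symbols present in both the predicted-target list and the DEG list."""
    return sorted(set(predicted_targets) & set(degs))
```

For example, `overlap_targets(["AKT1", "PTEN", "GENE_A"], ["PTEN", "GENE_B", "AKT1"])` keeps only the two shared symbols. The sketch assumes that both lists use the same gene-symbol convention; real data would usually need identifier harmonization first, and the resulting overlap would then feed the PPI network analysis.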


Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Nature Communications, doi:10.1038/s41467-021-21894-x, 2021, Vol 12 (1)

Author(s): Ryan Lusk, Evan Stene, Farnoush Banaei-Kashani, Boris Tabakoff, Katerina Kechris, ...

Keyword(s): RNA Sequencing, DNA Sequence, Mammalian Species, Alternative Polyadenylation, Sequence Information, RNA Seq, Average Precision, Polyadenylation Sites, DNA Nucleotide Sequence

Abstract Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
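Recall is defined in this abstract as the proportion of true polyadenylation sites detected. The Python sketch below computes precision and recall for predicted site coordinates under a position tolerance; the function name, the tolerance parameter, and the matching rule are assumptions for illustration and are not aptardi's evaluation code. It also yields single precision and recall values, not the average precision (a summary over a precision-recall curve) that the abstract reports.

```python
def precision_recall(predicted_sites, true_sites, tolerance=0):
    """Precision and recall for site coordinates; a match is any pair within `tolerance` bases."""
    predicted, truth = set(predicted_sites), set(true_sites)
    # Predicted sites that lie near at least one true site.
    matched_predictions = sum(
        any(abs(p - t) <= tolerance for t in truth) for p in predicted
    )
    # True sites that have at least one nearby prediction.
    detected_truth = sum(
        any(abs(p - t) <= tolerance for p in predicted) for t in truth
    )
    precision = matched_predictions / len(predicted) if predicted else 0.0
    recall = detected_truth / len(truth) if truth else 0.0
    return precision, recall
```

With predictions at 100, 205 and 900, true sites at 100, 200, 300 and 400, and a tolerance of 10 bases, two of three predictions are matched and two of four true sites are detected.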


Small RNA-Sequencing: Approaches and Considerations for miRNA Analysis

Diagnostics, doi:10.3390/diagnostics11060964, 2021, Vol 11 (6), pp. 964

Author(s): Sarka Benesova, Mikael Kubista, Lukas Valihrach

Keyword(s): RNA Sequencing, Small RNA, High Sensitivity, Small RNA Sequencing, RNA Seq, Liquid Biopsies, Comprehensive Overview, RNA Molecules, Novel miRNA, The Many

MicroRNAs (miRNAs) are a class of small RNA molecules that have an important regulatory role in multiple physiological and pathological processes. Their disease-specific profiles and presence in biofluids are properties that enable miRNAs to be employed as non-invasive biomarkers. In the past decades, several methods have been developed for miRNA analysis, including small RNA sequencing (RNA-seq). Small RNA-seq enables genome-wide profiling and analysis of known, as well as novel, miRNA variants. Moreover, its high sensitivity allows for profiling of low-input samples such as liquid biopsies, which have now found applications in diagnostics and prognostics. Still, due to technical bias and the limited ability to capture the true miRNA representation, its potential remains unfulfilled. The introduction of many new small RNA-seq approaches that tried to minimize this bias has led to the many small RNA-seq protocols seen today. Here, we review all current approaches to cDNA library construction used during the small RNA-seq workflow, with particular focus on their implementation in commercially available protocols. We provide an overview of each protocol and discuss its applicability. We also review recent benchmarking studies comparing each protocol’s performance and summarize the major conclusions that can be drawn from them. The results document variable performance of the protocols and highlight their different applications in miRNA research. Taken together, our review provides a comprehensive overview of all the current small RNA-seq approaches, summarizes their strengths and weaknesses, and provides guidelines for their applications in miRNA research.
