SQuIRE: Software for Quantifying Interspersed Repeat Elements

2018 ◽  
Author(s):  
Wan R. Yang ◽  
Daniel Ardeljan ◽  
Clarissa N. Pacyna ◽  
Lindsay M. Payer ◽  
Kathleen H. Burns

Abstract Transposable elements are interspersed repeat sequences that make up much of the human genome. Conventional approaches to RNA-seq analysis often exclude these sequences, fail to optimally adjudicate read alignments, or align reads to interspersed repeat consensus sequences without considering these transcripts in their genomic contexts. As a result, repetitive sequence contributions to transcriptomes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), an RNA-seq analysis pipeline that integrates repeat and genome annotation (RepeatMasker), read alignment (STAR), gene expression (StringTie) and differential expression (DESeq2). SQuIRE uniquely provides a locus-specific picture of interspersed repeat-encoded RNA expression. SQuIRE can be downloaded at github.com/wyang17/SQuIRE.

2021 ◽  
Author(s):  
Dennis A Sun ◽  
Nipam H Patel

Abstract Emerging research organisms enable the study of biology that cannot be addressed using classical “model” organisms. The development of novel data resources can accelerate research in such animals. Here, we present new functional genomic resources for the amphipod crustacean Parhyale hawaiensis, facilitating the exploration of gene regulatory evolution using this emerging research organism. We use Omni-ATAC-Seq, an improved form of the Assay for Transposase-Accessible Chromatin coupled with next-generation sequencing (ATAC-Seq), to identify accessible chromatin genome-wide across a broad time course of Parhyale embryonic development. This time course encompasses many major morphological events, including segmentation, body regionalization, gut morphogenesis, and limb development. In addition, we use short- and long-read RNA-Seq to generate an improved Parhyale genome annotation, enabling deeper classification of identified regulatory elements. We leverage a variety of bioinformatic tools to discover differential accessibility, predict nucleosome positioning, infer transcription factor binding, cluster peaks based on accessibility dynamics, classify biological functions, and correlate gene expression with accessibility. Using a Minos transposase reporter system, we demonstrate the potential to identify novel regulatory elements using this approach, including distal regulatory elements.
This work provides a platform for the identification of novel developmental regulatory elements in Parhyale, and offers a framework for performing such experiments in other emerging research organisms.
Primary Findings
- Omni-ATAC-Seq identifies cis-regulatory elements genome-wide during crustacean embryogenesis
- Combined short- and long-read RNA-Seq improves the Parhyale genome annotation
- ImpulseDE2 analysis identifies dynamically regulated candidate regulatory elements
- NucleoATAC and HINT-ATAC enable inference of nucleosome occupancy and transcription factor binding
- Fuzzy clustering reveals peaks with distinct accessibility and chromatin dynamics
- Integration of accessibility and gene expression reveals possible enhancers and repressors
- Omni-ATAC can identify known and novel regulatory elements


Author(s):  
William Goh ◽  
Marek Mutwil

Abstract Motivation: There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of this data for more than one species at a time are currently missing. Results: To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom) pipeline, which we used to process 134,521 RNA-seq samples, achieving ∼12,000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally related genes. Availability: LSTrAP-Kingdom is available from https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash. Supplementary information: Supplementary data are available at Bioinformatics online.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 5256-5256
Author(s):  
Cynthie Wong ◽  
Vincent Funari ◽  
Maher Albitar

Abstract Introduction: Flow cytometry is the gold standard for diagnosing hematologic cancers based on morphologic detection and analysis of a few expensive and delicate immunological markers. On the other hand, targeted RNA sequencing panels are not limited by sample or marker; in fact, 50 ng of RNA stored for up to 6 months could yield results for thousands of markers. We hypothesized that an RNA-Seq-based targeted immuno-oncology gene expression panel could recapitulate the FLOW diagnostic patterns of AML and CLL routinely used in a clinical laboratory. Methods: A custom panel of 2207 genes was constructed including 58 typical FLOW markers and well-referenced immune and oncology markers. Housekeeping genes were added to normalize between batches. A total of 52 CLL, 15 AML, and 20 normal clinical samples were tested in parallel with a clinically validated leukemia/lymphoma flow cytometry panel and targeted RNA-Seq. Paired-end sequencing (76 x 76 cycles) was performed on an Illumina NextSeq. The Bowtie analysis suite was used to determine gene expression. Unsupervised analysis was first performed to identify patterns associated with clinical diagnosis or sequencing artifacts. Two-way hierarchical clustering of genes having a median expression of >1 FPKM and at least 2-fold differential expression relative to the median in 10% of the samples revealed a strong CLL profile and a less pervasive AML profile without any supervised analysis. To determine which genes in the profiles were significantly associated with AML or CLL, genes with >5-fold differential expression were assessed with one-tailed t-tests followed by Benjamini-Hochberg correction. Further, each FLOW marker was individually tested using a 1-way ANOVA. Pathway analysis was performed on GO terms using the Fisher exact test. All corrected p-values <0.05 were considered significant.
Results: In general, the FLOW marker gene expression data correlated strongly with protein marker expression and were adequate for rendering proper diagnosis. Overall, CLL had a strong immune-oncology pattern with 10+ flow markers including CD19, CD5, CD2, CD200, CD22, CD79, FCER2, IL3RA, IL2RA, PDCD1, and MS4A1 significantly associated with CLL. In addition, 295 other genes including immune targets like CD74, CD33, CD34, CD48, CD40, and gene targets like PAX5, BCL2, PARP3 further help classify CLL. Forty-three of these genes are involved in the immune response pathway (p < 1.9 x 10^-17). In contrast, two markers used for FLOW (CD34 and CD52) could classify AML with RNA, and 218 other genes including immune (CD3E/G, CD23, CD48, CD6, CD33) and molecular markers (HOXA10, HOXA9, and TGFBR2) could be used to further classify AML. Interestingly, these genes were significantly enriched for T cell co-stimulation (p < 2.0 x 10^-18) and other T cell receptor signaling pathways. To determine whether other immune genes may be used to differentiate CLL or AML, hierarchical clustering of the top 200 genes significantly expressed in either CLL or AML was performed. We could clearly identify two clusters of genes that distinguish CLL from other disease types: 1) 110 genes that were highly expressed in CLL, but expressed at low levels in both normal and AML samples, 2) 28 genes with low expression in CLL, but highly expressed in both normal and AML samples. Conversely, few genes were able to distinguish AML from normal or CLL samples, including BMP1, NPM2, and FLT3, which were highly expressed in AML samples, but were expressed at low levels in both normal and CLL samples. Conclusion: Based on our preliminary study, we have shown that protein marker expression determined by flow is reproduced by our RNA expression panel. Importantly, we are able to classify and diagnose CLL and AML samples based on their RNA expression profiles. Disclosures: Wong: NeoGenomics: Employment. Funari: NeoGenomics: Employment.
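
The multiple-testing step named in the Methods of this abstract can be illustrated with a short sketch. The function below is a generic pure-Python version of the Benjamini-Hochberg adjustment, not the authors' code; the function name and the example p-values are invented for illustration, and only the 0.05 significance threshold is taken from the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only).
raw_p = [0.001, 0.02, 0.04, 0.30]
adjusted_p = benjamini_hochberg(raw_p)

# The abstract treats corrected p-values below 0.05 as significant.
significant = [q < 0.05 for q in adjusted_p]
print(adjusted_p, significant)
```

In practice a maintained statistics library would normally be used for this step; the sketch only shows what the correction does to a list of p-values.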


2021 ◽  
Author(s):  
William Goh ◽  
Marek Mutwil

Abstract Summary: There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of this data for more than one species at a time are currently missing. To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom) pipeline, which we used to process 134,521 RNA-seq samples, achieving ~12,000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally related genes. Availability and implementation: LSTrAP-Kingdom is available from https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash.


2021 ◽  
Author(s):  
Sheng Ju ◽  
Zihan Cui ◽  
Yuanyuan Hong ◽  
Xiaoqing Wang ◽  
Weina Mu ◽  
...  

Currently, DNA and RNA are used separately to capture different types of gene mutations. DNA is commonly used for the detection of SNVs, indels and CNVs; RNA is used for analysis of gene fusions and gene expression. To perform both DNA sequencing (DNA-seq) and RNA-seq, material is divided into two portions, and two different procedures are required for sequencing. Because this consumes more sample and complicates the experimental process, a single method capable of analyzing SNVs, indels, fusions and expression is needed. We developed an RNA-based hybridization capture panel targeting actionable driver oncogenes in solid tumors and corresponding sample preparation and bioinformatics workflows. Analytical validation with an RNA standard reference containing 16 known fusion mutations and 6 SNV mutations demonstrated a detection specificity of 100.0% [95% CI 88.7%–100.0%] for SNVs and 100.0% [95% CI 95.4%–100.0%] for fusions. The targeted RNA panel achieved a lower limit of detection (LOD) of 0.73-2.63 copies/ng RNA for SNVs and 0.21-6.48 copies/ng RNA for fusions. Gene expression analysis revealed a correlation greater than 0.9 across all 15 cancer-related genes between the RNA-seq results and the targeted RNA panel. Among 1253 NSCLC FFPE tumor samples, multiple mutation types were called from DNA- and RNA-seq data and compared between the two assays. The DNA panel detected 103 fusions and 21 METex14 skipping events; the targeted RNA panel detected 124 fusions and 26 METex14 skipping events, of which 21 fusions and 4 METex14 skipping events were detected only by the targeted RNA panel. Among the 173 NSCLC samples negative for targetable mutations by DNA-seq, 15 (15/173, 8.67%) showed targetable gene fusions by RNA-seq that may change clinical decisions. In total, 226 tier I and tier II missense variants for NSCLC were analyzed at genomic (DNA-seq) and transcriptomic (RNA-seq) levels.
The positive percent agreement (PPA) was 97.8%, and the positive predictive value (PPV) was 98.6%. Interestingly, variant allele frequencies were generally higher at the RNA level than at the DNA level, suggesting relatively dominant expression of mutant alleles. PPA was 97.6% and PPV 99.38% for EGFR 19del and 20ins variants. We also explored the relationship of RNA expression with gene copy number and protein expression. The RPKM of EGFR transcripts assessed by the RNA panel showed a linear relationship with copy number quantified by the DNA panel, with an R of 0.8 in 1253 samples. In contrast, MET gene expression is regulated in a more complex manner. In IHC analysis, all 3+ samples exhibited higher RPKM levels; samples with IHC levels of 2+ and below showed lower RNA expression. Parallel DNA- and RNA-seq and systematic analysis demonstrated the accuracy and robustness of the RNA sequencing panel in identifying multiple types of variants for cancer therapy.
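
The agreement statistics quoted in this abstract follow standard definitions, which the sketch below spells out. The counts are hypothetical (the abstract does not report the underlying tables), the function and variable names are invented for illustration, and treating DNA-seq as the comparator assay is an assumption. The only figure taken from the abstract is the 15/173 fraction, which is recomputed as a check.

```python
def positive_percent_agreement(both_positive, comparator_only):
    """Share of comparator-positive calls that the new assay also reports."""
    return both_positive / (both_positive + comparator_only)


def positive_predictive_value(both_positive, test_only):
    """Share of the new assay's positive calls confirmed by the comparator."""
    return both_positive / (both_positive + test_only)


# Hypothetical counts, for illustration only (not from the abstract).
both_positive = 90     # variant called by both assays
comparator_only = 10   # called by the comparator (assumed DNA-seq) only
test_only = 5          # called by the RNA panel only

ppa = positive_percent_agreement(both_positive, comparator_only)
ppv = positive_predictive_value(both_positive, test_only)
print(f"PPA = {ppa:.1%}, PPV = {ppv:.1%}")

# Fraction reported in the abstract: 15 of 173 DNA-negative samples
# carried a targetable fusion by RNA-seq.
fusion_rate = 15 / 173
print(f"{fusion_rate:.2%}")  # 8.67%
```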


Biology ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1274
Author(s):  
Yunqing Liu ◽  
Xin Liao ◽  
Tingyu Han ◽  
Ao Su ◽  
Zhuojun Guo ◽  
...  

Coral–zooxanthellae holobionts are among the most productive ecosystems in the ocean. With global warming and ocean acidification, coral ecosystems are facing unprecedented challenges. To save the coral ecosystems, we need to understand the coral–zooxanthellae symbiosis. Although some Scleractinia (stony corals) transcriptomes have been sequenced, a reliable full-length transcriptome is still lacking due to the short read length of second-generation sequencing and the uncertainty of the assembly results. Herein, PacBio Sequel II sequencing, polished with Illumina RNA-seq data, was used to obtain relatively complete transcriptome data for the scleractinian coral M. foliosa and to quantify M. foliosa gene expression. A total of 38,365 consensus sequences and 20,751 unique genes were identified. Seven databases were used for the gene function annotation, and 19,972 genes were annotated in at least one database. We found 131 zooxanthellae transcripts and 18,829 M. foliosa transcripts. A total of 6328 lncRNAs, 847 M. foliosa transcription factors (TFs), and 2 zooxanthellae TFs were identified. In zooxanthellae we found pathways related to symbiosis, such as photosynthesis and nitrogen metabolism. Pathways related to symbiosis in M. foliosa include oxidative phosphorylation and nitrogen metabolism. We summarized the isoforms and expression levels of the symbiont recognition genes. Among the membrane proteins, we found three pathways of glycan biosynthesis, which may be involved in organic matter storage and monosaccharide stabilization in M. foliosa. Our results provide better material for studying coral symbiosis.


mBio ◽  
2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Ronan K. Carroll ◽  
Andy Weiss ◽  
William H. Broach ◽  
Richard E. Wiemels ◽  
Austin B. Mogen ◽  
...  

ABSTRACT In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions.
IMPORTANCE Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work, we have consolidated and curated known sRNA genes from the literature and mapped them to their position on the S. aureus genome, creating new genome annotation files. These files can now be used by the scientific community at large in experiments to search for previously undiscovered sRNA genes and to monitor sRNA gene expression by transcriptome sequencing (RNA-seq). We demonstrate this application, identifying 39 new sRNAs and studying their expression during S. aureus growth in human serum.

