P1015 The Caprinae Genome Database: Multispecies goats/sheep genome and incorporation of RNA-Seq data, and re-sequencing data to study comparative genomics and genome assistant breeding

2016 ◽  
Vol 94 (suppl_4) ◽  
pp. 21-22
Author(s):  
R. Su
2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Zeeshan Ahmed ◽  
Eduard Gibert Renart ◽  
Saman Zeeshan ◽  
XinQi Dong

Abstract Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high- and low-expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 343
Author(s):  
Manjin Li ◽  
Dan Xing ◽  
Duo Su ◽  
Di Wang ◽  
Heting Gao ◽  
...  

Dengue virus (DENV), a member of the Flavivirus genus of the Flaviviridae family, can cause dengue fever (DF) and more serious diseases and thus imposes a heavy burden worldwide. As the main vector of DENV, mosquitoes are a serious hazard. After infection, they induce a complex host–pathogen interaction mechanism. Our goal is to further study the interaction mechanism of viruses in homologous, sensitive, and repeatable C6/36 cell vectors. Transcriptome sequencing (RNA-Seq) technology was applied to the host transcript profiles of C6/36 cells infected with DENV2. Then, bioinformatics analysis was used to identify significant differentially expressed genes (DEGs) and the associated biological processes. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) was performed to verify the sequencing data. A total of 1239 DEGs were found by comparing the transcriptomes of dengue virus-infected and uninfected Aedes albopictus C6/36 cells, among which 1133 were upregulated and 106 were downregulated. Further bioinformatics analysis showed that the upregulated DEGs were significantly enriched in signaling pathways such as the MAPK, Hippo, FoxO, Wnt, mTOR, and Notch pathways, as well as in metabolic pathways and cellular physiological processes such as autophagy, endocytosis, and apoptosis. Downregulated DEGs were mainly enriched in DNA replication, pyrimidine metabolism, and repair pathways, including BER, NER, and MMR. The concordance between the RNA-Seq and qRT-PCR data was very high (92.3%). The results of this study provide more information about DENV2 infection of C6/36 cells at the transcriptome level, laying a foundation for further research on mosquito vector–virus interactions. These data provide candidate antiviral genes that can be used for further functional verification in the future.


Genes ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 397
Author(s):  
Dadong Deng ◽  
Xihong Tan ◽  
Kun Han ◽  
Ruimin Ren ◽  
Jianhua Cao ◽  
...  

The development of the placental fold, which increases the maternal–fetal interacting surface area, is of primary importance for the growth of the fetus throughout the whole pregnancy. However, the mechanisms involved remain to be fully elucidated. Increasing evidence has revealed that long non-coding RNAs (lncRNAs) are a new class of RNAs with regulatory functions and could be epigenetically regulated by histone modifications. In this study, 141 lncRNAs (including 73 up-regulated and 68 down-regulated lncRNAs) were identified to be differentially expressed in the placentas of pigs during the establishment and expanding stages of placental fold development. The differentially expressed lncRNAs and genes (DElncRNA-DEgene) co-expression network analysis revealed that these differentially expressed lncRNAs (DElncRNAs) were mainly enriched in pathways of cell adhesion, cytoskeleton organization, epithelial cell differentiation and angiogenesis, indicating that the DElncRNAs are related to the major events that occur during placental fold development. In addition, we integrated the RNA-seq (RNA sequencing) data with the ChIP-seq (chromatin immunoprecipitation sequencing) data of H3K4me3/H3K27ac produced from the placental samples of pigs from the two stages (gestational days 50 and 95). The analysis revealed that the changes in H3K4me3 and/or H3K27ac levels were significantly associated with the changes in the expression levels of 37 DElncRNAs. Furthermore, several H3K4me3/H3K27ac-lncRNAs were characterized to be significantly correlated with genes functionally related to placental development. Thus, this study provides new insights into understanding the mechanisms for the placental development of pigs.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error rate and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing volume of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2017 ◽  
Author(s):  
Irina Mohorianu

Abstract Background: RNA sequencing (RNA-seq) is widely used for RNA quantification across environmental, biological and medical sciences; it enables the description of genome-wide patterns of expression and the deduction of regulatory interactions and networks. The aim of computational analyses is to achieve an accurate output, i.e. rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite the variable levels of noise and biases present in sequencing data. The evaluation of sequencing quality and normalization are essential components of this process. Results: We investigate the discriminative power of existing approaches for the quality checking of mRNA-seq data and also propose additional, quantitative, quality checks. To accommodate the analysis of a nested, multi-level design using data on D. melanogaster, we incorporated the sample layout into the analysis. We describe a “subsampling without replacement”-based normalization and identification of DE that accounts for the experimental design i.e. the hierarchy and amplitude of effect sizes within samples. We also evaluate the differential expression call in comparison to existing approaches. To assess the broader applicability of these methods, we applied this series of steps to a published set of H. sapiens mRNA-seq samples. Conclusions: The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. Overall, the proposed approach offers the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments into the data analysis.
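
The basic idea behind a subsampling-without-replacement normalization can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract says their method also accounts for the experimental design, which is not modelled here. The function name `subsample_counts`, the gene labels, and the counts are all hypothetical.

```python
import random
from collections import Counter


def subsample_counts(counts, target_total, seed=0):
    """Downsample per-gene read counts to target_total reads without replacement.

    `counts` maps gene name -> integer read count (hypothetical data layout).
    """
    total = sum(counts.values())
    if target_total > total:
        raise ValueError("target_total exceeds the number of available reads")
    # Expand the counts into one label per read, then draw reads without replacement.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = Counter(rng.sample(pool, target_total))
    # Keep every gene in the output, including those that lost all reads.
    return {gene: drawn.get(gene, 0) for gene in counts}


# Two illustrative samples with different sequencing depths.
sample_a = {"geneA": 60, "geneB": 30, "geneC": 10}
sample_b = {"geneA": 120, "geneB": 50, "geneC": 30}

# Bring both samples down to the depth of the shallower one.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
norm_a = subsample_counts(sample_a, depth, seed=1)
norm_b = subsample_counts(sample_b, depth, seed=1)
```

Expanding the counts into an explicit pool of reads keeps the sketch short, but it uses memory proportional to the library size, so a real dataset would need a more economical sampling scheme.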


2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
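
To make the problem concrete, here is a minimal sketch of exact k-mer counting followed by a naive filter that discards low-count k-mers, which are the ones most likely to stem from sequencing errors. It is not the CQF-deNoise algorithm: the abstract does not describe that method's internals, and this sketch filters after counting rather than dynamically. The function names, the reads, and the threshold are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times (naive error filter)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Three toy reads; the third differs from the others by one base.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=3)
filtered = drop_rare_kmers(counts, min_count=2)
```

In this toy input the single-base difference introduces two k-mers (`GTT` and `TTC`) that each occur once, and the threshold removes them. The memory cost that the abstract refers to arises because such singletons must be stored before they can be filtered out.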


2021 ◽  
Author(s):  
Yu-Sheng Chen ◽  
Shuaiyao Lu ◽  
Bing Zhang ◽  
Tingfu Du ◽  
Wen-Jie Li ◽  
...  

SARS-CoV-2, the cause of the severe COVID-19 epidemic, is a positive-sense single-stranded RNA virus with high transmissibility. However, whether SARS-CoV-2 can integrate into the host genome needs thorough investigation. Here, we performed both RNA sequencing (RNA-seq) and whole genome sequencing on SARS-CoV-2 infected human and monkey cells, and investigated the presence of host-virus chimeric events. Through RNA-seq, we did detect chimeric host-virus reads in the infected cells. However, further analysis using mixed libraries of infected cells and uninfected zebrafish embryos demonstrated that these reads were falsely generated during library construction. In support of this, whole genome sequencing also did not identify chimeric reads in the corresponding regions. Therefore, evidence for the integration of SARS-CoV-2 into the host genome is lacking.


2017 ◽  
Author(s):  
Christopher Wilks ◽  
Phani Gaddipati ◽  
Abhinav Nellore ◽  
Ben Langmead

Abstract As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70,000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can also rank and score junctions according to tissue specificity or other criteria. Further, Snaptron can rank and score samples according to the relative frequency of different splicing patterns. We outline biological questions that can be explored with Snaptron queries, including a study of novel exons in annotated genes, of exonization of repetitive element loci, and of a recently discovered alternative transcription start site for the ALK gene. Web app and documentation are at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron under the MIT license.
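
The kind of query described above, constrained by both junctions and samples, can be illustrated with a small in-memory sketch. It does not use Snaptron's actual interface or its R-tree, B-tree, or inverted indexes: it is a plain linear scan, and the `Junction` record, the `query_junctions` function, and all coordinates and sample IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical splice-junction record: location plus supporting sample IDs."""
    chrom: str
    start: int
    end: int
    sample_ids: frozenset


def query_junctions(junctions, chrom, region_start, region_end,
                    min_samples=1, allowed_samples=None):
    """Return (junction, n_supporting_samples) pairs, most widely supported first.

    A junction matches if it lies fully inside the region and is supported by
    at least min_samples samples, optionally counting only allowed_samples.
    """
    results = []
    for j in junctions:
        if j.chrom != chrom or j.start < region_start or j.end > region_end:
            continue
        samples = j.sample_ids
        if allowed_samples is not None:
            samples = samples & allowed_samples
        if len(samples) >= min_samples:
            results.append((j, len(samples)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


junctions = [
    Junction("chr1", 100, 200, frozenset({1, 2, 3})),
    Junction("chr1", 150, 400, frozenset({2})),
    Junction("chr1", 120, 180, frozenset({3, 4})),
    Junction("chr9", 100, 200, frozenset({1})),
]

hits = query_junctions(junctions, "chr1", 90, 250, min_samples=2)
restricted = query_junctions(junctions, "chr1", 90, 250,
                             min_samples=2, allowed_samples={3, 4})
```

A linear scan like this is workable for a handful of records. At the scale quoted in the abstract, the indexing strategies it names are what would make such queries fast.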


2014 ◽  
Author(s):  
Simon Anders ◽  
Paul Theodor Pyl ◽  
Wolfgang Huber

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard work flows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq
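
Counting the overlap of reads with genes can be sketched in plain Python as below. This deliberately avoids the HTSeq API and is much simpler than htseq-count: it ignores strand, spliced alignments, and exon structure, and treats each gene as a single half-open interval. The function names, the `no_feature` and `ambiguous` keys, and the gene and read coordinates are all illustrative assumptions.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def count_reads_per_gene(genes, reads):
    """Assign each read to the single gene it overlaps, if there is exactly one.

    `genes` maps name -> (chrom, start, end); `reads` is a list of
    (chrom, start, end) tuples. Reads overlapping no gene or several genes
    are tallied under 'no_feature' and 'ambiguous' respectively.
    """
    counts = Counter()
    for name in genes:
        counts[name] = 0  # report genes with zero reads explicitly
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 160),  # inside geneA only
    ("chr1", 170, 190),  # overlaps both genes
    ("chr1", 400, 450),  # overlaps nothing
    ("chr2", 110, 160),  # different chromosome
    ("chr1", 250, 290),  # inside geneB only
]
gene_counts = count_reads_per_gene(genes, reads)
```

Leaving ambiguous reads unassigned, rather than splitting them between genes, is one conservative choice for downstream differential expression analysis. Other assignment policies are possible.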

