sequencing data Latest Research Papers

Quantifying RNA synthesis at rate-limiting steps of transcription using nascent RNA-sequencing data

STAR Protocols ◽

10.1016/j.xpro.2021.101036 ◽

2022 ◽

Vol 3 (1) ◽

pp. 101036

Author(s):

Adelina Rabenius ◽

Sajitha Chandrakumaran ◽

Lea Sistonen ◽

Anniina Vihervaara

Keyword(s):

Rna Sequencing ◽

Rna Synthesis ◽

Sequencing Data ◽

Nascent Rna ◽

Rate Limiting

DNA Methylation-Driven Genes for Developing Survival Nomogram for Low-Grade Glioma

Frontiers in Oncology ◽

10.3389/fonc.2021.629521 ◽

2022 ◽

Vol 11 ◽

Author(s):

Yingyun Guo ◽

Yuan Li ◽

Jiao Li ◽

Weiping Tao ◽

Weiguo Dong

Keyword(s):

Dna Methylation ◽

The Cancer Genome Atlas ◽

Clinicopathological Parameters ◽

Low Grade ◽

Lasso Regression ◽

Sequencing Data ◽

Cancer Genome Atlas ◽

Auc Value ◽

User Friendly ◽

Genome Atlas

Low-grade gliomas (LGG) are heterogeneous, and the current predictive models for LGG are either unsatisfactory or not user-friendly. The objective of this study was to establish a nomogram based on methylation-driven genes, combined with clinicopathological parameters for predicting prognosis in LGG. Differential expression, methylation correlation, and survival analysis were performed in 516 LGG patients using RNA and methylation sequencing data, with accompanying clinicopathological parameters from The Cancer Genome Atlas. LASSO regression was further applied to select optimal prognosis-related genes. The final prognostic nomogram was implemented together with prognostic clinicopathological parameters. The predictive efficiency of the nomogram was internally validated in training and testing groups, and externally validated in the Chinese Glioma Genome Atlas database. Three DNA methylation-driven genes, ARL9, CMYA5, and STEAP3, were identified as independent prognostic factors. Together with IDH1 mutation status, age, and sex, the final prognostic nomogram achieved the highest AUC value of 0.930, and demonstrated stable consistency in both internal and external validations. The prognostic nomogram could predict personal survival probabilities for patients with LGG, and serve as a user-friendly tool for prognostic evaluation, optimizing therapeutic regimes, and managing LGG patients.

Cell-type-specific transcriptome architecture underlying the establishment and exacerbation of systemic lupus erythematosus

10.1101/2022.01.12.22269137 ◽

2022 ◽

Author(s):

Masahiro Nakano ◽

Mineto Ota ◽

Yusuke Takeshima ◽

Yukiko Iwasaki ◽

Hiroaki Hatano ◽

...

Keyword(s):

Systemic Lupus Erythematosus ◽

Disease Activity ◽

Lupus Erythematosus ◽

Large Scale ◽

Immune Cell ◽

Disease State ◽

Cell Type ◽

Sequencing Data ◽

Systemic Lupus ◽

Cell Type Specific

Systemic lupus erythematosus (SLE) is a complex and heterogeneous autoimmune disease involving multiple immune cells. A major hurdle to the elucidation of SLE pathogenesis is our limited understanding of dysregulated gene expression linked to various clinical statuses with a high cellular resolution. Here, we conducted a large-scale transcriptome study with 6,386 RNA sequencing data covering 27 immune cell types from 159 SLE and 89 healthy donors. We first profiled two distinct cell-type-specific transcriptomic signatures: disease-state and disease-activity signatures, reflecting disease establishment and exacerbation, respectively. We next identified candidate biological processes unique to each signature. This study suggested the clinical value of disease-activity signatures, which were associated with organ involvement and responses to therapeutic agents such as belimumab. However, disease-activity signatures were less enriched around SLE risk variants than disease-state signatures, suggesting that the genetic studies to date may not well capture clinically vital biology in SLE. Together, we identified comprehensive gene signatures of SLE, which will provide essential foundations for future genomic, genetic, and clinical studies.

Actinoporin-like Proteins Are Widely Distributed in the Phylum Porifera

Marine Drugs ◽

10.3390/md20010074 ◽

2022 ◽

Vol 20 (1) ◽

pp. 74

Author(s):

Kenneth Sandoval ◽

Grace P. McCormack

Keyword(s):

De Novo ◽

Sequence Similarity ◽

Phylogenetic Analyses ◽

Gene Copy ◽

Sequencing Data ◽

Sea Anemones ◽

Significant Sequence Similarity ◽

Genes Encoding ◽

Animal Phyla ◽

Membrane Attachment

Actinoporins are proteinaceous toxins known for their ability to bind to and create pores in cellular membranes. This quality has generated interest in their potential use as new tools, such as therapeutic immunotoxins. Isolated historically from sea anemones, genes encoding for similar actinoporin-like proteins have since been found in a small number of other animal phyla. Sequencing and de novo assembly of Irish Haliclona transcriptomes indicated that sponges also possess similar genes. An exhaustive analysis of publicly available sequencing data from other sponges showed that this is a potentially widespread feature of the Porifera. While many sponge proteins possess a sequence similarity of 27.70–59.06% to actinoporins, they show consistency in predicted structure. One gene copy from H. indistincta has significant sequence similarity to sea anemone actinoporins and possesses conserved residues associated with the fundamental roles of sphingomyelin recognition, membrane attachment, oligomerization, and pore formation, indicating that it may be an actinoporin. Phylogenetic analyses indicate frequent gene duplication, no distinct clade for sponge-derived proteins, and a stronger signal towards actinoporins than similar proteins from other phyla. Overall, this study provides evidence that a diverse array of Porifera represents a novel source of actinoporin-like proteins which may have biotechnological and pharmaceutical applications.

Metabolomic, photoprotective, and photosynthetic acclimatory responses to post-flowering drought in sorghum

10.1101/2022.01.14.476420 ◽

2022 ◽

Author(s):

Christopher Baker ◽

Dhruv Patel ◽

Benjamin J. Cole ◽

Lindsey G. Ching ◽

Oliver Dautermann ◽

...

Keyword(s):

Drought Tolerance ◽

Metabolic Response ◽

Central Valley ◽

Grain Filling ◽

Water Dynamics ◽

Sequencing Data ◽

Stay Green ◽

Drought Tolerant ◽

Metabolomic Data ◽

High Yields

Climate change is globally affecting rainfall patterns, necessitating the improvement of drought tolerance in crops. Sorghum bicolor is a drought-tolerant cereal capable of producing high yields under water scarcity conditions. Functional stay-green sorghum genotypes can maintain green leaf area and efficient grain filling in terminal post-flowering water deprivation, a period of ~10 weeks. To obtain molecular insights into these characteristics, two drought-tolerant genotypes, BTx642 and RTx430, were grown in control and terminal post-flowering drought field plots in the Central Valley of California. Photosynthetic, photoprotective, water dynamics, and biomass traits were quantified and correlated with metabolomic data collected from leaves, stems, and roots at multiple timepoints during drought. Physiological and metabolomic data were then compared to longitudinal RNA sequencing data collected from these two genotypes. The metabolic response to drought highlights the uniqueness of the post-flowering drought acclimation relative to pre-flowering drought. The functional stay-green genotype BTx642 specifically induced photoprotective responses in post-flowering drought, supporting a putative role for photoprotection in the molecular basis of the functional stay-green trait. Specific genes are highlighted that may contribute to post-flowering drought tolerance and that can be targeted in crops to maximize yields under limited water input conditions.

Improving the time and space complexity of the WFA algorithm and generalizing its scoring

10.1101/2022.01.12.476087 ◽

2022 ◽

Author(s):

Jordan M Eizenga ◽

Benedict Paten

Keyword(s):

Scoring System ◽

Space Complexity ◽

Genomic Sequencing ◽

Sequencing Data ◽

Time And Space ◽

Alignment Algorithms ◽

Run Time ◽

Modern Genomic ◽

Time And Space Complexity

Modern genomic sequencing data is trending toward longer sequences with higher accuracy. Many analyses using these data will center on alignments, but classical exact alignment algorithms are infeasible for long sequences. The recently proposed WFA algorithm demonstrated how to perform exact alignment for long, similar sequences in O(sN) time and O(s^2) memory, where s is a score that is low for similar sequences (Marco-Sola et al., 2021). However, this algorithm still has infeasible memory requirements for longer sequences. Also, it uses an alternate scoring system that is unfamiliar to many bioinformaticians. We describe variants of WFA that improve its asymptotic memory use from O(s^2) to O(s^(3/2)) and its asymptotic run time from O(sN) to O(s^2 + N). We expect the reduction in memory use to be particularly impactful, as it makes it practical to perform highly multithreaded megabase-scale exact alignments in common compute environments. In addition, we show how to fold WFA's alternate scoring into the broader literature on alignment scores.
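
To give a feel for why a wavefront approach scales with the score s rather than with the full alignment matrix, here is a minimal sketch of the wavefront idea for plain unit-cost edit distance. It is not the WFA algorithm of Marco-Sola et al. and not the variants described in this abstract: it uses no gap-affine scoring, it returns only the score (no traceback), and the function name `wavefront_edit_distance` is made up for illustration. For each score s it tracks, per diagonal, the furthest position reachable with s edits, then slides along exact matches for free.

```python
def wavefront_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via furthest-reaching wavefronts.

    Illustrative sketch only: diagonals are indexed by k = i - j, where i is
    a position in `a` and j a position in `b`. Only the latest wavefront is
    kept, so this computes the score without an alignment traceback.
    """
    n, m = len(a), len(b)

    def extend(i: int, k: int) -> int:
        # Slide along diagonal k while characters match (costs nothing).
        j = i - k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    target_k = n - m  # diagonal that contains the end cell (n, m)
    wave = {0: extend(0, 0)}  # score 0: only the main diagonal is reachable
    score = 0
    while wave.get(target_k, -1) < n:
        score += 1
        nxt = {}
        for k in range(-score, score + 1):
            best = -1
            if k in wave:        # mismatch: consume one char from each string
                best = max(best, wave[k] + 1)
            if k - 1 in wave:    # consume one char of `a` only
                best = max(best, wave[k - 1] + 1)
            if k + 1 in wave:    # consume one char of `b` only
                best = max(best, wave[k + 1])
            if best < 0:
                continue
            best = min(best, n, m + k)   # stay inside the n x m grid
            if best < max(0, k):         # diagonal lies outside the grid
                continue
            nxt[k] = extend(best, k)
        wave = nxt
    return score


if __name__ == "__main__":
    print(wavefront_edit_distance("kitten", "sitting"))
```

Each round touches at most 2s + 1 diagonals, so the work between match extensions depends on the score rather than on the product of the sequence lengths. That is the property that makes wavefront methods attractive for long, similar sequences.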

ECCsplorer: a pipeline to detect extrachromosomal circular DNA (eccDNA) from next-generation sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04545-2 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Ludwig Mann ◽

Kathrin M. Seibt ◽

Beatrice Weber ◽

Tony Heitkam

Keyword(s):

Next Generation Sequencing ◽

Transposable Elements ◽

Data Availability ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Circular Dna ◽

Wide Range ◽

Generation Sequencing

Background: Extrachromosomal circular DNAs (eccDNAs) are ring-like DNA structures physically separated from the chromosomes, ranging from 100 bp to several megabase pairs in size. Apart from carrying tandemly repeated DNA, eccDNAs may also harbor extra copies of genes or recently activated transposable elements. As eccDNAs occur in all eukaryotes investigated so far and likely play roles in stress, cancer, and aging, they have been prime targets in recent research, although their investigation has been limited by the scarcity of computational tools. Results: Here, we present the ECCsplorer, a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing techniques. Following Illumina-sequencing of amplified circular DNA (circSeq), the ECCsplorer enables an easy and automated discovery of eccDNA candidates. The data analysis encompasses two major procedures: first, read mapping to the reference genome allows the detection of informative read distributions including high coverage, discordant mapping, and split reads. Second, reference-free comparison of read clusters from amplified eccDNA against control sample data reveals specifically enriched DNA circles. Both software parts can be run separately or jointly, depending on the individual aim or data availability. To illustrate the wide applicability of our approach, we analyzed semi-artificial and published circSeq data from the model organisms Homo sapiens and Arabidopsis thaliana, and generated circSeq reads from the non-model crop plant Beta vulgaris. We clearly identified eccDNA candidates from all datasets, with and without reference genomes. The ECCsplorer pipeline specifically detected mitochondrial mini-circles and retrotransposon activation, showcasing the ECCsplorer’s sensitivity and specificity.
Conclusion: The ECCsplorer (available online at https://github.com/crimBubble/ECCsplorer) is a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing data. The derived eccDNA targets are valuable for a wide range of downstream investigations, from the analysis of cancer-related eccDNAs and organelle genomics to the identification of active transposable elements.

Machine Learning-Assisted Identification of Factors Contributing to the Technical Variability Between Bulk and Single-Cell RNA-Seq Experiments

10.21203/rs.3.rs-1247889/v1 ◽

2022 ◽

Author(s):

Sofya Lipnitskaya ◽

Yang Shen ◽

Stefan Legewie ◽

Holger Klein ◽

Kolja Becker

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Quantitative Difference ◽

Rna Seq ◽

Sequencing Data ◽

Factors Affecting ◽

Expression Variability ◽

Technical Variability

Background: Recent studies in the area of transcriptomics performed on single-cell and population levels reveal noticeable variability in gene expression measurements provided by different RNA sequencing technologies. Due to increased noise and complexity of single-cell RNA-Seq (scRNA-Seq) data over the bulk experiment, there is a substantial number of variably-expressed genes and so-called dropouts, challenging the subsequent computational analysis and potentially leading to false positive discoveries. In order to investigate factors affecting technical variability between RNA sequencing experiments of different technologies, we performed a systematic assessment of single-cell and bulk RNA-Seq data, which have undergone the same pre-processing and sample preparation procedures. Results: Our analysis indicates that variability between gene expression measurements as well as dropout events are not exclusively caused by biological variability, low expression levels, or random variation. Furthermore, we propose FAVSeq, a machine learning-assisted pipeline for detection of factors contributing to gene expression variability in matched RNA-Seq data provided by two technologies. Based on the analysis of the matched bulk and single-cell dataset, we found the 3'-UTR and transcript lengths to be the most relevant effectors of the observed variation between RNA-Seq experiments, while the same factors together with cellular compartments were shown to be associated with dropouts. Conclusions: Here, we investigated the sources of variation in RNA-Seq profiles of matched single-cell and bulk experiments. In addition, we proposed the FAVSeq pipeline for analyzing multimodal RNA sequencing data, which allowed us to identify factors affecting quantitative differences in gene expression measurements as well as the presence of dropouts.
The derived knowledge can be employed further to improve the interpretation of RNA-Seq data and to identify genes that can be affected by assay-based deviations. Source code is available under the MIT license at https://github.com/slipnitskaya/FAVSeq.

A comprehensive framework for analysis of microRNA sequencing data in metastatic colorectal cancer

NAR Cancer ◽

10.1093/narcan/zcab051 ◽

2022 ◽

Vol 4 (1) ◽

Author(s):

Eirik Høye ◽

Bastian Fromm ◽

Paul H M Böttger ◽

Diana Domanska ◽

Annette Torgunrud ◽

...

Keyword(s):

Colorectal Cancer ◽

Epithelial To Mesenchymal Transition ◽

Differential Expression Analysis ◽

Metastatic Progression ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Mesenchymal Transition ◽

Starting Point ◽

Microrna Sequencing ◽

Metastatic Sites

Although microRNAs (miRNAs) contribute to all hallmarks of cancer, miRNA dysregulation in metastasis remains poorly understood. The aim of this work was to reliably identify miRNAs associated with metastatic progression of colorectal cancer (CRC) using novel and previously published next-generation sequencing (NGS) datasets generated from 268 samples of primary (pCRC) and metastatic CRC (mCRC; liver, lung and peritoneal metastases) and tumor adjacent tissues. Differential expression analysis was performed using a meticulous bioinformatics pipeline, including only bona fide miRNAs, and utilizing miRNA-tailored quality control and processing. Five miRNAs were identified as up-regulated at multiple metastatic sites: Mir-210_3p, Mir-191_5p, Mir-8-P1b_3p [mir-141–3p], Mir-1307_5p and Mir-155_5p. Several have previously been implicated in metastasis through involvement in epithelial-to-mesenchymal transition and hypoxia, while other identified miRNAs represent novel findings. The use of a publicly available pipeline facilitates reproducibility and allows new datasets to be added as they become available. The set of miRNAs identified here provides a reliable starting-point for further research into the role of miRNAs in metastatic progression.

The Role of the Extracellular Matrix and Tumor-Infiltrating Immune Cells in the Prognostication of High-Grade Serous Ovarian Cancer

Cancers ◽

10.3390/cancers14020404 ◽

2022 ◽

Vol 14 (2) ◽

pp. 404

Author(s):

Yuri Belotti ◽

Elaine Hsuen Lim ◽

Chwee Teck Lim

Keyword(s):

Ovarian Cancer ◽

Immune Cells ◽

Risk Group ◽

Response To Treatment ◽

High Grade ◽

Sequencing Data ◽

Serous Ovarian Carcinoma ◽

Bioinformatic Approaches ◽

Gene Panels

Ovarian cancer is the eighth global leading cause of cancer-related death among women. The most common form is the high-grade serous ovarian carcinoma (HGSOC). No further improvements in the 5-year overall survival have been seen over the last 40 years since the adoption of platinum- and taxane-based chemotherapy. Hence, a better understanding of the mechanisms governing this aggressive phenotype would help identify better therapeutic strategies. Recent research linked onset, progression, and response to treatment with dysregulated components of the tumor microenvironment (TME) in many types of cancer. In this study, using bioinformatic approaches, we identified a 19-gene TME-related HGSOC prognostic genetic panel (PLXNB2, HMCN2, NDNF, NTN1, TGFBI, CHAD, CLEC5A, PLXNA1, CST9, LOXL4, MMP17, PI3, PRSS1, SERPINA10, TLL1, CBLN2, IL26, NRG4, and WNT9A) by assessing the RNA sequencing data of 342 tumors available in the TCGA database. Using machine learning, we found that specific patterns of infiltrating immune cells characterized each risk group. Furthermore, we demonstrated the predictive potential of our risk score across different platforms and its improved prognostic performance compared with other gene panels.
