RDBMS and NOSQL Based Hybrid Technology for Transcriptome Data Structuring and Processing

The transcriptome sequencing experiment (RNA-seq) has become almost a routine procedure for studying both model organisms and crops. Bioinformatics processing of the experimental output yields large volumes of heterogeneous data: nucleotide sequences of transcripts, amino acid sequences, and their structural and functional annotation. It is important to present these data to a wide range of researchers in the form of databases. This article proposes a hybrid approach to creating molecular genetic databases that contain information about transcript sequences and their structural and functional annotation. The essence of the approach is to store both structured and weakly structured data in the same database. The technology was used to implement a database of transcriptomes of agricultural plants. This paper discusses the features of implementing this approach and gives examples of both simple and complex SQL queries to such a database. The OORT database is freely available at https://oort.cytogen.ru/.
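
The idea of keeping structured and weakly structured data side by side can be sketched as follows. This is a minimal illustration and not the OORT schema: the table name `transcript`, its columns, the helper `json_get`, and all sample rows are hypothetical, and SQLite with a JSON text column stands in for whatever RDBMS and NoSQL components the authors actually use.

```python
import json
import sqlite3


def json_get(doc, key):
    """Return one top-level field of a JSON document, or None if absent."""
    value = json.loads(doc).get(key)
    # SQLite functions can only return scalars, so nested values go back to JSON text.
    return json.dumps(value) if isinstance(value, (list, dict)) else value


conn = sqlite3.connect(":memory:")
# Expose the helper to SQL so queries can reach inside the JSON column.
conn.create_function("json_get", 2, json_get)

# Structured part: fixed columns. Weakly structured part: a JSON text column.
conn.execute(
    "CREATE TABLE transcript ("
    " id TEXT PRIMARY KEY, species TEXT, length INTEGER, annotation TEXT)"
)

# Hypothetical sample data; annotation fields vary from row to row.
rows = [
    ("t1", "Triticum aestivum", 1520, {"go_terms": ["GO:0006412"], "pfam": "PF00164"}),
    ("t2", "Triticum aestivum", 830, {"pfam": "PF00069"}),
    ("t3", "Hordeum vulgare", 2210, {"go_terms": ["GO:0016301"], "pfam": "PF00069"}),
]
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?, ?)",
    [(tid, species, length, json.dumps(ann)) for tid, species, length, ann in rows],
)

# Simple query: relational columns only.
long_wheat = conn.execute(
    "SELECT id FROM transcript WHERE species = ? AND length > ? ORDER BY id",
    ("Triticum aestivum", 1000),
).fetchall()

# Complex query: a relational filter combined with a field inside the JSON document.
same_domain = conn.execute(
    "SELECT id, species FROM transcript"
    " WHERE json_get(annotation, 'pfam') = ? AND length > ? ORDER BY id",
    ("PF00069", 500),
).fetchall()
```

Registering a Python function with `create_function` keeps the sketch independent of optional SQLite extensions; a production system would more likely rely on the database's native document or JSON support.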

RASflow: An RNA-Seq Analysis Workflow with Snakemake

10.1101/839191 ◽

2019 ◽

Author(s):

Xiaokang Zhang ◽

Inge Jonassen

Keyword(s):

Gene Expression ◽

Management System ◽

Workflow Management ◽

Model Organisms ◽

Gene Transcript ◽

Rna Seq ◽

Public Data ◽

Wide Range ◽

Analysis Workflow ◽

Programming Skills

Abstract Background With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data has to be processed through a number of steps resulting in a quantification of expression of each gene/transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in data re-analysis. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches and at the same time supporting research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable also for users with limited programming skills. Results Utilizing the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis pipeline: RNA-Seq Analysis Snakemake Workflow (RASflow). Utilizing Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To be applicable to a wide variety of applications, RASflow supports mapping of reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism, and since it requires no programming skills, it can be used by researchers with different backgrounds. RASflow is an open source tool, and source code as well as documentation, tutorials and example data sets can be found on GitHub: https://github.com/zhxiaokang/RASflow Conclusions RASflow is a simple and reliable RNA-Seq analysis workflow that provides a full package for RNA-Seq analysis.

Are there systematic biases in RNA-seq data analysis? A case study for Amphimedon queenslandica sponge as a model object

10.1101/2020.02.28.969642 ◽

2020 ◽

Author(s):

Sergey Feranchuk

Keyword(s):

Functional Annotation ◽

Rna Seq ◽

Expression Levels ◽

Model Object ◽

Wide Range ◽

Amphimedon Queenslandica ◽

Systematic Biases ◽

Series Of Experiments ◽

Notch Pathways

Abstract Background The performance of a functional annotation approach for RNA-seq bioinformatics pipelines was compared with a method in which groups of genes are generated with no relation to ontologies. Three publicly available RNA-Seq experiments for the sponge Amphimedon queenslandica were used for the comparison. One of these experiments was referred to in a publication in which stages of embryo development were compared across a wide range of animal species. Methods The expression levels were re-calculated here for three independent series of experiments. The functional annotation of differential expression levels was then conducted. This allows the applicability of the two approaches to be compared and the interpretation provided in that publication to be re-evaluated. Results The conventional approach confirmed that the Wnt and Notch pathways operate in the development of the sponge embryo. The annotation method that uses unbounded grouping of genes was effective at separating the developmental stages of the sponge embryo. In addition, it is suggested that the published results were distorted by an artifact caused by positive feedback at the data processing stage.

Glutton: large-scale integration of non-model organism transcriptome data for comparative analysis

10.1101/077511 ◽

2016 ◽

Cited By ~ 2

Author(s):

Alan Medlar ◽

Laura Laakso ◽

Andreia Miraldo ◽

Ari Löytynoja

Keyword(s):

Comparative Analysis ◽

Large Scale ◽

De Novo ◽

Sequence Data ◽

Model Organism ◽

Model Organisms ◽

Rna Seq ◽

Reference Species ◽

Wide Range ◽

The Impact

AbstractHigh-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data has to be de novo assembled, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires the removal of redundant isoforms, assignment of orthologs and converting fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Utilising phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact choice of assembler has on both the number of alignments and the correctness of ortholog assignment and show substantial improvements over heuristic methods, without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full length gene sequences even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.

Application of the Innovative and Non-Invasive Technique, Molecular Music Therapy (MMT), Bio-Frequency Therapy, for the Treatment of a Wide Range of Disorders and Pathologies, with Consequent Verification of Molecular Parameters by Using Bi-Digital O-Ring Test (BDORT)

Acupuncture & Electro-Therapeutics Research ◽

10.3727/036012920x15779969212928 ◽

2020 ◽

Vol 44 (3) ◽

pp. 177-189

Author(s):

Momir Dunjic ◽

Stefano Turini ◽

Dejan Krstic ◽

Katarina Dunjic ◽

Marija Dunjic ◽

...

Keyword(s):

Amino Acid ◽

Music Therapy ◽

Amino Acid Sequences ◽

Sirtuin 1 ◽

Ring Test ◽

Invasive Technique ◽

Specific Gene ◽

Radiofrequency Therapy ◽

Wide Range ◽

Clinical Pictures

Radiofrequency therapy is an unconventional method that has been applied for some time, with numerous results in numerous clinical pictures. Our group has developed software, later called SONGENPROT-SOLARIS, capable of directly converting nucleotide sequences (DNA and/or RNA) and amino acid sequences (polypeptides and proteins) into musical sequences, based on mathematical matrices designed by the French physicist and musician Joel Sternheimer, which make it possible to associate a musical note with a nucleotide or an amino acid. The innovation in our software is that its algorithm directly implements a variant that allows the reproduction of sounds phase-shifted by 30 Hz between one ear and the other, reproducing the phenomenon of binaural tones, capable of inducing a specific brain activity and also the release of particles called solitons. Thanks to this software we have developed a technique called MMT (Molecular Music Therapy), and we are currently applying the technique to a cohort of 91 patients with a broad spectrum of clinical pictures, examining them with the Bi-Digital O-Ring Test (BDORT) before and after treatment with MMT. The aim of the project is to stimulate the expression of a specific gene (the same genetic sequence that the patient listens to, translated into music) only through the use of sound sequences. We have concentrated our attention on three main molecules: Sirtuin-1, Telomeres and TP-53. The results obtained with BDORT after treatment with MMT showed a significant increase in the values of the three molecules in all the examined patients, demonstrating the operative efficacy of the technique and its applicability to numerous diseases. In order to confirm the data obtained by BDORT, we propose, with the help of an accredited laboratory, to perform epigenetic tests on the three parameters listed above, paving the way to understanding how frequencies can influence gene expression.

Automatic Hip Detection in Anteroposterior Pelvic Radiographs—A Labelless Practical Framework

Journal of Personalized Medicine ◽

10.3390/jpm11060522 ◽

2021 ◽

Vol 11 (6) ◽

pp. 522

Author(s):

Feng-Yu Liu ◽

Chih-Chi Chen ◽

Chi-Tung Cheng ◽

Cheng-Ta Wu ◽

Chih-Po Hsu ◽

...

Keyword(s):

Medical Image ◽

Image Annotation ◽

Region Of Interest ◽

Confidence Score ◽

Heterogeneous Data ◽

Single Shot ◽

Hip Joints ◽

Pelvic Radiographs ◽

Wide Range ◽

Average Confidence

Automated detection of the region of interest (ROI) is a critical step in the two-step classification system in several medical image applications. However, key information such as model parameter selection, image annotation rules, and ROI confidence score are essential but usually not reported. In this study, we proposed a practical framework of ROI detection by analyzing hip joints seen on 7399 anteroposterior pelvic radiographs (PXR) from three diverse sources. We presented a deep learning-based ROI detection framework utilizing a single-shot multi-box detector with a customized head structure based on the characteristics of the obtained datasets. Our method achieved average intersection over union (IoU) = 0.8115, average confidence = 0.9812, and average precision with threshold IoU = 0.5 (AP50) = 0.9901 in the independent testing set, suggesting that the detected hip regions appropriately covered the main features of the hip joints. The proposed approach featured flexible loose-fitting labeling, customized model design, and heterogeneous data testing. We demonstrated the feasibility of training a robust hip region detector for PXRs. This practical framework has a promising potential for a wide range of medical image applications.
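
The intersection over union (IoU) metric quoted above, and the IoU = 0.5 threshold behind AP50, can be illustrated with a short sketch. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; the function names `iou` and `is_hit` and the sample boxes are hypothetical and are not taken from the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, truth, threshold=0.5):
    """Treat a detection as correct when its IoU with the ground truth meets the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical predicted and annotated hip regions, in pixels.
predicted = (10, 10, 110, 110)
annotated = (20, 20, 120, 120)
score = iou(predicted, annotated)  # 8100 / 11900, roughly 0.68
```

With the default threshold, `is_hit(predicted, annotated)` is true here, which is the per-detection decision that an AP50-style evaluation builds on.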

Relative Total Variation Structure Analysis-Based Fusion Method for Hyperspectral and LiDAR Data Classification

Remote Sensing ◽

10.3390/rs13061143 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1143

Author(s):

Yinghui Quan ◽

Yingping Tong ◽

Wei Feng ◽

Gabriel Dauphin ◽

Wenjiang Huang ◽

...

Keyword(s):

Feature Extraction ◽

Total Variation ◽

Structure Analysis ◽

Hyperspectral Image ◽

Gabor Filter ◽

Heterogeneous Data ◽

Lidar Data ◽

Fusion Method ◽

Wide Range ◽

Relative Total Variation

The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has a wide range of applications. This paper proposes a novel feature fusion method for urban area classification, namely the relative total variation structure analysis (RTVSA), to combine various features derived from HSI and LiDAR data. In the feature extraction stage, a variety of high-performance methods including the extended multi-attribute profile, Gabor filter, and local binary pattern are used to extract the features of the input data. The relative total variation is then applied to remove useless texture information from the processed data. Finally, nonparametric weighted feature extraction is adopted to reduce the dimensions. Random forest and convolutional neural networks are utilized to evaluate the fusion images. Experiments conducted on two urban Houston University datasets (including Houston 2012 and the training portion of Houston 2017) demonstrate that the proposed method can extract the structural correlation from heterogeneous data, withstand noise well, and improve the land cover classification accuracy.

mtDNAcombine: tools to combine sequences from multiple studies

BMC Bioinformatics ◽

10.1186/s12859-021-04048-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Eleanor F. Miller ◽

Andrea Manica

Keyword(s):

Sequence Data ◽

Data Extraction ◽

Bayesian Skyline Plot ◽

Model Organisms ◽

Data Sets ◽

Data Handling ◽

Online Database ◽

Genetic Studies ◽

Wide Range ◽

Existing Data

Abstract Background Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes ◽

10.3390/genes12020311 ◽

2021 ◽

Vol 12 (2) ◽

pp. 311

Author(s):

Zhenqiu Liu

Keyword(s):

Single Cell ◽

Free Parameter ◽

Graphical Model ◽

Expression Patterns ◽

Information Criterion ◽

Log P ◽

Rna Seq ◽

Clustering Methods ◽

Wide Range ◽

Free Parameters

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.
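
The fixed choice of λ described above can be written down directly. This is only a sketch of the rule as stated in the abstract, not part of the RGGC implementation: the function name `regularization_parameter` is hypothetical, and because the abstract does not define `p`, it is treated here as a positive number supplied by the caller.

```python
import math


def regularization_parameter(criterion, p=None):
    """Return lambda = 2 for AIC or lambda = log(p) for BIC, as stated in the abstract."""
    name = criterion.upper()
    if name == "AIC":
        return 2.0
    if name == "BIC":
        # Assumption: p is a positive count defined by the method; its meaning is not given here.
        if p is None or p <= 0:
            raise ValueError("BIC requires a positive p")
        return math.log(p)
    raise ValueError(f"unknown criterion: {criterion!r}")


lambda_aic = regularization_parameter("AIC")
lambda_bic = regularization_parameter("BIC", p=1000)
```

Because both values follow from the criterion alone, no cross-validation loop over candidate λ values is needed, which is the point the abstract makes.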

An In Vitro Cell Culture Model for Pyoverdine-Mediated Virulence

Pathogens ◽

10.3390/pathogens10010009 ◽

2020 ◽

Vol 10 (1) ◽

pp. 9

Author(s):

Donghoon Kang ◽

Natalia V. Kirienko

Keyword(s):

Cell Culture ◽

Virulence Factors ◽

Model Organisms ◽

Cell Culture Model ◽

Chronic Infections ◽

In Vitro Cell Culture ◽

Mitochondrial Homeostasis ◽

Culture Model ◽

Wide Range

Pseudomonas aeruginosa is a multidrug-resistant, opportunistic pathogen that utilizes a wide range of virulence factors to cause acute, life-threatening infections in immunocompromised patients, especially those in intensive care units. It also causes debilitating chronic infections that shorten lives and worsen the quality of life for cystic fibrosis patients. One of the key virulence factors in P. aeruginosa is the siderophore pyoverdine, which provides the pathogen with iron during infection, regulates the production of secreted toxins, and disrupts host iron and mitochondrial homeostasis. These roles have been characterized in model organisms such as Caenorhabditis elegans and mice. However, an intermediary system that uses cell culture to investigate the activity of this siderophore has been absent. In this report, we describe such a system, using murine macrophages treated with pyoverdine. We demonstrate that pyoverdine-rich filtrates from P. aeruginosa exhibit substantial cytotoxicity, and that the inhibition of pyoverdine production (genetic or chemical) is sufficient to mitigate virulence. Furthermore, consistent with previous observations made in C. elegans, pyoverdine translocates into cells and disrupts host mitochondrial homeostasis. Most importantly, we observe a strong correlation between pyoverdine production and virulence in P. aeruginosa clinical isolates, confirming pyoverdine’s value as a promising target for therapeutic intervention. This in vitro cell culture model will allow rapid validation of pyoverdine antivirulents in a simple but physiologically relevant manner.

Revealing biophysical properties of KfrA-type proteins as a novel class of cytoskeletal, coiled-coil plasmid-encoded proteins

BMC Microbiology ◽

10.1186/s12866-020-02079-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

M. Adamczyk ◽

E. Lewicka ◽

R. Szatkowska ◽

H. Nieznanska ◽

J. Ludwiczak ◽

...

Keyword(s):

Dna Binding ◽

Host Range ◽

Coiled Coil ◽

Amino Acid Sequences ◽

Broad Host Range ◽

Biophysical Properties ◽

Dna Binding Sites ◽

In Silico Studies ◽

Wide Range

Abstract Background DNA-binding KfrA-type proteins of broad-host-range bacterial plasmids belonging to the IncP-1 and IncU incompatibility groups are characterized by globular N-terminal head domains and long alpha-helical coiled-coil tails. They have been shown to act as transcriptional auto-regulators. Results This study focused on two members of the growing family of KfrA-type proteins encoded by the broad-host-range plasmids R751 of the IncP-1β group and RA3 of the IncU group. Comparative in vitro and in silico studies on KfrAR751 and KfrARA3 confirmed their similar biophysical properties despite low conservation of the amino acid sequences. They form a wide range of oligomeric forms in vitro and, in the presence of their cognate DNA binding sites, they polymerize into higher-order filaments visualized as “threads” by negative staining electron microscopy. The studies also revealed temperature-dependent changes in the coiled-coil segment of KfrA proteins that is involved in the stabilization of dimers required for DNA interactions. Conclusion KfrAR751 and KfrARA3 are structural homologues. We postulate that KfrA-type proteins have moonlighting activity. They not only act as transcriptional auto-regulators but also form cytoskeletal structures, which might facilitate plasmid DNA delivery and positioning in the cells before cell division, involving thermal energy.
