Transcriptomics: Quantifying Non-Uniform Read Distribution Using MapReduce

RNA-seq is a high-throughput next-generation sequencing technique for estimating the concentration of all transcripts in a transcriptome. The method involves complex preparatory and post-processing steps which can introduce bias, and the technique produces a large amount of data [7, 19]. Two important challenges in processing RNA-seq data are therefore the ability to process a vast amount of data, and methods to quantify the bias in public RNA-seq datasets. We describe a novel analysis method, based on analysing sequence motif correlations, that employs MapReduce on Apache Spark to quantify bias in next-generation sequencing (NGS) data at the deep exon level. Our implementation is designed specifically for processing large datasets and allows for scalability and deployment on cloud service providers offering MapReduce. In investigating wild-type and mutant organisms of the species D. melanogaster, we have found that motifs with runs of Gs (or their complement) exhibit low motif-pair correlations in comparison with other motif-pairs. This is independent of the mean exon GC content in the wild-type data, but there is a mild dependence in the mutant data. Hence, whilst both datasets show the same trends, there is significant variation between the two samples.
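
The map and reduce steps mentioned in this abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' Spark implementation: the function names (`map_read`, `reduce_counts`, `motif_counts`), the motif length `k=4`, and the toy reads are all assumptions chosen for illustration, and the sketch only counts motifs rather than computing motif-pair correlations.

```python
from collections import Counter
from functools import reduce

def map_read(read, k=4):
    """Map step (hypothetical): count every k-mer (motif) in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def reduce_counts(a, b):
    """Reduce step (hypothetical): merge two partial motif-count tables."""
    return a + b

def motif_counts(reads, k=4):
    """Apply the map step to each read and fold the results together."""
    return reduce(reduce_counts, (map_read(r, k) for r in reads), Counter())

# Toy reads, invented for illustration only.
reads = ["ACGTACGT", "GGGGACGT"]
counts = motif_counts(reads)
```

Because the reduce step is associative and commutative, the partial tables could in principle be merged in any order across workers, which is the property a MapReduce framework relies on.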

Transcriptome-wide analysis of microRNA-mRNA correlations in unperturbed tissue transcriptomes identifies microRNA targeting determinants.

10.1101/2021.12.22.473932 ◽

2021 ◽

Author(s):

Juan Manuel Trinidad ◽

Rafael Sebastian Fort ◽

Guillermo Trinidad ◽

Beatriz Garat ◽

Maria A Duhagon

Keyword(s):

Gene Expression ◽

Small Rna ◽

Current Knowledge ◽

Gc Content ◽

Sequence Motif ◽

Z Score ◽

Rna Seq ◽

Microrna Target ◽

Candidate Sequence ◽

Interaction Sites

MicroRNAs are small RNAs that regulate gene expression through complementary base pairing with their target mRNAs. Given the small size of the pairing region and the large number of mRNAs that each microRNA can control, the identification of biologically relevant targets is difficult. Since current knowledge of target recognition and repression has mainly relied on in vitro studies, we sought to determine if the interrogation of gene expression data of unperturbed tissues could yield new insight into these processes. The transcriptome-wide repression at the microRNA-mRNA canonical interaction sites (seed and 3'-supplementary region, identified by sole base complementarity) was calculated as a normalized Spearman correlation (Z-score) between the abundance of the transcripts in the PRAD-TCGA tissues (RNA-seq and small RNA-seq data of 546 samples). Using the repression values obtained, we confirmed established properties of microRNA targeting efficacy, such as the preference for gene regions (3'UTR>CDS>5'UTR), the proportionality between repression and seed length (6mer<7mer<8mer) and the contribution to the repression exerted by the supplementary pairing at 13-16nt of the microRNA. Our results suggest that the 7mer-m8 seed could be more repressive than the 7mer-A1, while they have similar efficacy when they interact using the 3'-supplementary pairing. Strikingly, the 6mer+suppl sites yielded a normalized Z-score of repression similar to the sole 7mer-m8 or 7mer-A1 seeds, which raises awareness of their potential biological relevance. We then used the approach to further characterize the 3'-supplementary pairing, using 39 microRNAs that hold repressive 3'-supplementary interactions. The analysis of the bridge between seed and 3'-supplementary pairing site confirmed the optimum +1 offset previously evidenced, but higher offsets appear to hold similar repressive strength. In addition, they show a low GC content at position 13-16, and base preferences that allow the selection of a candidate sequence motif. Overall, our study demonstrates that transcriptome-wide analysis of microRNA-mRNA correlations in large, matched RNA-seq and small-RNA-seq data has the power to uncover hints of microRNA targeting determinants operating in the in vivo unperturbed set. Finally, we made available a bioinformatic tool to analyze microRNA-target mRNA interactions using our approach.
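
As a rough illustration of a "normalized Spearman correlation (Z-score)", the sketch below computes a Spearman correlation between two abundance vectors and standardizes it against a background set of correlations. The abstract does not specify the normalization, so the `z_score` definition, the function names, and the toy values are all assumptions rather than the authors' method.

```python
from statistics import mean, pstdev

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def z_score(value, background):
    """Assumed normalization: standardize against background correlations."""
    return (value - mean(background)) / pstdev(background)

# Toy abundances across five samples, invented for illustration only.
mirna = [1, 2, 3, 4, 5]
mrna = [50, 40, 30, 20, 10]
rho = spearman(mirna, mrna)
z = z_score(rho, [-0.2, 0.0, 0.2])
```

A strongly negative Z-score under this assumed scheme would indicate a microRNA-mRNA pair that is more anti-correlated than the background, which is the direction one would read as repression.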

Interplay between gene nucleotide composition bias and splicing

10.1101/605832 ◽

2019 ◽

Author(s):

Sébastien Lemaire ◽

Nicolas Fontrodona ◽

Jean-Baptiste Claude ◽

Hélène Polvèche ◽

Fabien Aubé ◽

...

Keyword(s):

Gc Content ◽

Nucleotide Composition ◽

Chromatin Organization ◽

Splicing Factors ◽

Rna Seq ◽

Regulatory Processes ◽

U2 Snrnp ◽

Topologically Associated Domains ◽

Associated Proteins ◽

Exon Level

To characterize the rules governing exon recognition during splicing, we analysed dozens of RNA-seq datasets and identified ~3,200 GC-rich exons and ~4,000 AT-rich exons whose inclusion depends on different sets of splicing factors. We show that GC-rich exons have predicted RNA secondary structures at 5’-ss, and are dependent on U1 snRNP–associated proteins. In contrast, AT-rich exons have a large number of branchpoints and SF1- or U2AF2-binding sites and are dependent on U2 snRNP–associated proteins. Nucleotide composition bias also influences local chromatin organization, with consequences for exon recognition during splicing. As the GC content of exons correlates with that of their hosting genes, isochores and topologically-associated domains, we propose that regional nucleotide composition bias leaves a local footprint at the exon level and induces constraints during splicing that can be alleviated by local chromatin organization and recruitment of specific splicing factors. Therefore, nucleotide composition bias establishes a direct link between genome organization and local regulatory processes, like alternative splicing.

A Practical Conflicting Role-based Cloud Security Risk Evaluation Method

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191204145140 ◽

2019 ◽

Vol 13 ◽

Author(s):

Jin Han ◽

Jing Zhan ◽

Xiaoqing Xia ◽

Xue Fan

Keyword(s):

Risk Evaluation ◽

Evaluation Method ◽

Service Providers ◽

Risk Preferences ◽

Cloud Service ◽

Cloud Security ◽

Third Party ◽

Security Evaluation ◽

Security Risk ◽

Cloud Users

Background: Currently, the cloud service provider (CSP) or a third party usually proposes principles and methods for cloud security risk evaluation, while cloud users have no choice but to accept them. However, since cloud users and cloud service providers have conflicts of interest, cloud users may not trust the results of security evaluation performed by the CSP. Also, different cloud users may have different security risk preferences, which makes it difficult for a third party to consider all users' needs during evaluation. In addition, current security evaluation indexes for cloud are too impractical to test (e.g., indexes like interoperability, transparency, portability are not easy to evaluate). Methods: To solve the above problems, this paper proposes a practical cloud security risk evaluation method of decision-making based on conflicting roles by using the Analytic Hierarchy Process (AHP) with Aggregation of Individual Priorities (AIP). Results: Our method not only brings forward a new index system based on risk source for cloud security and corresponding practical testing methods, but also obtains the evaluation result with the risk preferences of conflicting roles, namely CSP and cloud users, which can lay a foundation for improving mutual trust between the CSP and cloud users. The experiments show that the method can effectively assess the security risk of cloud platforms, and in the case where the number of clouds increased by 100% and 200%, the evaluation time using our methodology increased by only 12% and 30%, respectively. Conclusion: Our method can achieve consistent decisions based on conflicting roles, high scalability and practicability for cloud security risk evaluation.
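
To make the AHP-with-AIP idea concrete, here is a minimal sketch under stated assumptions: each role's priorities are derived from a pairwise comparison matrix with the row geometric-mean approximation, and the individual priority vectors are then combined by a weighted geometric mean. The abstract does not say which variants the paper uses, so the function names, the two-criterion matrices, and the equal role weights are all hypothetical.

```python
from math import prod

def ahp_priorities(matrix):
    """Approximate an AHP priority vector by normalized row geometric means."""
    n = len(matrix)
    geometric_means = [prod(row) ** (1 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]

def aggregate_aip(priority_vectors, weights):
    """Assumed AIP step: weighted geometric mean per criterion, renormalized."""
    n = len(priority_vectors[0])
    combined = [
        prod(p[i] ** w for p, w in zip(priority_vectors, weights))
        for i in range(n)
    ]
    total = sum(combined)
    return [c / total for c in combined]

# Hypothetical pairwise comparisons of two risk criteria by each role.
csp_matrix = [[1, 3], [1 / 3, 1]]    # CSP rates criterion 1 as 3x criterion 2
user_matrix = [[1, 1 / 3], [3, 1]]   # user holds the opposite preference

csp = ahp_priorities(csp_matrix)
user = ahp_priorities(user_matrix)
group = aggregate_aip([csp, user], weights=[0.5, 0.5])
```

With these mirrored toy preferences and equal weights, the aggregated priorities balance out between the two criteria, which shows how conflicting roles can be folded into one evaluation.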

Improved Proofs Of Retrievability And Replication For Data Availability In Cloud Storage

The Computer Journal ◽

10.1093/comjnl/bxz151 ◽

2020 ◽

Vol 63 (8) ◽

pp. 1216-1230 ◽

Cited By ~ 2

Author(s):

Wei Guo ◽

Sujuan Qin ◽

Jun Lu ◽

Fei Gao ◽

Zhengping Jin ◽

...

Keyword(s):

Service Providers ◽

Cloud Service ◽

Data Availability ◽

Forgery Attack ◽

Common Strategy ◽

Comparable Performance ◽

Replication Scheme ◽

High Level ◽

Cloud Users ◽

Proofs Of Retrievability

For a high level of data availability and reliability, a common strategy for cloud service providers is to rely on replication, i.e. storing several replicas on different servers. To provide cloud users with a strong guarantee that all replicas required by them are actually stored, many multi-replica integrity auditing schemes were proposed. However, most existing solutions are not resource economical since users need to create and upload replicas of their files by themselves. A multi-replica solution called Mirror is presented to overcome the problems, but we find that it is vulnerable to a storage-saving attack, by which a dishonest provider can considerably save storage costs compared to the costs of storing all the replicas honestly, while still passing any challenge successfully. In addition, we also find that Mirror is easily subject to substitution and forgery attacks, which pose new security risks for cloud users. To address the problems, we propose some simple yet effective countermeasures and an improved proofs of retrievability and replication scheme, which can resist the aforesaid attacks and maintain the advantages of Mirror, such as economical bandwidth and efficient verification. Experimental results show that our scheme exhibits comparable performance with Mirror while achieving high security.

A Generic Model for Identifying QoS Parameters Interrelations in Cloud Services Selection Ontology during Runtime

Symmetry ◽

10.3390/sym13040563 ◽

2021 ◽

Vol 13 (4) ◽

pp. 563

Author(s):

Babu Rajendiran ◽

Jayashree Kanniappan

Keyword(s):

Service Providers ◽

Cloud Service ◽

Cloud Services ◽

Operating Costs ◽

Generic Model ◽

Business Organizations ◽

Cloud Environment ◽

Cloud Service Provider ◽

Qos Parameters ◽

Computer Based

Nowadays, many business organizations operate in the cloud environment in order to reduce their operating costs and to select the best service from many cloud providers. The increasing number of Cloud Services available on the market encourages the cloud consumer to be careful in selecting the most apt Cloud Service Provider that satisfies functionality, as well as QoS parameters. Many disciplines of computer-based applications use standardized ontology to represent information in their fields, which indicates the necessity of an ontology-based representation. The proposed generic model can help service consumers to identify QoS parameters interrelations in the cloud services selection ontology during run-time, and service providers to enhance their business by interpreting the various relations. The ontology has been developed using the intended attributes of QoS from various service providers. A generic model has been developed and tested with the developed ontology.

Exon-level estimates improve the detection of differentially expressed genes in RNA-seq studies

RNA Biology ◽

10.1080/15476286.2020.1868151 ◽

2021 ◽

pp. 1-8

Author(s):

Arfa Mehmood ◽

Asta Laiho ◽

Laura L. Elo

Keyword(s):

Differentially Expressed Genes ◽

Differentially Expressed ◽

Rna Seq ◽

Exon Level

scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing

Genome Biology ◽

10.1186/s13059-021-02364-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Gavin W. Wilson ◽

Mathieu Derouet ◽

Gail E. Darling ◽

Jonathan C. Yeung

Keyword(s):

Genetic Variants ◽

False Positive ◽

Variant Calling ◽

Call Rate ◽

Rna Seq ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Variant Call ◽

Two Samples ◽

Co Detection

Identifying single nucleotide variants has become common practice for droplet-based single-cell RNA-seq experiments; however, presently, a pipeline does not exist to maximize variant calling accuracy. Furthermore, molecular duplicates generated in these experiments have not been utilized to optimally detect variant co-expression. Herein, we introduce scSNV, designed from the ground up to “collapse” molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, with a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.
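
The idea of collapsing molecular duplicates can be sketched as follows. This is not scSNV's algorithm: it assumes duplicates are reads sharing a cell barcode and a UMI, and it takes a simple majority vote over the bases observed at one genomic position. The function name, the tuple layout, and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict

def collapse_duplicates(reads):
    """Collapse reads sharing (cell barcode, UMI) into one consensus base.

    reads: iterable of (cell_barcode, umi, base) observed at one position.
    Ties are resolved in favour of the base seen first within the group.
    """
    groups = defaultdict(list)
    for cell, umi, base in reads:
        groups[(cell, umi)].append(base)
    return {
        key: Counter(bases).most_common(1)[0][0]
        for key, bases in groups.items()
    }

# Toy observations, invented for illustration only.
reads = [
    ("cell1", "umi1", "A"),
    ("cell1", "umi1", "A"),
    ("cell1", "umi1", "G"),  # likely an error within the duplicate group
    ("cell1", "umi2", "G"),
]
consensus = collapse_duplicates(reads)
```

In this toy input the single discordant `G` in the first molecule is outvoted, so each molecule contributes one base call instead of one call per read.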

Addressing Semantics Standards for Cloud Portability and Interoperability in Multi Cloud Environment

Symmetry ◽

10.3390/sym13020317 ◽

2021 ◽

Vol 13 (2) ◽

pp. 317

Author(s):

Chithambaramani Ramalingam ◽

Prakash Mohan

Keyword(s):

Service Providers ◽

Research Work ◽

Cloud Service ◽

Cloud Services ◽

Cloud Environment ◽

Cloud Service Providers ◽

Application Programming ◽

Increasing Demand ◽

Programming Interfaces ◽

Multi Cloud

The increasing demand for cloud computing has shifted business toward a huge demand for cloud services, which offer platform, software, and infrastructure for the day-to-day use of cloud consumers. Numerous new cloud service providers have been introduced to the market with unique features that help service developers collaborate on and migrate services among multiple cloud service providers to address the varying requirements of cloud consumers. Many interfaces and proprietary application programming interfaces (APIs) are available for migration and collaboration services among cloud providers, but standardization efforts are lacking. The target of the research work was to summarize the issues involved in semantic cloud portability and interoperability in the multi-cloud environment and define the standardization effort imminently needed for migrating and collaborating services in the multi-cloud environment.
