Another look at matrix correlations

2019 ◽  
Vol 35 (22) ◽  
pp. 4748-4753 ◽  
Author(s):  
Ahmad Borzou ◽  
Razie Yousefi ◽  
Rovshan G Sadygov

Abstract Motivation High-throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment, and the number of experiments is usually much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems-level view of relevant biological processes. Often it is necessary to determine correlations between the data matrices under different conditions or pathways. However, techniques for detecting correlations within or between conditions in data with a low number of samples are still in development. Earlier correlative measures, such as the RV coefficient, use the trace of the product of the data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates the correlations when the number of samples is low. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, a principled approach based on matrix decomposition generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of the correlations between the original datasets. Results Simulations show that the decomposition removes both the high correlation bias and the dependence on sample number that are intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset. Availability and implementation The Python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. Supplementary information Supplementary data are available at Bioinformatics online.
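For samples in rows, the RV coefficient is RV(X, Y) = tr(Sx Sy) / sqrt(tr(Sx Sx) tr(Sy Sy)), where Sx = X X' and Sy = Y Y' are the outer-product matrices. The Python sketch below, which assumes column-centered data and uses a hypothetical function name, computes the standard RV coefficient and the variant that zeroes the diagonal of each outer product before the traces are taken, i.e. the bias correction mentioned above. It does not implement the three-part decomposition proposed in this work.

```python
import numpy as np

def rv_coefficient(X, Y, adjusted=False):
    """RV coefficient of two data matrices that share the same rows (samples).

    adjusted=True removes the diagonal of each n x n outer-product matrix
    before computing the traces (the diagonal-discarding correction).
    """
    Xc = X - X.mean(axis=0)          # assumption: columns are mean-centered
    Yc = Y - Y.mean(axis=0)
    Sx = Xc @ Xc.T                   # n x n outer product of X
    Sy = Yc @ Yc.T                   # n x n outer product of Y
    if adjusted:
        Sx = Sx - np.diag(np.diag(Sx))
        Sy = Sy - np.diag(np.diag(Sy))
    num = np.trace(Sx @ Sy)
    den = np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
    return num / den
```

X and Y may have different numbers of columns (variables); only the number of rows must match.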

Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires creating an additional object containing a copy of the original assay, and it complicates data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
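The idea of storing a subset as a reference to its parent assay, rather than as a copy, can be sketched in a few lines. ExperimentSubset itself is an R/Bioconductor package; the Python class and methods below (`SubsetContainer`, `create_subset`, `get`, `provenance`) are hypothetical and do not mirror its API. Each subset records only its parent's name and integer row and column indices, so retrieval composes indices back to the stored assay and the provenance chain stays explicit.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SubsetRecord:
    parent: str          # assay or subset this one was derived from
    rows: np.ndarray     # integer row indices into the parent
    cols: np.ndarray     # integer column indices into the parent

class SubsetContainer:
    """Hypothetical container that stores subsets as index references, not copies."""

    def __init__(self, assays):
        self.assays = dict(assays)   # name -> full matrix (the only stored data)
        self.subsets = {}            # name -> SubsetRecord

    def _shape(self, name):
        if name in self.assays:
            return self.assays[name].shape
        rec = self.subsets[name]
        return (len(rec.rows), len(rec.cols))

    def create_subset(self, name, parent, rows=None, cols=None):
        n_rows, n_cols = self._shape(parent)
        rows = np.arange(n_rows) if rows is None else np.asarray(rows)
        cols = np.arange(n_cols) if cols is None else np.asarray(cols)
        self.subsets[name] = SubsetRecord(parent, rows, cols)

    def get(self, name):
        """Materialize a subset by composing indices up to the stored assay."""
        if name in self.assays:
            return self.assays[name]
        rec = self.subsets[name]
        rows, cols, parent = rec.rows, rec.cols, rec.parent
        while parent in self.subsets:
            up = self.subsets[parent]
            rows, cols, parent = up.rows[rows], up.cols[cols], up.parent
        return self.assays[parent][np.ix_(rows, cols)]

    def provenance(self, name):
        """Chain of names from this subset back to the stored assay."""
        chain = [name]
        while chain[-1] in self.subsets:
            chain.append(self.subsets[chain[-1]].parent)
        return chain
```

For example, a column subset of good-quality cells can be created from a counts assay, and a row subset of variable features can then be created from that subset without copying either matrix.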


2019 ◽  
Author(s):  
Keyao Wang ◽  
Jun Wang ◽  
Carlotta Domeniconi ◽  
Xiangliang Zhang ◽  
Guoxian Yu

Abstract Motivation Isoforms are alternatively spliced mRNAs of genes. They can be translated into different functional proteoforms and thus greatly increase the functional diversity of protein variants (or proteoforms). Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a finer granularity. Because existing functional genomic databases record annotations uniformly at the gene level and rarely at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction. Results Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm, viewing each gene as a bag and its spliced isoforms as instances, and push the functions of bags onto instances. These approaches implicitly assume that the collected gene annotations are complete, and they integrate only multiple RNA-seq datasets. As a result, their performance is compromised. We propose a data integrative solution (called DisoFun) to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes that the functional annotations of genes are aggregated from those of key isoforms. It collaboratively factorizes the isoform data matrix and the gene-term data matrix (storing Gene Ontology (GO) annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms, and it achieves function prediction by aggregating predictions to their originating genes. In addition, it leverages the PPI network and GO structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the AUROC (area under the receiver-operating characteristic curve) and AUPRC (area under the precision-recall curve) of existing solutions by at least 7.7% and 28.9%, respectively.
We further investigate DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1, and CFLAR) with known isoform-level functions and observe that DisoFun can differentiate the functions of their isoforms with 90.5% accuracy. Availability The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun. Supplementary information Supplementary data are available at Bioinformatics online.
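A collaborative factorization of this kind can be sketched as minimizing ||X - U V||² + alpha ||Y - A U W||², where X holds isoform-level data, Y holds gene-level GO annotations, and a binary gene-by-isoform matrix A sums isoform factors into their originating genes, so isoform-level scores U W are learned only through gene-level labels. This is a simplified illustration under stated assumptions, not DisoFun itself: the PPI and GO-structure terms are omitted, plain gradient descent stands in for whatever solver DisoFun uses, and all names and defaults are hypothetical.

```python
import numpy as np

def collaborative_mf(X, Y, A, k=3, alpha=1.0, lr=0.01, n_iter=500, seed=0):
    """Jointly factorize X ~ U V and Y ~ (A U) W with a shared isoform factor U.

    X : isoforms x features   (isoform-level data)
    Y : genes x GO terms      (0/1 gene annotations)
    A : genes x isoforms      (1 if the isoform belongs to the gene)
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[0], k))
    V = 0.1 * rng.standard_normal((k, X.shape[1]))
    W = 0.1 * rng.standard_normal((k, Y.shape[1]))
    losses = []
    for _ in range(n_iter):
        Rx = X - U @ V                    # residual on isoform data
        Ry = Y - A @ U @ W                # residual on aggregated gene annotations
        losses.append(np.sum(Rx ** 2) + alpha * np.sum(Ry ** 2))
        # gradients of the squared-error objective, all from the current iterate
        gU = -2 * Rx @ V.T - 2 * alpha * A.T @ Ry @ W.T
        gV = -2 * U.T @ Rx
        gW = -2 * alpha * (A @ U).T @ Ry
        U -= lr * gU
        V -= lr * gV
        W -= lr * gW
    iso_scores = U @ W                    # isoform x GO-term prediction scores
    return iso_scores, losses
```

With A built as, for instance, `np.kron(np.eye(3), np.ones((1, 2)))` (three genes with two isoforms each), the returned scores have one row per isoform even though Y is labeled only per gene.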


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3461 ◽  
Author(s):  
Jingwei Yin ◽  
Bing Liu ◽  
Guangping Zhu ◽  
Zhinan Xie

Detecting a moving target in a reverberant environment has long been a challenging problem. In recent years, methods based on low-rank and sparse theory have been developed to study this problem. The multiframe data containing the target echo and reverberation are arranged in a matrix, and detection is then achieved by low-rank and sparse decomposition of the data matrix. In this paper, we introduce a new method for the matrix decomposition using dynamic mode decomposition (DMD). DMD is usually used to calculate the eigenmodes of an approximate linear model. We divided the eigenmodes into two categories to realize the low-rank and sparse decomposition, and we detected the target from the sparse component. Compared with previous methods based on low-rank and sparse theory, our method increases computation speed by a factor of approximately 4 to 90, at the expense of a slight loss of detection gain. This efficiency is a significant advantage for real-time processing, as it frees time for other processing stages that can improve detection performance. We have validated the method with three sets of underwater acoustic data.
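One common way to split DMD eigenmodes into two categories is by how fast they evolve. The Python sketch below is an illustration under assumptions: it computes exact DMD on a snapshot matrix whose columns are successive frames, treats modes with |log(lambda)| below a hypothetical threshold `slow_tol` as the nearly stationary low-rank part, and assigns the remainder of the data to the sparse part. The paper's actual rule for dividing the eigenmodes may differ, and the function name and parameters are hypothetical.

```python
import numpy as np

def dmd_lowrank_sparse(X, rank_tol=1e-10, slow_tol=1e-2):
    """Split X (space x time, one column per frame) into low-rank + sparse parts.

    Exact DMD is computed from consecutive snapshot pairs; modes with
    |log(lambda)| < slow_tol (nearly stationary) form the low-rank part.
    """
    X1, X2 = X[:, :-1], X[:, 1:]                 # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = int(np.sum(s > rank_tol * s[0]))         # numerical rank truncation
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    A_tilde = U.conj().T @ X2 @ V / s            # reduced r x r linear operator
    lam, W = np.linalg.eig(A_tilde)              # DMD eigenvalues and eigenvectors
    Phi = X2 @ V / s @ W                         # exact DMD modes (space x r)
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]   # mode amplitudes at t = 0
    slow = np.abs(np.log(lam.astype(complex))) < slow_tol
    t = np.arange(X.shape[1])
    dynamics = b[slow][:, None] * lam[slow][:, None] ** t   # (n_slow, n_frames)
    low_rank = np.real(Phi[:, slow] @ dynamics)
    sparse = X - low_rank
    return low_rank, sparse, lam
```

For a perfectly static scene (identical columns), the single DMD eigenvalue is 1, so the whole signal lands in the low-rank part and the sparse part is numerically zero; a moving echo would instead leave energy in the sparse part.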


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yishu Wang ◽  
Dejie Yang ◽  
Minghua Deng

Background. Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. One approach to analyzing EMAP data is to identify gene modules with densely interacting genes. In addition, the genetic interaction score (S-score), which reflects the degree of synergizing or mitigating effect of two mutants, is also informative. Statistical approaches that exploit both modularity and the pairwise interactions may provide more insight into the underlying biology. However, the high missing rate in EMAP data hinders the development of such approaches. To address this problem, we adopted the matrix decomposition methodology “low-rank and sparse decomposition” (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part. Results. LRSDec has been demonstrated to be an effective technique for analyzing EMAP data. We applied it to a synthetic dataset and to an EMAP dataset studying RNA-related processes in Saccharomyces cerevisiae. Global views of the genetic cross talk between different RNA-related protein complexes and processes have been constructed, and novel functions of genes have been predicted.
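A low-rank plus sparse decomposition that tolerates missing entries can be posed as minimizing 0.5 ||P_obs(M - L - S)||² + lam_L ||L||_* + lam_S ||S||_1, where P_obs keeps only observed entries. The Python sketch below alternates a soft-impute step for L (fill missing entries with the current L, then threshold singular values) with entrywise soft-thresholding for S. This is a generic formulation assumed for illustration; LRSDec's exact objective, parameters and solver may differ, and `lowrank_sparse_missing`, `lam_L` and `lam_S` are hypothetical names.

```python
import numpy as np

def soft(x, t):
    """Entrywise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lowrank_sparse_missing(M, mask, lam_L=1.0, lam_S=0.5, n_iter=100):
    """Decompose M ~ L + S using only the entries where mask is True."""
    M0 = np.where(mask, M, 0.0)          # unobserved entries are ignored by the fit
    L = np.zeros_like(M0)
    S = np.zeros_like(M0)

    def objective(L, S):
        R = np.where(mask, M0 - L - S, 0.0)
        return (0.5 * np.sum(R ** 2) + lam_L * np.linalg.norm(L, "nuc")
                + lam_S * np.abs(S).sum())

    history = [objective(L, S)]
    for _ in range(n_iter):
        # L-step: impute unobserved entries with the current L, then shrink singular values
        Z = np.where(mask, M0 - S, L)
        U, s, Vh = np.linalg.svd(Z, full_matrices=False)
        L = (U * soft(s, lam_L)) @ Vh
        # S-step: exact minimizer, soft-threshold on observed entries, zero elsewhere
        S = np.where(mask, soft(M0 - L, lam_S), 0.0)
        history.append(objective(L, S))
    return L, S, history
```

Both steps minimize (or majorize) the same convex objective, so the recorded objective values never increase; L captures block-like structure, while S collects isolated strong interactions on observed entries.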


1991 ◽  
Vol 124 (1) ◽  
pp. K11-K14 ◽  
Author(s):  
C. Dos Santos Lourenço ◽  
M. Cilense ◽  
W. Garlipp

Author(s):  
David Barber

Finding clusters of well-connected nodes in a graph is a problem common to many domains, including social networks, the Internet and bioinformatics. From a computational viewpoint, finding these clusters or graph communities is a difficult problem. We use a clique matrix decomposition based on a statistical description that encourages clusters to be well connected and few in number. The formal intractability of inferring the clusters is addressed using a variational approximation inspired by mean-field theories in statistical mechanics. Clique matrices also play a natural role in parametrizing positive definite matrices under zero constraints on elements of the matrix. We show that clique matrices can parametrize all positive definite matrices restricted according to a decomposable graph and form a structured factor analysis approximation in the non-decomposable case. Extensions to conjugate Bayesian covariance priors and more general non-Gaussian independence models are briefly discussed.
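In a clique matrix Z, entry Z[i, c] indicates whether node i belongs to clique c, and two nodes are linked whenever they share at least one clique. The Python sketch below illustrates the two roles described above under simplifying assumptions: it derives an adjacency matrix from a given binary clique matrix, and it builds a positive definite matrix B B' + I whose real weights B are confined to the clique pattern, so entries for node pairs that share no clique are exactly zero. It does not implement the variational inference, and the function names and the added identity term are assumptions.

```python
import numpy as np

def clique_pattern(cliques, n):
    """Binary clique matrix Z (n nodes x c cliques): Z[i, c] = 1 if node i is in clique c."""
    Z = np.zeros((n, len(cliques)), dtype=int)
    for c, members in enumerate(cliques):
        Z[list(members), c] = 1
    return Z

def adjacency_from_cliques(Z):
    """Two nodes are connected iff they share at least one clique."""
    A = (Z @ Z.T > 0).astype(int)
    np.fill_diagonal(A, 0)
    return A

def pd_from_cliques(Z, seed=0):
    """Positive definite matrix whose zero pattern follows the clique structure.

    Weights are nonzero only where Z is nonzero; the identity keeps the
    result strictly positive definite (a factor-analysis-like form).
    """
    rng = np.random.default_rng(seed)
    B = Z * rng.standard_normal(Z.shape)   # weights restricted to clique members
    return B @ B.T + np.eye(Z.shape[0])
```

With cliques {0, 1, 2}, {2, 3} and {4}, nodes 0 and 3 share no clique, so both their adjacency entry and their entry in the positive definite matrix are zero.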


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7430
Author(s):  
Kumiko Matsui ◽  
Takanobu Tsuihiji

Background Desmostylia is a clade of extinct aquatic mammals with no living members. Today, this clade is considered to belong to either Afrotheria or Perissodactyla. In the currently accepted taxonomic scheme, Desmostylia includes two families, 10 to 12 genera, and 13–14 species. Relatively few phylogenetic analyses of desmostylian interrelationships have been published compared with other vertebrate taxa, and two main alternative phylogenetic hypotheses have been proposed in previous studies. One major problem with those studies is that the numbers of characters and OTUs were small. Methods In this study, we analyzed the phylogenetic interrelationships of Desmostylia based on a new data matrix that includes more characters and taxa than any previous study. The new data matrix was compiled mainly from the data matrices of previous studies and includes three outgroups and 13 desmostylian ingroup taxa. Analyses were carried out using five kinds of parsimony methods. Results Strict consensus trees of the most parsimonious topologies obtained in all analyses supported the monophyly of Desmostylidae and the paraphyly of traditional Paleoparadoxiidae. Based on these results, we propose phylogenetic definitions of the clades Desmostylidae and Paleoparadoxiidae based on common ancestry.
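A strict consensus tree retains only the clades that appear in every input tree. The Python sketch below assumes rooted trees encoded as nested tuples of taxon names (a hypothetical encoding, with placeholder taxa) and returns the consensus as a set of clades, each clade being the frozenset of its leaves. It does not perform the parsimony searches that produce the input trees.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):              # a leaf (taxon name)
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)                      # record this internal node's clade
        return leaves

    walk(tree)
    return found

def strict_consensus(trees):
    """Clades present in every input tree."""
    return set.intersection(*(clades(t) for t in trees))
```

For the trees (((A, B), C), D) and ((A, B), (C, D)), only the clade {A, B} and the root survive, so the consensus leaves C and D unresolved relative to {A, B}.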


Author(s):  
Alexey Ovchinnikov ◽  
Isabel Pérez Verona ◽  
Gleb Pogudin ◽  
Mirco Tribastone

Abstract Motivation Detailed mechanistic models of biological processes can pose significant challenges for analysis and parameter estimation due to the large number of equations used to track the dynamics of all distinct configurations in which each involved biochemical species can be found. Model reduction can help tame such complexity by providing a lower-dimensional model in which each macro-variable can be directly related to the original variables. Results We present CLUE, an algorithm for exact model reduction of systems of polynomial differential equations by constrained linear lumping. It computes the lowest-dimensional reduction as a linear mapping of the state space such that the reduced model preserves the dynamics of user-specified linear combinations of the original variables. Even though CLUE works with nonlinear differential equations, it is based on linear algebra tools, which makes it applicable to high-dimensional models. Using case studies from the literature, we show how CLUE can substantially lower model dimensionality and help extract biologically intelligible insights from the reduction. Availability An implementation of the algorithm and relevant resources to replicate the experiments reported herein are freely available for download at https://github.com/pogudingleb/CLUE. Supplementary information Supplementary data are available at Bioinformatics online.
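For the special case of linear dynamics x' = A x, a constrained linear lumping y = L x exists exactly when L A = A_hat L for some reduced matrix A_hat, i.e. when the row space of L contains the observed combinations and is closed under multiplication by A. The Python sketch below assumes linear ODEs only (CLUE itself handles polynomial systems, and its algorithm is not reproduced here); it grows an orthonormal basis by repeatedly multiplying rows by A, and `linear_lumping` is a hypothetical name.

```python
import numpy as np

def linear_lumping(A, observables, tol=1e-10):
    """Smallest constrained linear lumping of the linear system x' = A x.

    Returns L (orthonormal rows) and A_hat with L @ A == A_hat @ L, so that
    y = L x obeys the closed reduced system y' = A_hat y.
    """
    basis = []
    queue = [np.asarray(row, dtype=float) for row in observables]
    while queue:
        v = queue.pop(0)
        for q in basis:                    # Gram-Schmidt against the current basis
            v = v - (v @ q) * q
        norm = np.linalg.norm(v)
        if norm > tol:                     # genuinely new direction
            q = v / norm
            basis.append(q)
            queue.append(q @ A)            # its image under A must also be covered
    L = np.array(basis)
    A_hat = L @ A @ L.T                    # valid because the rows of L are orthonormal
    return L, A_hat
```

For example, with A = [[-1, 1, 0], [1, -1, 0], [1, 1, -2]] and the observable x3, the lumping keeps only x3 and (x1 + x2)/sqrt(2), reducing three variables to two.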


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Xiao-min Chen ◽  
Jun-xu Su ◽  
Qiu-ming Zhu ◽  
Xu-jun Hu ◽  
Zhu Fang

The aim of this paper is to investigate a linear precoding scheme design for a multiple-input multiple-output two-way relay system with imperfect channel state information. The design is formulated as an optimization problem over the precoding matrix variables, derived under the maximum power constraint at the relay station using the minimum mean square error criterion. Taking into account channel feedback delay at both ends of the channel and channel estimation errors, we propose a matrix decomposition scheme and a joint iterative scheme to minimize the average sum mean square error. The matrix decomposition method is used to derive the closed form of the relay matrix, and the joint iterative algorithm is used to optimize the precoding matrix and the processing matrix. Numerical simulation results show that the matrix decomposition scheme effectively reduces the system bit error rate (BER) and that the joint iterative scheme achieves the best BER performance compared with existing methods.
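The minimum mean square error criterion can be illustrated in its simplest setting. The Python sketch below assumes a single point-to-point link y = H x + n with unit-power independent symbols and white noise of variance `noise_var`; it is not the paper's two-way relay precoding scheme, does not model imperfect channel state information, and its function names are hypothetical. It computes the linear MMSE receive filter G = H^H (H H^H + noise_var I)^-1 and evaluates the mean square error of any linear filter.

```python
import numpy as np

def lmmse_receiver(H, noise_var):
    """Linear MMSE filter for y = H x + n, E[x x^H] = I, E[n n^H] = noise_var * I."""
    n_rx = H.shape[0]
    return H.conj().T @ np.linalg.inv(H @ H.conj().T + noise_var * np.eye(n_rx))

def mse(G, H, noise_var):
    """E||G y - x||^2 for a linear receive filter G."""
    E = G @ H - np.eye(H.shape[1])        # residual symbol mixing
    return np.real(np.trace(E @ E.conj().T) + noise_var * np.trace(G @ G.conj().T))
```

By the matrix inversion lemma, the MSE of the LMMSE filter equals tr((I + H^H H / noise_var)^-1), and no linear filter, including the zero-forcing pseudo-inverse, achieves a lower value.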

