Reconstructing Sample-Specific Networks using LIONESS

2021 ◽  
Author(s):  
Marieke Lydia Kuijjer ◽  
Kimberly Glass

We recently developed LIONESS, a method to estimate sample-specific networks based on the output of an aggregate network reconstruction approach. In this manuscript, we describe how to apply LIONESS to different network reconstruction algorithms and data types. We highlight how decisions related to data preprocessing may affect the output networks, discuss expected outcomes, and give examples of how to analyze and compare single sample networks.
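The LIONESS equation itself is simple: the network for sample q is a linear interpolation between the aggregate network built on all N samples and the aggregate network built without sample q, e_q = N·e(all) − (N−1)·e(all minus q). A minimal sketch using Pearson co-expression as the aggregate reconstruction method (function and variable names are illustrative, not taken from the LIONESS software):

```python
import numpy as np

def lioness_coexpression(X):
    """Estimate a sample-specific co-expression network per sample.

    X: (genes x samples) expression matrix.
    Returns (samples, genes, genes): one Pearson network per sample,
    via the LIONESS equation  e_q = N*e_all - (N-1)*e_(all minus q).
    """
    n = X.shape[1]
    net_all = np.corrcoef(X)               # aggregate network on all samples
    nets = np.empty((n, X.shape[0], X.shape[0]))
    for q in range(n):
        net_minus_q = np.corrcoef(np.delete(X, q, axis=1))
        nets[q] = n * net_all - (n - 1) * net_minus_q
    return nets

# Toy example: 5 genes, 10 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
nets = lioness_coexpression(X)
```

The same interpolation applies unchanged when another aggregate method (e.g. mutual information or a regression-based network) replaces the correlation step, which is the point of the manuscript.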

2019 ◽  
Author(s):  
Marieke L. Kuijjer ◽  
John Quackenbush ◽  
Kimberly Glass

Summary
We recently developed LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples), a method that can be used together with network reconstruction algorithms to extract networks for individual samples in a population. LIONESS was originally made available as a function within the PANDA (Passing Attributes between Networks for Data Assimilation) regulatory network reconstruction framework. In this application note, we describe lionessR, an R implementation of LIONESS that can be applied to any network reconstruction method in R that outputs a complete, weighted adjacency matrix. As an example, we use lionessR to model single-sample co-expression networks on a bone cancer dataset, and show how lionessR can be used to identify differential co-expression between two groups of patients.
Availability and implementation
The lionessR open source R package, which includes a vignette of the application, is freely available at https://github.com/mararie/lionessR.
Contact
[email protected]


Author(s):  
Zheng-Hua Tan

The explosive increase in computing power, network bandwidth and storage capacity has greatly facilitated the production, transmission and storage of multimedia data. Compared to alpha-numeric databases, non-text media such as audio, image and video are unstructured by nature, and although they contain rich information, they are not very expressive from the viewpoint of a contemporary computer. As a consequence, an overwhelming amount of data is created and then left unstructured and inaccessible, increasing the demand for efficient content management of these data. This has become a driving force of multimedia research and development, and has led to a new field termed multimedia data mining. While text mining is relatively mature, mining information from non-text media is still in its infancy, but holds much promise for the future. In general, data mining is the process of applying analytical approaches to large data sets to discover implicit, previously unknown, and potentially useful information. This process often involves three steps: data preprocessing, data mining and postprocessing (Tan, Steinbach, & Kumar, 2005). The first step transforms the raw data into a format more suitable for subsequent mining. The second step conducts the actual mining, while the last validates and interprets the mining results. Data preprocessing is a broad area and is the part of data mining in which the essential techniques depend strongly on data types. Unlike textual data, which is typically based on a written language, image, video and some audio are inherently non-linguistic. Speech, as a spoken language, lies in between and often provides valuable information about the subjects, topics and concepts of multimedia content (Lee & Chen, 2005). The linguistic nature of speech makes information extraction from speech less complicated, yet more precise and accurate, than from image and video.
This fact motivates content-based speech analysis for multimedia data mining and retrieval, where audio and speech processing is a key enabling technology (Ohtsuki, Bessho, Matsuo, Matsunaga, & Kayashi, 2006). Progress in this area can impact numerous business and government applications (Gilbert, Moore, & Zweig, 2005). Examples include discovering patterns and generating alarms for intelligence organizations as well as for call centers, analyzing customer preferences, and searching through vast audio warehouses.


2021 ◽  
Vol 8 ◽  
Author(s):  
M. Kaan Arici ◽  
Nurcan Tuncbag

Beyond the list of molecules, there is a need to consider multiple sets of omic data collectively and to reconstruct the connections between the molecules. In particular, pathway reconstruction is crucial to understanding disease biology, because abnormal cellular signaling may be pathological. The main challenge is how to integrate the data accurately. In this study, we aim to comparatively analyze the performance of a set of network reconstruction algorithms on multiple reference interactomes. We first explored several human protein interactomes, including PathwayCommons, OmniPath, HIPPIE, iRefWeb, STRING, and ConsensusPathDB. The comparison is based on the coverage of each interactome in terms of cancer driver proteins, structural information of protein interactions, and the bias toward well-studied proteins. We next used these interactomes to evaluate the performance of network reconstruction algorithms including all-pair shortest path, heat diffusion with flux, personalized PageRank with flux, and prize-collecting Steiner forest (PCSF) approaches. Each approach has its own merits and weaknesses. Among them, PCSF had the most balanced performance in terms of precision and recall scores when 28 pathways from NetPath were reconstructed using the listed algorithms. Additionally, the reference interactome affects the performance of the network reconstruction approaches. The coverage and disease- or tissue-specificity of each interactome may vary, which may result in differences in the reconstructed networks.
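To illustrate one of the compared approaches, personalized PageRank reconstructs a subnetwork by restarting random walks at a set of seed proteins and keeping the highest-scoring nodes. A self-contained power-iteration sketch on a toy interactome (illustrative names and data, not the authors' implementation):

```python
import numpy as np

def personalized_pagerank(A, seeds, alpha=0.85, iters=100):
    """Personalized PageRank scores by power iteration.

    A: (n x n) symmetric adjacency matrix of the reference interactome.
    seeds: restart distribution, e.g. mass on disease proteins only.
    """
    P = A / A.sum(axis=0, keepdims=True)     # column-stochastic transitions
    s = seeds / seeds.sum()
    r = s.copy()
    for _ in range(iters):
        r = alpha * P @ r + (1 - alpha) * s  # walk, then restart at seeds
    return r

# Toy 6-protein interactome: path 0-1-2-3 with a branch 1-4-5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

# Restart only at protein 0 (e.g. a known cancer driver).
scores = personalized_pagerank(A, seeds=np.array([1.0, 0, 0, 0, 0, 0]))
```

The "with flux" variants evaluated in the study additionally convert these node scores into edge flows before thresholding; the score vector above is only the first stage.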


2017 ◽  
Author(s):  
Princy Parsana ◽  
Claire Ruberman ◽  
Andrew E. Jaffe ◽  
Michael C. Schatz ◽  
Alexis Battle ◽  
...  

Abstract
Background
Gene co-expression networks capture diverse biological relationships between genes, and are important tools in predicting gene function and understanding disease mechanisms. Functional interactions between genes have not been fully characterized for most organisms, and therefore reconstruction of gene co-expression networks has been of common interest in a variety of settings. However, methods routinely used for reconstruction of gene co-expression networks do not account for confounding artifacts known to affect high dimensional gene expression measurements.
Results
In this study, we show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. Both theoretically and empirically, we demonstrate that removing the effects of top principal components from gene expression measurements prior to network inference can reduce false discoveries, especially when well annotated technical covariates are not available. Using expression data from the GTEx project in multiple tissues and hundreds of individuals, we show that this latent factor residualization approach often reduces false discoveries in the reconstructed networks.
Conclusions
Network reconstruction is susceptible to confounders that affect measurements of gene expression. Even controlling for major individual known technical covariates fails to fully eliminate confounding variation from the data. In studies where a wide range of annotated technical factors are measured and available, correcting gene expression data with multiple covariates can also improve network reconstruction, but such extensive annotations are not always available. Our study shows that principal component correction, which does not depend on study design or annotation of all relevant confounders, removes patterns of artifactual variation and improves network reconstruction in both simulated data and gene expression data from the GTEx project. We have implemented our PC correction approach in the Bioconductor package sva, which can be used prior to network reconstruction with a range of methods.
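The principal-component correction described here amounts to residualizing the centered expression matrix against its top PCs before computing the network. A hypothetical plain-numpy analogue of that step (the paper's actual implementation lives in the Bioconductor sva package; names here are illustrative):

```python
import numpy as np

def remove_top_pcs(X, k):
    """Project the top k principal components out of an expression matrix.

    X: (samples x genes). Returns the centered matrix with variation
    along the top k components removed, for use before network inference.
    """
    Xc = X - X.mean(axis=0)                       # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc - (U[:, :k] * S[:k]) @ Vt[:k]       # subtract top-k part

# Toy data: 50 samples x 200 genes; correct, then build the network.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 200))
X_resid = remove_top_pcs(X, k=2)
net = np.corrcoef(X_resid.T)   # genes x genes co-expression after correction
```

Choosing k is the delicate part in practice: too few components leaves confounding in, too many can remove broad biological signal.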


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13781
Author(s):  
Nafiseh Sedaghat ◽  
Takumi Saegusa ◽  
Timothy Randolph ◽  
Ali Shojaie

Network reconstruction is an important yet challenging task in systems biology. While many methods have been recently proposed for reconstructing biological networks from diverse data types, properties of estimated networks and differences between reconstruction methods are not well understood. In this paper, we conduct a comprehensive empirical evaluation of seven existing network reconstruction methods, by comparing the estimated networks with different sparsity levels for both normal and tumor samples. The results suggest substantial heterogeneity in networks reconstructed using different reconstruction methods. Our findings also provide evidence for significant differences between networks of normal and tumor samples, even after accounting for the considerable variability in structures of networks estimated using different reconstruction methods. These differences can offer new insight into changes in mechanisms of genetic interaction associated with cancer initiation and progression.
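One simple way to quantify the heterogeneity between networks estimated by different methods, while comparing them at a matched sparsity level as above, is the Jaccard overlap of their strongest edges. A hedged sketch (illustrative helper, not the paper's evaluation code):

```python
import numpy as np

def edge_jaccard(W1, W2, sparsity=0.05):
    """Jaccard overlap of the top edges of two weighted networks.

    W1, W2: symmetric weight matrices from two reconstruction methods.
    Each network is thresholded to its strongest `sparsity` fraction of
    edges so that the comparison is at matched sparsity.
    """
    def top_edges(W):
        iu = np.triu_indices_from(W, k=1)         # unique undirected edges
        w = np.abs(W[iu])
        k = max(1, int(sparsity * w.size))
        keep = np.argsort(w)[-k:]                 # indices of strongest edges
        return {(iu[0][i], iu[1][i]) for i in keep}
    e1, e2 = top_edges(W1), top_edges(W2)
    return len(e1 & e2) / len(e1 | e2)

# Toy example: two correlated random symmetric weight matrices.
rng = np.random.default_rng(3)
W1 = rng.normal(size=(30, 30)); W1 = W1 + W1.T
noise = rng.normal(size=(30, 30))
W2 = W1 + noise + noise.T
overlap = edge_jaccard(W1, W2)
```

Low overlap between methods at the same sparsity is exactly the kind of heterogeneity the evaluation reports.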


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
L. F. Signorini ◽  
T. Almozlino ◽  
R. Sharan

Abstract
Background
ANAT is a Cytoscape plugin for the inference of functional protein–protein interaction networks in yeast and human. It is a flexible graphical tool for scientists to explore and elucidate the protein–protein interaction pathways of a process under study.
Results
Here we present ANAT3.0, which comes with updated PPI network databases of 544,455 (human) and 155,504 (yeast) interactions, and a new machine-learning layer for refined network elucidation. Together, they yield a more than twofold increase in the quality of reconstructing known signaling pathways from KEGG.
Conclusions
ANAT3.0 includes improved network reconstruction algorithms and more comprehensive protein–protein interaction networks than previous versions. ANAT is available for download on the Cytoscape Appstore and at https://www.cs.tau.ac.il/~bnet/ANAT/.


2020 ◽  
Vol 13 (1) ◽  
pp. 148-151
Author(s):  
Kristóf Muhi ◽  
Zsolt Csaba Johanyák

Abstract
In most cases, a dataset obtained through observation, measurement, etc. cannot be directly used for the training of a machine learning based system, due to the unavoidable presence of missing data, inconsistencies and a high dimensional feature space. Additionally, the individual features can contain quite different data types and ranges. For this reason, a data preprocessing step is nearly always necessary before the data can be used. This paper gives a short review of the typical methods applicable in the preprocessing and dimensionality reduction of raw data.
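The preprocessing steps the review covers (imputing missing data, standardizing features with heterogeneous ranges, and reducing dimensionality) can be sketched in a few lines. The following is an illustrative plain-numpy analogue of what libraries such as scikit-learn provide, not code from the paper:

```python
import numpy as np

def preprocess(X, n_components=2):
    """Typical raw-data cleanup before training a model.

    X: (samples x features) array that may contain NaNs.
    """
    # 1. Impute missing entries with the per-feature mean.
    col_mean = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_mean, X)
    # 2. Standardize: zero mean, unit variance per feature,
    #    so features with large ranges do not dominate.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # 3. Dimensionality reduction: project onto the top principal
    #    components, obtained here via SVD.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Toy dataset with one missing value.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 12))
X[0, 0] = np.nan
Z = preprocess(X, n_components=3)
```

The order matters: imputation must precede standardization, and standardization must precede PCA, since PCA is scale-sensitive.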


2017 ◽  
Vol 14 (127) ◽  
pp. 20160966 ◽  
Author(s):  
Marco Tulio Angulo ◽  
Jaime A. Moreno ◽  
Gabor Lippner ◽  
Albert-László Barabási ◽  
Yang-Yu Liu

Inferring properties of the interaction matrix that characterizes how nodes in a networked system directly interact with each other is a well-known network reconstruction problem. Despite a decade of extensive studies, network reconstruction remains an outstanding challenge. The fundamental limitations governing which properties of the interaction matrix (e.g. adjacency pattern, sign pattern or degree sequence) can be inferred from given temporal data of individual nodes remain unknown. Here, we rigorously derive the necessary conditions to reconstruct any property of the interaction matrix. Counterintuitively, we find that reconstructing any property of the interaction matrix is generically as difficult as reconstructing the interaction matrix itself, requiring equally informative temporal data. Revealing these fundamental limitations sheds light on the design of better network reconstruction algorithms that offer practical improvements over existing methods.
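For linear dynamics the reconstruction problem has a concrete form: given a trajectory of x_{t+1} = A x_t, the interaction matrix follows by least squares. A toy sketch under that assumed linear, noise-free model (the paper's setting is far more general; this only makes the data requirement tangible):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A_true = rng.normal(size=(n, n))                          # unknown interactions
A_true *= 0.9 / np.abs(np.linalg.eigvals(A_true)).max()   # keep dynamics stable

# Temporal data of individual nodes under x_{t+1} = A x_t.
T = 30
X = np.empty((T, n))
X[0] = rng.normal(size=n)
for t in range(T - 1):
    X[t + 1] = A_true @ X[t]

# Least squares: the rows of X satisfy X[1:] = X[:-1] @ A^T.
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
```

The paper's counterintuitive point can be read off this setup: to certify even a single property such as the sign pattern np.sign(A_hat), the trajectory must be informative enough to pin down A itself.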


Author(s):  
Santosh Bhattacharyya

Three-dimensional microscopic structures play an important role in the understanding of various biological and physiological phenomena. Structural details of neurons, such as the density, caliber and volumes of dendrites, are important in understanding the physiological and pathological functioning of nervous systems. Even so, many of the stains widely used in biology and neurophysiology, such as horseradish peroxidase (HRP), are absorbing stains, and yet most of the iterative, constrained 3D optical image reconstruction research has concentrated on fluorescence microscopy. It is clear that iterative, constrained 3D image reconstruction methodologies are needed for transmitted light brightfield (TLB) imaging as well. One of the difficulties in doing so, in the past, has been in determining the point spread function of the system. We have been developing several variations of iterative, constrained image reconstruction algorithms for TLB imaging; some of our early testing with one of them was reported previously. These algorithms are based on a linearized model of TLB imaging.
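A generic instance of such an iterative, constrained reconstruction is projected gradient descent (Landweber iteration) with a nonnegativity constraint on a linearized imaging model b = H x. The sketch below uses a toy 1-D blur matrix in place of a real TLB point spread function and is not the authors' algorithm:

```python
import numpy as np

def landweber_nonneg(H, b, n_iter=500):
    """Iterative constrained reconstruction of x from b = H @ x.

    Gradient steps on ||H x - b||^2 with a nonnegativity projection
    after each update, a typical constraint for absorbing structures.
    """
    step = 1.0 / np.linalg.norm(H, 2) ** 2       # step size ensuring convergence
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        x = x + step * (H.T @ (b - H @ x))       # gradient step
        x = np.maximum(x, 0.0)                   # project onto x >= 0
    return x

# Toy linear blur standing in for the linearized TLB point spread.
n = 20
H = np.zeros((n, n))
for i in range(n):
    for k, c in zip((-1, 0, 1), (0.2, 0.6, 0.2)):
        if 0 <= i + k < n:
            H[i, i + k] = c

rng = np.random.default_rng(2)
x_true = rng.uniform(size=n)     # nonnegative structure
b = H @ x_true                   # blurred observation
x_rec = landweber_nonneg(H, b)
```

In the real 3-D problem, H encodes the (hard-to-measure) point spread function, which is exactly why estimating it has been the historical obstacle the passage mentions.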


2008 ◽  
Vol 45 ◽  
pp. 161-176 ◽  
Author(s):  
Eduardo D. Sontag

This paper discusses a theoretical method for the “reverse engineering” of networks based solely on steady-state (and quasi-steady-state) data.

