Integrative Analysis of Gene Networks and Their Application to Lung Adenocarcinoma Studies

2017 ◽  
Vol 16 ◽  
pp. 117693511769077
Author(s):  
Sangin Lee ◽  
Faming Liang ◽  
Ling Cai ◽  
Guanghua Xiao

The construction of gene regulatory networks (GRNs) is an essential component of biomedical research to determine disease mechanisms and identify treatment targets. Gaussian graphical models (GGMs) have been widely used for constructing GRNs by inferring conditional dependence among a set of gene expressions. In practice, GRNs obtained by the analysis of a single data set may not be reliable due to sample limitations. Therefore, it is important to integrate multiple data sets from comparable studies to improve the construction of a GRN. In this article, we introduce an equivalent measure of partial correlation coefficients in GGMs and then extend the method to construct a GRN by combining the equivalent measures from different sources. Furthermore, we develop a method for multiple data sets with a natural missing mechanism to accommodate the differences among different platforms in multiple sources of data. Simulation results show that this integrative analysis outperforms the standard methods and can detect hub genes in the true network. The proposed integrative method was applied to 12 lung adenocarcinoma data sets collected from different studies. The constructed network is consistent with the current biological knowledge and reveals new insights about lung adenocarcinoma.
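For context, the quantity a GGM infers is the partial correlation, which is directly obtainable from the precision (inverse covariance) matrix. The following is a minimal numpy illustration only, not the paper's equivalent measure; practical GRN estimation uses regularized precision estimates (e.g. graphical lasso) rather than a plain matrix inverse:

```python
import numpy as np

def partial_correlations(X):
    """Partial correlations from the inverse sample covariance (precision)
    matrix. In a GGM, a zero partial correlation between two genes means
    conditional independence given all the other genes."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(omega))
    pc = -omega / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Toy chain network x1 -> x2 -> x3: x1 and x3 are correlated only through x2,
# so their partial correlation (given x2) should be near zero.
rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x2 + rng.normal(size=n)
pc = partial_correlations(np.column_stack([x1, x2, x3]))
```

The sketch recovers the conditional-independence structure of the chain: the direct edges (x1, x2) and (x2, x3) have large partial correlations while the indirect pair (x1, x3) does not.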

Author(s):  
Chris Goller ◽  
James Simek ◽  
Jed Ludlow

The purpose of this paper is to present a non-traditional pipeline mechanical damage ranking system using multiple-data-set in-line inspection (ILI) tools. Mechanical damage continues to be a major factor in reportable incidents for hazardous liquid and gas pipelines. While several ongoing programs seek to limit damage incidents through public awareness, encroachment monitoring, and one-call systems, others have focused on quantifying mechanical damage severity through modeling, the use of ILI tools, and subsequent feature assessment at locations selected for excavation. Current-generation ILI tools capable of acquiring multiple data sets in a single survey may provide an improved assessment of the severity of damaged zones using methods developed in earlier research programs as well as currently reported information. For magnetic flux leakage (MFL) tools, using multiple field levels, varied field directions, and high-accuracy deformation sensors enables detection and provides the data necessary for enhanced severity assessments. This paper reviews multiple-data-set ILI results from several pipe joints with simulated mechanical damage locations created to mimic right-of-way encroachment events, in addition to field results from ILI surveys using multiple-data-set tools.


2018 ◽  
Vol 11 (7) ◽  
pp. 4239-4260 ◽  
Author(s):  
Richard Anthes ◽  
Therese Rieckh

Abstract. In this paper we show how multiple data sets, including observations and models, can be combined using the “three-cornered hat” (3CH) method to estimate vertical profiles of the errors of each system. Using data from 2007, we estimate the error variances of radio occultation (RO), radiosondes, ERA-Interim, and Global Forecast System (GFS) model data sets at four radiosonde locations in the tropics and subtropics. A key assumption is the neglect of error covariances among the different data sets, and we examine the consequences of this assumption on the resulting error estimates. Our results show that different combinations of the four data sets yield similar relative and specific humidity, temperature, and refractivity error variance profiles at the four stations, and these estimates are consistent with previous estimates where available. These results thus indicate that the correlations of the errors among all data sets are small and the 3CH method yields realistic error variance profiles. The estimated error variances of the ERA-Interim data set are smallest, a reasonable result considering the excellent model and data assimilation system and assimilation of high-quality observations. For the four locations studied, RO has smaller error variances than radiosondes, in agreement with previous studies. Part of the larger error variance of the radiosondes is associated with representativeness differences because radiosondes are point measurements, while the other data sets represent horizontal averages over scales of ∼ 100 km.
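Assuming mutually uncorrelated errors, the 3CH estimate follows directly from the three pairwise mean-square differences of collocated data sets. A minimal sketch with made-up data (the paper applies this per vertical level and across many combinations of four data sets):

```python
import numpy as np

def three_cornered_hat(a, b, c):
    """Error-variance estimates for three collocated data sets, assuming
    their errors are mutually uncorrelated (the key 3CH assumption).
    Any common true signal cancels in the pairwise differences."""
    dab = np.mean((a - b) ** 2)
    dac = np.mean((a - c) ** 2)
    dbc = np.mean((b - c) ** 2)
    return (0.5 * (dab + dac - dbc),   # error variance of a
            0.5 * (dab + dbc - dac),   # error variance of b
            0.5 * (dac + dbc - dab))   # error variance of c

# Synthetic check: a shared truth plus independent noise of known variance.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 20.0, 200_000))
a = truth + rng.normal(0.0, 1.0, truth.size)
b = truth + rng.normal(0.0, 2.0, truth.size)
c = truth + rng.normal(0.0, 0.5, truth.size)
va, vb, vc = three_cornered_hat(a, b, c)
```

With uncorrelated errors the estimates recover the true noise variances (1.0, 4.0, 0.25); correlated errors would bias them, which is why the paper examines that assumption explicitly.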


2021 ◽  
Author(s):  
Huan Chen ◽  
Brian Caffo ◽  
Genevieve Stein-O’Brien ◽  
Jinrui Liu ◽  
Ben Langmead ◽  
...  

Summary. Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful in making inference from publicly available collections of genetic, transcriptomic and epigenetic data sets which are designed to study shared biological processes, but which vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data-set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets, with biological and technological relationships that can be structured into the decomposition. The consistency of the proposed method is established and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets. The code to conduct 2s-LCA has been compiled into an R package, "PJD", which is available at https://github.com/CHuanSite/PJD.


Metabolomics ◽  
2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Masoumeh Alinaghi ◽  
Hanne Christine Bertram ◽  
Anders Brunse ◽  
Age K. Smilde ◽  
Johan A. Westerhuis

Abstract. Introduction: Integrative analysis of multiple data sets can provide complementary information about the studied biological system. However, data fusion of multiple biological data sets can be complicated, as the data sets might contain different sources of variation due to underlying experimental factors. Therefore, taking the experimental design of the data sets into account can be important in data fusion. Objectives: In the present work, we aim to incorporate the experimental design information in the integrative analysis of multiple designed data sets. Methods: Here we describe penalized exponential ANOVA simultaneous component analysis (PE-ASCA), a new method for integrative analysis of data sets from multiple compartments or analytical platforms with the same underlying experimental design. Results: Using two simulated cases, the results of simultaneous component analysis (SCA), penalized exponential simultaneous component analysis (P-ESCA) and ANOVA-simultaneous component analysis (ASCA) are compared with the proposed method. Furthermore, real metabolomics data obtained from NMR analysis of two different brain tissues (hypothalamus and midbrain) from the same piglets, with an underlying experimental design, are investigated by PE-ASCA. Conclusions: This method provides an improved understanding of the common and distinct variation in response to different experimental factors.
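The ANOVA step underlying ASCA-type methods partitions each data matrix according to the experimental design before any component analysis. A minimal single-factor numpy sketch under a balanced design (illustrative only; PE-ASCA itself adds a penalized simultaneous decomposition across multiple data sets):

```python
import numpy as np

def anova_partition(X, factor):
    """Split a (samples x variables) matrix into grand-mean, factor-effect and
    residual parts; ASCA then applies component analysis (PCA/SCA) to each
    effect matrix separately."""
    grand = X.mean(axis=0, keepdims=True)
    effect = np.zeros_like(X)
    for level in np.unique(factor):
        idx = factor == level
        # Effect of a level = deviation of its group mean from the grand mean.
        effect[idx] = X[idx].mean(axis=0) - grand
    resid = X - grand - effect
    return grand, effect, resid

# Two groups of 10 samples; group 1 is shifted by +3 in every variable.
rng = np.random.default_rng(2)
factor = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 5))
X[factor == 1] += 3.0
grand, effect, resid = anova_partition(X, factor)
```

The three parts sum back to the original matrix exactly, and the effect matrix is constant within each design group, which is what makes a per-effect component analysis interpretable.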


2007 ◽  
Vol 3 ◽  
pp. 117693510700300 ◽  
Author(s):  
Xutao Deng ◽  
Huimin Geng ◽  
Hesham H. Ali

Many studies have reported inconsistent cancer biomarkers due to bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge in order to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model which integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is a method designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker-finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measure of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins, including prostate-specific antigen (PSA), are identified as reliable serum biomarkers which are insensitive to the model assumptions. The computational results show that our method is able to find biologically relevant biomarkers with the highest reliability while maintaining competitive predictive power. In addition, by combining biological knowledge and data from multiple platforms, the number of putative biomarkers is greatly reduced to allow more focused clinical studies.


Author(s):  
Ping Li ◽  
Hua-Liang Wei ◽  
Stephen A. Billings ◽  
Michael A. Balikhin ◽  
Richard Boynton

A basic assumption on the data used for nonlinear dynamic model identification is that the data points are collected continuously, in chronological order. However, there are situations in practice where this assumption does not hold and we end up with an identification problem over multiple data sets. This paper addresses that problem and proposes a new cross-validation-based orthogonal search algorithm for NARMAX model identification from multiple data sets. The algorithm aims at identifying a single model from multiple data sets, so as to extend the applicability of the standard method to cases where the data sets for identification are obtained from multiple tests or a series of experiments, or where the data set is discontinuous because of missing data points. The proposed method can also be viewed as a way to improve the performance of the standard orthogonal search method for model identification by making full use of all the available data segments at hand. Simulated and real data are used to illustrate the operation and to demonstrate the effectiveness of the proposed method.


Author(s):  
Pengcheng Zeng ◽  
Jiaxuan Wangwu ◽  
Zhixiang Lin

Abstract. Unsupervised methods, such as clustering, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for only one data type, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. Integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework, in which both the cells and the genomic features are clustered simultaneously. Clustering similar genomic features reduces the noise in single-cell data and facilitates transfer of knowledge across single-cell datasets. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.


2020 ◽  
Vol 493 (1) ◽  
pp. 48-54
Author(s):  
Chris Koen

ABSTRACT Large monitoring campaigns, particularly those using multiple filters, have produced replicated time series of observations for literally millions of stars. The search for periodicities in such replicated data can be facilitated by comparing the periodograms of the various time series. In particular, frequency spectra can be searched for common peaks. The sensitivity of this procedure to various parameters (e.g. the time base of the data, length of the frequency interval searched, number of replicate series, etc.) is explored. Two additional statistics that could sharpen results are also discussed: the closeness (in frequency) of peaks identified as common to all data sets, and the sum of the ranks of the peaks. Analytical expressions for the distributions of these two statistics are presented. The method is illustrated by showing that a ‘dubious’ periodicity in an ‘Asteroid Terrestrial-impact Last Alert System’ data set is highly significant.
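The basic comparison can be sketched as computing a periodogram per replicate series and checking whether the top peaks agree in frequency. Illustrative classical-periodogram code with made-up numbers; the paper's peak-closeness and rank-sum statistics are not reproduced here:

```python
import numpy as np

def top_peak_frequency(t, y, freqs):
    """Frequency of the largest classical periodogram peak of series y(t),
    evaluated on a grid of trial frequencies (works for uneven sampling)."""
    power = np.array([np.abs(np.sum(y * np.exp(-2j * np.pi * f * t))) ** 2
                      for f in freqs])
    return freqs[np.argmax(power)]

# Two replicate series (e.g. two filters) of the same star, both containing a
# 0.1 cycles/day signal plus independent noise, unevenly sampled over 100 days.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 100.0, 400))
freqs = np.linspace(0.01, 0.5, 2000)
peaks = []
for _ in range(2):
    y = np.sin(2 * np.pi * 0.1 * t) + 0.5 * rng.normal(size=t.size)
    peaks.append(top_peak_frequency(t, y - y.mean(), freqs))
```

A genuine periodicity yields top peaks in all replicate spectra that agree to within roughly the frequency resolution set by the time base (about 1/100 cycles/day here), whereas noise peaks land at unrelated frequencies.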


2018 ◽  
Author(s):  
James M. Holton

Abstract. A synthetic data set demonstrating a particularly challenging case of indexing ambiguity in the context of radiation damage was generated in order to serve as a standard benchmark and reference point for the ongoing development of new methods and new approaches to solving this problem. Of the 100 short wedges of data, only the first 71 are currently necessary to solve the structure by "cheating", or using the correct reference structure as a guide. The total wall-clock time and number of wedges required to solve the structure without cheating is proposed as a metric for the efficacy and efficiency of a given multi-crystal automation pipeline.

Synopsis. A synthetic dataset demonstrating the challenges of combining multiple data sets with indexing ambiguity in the context of heavy radiation damage in multi-crystal macromolecular crystallography was generated and described, and the problems encountered using contemporary data processing programs were summarized.

