Identification of active modules in interaction networks using node2vec network embedding

2021
Author(s): Claude Pasquier, Vincent Guerlais, Denis Pallez, Raphael Rapetti-Mauss, Olivier Soriani

The identification of condition-specific gene sets from transcriptomic experiments is important for revealing the regulatory and signaling mechanisms associated with a given cellular response. Statistical approaches that use only expression data identify the genes whose expression is most altered between conditions. However, a phenotype is rarely the direct consequence of the activity of a single gene. Rather, it reflects the interplay of several genes that carry out particular molecular processes. Many methods have been proposed to analyze gene activity in light of our knowledge of molecular interactions. However, existing methods have limitations that reduce their usefulness to biologists: they detect modules that are too large or too small, or they require users to specify a priori the size of the modules they are looking for. We propose AMINE (Active Module Identification through Network Embedding), an efficient method for the identification of active modules. Experiments carried out on artificial data sets show that its results are more reliable than those of many available methods. Moreover, the size of the modules to be identified is not a fixed parameter of the method and does not need to be specified. Instead, it adapts to the size of the modules to be found. Applications to real data sets show that the method finds important genes already highlighted by approaches based solely on gene variations, and that it also identifies new groups of genes of high interest. In addition, AMINE can be used as a web service on your own data (http://amine.i3s.unice.fr).
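As the title states, AMINE builds on node2vec embeddings of the interaction network. node2vec (Grover and Leskovec, 2016) generates node "sentences" with second-order biased random walks, then trains a skip-gram model on them. The walks are controlled by a return parameter p and an in-out parameter q. The Python sketch below covers only the walk step. The function name, toy graph, and parameter values are illustrative assumptions. It is not AMINE's implementation, which also includes the embedding training and whatever module-scoring step AMINE applies on top of the embedding.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Second-order biased random walk in the style of node2vec.

    adj: dict mapping each node to a set of neighbours (unweighted, undirected graph).
    p:   return parameter (positive); stepping back to the previous node gets weight 1/p.
    q:   in-out parameter (positive); moving to a node not adjacent to the previous
         node gets weight 1/q.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward, away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Toy graph (hypothetical): two triangles joined by the edge C-D.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
walk = node2vec_walk(toy, "A", 12, p=1.0, q=0.5, rng=random.Random(0))
print(walk)
```

A small q favours outward, depth-first-like exploration. A small p keeps the walk close to where it came from. This choice shapes which nodes end up close together in the embedding.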

2020
Author(s): Ali Hadizadeh Esfahani, Janina Maß, Asis Hallab, Bernhard M. Schuldt, David Nevarez, ...

Abstract Transcriptomics results can be generalized by comparing them across experiments, which requires integrating interrelated transcriptomics studies into a compendium. This broader context makes it possible both to characterize the fate of the organism under study and to distinguish between generic and specific responses. We have built such a compendium for plant stress response by integrating publicly available data sets, with the aim of generalizing results across studies and extracting the most robust and meaningful information possible from them. There are numerous methods and tools to analyze such data sets. Most of them focus on gene-wise dimension reduction to obtain marker genes and gene sets, e.g. for pathway analysis. Relying only on isolated biological modules might cause important confounders and relevant context to be missed. We have therefore chosen a different approach: our novel tool, Plant PhysioSpace, provides the ability to compute experimental conditions across species and platforms without reducing the reference information a priori to specific gene sets. It extracts physiologically relevant signatures from a reference data set (a collection of public data sets) by integrating and transforming heterogeneous reference gene expression data into a set of physiology-specific patterns, called PhysioSpace.
New experimental data can be mapped to these PhysioSpaces, yielding similarity scores that quantify how similar the new experiment is to the a priori compendium. Here we report the implementation of two R packages (one software package and one data package) and a Shiny web application. Together they give plant biologists convenient access to the method and to a precomputed compendium of more than 900 PhysioSpace basis vectors from four species (Arabidopsis thaliana, Oryza sativa, Glycine max, and Triticum aestivum). The tool reduces the dimensionality of the data sample-wise (not gene-wise), so the result is a vector containing all genes. The method is very robust against noise and changes of platform while remaining sensitive. Plant PhysioSpace can therefore be used as an inter-species or cross-platform similarity measure. We demonstrate that Plant PhysioSpace can successfully translate stress responses between different species and platforms (including single-cell technologies).
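The abstract describes mapping a new experiment onto precomputed all-gene basis vectors to obtain similarity scores. The Python sketch below illustrates that idea with a plain Spearman rank correlation between a query log fold-change vector and each basis vector. This is a simplified stand-in chosen for clarity. It is not the similarity statistic implemented in the Plant PhysioSpace R packages. The function names and the random toy data are assumptions.

```python
import numpy as np

def rank(v):
    """Return ranks 0..n-1 of v (ties broken by position; adequate for continuous data)."""
    v = np.asarray(v, dtype=float)
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def map_to_basis(query_lfc, basis):
    """Spearman-type similarity of one query vector to each reference basis vector.

    query_lfc: shape (n_genes,), e.g. log2 fold changes of a new sample vs. its control.
    basis:     shape (n_genes, n_signatures), one all-gene signature per column,
               with genes in the same order as query_lfc.
    Returns an array of scores in [-1, 1], one per signature.
    """
    q = rank(query_lfc)
    q = (q - q.mean()) / q.std()
    scores = []
    for j in range(basis.shape[1]):
        b = rank(basis[:, j])
        b = (b - b.mean()) / b.std()
        scores.append(float(np.mean(q * b)))  # Pearson correlation of the ranks
    return np.array(scores)

# Toy example with random "signatures" (hypothetical data, not the PhysioSpace compendium).
rng = np.random.default_rng(0)
basis = rng.normal(size=(500, 3))
query = basis[:, 1] + 0.3 * rng.normal(size=500)  # noisy copy of signature 1
scores = map_to_basis(query, basis)
print(scores)  # signature 1 should score highest
```

The sketch compares only ranks. A monotone distortion of the query, such as a platform-specific scale difference, therefore leaves its scores unchanged.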


2015, Vol 32 (6), pp. 943-945
Author(s): Wentao Yang, Katja Dierking, Hinrich Schulenburg

Abstract Motivation: A particular challenge of the current omics age is to make sense of the inferred differential expression of genes and proteins. The most common approach is to perform a gene ontology (GO) enrichment analysis. This relies on a database extracted from a variety of organisms and can therefore only yield reliable information on evolutionarily conserved functions. Results: Here we present a web-based application for taxon-specific gene set exploration and enrichment analysis, which is expected to yield novel functional insights into newly determined gene sets. The approach is based on the complete collection of curated high-throughput gene expression data sets for the model nematode Caenorhabditis elegans, comprising 1786 gene sets from more than 350 studies. Availability and implementation: WormExp is available at http://wormexp.zoologie.uni-kiel.de. Supplementary information: Supplementary data are available at Bioinformatics online.
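An enrichment analysis against curated gene sets asks whether a user's gene list overlaps a gene set more than expected by chance. A common way to quantify this is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below assumes that test and a fixed gene universe of size N. It is not taken from WormExp, whose exact statistic and multiple-testing correction should be checked in its documentation. All names and numbers are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: number of genes in the background universe.
    K: number of genes in one reference gene set.
    n: number of genes in the user's query list.
    k: number of query genes that are also in the reference set.
    """
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total

# Hypothetical query: 200 genes, 12 of which fall in a 150-gene reference set,
# out of a 20000-gene universe.
print(enrichment_p(12, 200, 150, 20000))
```

When one query is tested against all 1786 gene sets, the resulting p-values should be corrected for multiple testing (for example, with the Benjamini-Hochberg procedure) before interpretation.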


2021, pp. 000276422110216
Author(s): Kazimierz M. Slomczynski, Irina Tomescu-Dubrow, Ilona Wysmulek

This article proposes a new approach to analyzing protest participation measured in surveys of uneven quality. Single international survey projects cover only a fraction of the world's nations in specific periods. Researchers therefore increasingly turn to ex-post harmonization of survey data sets that were not designed a priori to be comparable. However, very few scholars systematically examine the impact of survey data quality on substantive results. We argue that variation in the source data matters for analyzing protest behavior. This applies especially to deviations from the standards of survey documentation, data processing, and computer files proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of survey quality measures on indicators of protest participation must be rejected. Taken together, measures of survey documentation, data processing, and computer records explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.
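A statement such as "explain over 5% of the intersurvey variance" refers to an R²-type quantity: the share of the variance in survey-level protest proportions accounted for by the quality measures. The Python sketch below computes R² for an ordinary least-squares model on synthetic data. The abstract does not describe the authors' actual model, including any controls for questionnaire items or multilevel structure. The variable names and numbers are hypothetical.

```python
import numpy as np

def variance_explained(y, X):
    """R^2 of an ordinary least-squares regression of y on the columns of X (with intercept)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1.0 - resid.var() / y.var())   # 1 - SS_resid / SS_total

# Synthetic example (hypothetical numbers): 300 surveys, three quality scores each.
rng = np.random.default_rng(42)
quality = rng.normal(size=(300, 3))  # e.g. documentation, processing, records scores
protest = 0.15 + 0.01 * quality.sum(axis=1) + 0.05 * rng.normal(size=300)
r2 = variance_explained(protest, quality)
print(f"share of intersurvey variance explained: {r2:.1%}")
```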


2021, pp. 002203452110120
Author(s): C. Gluck, S. Min, A. Oyelakin, M. Che, E. Horeth, ...

The parotid, submandibular, and sublingual glands are a trio of oral secretory glands whose primary functions are to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. Recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly in mice. Relatively little, however, is known about the location and identity of their transcriptional control elements. Here we established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing (ChIP-seq) experiments for four key histone marks. Our analysis of the comprehensive SMG data sets, together with comparisons with data sets from other adult organs, identified critical enhancers and super-enhancers of the mouse SMG. We further integrated these findings with complementary RNA-sequencing-based gene expression data. This uncovered a number of molecular regulators, such as members of the Fox family of transcription factors, that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements. This atlas can be leveraged to better understand the transcriptional control mechanisms of the mouse SMG, to discover novel genetic switches, and to modulate tissue-specific gene expression in a targeted fashion.
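Super-enhancers are commonly called with a ROSE-style procedure (Whyte et al., 2013). Enhancer peaks within 12.5 kb are stitched together, and the stitched regions are ranked by their signal (often H3K27ac). The cutoff is the point where a line of slope 1 is tangent to the normalized ranked-signal curve. The abstract does not say which caller or parameters were used here. The Python sketch below therefore only illustrates that general approach under these assumptions. It omits ROSE's promoter exclusion and read-based signal quantification, and the function names and toy values are hypothetical.

```python
import numpy as np

def stitch(peaks, max_gap=12_500):
    """Merge (start, end, signal) enhancer peaks on one chromosome lying within max_gap bp.

    The stitched signal is approximated here by summing the constituent peak signals.
    """
    if not peaks:
        return []
    peaks = sorted(peaks)
    merged = [list(peaks[0])]
    for start, end, signal in peaks[1:]:
        if start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += signal
        else:
            merged.append([start, end, signal])
    return [tuple(m) for m in merged]

def super_enhancer_cutoff(signals):
    """Signal value where a slope-1 line is tangent to the normalized ranked-signal curve."""
    y = np.sort(np.asarray(signals, dtype=float))
    if len(y) < 2 or y.max() == y.min():
        return float(y.max()) if len(y) else 0.0  # nothing to separate
    x = np.arange(len(y)) / (len(y) - 1)           # rank scaled to [0, 1]
    y_scaled = (y - y.min()) / (y.max() - y.min())  # signal scaled to [0, 1]
    i = int(np.argmin(y_scaled - x))               # tangent point of a slope-1 line
    return float(y[i])

# Toy signals: many typical enhancers plus a few very strong stitched regions (hypothetical).
signals = [1.0] * 95 + [50.0, 60.0, 70.0, 80.0, 100.0]
cutoff = super_enhancer_cutoff(signals)
super_enhancers = [s for s in signals if s > cutoff]
print(cutoff, super_enhancers)
```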


2015, Vol 2015, pp. 1-13
Author(s): Jianwei Ding, Yingbo Liu, Li Zhang, Jianmin Wang

Condition monitoring systems are widely used to monitor the working condition of equipment, and they generate a vast amount and variety of telemetry data in the process. Surveillance therefore centers on analyzing this routinely collected telemetry to assess the working condition of the equipment. However, the volume of telemetry data is increasing rapidly. Analyzing all of it to understand the working condition of the equipment without any a priori knowledge is therefore a nontrivial task. In this paper, we propose a probabilistic generative model called the working condition model (WCM). WCM simulates how event sequence data are generated and depicts the working condition of equipment at runtime. With WCM, we can analyze how event sequence data behave in different working modes and also detect the working mode of an event sequence (working condition diagnosis). Furthermore, we applied WCM to illustrative applications such as the automated detection of anomalous event sequences during equipment runtime. Our experimental results on real data sets demonstrate the effectiveness of the model.
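The abstract does not specify WCM's generative structure. The Python sketch below therefore uses a much simpler stand-in to illustrate the two uses the abstract mentions. Each working mode is modelled as a smoothed categorical distribution over event types, ignoring event order and timing. A sequence is diagnosed as the mode under which it is most likely. A sequence whose best per-event log-likelihood falls below a threshold is flagged as anomalous. The mode names, event vocabulary, and threshold are hypothetical.

```python
import math
from collections import Counter

def fit_mode(sequences, vocab, alpha=1.0):
    """Laplace-smoothed categorical distribution over event types for one working mode."""
    counts = Counter(e for seq in sequences for e in seq)
    total = sum(counts.values()) + alpha * len(vocab)
    return {e: (counts[e] + alpha) / total for e in vocab}

def log_lik(seq, dist):
    """Log-likelihood of a sequence under a mode (all events must be in the vocabulary)."""
    return sum(math.log(dist[e]) for e in seq)

def diagnose(seq, modes):
    """Most likely working mode for seq, with its per-event log-likelihood."""
    best = max(modes, key=lambda m: log_lik(seq, modes[m]))
    return best, log_lik(seq, modes[best]) / len(seq)

def is_anomalous(seq, modes, threshold):
    """Flag seq if even its best-fitting mode explains it poorly."""
    return diagnose(seq, modes)[1] < threshold

vocab = ["start", "run", "idle", "alarm", "stop"]
modes = {
    "normal": fit_mode([["start", "run", "run", "run", "stop"],
                        ["start", "run", "idle", "run", "stop"]], vocab),
    "standby": fit_mode([["start", "idle", "idle", "idle", "stop"],
                         ["idle", "idle", "idle"]], vocab),
}
print(diagnose(["run", "run", "run"], modes))
print(is_anomalous(["alarm", "alarm", "alarm"], modes, threshold=-2.0))
```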


2014, Vol 14 (23), pp. 12613-12629
Author(s): P. Eriksson, B. Rydberg, H. Sagawa, M. S. Johnston, Y. Kasai

Abstract. Retrievals of cloud ice mass and humidity from the Superconducting Submillimeter-Wave Limb-Emission Sounder (SMILES) and the Odin-SMR (Sub-Millimetre Radiometer) limb sounder are presented, and example applications of the data are given. SMILES data give an unprecedented view of the diurnal variation of cloud ice mass. Mean regional diurnal cycles are reported and compared with several global climate models. Some improvements in the models regarding diurnal timing and relative amplitude were noted, but the models' mean ice mass around 250 hPa is still low compared with the observations. The influence of the ENSO (El Niño–Southern Oscillation) state on the upper troposphere is demonstrated using 12 years of Odin-SMR data. The same retrieval scheme is applied to both sensors and yields small systematic differences between the two data sets. A special feature of this Bayesian retrieval scheme, which is of the Monte Carlo integration type, is that values are produced for all measurements. For some atmospheric states, however, the retrieved values reflect only a priori assumptions. Nevertheless, this "all-weather" capability allows a direct statistical comparison with model data, in contrast to many other satellite data sets. Another strength of the retrievals is the detailed treatment of "beam filling", which would otherwise cause large systematic biases in these passive cloud ice mass retrievals. The main retrieval inputs are spectra around 635 GHz (SMILES) and 525 GHz (Odin-SMR), taken from tangent altitudes below 8 km and 9 km, respectively. For both sensors, the data cover the upper troposphere between 30° S and 30° N. Humidity is reported as both relative humidity and volume mixing ratio. The vertical coverage of SMILES is restricted to a single layer, while Odin-SMR gives some profiling capability between 300 and 150 hPa. Ice mass is given as the partial ice water path above 260 hPa. For Odin-SMR, estimates of ice water content are also provided.
Apart from a smaller contrast between the driest and wettest cases, the agreement with Aura MLS (Microwave Limb Sounder) humidity data is good. In terms of tropical mean humidity, all three data sets agree within 3.5 %RHi. Mean ice mass is about a factor of 2 lower than that of CloudSat. This deviation is caused by differences in the assumed particle size distributions, combined with saturation and a priori influences in the SMILES and Odin-SMR data.
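Bayesian Monte Carlo integration (BMCI) retrievals estimate the state as a weighted average over a large database of a priori atmospheric states. Each weight reflects how well that state's simulated measurement matches the observation. The Python sketch below implements this generic scheme with a diagonal noise covariance. It is not the SMILES/Odin-SMR retrieval code, and the database, channel model, and names are hypothetical. It shows why every measurement yields a value. It also shows why that value falls back to the a priori mean when the measurement carries no information about the state.

```python
import numpy as np

def bmci(y_obs, y_db, x_db, sigma):
    """Posterior mean and standard deviation of x by Bayesian Monte Carlo integration.

    y_obs: (n_channels,) observed measurement.
    y_db:  (n_cases, n_channels) simulated measurements for database states drawn
           from the a priori distribution.
    x_db:  (n_cases,) quantity to retrieve for each database state.
    sigma: (n_channels,) measurement noise standard deviations (diagonal covariance).
    """
    chi2 = np.sum(((y_db - y_obs) / sigma) ** 2, axis=1)
    w = np.exp(-0.5 * (chi2 - chi2.min()))  # shift by the minimum to avoid underflow
    w /= w.sum()
    mean = float(np.sum(w * x_db))
    std = float(np.sqrt(np.sum(w * (x_db - mean) ** 2)))
    return mean, std

# Hypothetical database: 5000 a priori states of some quantity x in [0, 5].
rng = np.random.default_rng(1)
x_db = rng.uniform(0.0, 5.0, 5000)
sigma = np.array([0.05])

# Case 1: a channel that responds linearly to x, so the measurement constrains x.
y_sensitive = x_db[:, None]
print(bmci(np.array([2.0]), y_sensitive, x_db, sigma))

# Case 2: a channel with no sensitivity to x (e.g. a saturated signal). Every weight is
# equal, so the retrieval returns the a priori mean of the database.
y_flat = np.zeros((5000, 1))
print(bmci(np.array([3.0]), y_flat, x_db, sigma))
```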


2020
Author(s): Shen Pan, Yunhong Zhan, Xiaonan Chen, Bin Wu, Bitian Liu

Abstract Background T1G3 tumours have a higher chance of recurrence and progression than other early bladder cancer types, and the appropriate treatment remains controversial. High recurrence and progression rates are problems that need to be explored and solved. Changes in the internal signalling of bladder cancer cells and in differentially expressed genes may be the root cause of these problems. Methods The GSE120736, GSE19915, GSE19423, GSE32548 and GSE37815 datasets were obtained from the Gene Expression Omnibus (GEO) to identify differentially expressed genes (DEGs). Bladder cancer transcript data from The Cancer Genome Atlas (TCGA) were clustered into cell-specific gene sets using weighted gene co-expression network analysis (WGCNA). Multiple databases and tools were used for gene expression comparison, functional enrichment, and protein interaction analysis, including The Human Protein Atlas, Cancer Dependency Map, Metascape, Gene Set Enrichment Analysis, and DisNor. Results DEGs were obtained by comparing the GEO datasets and taking their intersection. After WGCNA was shown to recognise cell-specific gene sets, candidate DEGs were selected and shown to be specifically expressed in cancer cells. Candidate DEGs were related to mitosis and the cell cycle. Further, 12 functional candidate markers were identified from the sequencing data of 30 bladder cancer cell lines. These genes were all up-regulated and had previously been shown to be closely related to bladder cancer progression. Conclusions Twelve functional genes with specific differential expression in bladder cancer cells were identified. WGCNA can identify relatively specific expression sets for the different cell types in bladder cancer, despite its considerable tumour heterogeneity. This provides new perspectives for future cancer research.
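WGCNA builds its modules from a soft-thresholded co-expression network. The adjacency is a_ij = |cor(x_i, x_j)|^β, and genes are then clustered on the dissimilarity 1 - TOM, where TOM is the topological overlap matrix (Zhang and Horvath, 2005). The Python sketch below computes both for the unsigned case. β = 6 is only a commonly used value, since WGCNA users normally choose β from a scale-free topology fit. The toy data are random, and the subsequent hierarchical clustering and dynamic tree cut are not shown.

```python
import numpy as np

def wgcna_tom(expr, beta=6):
    """Unsigned soft-threshold adjacency and topological overlap matrix (TOM).

    expr: (n_samples, n_genes) expression matrix.
    beta: soft-thresholding power.
    """
    a = np.abs(np.corrcoef(expr, rowvar=False)) ** beta  # a_ij = |cor(x_i, x_j)|^beta
    np.fill_diagonal(a, 0.0)                   # no self-connections
    k = a.sum(axis=1)                          # connectivity k_i
    shared = a @ a                             # l_ij = sum_u a_iu * a_uj
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy data (hypothetical): 30 samples, 8 genes; genes 0-3 share a common driver.
rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 8))
expr[:, :4] += 2.0 * rng.normal(size=(30, 1))
tom = wgcna_tom(expr)
dissimilarity = 1.0 - tom  # input to hierarchical clustering in WGCNA
print(np.round(tom, 2))
```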


2021, Vol 8
Author(s): Alba Frias-De-Diego, Manuel Jara, Brittany M. Pecoraro, Elisa Crisci

The diversity, ecology, and evolution of viruses are commonly determined through phylogenetics, an accurate tool for identifying and studying lineages with different pathological characteristics within the same species. For porcine reproductive and respiratory syndrome virus (PRRSV), evolutionary research has split into two main branches. The split depends on whether a specific gene (i.e., ORF5) or whole-genome sequences are used as the input to produce the phylogeny. In this study, we reviewed the PRRSV phylogenetic literature and characterized the spatiotemporal trends in research using single-gene vs. whole-genome evolutionary approaches. Finally, using publicly available data, we performed a Bayesian phylodynamic analysis following each research branch and compared the results to determine the pros and cons of each approach. This study provides an exploration of the two main phylogenetic research lines applied to PRRSV evolution. It also gives an example of the differences found when both methods are applied to the same database. We expect that our results will serve as guidance for future PRRSV phylogenetic research.
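Whatever the input (ORF5 alone or whole genomes), phylogenetic methods rest on a substitution model that relates observed differences between aligned sequences to evolutionary distance. The Python sketch below computes the simplest such estimate, the Jukes-Cantor (JC69) distance d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. It only illustrates the idea. Bayesian phylodynamic analyses like the one described rely on richer substitution, clock, and demographic models, and the sequences used here are made up.

```python
import math

def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Alignment columns with a gap or an ambiguous base in either sequence are skipped.
    """
    valid = set("ACGT")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturation: the JC69 distance is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Made-up aligned fragments differing at one of ten sites.
print(jc69_distance("ACGTACGTAC", "ACGTACGTAA"))
```

An ORF5 alignment and a whole-genome alignment of the same strains sample different sites with different rates of change. Applied to each, the same function would therefore generally give different distances.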


2021
Author(s): Kezia Lange, Andreas C. Meier, Michel Van Roozendael, Thomas Wagner, Thomas Ruhtz, ...

<p>Airborne imaging DOAS and ground-based stationary and mobile DOAS measurements were conducted during the ESA-funded S5P-VAL-DE-Ruhr campaign in September 2020 in the Ruhr area. The Ruhr area, located in western Germany, is a European pollution hotspot with both an urban character and large industrial emitters. The measurements are used to validate data from the Sentinel-5P TROPOspheric Monitoring Instrument (TROPOMI), with a focus on the NO<sub>2</sub> tropospheric vertical column product.</p><p>Seven flights were performed with the airborne imaging DOAS instrument AirMAP, which provides continuous maps of NO<sub>2</sub> in the layers below the aircraft. These flights covered many S5P ground pixels within an area of about 40 km side length. They were accompanied by ground-based stationary measurements and three mobile car DOAS instruments. Stationary measurements were conducted by two Pandora, two zenith-sky, and two MAX-DOAS instruments distributed over three target areas, partly as long-term measurements over a one-year period.</p><p>Airborne and ground-based measurements were compared to evaluate the representativeness of the measurements in time and space. The AirMAP data have a resolution of about 100 × 30 m<sup>2</sup>, while TROPOMI has a resolution of 3.5 × 5.5 km<sup>2</sup>. The AirMAP data therefore create a link between the ground-based and the TROPOMI measurements and are well suited to validate TROPOMI's tropospheric NO<sub>2</sub> vertical column.</p><p>The measurements on the seven flight days show strong variability depending on the target area, the weekday, and the meteorological conditions. Overall, the TROPOMI operational NO<sub>2</sub> data were biased low for all three target areas, with a magnitude that varied from day to day. The campaign data set is compared with custom TROPOMI NO<sub>2</sub> products that use different auxiliary data, such as albedo or a priori vertical profiles, to evaluate their influence on the TROPOMI data product. 
Analyzing and comparing the different data sets provides more insight into the high spatial and temporal heterogeneity of NO<sub>2</sub> and its impact on satellite observations and their validation.</p>
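Validating TROPOMI with AirMAP requires bringing the roughly 100 × 30 m² AirMAP columns onto the 3.5 × 5.5 km² TROPOMI footprints before comparing them. The Python sketch below shows one simple way to do this. It assumes an unweighted mean of the AirMAP pixel centres inside each convex, four-cornered TROPOMI pixel and a planar lon/lat approximation. The campaign analysis may instead use area weighting, time matching, and averaging-kernel or a priori profile adjustments. The names and toy grid are hypothetical. A negative relative bias means TROPOMI columns are low relative to the reference.

```python
import numpy as np

def pixel_mean(fine_lon, fine_lat, fine_vcd, corners):
    """Mean of high-resolution columns whose centres lie inside one satellite pixel.

    corners: (4, 2) array of (lon, lat) pixel corners in counter-clockwise order.
    Uses a planar approximation, adequate for kilometre-scale pixels.
    """
    inside = np.ones(np.shape(fine_vcd), dtype=bool)
    for i in range(4):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % 4]
        # A point left of (or on) every counter-clockwise edge is inside the convex pixel.
        cross = (x1 - x0) * (fine_lat - y0) - (y1 - y0) * (fine_lon - x0)
        inside &= cross >= 0
    if not inside.any():
        return float("nan")
    return float(np.mean(fine_vcd[inside]))

def relative_bias(sat, ref):
    """Mean of (sat - ref) / ref over pixels with a valid, non-zero reference."""
    sat, ref = np.asarray(sat, dtype=float), np.asarray(ref, dtype=float)
    ok = np.isfinite(ref) & (ref != 0)
    return float(np.mean((sat[ok] - ref[ok]) / ref[ok]))

# Toy grid (hypothetical): fine-pixel centres every 0.1 units, column value = longitude.
lon, lat = np.meshgrid(np.arange(0.05, 1.0, 0.1), np.arange(0.05, 1.0, 0.1))
corners = np.array([[0.2, 0.2], [0.6, 0.2], [0.6, 0.6], [0.2, 0.6]])
print(pixel_mean(lon, lat, lon, corners))
print(relative_bias([4.0, 9.0], [5.0, 10.0]))
```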

