Revealing biases in the sampling of ecological interaction networks

2018 ◽  
Author(s):  
Marcus A. M. de Aguiar ◽  
Erica A. Newman ◽  
Mathias M. Pires ◽  
Justin D. Yeakel ◽  
David H. Hembry ◽  
...  

Abstract: The structure of ecological interactions is commonly understood through analyses of interaction networks. However, these analyses may be sensitive to sampling biases in both the interactors (the nodes of the network) and interactions (the links between nodes), because the detectability of species and their interactions is highly heterogeneous. These issues may affect the accuracy of empirically constructed ecological networks. Yet statistical biases introduced by sampling error are difficult to quantify in the absence of full knowledge of the underlying ecological network’s structure. To explore properties of large-scale modular networks, we developed EcoNetGen, which constructs and samples networks with predetermined topologies. These networks may represent a wide variety of communities that vary in size and types of ecological interactions. We sampled these networks with different sampling designs that may be employed in field observations. The observed networks generated by each sampling process were then analyzed with respect to the number of components, size of components and other network metrics. We show that the sampling effort needed to estimate underlying network properties accurately depends both on the sampling design and on the underlying network topology. In particular, networks with random or scale-free modules require more complete sampling to reveal their structure, compared to networks whose modules are nested or bipartite. Overall, the modules with nested structure were the easiest to detect, regardless of sampling design. Sampling according to species degree (number of interactions) was consistently found to be the most accurate strategy to estimate network structure. Conversely, sampling according to module (representing different interaction types or taxa) results in a rather complete view of certain modules, but fails to provide a complete picture of the underlying network. 
We recommend that these findings be incorporated into the field sampling design of projects aiming to characterize large species interaction networks, in order to reduce sampling biases.

Author Summary: Ecological interactions are commonly modeled as interaction networks. Analyses of such networks may be sensitive to sampling biases and detection issues in both the interactors and interactions (nodes and links). Yet, statistical biases introduced by sampling error are difficult to quantify in the absence of full knowledge of the underlying network’s structure. For insight into ecological networks, we developed the software EcoNetGen (available in R and Python). It allows the generation and sampling of several types of large-scale modular networks with predetermined topologies, representing a wide variety of communities and types of ecological interactions. Networks can be sampled according to designs employed in field observations. We demonstrate, through first uses of this software, that underlying network topology interacts strongly with empirical sampling design, and that constructing empirical networks by starting with highly connected species may give the best representation of the underlying network.

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7566 ◽  
Author(s):  
Marcus A.M. de Aguiar ◽  
Erica A. Newman ◽  
Mathias M. Pires ◽  
Justin D. Yeakel ◽  
Carl Boettiger ◽  
...  

The structure of ecological interactions is commonly understood through analyses of interaction networks. However, these analyses may be sensitive to sampling biases with respect to both the interactors (the nodes of the network) and interactions (the links between nodes), because the detectability of species and their interactions is highly heterogeneous. These ecological and statistical issues directly affect ecologists’ abilities to accurately construct ecological networks. However, statistical biases introduced by sampling are difficult to quantify in the absence of full knowledge of the underlying ecological network’s structure. To explore properties of large-scale ecological networks, we developed the software EcoNetGen, which constructs and samples networks with predetermined topologies. These networks may represent a wide variety of communities that vary in size and types of ecological interactions. We sampled these networks with different mathematical sampling designs that correspond to methods used in field observations. The observed networks generated by each sampling process were then analyzed with respect to the number of components, size of components and other network metrics. We show that the sampling effort needed to estimate underlying network properties depends strongly both on the sampling design and on the underlying network topology. In particular, networks with random or scale-free modules require more complete sampling to reveal their structure, compared to networks whose modules are nested or bipartite. Overall, modules with nested structure were the easiest to detect, regardless of the sampling design used. Sampling a network starting with any species that had a high degree (e.g., abundant generalist species) was consistently found to be the most accurate strategy to estimate network structure. 
Because high-degree species tend to be generalists, abundant in natural communities relative to specialists, and connected to each other, sampling by degree may be common but unintentional in empirical sampling of networks. Conversely, sampling according to module (representing different interaction types or taxa) results in a rather complete view of certain modules, but fails to provide a complete picture of the underlying network. To reduce biases introduced by sampling methods, we recommend that these findings be incorporated into field design considerations for projects aiming to characterize large species interaction networks.
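The contrast between sampling designs described in this abstract can be illustrated with a small, self-contained sketch. It is not EcoNetGen and does not use its API: the generator, the function names, and all parameter values below are hypothetical, chosen only to show how uniform node sampling and degree-weighted node sampling of the same modular network can be compared by counting the components of the observed (induced) network.

```python
import random


def make_modular_network(n_modules, module_size, p_in, p_out, rng):
    """Undirected graph {node: set(neighbours)}; links are denser inside modules (assumed toy model)."""
    n = n_modules * module_size
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            same_module = (u // module_size) == (v // module_size)
            if rng.random() < (p_in if same_module else p_out):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def sample_nodes(adj, k, rng, by_degree):
    """Pick k distinct nodes, uniformly or with probability proportional to degree."""
    nodes = list(adj)
    if not by_degree:
        return set(rng.sample(nodes, k))
    # Tiny offset keeps every weight positive so isolated nodes can still be drawn.
    weights = {v: len(adj[v]) + 1e-9 for v in nodes}
    chosen = set()
    while len(chosen) < k:
        pool = [v for v in nodes if v not in chosen]
        chosen.add(rng.choices(pool, weights=[weights[v] for v in pool], k=1)[0])
    return chosen


def induced_subgraph(adj, keep):
    """Observed network: sampled nodes plus the links among them."""
    return {v: adj[v] & keep for v in keep}


def count_components(adj):
    """Number of connected components, by depth-first search."""
    seen, n_comp = set(), 0
    for start in adj:
        if start in seen:
            continue
        n_comp += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return n_comp


if __name__ == "__main__":
    rng = random.Random(0)
    net = make_modular_network(4, 25, 0.2, 0.01, rng)
    for by_degree in (False, True):
        observed = induced_subgraph(net, sample_nodes(net, 40, rng, by_degree))
        n_links = sum(len(nb) for nb in observed.values()) // 2
        print("by_degree =", by_degree, "components =", count_components(observed), "links =", n_links)
```

Repeating the comparison over many random seeds and sampling fractions would give a rough picture of how each design recovers the component structure; the sketch makes no claim about reproducing the paper's results.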


2020 ◽  
Vol 15 (7) ◽  
pp. 750-757
Author(s):  
Jihong Wang ◽  
Yue Shi ◽  
Xiaodan Wang ◽  
Huiyou Chang

Background: Computational prediction of drug-target interactions (DTIs) is an important step in the discovery of new drugs and in drug relocation. The potential DTIs identified by machine learning methods can provide guidance for biochemical or clinical experiments. Objective: The goal of this article is to apply recent network representation learning methods to drug-target prediction, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate the features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The LINE-based learning method can effectively learn hidden features of drugs, targets, diseases and other entities from the network topology. Combining features learned from multiple networks can enhance the expressive ability. RF is an effective supervised learning method. Therefore, the LINE-RF combination is a widely applicable method.
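The Methods step "construct binary classification samples" can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the embeddings are placeholder vectors standing in for LINE output, the names `build_samples` and `auc` are hypothetical, and treating unobserved drug-target pairs as negatives is a simplifying assumption that the abstract does not confirm. The `auc` helper computes the area under the ROC curve as the fraction of positive-negative pairs ranked correctly, which is the quantity the reported AUC measures.

```python
import random


def build_samples(drug_emb, target_emb, known_pairs, rng):
    """Return (features, labels): known pairs as positives, an equal number of sampled unknown pairs as negatives."""
    known = set(known_pairs)
    candidates = [(d, t) for d in drug_emb for t in target_emb if (d, t) not in known]
    negatives = rng.sample(candidates, min(len(known), len(candidates)))
    features, labels = [], []
    for label, pairs in ((1, sorted(known)), (0, negatives)):
        for d, t in pairs:
            # Pair feature = drug embedding concatenated with target embedding (assumed scheme).
            features.append(list(drug_emb[d]) + list(target_emb[t]))
            labels.append(label)
    return features, labels


def auc(scores, labels):
    """Probability that a random positive outscores a random negative; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder embeddings; real ones would come from a network embedding method.
    drug_emb = {"d1": [0.1, 0.2], "d2": [0.3, 0.4]}
    target_emb = {"t1": [1.0], "t2": [2.0], "t3": [3.0]}
    X, y = build_samples(drug_emb, target_emb, [("d1", "t1"), ("d2", "t3")], random.Random(0))
    print(len(X), "samples with", len(X[0]), "features each; labels:", y)
```

The resulting feature rows and labels could then be passed to any binary classifier, for example a random forest, and the predicted scores evaluated with `auc`.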


Biology ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 107
Author(s):  
Apurva Badkas ◽  
Thanh-Phuong Nguyen ◽  
Laura Caberlotto ◽  
Jochen G. Schneider ◽  
Sébastien De Landtsheer ◽  
...  

A large percentage of the global population is currently afflicted by metabolic diseases (MD), and the incidence is likely to double in the next decades. MD-associated co-morbidities such as non-alcoholic fatty liver disease (NAFLD) and cardiomyopathy contribute significantly to impaired health. MD are complex and polygenic, with many genes involved in their aetiology. A popular approach to investigating genetic contributions to disease aetiology is biological network analysis. However, data dependence introduces a bias (noise, false positives, over-publication) in the outcome. While several approaches have been proposed to overcome these biases, many of them have constraints, including data integration issues, dependence on arbitrary parameters, database-dependent outcomes, and computational complexity. Network topology is also a critical factor affecting the outcomes. Here, we propose a simple, parameter-free method that takes into account database dependence and network topology to identify central genes in the MD network. Among them, we infer novel candidates that have not yet been annotated as MD genes and show their relevance by highlighting their differential expression in public datasets and carefully examining the literature. The method contributes to uncovering connections in the MD mechanisms and highlights several candidates for in-depth study of their contribution to MD and its co-morbidities.


2014 ◽  
Vol 2 (1) ◽  
pp. 26-65 ◽  
Author(s):  
MANUEL GOMEZ RODRIGUEZ ◽  
JURE LESKOVEC ◽  
DAVID BALDUZZI ◽  
BERNHARD SCHÖLKOPF

Abstract: Time plays an essential role in the diffusion of information, influence, and disease over networks. In many cases we can only observe when a node is activated by a contagion, that is, when a node learns about a piece of information, makes a decision, adopts a new behavior, or becomes infected with a disease. However, the underlying network connectivity and transmission rates between nodes are unknown. Inferring the underlying diffusion dynamics is important because it leads to new insights and enables forecasting, as well as influencing or containing information propagation. In this paper we model diffusion as a continuous temporal process occurring at different rates over a latent, unobserved network that may change over time. Given information diffusion data, we infer the edges and dynamics of the underlying network. Our model naturally imposes sparse solutions and requires no parameter tuning. We develop an efficient inference algorithm that uses stochastic convex optimization to compute online estimates of the edges and transmission rates. We evaluate our method by tracking information diffusion among 3.3 million mainstream media sites and blogs, and experiment with more than 179 million different instances of information spreading over the network in a one-year period. We apply our network inference algorithm to the top 5,000 media sites and blogs and report several interesting observations. First, information pathways for general recurrent topics are more stable across time than for on-going news events. Second, clusters of news media sites and blogs often emerge and vanish in a matter of days for on-going news events. Finally, major events, for example, large scale civil unrest as in the Libyan civil war or Syrian uprising, increase the number of information pathways among blogs, and also increase the network centrality of blogs and social media sites.


2017 ◽  
Author(s):  
Vladimir Gligorijević ◽  
Meet Barot ◽  
Richard Bonneau

Abstract: The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity.

Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF


2016 ◽  
Author(s):  
Kenta Suzuki ◽  
Katsuhiko Yoshida ◽  
Yumiko Nakanishi ◽  
Shinji Fukuda

Abstract: Mapping the network of ecological interactions is key to understanding the composition, stability, function and dynamics of microbial communities. In recent years various approaches have been used to reveal microbial interaction networks from metagenomic sequencing data, such as time-series analysis, machine learning and statistical techniques. Despite these efforts it is still not possible to capture details of the ecological interactions behind complex microbial dynamics. We developed the sparse S-map method (SSM), which generates a sparse interaction network from a multivariate ecological time-series without presuming any mathematical formulation for the underlying microbial processes. The advantage of the SSM over alternative methodologies is that it fully utilizes the observed data using a framework of empirical dynamic modelling. This makes the SSM robust to non-equilibrium dynamics and underlying complexity (nonlinearity) in microbial processes. We showed that an increase in dataset size or a decrease in observational error improved the accuracy of the SSM, whereas the accuracy of a comparative equation-based method was almost unchanged in both cases and was equivalent to the SSM at best. Hence, the SSM outperformed a comparative equation-based method when datasets were large and the magnitude of observational errors was small. The results were robust to the magnitude of process noise and the functional forms of inter-specific interactions that we tested. We applied the method to microbiome data from six mice and found that there were different microbial interaction regimes between young to middle age (4-40 week-old) and middle to old age (36-72 week-old) mice. The complexity of microbial relationships impedes detailed equation-based modeling. 
Our method provides a powerful alternative framework to infer ecological interaction networks of microbial communities in various environments and will be improved by further developments in metagenomics sequencing technologies leading to increased dataset size and improved accuracy and precision.
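For readers unfamiliar with the S-map that the SSM builds on, the sketch below shows the core idea in one dimension: a linear model is fitted around a target state, with library points weighted by exp(-theta * d / d_mean), where d is the distance to the target and d_mean is the mean distance. It is a minimal illustration under stated assumptions and is not the SSM itself: the sparsity step and the multivariate setting are omitted, and the function name and the logistic-map example are hypothetical.

```python
import math


def smap_predict(xs, ys, x_star, theta):
    """Locally weighted linear fit y ~ a + b*x around x_star; returns (a, b, prediction).

    Assumes the xs are not all identical, otherwise the slope is undefined.
    """
    dists = [abs(x - x_star) for x in xs]
    d_mean = sum(dists) / len(dists)
    if d_mean == 0:
        weights = [1.0] * len(xs)
    else:
        # theta = 0 gives equal weights (a global linear fit);
        # larger theta concentrates the fit on points near x_star.
        weights = [math.exp(-theta * d / d_mean) for d in dists]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx      # weighted least-squares slope
    a = my - b * mx    # weighted least-squares intercept
    return a, b, a + b * x_star


if __name__ == "__main__":
    # Toy nonlinear series (logistic map), used only as example data.
    series = [0.4]
    for _ in range(300):
        series.append(3.8 * series[-1] * (1.0 - series[-1]))
    xs, ys = series[:-1], series[1:]
    for theta in (0.0, 2.0, 8.0):
        a, b, pred = smap_predict(xs, ys, 0.3, theta)
        print("theta =", theta, "local slope =", b, "prediction =", pred)
```

In multivariate applications the fitted local coefficients are commonly read as state-dependent interaction strengths between variables, which is what makes this family of methods usable for inferring interaction networks from time series.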


2021 ◽  
Vol 4 ◽  
Author(s):  
Ondrej Vargovčík ◽  
Zuzana Čiamporová-Zaťovičová ◽  
Fedor Čiampor Jr

The state of ecosystems and the protection of biodiversity are becoming key interests for modern society due to climate change and negative human impacts (Leese 2018). Environmental changes in freshwaters are also indicated by benthic communities, especially in sensitive ecosystems like alpine lakes (Fjellheim 2009). Moreover, the remoteness and isolation of alpine lakes make them a source of biodiversity, which is worth conserving (Hamerlík 2014). A promising tool for efficient large-scale monitoring of aquatic communities is DNA metabarcoding (Leese 2018). In this study, we applied metabarcoding to analyse macrozoobenthos of 12 lakes in the Tatra Mountains, using benthic bulk samples and eDNA filtered from water (Fig. 1). In compliance with recent publications, eDNA amplified with BF3/BR2 primers resulted in a high percentage of non-invertebrate reads (Leese 2021). Based on in silico tests with the obtained sequences, we confirm that the recently developed EPTDr2n primer enables minimizing non-target amplification even with eDNA filtered from alpine-lake water (Elbrecht and Leese 2017). This ability is facilitated by the 3’ end of the primer, and we observed the two important mismatches in non-target sequences from our study (Leese 2021). Thus, our future analyses of eDNA (and bulk-sample fixative) will benefit from the new primer. Concerning bulk samples, a wide range of invertebrate taxa was assigned to the OTUs and they showed good congruence with previous studies using morphological determination (e.g. Krno 2006). Certain differences with (and among) the previous records per lake were observed, which could suggest ecological changes, but at the moment the influence of sampling error cannot be excluded. In eDNA, several taxa were congruent with the previous records, but their number and read abundance were considerably lower due to non-target amplification. 
Apart from that, filling gaps in barcoding databases remains one of our priorities, as identification to species or genus level was not yet possible for some invertebrate OTUs. In addition, we subjected the NGS data to denoising and abundance-filtering in order to explore haplotype-level diversity (Andújar 2021). Although more comprehensive conclusions will be possible only after obtaining data from more lakes and years, the two metabarcoding experiments presented here already enabled us to efficiently detect within-species genetic diversity and identify a large variety of taxa, including groups that would otherwise be omitted or very challenging to identify. This underlines the potential of DNA methods to provide valuable ecological and biodiversity data across the tree of life for modern biomonitoring. This study was realized with support from VEGA 2/0030/17 and VEGA 2/0084/21.


2020 ◽  
Author(s):  
Diogo Borges Lima ◽  
Ying Zhu ◽  
Fan Liu

Abstract: Software tools that allow visualization and analysis of protein interaction networks are essential for studies in systems biology. One of the most popular network visualization tools in biology is Cytoscape, which offers a large selection of plugins for interpretation of protein interaction data. Chemical cross-linking coupled to mass spectrometry (XL-MS) is an increasingly important source for such interaction data, but there are currently no Cytoscape tools to analyze XL-MS results. Given the suitability of the Cytoscape platform, and to expand its toolbox, we introduce XlinkCyNET, an open-source Cytoscape Java plugin for exploring large-scale XL-MS-based protein interaction networks. XlinkCyNET offers rapid and easy visualization of intra- and intermolecular cross-links and the locations of protein domains in a rectangular bar style, allowing subdomain-level interrogation of the interaction network. XlinkCyNET is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/xlinkcynet and at https://www.theliulab.com/software/xlinkcynet.


2013 ◽  
Vol 42 (D1) ◽  
pp. D92-D97 ◽  
Author(s):  
Jun-Hao Li ◽  
Shun Liu ◽  
Hui Zhou ◽  
Liang-Hu Qu ◽  
Jian-Hua Yang
