MultiPaths: a Python framework for analyzing multi-layer biological networks using diffusion algorithms

2020 ◽  
Author(s):  
Josep Marín-Llaó ◽  
Sarah Mubeen ◽  
Alexandre Perera-Lluna ◽  
Martin Hofmann-Apitius ◽  
Sergio Picart-Armada ◽  
...  

Abstract Summary High-throughput screening yields vast amounts of biological data which can be highly challenging to interpret. In response, knowledge-driven approaches emerged as possible solutions to analyze large datasets by leveraging prior knowledge of biomolecular interactions represented in the form of biological networks. Nonetheless, given their size and complexity, their manual investigation quickly becomes impractical. Thus, computational approaches, such as diffusion algorithms, are often employed to interpret and contextualize the results of high-throughput experiments. Here, we present MultiPaths, a framework consisting of two independent Python packages for network analysis. While the first package, DiffuPy, comprises numerous commonly used diffusion algorithms applicable to any generic network, the second, DiffuPath, enables the application of these algorithms on multi-layer biological networks. To facilitate its usability, the framework includes a command line interface, reproducible examples and documentation. To demonstrate the framework, we conducted several diffusion experiments on three independent multi-omics datasets over disparate networks generated from pathway databases, thus, highlighting the ability of multi-layer networks to integrate multiple modalities. Finally, the results of these experiments demonstrate how the generation of harmonized networks from disparate databases can improve predictive performance with respect to individual resources. Availability and implementation DiffuPy and DiffuPath are publicly available under the Apache License 2.0 at https://github.com/multipaths. Supplementary information Supplementary data are available at Bioinformatics online.
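
To give a flavor of the kind of computation such diffusion packages perform, the sketch below propagates binary seed scores over a toy graph with a regularised Laplacian kernel. The graph, the seed nodes, and the kernel choice are illustrative only and do not reproduce DiffuPy's actual API; see the package documentation for its real interface and kernels.

```python
# A minimal network-diffusion sketch (not DiffuPy's API): diffuse binary
# seed scores over a toy graph with a regularised Laplacian kernel.
import networkx as nx
import numpy as np

graph = nx.karate_club_graph()                      # stand-in for a biological network
nodes = list(graph.nodes)
L = nx.laplacian_matrix(graph).toarray().astype(float)

# Kernel K = (I + lambda * L)^-1, a common choice for label diffusion.
lam = 0.5
K = np.linalg.inv(np.eye(len(nodes)) + lam * L)

# Binary input: a handful of 'hit' nodes from a hypothetical experiment.
x = np.zeros(len(nodes))
x[[0, 4, 33]] = 1.0

scores = K @ x                                      # diffused score per node
top = sorted(zip(nodes, scores), key=lambda t: -t[1])[:5]
print(top)
```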


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. sci-51-sci-51
Author(s):  
Todd R. Golub

Genomics holds particular potential for the elucidation of biological networks that underlie disease. For example, gene expression profiles have been used to classify human cancers, and have more recently been used to predict graft rejection following organ transplantation. Such signatures thus hold promise both as diagnostic approaches and as tools with which to dissect biological mechanism. Such systems-based approaches are also beginning to impact the drug discovery process. For example, it is now feasible to measure gene expression signatures at low cost and high throughput, thereby allowing small-molecule libraries to be screened to identify compounds capable of perturbing a signature of interest (even if the critical drivers of that signature are not yet known). This approach, known as Gene Expression-Based High Throughput Screening (GE-HTS), has been shown to identify candidate therapeutic approaches in AML, Ewing sarcoma, and neuroblastoma, and has identified tool compounds capable of inhibiting PDGF receptor signaling. A related approach, known as the Connectivity Map (www.broad.mit.edu/cmap), attempts to use gene expression profiles as a universal language with which to connect cellular states, gene product function, and drug action. In this manner, a gene expression signature of interest is used to computationally query a database of gene expression profiles of cells systematically treated with a large number of compounds (e.g., all off-patent FDA-approved drugs), thereby identifying potential new applications for existing drugs. Such systems-level approaches thus seek chemical modulators of cellular states, even when the molecular basis of such altered states is unknown.
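
The sketch below illustrates the querying idea on synthetic data: score how strongly a compound's expression profile moves a query signature's up-genes up and down-genes down. The published Connectivity Map uses a Kolmogorov-Smirnov-style enrichment statistic; here it is replaced by a simpler mean-rank difference, and all gene names and values are invented.

```python
# Toy signature-matching in the spirit of the Connectivity Map.
# Synthetic data; a mean-rank score stands in for the KS-based statistic.
import numpy as np

rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(1000)]
profile = dict(zip(genes, rng.normal(size=1000)))    # drug-induced fold changes

up_sig = ["g1", "g2", "g3"]                          # hypothetical query signature
down_sig = ["g10", "g11", "g12"]

ranked = sorted(genes, key=lambda g: profile[g], reverse=True)
rank = {g: i for i, g in enumerate(ranked)}

def mean_rank(sig):
    return np.mean([rank[g] for g in sig])

# Negative score: up-genes near the top and down-genes near the bottom,
# i.e. the compound mimics the query state; positive score: it reverses it.
connectivity = (mean_rank(up_sig) - mean_rank(down_sig)) / len(genes)
print(f"connectivity score: {connectivity:+.3f}")
```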


Author(s):  
Xiaohua Douglas Zhang ◽  
Dandan Wang ◽  
Shixue Sun ◽  
Heping Zhang

Abstract Motivation High-throughput screening (HTS) is a vital automation technology in biomedical research in both industry and academia. The well-known Z-factor has been widely used as a gatekeeper to assure assay quality in an HTS study. However, many researchers and users may not have realized that the Z-factor has major issues. Results In this article, the following four major issues are explored and demonstrated so that researchers may use the Z-factor appropriately. First, the Z-factor violates the Pythagorean theorem of statistics. Second, there is no adjustment for sampling error in the application of the Z-factor for quality control (QC) in HTS studies. Third, the expectation of the sample-based Z-factor does not exist. Fourth, the thresholds in the Z-factor-based criterion lack a theoretical basis. Here, an approach to avoid these issues is proposed and new QC criteria under homoscedasticity are constructed so that researchers can choose a statistically grounded criterion for QC in HTS studies. We implemented this approach in an R package and demonstrated its utility in multiple CRISPR/Cas9 and siRNA HTS studies. Availability and implementation The R package qcSSMDhomo is freely available from GitHub: https://github.com/Karena6688/qcSSMDhomo. The file qcSSMDhomo_1.0.0.tar.gz (for Windows) containing qcSSMDhomo is also available at Bioinformatics online. qcSSMDhomo is distributed under the GNU General Public License. Supplementary information Supplementary data are available at Bioinformatics online.
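
For concreteness, the sketch below computes the two plate-QC statistics at issue, the classical Z-factor and the basic two-group SSMD, from simulated control wells. The control values are synthetic, and the qcSSMDhomo package (which is in R) implements its own criteria and sampling-error adjustments beyond these textbook formulas.

```python
# Z-factor and SSMD from simulated positive/negative control wells.
import numpy as np

rng = np.random.default_rng(1)
pos = rng.normal(loc=100.0, scale=8.0, size=32)   # positive-control wells
neg = rng.normal(loc=20.0, scale=6.0, size=32)    # negative-control wells

# Classical Z-factor (Zhang et al., 1999)
z_factor = 1 - 3 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

# SSMD: mean difference scaled by the SD of the difference (homoscedastic form)
ssmd = (pos.mean() - neg.mean()) / np.sqrt(pos.var(ddof=1) + neg.var(ddof=1))

print(f"Z-factor: {z_factor:.2f}, SSMD: {ssmd:.2f}")
```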


2005 ◽  
Vol 10 (5) ◽  
pp. 419-426 ◽  
Author(s):  
Tudor I. Oprea ◽  
Cristian G. Bologa ◽  
Bruce S. Edwards ◽  
Eric R. Prossnitz ◽  
Larry A. Sklar

An empirical scheme to evaluate and prioritize screening hits from high-throughput screening (HTS) is proposed. Negative scores are given when chemotypes found in the HTS hits are present in annotated databases such as MDDR and WOMBAT, or for testing positive in toxicity-related experiments reported in TOXNET. Positive scores are given for higher measured biological activities, for testing negative in the toxicity-related literature, and for good overlap when profiled against drug-related properties. Particular emphasis is placed on estimating aqueous solubility to prioritize in vivo experiments. This empirical scheme is given as an illustration to assist the decision-making process in selecting chemotypes and individual compounds for further experimentation when confronted with multiple hits from high-throughput experiments. The decision-making process is discussed for a set of G-protein-coupled receptor antagonists and validated on a literature example for dihydrofolate reductase inhibition.
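
The sketch below re-creates the flavor of such an additive scoring scheme. The weights, cut-offs, and annotation flags are invented for illustration; the paper derives its scores from MDDR/WOMBAT/TOXNET annotations and measured properties.

```python
# Illustrative additive hit-prioritization scoring (invented weights).
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    pic50: float              # measured potency
    known_chemotype: bool     # chemotype present in annotated databases
    tox_alert: bool           # positive in toxicity-related reports
    log_solubility: float     # estimated aqueous solubility (logS)

def priority_score(h: Hit) -> float:
    score = h.pic50 - 5.0                             # reward measured activity
    score -= 1.0 if h.known_chemotype else 0.0        # penalise precedented chemotypes
    score -= 2.0 if h.tox_alert else 0.0              # penalise toxicity liabilities
    score += 0.5 if h.log_solubility > -4 else -0.5   # favour soluble compounds
    return score

hits = [Hit("cmpd-A", 7.2, False, False, -3.1),
        Hit("cmpd-B", 8.0, True, True, -5.6)]
for h in sorted(hits, key=priority_score, reverse=True):
    print(h.name, round(priority_score(h), 2))
```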


2003 ◽  
Vol 8 (1) ◽  
pp. 19-33 ◽  
Author(s):  
Ulrich Haupts ◽  
Martin Rüdiger ◽  
Stephen Ashman ◽  
Sandra Turconi ◽  
Ryan Bingham ◽  
...  

Single-molecule detection technologies are becoming a powerful readout format to support ultra-high-throughput screening. These methods are based on the analysis of fluorescence intensity fluctuations detected from a small confocal volume element. The fluctuating signal contains information about the mass and brightness of the different species in a mixture. The authors demonstrate a number of applications of fluorescence intensity distribution analysis (FIDA), which discriminates molecules by their specific brightness. Examples of assays based on brightness changes induced by quenching/dequenching of fluorescence, fluorescence energy transfer, and multiple-binding stoichiometry are given for important drug targets such as kinases and proteases. FIDA also provides a powerful method to extract correct biological data in the presence of compound fluorescence. (Journal of Biomolecular Screening 2003:19-33)
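
To make the brightness idea concrete, here is a toy simulation, not the FIDA fitting procedure itself: photon counts per time bin from species differing only in specific brightness yield distinguishably different count statistics. Occupancy and brightness values are arbitrary, and the confocal-volume model that real FIDA fits is omitted.

```python
# Toy simulation of brightness-based discrimination (not a FIDA fit).
import numpy as np

rng = np.random.default_rng(2)
bins = 100_000

def photon_counts(mean_occupancy, brightness):
    n = rng.poisson(mean_occupancy, size=bins)    # molecules in focus per bin
    return rng.poisson(n * brightness)            # photons emitted per bin

dim = photon_counts(mean_occupancy=1.0, brightness=2.0)     # e.g. quenched probe
bright = photon_counts(mean_occupancy=1.0, brightness=8.0)  # e.g. dequenched probe

print("mean counts:", dim.mean(), bright.mean())
# For this compound-Poisson model, variance/mean = 1 + brightness,
# so the ratio separates species even at equal concentration.
print("variance/mean:", dim.var() / dim.mean(), bright.var() / bright.mean())
```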


2017 ◽  
Vol 22 (6) ◽  
pp. 655-666 ◽  
Author(s):  
Yanli Wang ◽  
Tiejun Cheng ◽  
Stephen H. Bryant

High-throughput screening (HTS) is now routinely conducted for drug discovery by both pharmaceutical companies and screening centers at academic institutions and universities. Rapid advances in assay development, robot automation, and computer technology have led to the generation of terabytes of data in screening laboratories. Despite this technological development toward HTS productivity, fewer efforts have been devoted to HTS data integration and sharing. As a result, the huge amount of HTS data has rarely been made available to the public. To fill this gap, the PubChem BioAssay database (https://www.ncbi.nlm.nih.gov/pcassay/) was set up in 2004 to provide open access to screening results tested on chemicals and RNAi reagents. With more than 10 years’ development and contributions from the community, PubChem has now become the largest public repository for chemical structures and biological data, providing an information platform to researchers worldwide in support of drug development, medicinal chemistry studies, and chemical biology research. This work presents a review of the HTS data content in the PubChem BioAssay database and the progress of data deposition to stimulate knowledge discovery and data sharing. It also provides a description of the database’s data standard and basic utilities facilitating information access and use for new users.
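
As a pointer to programmatic access, the sketch below fetches an assay description from PubChem's PUG REST web service. The endpoint pattern follows the public PUG REST conventions, but the AID is an arbitrary example and the returned JSON schema should be checked against the service documentation rather than assumed.

```python
# Minimal PubChem BioAssay lookup via PUG REST (endpoint per public docs;
# AID and response structure are illustrative, not guaranteed).
import json
import urllib.request

aid = 1000  # example assay identifier
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{aid}/description/JSON"

with urllib.request.urlopen(url, timeout=30) as resp:
    payload = json.load(resp)

# Print the top-level keys rather than assuming the exact schema.
print(list(payload))
```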


2008 ◽  
Vol 13 (6) ◽  
pp. 443-448 ◽  
Author(s):  
Lorenz M. Mayr ◽  
Peter Fuerst

High-throughput screening (HTS) is a well-established process in lead discovery for pharma and biotech companies and is now also being set up for basic and applied research in academia and some research hospitals. Since its first advent in the early to mid-1990s, the field of HTS has seen not only a continuous change in technology and processes but also an adaptation to various needs in lead discovery. HTS has now evolved into a quite mature discipline of modern drug discovery. Whereas in previous years, much emphasis has been put toward a steady increase in capacity (“quantitative increase”) via various strategies in the fields of automation and miniaturization, the past years have seen a steady shift toward higher content and quality (“quality increase”) for these biological test systems. Today, many experts in the field see HTS at the crossroads with the need to decide either toward further increase in throughput or more focus toward relevance of biological data. In this article, the authors describe the development of HTS over the past decade and point out their own ideas for future directions of HTS in biomedical research. They predict that the trend toward further miniaturization will slow down with the implementation of 384-well, 1536-well, and 384 low-volume-well plates. The authors predict that, ultimately, each hit-finding strategy will be much more project related, tailor-made, and better integrated into the broader drug discovery efforts. (Journal of Biomolecular Screening 2008:443-448)


2016 ◽  
Author(s):  
Aaron Wise ◽  
Murat Can Cobanoglu

Abstract Motivation: Cancer is a complex and evolving disease, making it difficult to discover effective treatments. Traditional drug discovery relies on high-throughput screening on reductionist models in order to enable the testing of 10^5 or 10^6 compounds. These assays lack the complexity of the human disease. Functional assays overcome this limitation by testing drugs on human tumors; however, they can only test few drugs and remain restricted to diagnostic use. An algorithm that identifies hits with fewer experiments could enable the use of functional assays for de novo drug discovery. Results: We developed a novel approach that we termed ‘algorithmic ideation’ (AI) to select experiments, and demonstrated that this approach discovers hits 10^4 times more effectively than brute-force screening. The algorithm trains on known drug-target-disease associations assembled as a tensor, built from the (public) TCGA and STITCH databases, and predicts novel associations. We evaluated our tensor completion approach using a temporal cutoff, with data prior to 2012 used as training data and data from 2012 to 2015 used as testing data. Our approach achieved 10^4-fold more efficient hit discovery than traditional brute-force high-throughput screening. We further tested the method in a sparse, low-data regime by removing up to 90% of the training data, and demonstrated the robustness of the approach. Finally, we tested predictive performance on drugs with no previously known interactions, where the algorithm demonstrates a 10^3-fold improvement on this challenging problem. Thus algorithmic ideation can potentially enable targeted antineoplastic discovery on functional assays. Availability: Freely accessible at https://bitbucket.org/aiinc/drugx. Contact: [email protected], [email protected]
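
The sketch below illustrates the tensor-completion idea on synthetic data: factorize a partially observed drug x target x disease tensor with a rank-R CP model and score unobserved entries from the reconstruction. The data, rank, optimizer, and learning rate are all invented for the example; the paper's actual model and training protocol may differ.

```python
# Masked rank-R CP factorization by gradient descent (synthetic data).
import numpy as np

rng = np.random.default_rng(3)
D, T, S, R = 30, 20, 10, 4                    # drugs, targets, diseases, rank

# Ground-truth low-rank tensor plus a mask of observed entries.
A0, B0, C0 = rng.normal(size=(D, R)), rng.normal(size=(T, R)), rng.normal(size=(S, R))
truth = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
mask = rng.random(truth.shape) < 0.3          # 30% of entries observed

A, B, C = (rng.normal(scale=0.1, size=(n, R)) for n in (D, T, S))
lr = 0.002
for step in range(3000):
    err = mask * (np.einsum('ir,jr,kr->ijk', A, B, C) - truth)
    A -= lr * np.einsum('ijk,jr,kr->ir', err, B, C)
    B -= lr * np.einsum('ijk,ir,kr->jr', err, A, C)
    C -= lr * np.einsum('ijk,ir,jr->kr', err, A, B)

recon = np.einsum('ir,jr,kr->ijk', A, B, C)
print("held-out RMSE:", np.sqrt(np.mean((recon - truth)[~mask] ** 2)))
```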


2020 ◽  
Vol 36 (11) ◽  
pp. 3602-3604 ◽  
Author(s):  
Swapnil Potdar ◽  
Aleksandr Ianevski ◽  
John-Patrick Mpindi ◽  
Dmitrii Bychkov ◽  
Clément Fiere ◽  
...  

Abstract Summary High-throughput screening (HTS) enables systematic testing of thousands of chemical compounds for potential use as investigational and therapeutic agents. HTS experiments are often conducted in multi-well plates that inherently bear technical and experimental sources of error. Thus, HTS data processing requires the use of robust quality control procedures before analysis and interpretation. Here, we present Breeze, an open-source, integrated quality control and data analysis application for HTS data. Furthermore, Breeze enables a reliable way to identify individual drug sensitivity and resistance patterns in cell lines or patient-derived samples for functional precision medicine applications. The Breeze application provides a complete solution for data quality assessment, dose–response curve fitting and quantification of the drug responses, along with interactive visualization of the results. Availability and implementation The Breeze application with video tutorial and technical documentation is accessible at https://breeze.fimm.fi; the R source code is publicly available at https://github.com/potdarswapnil/Breeze under GNU General Public License v3.0. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
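
As a standalone illustration of the dose-response fitting step that tools like Breeze automate, the sketch below fits a four-parameter logistic (Hill) curve to simulated viability readings with scipy. The parameter values and data are synthetic; Breeze itself is an R-based application with its own fitting pipeline.

```python
# Four-parameter logistic (4PL) dose-response fit on simulated data.
import numpy as np
from scipy.optimize import curve_fit

def four_pl(dose, bottom, top, ic50, hill):
    return bottom + (top - bottom) / (1 + (dose / ic50) ** hill)

doses = np.logspace(-3, 2, 8)                       # uM, half-log-ish steps
rng = np.random.default_rng(4)
response = four_pl(doses, 5, 100, 0.8, 1.2) + rng.normal(0, 3, doses.size)

params, _ = curve_fit(four_pl, doses, response,
                      p0=[0, 100, 1.0, 1.0], maxfev=10_000)
bottom, top, ic50, hill = params
print(f"fitted IC50 ~ {ic50:.2f} uM, Hill slope ~ {hill:.2f}")
```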


Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1238
Author(s):  
Krzysztof Gogolewski ◽  
Marcin Kostecki ◽  
Anna Gambin

The constantly and rapidly increasing amount of biological data gained from many different high-throughput experiments opens up new possibilities for data- and model-driven inference. Alongside these, however, emerges a problem of risks related to data integration techniques, which are not so widely taken into account. In particular, approaches based on flux balance analysis (FBA) are sensitive to the structure of the metabolic network, in which low-entropy clusters can prevent inference of the activity of metabolic reactions. In the following article, we set forth problems that may arise during the integration of metabolomic data with gene expression datasets. We analyze common pitfalls, provide possible solutions, and exemplify them with a case study of renal cell carcinoma (RCC). Using the proposed approach, we provide a metabolic description of the known morphological RCC subtypes and suggest the possible existence of a poor-prognosis cluster of patients, commonly characterized by low activity of the drug-transporting enzymes crucial in chemotherapy. This finding fits and extends the already known poor-prognosis characteristics of RCC. Finally, the goal of this work is also to point out the problem that arises from the integration of high-throughput data with inherently nonuniform, manually curated low-throughput data. In such cases, over-represented information may potentially overshadow non-trivial discoveries.
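
For readers unfamiliar with the FBA setting the article discusses, the sketch below solves a minimal flux balance problem as a linear program: maximize a biomass-like objective subject to steady-state mass balance S v = 0 and flux bounds. The three-reaction toy network is invented for the example.

```python
# Minimal FBA: maximize biomass flux subject to S v = 0 and bounds.
import numpy as np
from scipy.optimize import linprog

# Metabolite x reaction stoichiometric matrix for:
#   R1: -> A        R2: A -> B        R3: B -> (biomass)
S = np.array([[ 1, -1,  0],   # metabolite A
              [ 0,  1, -1]])  # metabolite B

bounds = [(0, 10), (0, 5), (0, None)]   # uptake capped at 10, R2 at 5
c = np.array([0, 0, -1.0])              # maximize v3 (linprog minimizes)

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)         # the R2 bottleneck limits growth to 5
```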

