Identifying significantly impacted pathways: a comprehensive review and assessment

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Tuan-Minh Nguyen ◽  
Adib Shafi ◽  
Tin Nguyen ◽  
Sorin Draghici

Abstract
Background: Many high-throughput experiments compare two phenotypes, such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These fall into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of pathway analysis approaches rely on the assumption that p values are uniformly distributed under the null hypothesis, which is often not true.
Results: This article presents the most comprehensive comparative study of pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested.
Conclusion: Overall, the results show that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected, since TB methods take into consideration the structure of the pathway, which is meant to describe the underlying phenomena. We also find that most, if not all, of the listed approaches are biased and can produce skewed results under the null.
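The p-value uniformity assumption mentioned above can be checked empirically. Below is a minimal sketch of one such check, assuming a simple ORA-style hypergeometric enrichment test applied to pure-noise data with random group labels; the sizes, thresholds, and test choice are illustrative and are not taken from the paper's protocol.

```python
# Sketch: checking whether a pathway test yields uniform p values under the null.
# Assumptions (not from the paper): pure-noise expression data, an ORA-style
# hypergeometric test, and illustrative sizes/thresholds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_samples, pathway_size, n_runs = 5000, 20, 50, 1000
pathway = rng.choice(n_genes, pathway_size, replace=False)

null_pvals = []
for _ in range(n_runs):
    expr = rng.normal(size=(n_genes, n_samples))           # pure noise: the null is true
    labels = rng.permutation([0] * 10 + [1] * 10)          # random "disease"/"healthy" split
    t, p = stats.ttest_ind(expr[:, labels == 0], expr[:, labels == 1], axis=1)
    de_genes = np.flatnonzero(p < 0.05)                    # nominally "differentially expressed"
    overlap = np.intersect1d(de_genes, pathway).size
    # Hypergeometric tail: P(overlap >= observed) given pathway size and DE count
    null_pvals.append(stats.hypergeom.sf(overlap - 1, n_genes, pathway_size, de_genes.size))

# Under a well-calibrated null model the p values should be ~Uniform(0, 1)
ks_stat, ks_p = stats.kstest(null_pvals, "uniform")
print(f"KS test against Uniform(0,1): D={ks_stat:.3f}, p={ks_p:.3g}")
```

A strongly non-uniform distribution of these null p values is the kind of bias the benchmark quantifies for each method.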

2020 ◽  
Vol 32 (3) ◽  
pp. 763-778
Author(s):  
Zhuqi Miao ◽  
Balabhaskar Balasundaram

A γ-quasi-clique in a simple undirected graph is a subset of vertices that induces a subgraph with edge density at least γ. When γ equals one, the definition coincides with that of a classical clique; when γ is less than one, it relaxes the clique requirement that every possible edge be present. Quasi-clique detection has been used in graph-based data mining to find dense clusters, especially in large-scale, error-prone data sets in which the clique model can be overly restrictive. The maximum γ-quasi-clique problem, which seeks a γ-quasi-clique of maximum cardinality in a given graph, can be formulated as an optimization problem with a linear objective function and a single quadratic constraint in binary variables. This article investigates the Lagrangian dual of this formulation and develops an upper-bounding technique that uses the geometry of ellipsoids to bound the Lagrangian dual. The tightness of the upper bound is compared with that of bounds obtained from multiple mixed-integer programming formulations of the problem via experiments on benchmark instances.
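For concreteness, here is a minimal sketch of the density condition itself, assuming a plain adjacency-set representation of the graph; the helper name is illustrative and unrelated to the article's formulation or code.

```python
# Sketch: verifying the gamma-quasi-clique property for a vertex subset.
# The adjacency-set representation and function name are illustrative.
from itertools import combinations

def is_gamma_quasi_clique(adj, subset, gamma):
    """adj: dict mapping each vertex to a set of neighbours (simple undirected graph).
    Returns True if `subset` induces a subgraph with edge density >= gamma."""
    k = len(subset)
    if k < 2:
        return True  # a trivial subgraph is conventionally taken to have density 1
    edges = sum(1 for u, v in combinations(subset, 2) if v in adj[u])
    return edges >= gamma * k * (k - 1) / 2   # the quadratic density constraint, evaluated directly

# Example: a 4-vertex subset with 5 of the 6 possible edges has density 5/6
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_gamma_quasi_clique(adj, [1, 2, 3, 4], gamma=0.8))  # True  (5/6 >= 0.8)
print(is_gamma_quasi_clique(adj, [1, 2, 3, 4], gamma=0.9))  # False
# When gamma = 1 the test reduces to the classical clique condition.
```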


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Plamen V. Mirazchiyski

Abstract: This paper presents the R Analyzer for Large-Scale Assessments (RALSA), a newly developed R package for analyzing data from studies that use complex sampling and assessment designs. Such studies include, for example, the IEA's Trends in International Mathematics and Science Study and the OECD's Programme for International Student Assessment. The package covers all cycles from a broad range of studies. The paper presents the architecture of the package and the overall workflow, and illustrates some basic analyses using it. The package is open-source and free of charge. Other software packages for analyzing large-scale assessment data exist, some of them proprietary, others open-source. However, RALSA is the first comprehensive package designed with the user experience in mind, and it has some distinctive features. One innovation is that the package can convert SPSS data from large-scale assessments into native R data sets. It can also do so for PISA data from cycles prior to 2015, where the data are provided in tab-delimited text files along with SPSS control syntax files. Another feature is the availability of a graphical user interface, which is also written in R and operates in any operating system where a full copy of R can be installed. The output from any analysis function is written into an MS Excel workbook with multiple sheets for the estimates, model statistics, analysis information and the calling syntax itself, for reproducing the analysis in the future. The flexible design of RALSA allows for the quick addition of new studies, analysis types and features to the existing ones.


2017 ◽  
Vol 69 (5) ◽  
pp. 545-556 ◽  
Author(s):  
Philippe Mongeon ◽  
Nicolas Robinson-Garcia ◽  
Wei Jeng ◽  
Rodrigo Costas

Purpose: It is widely recognized that sharing data is beneficial not only for science but also for the common good, and researchers are increasingly expected to share their data. However, many researchers are still not making their data available, one of the reasons being that this activity is not adequately recognized in the current reward system of science. Since the attribution of data sets to individual researchers is necessary if we are to include them in research evaluation processes, the purpose of this paper is to explore the feasibility of linking data set records from DataCite to the authors of articles indexed in the Web of Science (WoS).
Design/methodology/approach: DataCite and WoS records are linked based on the similarity between the names of the data sets' creators and the articles' authors, as well as the similarity between the noun phrases in the titles of the data sets and the titles and abstracts of the articles.
Findings: The authors report that a large number of DataCite records can be attributed to specific authors in WoS, and demonstrate that the prevalence of data sharing varies greatly depending on the research discipline.
Originality/value: It is as yet unclear how data sharing can provide adequate recognition for individual researchers. Bibliometric indicators are commonly used for research evaluation, but to date no large-scale assessment of individual researchers' data sharing activities has been carried out.
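To illustrate the kind of similarity-based linking described in the methodology, here is a minimal sketch assuming difflib string similarity for creator/author names and a simple token-overlap stand-in for noun-phrase matching; the field names, thresholds, and example records are illustrative and do not reproduce the paper's actual pipeline.

```python
# Sketch: linking a DataCite record to a WoS article by creator/author name
# similarity and title-term overlap. Thresholds and field names are illustrative.
from difflib import SequenceMatcher

def name_similarity(creator, author):
    return SequenceMatcher(None, creator.lower(), author.lower()).ratio()

def term_overlap(title_a, title_b):
    a, b = set(title_a.lower().split()), set(title_b.lower().split())
    return len(a & b) / max(1, len(a | b))   # Jaccard overlap as a stand-in for noun-phrase matching

def is_probable_link(dataset, article, name_thr=0.85, title_thr=0.3):
    best_name = max(name_similarity(c, a)
                    for c in dataset["creators"] for a in article["authors"])
    return best_name >= name_thr and term_overlap(dataset["title"], article["title"]) >= title_thr

dataset = {"creators": ["Smith, Jane A."], "title": "Arctic sea ice thickness measurements 2010-2015"}
article = {"authors": ["Smith, Jane A.", "Doe, John"],
           "title": "Trends in Arctic sea ice thickness from in situ measurements"}
print(is_probable_link(dataset, article))  # True under these illustrative thresholds
```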


Author(s):  
Peter Ghazal

An increasing number of biological experiments and, more recently, clinically based studies are being conducted using large-scale genomic, proteomic and metabolomic techniques which generate high-dimensional data sets. Such approaches require the adoption of both hypothesis- and data-driven strategies in the analysis and interpretation of results. In particular, data-mining and pattern recognition methodologies have proven particularly useful in this field. The increasing amount of information available from high-throughput experiments has initiated a move away from focussed, single gene and protein investigations.

Abstract: Systems biology provides a new approach to studying, analyzing, and ultimately controlling biological processes. Biological pathways represent a key sub-system level of organization that seamlessly performs complex information processing and control tasks. The aim of pathway biology is to map and understand the cause-effect relationships and dependencies associated with the complex interactions of biological networks and systems. Drugs that therapeutically modulate the biological processes of disease are often developed with limited knowledge of the underlying complexity of their specific targets. Considering the combinatorial complexity from the outset might help identify potential causal relationships that could lead to a better understanding of the drug-target biology, as well as provide new biomarkers for modelling diagnosis and treatment response in patients. This chapter discusses the use of a pathway biology approach to modelling biological processes and providing a new framework for experimental medicine in the post-genomic era.


2013 ◽  
Author(s):  
Laura S. Hamilton ◽  
Stephen P. Klein ◽  
William Lorie

Author(s):  
Christina Schindler ◽  
Hannah Baumann ◽  
Andreas Blum ◽  
Dietrich Böse ◽  
Hans-Peter Buchstaller ◽  
...  

Here we present an evaluation of the binding affinity prediction accuracy of the free energy calculation method FEP+ on internal active drug discovery projects and on a large new public benchmark set.
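Such evaluations are typically summarized with a few standard accuracy metrics. Below is a minimal sketch of how mean unsigned error, RMSE, and rank correlation between predicted and experimental binding free energies might be computed; the numbers are invented for illustration and are not results from this benchmark.

```python
# Sketch: typical accuracy metrics for predicted vs. experimental binding free
# energies (kcal/mol). The values below are made up for illustration only.
import numpy as np
from scipy import stats

dg_exp  = np.array([-9.1, -8.4, -10.2, -7.8, -9.6])   # experimental ΔG (illustrative)
dg_pred = np.array([-8.7, -8.9, -10.6, -7.2, -9.1])   # predicted ΔG, e.g. from a free energy calculation

mue  = np.mean(np.abs(dg_pred - dg_exp))               # mean unsigned error
rmse = np.sqrt(np.mean((dg_pred - dg_exp) ** 2))       # root-mean-square error
tau, _ = stats.kendalltau(dg_exp, dg_pred)             # rank-ordering agreement
r, _   = stats.pearsonr(dg_exp, dg_pred)               # linear correlation

print(f"MUE = {mue:.2f} kcal/mol, RMSE = {rmse:.2f} kcal/mol, tau = {tau:.2f}, R = {r:.2f}")
```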


2021 ◽  
Vol 29 ◽  
pp. 115-124
Author(s):  
Xinlu Wang ◽  
Ahmed A.F. Saif ◽  
Dayou Liu ◽  
Yungang Zhu ◽  
Jon Atli Benediktsson

BACKGROUND: DNA sequence alignment is one of the most fundamental and important operations for identifying which gene family may contain a given sequence, and pattern matching for DNA sequences has been a fundamental issue in biomedical engineering, biotechnology and health informatics. OBJECTIVE: To address this problem, this study proposes an optimal multi-pattern matching algorithm with wildcards for DNA sequences. METHODS: The proposed method packs the patterns and a sliding window of text, and the window slides along the packed text, matching against the stored packed patterns. RESULTS: Three data sets are used to test the performance of the proposed algorithm, which proves more efficient than its competitors because its operations are close to machine language. CONCLUSIONS: Theoretical analysis and experimental results both demonstrate that the proposed method outperforms state-of-the-art methods and is especially effective for DNA sequences.
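The abstract does not spell out the algorithm's details, so the sketch below shows a generic bit-parallel (Shift-And) matcher with 'N' wildcards, purely to illustrate the general idea of packing a pattern into machine words and sliding along the text; it is not the specific method proposed in the paper.

```python
# Sketch: bit-parallel (Shift-And) matching of a DNA pattern with 'N' wildcards.
# Illustrates packing a pattern into machine words and sliding along the text;
# this is not the paper's specific algorithm.
def shift_and_search(text, pattern, alphabet="ACGT"):
    m = len(pattern)
    # Bit mask per character: bit i is set if position i of the pattern matches that character.
    masks = {c: 0 for c in alphabet}
    for i, p in enumerate(pattern):
        for c in alphabet:
            if p == c or p == "N":        # 'N' is a wildcard matching any base
                masks[c] |= 1 << i
    matches, state = [], 0
    for j, c in enumerate(text):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & (1 << (m - 1)):        # highest bit set: full pattern ends at position j
            matches.append(j - m + 1)
    return matches

print(shift_and_search("ACGTACGTTACG", "ACGN"))  # [0, 4] -> "ACGT" matched at offsets 0 and 4
```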


Author(s):  
Lior Shamir

Abstract: Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that the two data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$ and is well within the $1\sigma$ error range of the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
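To make the cosine (dipole) fit concrete, here is a minimal sketch for a single candidate axis, assuming equatorial coordinates and a simple least-squares amplitude; the random data, function name, and omission of the randomization-based significance estimate are all illustrative simplifications, not a reproduction of the paper's analysis.

```python
# Sketch: fitting galaxy spin labels to a dipole (cosine) dependence for one
# candidate axis. The paper scans many candidate axes and estimates significance
# by randomization; only the per-axis fit is shown here, on made-up data.
import numpy as np

def dipole_amplitude(ra, dec, spin, axis_ra, axis_dec):
    """ra, dec in degrees; spin is +1 (clockwise) or -1 (counterclockwise)."""
    ra, dec = np.radians(ra), np.radians(dec)
    ax_ra, ax_dec = np.radians(axis_ra), np.radians(axis_dec)
    # Cosine of the angular distance between each galaxy and the candidate axis
    cos_phi = (np.sin(dec) * np.sin(ax_dec)
               + np.cos(dec) * np.cos(ax_dec) * np.cos(ra - ax_ra))
    # Least-squares amplitude d in the model  spin ~ d * cos(phi)
    return np.sum(spin * cos_phi) / np.sum(cos_phi ** 2)

rng = np.random.default_rng(1)
ra   = rng.uniform(0, 360, 1000)
dec  = rng.uniform(-30, 60, 1000)
spin = rng.choice([-1, 1], 1000)            # random spins: expect d close to 0
print(dipole_amplitude(ra, dec, spin, axis_ra=78.0, axis_dec=47.0))
```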

