CausalMGM: an interactive web-based causal discovery tool

Nucleic Acids Research ◽  
2020 ◽  
Vol 48 (W1) ◽  
pp. W597-W602 ◽  
Author(s):  
Xiaoyu Ge ◽  
Vineet K Raghu ◽  
Panos K Chrysanthis ◽  
Panayiotis V Benos

Abstract High-throughput sequencing and the availability of large online data repositories (e.g. The Cancer Genome Atlas and Trans-Omics for Precision Medicine) have the potential to revolutionize systems biology by enabling researchers to study interactions between data from different modalities (i.e. genetic, genomic, clinical, behavioral, etc.). Currently, data mining and statistical approaches are confined to identifying correlates in these datasets, but researchers are often interested in identifying cause-and-effect relationships. Causal discovery methods were developed to infer such cause-and-effect relationships from observational data. Although these algorithms have demonstrated success in several biomedical applications, they remain difficult for non-experts to use, so there is a need for web-based tools that make causal discovery methods accessible. Here, we present CausalMGM (http://causalmgm.org/), the first web-based causal discovery tool that enables researchers to find cause-and-effect relationships from observational data. Web-based CausalMGM consists of three data analysis tools: (i) feature selection and clustering; (ii) automated identification of cause-and-effect relationships via a graphical model; and (iii) interactive visualization of the learned causal (directed) graph. We demonstrate how CausalMGM enables an end-to-end exploratory analysis of biomedical datasets, giving researchers a clearer picture of its capabilities.
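CausalMGM's web interface wraps these steps, but the core idea of constraint-based causal discovery can be sketched in a few lines. The sketch below is a minimal, generic skeleton-search pass in the spirit of PC-style algorithms, not CausalMGM's actual mixed graphical model code: it removes an edge between two continuous variables whenever a Fisher z-test finds them conditionally independent given a small conditioning set.

```python
# Minimal sketch of the skeleton phase of a PC-style constraint-based
# causal discovery algorithm (illustrative only; NOT the CausalMGM code,
# which handles mixed discrete/continuous data via graphical models).
import itertools
import numpy as np
from scipy import stats

def fisher_z_pvalue(data, i, j, cond):
    """p-value for the partial correlation of columns i, j given columns cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)                          # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    r = np.clip(r, -0.999999, 0.999999)
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z)))

def pc_skeleton(data, alpha=0.05, max_cond=1):
    """Start from a complete graph; drop edges that test as independent."""
    p = data.shape[1]
    adj = {frozenset((i, j)) for i, j in itertools.combinations(range(p), 2)}
    for size in range(max_cond + 1):
        for edge in list(adj):
            i, j = tuple(edge)
            others = [k for k in range(p) if k not in edge]
            for cond in itertools.combinations(others, size):
                if fisher_z_pvalue(data, i, j, cond) > alpha:
                    adj.discard(edge)        # conditionally independent
                    break
    return adj
```

A full algorithm would additionally orient the surviving edges (collider detection and propagation rules), which is the step that yields the directed graph CausalMGM visualizes.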

GigaScience ◽  
2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Guilhem Sempéré ◽  
Adrien Pétel ◽  
Magsen Abbé ◽  
Pierre Lefeuvre ◽  
Philippe Roumagnac ◽  
...  

Abstract Background Efficiently managing large, heterogeneous data in a structured yet flexible way is a challenge to research laboratories working with genomic data. Specifically regarding both shotgun- and metabarcoding-based metagenomics, while online reference databases and user-friendly tools exist for running various types of analyses (e.g., Qiime, Mothur, Megan, IMG/VR, Anvi'o, Qiita, MetaVir), scientists lack comprehensive software for easily building scalable, searchable, online data repositories on which they can rely during their ongoing research. Results metaXplor is a scalable, distributable, fully web-interfaced application for managing, sharing, and exploring metagenomic data. Being based on a flexible NoSQL data model, it has few constraints regarding dataset contents and thus proves useful for handling outputs from both shotgun and metabarcoding techniques. By supporting incremental data feeding and providing means to combine filters on all imported fields, it allows for exhaustive content browsing, as well as rapid narrowing to find specific records. The application also features various interactive data visualization tools, ways to query contents by BLASTing external sequences, and an integrated pipeline to enrich assignments with phylogenetic placements. The project home page provides the URL of a live instance allowing users to test the system on public data. Conclusion metaXplor allows efficient management and exploration of metagenomic data. Its availability as a set of Docker containers, making it easy to deploy on academic servers, on the cloud, or even on personal computers, will facilitate its adoption.
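As a rough illustration of the kind of filter combination a NoSQL backing store makes possible, the sketch below queries a hypothetical MongoDB collection of sequence-assignment documents with pymongo. The database, collection, and field names are invented for the example and do not reflect metaXplor's actual schema.

```python
# Hypothetical sketch of combining filters over imported metagenomic
# assignment records in a MongoDB backend (field names are invented;
# this is not metaXplor's actual data model).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
assignments = client["metagenomics_demo"]["assignments"]

# Narrow down to confidently assigned viral hits from one sampling project.
query = {
    "project": "coastal_survey_2020",
    "taxonomy.superkingdom": "Viruses",
    "best_hit.evalue": {"$lte": 1e-10},
    "best_hit.identity_pct": {"$gte": 80},
}
for doc in assignments.find(query).sort("best_hit.evalue", 1).limit(20):
    print(doc["sequence_id"], doc["taxonomy"].get("family", "unassigned"))
```

Because documents in such a store need not share a fixed schema, records produced by shotgun and metabarcoding pipelines can coexist in one collection and still be filtered on whatever fields they do carry.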


2021 ◽  
Vol 11 (8) ◽  
pp. 1288-1298
Author(s):  
Liang Wang ◽  
Fengxia Xue

Endometrial cancer is one of the most common gynecological malignancies, and DNA methylation plays a vital role in its occurrence and development. In this study, we collected endometrial cancer data from The Cancer Genome Atlas database and the UCSC website. After screening and processing the data, we obtained 410 samples and 16,381 methylation sites. Using a consensus clustering method, endometrial carcinoma was divided into seven molecular subtypes. Based on the analysis of differences among subtypes, the degree of methylation at different sites was obtained, and a prognostic model based on methylation sites was established. Using the median risk score of the training group as the cutoff, the training and test groups were each divided into high- and low-risk groups, and survival differed between the two groups, showing that the model can predict patient survival with good accuracy. In conclusion, tumor subtypes based on methylation sites can better guide treatment, relapse monitoring, and prognosis in endometrial cancer. In this study, magnetic nanoparticles, owing to their paramagnetism and biocompatibility, were used to extract genomic DNA and total RNA, after which high-throughput transcriptome sequencing was performed. These methylation sites may serve as potential cancer immune biomarker targets for developing future oncological treatments.
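The prognostic model described above follows a common pattern: score each sample by a weighted sum of selected methylation values, then split cohorts at the training-set median. The sketch below reproduces that pattern on synthetic data; the weights stand in for Cox regression coefficients, which the study does not list here.

```python
# Illustrative sketch of a methylation-site risk model with a median split
# (synthetic data and made-up weights; the paper's actual sites and Cox
# coefficients are not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_sites = 205, 205, 5
train_beta = rng.uniform(0, 1, size=(n_train, n_sites))  # methylation beta values
test_beta = rng.uniform(0, 1, size=(n_test, n_sites))
weights = np.array([1.2, -0.8, 0.5, 0.9, -1.1])          # stand-in Cox coefficients

train_risk = train_beta @ weights        # risk score = weighted sum over sites
test_risk = test_beta @ weights
cutoff = np.median(train_risk)           # median of the training cohort only

train_group = np.where(train_risk > cutoff, "high", "low")
test_group = np.where(test_risk > cutoff, "high", "low")  # same cutoff reused
print(dict(zip(*np.unique(test_group, return_counts=True))))
```

Reusing the training-set cutoff on the test cohort, rather than re-deriving it, is what makes the subsequent survival comparison a genuine out-of-sample validation.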


2021 ◽  
Author(s):  
Victoria Leong ◽  
Kausar Raheel ◽  
Sim Jia Yi ◽  
Kriti Kacker ◽  
Vasilis M. Karlaftis ◽  
...  

Background. The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). Methods. A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times), as well as overall task performance measures. Results. Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect test environment differences, including possible effects of mask-wearing on communication. Conclusions. These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance.
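Claims that two testing modes are statistically equivalent are usually backed by equivalence tests rather than ordinary null-hypothesis tests. The sketch below shows a two one-sided tests (TOST) procedure on simulated reaction-time data for a lab group and an RGT group; the equivalence margin and the data are illustrative choices, not the study's.

```python
# Sketch of a two one-sided tests (TOST) equivalence check between a lab
# group and a remote-guided-testing (RGT) group (simulated data; the
# +/-30 ms margin is an arbitrary choice for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lab = rng.normal(520, 60, size=41)   # simulated response times (ms), N=41
rgt = rng.normal(515, 60, size=44)   # simulated response times (ms), N=44
margin = 30.0                        # equivalence bounds in ms

diff = lab.mean() - rgt.mean()
se = np.sqrt(lab.var(ddof=1) / len(lab) + rgt.var(ddof=1) / len(rgt))
df = len(lab) + len(rgt) - 2         # simple pooled-df approximation

# Two one-sided tests: reject both to conclude equivalence within +/-margin.
p_lower = 1 - stats.t.cdf((diff + margin) / se, df)  # H0: diff <= -margin
p_upper = stats.t.cdf((diff - margin) / se, df)      # H0: diff >= +margin
print(f"diff = {diff:.1f} ms, TOST p = {max(p_lower, p_upper):.4f}")
```

A small TOST p-value supports equivalence within the chosen margin, which is the logically appropriate direction of inference when the goal is to show that online data are as good as lab data.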


2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Abstract Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This novel framework differs significantly from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable across many different omics platforms. The divergence package, available for R through the Bioconductor repository, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with high-throughput sequencing data from The Cancer Genome Atlas.
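The core univariate transformation is easy to state: estimate a baseline range per feature from the reference population, then code each entry of a new profile as below, within, or above that range. The sketch below illustrates the idea in Python with simple quantile bounds; the divergence package itself is an R/Bioconductor implementation whose baseline-support estimation is more involved.

```python
# Minimal Python illustration of univariate ternary divergence coding
# (the actual divergence package is R/Bioconductor with a more careful
# baseline-support estimate; plain quantile bounds are a simplification).
import numpy as np

def ternary_divergence(baseline, samples, lo_q=0.025, hi_q=0.975):
    """Code each entry as -1 (below), 0 (within), or +1 (above baseline range)."""
    lo = np.quantile(baseline, lo_q, axis=0)   # per-feature lower bound
    hi = np.quantile(baseline, hi_q, axis=0)   # per-feature upper bound
    return np.where(samples < lo, -1, np.where(samples > hi, 1, 0))

rng = np.random.default_rng(2)
baseline = rng.normal(0, 1, size=(100, 6))     # e.g. normal-tissue profiles
tumors = rng.normal(0.5, 2, size=(3, 6))       # e.g. tumor profiles
print(ternary_divergence(baseline, tumors))    # mostly 0s, some -1/+1 entries
```

Because each profile is reduced to a sparse code of divergent features, downstream analyses can operate at the level of a single sample rather than requiring cohort-wide normalization.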


2020 ◽  
Author(s):  
Andreas Gerhardus ◽  
Jakob Runge

Scientific inquiry seeks to understand natural phenomena by understanding their underlying processes, i.e., by identifying cause and effect. Beyond mere scientific curiosity, an understanding of cause-and-effect relationships is necessary to predict the effect of changing dynamical regimes and to attribute extreme events to potential causes. It is thus an important question how, in cases where controlled experiments are not feasible, causation can still be inferred from the statistical dependencies in observed time series.

A central obstacle for such an inference is the potential existence of unobserved causally relevant variables. Arguably, this is more likely to be the case than not, for example unmeasured deep oceanic variables in atmospheric processes. Unobserved variables can act as confounders (meaning they are a common cause of two or more observed variables) and thus introduce spurious, i.e., non-causal, dependencies. Despite these complications, the last three decades have seen the development of so-called causal discovery algorithms (an example being FCI by Spirtes et al., 1999) that are often able to identify spurious associations and to distinguish them from genuine causation. This opens the possibility of a data-driven approach to inferring cause-and-effect relationships among climate variables, thereby contributing to a better understanding of Earth's complex climate system.

These methods are, however, not yet well adapted to some specific challenges that climate time series often come with, e.g., strong autocorrelation, time lags, and nonlinearities. To close this methodological gap, we generalize the ideas of the recent PCMCI causal discovery algorithm (Runge et al., 2019) to time series in which unobserved causally relevant variables may exist (in contrast, PCMCI assumed no confounding). Further, we present preliminary applications to modes of climate variability.
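The confounding problem the authors address can be reproduced in a few lines: two observed series driven by a common unobserved driver correlate strongly even though neither causes the other. The simulation below is a toy illustration of that pitfall, not the generalized latent-variable PCMCI algorithm the abstract describes.

```python
# Toy simulation of an unobserved confounder in autocorrelated time series:
# X and Y share a hidden driver H but have no causal link between them,
# yet their correlation is strong. (Illustrates the problem only; this is
# NOT the latent-variable PCMCI algorithm described above.)
import numpy as np

rng = np.random.default_rng(3)
T = 2000
h = np.zeros(T)   # hidden driver, e.g. an unmeasured oceanic variable
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    h[t] = 0.9 * h[t - 1] + rng.normal()              # autocorrelated confounder
    x[t] = 0.5 * x[t - 1] + 0.8 * h[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * h[t - 1] + rng.normal()

print("corr(X, Y) =", round(np.corrcoef(x, y)[0, 1], 2))  # spuriously high
```

A causal discovery method that allows for latent confounders must flag this X–Y dependence as possibly confounded rather than reporting a causal link in either direction.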


2021 ◽  
Author(s):  
Victoria Leong ◽  
Kausar Raheel ◽  
Jia Yi Sim ◽  
Kriti Kacker ◽  
Vasilis M Karlaftis ◽  
...  

BACKGROUND The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. OBJECTIVE Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). METHODS A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times), as well as overall task performance measures. RESULTS Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect test environment differences, including possible effects of mask-wearing on communication. CONCLUSIONS These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance.


2019 ◽  
pp. 801-823
Author(s):  
Gordana Collier ◽  
Andy Augousti ◽  
Andrzej Ordys

The continual development of technology represents a challenge when preparing engineering students for future employment. At the same time, the way students interact in everyday life is evolving: their extra-curricular life is filled with an enormous range of stimuli, from online data to rich Web-based social interaction. This chapter provides an assessment of various learning-technology-driven methods for enhancing both teaching and learning in the science and engineering disciplines. It describes the past, present, and future drivers for the implementation of hands-on teaching methods, incorporating industry-standard software and hardware, and the evolution of learning experiments into all-encompassing online environments that include socializing, learning, entertainment, and all other aspects of student life when studying science and engineering.


2003 ◽  
pp. 266-297
Author(s):  
Zahir Tari ◽  
Abdelkamel Tari ◽  
Surya Setiawan

Connecting heterogeneous databases through the World Wide Web (WWW) is crucial for most business organizations. The underlying complex problem is the handling of heterogeneity and communication between different data repositories (or database systems). Such interoperability is crucial because it enables the integration of business processes across different business organizations, and it has therefore become a key issue in the new generation of Web-based business applications (called Web Services). CORBA (Common Object Request Broker Architecture) provides protocols and components that allow interoperability between different software platforms (Tari & Bukhres, 2001), such as C++ and Java. However, CORBA does not address WWW-based interoperability. In this paper we propose an extension of one of the core elements of CORBA, the Portable Object Adapter (POA), to support persistence of business information. The proposed extension, called CODAR, manages the whole life cycle of persistent objects, including activation, deactivation, instantiation, and deletion. At the end of the paper we describe an extension of CODAR that improves performance through advanced caching and prefetching techniques.
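As a rough sketch of the object life-cycle bookkeeping a persistence-aware object adapter like CODAR performs, the Python class below manages activation, deactivation, instantiation, and deletion of persistent objects against a backing store. It is an illustration of the concept only, not CORBA's POA interface or CODAR's actual implementation; the class and method names are invented for the example.

```python
# Conceptual sketch of a persistence-aware object adapter's life cycle
# (illustrative only; not the CORBA POA API or CODAR itself). A real
# adapter would marshal remote requests and persist to a database
# rather than to an in-memory dict.
class PersistentObjectAdapter:
    def __init__(self, store):
        self.store = store    # stand-in for the persistent backing store
        self.active = {}      # object id -> in-memory servant state

    def instantiate(self, oid, state):
        """Create a new persistent object in the store."""
        self.store[oid] = state

    def activate(self, oid):
        """Load an object into memory on first use (lazy activation)."""
        if oid not in self.active:
            self.active[oid] = self.store[oid]
        return self.active[oid]

    def deactivate(self, oid):
        """Write state back to the store and evict it from memory."""
        self.store[oid] = self.active.pop(oid)

    def delete(self, oid):
        """Remove the object from memory and from the store."""
        self.active.pop(oid, None)
        del self.store[oid]

adapter = PersistentObjectAdapter(store={})
adapter.instantiate("order:42", {"total": 99.5})
print(adapter.activate("order:42"))   # {'total': 99.5}
adapter.deactivate("order:42")
```

The caching and prefetching extension the abstract mentions would sit at the activate/deactivate boundary, deciding which objects to keep resident or load ahead of demand.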

