BIOMEX: an interactive workflow for (single cell) omics data interpretation and visualization

2020 ◽  
Vol 48 (W1) ◽  
pp. W385-W394
Author(s):  
Federico Taverna ◽  
Jermaine Goveia ◽  
Tobias K Karakach ◽  
Shawez Khan ◽  
Katerina Rohlenova ◽  
...  

Abstract The amount of biological data generated with (single cell) omics technologies is rapidly increasing, thereby exacerbating bottlenecks in the data analysis and interpretation of omics experiments. Data mining platforms that enable experimental scientists without bioinformatics training to analyze a wide range of experimental designs and data types can alleviate such bottlenecks, aiding in the exploration of (newly generated or publicly available) omics datasets. Here, we present BIOMEX, a browser-based software tool designed to facilitate the Biological Interpretation Of Multi-omics EXperiments by bench scientists. BIOMEX integrates state-of-the-art statistical tools and field-tested algorithms into a flexible but well-defined workflow that accommodates metabolomics, transcriptomics, proteomics, mass cytometry and single cell data from different platforms and organisms. The BIOMEX workflow is accompanied by a manual and video tutorials that provide the necessary background to navigate the interface and get acquainted with the employed methods. BIOMEX guides the user through omics-tailored analyses, such as data pretreatment and normalization, dimensionality reduction, differential and enrichment analysis, pathway mapping, clustering, marker analysis, trajectory inference, meta-analysis and others. BIOMEX is fully interactive, allowing users to easily change parameters and generate customized plots exportable as high-quality publication-ready figures. BIOMEX is open source and freely available at https://www.vibcancer.be/software-tools/biomex.

Author(s):  
Mufti Mahmud ◽  
M. Shamim Kaiser ◽  
T. Martin McGinnity ◽  
Amir Hussain

Abstract Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Categorized into three broad types (i.e. images, signals, and sequences), these data are huge in volume and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, especially its different architectures, has contributed to and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates the applications of different DL architectures to these data. This is followed by an exploration of available open access data sources pertaining to the three data types along with popular open-source DL tools applicable to these data. Comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are also provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.


2020 ◽  
Author(s):  
Zihan Zheng ◽  
Qiu Xin ◽  
Haiyang Wu ◽  
Ling Chang ◽  
Xiangyu Tang ◽  
...  

Recent advances in bioinformatics analyses have led to the development of novel tools enabling the capture and trajectory mapping of single-cell RNA sequencing (scRNAseq) data. However, there is a lack of methods to assess the contributions of biological pathways and transcription factors to an overall developmental trajectory mapped from scRNAseq data. In this manuscript, we present a simplified approach for trajectory inference of pathway significance (TIPS) that leverages existing knowledgebases of functional pathways and transcription factor targets to enable further mechanistic insights into a biological process. TIPS returns both the key pathways whose changes are associated with the process of interest, as well as the individual genes that best reflect these changes. TIPS also provides insight into the relative timing of pathway changes, as well as a suite of visualizations to enable simplified data interpretation of scRNAseq libraries generated using a wide range of techniques. The TIPS package can be run through either a web server, or downloaded as a user-friendly GUI run in R, and may serve as a useful tool to help biologists perform deeper functional analyses and visualization of their single-cell and/or large cohort RNAseq data.


2021 ◽  
Author(s):  
Mark S Keller ◽  
Ilan Gold ◽  
Chuck McCallum ◽  
Trevor Manz ◽  
Peter V Kharchenko ◽  
...  

Vitessce is an open-source interactive visualization framework for exploration of multi-modal and spatially-resolved single-cell data, with a modular architecture compatible with transcriptomic, proteomic, genome-mapped, and imaging data types. Its modular, coordinated multiple view implementation facilitates a wide range of visualization tasks to support all common single-cell assays. Vitessce is a client-side web application designed to be integrated with computational analysis tools and data resources and does not require specialized server infrastructure. The software is available at http://vitessce.io.


Author(s):  
George Vavougios ◽  
Marianthi Breza ◽  
Sofia Nikou ◽  
Karen Krogfelt

Introduction IFITM3, an innate immune protein linked to COVID-19 severity, has recently been identified as a novel γ-secretase modulator. Independent research has shown that IFITM3 may facilitate SARS-CoV-2 neurotropism in an ACE2-independent manner. In a previous study, we detected perturbations in IFITM3 networks in both the CNS and peripheral immune cells donated by AD patients. The purpose of this study is to explore the transcriptomic evidence of the SARS-CoV-2 / IFITM3 / AD interplay, validating previous findings from our group. Methods Exploratory analyses involved meta-analysis of bulk and single cell RNA data for IFITM3 and FYN differential expression. For confirmatory analyses, we performed gene set enrichment analysis (GSEA) on an AD gene signature from AD Consensus transcriptomics; using the Enrichr platform, we scrutinized COVID-19 datasets for significant, overlapping enriched biological networks. Results Bulk RNA data analysis revealed that IFITM3 and FYN were differentially expressed in two CNS regions in AD: the temporal cortex (AD vs. controls, adj.p-value=1.3e-6) and the parahippocampal cortex (AD vs. controls, adj.p-value=0.012). Correspondingly, single cell RNA analysis of IFITM3 and FYN revealed that they were differentially expressed in neuronal cells donated from AD patients (astrocytes, microglia and oligodendrocyte precursor cells), when compared to controls. Discussion IFITM3 and, by extension, FYN were found as interactors within biological networks overlapping between AD and SARS-CoV-2 infection. SARS-CoV-2-mediated IFITM3 induction would mechanistically result in increased Aβ production. FYN recruitment by viral processes results in abrogation of both fusion of IFITM3 vesicles with lysosomes; immunoevasion, by FYN-mediated impairment of autophagy would then serve to promote impaired detoxification from Aβ, while propagating Tau pathology in an IFITM3-independent manner.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 136
Author(s):  
Rutger A. Vos ◽  
Toshiaki Katayama ◽  
Hiroyuki Mishima ◽  
Shin Kawano ◽  
Shuichi Kawashima ◽  
...  

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jack Cheng ◽  
Hsin-Ping Liu ◽  
Wei-Yong Lin ◽  
Fuu-Jen Tsai

Abstract Alzheimer’s disease (AD) is a neurodegenerative disorder causing 70% of dementia cases. However, the mechanism of disease development is still elusive. Despite the availability of a wide range of biological data, a comprehensive understanding of AD's mechanism from machine learning (ML) is so far unrealized, largely due to the lack of needed data density. To extract knowledge of the AD mechanism from the expression profiles of postmortem prefrontal cortex samples from 310 AD cases and 157 controls, we used seven predictive operators or combinations of RapidMiner Studio operators to establish predictive models from the input matrix and to assign a weight to each attribute. In addition, conventional fold-change methods were applied as controls. The identified genes were further submitted to enrichment analysis for KEGG pathways. The average accuracy of ML models ranges from 86.30% to 91.22%. The overlap ratio of the identified genes between ML and conventional methods ranges from 19.7% to 21.3%. ML exclusively identified oxidative phosphorylation genes in the AD pathway. Our results highlight the deficiency of oxidative phosphorylation in AD and suggest that ML should be considered as complementary to the conventional fold-change methods in transcriptome studies.
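
The abstract above reports an "overlap ratio" between the gene lists from ML and from fold-change methods but does not define it. As a rough illustration only, the sketch below computes three common candidates (the Jaccard index and the shared fraction of each list). The function name and the gene identifiers are hypothetical placeholders, not taken from the study.

```python
def overlap_ratios(ml_genes, fc_genes):
    """Compare two gene lists with three common overlap measures.

    Assumption: the source does not say which definition it used,
    so all three are returned for comparison.
    """
    ml, fc = set(ml_genes), set(fc_genes)
    if not ml or not fc:
        raise ValueError("both gene lists must be non-empty")
    shared = ml & fc
    return {
        "jaccard": len(shared) / len(ml | fc),      # shared / union
        "fraction_of_ml": len(shared) / len(ml),    # shared / ML list size
        "fraction_of_fc": len(shared) / len(fc),    # shared / fold-change list size
    }


# Hypothetical placeholder identifiers, not real study results.
ml_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
fc_hits = ["GENE_C", "GENE_D", "GENE_E"]
ratios = overlap_ratios(ml_hits, fc_hits)
print(ratios)
```

The three measures can differ noticeably when the lists have different sizes, so a reported percentage is only interpretable once the denominator is known.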


2020 ◽  
Author(s):  
Jorge Oscanoa ◽  
Lavanya Sivapalan ◽  
Maryam Abdollahyan ◽  
Emanuela Gadaleta ◽  
Claude Chelala

Abstract The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has demanded an unprecedented scientific response, with researchers collaborating on a global scale to better understand how host genetics can influence susceptibility to coronavirus infection and the severity of COVID-19 symptoms. The number of projects directed towards sequencing patients’ genomes has increased rapidly during this time with the rate of data generation outpacing the resources available for analysis and biological interpretation of these datasets. SNPnexus COVID is a cutting-edge web-based analytical platform that allows researchers to analyse and interpret the functional implications of genetic variants in COVID-19 patient genomes and to prioritise those that demonstrate clinical utility for the prevention, management and/or treatment of COVID-19. Our resource links to diverse multifactorial datasets and information resources that would require substantial time and computational power to otherwise mine independently. This streamlines biological data interpretation and allows researchers to better understand the multidimensional characteristics of their data. Importantly, SNPnexus COVID is powered by the SNPnexus software and follows its intuitive infrastructure, which precludes the need for programmatic experience in its users. SNPnexus COVID is freely available at https://www.snp-nexus.org/v4/covid/


2020 ◽  
Author(s):  
Jialin Liu ◽  
Chao Gao ◽  
Joshua Sodicoff ◽  
Velina Kozareva ◽  
Evan Z. Macosko ◽  
...  

Abstract High-throughput single-cell sequencing technologies hold tremendous potential for defining cell types in an unbiased fashion using gene expression and epigenomic state. A key challenge in realizing this potential is integrating single-cell datasets from multiple protocols, biological contexts, and data modalities into a joint definition of cellular identity. We previously developed an approach called Linked Inference of Genomic Experimental Relationships (LIGER) that uses integrative nonnegative matrix factorization to address this challenge. Here, we provide a step-by-step protocol for using LIGER to jointly define cell types from multiple single-cell datasets. The main steps of the protocol include data preprocessing and normalization, joint factorization, quantile normalization and joint clustering, and visualization. We describe how to jointly define cell types from single-cell RNA-seq and single-nucleus ATAC-seq data, but similar steps apply across a wide range of other settings and data types, including cross-species analysis, single-nucleus DNA methylation, and spatial transcriptomics. Our protocol contains examples of expected results, describes common pitfalls, and relies only on our freely available, open-source R implementation of LIGER. We also provide Rmarkdown tutorials showing the outputs from each individual code segment. The analysis process can be performed in 1-4 h depending on dataset size and assumes no specialized bioinformatics training.


2018 ◽  
Vol 20 (4) ◽  
pp. 1450-1465 ◽  
Author(s):  
Juan Xie ◽  
Anjun Ma ◽  
Anne Fennell ◽  
Qin Ma ◽  
Jing Zhao

Abstract Biclustering is a powerful data mining technique that allows simultaneous clustering of rows and columns in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, dozens of biclustering algorithms and tools have been developed to enhance the ability to make sense of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in the public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.
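
To make the idea of a bicluster concrete, the sketch below scores a chosen subset of rows (genes) and columns (conditions) with the mean squared residue, a classic coherence score from Cheng and Church's 2000 formulation. It is a minimal illustration under assumptions: the review does not prescribe this score, and the toy matrix and function name are invented for the example.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix picked out by rows x cols.

    A value near 0 means the selected genes vary coherently (additively)
    across the selected conditions; larger values mean less coherence.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (rows = genes, columns = conditions); values are made up.
expression = [
    [1.0, 2.0, 9.0, 4.0],
    [2.0, 3.0, 1.0, 5.0],
    [7.0, 0.0, 3.0, 8.0],
    [4.0, 5.0, 6.0, 7.0],
]

# Genes 0, 1, 3 differ only by a constant offset across conditions 0, 1, 3.
coherent = mean_squared_residue(expression, rows=[0, 1, 3], cols=[0, 1, 3])
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

The selected submatrix scores essentially zero while the full matrix does not, which is the sense in which a bicluster captures co-expression under only a subset of conditions. Biclustering algorithms differ mainly in how they search for such row and column subsets.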


Author(s):  
Max Lam ◽  
Swapnil Awasthi ◽  
Hunna J Watson ◽  
Jackie Goldstein ◽  
Georgia Panagiotaropoulou ◽  
...  

Abstract Summary Genome-wide association study (GWAS) analyses, at sufficient sample sizes and power, have successfully revealed biological insights for several complex traits. RICOPILI, an open-source Perl-based pipeline, was developed to address the challenges of rapidly processing large-scale multi-cohort GWAS studies, including quality control (QC), imputation and downstream analyses. The pipeline is computationally efficient with portability to a wide range of high-performance computing environments. RICOPILI was created as the Psychiatric Genomics Consortium pipeline for GWAS and adopted by other users. The pipeline features (i) technical and genomic QC in case-control and trio cohorts, (ii) genome-wide phasing and imputation, (iv) association analysis, (v) meta-analysis, (vi) polygenic risk scoring and (vii) replication analysis. Notably, as a major differentiator from other GWAS pipelines, RICOPILI leverages automated parallelization and cluster job management approaches for rapid production of imputed genome-wide data. A comprehensive meta-analysis of simulated GWAS data has been incorporated, demonstrating each step of the pipeline. This includes all the associated visualization plots, to allow ease of data interpretation and manuscript preparation. Simulated GWAS datasets are also packaged with the pipeline for user training tutorials and developer work. Availability and implementation RICOPILI has a flexible architecture to allow for ongoing development and incorporation of newer available algorithms and is adaptable to various HPC environments (QSUB, BSUB, SLURM and others). Specific links for genomic resources are either directly provided in this paper or via tutorials and external links. The central location hosting scripts and tutorials is found at this URL: https://sites.google.com/a/broadinstitute.org/RICOPILI/home Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

