Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing

2012 ◽  
Vol 9 (2) ◽  
pp. 53-67 ◽  
Author(s):  
Peng Sun ◽  
Jiong Guo ◽  
Jan Baumbach

Summary The explosion of biological data has strongly influenced the focus of today’s biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. Such data are modelled as bi-partite graphs in which nodes correspond to the different data points, for instance mutations and diseases, and weighted edges represent associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute an exact algorithm based on fixed-parameter tractability. We first evaluated its performance on artificial graphs. Afterwards, as an example, we applied our Java implementation to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. In general, our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge, it is the fastest exact method for the weighted bi-cluster editing problem.
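To make the objective of weighted bi-cluster editing concrete, here is a minimal Python sketch. It is not the authors' fixed-parameter algorithm or their Java implementation: it is an exhaustive search that is only practical for tiny graphs. The sign convention (a positive weight means the edge is present, a negative weight means it is absent, and the absolute value is the cost of changing it), the function names and the example matrix are all assumptions made for illustration.

```python
from itertools import product


def editing_cost(weights, row_labels, col_labels):
    """Cost of editing the graph into the bi-cliques implied by the labels.

    Assumed convention: weights[i][j] > 0 means left node i and right node j
    are connected, weights[i][j] < 0 means they are not, and abs(weight) is
    the price of flipping that relation.
    """
    cost = 0.0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            same_cluster = row_labels[i] == col_labels[j]
            if same_cluster and w < 0:
                cost += -w  # insert a missing edge inside a bi-cluster
            elif not same_cluster and w > 0:
                cost += w   # delete an edge running between bi-clusters
    return cost


def brute_force_bicluster_editing(weights):
    """Try every labelling and return (best_cost, row_labels, col_labels).

    Left nodes use labels 0..n_rows-1. Right nodes may also use the extra
    label n_rows, which no left node has, so a right node can stay isolated.
    The search space grows exponentially; this is for illustration only.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    best = (float("inf"), None, None)
    for row_labels in product(range(n_rows), repeat=n_rows):
        for col_labels in product(range(n_rows + 1), repeat=n_cols):
            cost = editing_cost(weights, row_labels, col_labels)
            if cost < best[0]:
                best = (cost, row_labels, col_labels)
    return best


# Hypothetical 3 x 3 association matrix (rows: e.g. mutations, columns: e.g. diseases).
example_weights = [
    [2.0, 1.0, -1.0],
    [1.5, -0.5, -2.0],
    [-1.0, -1.0, 3.0],
]
best_cost, best_rows, best_cols = brute_force_bicluster_editing(example_weights)
```

In this toy instance the cheapest repair is to insert the weakly rejected edge between the second row and the second column, which yields two bi-cliques. An exact method such as the one described in the abstract aims for the same optimum without enumerating every labelling.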

2020 ◽  
Author(s):  
Mike A. Nalls ◽  
Cornelis Blauwendraat ◽  
Lana Sargent ◽  
Dan Vitale ◽  
Hampton Leonard ◽  
...  

Summary Background: Previous research using genome wide association studies (GWAS) has identified variants that may contribute to lifetime risk of multiple neurodegenerative diseases. However, whether there are common mechanisms that link neurodegenerative diseases is uncertain. Here, we focus on one gene, GRN, encoding progranulin, and the potential mechanistic interplay between genetic risk, gene expression in the brain and inflammation across multiple common neurodegenerative diseases. Methods: We utilized GWAS, expression quantitative trait locus (eQTL) mapping and Bayesian colocalization analyses to evaluate potential causal and mechanistic inferences. We integrate various molecular data types from public resources to infer disease connectivity and shared mechanisms using a data driven process. Findings: eQTL analyses combined with GWAS identified significant functional associations between increasing genetic risk in the GRN region and decreased expression of the gene in Parkinson’s, Alzheimer’s and amyotrophic lateral sclerosis. Additionally, colocalization analyses show a connection between blood based inflammatory biomarkers relating to platelets and GRN expression in the frontal cortex. Interpretation: GRN expression mediates neuroinflammation function related to general neurodegeneration. This analysis suggests shared mechanisms for Parkinson’s, Alzheimer’s and amyotrophic lateral sclerosis. Funding: National Institute on Aging, National Institute of Neurological Disorders and Stroke, and the Michael J. Fox Foundation.


2018 ◽  
Author(s):  
Junjie Zhu ◽  
Qian Zhao ◽  
Eugene Katsevich ◽  
Chiara Sabatti

Abstract The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate to the rest of the ontology structure. Here we present new visualization strategies to facilitate the exploration and use of the information in the GO. We rely on novel graphical display and software architecture that allow significant interaction. To illustrate the potential of our strategies, we provide examples from high-throughput genomic analyses, including chromatin immunoprecipitation experiments and genome-wide association studies. The scientist can also use our visualizations to identify gene sets that likely experience coordinated changes in their expression and use them to simulate biologically-grounded single cell RNA sequencing data, or conduct power studies for differential gene expression studies using our built-in pipeline. Our software and documentation are available at http://aegis.stanford.edu.


Author(s):  
Kate Langley

This chapter reviews the evidence suggesting that there is a strong genetic component to ADHD and the efforts to identify the specific genetic factors that might be involved. It discusses the different types of genetic contributions, from common to rare variants, and the evidence that these are involved in the aetiology of the disorder. An overview of the methodological strategies employed, including genome-wide association studies (GWAS), polygenic risk score, and copy number variant (CNV) analyses, is undertaken, as well as discussion of the strengths and pitfalls of such work. The contradictory findings in the field and controversies that arise as a result are also explored. Finally, this chapter considers how the heritability of ADHD and specific genetic factors involved need to be examined in the context of clinical factors such as comorbidity and how these factors affect investigations into the genetics of ADHD.


2019 ◽  
Vol 35 (24) ◽  
pp. 5182-5190 ◽  
Author(s):  
Luis G Leal ◽  
Alessia David ◽  
Marjo-Riita Jarvelin ◽  
Sylvain Sebert ◽  
Minna Männikkö ◽  
...  

Abstract Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 764 ◽  
Author(s):  
John McCamley ◽  
William Denton ◽  
Andrew Arnold ◽  
Peter Raffalt ◽  
Jennifer Yentes

Sample entropy (SE) shows relative consistency for biologically derived, discrete data with more than 500 data points. For certain populations, collecting this quantity of data is not feasible, and continuous data have been used instead. The effect of using continuous versus discrete data on SE is unknown, as are the relative effects of sampling rate and the input parameters m (comparison vector length) and r (tolerance). Eleven subjects walked for 10 minutes, and continuous joint angles (480 Hz) were calculated for each lower-extremity joint. Data were downsampled (240, 120, 60 Hz) and discrete range-of-motion was calculated. SE was quantified for angles and range-of-motion at all sampling rates and multiple combinations of parameters. A differential relationship between joints was observed between range-of-motion and joint angles: range-of-motion SE showed no difference, whereas joint angle SE significantly decreased from ankle to knee to hip. To confirm findings from biological data, continuous signals with manipulations to frequency, amplitude, and both were generated and underwent similar analysis to the biological data. In general, changes to m, r, and sampling rate had a greater effect on continuous compared to discrete data. Discrete data were robust to sampling rate and m. It is recommended that different data types not be compared and that discrete data be used for SE.
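For readers unfamiliar with the roles of m and r, the following Python sketch computes sample entropy in its commonly used form: count pairs of length-m templates that match within a tolerance, count how many still match at length m + 1, and take the negative logarithm of the ratio. It is not the authors' code. Scaling r by the standard deviation of the series, the default values m = 2 and r = 0.2, and the two test signals are assumptions chosen for illustration.

```python
import math
import random


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1-D series.

    m is the comparison vector (template) length; r is the tolerance,
    assumed here to be a fraction of the series' standard deviation.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    tolerance = r * sd

    def count_matches(length):
        # Count template pairs (i < j, so self-matches are excluded) whose
        # Chebyshev distance is within the tolerance. The same n - m start
        # positions are used for both template lengths.
        matches = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                distance = max(
                    abs(series[i + k] - series[j + k]) for k in range(length)
                )
                if distance <= tolerance:
                    matches += 1
        return matches

    b = count_matches(m)      # matches at length m
    a = count_matches(m + 1)  # matches that persist at length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined in theory; reported as infinite here
    return -math.log(a / b)


# Hypothetical signals: a strictly alternating series and seeded Gaussian noise.
regular = [0.0, 1.0] * 100
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)
```

A perfectly repeating signal gives an entropy of zero because every match at length m persists at length m + 1, while an irregular signal gives a larger value. Because both counts depend on how many templates fall within the tolerance, the result is sensitive to m, r and the number of samples, which is the dependence the study examines.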


Author(s):  
Yanli Zou ◽  
Jing Jing Li ◽  
Wei Xue ◽  
Xiangbin Kong ◽  
Hucheng Duan ◽  
...  

Uveitis is a sight-threatening intraocular inflammation, and the exact pathogenesis of uveitis is not yet clear. Recent studies, including multiple genome-wide association studies (GWASs), have identified genetic variations associated with the onset and progression of different types of uveitis, such as Vogt–Koyanagi–Harada (VKH) disease and Behcet’s disease (BD). However, epigenetic regulation has been shown to play key roles in the immunoregulation of uveitis, and epigenetic therapies are promising treatments for intraocular inflammation. In this review, we summarize recent advances in identifying epigenetic programs that cooperate with the physiology of intraocular immune responses and the pathology of intraocular inflammation. These attempts to understand the epigenetic mechanisms of uveitis may provide hope for the future development of epigenetic therapies for these devastating intraocular inflammatory conditions.


2017 ◽  
Author(s):  
Kyoko Watanabe ◽  
Erdogan Taskesen ◽  
Arjen van Bochoven ◽  
Danielle Posthuma

Abstract A main challenge in genome-wide association studies (GWAS) is to prioritize genetic variants and identify potential causal mechanisms of human diseases. Although multiple bioinformatics resources are available for functional annotation and prioritization, a standard, integrative approach is lacking. We developed FUMA: a web-based platform to facilitate functional annotation of GWAS results, prioritization of genes and interactive visualization of annotated results by incorporating information from multiple state-of-the-art biological databases.


2021 ◽  
Author(s):  
Xinpei Wang ◽  
Jinzhu Jia ◽  
Tao Huang

Abstract Purpose: To explore whether coffee intake is associated with cardiac metabolic risks from a genetic perspective, and whether this association remains the same among different types of coffee consumers. Methods: We utilised the summary-level results of 28 genome-wide association studies (total sample size: ~5,000,000). First, we used linkage disequilibrium score regression and cross-phenotypic association analysis to estimate the genetic correlation and identify shared genes between coffee intake and various cardiac metabolic risks. Second, we used Mendelian randomization (MR) analysis to test whether there was a significant genetically predicted causal association between coffee intake and cardiac metabolic risks. For all the analyses above, we also conducted a separate analysis for different types of coffee consumers, in addition to total coffee intake. Results: Genetically, coffee intake and choice for decaffeinated/instant coffee had significant positive correlation with body mass index (BMI) and some other cardiac metabolic risks, while choice for ground coffee was significantly negatively associated with these risks. Between these genetically related phenotypes, there were 1708 shared genomic regions, of which 139 loci were novel. Enrichment analysis showed that these shared genes were significantly enriched in antigen processing related biological processes. MR analysis indicated that higher genetically proxied coffee intake may increase BMI (b: 0.35, p-value: 1.80 × 10^-5), while genetically proxied choice for ground coffee can reduce BMI (b: -0.08, p-value: 6.50 × 10^-5), and the risk of T2D (T2D: b: -0.2, p-value: 4.70 × 10^-10; T2D adjusted for BMI: b: -0.11, p-value: 4.60 × 10^-5). Conclusions: Compared with other types of coffee, ground coffee has a significant negative genetic and genetically predicted causal relationship with cardiac metabolic risks. This association is likely to be mediated by immunity.
The effect of different coffee types on cardiac metabolic risks is not equal; researchers studying coffee should pay more attention to distinguishing between coffee types.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 48 ◽  
Author(s):  
Guy Yachdav ◽  
Maximilian Hecht ◽  
Metsada Pasmanik-Chor ◽  
Adva Yeheskel ◽  
Burkhard Rost

Summary: The HeatMapViewer is a BioJS component that lays out and renders two-dimensional (2D) plots or heat maps that are ideally suited to visualize matrix-formatted data in biology, such as the display of microarray experiments, the outcome of mutational studies and the study of SNP-like sequence variants. It can be easily integrated into documents and provides a powerful, interactive way to visualize heat maps in web applications. The software uses a scalable graphics technology that adapts the visualization component to any required resolution, a useful feature for a presentation with many different data-points. The component can be applied to present various biological data types. Here, we present two such cases – showing gene expression data and visualizing mutability landscape analysis. Availability: https://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7706.


2016 ◽  
Author(s):  
M. Ghareghani ◽  
S. A. Motahari ◽  
S. Khazaei ◽  
M. Tavassolipour

Abstract The main challenge in reliable variant calling using DNA reads is to extract information from reads mappable to multiple locations on the reference genome. Conventional approaches ignore these reads and rely on reads mappable uniquely to the reference genome. These approaches fail to perform satisfactorily in variant calling within repeat regions, which are abundant in many species including Homo sapiens. This, in turn, lowers the reliability of downstream analyses, including genome-wide association studies. GW-CALL, a fast and accurate variant caller, is proposed. GW-CALL exploits information from all reads in a genome-wide decision making process. In particular, it partitions the genome into several independent regions called clusters and incorporates an efficient algorithm to use all reads belonging to a cluster in calling variants within that cluster. Availability: GW-CALL is implemented in C++ and is freely available at URL: brl.ce.sharif.edu/gwcall.

