The solution of large-scale Minimum Cost SAT Problem as a tool for data analysis in bioinformatics

Author(s):  
Giovanni Felici ◽  
Daniele Ferone ◽  
Paola Festa ◽  
Antonio Napoletano ◽  
Tommaso Pastore

Data mining is one of the main activities in bioinformatics, where it is used to extract knowledge from massive data sets related to gene expression measurements, copy number variation (CNV), DNA strings, and others. A wide array of methods is used for this task, ranging from established parametric statistical analysis to nonparametric techniques and classification methods developed in knowledge engineering and artificial intelligence. In this paper, we consider a method for extracting logic formulas from data that relies on a large body of literature in integer and logic optimization. It was originally presented in [1] and has been widely and successfully applied to different problems in bioinformatics ([2], [3], [4], [5], [6]). The method is based on the iterative solution of Minimum Cost SAT Problems and extracts logic formulas in disjunctive normal form (DNF) that possess interesting features for their interpretation. Leaving the discussion of the main features and motivations of this approach to the related literature, in this talk we focus on efficiently solving very large-scale instances of this well-known logic programming problem and propose a new GRASP approach that, by exploiting the specific structure of the problem, largely outperforms other established solvers for the same problem.

References
[1] G. Felici, K. Truemper. A Minsat Approach for Learning in Logic Domains. INFORMS Journal on Computing, 14(1): 20-36 (2002).
[2] P. Bertolazzi, G. Felici, E. Weitschek. Learning to classify species with barcodes. BMC Bioinformatics, 10: 1-12 (2009).
[3] M. Arisi, R. D’Onofrio, A. Brandi, S. Felsani, G. Capsoni, G. Drovandi, G. Felici, E. Weitschek, P. Bertolazzi, A. Cattaneo. Gene Expression Biomarkers in the Brain of a Mouse Model for Alzheimer’s Disease: Mining of Microarray Data by Logic Classification and Feature Selection. Journal of Alzheimer's Disease, 24(4): 721-738 (2011).
[4] E. Weitschek, A. Lo Presti, G. Drovandi, G. Felici, M. Ciccozzi, M. Ciotti, P. Bertolazzi. Human polyomaviruses identification by logic mining techniques. BMC Virology Journal, 9: 58 (2012).
[5] E. Weitschek, G. Fiscon, G. Felici. Supervised DNA Barcodes species classification: analysis, comparisons and results. BMC BioData Mining, 7: 4 (2014).
[6] P. Bertolazzi, G. Felici, P. Festa, G. Fiscon, E. Weitschek. Integer Programming models for Feature Selection: new extensions and a randomized solution algorithm. European Journal of Operational Research, 250: 389-399 (2016).
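To make the problem named in the abstract concrete, the following is a minimal Python sketch of a generic GRASP (greedy randomized construction followed by local search) for Minimum Cost SAT: find a truth assignment that satisfies every clause of a CNF formula while minimizing the total cost of the variables set to true. It is an illustration under stated assumptions, not the authors' algorithm. In particular, it does not exploit the problem-specific structure mentioned in the abstract, and the function names, the DIMACS-style literal encoding, the greedy score, and the parameters `alpha`, `iterations`, and `seed` are all hypothetical choices made for this example.

```python
import random


def satisfied(clauses, true_vars):
    """Return True if every clause contains at least one satisfied literal.

    Literals are DIMACS-style integers (an assumption of this sketch):
    +i means x_i, -i means NOT x_i.
    """
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )


def construct(clauses, costs, alpha, rng):
    """Greedy randomized construction: start all-false, set variables true."""
    true_vars = set()
    while True:
        unsat = [c for c in clauses
                 if not any((lit > 0) == (abs(lit) in true_vars) for lit in c)]
        if not unsat:
            return true_vars
        # Count how many unsatisfied clauses each false variable would fix.
        gain = {}
        for clause in unsat:
            for lit in clause:
                if lit > 0:
                    gain[lit] = gain.get(lit, 0) + 1
        if not gain:
            return None  # this simple heuristic cannot repair the assignment
        score = {v: costs[v] / g for v, g in gain.items()}
        lo, hi = min(score.values()), max(score.values())
        # Restricted candidate list: variables whose score is close to the best.
        rcl = [v for v, s in score.items() if s <= lo + alpha * (hi - lo)]
        true_vars.add(rng.choice(rcl))


def local_search(clauses, costs, true_vars):
    """Drop true variables (most expensive first) while feasibility holds."""
    current = set(true_vars)
    improved = True
    while improved:
        improved = False
        for v in sorted(current, key=lambda u: -costs[u]):
            trial = current - {v}
            if satisfied(clauses, trial):
                current, improved = trial, True
                break
    return current


def grasp_min_cost_sat(clauses, costs, iterations=50, alpha=0.3, seed=0):
    """Return (set of true variables, total cost), or (None, inf) on failure."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        candidate = construct(clauses, costs, alpha, rng)
        if candidate is None:
            continue
        candidate = local_search(clauses, costs, candidate)
        cost = sum(costs[v] for v in candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Toy instance (made up for illustration):
# (x1 OR x2) AND (x2 OR x3) AND (NOT x2 OR x3 OR x4)
clauses = [[1, 2], [2, 3], [-2, 3, 4]]
costs = {1: 3, 2: 1, 3: 2, 4: 5}
best, best_cost = grasp_min_cost_sat(clauses, costs)
print(sorted(best), best_cost)  # prints: [2, 3] 3
```

The construction phase here can fail on formulas where setting a variable to true breaks a clause that relied on its negation; the sketch simply discards such attempts. A practical solver would need a stronger repair step or a structure-aware construction.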

Year: 2016


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S96-S96
Author(s):  
Joshua Russell ◽  
Matt Kaeberlein

Here we present new computational and experimental methods to leverage the gene expression and neuropathology data collected from several large-scale studies of Alzheimer’s disease. These data sets include diverse data types, including transcriptomics, neuropathology phenotypes such as quantification of amyloid beta plaques and tau tangles in different brain regions, as well as assessments of dementia prior to death. This meta-analysis is a complex undertaking because the available data come from different studies and/or brain regions, involving study-specific confounders and/or region-specific biological processes. We have therefore taken neural network and probabilistic computational approaches that reduce the data dimensionality, allowing statistical comparison across all brain samples. These approaches identify gene expression changes that are significantly associated with clinical and neuropathological assessment of Alzheimer’s disease. We then conduct in vivo validation of the genes through genetic screening of C. elegans models of Alzheimer's disease, using our automated robotic lifespan analysis platform. This approach allows greater leverage of existing Alzheimer’s disease biobank data to identify deep genetic signatures that could help identify new clinical gene-expression markers and pharmacological targets for Alzheimer’s disease.


2021 ◽  
Author(s):  
Shouneng Peng ◽  
Lu Zeng ◽  
Jean-Vianney Haure-Mirande ◽  
Minghui Wang ◽  
Derek M. Huffman ◽  
...  

Aging is a major risk factor for late-onset Alzheimer's disease (LOAD). How aging contributes to the development of LOAD remains elusive. In this study, we examine multiple large-scale human brain transcriptomic datasets from both normal aging and LOAD to understand the molecular interconnection between aging and LOAD. We find that shared gene expression changes between aging and LOAD are mostly seen in the hippocampus and several cortical regions. In the hippocampus, phosphoprotein, alternative splicing, and cytoskeleton are the commonly dysregulated biological pathways in both aging and AD, while synapse, ion transport, and synaptic vesicle genes are commonly down-regulated. Aging-specific changes are associated with acetylation and methylation, while LOAD-specific changes are related to glycoprotein (both up- and down-regulation), inflammatory response (up-regulation), and myelin sheath and lipoprotein (down-regulation). We also find that normal aging brains from relatively young donors (45-70 years old) cluster into subgroups, and some subgroups show gene expression changes highly similar to those seen in LOAD brains. Using brain transcriptome data from older individuals (>70 years), we find that samples from cognitively normal older individuals cluster with the "healthy aging" subgroup, while AD samples mainly cluster with the AD-similar subgroups. This implies that individuals in the healthy aging subgroup will likely remain cognitively normal when they become older, and vice versa. In summary, our results suggest that, at the transcriptome level, aging and LOAD have strong interconnections in some brain regions in a subpopulation of cognitively normal aging individuals. This supports the theory that the initiation of LOAD occurs decades earlier than the manifestation of the clinical phenotype, and it may be essential to closely study "normal brain aging" in a subgroup of individuals in their 40s-60s to identify the very early events in LOAD development.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Young Ho Park ◽  
Jung-Min Pyun ◽  
Angela Hodges ◽  
Jae-Won Jang ◽  
Paula J. Bice ◽  
...  

Background: The interaction between the brain and periphery might play a crucial role in the development of Alzheimer’s disease (AD). Methods: Using blood transcriptomic profile data from two independent AD cohorts, we performed cis-expression quantitative trait locus (cis-eQTL) analysis of 29 significant genetic loci from a recent large-scale genome-wide association study to investigate the effects of the AD genetic variants on gene expression levels and identify their potential target genes. We then performed differential gene expression analysis of identified AD target genes and linear regression analysis to evaluate the association of differentially expressed genes with neuroimaging biomarkers. Results: A cis-eQTL analysis identified and replicated significant associations in seven genes (APH1B, BIN1, FCER1G, GATS, MS4A6A, RABEP1, TRIM4). APH1B expression levels in the blood increased in AD and were associated with entorhinal cortical thickness and global cortical amyloid-β deposition. Conclusion: An integrative analysis of genetics, blood-based transcriptomic profiles, and imaging biomarkers suggests that APH1B expression levels in the blood might play a role in the pathogenesis of AD.


2020 ◽  
Vol 17 (2) ◽  
pp. 141-157 ◽  
Author(s):  
Dubravka S. Strac ◽  
Marcela Konjevod ◽  
Matea N. Perkovic ◽  
Lucija Tudor ◽  
Gordana N. Erjavec ◽  
...  

Background: The neurosteroids dehydroepiandrosterone (DHEA) and dehydroepiandrosterone sulphate (DHEAS) are involved in many important brain functions, including neuronal plasticity and survival, cognition, and behavior, and have demonstrated preventive and therapeutic potential in different neuropsychiatric and neurodegenerative disorders, including Alzheimer’s disease. Objective: The aim of the article was to provide a comprehensive overview of the literature on the involvement of DHEA and DHEAS in Alzheimer’s disease. Method: PubMed and MEDLINE databases were searched for relevant literature. The articles were selected on the basis of their titles and abstracts. In the selected full texts, reference lists were searched manually for additional articles. Results: We performed a systematic review of the studies investigating the role of DHEA and DHEAS in various in vitro and animal models, as well as in patients with Alzheimer’s disease, and provided a comprehensive discussion of their potential preventive and therapeutic applications. Conclusion: Despite mixed results, the findings of various preclinical studies generally support the involvement of DHEA and DHEAS in the pathophysiology of Alzheimer’s disease and show some promise for these neurosteroids in its prevention and treatment. However, small clinical trials have so far provided little evidence to support their therapeutic use in AD. Therefore, large-scale human studies are needed to elucidate the specific effects of DHEA and DHEAS and their mechanisms of action prior to their application in clinical practice.

