Technique of Gene Expression Profiles Selection Based on SOTA Clustering Algorithm Using Statistical Criteria and Shannon Entropy

Author(s):  
Sergii Babichev ◽  
Orest Khamula ◽  
Bohdan Durnyak ◽  
Jiří Škvor
Blood ◽  
2003 ◽  
Vol 102 (2) ◽  
pp. 672-681 ◽  
Author(s):  
Damien Chaussabel ◽  
Roshanak Tolouei Semnani ◽  
Mary Ann McDowell ◽  
David Sacks ◽  
Alan Sher ◽  
...  

Monocyte-derived dendritic cells (DCs) and macrophages (Mϕs) generated in vitro from the same individual blood donors were exposed to 5 different pathogens, and gene expression profiles were assessed by microarray analysis. Responses to Mycobacterium tuberculosis and to phylogenetically distinct protozoan (Leishmania major, Leishmania donovani, Toxoplasma gondii) and helminth (Brugia malayi) parasites were examined, all of which produce chronic infections in humans yet vary considerably in the nature of the immune responses they trigger. In the absence of microbial stimulation, DCs and Mϕs constitutively expressed approximately 4000 genes, 96% of which were shared between the 2 cell types. In contrast, the genes altered transcriptionally in DCs and Mϕs following pathogen exposure were largely cell specific. Profiling of the gene expression data led to the identification of sets of tightly coregulated genes across all experimental conditions tested. A newly devised literature-based clustering algorithm enabled the identification of functionally and transcriptionally homogeneous groups of genes. A comparison of the responses induced by the individual pathogens by means of this strategy revealed major differences in the functionally related gene profiles associated with each infectious agent. Although the intracellular pathogens induced responses clearly distinct from those induced by the extracellular B malayi, they each displayed a unique pattern of gene expression that would not necessarily be predicted on the basis of their phylogenetic relationship. The association of characteristic functional clusters with each infectious agent is consistent with the concept that antigen-presenting cells have prewired signaling patterns for use in the response to different pathogens.


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1812
Author(s):  
Sergii Babichev ◽  
Lyudmyla Yasinska-Damri ◽  
Igor Liakh ◽  
Bohdan Durnyak

The reconstruction of gene regulatory networks (GRNs) and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research evaluating the effectiveness of the most commonly used metrics for estimating the proximity of gene expression profiles, which can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of gene and/or protein interaction since it undergirds essentially all interactions between molecular components in the GRN. Extracting gene expression profiles allows us to identify how the investigated biological objects (disease, state of patients, etc.) contribute to the further reconstruction of the GRN, in terms of both symmetry and understanding of the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson’s χ² test, and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion for evaluating the effectiveness of each metric. A random forest (RF) classifier was used during the simulation process. The results show that the various methods of Shannon entropy calculation, when used within the MIM metric, disagree with each other. We therefore propose the modified mutual information maximization (MMIM) proximity metric, based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function. The simulation results also show that the correlation proximity metric is less effective than both the MMIM metric and Pearson’s χ² test. Finally, we propose the hybrid proximity metric (HPM), which considers both the MMIM metric and Pearson’s χ² test. The proposed metric was investigated within the framework of one-cluster structure effectiveness evaluation. In our view, the main benefit of the proposed HPM is that it increases the objectivity of extracting mutually similar gene expression profiles, through the joint use of effective proximity metrics that can contradict each other when used alone.
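Two of the proximity metrics named in this abstract can be illustrated with a minimal sketch. It assumes equal-width binning and the plug-in (frequency-based) entropy estimate, which is only one of the possible methods of Shannon entropy calculation. The function names and the `bins` parameter are illustrative, not taken from the paper, and the MMIM and HPM combinations are not reproduced here.

```python
import math
from collections import Counter

def discretize(profile, bins=3):
    """Map each expression value to an equal-width bin index (assumed scheme)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in profile]

def shannon_entropy(labels):
    """Plug-in Shannon entropy in bits, estimated from label frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(x, y, bins=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), computed on the discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    joint = list(zip(dx, dy))
    return shannon_entropy(dx) + shannon_entropy(dy) - shannon_entropy(joint)

def correlation_distance(x, y):
    """1 minus the Pearson correlation coefficient of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical toy profiles: gene_b is an exact multiple of gene_a.
gene_a = [1, 2, 3, 4, 5, 6]
gene_b = [2, 4, 6, 8, 10, 12]
mi = mutual_information(gene_a, gene_b)
dist = correlation_distance(gene_a, gene_b)
```

Because the estimate depends on the binning and on the entropy estimator, different choices can rank the same pair of profiles differently, which is the kind of disagreement the abstract reports.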


2016 ◽  
Vol 2016 ◽  
pp. 1-13
Author(s):  
Hung-Tsu Cheng ◽  
Chaang-Ray Chen ◽  
Chia-Yang Li ◽  
Chao-Ying Huang ◽  
Wun-Yi Shu ◽  
...  

We investigated the syndromes of the Sini decoction pattern (SDP), a common ZHENG in traditional Chinese medicine (TCM). The syndromes of SDP were correlated with various severe Yang deficiency-related symptoms. To obtain a common profile for SDP, we distributed questionnaires to 300 senior clinical TCM practitioners. From the survey, we identified 2 sets of symptoms for SDP: (1) pulse feels deep or faint and (2) reversal cold of the extremities. Twenty-four individuals from Taipei City Hospital, Linsen Chinese Medicine Branch, Taiwan, were recruited. We extracted the total mRNA of peripheral blood mononuclear cells from the 24 individuals for microarray experiments. Twelve individuals (6 SDP patients and 6 non-SDP individuals) were used as the training set to identify biomarkers for distinguishing the SDP and non-SDP groups. The remaining 12 individuals were used as the test set. The test results indicated that, with a hierarchical clustering algorithm, the gene expression profiles of the identified biomarkers could effectively distinguish the 2 groups. Our results suggest the feasibility of using the identified biomarkers to facilitate the diagnosis of TCM ZHENGs. Furthermore, the gene expression profiles of the biomarker genes could provide a molecular explanation corresponding to the ZHENG of TCM.
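The abstract does not state which linkage or distance the hierarchical clustering used. As a rough sketch of the idea, the following assumes average linkage with Euclidean distance and cuts the hierarchy at two clusters. The sample values are invented toy data, not measurements from the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, samples):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(samples[i], samples[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(samples, n_clusters=2):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], samples)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: rows are samples, columns are biomarker genes.
samples = [
    [5.1, 4.9, 0.2], [5.0, 5.2, 0.1], [4.8, 5.0, 0.3],
    [1.0, 0.9, 3.8], [1.2, 1.1, 4.1], [0.9, 1.0, 4.0],
]
groups = agglomerative(samples)
```

In a setting like the one described, the resulting groups would then be compared against the known SDP and non-SDP labels of the test samples.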


Author(s):  
Manuel Martín-Merino

DNA microarrays allow for monitoring the expression level of thousands of genes simultaneously across a collection of related samples. Supervised learning algorithms such as k-NN or SVM (Support Vector Machines) have been applied to the classification of cancer samples with encouraging results. However, the classification algorithms are not able to discover new subtypes of diseases considering the gene expression profiles. In this chapter, the author reviews several supervised clustering algorithms suitable to discover new subtypes of cancer. Next, he introduces a semi-supervised clustering algorithm that learns a linear combination of dissimilarities from the a priori knowledge provided by human experts. A priori knowledge is formulated in the form of equivalence constraints. The minimization of the error function is based on a quadratic optimization algorithm. An L2 norm regularizer is included that penalizes the complexity of the family of distances and avoids overfitting. The proposed method has been applied to several benchmark data sets and to complex human cancer problems using the gene expression profiles. The experimental results suggest that considering a linear combination of heterogeneous dissimilarities helps to improve both classification and clustering algorithms based on a single similarity.


2013 ◽  
pp. 1609-1625
Author(s):  
Manuel Martín-Merino

DNA microarrays allow for monitoring the expression level of thousands of genes simultaneously across a collection of related samples. Supervised learning algorithms such as k-NN or SVM (Support Vector Machines) have been applied to the classification of cancer samples with encouraging results. However, the classification algorithms are not able to discover new subtypes of diseases considering the gene expression profiles. In this chapter, the author reviews several supervised clustering algorithms suitable to discover new subtypes of cancer. Next, he introduces a semi-supervised clustering algorithm that learns a linear combination of dissimilarities from the a priori knowledge provided by human experts. A priori knowledge is formulated in the form of equivalence constraints. The minimization of the error function is based on a quadratic optimization algorithm. An L2 norm regularizer is included that penalizes the complexity of the family of distances and avoids overfitting. The proposed method has been applied to several benchmark data sets and to complex human cancer problems using the gene expression profiles. The experimental results suggest that considering a linear combination of heterogeneous dissimilarities helps to improve both classification and clustering algorithms based on a single similarity.
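The abstract does not give the chapter's error function, so the following is only a simplified stand-in for the general idea of weighting several dissimilarities from pairwise constraints under an L2 penalty. It assumes the objective "sum of combined dissimilarities over similar pairs, minus the sum over dissimilar pairs, plus lam times the squared norm of the weights", with nonnegative weights. That objective separates per weight and has a closed-form minimizer. All names (`learn_weights`, `combine`, `lam`) and the toy matrices are hypothetical.

```python
def combine(dissims, weights, i, j):
    """Weighted linear combination of several dissimilarities for the pair (i, j)."""
    return sum(w * d[i][j] for w, d in zip(weights, dissims))

def learn_weights(dissims, similar, dissimilar, lam=1.0):
    """Closed-form minimizer of the assumed objective
        sum_S d_beta - sum_D d_beta + lam * ||beta||^2,  beta >= 0,
    where S are similar pairs and D are dissimilar pairs."""
    weights = []
    for d in dissims:
        # How much larger this dissimilarity is on dissimilar pairs than on similar ones.
        gap = sum(d[i][j] for i, j in dissimilar) - sum(d[i][j] for i, j in similar)
        # Per-weight minimum of -gap * beta + lam * beta^2, clipped at zero.
        weights.append(max(0.0, gap / (2.0 * lam)))
    return weights

# Hypothetical toy dissimilarity matrices over 4 samples.
d_informative = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]
d_misleading = [[0, 3, 2, 3], [3, 0, 3, 2], [2, 3, 0, 3], [3, 2, 3, 0]]
dissims = [d_informative, d_misleading]

# Equivalence constraints supplied by a (hypothetical) expert.
similar = [(0, 1), (2, 3)]
dissimilar = [(0, 2), (1, 3)]

weights = learn_weights(dissims, similar, dissimilar, lam=1.0)
combined_02 = combine(dissims, weights, 0, 2)
```

In this toy case the dissimilarity that agrees with the constraints receives a positive weight and the one that contradicts them is dropped. Increasing `lam` shrinks all weights, which is the role the regularizer plays against overfitting.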


2019 ◽  
Author(s):  
James P R Schofield ◽  
Fabio Strazzeri ◽  
Jeannette Bigler ◽  
Michael Boedigheimer ◽  
Ian M Adcock ◽  
...  

Stratified medicine requires discretisation of disease populations for targeted treatments. We have developed and applied a discrete Morse theory clustering algorithm to a Topological Data Analysis (TDA) network model of 498 gene expression profiles of peripheral blood from asthma and healthy participants. The Morse clustering algorithm defined nine clusters, BC1-9, representing molecular phenotypes with discrete phenotypes including Type-1, 2 & 17 cytokine inflammatory pathways. The TDA network model and clusters were also characterised by activity of glucocorticoid receptor signalling associated with different expression profiles of glucocorticoid receptor (GR), according to microarray probesets targeted to the start or end of the GR mRNA’s 3’ UTR, suggesting differential GR mRNA processing as a possible driver of asthma phenotypes including steroid insensitivity.

