Ontological self-organizing maps for cluster visualization and functional summarization of gene products using Gene Ontology similarity measures

Author(s):  
Timothy C. Havens ◽  
James M. Keller ◽  
Mihail Popescu ◽  
James C. Bezdek
Author(s):  
Mohammed A. Khalilia ◽  
Mihail Popescu

In the proposed Fuzzy Relational Self-Organizing Map (FRSOM) algorithm, the notion of the Best-Matching Unit (BMU) is replaced by a membership function, so that every neuron has a certain degree of matching to an input object. FRSOM is an extension of the relational self-organizing map. In the proposed FRSOM we incorporate a monotonically increasing fuzzifier and a monotonically decreasing neighborhood kernel. Initially, FRSOM assigns winning neurons. However, as time progresses, adjacent neurons begin communicating and sharing information about the stimulus received. The amount of information shared at a given time is governed by the fuzzifier, and the number of neurons sharing information is controlled by the neighborhood kernel. Additionally, we show that FRSOM is the relational dual of Fuzzy Batch SOM (FBSOM), and we present experimental results comparing FBSOM and FRSOM on synthetic datasets. We then demonstrate the visualization and summarization capabilities of FRSOM on two real relational datasets: Gene Ontology data and a patient dataset consisting of Activity of Daily Living score trajectories.
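As a rough illustration of how a membership function can stand in for a single BMU, the sketch below uses the fuzzy c-means style membership formula. This is an assumption made for illustration only, since the abstract does not give the FRSOM update equations. The function names `fuzzy_memberships` and `gaussian_kernel`, and the Gaussian shape of the kernel, are likewise hypothetical.

```python
import math


def fuzzy_memberships(distances, m):
    """Fuzzy c-means style memberships of one object to every neuron.

    distances: non-negative distances from the object to each neuron.
    m: fuzzifier, must be > 1. Values near 1 give almost crisp (BMU-like)
       assignments; larger values spread membership across neurons.
    NOTE: this formula is an assumption, not taken from the FRSOM paper.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    # If the object coincides with one or more neurons, share the
    # membership equally among those neurons and give none to the rest.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** (-power) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def gaussian_kernel(grid_distance, sigma):
    """Hypothetical Gaussian neighborhood kernel on the map grid.

    Shrinking sigma over time reduces how many neighbors receive a
    noticeable share of the update.
    """
    return math.exp(-(grid_distance ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    d = [1.0, 2.0, 4.0]
    # Small fuzzifier: the closest neuron takes almost all the membership.
    print(fuzzy_memberships(d, 1.05))
    # Larger fuzzifier: membership is shared among the neurons.
    print(fuzzy_memberships(d, 2.0))
```

With distances 1 and 2 and a fuzzifier of 2, this formula yields memberships of 0.8 and 0.2. As the fuzzifier approaches 1, the closest neuron's membership approaches 1, which matches the abstract's description of FRSOM starting out with winning neurons.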


Author(s):  
JAMES M. KELLER ◽  
JAMES C. BEZDEK ◽  
MIHAIL POPESCU ◽  
NIKHIL R. PAL ◽  
JOYCE A. MITCHELL ◽  
...  

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.
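A linear order statistic is, in general, a weighted sum of the sorted input values, with non-negative weights that sum to one. The sketch below shows only that generic operator. How the paper applies it to pairs of GO terms is not stated in the abstract, so the function name `linear_order_statistic` and the example values are hypothetical.

```python
def linear_order_statistic(values, weights):
    """Weighted sum of the values after sorting them in ascending order.

    weights[i] is applied to the i-th smallest value. The weights must be
    non-negative and sum to 1. Minimum, maximum, median, and mean are all
    special cases obtained by choosing the weights.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values)
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    # Hypothetical pairwise term similarities between two gene products.
    sims = [0.2, 0.9, 0.5]
    print(linear_order_statistic(sims, [0.0, 0.0, 1.0]))  # maximum
    print(linear_order_statistic(sims, [1.0, 0.0, 0.0]))  # minimum
    print(linear_order_statistic(sims, [0.0, 1.0, 0.0]))  # median
```

Choosing weights between these extremes gives an aggregate that is less sensitive to a single outlying term pair than the maximum or minimum alone.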


2019 ◽  
Vol 9 (9) ◽  
pp. 1870 ◽  
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova ◽  
Rokas Štrimaitis

This paper proposes a word-level n-gram based approach for finding similarity between texts. The approach combines two separate and independent techniques: the self-organizing map (SOM) and text similarity measures. SOM is distinctive in that the results of data clustering, as well as dimensionality reduction, are presented in visual form. Four measures are evaluated: cosine, Dice, extended Jaccard, and overlap. First, the texts have to be converted to a numerical representation. For that purpose, each text is split into word-level n-grams, and a bag of n-grams is then created. The n-gram frequencies are calculated and the frequency matrix of the dataset is formed. Various filters are used to create the bag of n-grams: stemming algorithms, number and punctuation removers, stop-word lists, etc. All experiments were carried out on a corpus of plagiarized short answers.
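The pipeline from text to n-gram frequencies to a similarity score can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization, lowercasing, and the common vector-space forms of the four measures. The abstract does not give the paper's exact formulas or filters, and the function names are hypothetical.

```python
from collections import Counter
import math


def word_ngrams(text, n):
    """Split text into lowercase word-level n-grams (whitespace tokens)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def bag_of_ngrams(text, n):
    """Frequency of each word-level n-gram in the text."""
    return Counter(word_ngrams(text, n))


def similarities(a, b):
    """Cosine, Dice, extended Jaccard, and overlap on frequency vectors.

    a, b: Counters mapping n-gram -> frequency.
    These are the usual vector-space definitions, assumed here.
    """
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sum(v * v for v in a.values())  # squared norm of a
    nb = sum(v * v for v in b.values())  # squared norm of b
    if na == 0 or nb == 0:
        # An empty bag has no defined similarity; report 0 for all four.
        return {"cosine": 0.0, "dice": 0.0, "jaccard": 0.0, "overlap": 0.0}
    return {
        "cosine": dot / math.sqrt(na * nb),
        "dice": 2.0 * dot / (na + nb),
        "jaccard": dot / (na + nb - dot),
        "overlap": dot / min(na, nb),
    }


if __name__ == "__main__":
    x = bag_of_ngrams("the cat sat on the mat", 2)
    y = bag_of_ngrams("the cat sat on the sofa", 2)
    print(similarities(x, y))
```

In this example the two sentences share four of their five bigrams, so cosine, Dice, and overlap all come out at 0.8, while extended Jaccard is lower at 4/6. Stemming, stop-word removal, and the other filters mentioned above would be applied before `word_ngrams` and are omitted here.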

