symbolic data
Recently Published Documents


Total documents: 291 (five years: 54)
H-index: 20 (five years: 4)

Author(s):  
Peter beim Graben ◽  
Markus Huber ◽  
Werner Meyer ◽  
Ronald Römer ◽  
Matthias Wolff

Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e., the infinite-dimensional Hilbert space used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox, called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser state. Our approach could leverage the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation.
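To make the idea of representing tree structure with vectors more concrete, here is a minimal sketch of tensor-product role-filler binding, one common VSA-style scheme. It is an illustration under stated assumptions, not the FockBox API and not the construction proved in the paper: the function names, the one-hot filler vectors, and the two role vectors are all hypothetical.

```python
# Hypothetical illustration of role-filler binding; not the FockBox toolbox.

def bind(filler, role):
    """Tensor (outer) product of a filler vector with a role vector."""
    return [[f * r for r in role] for f in filler]

def superpose(a, b):
    """Element-wise sum of two bound pairs of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def unbind(tensor, role):
    """Contract with a role vector; recovery is exact when roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Assumed one-hot fillers (symbols) and orthonormal roles (tree positions).
THE, DOG = [1, 0, 0], [0, 1, 0]
LEFT, RIGHT = [1, 0], [0, 1]

# A two-leaf tree: "the" in the left position, "dog" in the right position.
tree = superpose(bind(THE, LEFT), bind(DOG, RIGHT))
left_child = unbind(tree, LEFT)
right_child = unbind(tree, RIGHT)
```

Deeper trees would need repeated binding, which produces tensors of increasing order; a space that holds all of these orders at once is a direct sum of tensor powers, which is the structure of a Fock space.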


2021 ◽  
Vol 16 (1) ◽  
pp. 134-144
Author(s):  
Omer Raz ◽  
Dror Chawin ◽  
Uri B. Rom

This report documents a dataset consisting of expert annotations (symbolic data) of interthematic (higher-level) cadences in the exposition sections of all of Mozart's instrumental sonata-allegro movements.


2021 ◽  
Vol 153 ◽  
pp. 111440
Author(s):  
Diego C. Nascimento ◽  
Bruno A. Pimentel ◽  
Renata M.C.R. Souza ◽  
Lilia Costa ◽  
Sandro Gonçalves ◽  
...  

2021 ◽  
Vol 27 (4) ◽  
Author(s):  
Aaron Carter-Ényì ◽  
Gilad Rabinovitch

Onset (metric position) and contiguity (pitch adjacency and time proximity) are two melodic features that contribute to the salience of individual notes (core tones) in a monophonic voice or polyphonic texture. Our approach to reductions prioritizes contextual features like onset and contiguity. By awarding points to notes with such features, our process selects core tones from melodic surfaces to produce a reduction. Through this reduction, a new form of musical pattern discovery is possible that has similarities to Gjerdingen’s (2007) galant schemata. Recurring n-grams (scale degree skeletons) are matched in an algorithmic approach that we have tested manually (with a printed score and pen and paper) and implemented computationally (with symbolic data and scripted algorithms in MATLAB). A relatively simple method successfully identifies the location of all statements of the subject in Bach’s Fugue in C Minor (BWV 847) identified by Bruhn (1993) and the location of all instances of the Prinner and Meyer schemata in Mozart’s Sonata in C Major (K. 545/i) identified by Gjerdingen (2007). We also apply the method to an excerpt by Kirnberger analyzed in Rabinovitch (2019). Analysts may use this flexible method for pattern discovery in reduced textures through software freely accessible at https://www.atavizm.org. While our case studies in the present article are from eighteenth-century European music, we believe our approach to reduction and pattern discovery is extensible to a variety of musics.
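The n-gram matching step described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' MATLAB implementation; it assumes the reduction to core tones has already been carried out, and the function name and the scale-degree sequence are hypothetical.

```python
from collections import defaultdict

def recurring_ngrams(degrees, n):
    """Map each n-gram of scale degrees that occurs more than once to its start indices."""
    positions = defaultdict(list)
    for i in range(len(degrees) - n + 1):
        positions[tuple(degrees[i:i + n])].append(i)
    # Keep only the n-grams that recur at least once.
    return {gram: idx for gram, idx in positions.items() if len(idx) > 1}

# Hypothetical reduced melody (scale degrees of core tones), not taken from the article.
core_tones = [6, 5, 4, 3, 1, 7, 6, 5, 4, 3]
patterns = recurring_ngrams(core_tones, 4)
```

Here `patterns` holds the single recurring skeleton `(6, 5, 4, 3)` with start positions 0 and 6. The point-awarding rules that select core tones are not specified in the abstract, so they are left out of the sketch.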


Author(s):  
Hongjing Zhang ◽  
Ian Davidson

Recent work on explainable clustering allows describing clusters when the features are interpretable. However, much modern machine learning focuses on complex data such as images, text, and graphs, where deep learning is used but the raw features of the data are not interpretable. This paper explores a novel setting for performing clustering on complex data while simultaneously generating explanations using interpretable tags. We propose deep descriptive clustering, which performs sub-symbolic representation learning on complex data while generating explanations based on symbolic data. We form good clusters by maximizing the mutual information between the empirical distribution on the inputs and the induced clustering labels for clustering objectives. We generate explanations by solving an integer linear program that generates concise and orthogonal descriptions for each cluster. Finally, we allow the explanation to inform better clustering by proposing a novel pairwise loss with self-generated constraints to maximize the consistency between the clustering and explanation modules. Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance while offering high-quality cluster-level explanations.


Author(s):  
Sahana Munavalli ◽  
Sanjeevakumar M. Hatture ◽  

In the era of digitization, fraud is found in all categories of health insurance. It is committed through deliberate deception or misrepresentation in order to obtain an improper benefit in the form of health expenditures. Big data analysis can be used to detect fraud in large sets of insurance claim data. Based on a few cases that are known or suspected to be fraudulent, the anomaly detection technique estimates how likely each record is to be fraudulent by analyzing previous insurance claims. Investigators can then examine more closely the cases flagged by the data mining software. One of the issues is the abuse of medical insurance systems. Manual detection of fraud in the healthcare industry is strenuous work. Fraud and abuse in the health care system have become a significant concern within health insurance organizations in recent years because of growing revenue losses. Handling medical claims has become an exhausting manual task, carried out by a few medical experts who are responsible for approving, adjusting, or rejecting the reimbursements requested within a limited period after their receipt. Standard data mining techniques no longer sufficiently address the complexity of the real world. Symbolic Data Analysis is a newer kind of data analysis that allows us to address the complexity of the real world and to detect fraud in the dataset.


2021 ◽  
Author(s):  
Kian Farsandaj

In the last decade, selecting suitable web services based on users’ requirements has become one of the major subjects in the web service domain. Many research works have addressed it, either based on functional requirements or focusing more on Quality of Service (QoS)-based selection. We believe that searching is not the only way to implement the selection. Selection could also be done by browsing, or by a combination of searching and browsing. In this thesis, we propose a browsing method based on the Scatter/Gather model, which helps users gain a better understanding of the QoS value distribution of the web services and locate their desired services. Because the Scatter/Gather model uses cluster analysis techniques and web service QoS data is best represented as a vector of intervals, or more generically a vector of symbolic data, we apply a symbolic clustering algorithm and implement different variations of the Scatter/Gather model. Through our experiments on both synthetic and real datasets, we identify the most efficient (based on the processing time) and effective implementations.
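As an illustration of what a vector of intervals looks like in practice, the sketch below represents each web service by one interval per QoS attribute and assigns it to the nearest cluster center. The thesis abstract does not name its distance or clustering algorithm, so the Hausdorff distance between intervals, the summation over attributes, the attribute choices, and all function names here are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two closed intervals."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def interval_distance(x: List[Interval], y: List[Interval]) -> float:
    """Assumed aggregation: sum of per-attribute Hausdorff distances."""
    return sum(hausdorff(a, b) for a, b in zip(x, y))

def nearest_center(x: List[Interval], centers: List[List[Interval]]) -> int:
    """Index of the cluster center closest to the interval vector x."""
    return min(range(len(centers)), key=lambda k: interval_distance(x, centers[k]))

# Hypothetical services: (response time in ms, availability), each as an interval.
centers = [
    [(100.0, 200.0), (0.9, 1.0)],
    [(400.0, 600.0), (0.5, 0.7)],
]
service = [(120.0, 220.0), (0.92, 0.98)]
assigned = nearest_center(service, centers)
```

Because the two attributes are on very different scales, a real implementation would likely normalize each attribute before summing; that step is omitted to keep the sketch short.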


Stats ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 359-384
Author(s):  
Manabu Ichino ◽  
Kadri Umbleja ◽  
Hiroyuki Yaguchi

This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal-probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.

