SOM-Based Clustering of Textual Documents Using WordNet

Author(s):  
Abdelmalek Amine ◽  
Zakaria Elberrichi ◽  
Michel Simonet ◽  
Ladjel Bellatreche ◽  
Mimoun Malki

The classification of textual documents has been the subject of many studies. Technologies such as the Web and digital libraries have facilitated the exponential growth of available documentation. Classifying textual documents is very important because it allows users to skim large corpora quickly and effectively and to understand their contents better. Most classification approaches use supervised training, which is better suited to small corpora and to settings where human experts are available to generate the best classes of data for the training phase, which is not always feasible. Unsupervised classification, or “clustering”, methods bring out latent (hidden) classes automatically with minimal human intervention. There are many such methods, and the SOM (Self-Organizing Map) of Kohonen is one of the unsupervised classification algorithms that gather a certain number of similar objects into groups without a priori knowledge. This chapter introduces the concept of unsupervised classification of textual documents and proposes an experiment with a conceptual approach for the representation of texts and the method of Kohonen for clustering.
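The clustering step described above can be illustrated with a minimal sketch of a Kohonen map. It is not the chapter's implementation: the grid size, the linear decay schedules, the function names `train_som` and `best_matching_unit`, and the four-dimensional "concept count" vectors are all assumptions made for illustration. In the chapter's setting, the input vectors would come from a WordNet-based conceptual representation of each text.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Assumed schedule: learning rate and neighbourhood radius shrink linearly.
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = sigma0 * decay + 1e-3
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                # Squared distance on the grid between this node and the winner.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the node's weights toward the input, scaled by the neighbourhood.
                for i in range(dim):
                    w[i] += lr * influence * (vector[i] - w[i])
    return weights


# Hypothetical document vectors (e.g. normalised concept frequencies).
documents = {
    "doc_a": [1.0, 0.9, 0.0, 0.0],
    "doc_b": [0.9, 1.0, 0.1, 0.0],
    "doc_c": [0.0, 0.1, 1.0, 0.9],
    "doc_d": [0.0, 0.0, 0.9, 1.0],
}
som = train_som(list(documents.values()))
clusters = {name: best_matching_unit(som, vec) for name, vec in documents.items()}
```

Each document is assigned to the map node that wins it, so documents sharing a node form a cluster. No class labels are supplied at any point.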

2004 ◽  
Vol 01 (04) ◽  
pp. 647-680 ◽  
Author(s):  
STERGIOS PAPADIMITRIOU ◽  
SPIRIDON D. LIKOTHANASSIS

Self-Organized Maps (SOMs) are a popular approach for analyzing genome-wide expression data. However, most SOM-based approaches ignore prior knowledge about functional gene categories. Also, SOM-based approaches usually develop topographic maps with disjoint and uniform activation regions that correspond to a hard clustering of the patterns at their nodes. We present a novel self-organizing map, the Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). This model adapts its parameters in a kernel space. Gaussian kernels are used, and their mean and variance components are adapted in order to optimize the fit to the input density. The KSDG-SOM also grows dynamically up to a size defined by statistical criteria. It is capable of incorporating a priori information about the known functional characteristics of genes. This information forms a supervised bias in cluster formation, and the model has the potential to revise incorrect functional labels. The new method overcomes the main drawback of most existing clustering methods, which lack a mechanism for dynamic extension based on a balance between unsupervised and supervised drives.


Author(s):  
Olivier Salvado ◽  
Pierrick Bourgeat ◽  
Oscar Acosta Tamayo ◽  
Maria Zuluaga ◽  
Sebastien Ourselin

2014 ◽  
Vol 4 (2) ◽  
pp. 194
Author(s):  
Ahwan Fanani

The philosophy of Islamic Law (uşûl al-fiqh) has been known as occupying the central position in the whole structure of Islamic jurisprudence. Its method and logic of legal extrapolation have dominated not only the legal sphere of the jurists but also influenced the philosophers and the scholastics in their method and way of thinking. Uşûl al-fiqh is mainly deductive in its approach and is concerned with the analysis of linguistics. Now, with the development of new methods in legal and linguistic studies, many scholars have attempted to introduce new ways of interpreting Islamic law by bringing up hermeneutics as the main tool. Hermeneutics is about interpreting a text by taking into consideration the cultural and personal background of the author. It also teaches that in reading a text, a reader must be neutral in that he should not have in mind a priori knowledge and assumptions about the subject. This paper is concerned with exploring the dynamics of both uşûl al-fiqh and hermeneutics in the context of developing Islamic Law in contemporary life.


2021 ◽  
Author(s):  
Pablo Millan Arias ◽  
Fatemeh Alipour ◽  
Kathleen Hill ◽  
Lila Kari

We present a novel Deep Learning method for the Unsupervised Classification of DNA Sequences (DeLUCS) that does not require sequence alignment, sequence homology, or (taxonomic) identifiers. DeLUCS uses Chaos Game Representations (CGRs) of primary DNA sequences, and generates “mimic” sequence CGRs to self-learn data patterns (genomic signatures) through the optimization of multiple neural networks. A majority voting scheme is then used to determine the final cluster label for each sequence. DeLUCS is able to cluster large and diverse datasets, with accuracies ranging from 77% to 100%: 2,500 complete vertebrate mitochondrial genomes, at taxonomic levels from sub-phylum to genera; 3,200 randomly selected 400 kbp-long bacterial genome segments, into families; three viral genome and gene datasets, averaging 1,300 sequences each, into virus subtypes. DeLUCS significantly outperforms two classic clustering methods (K-means and Gaussian Mixture Models) for unlabelled data, by as much as 48%. DeLUCS is highly effective: it is able to classify datasets of unlabelled primary DNA sequences totalling over 1 billion bp of data, and it bypasses common limitations to classification resulting from the lack of sequence homology, variation in sequence length, and the absence or instability of sequence annotations and taxonomic identifiers. Thus, DeLUCS offers fast and accurate DNA sequence classification for previously unclassifiable datasets.
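For readers unfamiliar with Chaos Game Representations, the following is a minimal sketch of the general technique, not the DeLUCS code. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), the grid resolution `k`, and the function names `cgr_points` and `cgr_grid` are assumptions chosen for illustration; the abstract does not specify them.

```python
# Assumed corner layout of the unit square for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the CGR trajectory: one (x, y) point per unambiguous nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        corner_x, corner_y = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


def cgr_grid(sequence, k=3):
    """Count CGR points in a 2^k x 2^k grid, giving a fixed-size image-like matrix."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


grid = cgr_grid("ACGTTGCAACGT", k=2)
```

Because the grid has the same dimensions regardless of sequence length, sequences of different lengths can be compared without alignment, which is the property the abstract relies on.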


2000 ◽  
Vol 72 (8-9) ◽  
pp. 131-141
Author(s):  
Janko Kubinjec

The subjective and objective spirit do not differ by the degree of their authenticity, but only by the different spheres to which they extend. The law reaches both the subjective sphere and the objective sphere of the spirit, but the laws on which it is based belong exclusively to the sphere of the subjective spirit. The laws on which the law is based are immanent to man as a spiritual being and they are transcendental to man as the subject of knowledge. They are the object of a priori knowledge, in contrast to the law itself, which is the object of a posteriori knowledge. The subject of metaphysics is a priori knowledge of the laws on which the law is based, and this is, at the same time, the limit of its competences.


2012 ◽  
Vol 58 (3) ◽  
pp. 510-516 ◽  
Author(s):  
Maria R. Servedio ◽  
Michael Kopp

The extent to which sexual selection is involved in speciation with gene flow remains an open question and the subject of much research. Here, we propose that some insight can be gained from considering the concept of magic traits (i.e., traits involved in both reproductive isolation and ecological divergence). Both magic traits and other, “non-magic”, traits can contribute to speciation via a number of specific mechanisms. We argue that many of these mechanisms are likely to differ widely in the extent to which they involve sexual selection. Furthermore, in some cases where sexual selection is present, it may be prone to inhibit rather than drive speciation. Finally, there are a priori reasons to believe that certain categories of traits are much more effective than others in driving speciation. The combination of these points suggests a classification of traits that may shed light on the broader role of sexual selection in speciation with gene flow. In particular, we suggest that sexual selection can act as a driver of speciation in some scenarios, but may play a negligible role in potentially common categories of magic traits, and may be likely to inhibit speciation in common categories of non-magic traits.


2006 ◽  
Vol 6 (1) ◽  
pp. 151-175 ◽  
Author(s):  
Luna Filipović

Talmy’s (1985) typology proposes a classification of languages on the basis of their lexicalization patterns. All languages exhibit the tendency to code either manner or path of motion in the verb, and thus are divided accordingly into two main typological groups. The fact that languages code components of a motion event differently is therefore not a novelty, and it is only the point of departure in this paper. The aim here is to account for why these differences in lexicalization may occur. Many, if not all, languages make use of both patterns in expressions of motion events, and the reasons why one is more prominent and favoured than the other and on what occasions is the subject of the present discussion. Extensive data from Serbian/Croatian is presented in a contrastive setting in order to highlight the claim that the typology is best seen as a cline rather than a dichotomy. Two original hypotheses that explain the use of patterns in Serbian/Croatian are put forward, with the possibility to apply them further in analyses of other languages. It will be observed that the mechanism of lexicalization can be explained only after all the levels where meaning is conveyed in a language, namely morphology, syntax and semantics, are subjected to an analysis in a unifying fashion. Moreover, a network of the crucial spatial and temporal parameters is suggested here in order to distinguish event types and determine the language-specific means that reflect those event differences. Particular attention has been paid to a phase in motion events termed moment-of-change, which is notably absent from the relevant discussions in the literature. Another important contribution is the emphasis on the importance of deixis in the lexicalization of motion events, which has not been given the attention it deserves.


2020 ◽  
Vol 28 (3) ◽  
pp. 24-35 ◽  
Author(s):  
Hamza Usman ◽  
Mohd Lizam ◽  
Muhammad Usman Adekunle

Accurate pricing of the property market is necessary to ensure effective and efficient decision making. Property price is typically modelled using the hedonic price model (HPM). This approach was found to exhibit aggregation bias because it assumes that the coefficient estimate is constant and fails to consider variation in location. The aggregation bias is minimized by segmenting the property market into submarkets that are distinctly homogeneous within their submarket and heterogeneous across other submarkets. Although such segmentation was found to improve the prediction accuracy of HPM, there appear to be conflicting findings regarding what constitutes a submarket and how the submarkets are to be driven. This paper therefore reviews relevant literature on the subject matter. It was found that, initially, submarkets were delineated based on a priori classification of the property market into predefined boundaries. The method was challenged as arbitrary, and an empirical, statistical, data-driven property submarket classification was advocated. Based on the review, there is no consensus on the superiority of either method over the other; a combination of the two methods can serve as a means of validating the effectiveness of property segmentation procedures for more accurate property price prediction.


2018 ◽  
Vol 8 ◽  
pp. 68-73
Author(s):  
Andrii Martyn

The key features of blockchain databases, such as decentralization, distribution, security, and a record of the history of all transactions, create significant prospects for their application in the field of cadastre and real estate registration, including the creation of a global real estate cadastre infrastructure that will be able to go beyond national legal systems and jurisdictions. A conceptual approach to the registration of land plots as spatial objects using blockchain technology is proposed. The land plot should be considered as a combination of smart contracts between landowners, surveyors, appraisers, notaries and other persons. The subject of such contracts will be the description and establishment of spatial (plot boundaries, territorial zones, etc.) and other (property rights and encumbrances, monetary valuation, soil bonitet, etc.) characteristics of land plots. A classification of the reliability of such smart contracts is also presented.


2015 ◽  
Vol 1 (7) ◽  
pp. e1500163 ◽  
Author(s):  
Susanne Gerber ◽  
Illia Horenko

Cluster analysis is one of the most popular data analysis tools in a wide range of applied disciplines. We propose and justify a computationally efficient and straightforward-to-implement way of imposing the available information from networks/graphs (a priori available in many application areas) on a broad family of clustering methods. The introduced approach is illustrated on the problem of noninvasive unsupervised brain signal classification. This task faces several challenges, such as nonstationary noisy signals and a small sample size, combined with a high-dimensional feature space and huge noise-to-signal ratios. Applying this approach results in an exact unsupervised classification of very short signals, opening new possibilities for clustering methods in the area of noninvasive brain-computer interfaces.

