Mining Text Documents for Thematic Hierarchies Using Self-Organizing Maps

Data Mining ◽  
2011 ◽  
pp. 199-219 ◽  
Author(s):  
Hsin-Chang Yang ◽  
Chung-Hong Lee

Recently, many approaches have been devised for mining various kinds of knowledge from texts. One important application of text mining is to identify themes and the semantic relations among these themes for text categorization. Traditionally, these themes were arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for human beings. The determination of category themes and their hierarchical structures was mostly done by human experts. In this work, we developed an approach to automatically generate category themes and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. We then analyzed these maps and obtained the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
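To make the training step concrete, here is a minimal sketch of fitting a self-organizing map to documents that have already been turned into term vectors. It is not the authors' implementation: the binary encoding, the 2x2 map size, the linear decay schedules, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the index of the map cell whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], vector))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map cell, initialised with random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # assumed linear decay schedule
        lr = lr0 * decay                      # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.5)      # neighbourhood radius shrinks too
        for v in vectors:
            b_row, b_col = divmod(best_matching_unit(weights, v), cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                grid_d2 = (r - b_row) ** 2 + (c - b_col) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the cell's weights toward the input, scaled by lr and h.
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Toy corpus: each document is a binary vector over the hypothetical
# vocabulary ["market", "stock", "virus", "vaccine"].
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
som = train_som(docs, rows=2, cols=2)
cells = [best_matching_unit(som, d) for d in docs]
```

Documents with identical term vectors always land in the same cell; labelling each cell with the documents (or the strongest terms) mapped to it is one plausible way to read themes off the trained map.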

2013 ◽  
Vol 65 ◽  
pp. 24-33
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova

Similarity analysis of text documents by self-organizing maps and k-means. In this paper, we search for similarities among text documents using two popular methods: the self-organizing map (SOM) and the k-means method. One of the main goals of these methods is to cluster a dataset by similarity. Using SOM, the similarities of documents can also be observed visually. Both methods can be used only for numerical information, so we analyse how the different options for converting text data into a numerical document matrix influence the results. To estimate the SOM quality when classified data are analysed, we propose two new measures that describe how the resulting clusters are arranged on the map: the distances between SOM cells corresponding to data items assigned to the same class, which show how well items of one class are placed next to each other, and the distance between the centres of SOM cells corresponding to different classes, which shows how far apart the classes are. We also analyse the results of visualization by self-organizing maps. To estimate the k-means quality, we calculate the sum of distances between cluster centres and cluster members, and we also estimate the mismatch between classes and clusters. The experiments have been carried out using three datasets acquired from the document database of the Seimas of the Republic of Lithuania.
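As an illustration of two steps mentioned in this abstract, the sketch below converts a few short texts into a term-frequency matrix and computes the sum of distances from each cluster centre to its members. The raw term-count encoding, the whitespace tokenizer, the toy documents, and the function names are assumptions; the paper compares several conversion options that are not reproduced here.

```python
import math
from collections import Counter


def term_frequency_matrix(docs):
    """Turn whitespace-tokenised documents into rows of raw term counts."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def sum_of_distances(points, labels):
    """Sum of Euclidean distances from each point to its cluster centre."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # The centre is the coordinate-wise mean of the cluster's members.
        centre = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centre) for p in members)
    return total


# Hypothetical toy documents, not taken from the Seimas database.
docs = ["tax law budget", "budget tax", "health care law", "health care"]
vocab, matrix = term_frequency_matrix(docs)

good = sum_of_distances(matrix, [0, 0, 1, 1])   # topical grouping
bad = sum_of_distances(matrix, [0, 1, 0, 1])    # mixed grouping
```

A lower sum indicates tighter clusters, so the topical grouping scores better than the mixed one on this toy data.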


Medicina ◽  
2021 ◽  
Vol 57 (3) ◽  
pp. 235
Author(s):  
Diego Galvan ◽  
Luciane Effting ◽  
Hágata Cremasco ◽  
Carlos Adam Conte-Junior

Background and objective: In the current pandemic scenario, data mining tools are fundamental to evaluate the measures adopted to contain the spread of COVID-19. In this study, unsupervised neural networks of the Self-Organizing Maps (SOM) type were used to assess the spatial and temporal spread of COVID-19 in Brazil, according to the number of cases and deaths in regions, states, and cities. Materials and methods: The SOM applied in this context does not evaluate which measures applied have helped contain the spread of the disease, but these datasets represent the repercussions of the country’s measures, which were implemented to contain the virus’ spread. Results: This approach demonstrated that the spread of the disease in Brazil does not have a standard behavior, changing according to the region, state, or city. The analyses showed that cities and states in the north and northeast regions of the country were the most affected by the disease, with the highest number of cases and deaths registered per 100,000 inhabitants. Conclusions: The SOM clustering was able to spatially group cities, states, and regions according to their coronavirus cases, with similar behavior. Thus, it is possible to benefit from the use of similar strategies to deal with the virus’ spread in these cities, states, and regions.


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Adeoluwa Akande ◽  
Ana Cristina Costa ◽  
Jorge Mateu ◽  
Roberto Henriques

The explosion of data in the information age has provided an opportunity to explore the possibility of characterizing the climate patterns using data mining techniques. Nigeria has a unique tropical climate with two precipitation regimes: low precipitation in the north leading to aridity and desertification and high precipitation in parts of the southwest and southeast leading to large scale flooding. In this research, four indices have been used to characterize the intensity, frequency, and amount of rainfall over Nigeria. A type of Artificial Neural Network called the self-organizing map has been used to reduce the multiplicity of dimensions and produce four unique zones characterizing extreme precipitation conditions in Nigeria. This approach allowed for the assessment of spatial and temporal patterns in extreme precipitation in the last three decades. Precipitation properties in each cluster are discussed. The cluster closest to the Atlantic has high values of precipitation intensity, frequency, and duration, whereas the cluster closest to the Sahara Desert has low values. A significant increasing trend has been observed in the frequency of rainy days at the center of the northern region of Nigeria.


2021 ◽  
Vol 11 (4) ◽  
pp. 1933
Author(s):  
Hiroomi Hikawa ◽  
Yuta Ichikawa ◽  
Hidetaka Ito ◽  
Yutaka Maeda

In this paper, a real-time dynamic hand gesture recognition system with gesture spotting function is proposed. In the proposed system, input video frames are converted to feature vectors, and they are used to form a posture sequence vector that represents the input gesture. Then, gesture identification and gesture spotting are carried out in the self-organizing map (SOM)-Hebb classifier. The gesture spotting function detects the end of the gesture by using the vector distance between the posture sequence vector and the winner neuron’s weight vector. The proposed gesture recognition method was tested by simulation and real-time gesture recognition experiment. Results revealed that the system could recognize nine types of gesture with an accuracy of 96.6%, and it successfully outputted the recognition result at the end of gesture using the spotting result.
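The spotting idea described here, deciding that a gesture has ended from the distance between the posture sequence vector and the winner neuron's weight vector, can be sketched as follows. The fixed threshold rule, the toy weight vectors, and the function names are assumptions for illustration; the SOM-Hebb classifier itself is not reproduced.

```python
import math


def winner(weights, x):
    """Return (index, distance) of the neuron whose weights are closest to x."""
    dists = [math.dist(w, x) for w in weights]
    idx = min(range(len(weights)), key=dists.__getitem__)
    return idx, dists[idx]


def spot_gesture(weights, sequence_vectors, threshold):
    """Return (frame, winner) for the first frame whose winner distance
    falls below `threshold`, or None if no frame qualifies."""
    for frame, x in enumerate(sequence_vectors):
        idx, dist = winner(weights, x)
        if dist < threshold:
            return frame, idx
    return None


# Hypothetical trained weights: one prototype posture sequence per neuron.
weights = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
# The posture sequence vector fills in as frames arrive (zero-padded here).
stream = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 2.0, 3.0]]

result = spot_gesture(weights, stream, threshold=0.5)
no_match = spot_gesture(weights, [[5.0, 5.0, 5.0]], threshold=0.5)
```

In this toy stream the distance only drops below the threshold once the full sequence has arrived, so the recognition result is emitted at the last frame.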


Author(s):  
Macario O. Cordel ◽  
Arnulfo P. Azcarraga

Several time-critical problems that rely on large amounts of data, e.g., business trends, disaster response and disease outbreaks, require cost-effective, timely and accurate data summary and visualization in order to reach an efficient and effective decision. The self-organizing map (SOM) is a very effective data clustering and visualization tool, as it provides an intuitive display of data in a lower-dimensional space. However, with [Formula: see text] complexity, SOM becomes inappropriate for large datasets. In this paper, we propose a force-directed visualization method that emulates the SOM's capability to display the data clusters with [Formula: see text] complexity. The main idea is to perform a force-directed fine-tuning of the 2D representation of data. To demonstrate the efficiency and the vast potential of the proposed method as a fast visualization tool, the methodology is used to do a 2D projection of the MNIST handwritten digits dataset.


2019 ◽  
Vol 1 (1) ◽  
pp. 194-202
Author(s):  
Adrian Costea

This paper assesses the financial performance of Romania’s non-banking financial institutions (NFIs) using a neural network training algorithm proposed by Kohonen, namely the Self-Organizing Maps algorithm. The algorithm takes the financial dataset and positions each observation on a self-organizing map (a two-dimensional map), which can later be used to visualize the trajectory of an individual NFI and explain it based on different performance dimensions, such as capital adequacy, assets’ quality and profitability. Further, we use the map as an early-warning system that would accurately forecast the NFIs’ future performance (whether they would stay or be eliminated from the NFI’s Special Register three quarters into the future). The results are promising: the model is able to correctly predict NFIs’ performance movements. Finally, we compared the results of our SOM-based model with those obtained by applying a multivariate logit-based model. The SOM model performed worse in discriminating the NFIs’ performance: the performance classes were not clearly defined and the model lacked interpretability of the results. On the contrary, the multivariate logit coefficients have nice interpretability and an individual default probability estimate is obtained for each new observation. However, we can benefit from the results of both techniques: the visualization capabilities of the SOM model and the interpretability of the multivariate logit-based model.


2009 ◽  
Vol 18 (04) ◽  
pp. 603-611 ◽  
Author(s):  
CHIH-FONG TSAI ◽  
YUAH-CHIAO LIN ◽  
YI-TING WANG

Stock trading activities are always very popular in many countries. Generally, investors with various backgrounds have different preferences over the stocks they trade. In the literature, a number of studies examine institutions' holding preferences for certain stock characteristics when choosing a security portfolio. However, very few studies investigate the stock trading preferences of individual investors. In this paper, we focus on two factors that affect the portfolio choices of investors: stock characteristics and investor features. In particular, a self-organizing map (SOM) is used to group a chosen dataset into a number of clusters. Then, a decision tree model is used to extract useful rules from the clusters that contain the most trading records in the sample. We find that if the investors are female, less wealthy, and trade stocks less frequently, they will be more careful and conservative. On the other hand, if the investors are male, have a high level of wealth, and trade stocks very often, they tend to choose stocks with high EPS, high market-to-book, and high prices.


Energies ◽  
2019 ◽  
Vol 12 (15) ◽  
pp. 2980 ◽  
Author(s):  
Bizhong Xia ◽  
Yadi Yang ◽  
Jie Zhou ◽  
Guanghao Chen ◽  
Yifan Liu ◽  
...  

Battery sorting is an important process in the production of lithium battery modules and battery packs for electric vehicles (EVs). Accurate battery sorting can ensure good consistency of the batteries that are grouped together. This study investigates the mechanism of inconsistency in battery packs and the process of battery sorting on the lithium-ion battery module production line. Taking into account the static and dynamic characteristics of lithium-ion batteries, the battery parameters on the production line that can be used as a sorting basis are analyzed, and the parameters of battery mass, volume, resistance, voltage, charge/discharge capacity and impedance characteristics are measured. The battery data are processed by principal component analysis (PCA), which yields the principal components of the battery parameters. The principal components are used as sorting variables, and a self-organizing map (SOM) neural network is used to cluster the batteries. Grouping experiments are carried out on the sorted batteries, and the state of charge (SOC) consistency achieved by the batteries verifies that the sorting algorithm and sorting results are accurate.
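The PCA step that produces the sorting variables can be sketched as below: the measurements are centred, a covariance matrix is formed, and the leading principal component is found by power iteration. This is a simplified stand-in rather than the study's procedure: it extracts only the first component, skips feature scaling, and uses made-up two-feature measurements; the function name is also an assumption. The resulting scores are the kind of values that would then be passed to a SOM for clustering.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    Uses power iteration on the sample covariance matrix. The start vector
    is all ones, so this sketch assumes the leading component is not
    orthogonal to it.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero variance: no meaningful direction
        v = [x / norm for x in w]
    # Project each centred measurement onto the component.
    scores = [sum(a * b for a, b in zip(row, v)) for row in centred]
    return v, scores


# Hypothetical measurements for four cells: two perfectly correlated features.
cells = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(cells)
```

Because the toy features are perfectly correlated, a single component captures all of the variance and the scores preserve the ordering of the cells.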


2018 ◽  
Vol 27 (2) ◽  
pp. 111-126 ◽  
Author(s):  
Thommen George Karimpanal ◽  
Roland Bouffanais

The idea of reusing or transferring information from previously learned tasks (source tasks) for the learning of new tasks (target tasks) has the potential to significantly improve the sample efficiency of a reinforcement learning agent. In this work, we describe a novel approach for reusing previously acquired knowledge by using it to guide the exploration of an agent while it learns new tasks. In order to do so, we employ a variant of the growing self-organizing map algorithm, which is trained using a measure of similarity that is defined directly in the space of the vectorized representations of the value functions. In addition to enabling transfer across tasks, the resulting map is simultaneously used to enable the efficient storage of previously acquired task knowledge in an adaptive and scalable manner. We empirically validate our approach in a simulated navigation environment and also demonstrate its utility through simple experiments using a mobile micro-robotics platform. In addition, we demonstrate the scalability of this approach and analytically examine its relation to the proposed network growth mechanism. Furthermore, we briefly discuss some of the possible improvements and extensions to this approach, as well as its relevance to real-world scenarios in the context of continual learning.

