SELECTION OF METRIC AND CATEGORICAL ATTRIBUTES OF RARE ANOMALOUS EVENTS IN A COMPUTER SYSTEM USING DATA MINING METHODS

T-Comm ◽  
2021 ◽  
Vol 15 (6) ◽  
pp. 40-47
Author(s):  
Oleg I. Sheluhin ◽  
Dmitry I. Rakovsky ◽  

The paper considers the labeling of multi-attribute experimental data for subsequent use by data mining methods in detecting and classifying rare anomalous events in computer systems (CS). Labeling is carried out using three methods: manual preprocessing, statistical analysis, and cluster analysis. Among the metric-type attributes, the authors identify two macrogroups: “integral attributes” and “impulse attributes”. Combining statistical and cluster analysis is shown to increase the accuracy of detecting anomalous events in the CS and to allow attributes to be selected by their information significance. The expediency of manual preprocessing before clustering is demonstrated by dividing attributes into macrogroups, analyzing the density distribution with violin plots, and removing the trend component with the difference-stationary (DS) series method. Violin plots constructed for an attribute of the “integral” macrogroup show the distribution of CS states. Removing the trend component by the DS-series method, followed by normalization and reduction to absolute values, allows anomalous outliers to be marked more accurately, although this is not always acceptable. Interpretation of the clustering results for each normalized attribute shows that the normal values of all attributes are concentrated around zero. The result of labeling is attribute-labeled data in which each attribute at each time point is assigned one of two states: abnormal or normal.
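The preprocessing chain named in this abstract (trend removal by differencing, normalization, reduction to absolute values) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the sample series is invented, z-score normalization is assumed because the abstract does not name the normalization used, and a fixed threshold stands in for the cluster analysis the paper actually applies.

```python
from statistics import mean, pstdev


def difference(series):
    """First-order differencing: turns a trending series into its increments."""
    return [b - a for a, b in zip(series, series[1:])]


def normalize_abs(values):
    """Z-score normalization followed by reduction to absolute values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no outliers; every score is zero.
        return [0.0 for _ in values]
    return [abs((v - mu) / sigma) for v in values]


def label(scores, threshold=3.0):
    """Assumed stand-in for clustering: scores far from zero are abnormal."""
    return ["abnormal" if s > threshold else "normal" for s in scores]


# Hypothetical "integral" attribute: a steadily growing counter with one jump.
counter = [10 * t for t in range(20)]
counter[12:] = [v + 500 for v in counter[12:]]

diffs = difference(counter)    # trend removed, jump shows up as one spike
scores = normalize_abs(diffs)  # normal increments end up close to zero
labels = label(scores)         # one state per time step
```

After differencing, the ordinary increments cluster near zero and only the step at the jump receives a large score, which matches the abstract's remark that normal values concentrate around zero.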

2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study applies data mining, specifically K-Means clustering, using the Weka software. Data mining can be used to analyze sufficiently large data sets; here it aims to enable Indocomputer to identify and classify service data based on customer complaints. The K-Means clustering algorithm is used to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. The results also show which laptop brands are most often brought in for service at Indocomputer Payakumbuh, which can serve as a recommendation to consumers choosing a laptop.


2010 ◽  
Vol 37 (7) ◽  
pp. 5259-5264 ◽  
Author(s):  
Seyed Mohammad Seyed Hosseini ◽  
Anahita Maleki ◽  
Mohammad Reza Gholamian

1991 ◽  
Vol 71 (4) ◽  
pp. 1069-1080 ◽  
Author(s):  
A. G. Thomas ◽  
M. R. T. Dale

The phytosociological structure of weed communities in spring wheat, barley, oats, flax, and canola was investigated using data collected during a 3-yr survey of 1384 fields in Manitoba. Fields were surveyed during July and August, after the application of all herbicides. Association and cluster analysis techniques, using the presence or absence of species in a field, were employed to distinguish co-occurring groups of species. Only a small number of significant positive and negative associations were found between species and only minor clusters with a few species were formed at low similarity levels. These results indicated that the weed community was composed of species responding to conditions more or less independently of each other. A comparison of weed associations among the five crops and four geographic regions in the province indicated that the weed community structure was determined largely by climatic variables. The pattern of weed association in the four geographic regions was correlated with differences in temperature and precipitation during the spring and summer. The lack of floristic differentiation was attributed to the fact that production practices were similar for the five spring-seeded crops. Key words: Weed communities, weed ecology, cluster analysis, association analysis
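As an illustration of association analysis on presence/absence data, the sketch below builds a 2×2 contingency table for a pair of species and computes a chi-square statistic. The abstract does not say which association statistic the authors used, so the chi-square test is an assumption, and the field data and species names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A species present in all fields or in none carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def association(fields, sp1, sp2):
    """Return (chi-square, direction) for two species across surveyed fields."""
    a = sum(1 for f in fields if sp1 in f and sp2 in f)          # both present
    b = sum(1 for f in fields if sp1 in f and sp2 not in f)      # only sp1
    c = sum(1 for f in fields if sp1 not in f and sp2 in f)      # only sp2
    d = sum(1 for f in fields if sp1 not in f and sp2 not in f)  # neither
    if a * d > b * c:
        direction = "positive"
    elif a * d < b * c:
        direction = "negative"
    else:
        direction = "none"
    return chi_square_2x2(a, b, c, d), direction


# Hypothetical survey: each field is the set of species recorded in it.
fields = (
    [{"species_a", "species_b"}] * 4
    + [{"species_a"}]
    + [{"species_b"}]
    + [set()] * 4
)
chi2, direction = association(fields, "species_a", "species_b")
```

A statistic this size would still need to be compared against a critical value before calling the association significant; that step is omitted here.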


Author(s):  
Maria M. Suarez-Alvarez ◽  
Duc-Truong Pham ◽  
Mikhail Y. Prostov ◽  
Yuriy I. Prostov

Normalization of feature vectors of datasets is widely used in a number of fields of data mining, in particular in cluster analysis, where it is used to prevent features with large numerical values from dominating in distance-based objective functions. In this study, a unified statistical approach to normalization of all attributes of mixed databases, when different metrics are used for numerical and categorical data, is proposed. After the proposed normalization, the contributions of both numerical and categorical attributes to a specified objective function are statistically the same. Formulae for the statistically normalized Minkowski mixed p-metrics are given in an explicit way. It is shown that the classic z-score standardization and the min–max normalization are particular cases of the statistical normalization, when the objective function is, respectively, based on the Euclidean or the Tchebycheff (Chebyshev) metrics. Finally, clustering of several benchmark datasets is performed with non-normalized and introduced normalized mixed metrics using either the k-prototypes (for p = 2) or another algorithm (for p ≠ 2).
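The two classical normalizations mentioned in this abstract, and the metrics they are paired with, can be sketched as below. This covers only the numerical special cases; the paper's statistical normalization for mixed numerical and categorical attributes is not reproduced here, and the sample columns are invented.

```python
from statistics import mean, pstdev


def z_score(column):
    """Z-score standardization (assumes the column is not constant)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Min-max normalization to [0, 1] (assumes the column is not constant)."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def minkowski(a, b, p):
    """Minkowski p-metric; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def chebyshev(a, b):
    """Chebyshev metric, the limit of the Minkowski p-metric as p grows."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical features on very different scales.
ages = [20, 40, 60]
incomes = [20000, 40000, 60000]

# Without normalization the income column dominates the distance.
raw = minkowski((ages[0], incomes[0]), (ages[2], incomes[2]), 2)

# After min-max normalization both columns contribute equally.
ages_mm = min_max(ages)        # [0.0, 0.5, 1.0]
incomes_mm = min_max(incomes)  # [0.0, 0.5, 1.0]
scaled = chebyshev((ages_mm[0], incomes_mm[0]), (ages_mm[2], incomes_mm[2]))
```

In the raw distance the age difference of 40 is negligible next to the income difference of 40000, which is the dominance effect that normalization is meant to prevent.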


2014 ◽  
Vol 687-691 ◽  
pp. 1254-1257
Author(s):  
Hui Hui

By applying data mining (DM) techniques such as association analysis and cluster analysis, this paper carries out a systematic empirical study of the postgraduate admission data of key College C in Beijing, and describes and analyzes the mining results. The paper applies DM methods and knowledge theory in practice, which offers strong support for postgraduate admission management at College C.


2013 ◽  
Vol 321-324 ◽  
pp. 2995-2998
Author(s):  
Yun Jiang ◽  
Chong Wang ◽  
Dong Chen

Using data collected from major group-buying websites, this paper applies factor analysis and cluster analysis, two data mining methods, to classify the websites and to identify and analyze their key operating strategies.

