Robust feature collection and classification of network culture

2021 ◽  
pp. 1-11
Author(s):  
Ya Gao

The network provides a convenient mechanism for publishing and obtaining documents and has become a gathering place for all kinds of information. The amount of information on the network grows exponentially, and how to mine useful patterns or knowledge from massive network culture data has become a hot research topic. In data mining, text classification is studied so that readers can quickly find content of interest: text data are automatically assigned to categories according to a classification model. Internet cultural text data are unstructured, subjective, and high-dimensional, which makes it difficult for text mining algorithms to extract effective, easy-to-understand classification rules and leads to high computational complexity. This paper proposes a feature selection method based on robust features: sample deviation and variance serve as the criteria for ranking the importance of feature attributes, and the best feature attribute subset is then selected. The experimental results show that the proposed method achieves higher classification accuracy than traditional feature selection based on word frequency, which demonstrates the feasibility and superiority of the proposed method.
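
The ranking idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code below assumes the simplest reading: each term is scored by the population variance of its count across documents, and the top-k terms are kept. The function names, the toy term-count matrix, and the choice of `pvariance` are all illustrative assumptions, not the author's implementation.

```python
from statistics import pvariance

def rank_features_by_variance(matrix, feature_names):
    """Sort feature names by descending population variance across samples.

    Assumption: variance of raw term counts stands in for the paper's
    "sample deviation and variance" criterion, which the abstract does
    not define in detail.
    """
    columns = list(zip(*matrix))  # one tuple of values per feature
    scores = {name: pvariance(col) for name, col in zip(feature_names, columns)}
    return sorted(feature_names, key=lambda name: scores[name], reverse=True)

def select_top_k(matrix, feature_names, k):
    """Keep the k highest-variance features, preserving original column order."""
    keep = set(rank_features_by_variance(matrix, feature_names)[:k])
    idx = [i for i, name in enumerate(feature_names) if name in keep]
    names = [feature_names[i] for i in idx]
    reduced = [[row[i] for i in idx] for row in matrix]
    return names, reduced

# Hypothetical toy data: rows are documents, columns are term counts.
terms = ["the", "goal", "stock", "match"]
docs = [
    [5, 3, 0, 2],
    [5, 0, 4, 0],
    [5, 2, 0, 4],
    [5, 0, 5, 0],
]

ranking = rank_features_by_variance(docs, terms)
selected_terms, reduced_docs = select_top_k(docs, terms, k=2)
```

In this toy matrix the term "the" appears equally often in every document, so its variance is zero and it ranks last, even though a pure word-frequency criterion would rank it first.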

2010 ◽  
Vol 44-47 ◽  
pp. 1130-1134
Author(s):  
Sheng Li ◽  
Pei Lin Zhang ◽  
Bing Li

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to the classification model, and some are highly correlated with other features. Such features are harmful to building the classification model. To solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting representative, optimal features. The K nearest neighbor algorithm (KNN) is used to diagnose and classify hydraulic system faults. To demonstrate the performance of GA-PLS, the original data of a model engineering hydraulic system are used, and the results of GA-PLS are compared with those obtained using all features and with GA. The experimental results show that the proposed feature selection method can diagnose and classify hydraulic system faults more efficiently while using fewer features.
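
The classification stage described here can be sketched as a plain KNN vote over a chosen feature subset. This is only an illustration: the feature subset is hard-coded rather than produced by GA-PLS, and the feature names, sample values, and the `knn_predict` helper are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3, feature_idx=None):
    """Classify `query` by majority vote among its k nearest training samples.

    `feature_idx` optionally restricts the distance computation to a subset
    of feature columns (here chosen by hand; in the paper the subset would
    come from GA-PLS).
    """
    if feature_idx is not None:
        train_X = [[row[i] for i in feature_idx] for row in train_X]
        query = [query[i] for i in feature_idx]
    # Sort training samples by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical samples: columns are [pressure, flow, unrelated sensor reading].
train_X = [
    [1.0, 1.1, 9.0],
    [1.2, 0.9, 1.0],
    [0.9, 1.0, 5.0],
    [3.0, 3.1, 1.5],
    [3.2, 2.9, 8.0],
    [2.9, 3.0, 4.0],
]
train_y = ["normal", "normal", "normal", "fault", "fault", "fault"]
query = [3.1, 3.0, 5.0]

# Use only the first two columns, as if a selection step had dropped the third.
prediction = knn_predict(train_X, train_y, query, k=3, feature_idx=[0, 1])
```

Restricting the distance to the selected columns keeps an unrelated feature from dominating the Euclidean distance, which is the practical reason for running feature selection before KNN.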

