Robust feature collection and classification of network culture
The Internet provides a convenient mechanism for publishing and obtaining documents and has become a gathering place for all kinds of information. The amount of information online grows exponentially, and how to mine useful patterns or knowledge from this massive body of network culture has become a hot research topic. In data mining, text classification is studied so that readers can quickly find content of interest: text data are automatically assigned to categories according to a classification model. Internet cultural text data are unstructured, subjective, and high-dimensional, which makes it difficult for text mining algorithms to extract effective, easy-to-understand classification rules and makes the computational complexity too high. This paper proposes a feature selection method based on robust features, which uses sample deviation and variance as criteria to rank feature attributes by importance and to select the best feature attribute subset. Experimental results show that the proposed method achieves higher classification accuracy than traditional feature selection based on word frequency, which demonstrates its feasibility and superiority.
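To make the ranking idea concrete, the following is a minimal sketch of deviation-and-variance feature ranking on a toy corpus. The abstract does not give the exact scoring formula, so the criterion used here is an assumption: the summed absolute deviation of each class mean from the overall mean, divided by the summed within-class variance. The function names `term_frequencies`, `rank_features`, and `select_top_k`, the smoothing constant `eps`, and the toy documents are all hypothetical and serve only as illustration.

```python
from collections import Counter
from statistics import mean, pvariance


def term_frequencies(docs):
    """Build a sorted vocabulary and one term-count row per document."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def rank_features(rows, labels, eps=1e-9):
    """Rank feature indices by an assumed deviation/variance score.

    Assumption (not taken from the paper): a feature is informative when
    its class means deviate strongly from its overall mean while its
    values vary little inside each class.
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall = mean(column)
        deviation = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            deviation += abs(mean(values) - overall)
            within += pvariance(values)
        # eps avoids division by zero when a feature is constant per class.
        scores.append(deviation / (within + eps))
    # Feature indices, highest score first.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top_k(vocab, rows, labels, k):
    """Return the k highest-ranked terms as the selected feature subset."""
    order = rank_features(rows, labels)
    return [vocab[j] for j in order[:k]]


# Toy corpus: two "sport" documents and two "film" documents.
docs = [
    "goal team the",
    "goal team the win",
    "film actor the",
    "film actor the star",
]
labels = ["sport", "sport", "film", "film"]

vocab, rows = term_frequencies(docs)
selected = select_top_k(vocab, rows, labels, k=4)
print(selected)
```

In this toy corpus the word "the" occurs once in every document, so a pure word-frequency criterion would rank it highly, whereas the assumed score gives it zero because its class means never deviate from the overall mean. The selected subset would then be passed to whichever classifier is used downstream.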