Classification Algorithm Based on Categorical Data Analysis
This paper compares three statistics (standard deviation, mathematical expectation, and variance) and proposes a classification algorithm based on the standard deviation of the training set. The algorithm first maps the discrete attribute values to corresponding numeric values. It then calculates the standard deviation, mathematical expectation, and variance of each attribute within each category, and these per-attribute statistics serve as the coordinates of that category. To determine the category of a new data item, its attribute values are used as coordinates and its distance to each category is calculated. The item is then assigned to the category at the shortest distance. Of the three methods, the standard deviation is the most stable and the most accurate. The algorithm also has advantages in dealing with noisy data.
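The procedure above can be sketched in Python. This is a minimal sketch, not the paper's implementation, and it rests on choices the abstract leaves open. Discrete values are encoded as integers in order of first appearance, and population statistics are used. Distance is Euclidean, and values unseen during training are encoded as -1. The function names `build_encoders`, `encode`, `fit`, and `predict` are hypothetical. The `stat` parameter selects which of the three statistics forms each category's coordinates.

```python
import math
from statistics import mean, pstdev, pvariance

# Assumption: population (not sample) statistics; the abstract does not say.
STATS = {"mean": mean, "variance": pvariance, "std": pstdev}

def build_encoders(rows):
    """Map each attribute's discrete values to integers 0, 1, 2, ...
    (hypothetical encoding: order of first appearance in the training set)."""
    encoders = []
    for column in zip(*rows):
        enc = {}
        for value in column:
            enc.setdefault(value, len(enc))
        encoders.append(enc)
    return encoders

def encode(row, encoders):
    """Encode one record; values unseen in training map to -1 (arbitrary choice)."""
    return [enc.get(value, -1) for value, enc in zip(row, encoders)]

def fit(rows, labels, stat="std"):
    """Return (encoders, centers). centers maps each category to a vector whose
    i-th coordinate is the chosen statistic of attribute i within that category."""
    encoders = build_encoders(rows)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(encode(row, encoders))
    f = STATS[stat]
    centers = {c: [f(col) for col in zip(*pts)] for c, pts in by_class.items()}
    return encoders, centers

def predict(row, encoders, centers):
    """Assign the category whose coordinate vector is nearest (Euclidean distance)."""
    x = encode(row, encoders)
    return min(centers, key=lambda c: math.dist(x, centers[c]))

if __name__ == "__main__":
    rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
    labels = ["A", "A", "B", "B"]
    enc, centers = fit(rows, labels, stat="mean")
    print(predict(("blue", "large"), enc, centers))
```

With `stat="mean"` this reduces to nearest-centroid classification. The `"std"` and `"variance"` options follow one literal reading of the abstract, in which the per-category statistic vector is compared directly with the new item's encoded values. The sketch does not reproduce the paper's reported comparison of the three methods.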