An Assessment of the FCM Classification of Thermodynamic-Property Data
In the present study we consider the algorithmic classification of thermodynamic properties of fluids using the fuzzy C-means (FCM) clustering methodology. FCM is a technique that finds patterns directly in the data. It is based on the minimization of an objective function that measures the dissimilarity of the data assigned to each group. This dissimilarity is commonly formulated in terms of the Euclidean distance between the data points and the cluster centroids. This mathematical formulation and the efficiency of its implementation are among the method's advantages. However, the method also has drawbacks that can lead to misclassification, including convergence to local optima, sensitivity to the particular form of the data, and the choice of the parameters embedded in the scheme. To assess how correctly the FCM algorithm classifies such data, published pressure, volume, and temperature data are used, with emphasis on how the algorithm is affected by the natural scale of the data and by the following classification strategies: (1) data normalization, (2) data transformation, (3) the sample size used, and (4) the way the data are supplied to the algorithm. The results of this assessment show that the natural scaling and the normalization and transformation strategies are important, whereas the way the data are presented to the algorithm is not a critical factor in the classification. In addition, reducing the number of data points considered degrades the quality of the clustering. Taking all the issues studied here into account may be helpful when an FCM classification is attempted on new data.
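To make the objective-function description concrete, the following is a minimal sketch of the standard FCM iteration, which alternately updates centroids and memberships to decrease J_m = sum_i sum_k u_ik^m ||x_k - v_i||^2 with Euclidean distances. It assumes a fuzzifier m = 2, a random (seeded) initial membership matrix, and per-feature min-max scaling as one possible normalization strategy; the abstract does not specify which normalization, parameters, or implementation the study used, and the function names `min_max_normalize` and `fuzzy_c_means` are hypothetical.

```python
import math
import random


def min_max_normalize(data):
    """Assumed normalization: rescale each feature (column) to [0, 1].
    Constant columns are mapped to 0."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in data
    ]


def fuzzy_c_means(data, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard FCM sketch: returns (centers, memberships, objective).
    memberships[k][i] is the degree to which point k belongs to cluster i."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each point's memberships sum to 1.
    u = []
    for _ in range(n):
        w = [rng.random() for _ in range(c)]
        s = sum(w)
        u.append([wi / s for wi in w])
    for _ in range(max_iter):
        # Centroid update: mean of the points weighted by u_ik^m.
        centers = []
        for i in range(c):
            weights = [u[k][i] ** m for k in range(n)]
            total = sum(weights)
            centers.append([
                sum(weights[k] * data[k][j] for k in range(n)) / total
                for j in range(dim)
            ])
        # Membership update from Euclidean distances to the new centroids.
        new_u = []
        exp = 2.0 / (m - 1.0)
        for k in range(n):
            dists = [math.dist(data[k], v) for v in centers]
            zeros = [i for i, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Point coincides with a centroid: split membership among those.
                row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((dists[i] / dists[j]) ** exp for j in range(c))
                       for i in range(c)]
            new_u.append(row)
        # Stop when no membership changes by more than tol.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    # Objective J_m evaluated at the returned memberships and centroids.
    objective = sum(
        u[k][i] ** m * math.dist(data[k], centers[i]) ** 2
        for k in range(n) for i in range(c)
    )
    return centers, u, objective
```

As a usage example on a small synthetic set (invented for illustration, not data from the study), with two features on very different scales such as a temperature-like and a specific-volume-like column:

```python
data = [[300.0, 0.010], [301.0, 0.012], [302.0, 0.011],
        [400.0, 0.050], [401.0, 0.052], [399.0, 0.051]]
centers, u, J = fuzzy_c_means(min_max_normalize(data), c=2)
labels = [max(range(2), key=lambda i: row[i]) for row in u]
```

Running `fuzzy_c_means` on the raw `data` as well lets one observe how the larger-magnitude feature dominates the Euclidean distance, which is the natural-scale effect the assessment examines. Because FCM can converge to local optima, repeating the run with different `seed` values is a simple way to check the stability of the partition.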