In this paper, a stepwise information-theoretic feature selector is designed and implemented to reduce the dimension of a data set without losing pertinent information. The effectiveness of the proposed feature selector is demonstrated by selecting features from forty-three variables monitored on a set of heavy-duty diesel engines and then using the reduced feature space to classify faults in these engines. Using a cross-validation technique, the effects of various classification methods (linear regression, quadratic discriminants, probabilistic neural networks, and support vector machines) and feature selection methods (regression subset selection, RV-based selection by simulated annealing, and information-theoretic selection) are compared on the basis of percentage misclassification. The information-theoretic feature selector combined with the probabilistic neural network achieved an average classification accuracy of 90%, the best performance of any combination of classifier and feature selector under consideration.
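To make the idea of stepwise information-theoretic feature selection concrete, the following is a minimal sketch, not the paper's implementation: it greedily adds, one at a time, the feature that (jointly with those already chosen) carries the most empirical mutual information about the class label. The function names (`mutual_information`, `stepwise_select`) and the discrete-feature encoding are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete arrays.
    Hypothetical helper for illustration only."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability estimate
            px = np.mean(x == xv)                 # marginal of x
            py = np.mean(y == yv)                 # marginal of y
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def stepwise_select(X, y, k):
    """Greedy forward selection of k columns of X: at each step, add the
    feature whose joint MI with the label (together with the features
    already chosen) is largest. A sketch, not the paper's algorithm."""
    chosen, remaining = [], list(range(X.shape[1]))

    def joint_code(cols):
        # Encode the tuple of selected feature values as a single symbol
        # so the joint distribution can be estimated by counting.
        codes = np.zeros(X.shape[0], dtype=np.int64)
        for c in cols:
            vals, inv = np.unique(X[:, c], return_inverse=True)
            codes = codes * len(vals) + inv
        return codes

    for _ in range(k):
        best = max(remaining,
                   key=lambda j: mutual_information(joint_code(chosen + [j]), y))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With a toy data set in which one column duplicates the label and the rest are noise, the selector picks the informative column first; continuous engine variables would first need to be discretized (e.g., binned) before this count-based estimate applies.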