This study evaluated the performance of three permanent vehicle classification stations on freeways against concurrent video-based ground truth. All stations had dual loop detectors and a piezoelectric sensor in each lane, which together provided axle-based and length-based classification. The evaluation was conducted at individual, per-vehicle resolution for every vehicle that passed during the study periods (more than 18,000 vehicles in uncongested conditions). Although the stations exhibited good performance overall (97% correct), the performance for trucks was poor; for example, only 60% of the single-unit trucks (SUTs) were correctly classified. All observed errors were diagnosed. Some errors could be fixed quickly, and others could not. Data from one site were used to revise the classifier to solve almost all fixable errors, and the revised classifier was then tested at another location. A chronic error found in the research was intrinsic to the vehicle fleet and may be impossible to correct with existing sensors: the shorter SUTs have a length range and an axle spacing range that overlap those of passenger vehicles. Depending on calibration, SUTs may be counted as passenger vehicles or vice versa. Such errors should be expected at most classification stations, and all subsequent uses of the classification data must accommodate this unavoidable blurring error. Because of this blurring, an axle classification station cannot be used uncritically to calibrate the boundary between passenger vehicles and SUTs for length-based classification stations: the unavoidable errors in axle-based classification would be amplified in the length-based classification scheme.