The goal of this work is to distinguish between humans and robots in a mixed human-robot environment. We analyze the spatio-temporal patterns of optical-flow-based features across several frames. We consider the Histogram of Optical Flow (HOF) and the Motion Boundary Histogram (MBH) features, which have shown good results in people detection. The spatio-temporal patterns are composed of groups of feature components that have similar values in previous frames. The groups of features are fed into the FuzzyBoost algorithm, which at each round selects the spatio-temporal pattern (i.e., feature set) with the lowest classification error. To avoid a brute-force search over all feature subsets, the search for patterns is guided by grouping feature dimensions with one of three algorithms: (a) similarity of weights from dimensionality reduction matrices, (b) Boost Feature Subset Selection (BFSS) and (c) Sequential Floating Feature Selection (SFSS). The similarity weights are computed by Multiple Metric Learning for Large Margin Nearest Neighbor (MMLMNN), a linear dimensionality reduction algorithm that provides a type of Mahalanobis metric (Weinberger and Saul, J. Mach. Learn. Res. 10 (2009) 207–244). The experiments show that FuzzyBoost generalizes better than GentleBoost, Support Vector Machines (SVM) with linear kernels and SVM with Radial Basis Function (RBF) kernels. The classifier was implemented and tested in a real-time, multi-camera dynamic setting.
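Option (a), grouping feature dimensions by the similarity of their weights in a dimensionality reduction matrix, can be illustrated with a minimal sketch. It assumes a linear projection matrix `L` (output dimensions × input dimensions) has already been learned, for example by MMLMNN. Column `j` of `L` is treated as the weight profile of input dimension `j`, and dimensions are grouped greedily when the cosine similarity of their columns exceeds a threshold. The function names, the greedy seed-based grouping and the 0.9 threshold are hypothetical illustration choices, not the procedure used in this work.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two weight vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def group_dimensions(L: List[List[float]], threshold: float = 0.9) -> List[List[int]]:
    """Greedily group input dimensions whose columns in L have similar weights.

    L is a hypothetical projection matrix stored as a list of rows
    (out_dim x in_dim); column j holds the weights of input dimension j.
    Each dimension joins the first group whose seed (first member) has a
    cosine similarity >= threshold; otherwise it starts a new group.
    """
    n_in = len(L[0])
    cols = [[row[j] for row in L] for j in range(n_in)]
    groups: List[List[int]] = []
    for j in range(n_in):
        for g in groups:
            if cosine(cols[g[0]], cols[j]) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups


if __name__ == "__main__":
    # Toy 2 x 4 projection: dims 0 and 1 load on the first output, dims 2 and 3 on the second.
    L = [[1.0, 0.9, 0.0, 0.1],
         [0.0, 0.1, 1.0, 0.9]]
    print(group_dimensions(L))
```

On this toy matrix, dimensions 0 and 1 end up in one group and dimensions 2 and 3 in another, so a boosting stage would only have to evaluate two candidate feature sets instead of every subset of the four dimensions.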
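The per-round selection step (pick the feature set with the lowest classification error) can be sketched with a generic boosting loop. This is a stand-in, not FuzzyBoost: it uses a discrete AdaBoost-style weight update and a hypothetical weak learner (a weighted nearest-class-mean classifier restricted to the dimensions of one group). The feature groups are assumed to be given, for instance by one of the grouping strategies above, and labels are assumed to be +1 (e.g., human) and -1 (e.g., robot) with both classes present in the training data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Learner = Callable[[Vector], int]


def fit_group_learner(X: List[Vector], y: List[int], w: List[float],
                      group: List[int]) -> Learner:
    """Hypothetical weak learner: weighted class means on the dims in `group`."""
    def wmean(label: int) -> List[float]:
        tot = sum(wi for wi, yi in zip(w, y) if yi == label)
        return [sum(wi * x[d] for x, wi, yi in zip(X, w, y) if yi == label) / tot
                for d in group]

    mu_pos, mu_neg = wmean(+1), wmean(-1)

    def h(x: Vector) -> int:
        # Assign the label of the nearer class mean (ties go to +1).
        dp = sum((x[d] - m) ** 2 for d, m in zip(group, mu_pos))
        dn = sum((x[d] - m) ** 2 for d, m in zip(group, mu_neg))
        return 1 if dp <= dn else -1

    return h


def boost(X: List[Vector], y: List[int], groups: List[List[int]],
          rounds: int = 3) -> List[Tuple[float, List[int], Learner]]:
    """Each round, select the feature group whose weak learner has the lowest weighted error."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for g in groups:
            h = fit_group_learner(X, y, w, g)
            err = sum(wi for x, yi, wi in zip(X, y, w) if h(x) != yi)
            if best is None or err < best[0]:
                best = (err, g, h)
        err, g, h = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, g, h))
        # Discrete AdaBoost update: increase the weight of misclassified samples.
        w = [wi * math.exp(-alpha * yi * h(x)) for x, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def ensemble_predict(ensemble: List[Tuple[float, List[int], Learner]], x: Vector) -> int:
    score = sum(alpha * h(x) for alpha, _, h in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy data: dims 0-1 separate the classes, dims 2-3 are uninformative.
    X = [[1.0, 1.1, 0.3, 0.7], [0.9, 1.0, 0.8, 0.2],
         [0.0, 0.1, 0.4, 0.6], [0.1, 0.0, 0.7, 0.3]]
    y = [1, 1, -1, -1]
    model = boost(X, y, groups=[[0, 1], [2, 3]])
    print([g for _, g, _ in model])
    print(ensemble_predict(model, [1.0, 1.0, 0.5, 0.5]))
```

Because the weak learner sees a whole group of dimensions at once, each round selects a feature set rather than a single feature, which mirrors the idea of selecting one spatio-temporal pattern per round. The weak learner and update rule here are illustrative choices only.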