Building a system for recognizing human actions from video involves two key problems: 1) designing suitable low-level features that are both efficient to extract from video and capable of distinguishing between events; and 2) developing a representation scheme that can bridge the large gap between low-level features and high-level event concepts, and can also handle the uncertainty and errors inherent in any low-level video processing. Graphical models provide a natural framework for representing state transitions in events as well as the spatio-temporal constraints between actors and events. Hidden Markov models (HMMs) have been widely used in action recognition applications, but the basic representation has three key deficiencies: it assumes unrealistic models for the duration of a sub-event, it does not directly encode interactions among multiple agents, and it does not model the inherent hierarchical organization of these activities. Several extensions have been proposed to address one or more of these issues and have been applied successfully in various gesture and action recognition domains. More recently, conditional random fields (CRFs) have become increasingly popular, since they allow complex potential functions for modeling observations and state transitions, and they outperform HMMs when sufficient training data is available. The authors first review the various extensions of these graphical models, then present the theory of inference and learning in them, and finally discuss their applications in various domains.
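To make the HMM machinery referred to above concrete, the following is a minimal sketch of the forward algorithm on a toy two-state action model. The states ("walk", "run"), observation symbols ("slow", "fast"), and all probabilities are invented for illustration; they are not taken from any model in the text.

```python
# Toy HMM for action recognition: two hidden action states and
# discrete motion observations. All numbers below are illustrative.

states = ["walk", "run"]

start = {"walk": 0.6, "run": 0.4}          # initial state distribution
trans = {                                   # state transition probabilities
    "walk": {"walk": 0.7, "run": 0.3},
    "run":  {"walk": 0.4, "run": 0.6},
}
emit = {                                    # observation (emission) probabilities
    "walk": {"slow": 0.8, "fast": 0.2},
    "run":  {"slow": 0.1, "fast": 0.9},
}

def forward_likelihood(observations):
    """Forward algorithm: P(observation sequence | model)."""
    # Initialize with start probabilities times the first emission.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    # Fold in each subsequent observation via the recursion
    # alpha_t(s2) = sum_s1 alpha_{t-1}(s1) * trans(s1, s2) * emit(s2, o_t).
    for o in observations[1:]:
        alpha = {
            s2: sum(alpha[s1] * trans[s1][s2] for s1 in states) * emit[s2][o]
            for s2 in states
        }
    # Marginalize over the final hidden state.
    return sum(alpha.values())

print(forward_likelihood(["slow", "slow", "fast"]))  # → 0.125872
```

Per-frame features (optical flow magnitude, say) would be quantized into such symbols in a real system; the deficiencies noted above (geometric duration model, single agent, flat state space) are all visible in this basic formulation.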