Image processing is a productive domain for extracting knowledge from real-time video and images in the surveillance, automation, robotics, medical and entertainment industries. The data obtained from videos and images are continuous and play a primary role in semantic-based video analysis,
retrieval and indexing. Images and videos obtained from natural and random sources must be processed for text identification, tracking, binarization and recognition of meaningful information for subsequent actions. This work proposes a solution that uses the Spectral Graph
Wavelet Transform (SGWT) technique for localizing and extracting text information from images and videos. K-means clustering precedes the SGWT step and groups the image features quantified by a Hill Climbing algorithm. Precision, Sensitivity, Specificity and Accuracy are
the four parameters that indicate the efficiency of the proposed technique. Experiments use training sets from ICDAR and, for videos, YVT.
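
The four evaluation parameters named above are conventionally computed from confusion-matrix counts. The sketch below uses those standard definitions; the source does not spell out its own formulas, so treating text pixels or regions as the positive class is an assumption. The function name `evaluation_metrics` and the example counts are hypothetical.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the four measures from confusion-matrix counts.

    Assumption: the positive class is "text", so tp counts text items
    correctly detected and tn counts non-text items correctly rejected.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one count must be non-zero")

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),    # detected text that really is text
        "sensitivity": ratio(tp, tp + fn),  # true text that was detected (recall)
        "specificity": ratio(tn, tn + fp),  # non-text that was correctly rejected
        "accuracy": ratio(tp + tn, total),  # all correct decisions
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    for name, value in evaluation_metrics(tp=80, fp=20, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting all four together is useful because accuracy alone can look high when non-text regions dominate an image, while sensitivity and precision expose missed or spurious text detections.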