Finding the location of a robot equipped with an imaging sensor by taking photos of its surrounding environment is a multifaceted task consisting of several obligatory phases: it starts with the calibration of the sensor and ends with the propagation of errors, which expresses our uncertainty about the unknowns. This article develops a mathematical model, based on recent trends, showing how structure and motion can be estimated by image-processing methods applied to digital images taken with an ordinary non-metric camera. The direct and inverse Brown models for calibration, as well as the basic definition of an image pyramid, are discussed first. The concepts of epipolar geometry, collinearity and coplanarity, and the registration of models are described next. Generating a reference map, bundle adjustment, and localization are then presented. In the final sections, recent trends in parallel computing are reviewed, and recommendations for building a real-time system are given.