Computer vision is essential for developing a social robotic system capable of interacting with humans: it is responsible for extracting and representing information about the robot's surroundings. Three further mechanisms are indispensable for a social robot: a learning mechanism, to correctly select an action to execute in the environment; a proactive mechanism, to engage in an interaction; and a voice mechanism. Together, these mechanisms allow a robot to emulate some human behaviors, such as shared attention. This chapter therefore presents a robotic architecture composed of these mechanisms, which enables interaction between a robotic head and a caregiver through the learning of shared attention with the identification of some objects.