The ability to adapt quickly to distorted speech, such as noise-vocoded speech, is one of the mechanisms listeners employ to understand one another in challenging listening conditions. Listeners can also exploit visual aspects of speech: seeing the speaker's face while listening to distorted speech improves both perception of and adaptation to the distorted signal. However, it is unclear how important viewing specific parts of the speaker's face is for the successful use of visual speech information. In particular, does looking at the speaker's mouth improve recognition of noise-vocoded speech, or is viewing the speaker's entire face equally effective? This study aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected perception of and adaptation to distorted sentences. A secondary aim was to establish whether results on the processing of noise-vocoded speech from lab-based experiments could be replicated in an online setting. We monitored speech recognition accuracy online while participants listened to noise-vocoded sentences in a between-subjects design with five groups. We first established whether participants could reliably perceive and adapt to audiovisual noise-vocoded sentences when the speaker's whole face was visible (AV Full). Four further groups were tested: one in which participants could view only the moving lower part of the speaker's face, i.e., the mouth (AV Mouth); one in which they could see only the moving upper part of the face (AV Eyes); one in which they could see neither the speaker's moving lower nor upper face (AV Blocked); and one in which they were presented with a still image of the speaker's face (AV Still). Participants repeated around 40% of key words correctly and adapted to the noise-vocoded sentences over the course of the experiment, but only when the moving mouth was visible (AV Full and AV Mouth). In contrast, performance was at floor level and no adaptation took place in the conditions in which the moving mouth was not visible (AV Blocked, AV Eyes, and AV Still). Our results show that, when listening and adapting to speech under challenging conditions online, it is the visual speech information from the speaker's mouth region, not the eyes/upper face region, that matters. They also demonstrate that it is feasible to run speech perception and adaptation studies online, although not all findings reported for lab studies necessarily replicate.
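For readers unfamiliar with the distortion used here, the sketch below illustrates the general noise-vocoding procedure: the signal is split into frequency bands, each band's amplitude envelope is extracted, and the envelopes modulate band-limited noise. The channel count, band edges, and filter settings shown are illustrative assumptions, not the parameters used to create the stimuli in this study; fewer channels yield a more distorted signal.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=4, f_lo=100.0, f_hi=8000.0):
    """Noise-vocode a speech signal (assumed parameters, for illustration only).

    Splits the signal into logarithmically spaced bands, extracts each
    band's amplitude envelope, and uses it to modulate noise filtered
    into the same band. Fewer channels produce stronger distortion.
    """
    # Logarithmically spaced band edges between f_lo and f_hi
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        # Amplitude envelope of this band via the Hilbert transform
        env = np.abs(hilbert(band))
        # Envelope-modulated noise, band-limited to the same channel
        out += env * sosfiltfilt(sos, noise)
    # Match the RMS level of the original signal
    out *= np.sqrt(np.mean(signal**2) / (np.mean(out**2) + 1e-12))
    return out
```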