Prior work has shown that the mouth area can yield articulatory features of speech segments and durational information (Navarra et al., 2010), while pitch and speech amplitude are cued by the eyebrows and other head movements (Hamarneh et al., 2019). It has also been reported that adults look more at the mouth when evaluating speech information in a non-native language (Barenholtz et al., 2016). In the present study, we ask how listeners' visual scanning of a talking face is affected by task demands that specifically target prosodic and segmental information, a question that prior work has not addressed. Twenty-five native English speakers heard two audio sentences in English (the native language) or Mandarin (the non-native language) that could differ in segmental information, prosodic information, or both, and then saw a silent video of a talking face. Their task was to judge whether the video matched the first or the second audio sentence (or whether the two sentences were the same). The results show that although looking was generally weighted towards the mouth, reflecting task demands, increased looking to the mouth predicted correct responses only on Mandarin trials. This effect was more pronounced in the Prosody and Both conditions than in the Segment condition (p < 0.05). These results suggest a link between mouth-looking and the extraction of speech-relevant information at both the prosodic and segmental levels, but only under high cognitive load.