Real-time detection of a speaker and the speaker's location is a challenging task that is usually addressed by processing acoustic and/or visual information. It is well known, however, that when a person speaks, lip and head movements can also be used to detect the speaker and their location. This paper proposes a speaker detection system that uses visual prosody information (i.e. head and lip movements) in a human-machine multiparty interactive dialogue setting. The analysis is performed on a human-machine multiparty dialogue corpus. The paper reports results for head movements and for the fusion of head and lip movements for speaker and speech activity detection under three machine learning training settings (speaker dependent, speaker independent, and hybrid), and compares them with results for lip movements alone. The results show that head movements contribute significantly to detection and outperform lip movements except in the speaker independent setting, and that fusing the two further improves performance.
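To illustrate the kind of feature-level fusion described above, the sketch below concatenates per-frame head and lip movement features and fits a trivial classifier. This is a minimal, hypothetical example with synthetic data and made-up feature dimensions; it is not the paper's actual pipeline or models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame features (hypothetical shapes, not from the paper):
# head movement: 3-D rotation velocities; lip movement: 2-D opening/width deltas.
n_frames = 200
head = rng.normal(size=(n_frames, 3))
lip = rng.normal(size=(n_frames, 2))

# Frames where the person is speaking get a shifted feature distribution,
# mimicking the correlation of visual prosody with speech activity.
speaking = rng.integers(0, 2, size=n_frames).astype(bool)
head[speaking] += 1.5
lip[speaking] += 1.0

# Feature-level fusion: concatenate the two modalities per frame.
fused = np.hstack([head, lip])  # shape (n_frames, 5)

# Nearest-centroid classifier as a simple stand-in for the paper's models.
def fit_centroids(X, y):
    return X[~y].mean(axis=0), X[y].mean(axis=0)

def predict(X, centroids):
    d0 = np.linalg.norm(X - centroids[0], axis=1)
    d1 = np.linalg.norm(X - centroids[1], axis=1)
    return d1 < d0

centroids = fit_centroids(fused, speaking)
acc = (predict(fused, centroids) == speaking).mean()
print(f"fused-feature training accuracy: {acc:.2f}")
```

In a real system, the same fusion step would apply to features extracted from tracked head pose and lip landmarks, with the classifier trained per speaker (speaker dependent), across speakers (speaker independent), or on a mixture of both (hybrid).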