Kamerakind

Automatic Speaker Detection

Here at TNG, AI is already stealing our consultants’ jobs, at least when it comes to camera operation. For our all-hands meetings (AHS), we created KameraKInd, a robotic camera system that automatically tracks the current speaker, a task that was previously done by a human consultant. KameraKInd uses a spotting camera to detect everyone present in the room and selects the currently active speaker for tracking. A second, gimbal-mounted tracking camera then follows this person of interest. The system combines several real-time detection models: it first detects persons with YOLOv8 and tracks their movement. The current person of interest is identified via active speaker detection with Light-ASD. The images from the two cameras are then matched using LightGlue to find the optimal tracking camera movement. All of these models run in a parallel pipeline to generate real-time control signals for the gimbal. KameraKInd showcases how a combination of computer vision models can be orchestrated for real-time, vision-based decision making.

Step in front of the cameras and speak! After a couple of words, the camera will follow you. Don’t want to be followed anymore? Let someone else talk in front of the camera.
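To make the decision step of such a pipeline more concrete, here is a minimal Python sketch of how a speaker could be selected and turned into a gimbal command. It is not the KameraKInd implementation: the names Person, SpeakerSelector and gimbal_command, the score threshold, the frame count and the proportional gain are all hypothetical. It also assumes that person detection, speaker scoring and image matching have already happened upstream, so each person arrives with a position in the tracking camera image and a speaking score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    track_id: int          # ID assigned by the person tracker
    cx: float              # bounding-box centre, normalised to [0, 1]
    cy: float
    speaking_score: float  # hypothetical per-frame score from the speaker detector


class SpeakerSelector:
    """Switch targets only after a candidate has spoken for several frames in a row."""

    def __init__(self, threshold: float = 0.5, min_frames: int = 3):
        self.threshold = threshold    # assumed minimum score to count as speaking
        self.min_frames = min_frames  # assumed number of consecutive frames required
        self.target_id: Optional[int] = None
        self._candidate_id: Optional[int] = None
        self._streak = 0

    def update(self, persons: list[Person]) -> Optional[int]:
        speakers = [p for p in persons if p.speaking_score >= self.threshold]
        if not speakers:
            # Nobody is speaking: keep following the last target.
            self._candidate_id, self._streak = None, 0
            return self.target_id
        best = max(speakers, key=lambda p: p.speaking_score)
        if best.track_id == self.target_id:
            # The current target is still the strongest speaker.
            self._candidate_id, self._streak = None, 0
            return self.target_id
        if best.track_id == self._candidate_id:
            self._streak += 1
        else:
            self._candidate_id, self._streak = best.track_id, 1
        if self._streak >= self.min_frames:
            # The candidate has spoken long enough: hand over the camera.
            self.target_id = best.track_id
            self._candidate_id, self._streak = None, 0
        return self.target_id


def gimbal_command(person: Person, gain: float = 30.0) -> tuple[float, float]:
    """Proportional pan/tilt rates that move the person towards the image centre."""
    pan = gain * (person.cx - 0.5)   # positive: turn right
    tilt = gain * (0.5 - person.cy)  # positive: tilt up (image y grows downwards)
    return pan, tilt
```

Requiring several consecutive speaking frames before switching mirrors the behaviour described above, where the camera follows a new person only after a couple of words, and it keeps the gimbal from jumping on short interjections.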
