
Here at TNG, AI is already stealing our consultants' jobs, at least when it comes to camera operation. For our all-hands meetings (AHS), we created KameraKInd, a robotic camera system that automatically tracks the current speaker – a task that was previously done by a human consultant. KameraKInd uses a spotting camera to automatically detect all persons present in the room and selects the currently active speaker for tracking. A second, gimbal-mounted tracking camera then follows the current person of interest.

This AI system uses a variety of real-time detection models: it first detects persons with YOLOv8 and tracks their movement. The current person of interest is identified via active speaker detection with Light-ASD. The two camera images are then matched using LightGlue to compute the optimal gimbal movement. All of these models are executed in a parallel pipeline to generate real-time control signals for the gimbal. KameraKInd showcases how a combination of computer vision models can be orchestrated for real-time, vision-based decision making.

Step in front of the cameras and speak! After a couple of words, the camera will follow you. You don't want to be followed anymore? Let someone else talk in front of the camera.
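To make the flow of the pipeline concrete, here is a minimal Python sketch of a single detect–identify–match–control step. All model calls (`detect_persons`, `active_speaker`, `match_to_tracking_view`) are hypothetical stubs standing in for YOLOv8, Light-ASD, and LightGlue respectively; the frame size, gain, and function names are illustrative assumptions, not the actual KameraKInd implementation.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    person_id: int
    bbox: tuple  # (x, y, w, h) in spotting-camera pixels


def detect_persons(frame):
    """Hypothetical stand-in for YOLOv8 person detection."""
    return [Detection(0, (100, 80, 40, 120)), Detection(1, (300, 90, 42, 118))]


def active_speaker(frame, detections):
    """Hypothetical stand-in for Light-ASD active speaker detection."""
    return detections[1]  # pretend person 1 is currently speaking


def match_to_tracking_view(speaker_bbox, tracking_frame):
    """Hypothetical stand-in for LightGlue keypoint matching: map the
    speaker's spotting-camera bounding-box centre into tracking-camera
    coordinates (here trivially, by taking the bbox centre)."""
    x, y, w, h = speaker_bbox
    return (x + w / 2, y + h / 2)


def gimbal_command(target_xy, frame_centre=(320, 240), gain=0.05):
    """Simple proportional controller: scale the pixel error between the
    target and the tracking-frame centre into a pan/tilt velocity."""
    return tuple(gain * (t - c) for t, c in zip(target_xy, frame_centre))


def pipeline_step(spotting_frame, tracking_frame):
    """One detect -> identify -> match -> control iteration."""
    detections = detect_persons(spotting_frame)
    speaker = active_speaker(spotting_frame, detections)
    target = match_to_tracking_view(speaker.bbox, tracking_frame)
    return gimbal_command(target)
```

In the real system these stages run concurrently on a live video stream rather than sequentially on a single frame, but the data flow per frame is the same: detections in, one speaker out, one gimbal command per iteration.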