We built a Proof of Concept (PoC) for a computer vision system that evaluates football technique. Given a video of an athlete performing a training drill, the system analyses movements and ball contacts to calculate a score across different categories. This PoC shows the potential for a new class of future sports apps: fully automated video-based training companions that gamify individual training and offer a cost-effective solution for the masses.
Under the hood, we use YOLO11, a state-of-the-art object detection model, to detect and track both the athlete and the ball. From this information, we derive real-world positions, distances, speeds and accelerations, which feed into our performance scores. We developed this PoC with feedback from numerous domain experts.
Our PoC specialises in one particular drill: the Elastico Tip-Tap. Given a video that shows the exercise from start to finish, we calculate scores on a scale of 0 to 10 for the following categories: Speed, Efficiency, Smoothness and Control. Let’s have a look at one example run from our resident football aficionado Daniel Hirsch:
Our analysis pipeline consists of two major steps: running detection and tracking algorithms to extract the raw vision data, followed by significant post-processing to calculate the scores.
In the first step, we take advantage of the vast capabilities of Ultralytics’ YOLO11 ecosystem to acquire all of our needed raw vision data:


Next, we preprocess the raw data in a couple of key steps to get reliable world data.






With this cleaned world data, it’s time to calculate scores. After consulting with the domain experts (among them a former Bundesliga player) and looking at a selection of good and bad video examples, we identified four key performance indicators for executing the Elastico Tip-Tap drill. For each of these, we created a corresponding score and tuned it to correlate with human perception.
Efficiency is all about how many ball contacts you need to successfully complete the drill. The fewer the better, obviously! For this, we need to detect and count ball contacts.
We detect ball contacts by finding frames where a foot is sufficiently close to the ball (we found that 15 cm is a reasonable threshold). To filter out observations where the foot passes behind or in front of the ball without touching it, we also check whether the acceleration of the ball is sufficiently high (we found that 6 m/s² is a reasonable threshold). This way, we get all the ball contacts that changed the movement of the ball.
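The two thresholds can be combined into a small contact counter. The following is only a minimal sketch, not our actual implementation: it assumes per-frame 2D world positions in metres for the ball and both feet, estimates ball acceleration with central second differences, and merges consecutive qualifying frames into a single contact. All function and variable names are hypothetical.

```python
import math

FOOT_BALL_MAX_DIST_M = 0.15  # distance threshold from the text (15 cm)
BALL_MIN_ACCEL_MS2 = 6.0     # acceleration threshold from the text (6 m/s^2)


def ball_acceleration(ball_xy, fps):
    """Per-frame acceleration magnitude (m/s^2) via central second differences.

    The first and last frame have no two-sided neighbours and are left at 0.
    """
    dt = 1.0 / fps
    acc = [0.0] * len(ball_xy)
    for i in range(1, len(ball_xy) - 1):
        ax = (ball_xy[i + 1][0] - 2 * ball_xy[i][0] + ball_xy[i - 1][0]) / dt**2
        ay = (ball_xy[i + 1][1] - 2 * ball_xy[i][1] + ball_xy[i - 1][1]) / dt**2
        acc[i] = math.hypot(ax, ay)
    return acc


def count_ball_contacts(ball_xy, left_foot_xy, right_foot_xy, fps):
    """Count contacts: a foot is close to the ball AND the ball accelerates.

    Assumption: consecutive qualifying frames belong to the same contact.
    """
    acc = ball_acceleration(ball_xy, fps)
    contacts = 0
    in_contact = False
    for i, ball in enumerate(ball_xy):
        foot_dist = min(math.dist(ball, left_foot_xy[i]),
                        math.dist(ball, right_foot_xy[i]))
        touching = (foot_dist <= FOOT_BALL_MAX_DIST_M
                    and acc[i] >= BALL_MIN_ACCEL_MS2)
        if touching and not in_contact:
            contacts += 1
        in_contact = touching
    return contacts
```

Requiring both conditions is what filters out a foot that merely passes in front of or behind the ball: the distance check alone would fire, but the ball's motion stays unchanged.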
Below, you can see how good and bad execution differs in ball contacts and how they come in over time.

On the football pitch, moves like the Elastico Tip-Tap are required to dribble and outmaneuver opponents, so keeping good control over the ball is crucial. We measure control as the distance to the ball. The closer, the better!
Here, we need to differentiate:
If the ball is outside of the legs, it should be really close to the feet, so the distance to the closest foot should be minimal.
If the ball is between the legs, it should ideally be around the center of gravity, so somewhere on the line between the feet. Here, we penalise the distance to that ideal line.
Averaging over the entire video gives us an average distance to the ball. Down below, you can see the distances for the five runs; the straight lines are the respective averages. Bad 3 is an interesting example that stands out due to its extreme sloppiness in control.
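One way to express both cases in a single formula is the distance from the ball to the line segment that connects the feet. The sketch below assumes 2D world positions in metres and assumes that "between the legs" means the ball projects onto that segment; if it does not, the clamped projection lands on a foot and the result is the distance to the closest foot. Function names are hypothetical, and this is an illustration rather than our exact implementation.

```python
import math


def distance_to_feet(ball, left_foot, right_foot):
    """Distance (m) from the ball to the segment joining both feet."""
    fx, fy = right_foot[0] - left_foot[0], right_foot[1] - left_foot[1]
    bx, by = ball[0] - left_foot[0], ball[1] - left_foot[1]
    seg_len_sq = fx * fx + fy * fy
    if seg_len_sq == 0.0:
        # Feet at the same spot: fall back to a plain point distance.
        return math.hypot(bx, by)
    # Position of the ball's projection along the segment (0 = left, 1 = right).
    t = (bx * fx + by * fy) / seg_len_sq
    # Clamping handles the "outside of the legs" case: nearest point is a foot.
    t = max(0.0, min(1.0, t))
    return math.hypot(bx - t * fx, by - t * fy)


def average_control_distance(balls, left_feet, right_feet):
    """Average the per-frame distance over the entire video."""
    distances = [distance_to_feet(b, l, r)
                 for b, l, r in zip(balls, left_feet, right_feet)]
    return sum(distances) / len(distances)
```

For example, with the feet at (0, 0) and (0.4, 0), a ball at (0.2, 0.1) is 0.1 m from the ideal line, while a ball at (0.7, 0) is 0.3 m from the closest foot.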

How smooth are your moves? Are you connecting your ball contacts with flowy motions or are you twitching a lot without actually getting too far? Low smoothness makes you look less cool and leads to faster fatigue, so the smoother, the better.
We define smoothness as minimal foot acceleration for maximum foot velocity. Mathematically, we calculate the average norm of the velocity over time ⟨v⟩ and the average norm of the acceleration over time ⟨a⟩ of both ankles. Then, we calculate a measure of smoothness as ⟨v⟩/⟨a⟩, averaged over both ankles.
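As a rough illustration, the measure can be computed from ankle positions with finite differences. This sketch reads the measure as the ratio of mean speed to mean acceleration magnitude (our interpretation of "minimal foot acceleration for maximum foot velocity") and assumes 2D world positions in metres sampled at a constant frame rate. The names are hypothetical.

```python
import math


def derivative(series, dt):
    """Forward finite difference of a sequence of 2D vectors."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(series, series[1:])]


def mean_norm(vectors):
    """Average Euclidean norm over time."""
    return sum(math.hypot(x, y) for x, y in vectors) / len(vectors)


def ankle_smoothness(positions, fps):
    """Mean speed divided by mean acceleration magnitude for one ankle."""
    dt = 1.0 / fps
    velocity = derivative(positions, dt)
    acceleration = derivative(velocity, dt)
    mean_a = mean_norm(acceleration)
    if mean_a == 0.0:
        # Perfectly uniform motion: no acceleration at all.
        return math.inf
    return mean_norm(velocity) / mean_a


def smoothness(left_ankle, right_ankle, fps):
    """Average the per-ankle measure over both ankles."""
    return 0.5 * (ankle_smoothness(left_ankle, fps)
                  + ankle_smoothness(right_ankle, fps))
```

With this definition, a foot that twitches back and forth produces high accelerations at modest speed and therefore a low value, while a flowing motion at the same speed scores higher.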
Below, you can see a rolling average of smoothness over time for the different runs.

The faster you finish the drill, the better. Here, we could have just measured the time from start to finish of the drill. However, for this PoC, we didn’t implement detection of when the exercise starts or finishes, so we use the average horizontal ball speed as a proxy instead. The faster, the better.
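A minimal sketch of this proxy, assuming per-frame horizontal ball positions in metres at a constant frame rate and assuming that "speed" means the absolute value of the horizontal velocity (so movement in either direction counts). The function name is hypothetical.

```python
def average_horizontal_speed(ball_x, fps):
    """Mean absolute horizontal ball speed in m/s."""
    dt = 1.0 / fps
    speeds = [abs(x1 - x0) / dt for x0, x1 in zip(ball_x, ball_x[1:])]
    return sum(speeds) / len(speeds)
```

Because it averages over whatever frames are in the video, this value does not depend on knowing exactly when the drill starts or ends.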
Below, you can see speed over time for the different runs. The vertical line indicates the average speed.

For each of the performance indicators, we empirically chose parameters that map the raw metric value to a score between 0 and 10 such that the scores align with our data. For example, the Elastico Tip-Tap ideally requires around 24 ball contacts, yielding a 10/10. 30+ ball contacts are pretty bad and are ranked as 0/10.
Finally, the scores are averaged to an overall score.
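To make the mapping concrete, here is a sketch that assumes a linear interpolation between a "best" and a "worst" raw value, clamped to the range 0 to 10. The linear shape is an assumption for illustration; only the two anchor points for ball contacts (24 and 30) come from the text above. Function names are hypothetical.

```python
def linear_score(value, best, worst):
    """Map a raw metric to [0, 10]: `best` gives 10, `worst` gives 0.

    Works for metrics where lower is better (best < worst) and for
    metrics where higher is better (best > worst).
    """
    fraction = (value - worst) / (best - worst)
    return 10.0 * max(0.0, min(1.0, fraction))


def overall_score(scores):
    """Average the category scores into one overall score."""
    return sum(scores.values()) / len(scores)


# Efficiency example with the anchor points from the text.
efficiency = linear_score(27, best=24, worst=30)  # halfway between the anchors
```

Under this assumption, 27 ball contacts land halfway between the anchors, and anything at or beyond 30 contacts is clamped to 0.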
