Validation of pose estimation models

What is it?

We developed a validation tool for assessing the accuracy, reliability and performance of pose estimation models. Pose estimation is a computer vision task whose aim is to locate the keypoints of objects or entities in an image or a video. In the case of human pose estimation, the keypoints are usually the body joints and facial features.
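To make the keypoint format concrete, the snippet below sketches how a single detected person is typically represented in the COCO keypoint convention that the tool later relies on; the keypoint names and ordering follow the COCO standard, while the coordinate values are made up for illustration.

# COCO-style human pose: 17 named keypoints, each stored as an (x, y, visibility) triple.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# In COCO annotations the keypoints are flattened into [x1, y1, v1, x2, y2, v2, ...],
# where v = 0 means not labelled, 1 labelled but not visible, 2 labelled and visible.
# The numbers below are illustrative, not taken from a real annotation.
example_person = {
    "keypoints": [
        320.0, 110.0, 2,  # nose
        330.0, 102.0, 2,  # left_eye
        310.0, 102.0, 2,  # right_eye
        # ... the remaining 14 keypoints follow in the order listed above
    ],
    "num_keypoints": 17,
}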

Why is it necessary?

Our aim is to develop and improve our spinal health assessment application. This includes improving the built-in machine learning model using the continuously gathered data. To accomplish this, the newly trained machine learning models must be evaluated and compared before we decide to push them to production.

Architecture of the case study system Vitarex

How does it work?

First, the model under assessment produces predictions on the validation datasets. The predicted keypoints are then compared to the ground truth keypoints using the COCOeval interface of pycocotools, which computes Object Keypoint Similarity (OKS) scores quantifying how close each predicted person is to its ground truth. Based on the OKS scores, average precision and average recall are calculated. After that, the coco-analyze evaluation is executed. This tool quantifies the impact of error types specific to pose estimation, again using the OKS scores: undetected keypoints, small and large deviations in keypoint positions, confusion between the left and right side of the body, and mixing up the body parts of different people. At the end of the evaluation, a PDF report is generated that contains the values of the calculated metrics. The whole validation process is integrated with the tracking feature of MLflow, so models trained with different hyperparameters can easily be compared in the MLflow UI.
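For reference, the OKS between a predicted and a ground-truth person is defined as OKS = sum_i exp(-d_i^2 / (2 * s^2 * k_i^2)) * [v_i > 0] / sum_i [v_i > 0], where d_i is the distance between the i-th predicted and ground-truth keypoint, s is the object scale, k_i is a per-keypoint constant and v_i is the visibility flag of the ground-truth keypoint. The sketch below shows, as a minimal example, how the COCOeval step can be wired to MLflow tracking; the file names, run name and metric names are illustrative placeholders rather than the ones used in the actual tool, and the coco-analyze step producing the PDF report is assumed to have run beforehand.

import mlflow
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Hypothetical file names, chosen for illustration only.
GT_ANNOTATIONS = "person_keypoints_val.json"   # ground-truth keypoints (COCO format)
PREDICTIONS = "predictions.json"               # model output in COCO result format
REPORT_PDF = "validation_report.pdf"           # report produced by the analysis step

# 1. Compare the predicted keypoints to the ground truth via COCOeval.
coco_gt = COCO(GT_ANNOTATIONS)
coco_dt = coco_gt.loadRes(PREDICTIONS)
evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the OKS-based AP/AR table

# evaluator.stats holds the summary values in a fixed order for keypoints:
# [AP, AP@.5, AP@.75, AP(M), AP(L), AR, AR@.5, AR@.75, AR(M), AR(L)]
stats = evaluator.stats

# 2. Log the metrics and the generated PDF report to MLflow so that models
#    trained with different hyperparameters can be compared in the MLflow UI.
with mlflow.start_run(run_name="pose-model-validation"):
    mlflow.log_metric("AP_OKS", float(stats[0]))
    mlflow.log_metric("AP_OKS_50", float(stats[1]))
    mlflow.log_metric("AP_OKS_75", float(stats[2]))
    mlflow.log_metric("AR_OKS", float(stats[5]))
    mlflow.log_artifact(REPORT_PDF)  # assumes the report already exists on disk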