Explanation of output from Human Pose Estimation Demo
I ran the Human Pose Estimation Demo with:
python human_pose_estimation.py -m human-pose-estimation-0004.xml -i <input_path> -at ae -d CPU -r
I cannot make sense of the output.
When the -r command-line option is used, the demo prints raw detection results. Each line of output represents the 17 keypoints detected for one person, followed by the overall pose confidence score.
The printing of raw inference results is implemented by the following code at lines 160-164 of human_pose_estimation_demo.py:

def print_raw_results(poses, scores):
    log.info('Poses:')
    for pose, pose_score in zip(poses, scores):
        pose_str = ' '.join('({:.2f}, {:.2f}, {:.2f})'.format(p[0], p[1], p[2]) for p in pose)
        log.info('{} | {:.2f}'.format(pose_str, pose_score))
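As a rough usage sketch of the function above (with dummy pose data, and Python's standard logging module standing in for the demo's log object):

```python
import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def print_raw_results(poses, scores):
    log.info('Poses:')
    for pose, pose_score in zip(poses, scores):
        pose_str = ' '.join('({:.2f}, {:.2f}, {:.2f})'.format(p[0], p[1], p[2])
                            for p in pose)
        log.info('{} | {:.2f}'.format(pose_str, pose_score))

# One dummy person: 17 keypoints, each a (x, y, joint confidence) triple.
rng = np.random.default_rng(0)
pose = np.column_stack((rng.uniform(0, 640, 17),   # dummy x coordinates
                        rng.uniform(0, 480, 17),   # dummy y coordinates
                        rng.uniform(0, 1, 17)))    # dummy joint confidences
poses = [pose]
scores = [pose[:, 2].mean()]  # pose score = mean of joint confidences

print_raw_results(poses, scores)  # logs 17 "(x, y, conf)" triples, then the score
```

This reproduces one log line of the demo's raw output: 17 formatted keypoint triples followed by the pose score.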
The format of the raw inference results is as follows:
- Each keypoint is printed as a tuple (X coordinate, Y coordinate, joint confidence score); see lines 131-132 of hpe_associative_embedding.py:
    # 2 is for x, y and 1 is for joint confidence
    self.pose = np.zeros((num_joints, 2 + 1 + tag_size), dtype=np.float32)
- The final value on each line is the pose score, the mean of all keypoints' joint confidence scores; see line 334 of hpe_associative_embedding.py:
    scores = np.asarray([i[:, 2].mean() for i in ans])
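A minimal sketch of the pose array layout and the score computation described above (dummy values throughout; num_joints and tag_size here are assumptions matching a 17-keypoint associative-embedding model):

```python
import numpy as np

num_joints = 17
tag_size = 1  # assumption: one associative-embedding tag value per joint

# Columns: x, y, joint confidence, then tag_size embedding values.
pose = np.zeros((num_joints, 2 + 1 + tag_size), dtype=np.float32)
pose[:, 0] = 320.0  # dummy x coordinates
pose[:, 1] = 240.0  # dummy y coordinates
pose[:, 2] = 0.5    # dummy per-joint confidence scores

# ans: the list of detected poses; each pose score is the mean of column 2.
ans = [pose]
scores = np.asarray([i[:, 2].mean() for i in ans])
print(scores)  # → [0.5]
```

Slicing pose[:, :2] gives the keypoint coordinates, pose[:, 2] the per-joint confidences, and pose[:, 3:] the embedding tags used for grouping joints into people.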
Refer to the Running the Human Pose Estimation Python Demo documentation.
Refer to decoder_ae.py for details on how the model output is decoded in the OpenVINO™ 2021.2 release.
Refer to the models class for details on how the model output is decoded in the OpenVINO™ 2021.3 release.
The number of keypoints in the printed raw detection results depends on the model used:
- human-pose-estimation-0001 (18 keypoints)
- human-pose-estimation-0005 (17 keypoints)
- human-pose-estimation-0006 (17 keypoints)
- human-pose-estimation-0007 (17 keypoints)
- higher-hrnet-w32-human-pose-estimation (17 keypoints)