This thesis is a first step towards a vision-based system for teleoperating cobots in real time using the tip of the user’s index finger.
This has been done experimentally by developing a ROS-based program that simultaneously (i) analyzes the hand gesture performed by the user, leveraging the OpenPose skeletonization algorithm, and (ii) moves the robot accordingly.
The measured position of the object at each point has been calculated by applying the pixel-to-meter conversion developed before. It has thus been possible to estimate the positional error as the difference between the measured position and the real position of the bottom-left corner of the object in each pose.
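The error computation described above can be sketched as follows. This is a minimal illustration, not the conversion actually developed in the thesis: it assumes a simple linear pixel-to-meter mapping with hypothetical per-axis scale factors (`scale_x`, `scale_y`) and a hypothetical pixel origin, and it assumes that the reported displacement is the mean absolute difference per axis.

```python
from statistics import mean


def pixel_to_meter(px, py, scale_x, scale_y, origin_px=(0.0, 0.0)):
    """Convert a pixel position to meters.

    Assumption: the conversion is linear, with one scale factor per axis
    (meters per pixel) and an optional pixel origin. The real conversion
    used in the thesis may differ.
    """
    return ((px - origin_px[0]) * scale_x, (py - origin_px[1]) * scale_y)


def mean_abs_error(measured, real):
    """Mean absolute displacement per axis over all poses.

    `measured` and `real` are equally long sequences of (x, y) positions
    of the bottom-left corner of the object, in the same unit.
    """
    err_x = mean(abs(m[0] - r[0]) for m, r in zip(measured, real))
    err_y = mean(abs(m[1] - r[1]) for m, r in zip(measured, real))
    return err_x, err_y
```

Calling `mean_abs_error` on the measured and real corner positions of all measurement points yields one average displacement for the x-axis and one for the y-axis.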
As in the first test, a rectangular object measuring 73 x 48 x 0.5 mm has been placed at each of the 8 measurement points.
In this case the average displacement observed is 6.4 mm along the x-axis and 6.5 mm along the y-axis. This highlighted how heavily the perspective distortions affect the measurements: in fact, since the object is not tall enough to cast shadows on the plane, these errors are due only to the lens distortions.
Since the system uses OpenPose to extract the hand skeleton in real time, it has been necessary to define three hand gestures to detect according to the position of the keypoints (see Fig. 3, Fig. 4 and Fig. 5).
However, since OpenPose estimates the hand keypoints even when they are not visible, it has been necessary to define a filtering procedure that determines the output gesture according to some geometric constraints:
- the thumb must be present to correctly assign the keypoint numbers
- the distance between the start and end keypoints of the thumb must be between 20 and 50 mm
- the angle between the thumb and the x-axis must be > 90°
- the distance between the start and end keypoints of the index, middle and ring fingers must be between 20 and 100 mm
- the distance between the start and end keypoints of the pinky finger must be between 20 and 70 mm
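The geometric checks listed above can be sketched in a few lines. This is only an illustration under stated assumptions: keypoints are given as 21 (x, y) positions already converted to millimeters and indexed as in the OpenPose hand model (0 is the wrist, followed by four keypoints per finger); the "start" and "end" keypoints are taken to be the first and last keypoint of each finger; the limits are treated as inclusive; and the thumb angle is taken as the unsigned angle between the thumb vector and the positive x-axis. The function names are hypothetical.

```python
import math

# Assumed (start, end) keypoint indices per finger in the OpenPose hand model.
FINGERS = {
    "thumb": (1, 4),
    "index": (5, 8),
    "middle": (9, 12),
    "ring": (13, 16),
    "pinky": (17, 20),
}

# Allowed start-to-end distance per finger, in mm (values from the list above).
LENGTH_MM = {
    "thumb": (20, 50),
    "index": (20, 100),
    "middle": (20, 100),
    "ring": (20, 100),
    "pinky": (20, 70),
}


def finger_length(kp, finger):
    """Distance in mm between the start and end keypoints of a finger."""
    start, end = FINGERS[finger]
    return math.dist(kp[start], kp[end])


def thumb_angle_deg(kp):
    """Unsigned angle in [0, 180] between the thumb vector and the x-axis."""
    start, end = FINGERS["thumb"]
    dx = kp[end][0] - kp[start][0]
    dy = kp[end][1] - kp[start][1]
    return abs(math.degrees(math.atan2(dy, dx)))


def valid_fingers(kp):
    """Return the fingers that pass the checks; empty set if the thumb fails."""
    lo, hi = LENGTH_MM["thumb"]
    thumb_ok = lo <= finger_length(kp, "thumb") <= hi and thumb_angle_deg(kp) > 90
    if not thumb_ok:
        return set()
    return {f for f, (lo, hi) in LENGTH_MM.items() if lo <= finger_length(kp, f) <= hi}
```

The set returned by `valid_fingers` could then be matched against the fingers expected for each of the three gestures; that mapping is not reproduced here because it depends on the gesture definitions shown in the figures.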
The proposed gestures have been performed by three male students with pale skin at different times of the day (morning, afternoon, late afternoon). The purpose of this test was to determine whether the system could robustly detect the gestures under varying scene illumination.
The students moved their hand around the user frame and performed the gestures one at a time. It resulted that, on average, the proposed gestures were recognized 90% of the time.
It is worth noting that the “positioning” gesture was defined so as to reduce the keypoint misclassification that can occur when only one finger (the index) is present. In fact, it has been observed that increasing the number of fingers clearly visible in the scene also increases the recognition accuracy of the gesture. This is probably because the thumb must be present to avoid misclassification with the index finger. However, even though the “positioning” gesture uses three fingers, only the position of the index finger’s tip (keypoint 8) is used to estimate the position at which the user is pointing.
COMPLETE TELEOPERATION SYSTEM
The complete system is composed of (i) the gesture recognition ROS node, which detects the gesture, and (ii) a robot node that moves the Sawyer cobot accordingly. Since the robot workspace is vertical (as shown in Fig. 6), it has been necessary to calibrate the vertical workspace with respect to the robot user frame. This has been done using a centering tool to build the calibration matrix from pairs of corresponding points (robot frame coordinates and vertical workspace coordinates).
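As an illustration of how a calibration matrix can be built from such point pairs, the sketch below fits an affine map from 2D workspace coordinates (u, v) to 3D robot frame coordinates (x, y, z). It is not the procedure used in the thesis: it assumes the minimal case of exactly three non-collinear pairs, solved exactly with Cramer's rule, whereas a larger set of pairs would normally call for a least-squares fit. The function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_calibration(workspace_pts, robot_pts):
    """Fit M such that [x, y, z]^T = M [u, v, 1]^T.

    workspace_pts: three (u, v) points in the vertical workspace.
    robot_pts: the three corresponding (x, y, z) points in the robot frame.
    Assumption: exactly three non-collinear pairs are provided.
    """
    A = [[u, v, 1.0] for u, v in workspace_pts]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("calibration points are collinear")
    M = []
    for axis in range(3):
        rhs = [p[axis] for p in robot_pts]
        row = []
        for col in range(3):
            # Cramer's rule: replace one column of A with the right-hand side.
            Ac = [r[:] for r in A]
            for k in range(3):
                Ac[k][col] = rhs[k]
            row.append(det3(Ac) / d)
        M.append(row)
    return M


def workspace_to_robot(M, u, v):
    """Map a workspace point (u, v) to robot frame coordinates (x, y, z)."""
    return tuple(r[0] * u + r[1] * v + r[2] for r in M)
```

With such a matrix, the position pointed at by the index fingertip, once expressed in workspace coordinates, can be mapped to a target for the robot node through `workspace_to_robot`.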