Train a gesture recognition model with YOLOv4 and use it to control a robotic arm in smart construction, addressing the labor shortage in the construction industry
video.2021-11-29.12-11-41.mp4
-
Issue
- To alleviate the serious labor shortage in Taiwan’s construction industry
- To address the problem that workers are unwilling to take on heavy rough work (e.g., nailing formwork, cutting formwork, tying rebar, etc.)
-
Solution
- Use robotic arms to replace manpower: a large share of manual work can be handled by controlling the arm and writing operation scripts for it.
- Take advantage of the robotic arm's ability to lift heavy objects to cover rough labor for which no workers are available.
-
Construction site environment
-
Robotic Arm Construction System --- Gesture Recognition
In the second scenario, the scene is a relatively noisy construction site where speech cannot be picked up reliably, so gesture recognition is used instead to execute a pre-written script, such as the path for stacking bricks.
-
Building materials wholesale factory environment
-
Robotic Arm Handling System --- Speech Recognition
In the first scenario, workers' hands are occupied with other tasks during handling in the factory, so it is inconvenient to gesture. In this case, voice recognition is used to instruct the robotic arm to move building materials back and forth between two points.
-
| Command | Gesture | Purpose |
| --- | --- | --- |
| Move to material point | ☝ | The robotic arm moves to the starting position where materials are picked up |
| Move to destination | 🤙 | The robotic arm moves to the destination where the material is stacked |
| Start operation | 🖐 | Activate the robotic arm (servo on) |
| End operation | ✋ | Shut down the robotic arm (servo off) |
| Down | 👎 | Move the robotic arm down to the material position |
| Clamp | 🤏 | Close the gripper of the robotic arm |
| Drop | ✌ | Open the gripper of the robotic arm |
| Confirm | 👌 | Confirm execution of the voice command |
| Pause | ✊ | Pause the current command action |
| Cancel | 🤞 | Cancel the current voice command and re-identify |
-
Situation simulation:
- Suppose materials need to be moved from point A (x=10, y=10, z=0) to point B (x=20, y=20, z=0) in the factory handling scenario.
-
Operation process:
-
Manually enter the material point and destination point positions on the user interface, and click the initialization commands, such as servo on.
-
After the equipment is ready, perform the following gesture sequence in front of the camera:
"☝" -> "👎" -> "✊" -> "✌" ->"🤏" -> "🤙" -> "👎" -> "✊" -> "✌"
-
Training with Google Colab and Kaggle
-
Log in to Google Drive and retrieve the image, XML annotation, and .py files
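In a Colab cell, the dataset and scripts on Google Drive can be mounted as a local path; the project folder name below is only an example:

```python
from google.colab import drive

# Mount Google Drive so the images, XML annotations and .py files are visible to the notebook
drive.mount('/content/drive')

# Change into the project folder on Drive (folder name is an assumption)
%cd /content/drive/MyDrive/yolov4-gesture
```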
-
Modify the corresponding file paths in train_add_plot.py
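The exact variable names live in train_add_plot.py and may differ, but the paths that typically need to be pointed at the Drive copy look roughly like this (all values below are assumptions):

```python
# Illustrative only -- check train_add_plot.py for the real variable names.
annotation_path = '/content/drive/MyDrive/yolov4-gesture/train.txt'           # image list + boxes
classes_path    = '/content/drive/MyDrive/yolov4-gesture/gesture_classes.txt' # gesture class names
weights_path    = '/content/drive/MyDrive/yolov4-gesture/yolo4_weights.h5'    # pretrained weights
log_dir         = '/content/drive/MyDrive/yolov4-gesture/logs/'               # checkpoints and plots
```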
-
During the first half of training (freeze training), one epoch takes about 12 minutes; early stopping ends this stage at epoch 19.
-
During the second half of training, one epoch takes about 15 to 20 minutes.
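The freeze/unfreeze split described above is the usual two-stage Keras schedule. A minimal sketch of the pattern follows, where the model and data generators come from the training script; the layer count, epoch counts, and learning rates are assumptions rather than the script's exact values:

```python
from keras.optimizers import Adam

def two_stage_train(model, train_gen, val_gen, steps, val_steps, callbacks,
                    freeze_layers=249):
    """Sketch of the freeze-then-unfreeze schedule (values are assumptions)."""
    # Stage 1: freeze the backbone, train only the YOLO head
    for layer in model.layers[:freeze_layers]:
        layer.trainable = False
    model.compile(optimizer=Adam(1e-3),
                  loss={'yolo_loss': lambda y_true, y_pred: y_pred})
    model.fit_generator(train_gen, steps_per_epoch=steps, epochs=50,
                        validation_data=val_gen, validation_steps=val_steps,
                        callbacks=callbacks)  # early stopping ended this stage at epoch 19

    # Stage 2: unfreeze everything and fine-tune with a smaller learning rate
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=Adam(1e-4),
                  loss={'yolo_loss': lambda y_true, y_pred: y_pred})
    model.fit_generator(train_gen, steps_per_epoch=steps, initial_epoch=50, epochs=100,
                        validation_data=val_gen, validation_steps=val_steps,
                        callbacks=callbacks)
```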
-
Prepare the corresponding environment
-
Kaggle's default CUDA version is 11.0, while tensorflow-gpu 1.13.2 requires CUDA 10.0, which needs to be installed.
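The framework downgrade itself can be done at the top of the notebook before training. The keras/h5py versions below are a combination commonly paired with tensorflow-gpu 1.13.2 and are an assumption rather than the project's exact pins (the CUDA 10.0 toolchain setup is not shown here):

```python
# Pin the framework versions the YOLOv4 training code expects (versions are assumptions).
!pip install -q tensorflow-gpu==1.13.2 keras==2.1.5 h5py==2.10.0

import tensorflow as tf
import keras
print(tf.__version__, keras.__version__)  # verify the downgrade before launching training
```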
-
On Kaggle, one epoch in the second half of training takes about 7 minutes, and 20 epochs take about 3 hours (including environment installation).
-
Final result: open and run yolo.py first, then video.py.
Spyder.Python.3.6.2021-11-29.12-10-22.mp4
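For reference, video.py in the common keras-YOLO demos runs the trained detector frame by frame with OpenCV; the YOLO class and detect_image() call below follow that convention and are assumptions about this project's yolo.py:

```python
# Rough sketch of a video.py-style loop: run the trained detector on each camera frame.
import cv2
import numpy as np
from PIL import Image
from yolo import YOLO   # the project's detector wrapper (interface assumed here)

yolo = YOLO()                  # loads the trained gesture weights defined inside yolo.py
capture = cv2.VideoCapture(0)  # 0 = default camera

while True:
    ok, frame = capture.read()
    if not ok:
        break
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    result = np.array(yolo.detect_image(image))           # draw gesture boxes and labels
    cv2.imshow("gesture", cv2.cvtColor(result, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):                 # press q to quit
        break

capture.release()
cv2.destroyAllWindows()
```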
- When using Colab for the first half, the webpage has to stay open, which affects the laptop's performance; the second half was moved to Kaggle training to solve this problem.
- Because training was not executed all at once, the curves cannot be drawn from the history object, and plt.plot output does not show up in the log. Training from scratch on Kaggle and saving the plots as image files instead should solve this and yield the accuracy and loss figures (see the sketch after this list).
- The default environment versions on Colab and Kaggle are newer than those required for YOLO training, so mismatched function arguments cause errors; commands must be added to install the matching versions and downgrade tensorflow and keras before running.
- Colab's GPU quota is not clearly documented; there have been cases where two accounts were restricted from using the GPU.
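One way around the plotting issue mentioned above is to write the curves to image files instead of relying on an interactive plt.plot. A minimal sketch of that idea, assuming the training loop records per-epoch losses in plain lists (the function and file names are placeholders):

```python
import matplotlib
matplotlib.use('Agg')              # headless backend: no display needed on Kaggle
import matplotlib.pyplot as plt

def save_curves(train_loss, val_loss, out_path='loss.png'):
    """Save loss curves as an image file instead of showing them in the log."""
    epochs = range(1, len(train_loss) + 1)
    plt.figure()
    plt.plot(epochs, train_loss, label='train loss')
    plt.plot(epochs, val_loss, label='val loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend()
    plt.savefig(out_path)          # the saved file appears in the Kaggle output files
    plt.close()
```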
Dataset can be expanded
- The gesture dataset is too simple, so recognition is limited and inaccurate: gestures are only recognized against a white background and from certain angles.
- The current model was trained with too little gesture data; in the future, more gestures from different people can be added to make training more accurate.