Scene Labeling from Camera Plugin
Hello,
I am attempting to generate a dataset using the camera plugin that I have on the robot. The task is essentially to generate a semantic mask for each class of object while driving the robot around in random patterns.
My initial approach was the following:
- Use the `/gazebo/link_states` topic to retrieve the poses of the objects in the scene.
- Knowing the camera properties from the `camera_info` topic, use a geometric approach to figure out where an object of class C at pose (X, Y, Z) falls in the image frame and label it accordingly (see the projection sketch below).
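For reference, this is a minimal sketch of that geometric step, assuming you already have the intrinsic matrix K from `camera_info` and the camera's world pose (e.g. from `/gazebo/link_states` or TF). The numbers and the frame convention in the example are illustrative only, not taken from the VRX setup:

```python
# Minimal pinhole-projection sketch (numpy only).
# fx, fy, cx, cy come from the 3x3 K matrix in camera_info;
# the camera pose in the world frame would come from /gazebo/link_states or TF.
import numpy as np

def world_to_pixel(p_world, cam_pos, cam_rot, K):
    """Project a 3D world point into pixel coordinates.

    p_world : (3,) point in the world frame
    cam_pos : (3,) camera position in the world frame
    cam_rot : (3,3) rotation matrix, world -> camera optical frame
              (x right, y down, z forward, as camera_info assumes)
    K       : (3,3) intrinsic matrix from camera_info
    Returns (u, v) or None if the point is behind the camera.
    """
    p_cam = cam_rot @ (np.asarray(p_world) - np.asarray(cam_pos))
    if p_cam[2] <= 0.0:          # behind the image plane
        return None
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Example with made-up numbers: a 640x480 camera at the world origin
# looking down the world x axis.
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
# Rotation mapping world axes (x fwd, y left, z up) to the optical frame.
R = np.array([[0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0],
              [1.0,  0.0,  0.0]])
print(world_to_pixel([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], R, K))  # ~ image center (320, 240)
```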
The issue with this approach is that the pose alone isn't enough to generate a mask of the scene: it gives me a single point per object, not the object's silhouette, extent, or occlusion by other objects.
I figured that there might be a way in which the world is projected/cast into the camera plugin. I was hoping to figure out how this projection works under the hood for the camera plugin and see whether there is a way to also extract the class of the objects that are cast onto the image plane.
I am wondering if this sounds like a viable plan, and if anyone has any information on how the camera plugin is implemented.
In addition, if anyone has experience with dataset generation/labeling in Gazebo, any insight would be greatly appreciated.
I am using Gazebo 11 for this task and my robot is the VRX robot: https://github.com/osrf/vrx
Thank you.