Intermittent segmentation fault, possibly caused by a custom WorldPlugin attaching and detaching a child link
My goal is to teleport a robot to an arbitrary pose in a 0-gravity world, while keeping the robot in a fixed pose after teleportation. I need to do the teleport a bunch of times.
I’ve tried many ways to do this, and here is the only way that does the task, but I get intermittent segmentation faults after running the simulation for a while.
I am doing the teleport by attaching and detaching a joint between the world link and the base link of the robot. Attaching the robot to world makes the robot fixed. Detaching the joint allows the robot to be moved freely.
The teleport is done by a WorldPlugin. It handles a rosservice (among others) that requests a detach or an attach, and then calls `physics::Joint::Attach(world_link, base_link)` or `physics::Joint::Detach()`. The two links are found by `physics::WorldPtr->GetByName()` and dynamically cast to `physics::LinkPtr`.
The intermittent seg fault ALWAYS happens after a Detach() has been done and the rosservice has returned successfully. But the seg fault doesn’t happen within my rosservice handler function inside the WorldPlugin. It happens somewhere else in the simulation loop, which leaves me very puzzled, with nowhere to start debugging!
I tried many things to fix this, and I’m still stuck with one big question: what causes the seg fault? Is it really from calling `Attach()` and `Detach()` too many times? It happens after the simulation has run for 5 to 10 minutes or more, during which I’ve attached and detached many times.
Here is a backtrace from gdb:
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x290b of process 11318]
0x0000000100600612 in dQMultiply3 ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
(gdb) bt
#0 0x0000000100600612 in dQMultiply3 ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#1 0x0000000100617541 in getHingeAngle(dxBody*, dxBody*, double*, double*) ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#2 0x000000010061569f in dxJointHinge::getInfo1(dxJoint::Info1*) ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#3 0x00000001005f392a in dxQuickStepper(dxWorldProcessContext*, dxWorld*, dxBody* const*, int, dxJoint* const*, int, double) ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#4 0x000000010060ea36 in dxProcessIslands(dxWorld*, double, void (*)(dxWorldProcessContext*, dxWorld*, dxBody* const*, int, dxJoint* const*, int, double)) ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#5 0x00000001005e8229 in dWorldQuickStep ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#6 0x00000001005202c1 in gazebo::physics::ODEPhysics::UpdatePhysics() ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics_ode.2.dylib
#7 0x000000010038483d in gazebo::physics::World::Update() ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#8 0x0000000100383864 in gazebo::physics::World::Step() ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#9 0x0000000100382d66 in gazebo::physics::World::RunLoop() ()
from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#10 0x00000001019ec6c5 in boost::(anonymous ...
```
About how many attaches / detaches happen during these 10 minutes?
AndreiHaidu: It varies. It looks like ~10 detaches / attaches per minute. I ran the program 11 times, without changing any code, each time until it seg faulted. These are the numbers I got: 65 detaches / 64 attaches, 36/35, 28/27, 66/65, 16/15, 11/10, 25/24, 109/108, 22/21, 2/1, 96/95. I’ve never seen it die after just 2/1 before; that is rare. It also doesn’t always seg fault in the same place as dQMultiply3() above; another backtrace shows it in setBall(). Any ideas?
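To put those run counts in perspective, here is a minimal sketch that summarizes the detach and attach numbers listed above. It is only bookkeeping over the reported data; the struct and function names (`CrashStats`, `Summarize`, and so on) are made up for illustration and are not part of Gazebo or ROS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of how many Detach() calls preceded each crash.
struct CrashStats {
  int min;
  int max;
  double mean;
  int median;  // middle element (lower middle for an even count)
};

// Returns min, max, mean, and median of the given counts.
// The vector is taken by value so it can be sorted locally.
CrashStats Summarize(std::vector<int> counts) {
  assert(!counts.empty());
  std::sort(counts.begin(), counts.end());
  const double sum = std::accumulate(counts.begin(), counts.end(), 0.0);
  return CrashStats{counts.front(), counts.back(),
                    sum / static_cast<double>(counts.size()),
                    counts[(counts.size() - 1) / 2]};
}

// Detach counts from the 11 runs reported in the thread, in the order listed.
std::vector<int> ReportedDetachCounts() {
  return {65, 36, 28, 66, 16, 11, 25, 109, 22, 2, 96};
}

// Attach counts from the same 11 runs, in the same order.
std::vector<int> ReportedAttachCounts() {
  return {64, 35, 27, 65, 15, 10, 24, 108, 21, 1, 95};
}

// True if every run ended with exactly one more detach than attach,
// i.e. the last operation before each crash was a detach.
bool EveryRunEndedAfterDetach(const std::vector<int>& detaches,
                              const std::vector<int>& attaches) {
  if (detaches.size() != attaches.size()) return false;
  for (std::size_t i = 0; i < detaches.size(); ++i) {
    if (detaches[i] != attaches[i] + 1) return false;
  }
  return true;
}
```

For the reported runs, `Summarize(ReportedDetachCounts())` gives a minimum of 2, a maximum of 109, a median of 28, and a mean of about 43.3 detaches before the crash. `EveryRunEndedAfterDetach` returns true for the two lists, which matches the observation that the seg fault always follows a Detach().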
Well, it does sound like a bug. Which Gazebo version are you using?
Gazebo 2.2.6, because I'm using ROS Indigo. Do you mean a bug in my code or in Gazebo? I'm hoping for a bug in my code, because I need this to work as soon as possible... Otherwise I can try a different Gazebo version that works with ROS Indigo... not sure what else is compatible. I see that it's possible to use 4, 5, or 6 if installed from source.
I would say a bug in Gazebo. Since you are calling a ROS service to attach and detach joints, there is nothing else you can do, especially as it crashes only after some time. I would suggest trying out the newer Gazebo versions; last time I managed to get Gazebo 5 running with ROS.
Tried in ROS Jade with Gazebo 5.1.0. Seg fault happened in a similar way... Do you have any clues on which physics:: object I should look at? Perhaps Joint, or is it something in a bigger scope? I'm contemplating whether to try to fix this by installing Gazebo from source. Otherwise I might have to hunt for a simulator other than Gazebo, which would be redoing a lot of work... Or, perhaps there's some other way to make a robot fixed to the world, without using Attach()?
Maybe it's worth doing this via a Gazebo plugin; there are examples of attaching and detaching joints in the Gripper example: https://bitbucket.org/osrf/gazebo/src/5299916de5cce4a5f3baf8acf75cf6bff68f8184/gazebo/physics/Gripper.cc?at=default&fileviewer=file-view-default#Gripper.cc-251 Maybe it is even worth comparing this code with the ROS plugin version, which listens to the service call for creating the joint.
Hmm I am actually already using a Gazebo plugin, the WorldPlugin, which is how I access physics::World and everything. My rosservices are triggered by calling ros::spinOnce() inside OnUpdate() of the WorldPlugin. I'm not sure what a ROS plugin is. Anyway, I made my code look like Gripper.cc, so instead of Attach(), I call CreateJoint() once at the beginning, then Load() and Init() in the rosservice. But seg faults still happened, still after Detach(). Gripper.cc uses Detach() too.
I found a workaround. The objective is to avoid using Detach(), since the seg faults always come after it. I simply keep the robot fixed by calling SetWorldPose() on its base link in OnUpdate(). This replaces the need to Attach() and Detach() the robot to a fixed joint. I can still teleport it and keep it fixed after the teleport, while its joints can still move freely, just like before. The only difference is, no more seg faults. Thank you so much Andrei for your prompt replies! I will comment if it seg faults again.
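The idea behind the workaround can be sketched without any Gazebo headers. The code below is a minimal illustration, not the actual plugin: `Pose`, `FakeLink`, and `PosePin` are hypothetical stand-ins, and `Drift()` merely imitates whatever the physics step would do to an unpinned body. In a real WorldPlugin, the link would be the robot's base link and `OnUpdate()` would be the plugin's update callback mentioned above.

```cpp
#include <cassert>

// Stand-in for a pose (position only, for brevity); NOT a Gazebo type.
struct Pose {
  double x, y, z;
};

// Stand-in for the robot's base link; NOT gazebo::physics::Link.
struct FakeLink {
  Pose pose{0.0, 0.0, 0.0};

  // Mirrors the role of SetWorldPose() in the workaround described above.
  void SetWorldPose(const Pose& p) { pose = p; }

  // Imitates the physics step nudging an unpinned body along x.
  void Drift(double dx) { pose.x += dx; }
};

// Hypothetical helper that holds a link at a target pose without any joint.
class PosePin {
 public:
  explicit PosePin(FakeLink* link) : link_(link), target_{0.0, 0.0, 0.0} {
    assert(link_ != nullptr);
    target_ = link_->pose;  // start by pinning the link where it already is
  }

  // Teleport: store the new target and apply it immediately.
  void Teleport(const Pose& p) {
    target_ = p;
    link_->SetWorldPose(target_);
  }

  // Call once per simulation update; overwrites any drift since the last call.
  void OnUpdate() { link_->SetWorldPose(target_); }

 private:
  FakeLink* link_;  // not owned
  Pose target_;
};
```

With this structure, a teleport is just a change of the stored target, and no joint is ever created or destroyed, so neither Attach() nor Detach() is called.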
Glad it worked! When you have the time, you should probably write an answer with the workaround and mark it as solved. It might help others as well. A bug report that points to this question would also help: `https://bitbucket.org/osrf/gazebo/issues/new`