Gazebo | Ignition | Community
Ask Your Question
1

Intermittent segmentation fault possibly by custom WorldPlugin attaching and detaching child

asked 2015-12-03 03:36:00 -0600

mz gravatar image

My goal is to teleport a robot to an arbitrary pose in a 0-gravity world, while keeping the robot in a fixed pose after teleportation. I need to do the teleport a bunch of times.

I’ve tried many ways to do this, and here is the only way that does the task, but I get intermittent segmentation faults after running the simulation for a while.

I am doing the teleport by attaching and detaching a joint between the world link and the base link of the robot. Attaching the robot to world makes the robot fixed. Detaching the joint allows the robot to be moved freely.

The teleport is done by a WorldPlugin. It subscribes to a rosservice (among others) that says detach or attach. It then calls physics::Joint::Attach(world_link, base_link) or physics::Joint::Detach(). The two links are found by physics::WorldPtr->GetByName() and dynamically casted to physics::LinkPtr.

The intermittent seg fault ALWAYS happens after a Detach() has been done and the rosservice has returned successfully. But the seg fault doesn’t happen within my rosservice handler function inside the WorldPlugin. It happens somewhere else in the simulation loop, which leaves me very puzzled and have nowhere to start debugging!

I tried many things to fix this, and I’m still stuck with one big question, what is the seg fault caused by? Is it really from calling Attach() and Detach() too much? It happens after running simulation for more than 5 to 10 minutes, during which I’ve attached and detached many times.

Here is a backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x290b of process 11318]
0x0000000100600612 in dQMultiply3 ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib

(gdb) bt
#0  0x0000000100600612 in dQMultiply3 ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#1  0x0000000100617541 in getHingeAngle(dxBody*, dxBody*, double*, double*) ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#2  0x000000010061569f in dxJointHinge::getInfo1(dxJoint::Info1*) ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#3  0x00000001005f392a in dxQuickStepper(dxWorldProcessContext*, dxWorld*, dxBody* const*, int, dxJoint* const*, int, double) ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#4  0x000000010060ea36 in dxProcessIslands(dxWorld*, double, void (*)(dxWorldProcessContext*, dxWorld*, dxBody* const*, int, dxJoint* const*, int, double)) ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#5  0x00000001005e8229 in dWorldQuickStep ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_ode.2.dylib
#6  0x00000001005202c1 in gazebo::physics::ODEPhysics::UpdatePhysics() ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics_ode.2.dylib
#7  0x000000010038483d in gazebo::physics::World::Update() ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#8  0x0000000100383864 in gazebo::physics::World::Step() ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#9  0x0000000100382d66 in gazebo::physics::World::RunLoop() ()
   from /usr/local/Cellar/gazebo2/2.2.6/lib/libgazebo_physics.2.dylib
#10 0x00000001019ec6c5 in boost::(anonymous ...
(more)
edit retag flag offensive close merge delete

Comments

About how many attach / detaches happen during these 10 minutes?

AndreiHaidu gravatar imageAndreiHaidu ( 2015-12-03 03:46:30 -0600 )edit

AndreiHaidu: It varies. Looks like ~10 detaches / attaches per minute. I ran the program 11 times, without changing any code, each time until seg fault. These are the numbers I get: 65 detaches / 64 attaches, 36/35, 28/27, 66/65, 16/15, 11/10, 25/24, 109/108, 22/21, 2/1, 96/95. I’ve actually never seen it die after just 2/1 times before, that is rare. It also doesn’t always seg fault in the same place like dQMultiply3() above; another backtrace shows in setBall(). Any ideas?

mz gravatar imagemz ( 2015-12-03 17:16:18 -0600 )edit

well it does sound like some bug, which gazebo version are you using?

AndreiHaidu gravatar imageAndreiHaidu ( 2015-12-04 03:36:38 -0600 )edit

Gazebo 2.2.6, because I'm using ROS Indigo. Do you mean bug in my code or in Gazebo? Hoping for bug in my code because I need this to work as soon as possible... Otherwise I can try a different Gazebo version that likes ROS Indigo... not sure what else is compatible. I see that it's possible to use 4 5 or 6 if installed from source.

mz gravatar imagemz ( 2015-12-04 11:13:58 -0600 )edit

I would say bug in gazebo, since you are calling a ros service to attach detach joints, there is nothing else you can do, and especially that it crashes only after some time. I would suggest trying out the new gazebo versions, last time I managed to get gazebo 5 running with ROS.

AndreiHaidu gravatar imageAndreiHaidu ( 2015-12-05 03:49:35 -0600 )edit

Tried in ROS Jade with Gazebo 5.1.0. Seg fault happened in a similar way... Do you have any clues on which physics:: object I should look at? Perhaps Joint, or is it something in a bigger scope? I'm contemplating whether to try to fix this by installing Gazebo from source. Otherwise I might have to hunt for a simulator other than Gazebo, which would be redoing a lot of work... Or, perhaps there's some other way to make a robot fixed to the world, without using Attach()?

mz gravatar imagemz ( 2015-12-05 23:52:06 -0600 )edit

Maybe its worth doing this via a gazebo plugin plugin, there are examples on attaching and detaching joints in the Gripper example: https://bitbucket.org/osrf/gazebo/src/5299916de5cce4a5f3baf8acf75cf6bff68f8184/gazebo/physics/Gripper.cc?at=default&fileviewer=file-view-default#Gripper.cc-251 Maybe it is even worth comparing this code with the ROS plugin version which listens to the service call for creating the joint

AndreiHaidu gravatar imageAndreiHaidu ( 2015-12-07 02:31:48 -0600 )edit

Hmm I am actually already using a Gazebo plugin, the WorldPlugin, which is how I access physics::World and everything. My rosservices are triggered by calling ros::spinOnce() inside OnUpdate() of the WorldPlugin. I'm not sure what a ROS plugin is. Anyway, I made my code look like Gripper.cc, so instead of Attach(), I call CreateJoint() once at the beginning, then Load() and Init() in the rosservice. But seg faults still happened, still after Detach(). Gripper.cc uses Detach() too.

mz gravatar imagemz ( 2015-12-08 02:01:00 -0600 )edit

I found a workaround. The objective is to avoid using Detach(), since seg faults are always after it. I simply keep the robot fixed by SetWorldPose() on its base link, in OnUpdate(). This replaces the need to Attach() and Detach() the robot to a fixed joint. I can still teleport it, and keep it fixed after teleport, while its joints can still move freely, just like before. Only difference is, no more seg faults. Thank you so much Andrei for your prompt replies! I will comment if seg fault again.

mz gravatar imagemz ( 2015-12-08 02:07:44 -0600 )edit

Glad it worked! When you have the time you should probably write an answer with the workaround and mark it as solved. It might help others as well. Also a bug report where you point to this question can help as well. `https://bitbucket.org/osrf/gazebo/issues/new`

AndreiHaidu gravatar imageAndreiHaidu ( 2015-12-08 03:39:20 -0600 )edit

2 Answers

Sort by » oldest newest most voted
1

answered 2015-12-24 01:37:26 -0600

mz gravatar image

updated 2015-12-24 01:42:59 -0600

For my specific purpose, which is to teleport a robot, where the robot is kept fixed after teleportation, I’ve found a workaround.

The cause of segmentation fault was likely the call to physics::joint::Detach(). So I avoided calling Detach(). Then there were no more segmentation faults. (Even when I only kept one Detach() call in the entire duration of the Gazebo WorldPlugin, I still had segmentation faults from time to time. So I removed the calls entirely, and then no more seg faults. This further points to Detach() as the problem.)

I still use the Gazebo WorldPlugin, but instead of using Attach() and Detach() to let the robot loose from a fixed "world" link for teleportation, I don’t connect the robot to the world in the URDF at all. I allow the robot to float in space, and I force it still by calling SetWorldPose(teleport_pose) in the OnUpdate() function of the WorldPlugin.

The goal pose for teleportation is passed in from a custom rosservice call. Then OnUpdate() continuously sets the robot’s pose to this goal, and twist to 0 (in case of bumping into something and sliding away in 0 gravity). This effectively achieves my goal, free of seg faults.

Code snippet:

void OnUpdate (const common::UpdateInfo & /*_info*/)
{
  physics::ModelPtr model;

  // Find the model in the Gazebo world
  physics::BasePtr model_genericType = world_ -> GetByName (
    model_name);
  if (model_genericType)
    model = boost::dynamic_pointer_cast <physics::Model> (model_genericType);
  else
    fprintf (stderr, "Did not find model %s.\n", model_name.c_str ());

  if (model)
  {
    //math::Pose pose = ...  // Set some goal pose to teleport to. I set this via a custom rosservice call
    model -> SetWorldTwist (math::Vector3 (), math::Vector3 ());
    model -> SetWorldPose (pose);
  }

  // Check if a rosservice call comes in
  ros::spinOnce ();
}

void Load (physics::WorldPtr _world, sdf::ElementPtr _sdf)
{
  // Check ROS is initialized, subscribe to rosservice, etc. ...

  world_ = _world;
  this -> updateConnection_ = event::Events::ConnectWorldUpdateBegin(
      boost::bind (&MyWorldPlugin::OnUpdate, this, _1));
}
edit flag offensive delete link more
1

answered 2020-01-05 05:10:24 -0600

bwibrin gravatar image

Hello everyone,

I'm coming back with a solution to the problem mentioned in this post regarding the segmentation fault occurring randomly when calling the function physics::Joint::Detach().

After some investigation, I realized that the segmentation fault occurred when the call to the Detach function was done during an update of the physics engine. In this situation, a joint that was present in the beginning of the iteration of the physics engine is suddenly erased by the call to the Detach function, resulting in a segmentation fault.

To avoid this situation, you can use a workaround that consists in pushing the calls to the Detach function in a list and to execute effectively those Detach just before the physics engine is updated. This can be done using an event of type ConnectBeforePhysicsUpdate. We can hence link a callback function 'OnUpdate' that executes the Detach that are stored in a list, if any, just before the physics engine is updated. By doing so, we make sure that a call to Detach will no longer be executed during an update of the physics engine.

Code:

Add the following variables in the constructor of your class:

std::vector<fixedJoint> vect;
this->detach_vector = vect;
this->beforePhysicsUpdateConnection = Event::Events::ConnectBeforePhysicsUpdate(std::bind(&YourClass::OnUpdate, this));

In your function where you call Detach(), push back those calls in the detach_vector

bool YourClass::detach(std::string model1, std::string link1,
                                     std::string model2, std::string link2)
  {
      // search for the instance of joint and do detach
      fixedJoint j;
      if(this->getJoint(model1, link1, model2, link2, j)){
          this->detach_vector.push_back(j);
          ROS_INFO_STREAM("Detach joint request pushed in the detach vector");
          return true;
      }
    return false;
  }

(the function getJoint just returns the joint between the two links)

Create a callback function "OnUpdate" that will be executed each time just before the physics engine is updated, and where you effectively execute the call to Detach(), if there are any in the detach_vector:

void YourClass::OnUpdate()
  {
    if(!this->detach_vector.empty())
    {
      ROS_INFO_STREAM("Received before physics update callback... Detaching joints");
      std::vector<fixedJoint>::iterator it;
      it = this->detach_vector.begin();
      fixedJoint j;
      while (it != this->detach_vector.end())
      {
        j = *it;
        j.joint->Detach();
        ROS_INFO_STREAM("Joint detached !");
        ++it;
      }
      detach_vector.clear();
    }
  }

Hope this can help.

Best regards.

edit flag offensive delete link more

Comments

Thank you for investigating into the underlying cause of the seg fault and posting a solution!

mz gravatar imagemz ( 2020-06-24 23:39:44 -0600 )edit

Question Tools

1 follower

Stats

Asked: 2015-12-03 03:34:29 -0600

Seen: 1,499 times

Last updated: Dec 24 '15