Reinforcement Learning: Segmentation fault using rllab+gym+gym-gazebo
Hello,
I am trying to run a reinforcement learning task with Gazebo and a TurtleBot as my simulation environment. To do this I am using gym-gazebo, an extension of OpenAI gym that exposes Gazebo simulations as gym environments. To train my agent I am using rllab, which ships ready-made RL algorithms and is compatible with gym.
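For readers unfamiliar with this stack, the contract that links the pieces is the classic gym interface: reset() returns an observation and step(action) returns (observation, reward, done, info). The sketch below only illustrates that interface. It is not the actual environment: StubTurtlebotEnv, the five-value "laser" observation, and the reward numbers are hypothetical stand-ins, and the comments mark where a real gym-gazebo environment would talk to ROS/Gazebo.

```python
import random


class StubTurtlebotEnv:
    """Hypothetical stand-in for a gym-gazebo environment (no ROS/Gazebo needed)."""

    def __init__(self, max_steps=50, seed=0):
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._steps = 0

    def _observe(self):
        # A real environment would read sensor data (e.g. a laser scan) here.
        # Assumption: five made-up range readings in metres.
        return [self._rng.uniform(0.2, 5.0) for _ in range(5)]

    def reset(self):
        # A real environment would reset the simulated world here.
        self._steps = 0
        return self._observe()

    def step(self, action):
        # A real environment would send the action to the robot and
        # advance the simulation before reading the next observation.
        self._steps += 1
        obs = self._observe()
        crashed = min(obs) < 0.3            # made-up collision threshold
        reward = -10.0 if crashed else 1.0  # made-up reward values
        done = crashed or self._steps >= self.max_steps
        return obs, reward, done, {}


def rollout(env, policy):
    """Collect one trajectory the way a gym-compatible trainer would."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps


if __name__ == "__main__":
    env = StubTurtlebotEnv(max_steps=20, seed=1)
    print(rollout(env, lambda obs: 0))  # constant dummy action
```

A trainer such as rllab only calls reset() and step() on the environment, so a crash can come from either side of this boundary: the simulator/ROS code behind step(), or the optimizer code that runs between sampling phases.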
I implemented my environment class by modifying some of the example scripts in gym_gazebo/envs/, and I train it with a modified version of a simple rllab example script that imports my environment. This is the output on my command line:
[##############################] | ETA: 00:00:00
Total time elapsed: 00:01:52
2017-03-15 13:11:54.271304 PDT | itr #0 | fitting baseline...
2017-03-15 13:11:54.273932 PDT | itr #0 | fitted
=: Compiling function f_loss
done in 0.355 seconds
2017-03-15 13:11:54.642680 PDT | itr #0 | computing loss before
2017-03-15 13:11:54.653648 PDT | itr #0 | performing update
2017-03-15 13:11:54.653848 PDT | itr #0 | computing descent direction
=: Compiling function f_grad
done in 0.648 seconds
=: Compiling function f_Hx_plain
done in 1.351 seconds
2017-03-15 13:11:56.693011 PDT | itr #0 | descent direction computed
=: Compiling function f_loss_constraint
done in 0.447 seconds
2017-03-15 13:11:57.164723 PDT | itr #0 | backtrack iters: 1
2017-03-15 13:11:57.164967 PDT | itr #0 | computing loss after
2017-03-15 13:11:57.165123 PDT | itr #0 | optimization finished
=: Compiling function constraint
done in 0.282 seconds
2017-03-15 13:11:57.471295 PDT | itr #0 | saving snapshot...
2017-03-15 13:11:57.471541 PDT | itr #0 | saved
2017-03-15 13:11:57.472722 PDT | ----------------------- --------------
2017-03-15 13:11:57.472902 PDT | Iteration 0
2017-03-15 13:11:57.473060 PDT | AverageDiscountedReturn -38.7691
2017-03-15 13:11:57.473211 PDT | AverageReturn -58.9338
2017-03-15 13:11:57.473358 PDT | ExplainedVariance 3.88735e-11
2017-03-15 13:11:57.473505 PDT | NumTrajs 40
2017-03-15 13:11:57.473650 PDT | Entropy 2.83788
2017-03-15 13:11:57.473795 PDT | Perplexity 17.0795
2017-03-15 13:11:57.473941 PDT | StdReturn 27.7234
2017-03-15 13:11:57.474086 PDT | MaxReturn -8.95798
2017-03-15 13:11:57.474231 PDT | MinReturn -117.026
2017-03-15 13:11:57.474376 PDT | AveragePolicyStd 1
2017-03-15 13:11:57.474520 PDT | LossBefore -1.60982e-17
2017-03-15 13:11:57.474664 PDT | LossAfter -0.0103623
2017-03-15 13:11:57.474809 PDT | MeanKL 0.00962472
2017-03-15 13:11:57.474963 PDT | dLoss 0.0103623
2017-03-15 13:11:57.475108 PDT | ----------------------- --------------
0% 100%
[##############################] | ETA: 00:00:00
Total time elapsed: 00:02:01
2017-03-15 13:13:59.157416 PDT | itr #1 | fitting baseline...
2017-03-15 13:13:59.159324 PDT | itr #1 | fitted
2017-03-15 13:13:59.173532 PDT | itr #1 | computing loss before
2017-03-15 13:13:59.185539 PDT | itr #1 | performing update
2017-03-15 13:13:59.185756 PDT | itr #1 | computing descent direction
2017-03-15 13:13:59.225501 PDT | itr #1 | descent direction computed
2017-03-15 13:13:59.238288 PDT | itr #1 | backtrack iters: 0
2017-03-15 13:13:59 ...