Reinforcement Learning: Segmentation fault using rllab+gym+gym-gazebo

asked 2017-03-15 15:12:45 -0600

MrRed

Hello,

I am trying to set up a reinforcement learning task that uses Gazebo and a TurtleBot as the simulation environment. For this I am using gym-gazebo, an extension of OpenAI Gym that adds Gazebo as a new environment. To train my agent I am using rllab, which ships with ready-made implementations of RL algorithms and is fully compatible with Gym.

I implemented my environment class by modifying some of the example scripts in gym_gazebo/envs/, and I train it with a modified version of one of rllab's simple example scripts, which imports the environment I designed. This is the output on my command line:
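
For readers unfamiliar with this kind of setup, here is a rough sketch of what such a training script can look like. It is not the actual script: the log only suggests a TRPO-style optimizer with a Gaussian policy, so the choice of TRPO, GaussianMLPPolicy and LinearFeatureBaseline is an assumption, the environment id GazeboCustomTurtlebot-v0 is a made-up placeholder, and all hyperparameters are illustrative. It also assumes the custom environment has been registered with Gym when gym_gazebo is imported.

```python
# Illustrative sketch only; the environment id and hyperparameters are
# assumptions, not values taken from the actual training script.
ENV_ID = "GazeboCustomTurtlebot-v0"  # hypothetical id of the custom environment


def build_trpo(env_id=ENV_ID):
    """Wire a Gym environment into rllab's TRPO and return the algorithm object."""
    # Imports are kept local so this file can be loaded on a machine
    # that does not have rllab or gym-gazebo installed.
    from rllab.algos.trpo import TRPO
    from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
    from rllab.envs.gym_env import GymEnv
    from rllab.envs.normalized_env import normalize
    from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
    import gym_gazebo  # noqa: F401  (importing registers the Gazebo environments with Gym)

    # Wrap the Gym environment for rllab; video and monitor logging are turned off.
    env = normalize(GymEnv(env_id, record_video=False, record_log=False))

    # Gaussian MLP policy for continuous actions, with a linear baseline.
    policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
    baseline = LinearFeatureBaseline(env_spec=env.spec)

    return TRPO(
        env=env,
        policy=policy,
        baseline=baseline,
        batch_size=4000,       # samples collected per iteration (illustrative)
        max_path_length=100,   # maximum steps per trajectory (illustrative)
        n_itr=50,
        discount=0.99,
        step_size=0.01,        # KL constraint of the trust region
    )
```

With Gazebo and the ROS side running, calling build_trpo().train() would start the sampling and optimization loop that produces a log like the one below.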

[##############################] | ETA: 00:00:00
Total time elapsed: 00:01:52
2017-03-15 13:11:54.271304 PDT | itr #0 | fitting baseline...
2017-03-15 13:11:54.273932 PDT | itr #0 | fitted
=: Compiling function f_loss
done in 0.355 seconds
2017-03-15 13:11:54.642680 PDT | itr #0 | computing loss before
2017-03-15 13:11:54.653648 PDT | itr #0 | performing update
2017-03-15 13:11:54.653848 PDT | itr #0 | computing descent direction
=: Compiling function f_grad
done in 0.648 seconds
=: Compiling function f_Hx_plain
done in 1.351 seconds
2017-03-15 13:11:56.693011 PDT | itr #0 | descent direction computed
=: Compiling function f_loss_constraint
done in 0.447 seconds
2017-03-15 13:11:57.164723 PDT | itr #0 | backtrack iters: 1
2017-03-15 13:11:57.164967 PDT | itr #0 | computing loss after
2017-03-15 13:11:57.165123 PDT | itr #0 | optimization finished
=: Compiling function constraint
done in 0.282 seconds
2017-03-15 13:11:57.471295 PDT | itr #0 | saving snapshot...
2017-03-15 13:11:57.471541 PDT | itr #0 | saved
2017-03-15 13:11:57.472722 PDT | -----------------------  --------------
2017-03-15 13:11:57.472902 PDT | Iteration                   0
2017-03-15 13:11:57.473060 PDT | AverageDiscountedReturn   -38.7691
2017-03-15 13:11:57.473211 PDT | AverageReturn             -58.9338
2017-03-15 13:11:57.473358 PDT | ExplainedVariance           3.88735e-11
2017-03-15 13:11:57.473505 PDT | NumTrajs                   40
2017-03-15 13:11:57.473650 PDT | Entropy                     2.83788
2017-03-15 13:11:57.473795 PDT | Perplexity                 17.0795
2017-03-15 13:11:57.473941 PDT | StdReturn                  27.7234
2017-03-15 13:11:57.474086 PDT | MaxReturn                  -8.95798
2017-03-15 13:11:57.474231 PDT | MinReturn                -117.026
2017-03-15 13:11:57.474376 PDT | AveragePolicyStd            1
2017-03-15 13:11:57.474520 PDT | LossBefore                 -1.60982e-17
2017-03-15 13:11:57.474664 PDT | LossAfter                  -0.0103623
2017-03-15 13:11:57.474809 PDT | MeanKL                      0.00962472
2017-03-15 13:11:57.474963 PDT | dLoss                       0.0103623
2017-03-15 13:11:57.475108 PDT | -----------------------  --------------
0%                          100%
[##############################] | ETA: 00:00:00
Total time elapsed: 00:02:01
2017-03-15 13:13:59.157416 PDT | itr #1 | fitting baseline...
2017-03-15 13:13:59.159324 PDT | itr #1 | fitted
2017-03-15 13:13:59.173532 PDT | itr #1 | computing loss before
2017-03-15 13:13:59.185539 PDT | itr #1 | performing update
2017-03-15 13:13:59.185756 PDT | itr #1 | computing descent direction
2017-03-15 13:13:59.225501 PDT | itr #1 | descent direction computed
2017-03-15 13:13:59.238288 PDT | itr #1 | backtrack iters: 0
2017-03-15 13:13:59 ...