In that tutorial, you're making a model which can be placed anywhere in the world, so you don't need to worry about the world coordinates there.

Each pose is expressed with respect to its parent element. The hierarchy goes: model > link > visual/collision. That is, link poses are expressed with respect to the model, and visual and collision poses are expressed with respect to the link.

The caster wheel is composed of a caster_collision and a caster_visual, both under the chassis link. The chassis origin (the center of the box) is 0.1 m along the Z axis with respect to the model origin. The caster collision and visual are exactly on top of each other. The sphere origin (its center) is -0.15 m along the X axis and -0.05 m along the Z axis from the link origin. If you want to know the caster's pose with respect to the model, just add this to the link's pose and you get [-0.15 0 0.05 0 0 0], since 0.1 + (-0.05) = 0.05 on Z. Simple addition works here because none of the poses has any rotation; if the link were rotated, you would have to rotate the caster's offset by the link's orientation before adding it.
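As a rough illustration of the arithmetic above, here is a small Python sketch that computes a child's position in the model frame from a link pose and a child pose. The function names (rpy_to_matrix, position_in_parent) are made up for this example and are not part of Gazebo or SDF; the sketch assumes poses are given as x y z roll pitch yaw with fixed-axis roll-pitch-yaw angles in radians, and it only returns the position, not the combined orientation.

```python
import math

def rpy_to_matrix(roll, pitch, yaw):
    """Rotation matrix for roll-pitch-yaw angles, assuming R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def position_in_parent(parent_pose, child_pose):
    """Position of the child origin in the frame the parent pose is expressed in.

    Both poses are (x, y, z, roll, pitch, yaw). The child's offset is rotated
    by the parent's orientation and then added to the parent's position.
    """
    px, py, pz, roll, pitch, yaw = parent_pose
    cx, cy, cz = child_pose[:3]
    r = rpy_to_matrix(roll, pitch, yaw)
    return (
        px + r[0][0] * cx + r[0][1] * cy + r[0][2] * cz,
        py + r[1][0] * cx + r[1][1] * cy + r[1][2] * cz,
        pz + r[2][0] * cx + r[2][1] * cy + r[2][2] * cz,
    )

# Values from the tutorial: chassis link pose in the model frame,
# caster visual/collision pose in the link frame.
chassis_pose = (0.0, 0.0, 0.1, 0.0, 0.0, 0.0)
caster_pose = (-0.15, 0.0, -0.05, 0.0, 0.0, 0.0)

print(position_in_parent(chassis_pose, caster_pose))
```

With no rotation on the link, the rotation matrix is the identity and this reduces to adding the two positions component by component, which is the shortcut used above.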

See this question for a bit more information about coordinate conventions in Gazebo.