Many parameters of the octopus arm simulator are controlled by a settings file provided at startup, which allows extensive configuration of the simulator. This document describes how to modify the provided settings file to create different learning tasks in the octopus arm environment, with different goals and different initial states.
Note: a default settings file is provided with the distribution. To change settings, modify this file rather than starting over; writing a new settings file “from scratch” is difficult and is not recommended.
Settings files are XML files with settings as the root element.
They contain two major sections: an environment section that defines the learning task, and a constants section that specifies the constants used in the physical simulation. These may occur in any order.
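As an illustration of this layout, the following Python sketch (standard library only) parses a settings document and picks out the two sections without depending on their order. The helper name `find_sections` and the abbreviated sample document are hypothetical. The sample leaves out the real goal, arm, and constants definitions.

```python
import xml.etree.ElementTree as ET


def find_sections(xml_text):
    """Return the (environment, constants) elements of a settings document.

    Hypothetical helper: it checks only the top-level layout described above
    and does not validate the contents of either section.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "settings":
        raise ValueError(f"root element is <{root.tag}>, expected <settings>")
    # find() looks at direct children only, so the order of the two
    # sections inside <settings> does not matter.
    environment = root.find("environment")
    constants = root.find("constants")
    if environment is None or constants is None:
        raise ValueError("settings must contain environment and constants sections")
    return environment, constants


# Abbreviated sample: constants first, environment second (either order is valid).
SAMPLE = """
<settings>
  <constants><!-- simulation constants omitted --></constants>
  <environment><!-- goal definition and arm omitted --></environment>
</settings>
"""

env, consts = find_sections(SAMPLE)
print(env.tag, consts.tag)  # environment constants
```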
environment section
The environment section defines the learning task that will be presented to the agent. It consists of two subsections: a goal definition subsection that defines the agent’s goals, and an arm subsection that defines the configuration of the arm. These may occur in any order. The name of the goal definition subsection depends on the type of goal being specified.
The goal definition subsection specifies the exact goals that the agent will be rewarded for accomplishing, as well as other parameters related to the learning task. Two basic types of goals are available: touching targets, or pushing food into a mouth. Only one may be specified at a time.
For any type of goal, the goal definition must carry the timeLimit and stepReward attributes. timeLimit specifies the maximum length of an episode, while stepReward specifies the reward the agent receives at every timestep in which no “sub-goal” is accomplished. See below for examples.
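The two rules for the goal definition (exactly one goal type, carrying both timeLimit and stepReward) can be checked mechanically. The sketch below is a hypothetical pre-flight check named `check_goal`, not part of the simulator. It assumes that the goal subsection is a direct child of environment and that both attributes hold plain numbers.

```python
import xml.etree.ElementTree as ET

# The two goal types described in this document.
GOAL_TYPES = ("targetTask", "foodTask")


def check_goal(environment):
    """Return the single goal element of an <environment>, or raise ValueError."""
    goals = [child for child in environment if child.tag in GOAL_TYPES]
    if len(goals) != 1:
        raise ValueError(f"expected exactly one goal definition, found {len(goals)}")
    goal = goals[0]
    for attr in ("timeLimit", "stepReward"):
        value = goal.get(attr)
        if value is None:
            raise ValueError(f"<{goal.tag}> is missing the {attr} attribute")
        float(value)  # raises ValueError if the attribute is not numeric
    return goal


env = ET.fromstring(
    '<environment><targetTask timeLimit="1000" stepReward="-0.01">'
    '<target position="3 3" reward="10" /></targetTask></environment>'
)
goal = check_goal(env)
print(goal.tag, goal.get("timeLimit"))  # targetTask 1000
```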
targetTask
This type of goal requires the agent to make the arm touch one or more targets. To specify such a goal, place a targetTask subsection inside the environment section.
In the simplest goal of this type, there is only one target, and the learning episode ends once the arm touches it. Here is an example targetTask subsection that specifies such a goal:
<targetTask timeLimit="1000" stepReward="-0.01">
  <target position="3 3" reward="10" />
</targetTask>
The position attribute on the target element specifies the x-y coordinates of the target, and the reward attribute specifies the reward obtained from touching it.
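New tasks are meant to be derived from the provided settings file, so small changes such as moving a target can also be scripted. The following Python sketch uses only the standard library's `xml.etree.ElementTree`. The helper name `move_first_target` is hypothetical. In practice the XML would be read from, and written back to, a copy of the settings file rather than a string.

```python
import xml.etree.ElementTree as ET


def move_first_target(xml_text, x, y):
    """Return xml_text with the first <target> element moved to (x, y)."""
    root = ET.fromstring(xml_text)
    # ".//target" finds target elements at any depth, including inside
    # <sequence> or <choice> objectives.
    target = root.find(".//target")
    if target is None:
        raise ValueError("no <target> element found")
    # position holds the x and y coordinates separated by a space.
    target.set("position", f"{x:g} {y:g}")
    return ET.tostring(root, encoding="unicode")


task = (
    '<targetTask timeLimit="1000" stepReward="-0.01">'
    '<target position="3 3" reward="10" /></targetTask>'
)
print(move_first_target(task, 5, -2))
```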
As a more complicated goal, the arm could be required to touch a sequence of targets in order; the episode ends once the last target is touched. For example:
<targetTask timeLimit="1000" stepReward="-0.01">
  <sequence>
    <target position="11 2" reward="2" />
    <target position="13 0" reward="2" />
    <target position="11 -2" reward="2" />
  </sequence>
</targetTask>
Alternatively, the arm could be required to touch any one of a set of targets; the episode ends once any of them is touched. For example:
<targetTask timeLimit="1000" stepReward="-0.01">
  <choice>
    <target position="7 2" reward="2" />
    <target position="13 3" reward="4" />
    <target position="7 -6" reward="6" />
  </choice>
</targetTask>
To create more elaborate “compound” goals, sequence and choice elements can be nested. For example:
<targetTask timeLimit="1000" stepReward="-0.01">
  <choice>
    <sequence>
      <target position="9 1" reward="2" />
      <target position="9 3" reward="2" />
    </sequence>
    <target position="13 3" reward="3" />
  </choice>
</targetTask>
In this example, the arm must either touch the target at (13,3), or the target at (9,1) followed by the one at (9,3). Note that once the (9,1) target has been touched, the (13,3) target is essentially deactivated, and the agent can no longer be rewarded for it.
As a general rule, the targetTask subsection contains a single “objective”, which may be simple (target) or compound (sequence or choice). Compound objectives further contain any number of simple or compound objectives.
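The recursive structure of objectives can be made concrete with a short Python sketch that computes the largest total reward an objective can yield, ignoring stepReward penalties. A target contributes its reward. A sequence contributes the sum of its children, because all of them must be touched. A choice contributes the best of its children, because only one branch can be completed. The function name `max_reward` is hypothetical; it is one reading of the rules above, not code from the simulator.

```python
import xml.etree.ElementTree as ET


def max_reward(objective):
    """Largest total target reward obtainable from one objective element."""
    if objective.tag == "target":
        return float(objective.get("reward"))
    children = [max_reward(child) for child in objective]
    if objective.tag == "sequence":
        # Every child objective must be completed, in order.
        return sum(children)
    if objective.tag == "choice":
        # Starting one branch deactivates the others, so only one pays out.
        return max(children)
    raise ValueError(f"unknown objective <{objective.tag}>")


# The compound example from above.
task = ET.fromstring("""
<targetTask timeLimit="1000" stepReward="-0.01">
  <choice>
    <sequence>
      <target position="9 1" reward="2" />
      <target position="9 3" reward="2" />
    </sequence>
    <target position="13 3" reward="3" />
  </choice>
</targetTask>
""")
# The targetTask holds exactly one objective, here the <choice>.
print(max_reward(task[0]))  # 4.0
```

For this example the sequence branch (2 + 2) beats the single target (3), so an agent that maximizes target rewards should prefer the (9,1), (9,3) route, as long as the extra step penalties it incurs stay below 1.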
foodTask
This goal requires the agent to push several pieces of food into an elliptical mouth; the episode ends once all the pieces have been “eaten”. Here is an example foodTask element:
<foodTask timeLimit="1000" stepReward="-0.01">
  <mouth x="5" y="3.5" width="2" height="2" />
  <food position="5 3" velocity="0 0" mass="1" reward="5" />
  <food position="6 3" velocity="0 0" mass="2" reward="7" />
</foodTask>
The x, y, width, and height attributes on the mouth element define the bounding box of the mouth. The mouth is always elliptical; other shapes cannot be specified. Note that the mouth element must occur before any food elements.
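For intuition about the mouth's shape, the sketch below tests whether a point lies inside the ellipse inscribed in the bounding box. The document does not say which corner (x, y) denotes. The sketch assumes it is the corner with the smallest coordinates, as in Java's `java.awt.geom.Ellipse2D`. The simulator's actual rule for when food counts as eaten may also differ; for example, it may consider the extent of the food rather than a single point. The function name `in_mouth` is hypothetical.

```python
def in_mouth(px, py, x, y, width, height):
    """True if the point (px, py) lies inside the ellipse inscribed in the box.

    ASSUMPTION: (x, y) is the bounding-box corner with the smallest
    coordinates, so the ellipse is centred at (x + width/2, y + height/2).
    """
    cx, cy = x + width / 2, y + height / 2
    rx, ry = width / 2, height / 2
    # Standard ellipse inequality: ((px-cx)/rx)^2 + ((py-cy)/ry)^2 <= 1.
    return ((px - cx) / rx) ** 2 + ((py - cy) / ry) ** 2 <= 1.0


# Mouth from the example above: x=5, y=3.5, width=2, height=2.
print(in_mouth(6, 4.5, 5, 3.5, 2, 2))  # True (the centre)
print(in_mouth(5, 3, 5, 3.5, 2, 2))    # False
```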
Each food element defines one piece of food. The position attribute specifies the x-y coordinates of its initial position, and velocity specifies the x and y components of its initial velocity. mass specifies the piece’s mass. (The mass does not use any particular measurement unit.) Food mass values are typically between 1 and 5, and should not exceed about 10. Using large values may produce unexpected behaviour. reward specifies the reward obtained when the food enters the mouth.
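The ordering rule for the mouth and the mass guideline lend themselves to a quick check before a modified file is used. This hypothetical Python helper, `food_warnings`, reports problems as strings. The threshold of 10 comes from the guideline above. The helper is no substitute for trying the task in the simulator.

```python
import xml.etree.ElementTree as ET

MAX_FOOD_MASS = 10.0  # rough upper limit suggested in the text above


def food_warnings(food_task):
    """Return a list of problems found in a <foodTask> element."""
    problems = []
    tags = [child.tag for child in food_task]
    if "mouth" not in tags:
        problems.append("missing mouth element")
    elif "food" in tags[: tags.index("mouth")]:
        problems.append("mouth element must come before any food elements")
    for i, food in enumerate(food_task.findall("food")):
        mass = float(food.get("mass"))
        if mass > MAX_FOOD_MASS:
            problems.append(f"food {i}: mass {mass:g} exceeds about {MAX_FOOD_MASS:g}")
    return problems


# A deliberately faulty task: food before the mouth, and a very heavy piece.
task = ET.fromstring(
    '<foodTask timeLimit="1000" stepReward="-0.01">'
    '<food position="5 3" velocity="0 0" mass="12" reward="5" />'
    '<mouth x="5" y="3.5" width="2" height="2" />'
    '</foodTask>'
)
for problem in food_warnings(task):
    print(problem)
```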
arm subsection
The arm subsection specifies the arm’s shape, physical properties, and initial state. Here is an example defining a three-compartment arm:
<arm>
  <nodePair>
    <upper position="0 1" velocity="0 0" mass="1" />
    <lower position="0 0" velocity="0 0" mass="1" />
  </nodePair>
  <nodePair>
    <upper position="1 1" velocity="0 0" mass="0.99" />
    <lower position="1 0" velocity="0 0" mass="0.99" />
  </nodePair>
  <nodePair>
    <upper position="2 1" velocity="0 0" mass="0.98" />
    <lower position="2 0" velocity="0 0" mass="0.98" />
  </nodePair>
  <nodePair>
    <upper position="3 1" velocity="0 0" mass="0.97" />
    <lower position="3 0" velocity="0 0" mass="0.97" />
  </nodePair>
</arm>
Each nodePair element defines a connected pair of vertices on the arm, one on the dorsal (“upper”) side and one on the ventral (“lower”) side. The two vertices form a transversal (“vertical”) edge. Except for the first and last pairs, each such edge is shared by two adjacent compartments of the arm.
The upper and lower elements correspond to the upper and lower vertices of a pair. The position and velocity attributes specify the x and y components of each vertex’s initial position and velocity, respectively. mass specifies the vertex’s mass.
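Arms with more compartments need many nodePair elements, so it can be convenient to generate the arm subsection with a script. This Python sketch, with the hypothetical name `make_arm`, reproduces the layout of the example above: n compartments need n + 1 node pairs, spaced one unit apart along x, with the upper vertex one unit above the lower one, zero initial velocity, and masses decreasing by 0.01 per pair. That taper simply mirrors the example and is not a requirement of the simulator.

```python
import xml.etree.ElementTree as ET


def make_arm(compartments, spacing=1.0, width=1.0, base_mass=1.0, taper=0.01):
    """Build an <arm> element with the given number of compartments."""
    arm = ET.Element("arm")
    # n compartments are bounded by n + 1 transversal edges (node pairs).
    for i in range(compartments + 1):
        x = i * spacing
        mass = f"{base_mass - i * taper:g}"
        pair = ET.SubElement(arm, "nodePair")
        ET.SubElement(pair, "upper", position=f"{x:g} {width:g}",
                      velocity="0 0", mass=mass)
        ET.SubElement(pair, "lower", position=f"{x:g} 0",
                      velocity="0 0", mass=mass)
    return arm


arm = make_arm(3)
ET.indent(arm)  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(arm, encoding="unicode"))
```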
Be careful when modifying the arm, as some configurations can produce unexpected behaviour. In particular, do not create unusual or degenerate shapes, and avoid using values that are orders of magnitude different from the ones provided.
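One way to catch degenerate shapes before running the simulator is to compute the area of each compartment. The sketch below assumes that compartment i is the quadrilateral formed by node pairs i and i + 1, and computes its signed area with the shoelace formula. It flags compartments whose area is near zero, or whose sign differs from that of the first compartment, which suggests a flattened, flipped, or folded compartment. The function names and the tolerance are hypothetical choices, not values taken from the simulator.

```python
def compartment_areas(pairs):
    """Signed area of each compartment.

    `pairs` is a list of ((upper_x, upper_y), (lower_x, lower_y)) tuples in
    the same order as the nodePair elements.
    """
    areas = []
    for (u0, l0), (u1, l1) in zip(pairs, pairs[1:]):
        # Walk the quadrilateral lower0 -> lower1 -> upper1 -> upper0.
        quad = [l0, l1, u1, u0]
        twice_area = sum(
            xa * yb - xb * ya
            for (xa, ya), (xb, yb) in zip(quad, quad[1:] + quad[:1])
        )
        areas.append(twice_area / 2)
    return areas


def suspicious_compartments(pairs, tolerance=1e-3):
    """Indices of compartments that are nearly flat or inverted."""
    areas = compartment_areas(pairs)
    if not areas:
        return []
    reference_sign = 1 if areas[0] > 0 else -1
    return [
        i for i, a in enumerate(areas)
        if abs(a) < tolerance or a * reference_sign < 0
    ]


# Node pairs from the example arm above.
example = [((x, 1.0), (x, 0.0)) for x in (0.0, 1.0, 2.0, 3.0)]
print(compartment_areas(example))        # [1.0, 1.0, 1.0]
print(suspicious_compartments(example))  # []
```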
constants section
The constants section specifies numerical values for all the constants used in the physical simulation. Modifying these constants is generally not recommended; they have already been tuned to produce good behaviour, and even slight variations can sometimes produce unexpected results.
The structure of the constants section should be self-explanatory. The constants may be given in any order. The physics description document explains the significance of each constant.