Reinforcement learning competition and benchmarking event

To be held in conjunction with ICML 2006


8-9: Breakfast
9-9:30: Summary of competition: Tasks and results
9:30-10:50: Presentations from the participating teams and discussion
10:50-11:20: Coffee break
11:20-12:30: General discussion: New problems, the future of the RL competition
12:30: Lunch

There will be no afternoon session.

Competition - June 27

Please send your executable agent to

The benchmarking will start Tuesday morning at 9am. Due to a technical problem, we will unfortunately not be able to benchmark over the Internet this time. Hence, we will run your agent on our machines.

For the Octopus task, we will use 8 compartments and two problem instances, one in which the goal is easier to reach and one in which it is harder. The reward is -1 per time step, with no discounting. We will record performance during 10000 training episodes, and then during 200 testing episodes, in which the greedy policy should be used. We will average the results over 30 runs. The episodes will be truncated at 10000 time steps.
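The same train-then-test scheme applies to all the benchmarked tasks, so it may help to see it spelled out. The sketch below uses a toy stand-in environment and a plain function-call interface; these are illustrative placeholders, not the competition's socket protocol, and the `ToyEpisodicEnv`, `run_episode` and `benchmark` names are our own.

```python
import random

class ToyEpisodicEnv:
    """Stand-in environment: reward is -1 per step, and the episode
    ends when the goal position is reached (or the step cap is hit)."""
    def __init__(self, goal=5):
        self.goal = goal
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos += action                 # action in {0, 1}
        done = self.pos >= self.goal
        return self.pos, -1.0, done        # observation, reward, terminal

def run_episode(env, policy, max_steps):
    """Run one episode, truncated at max_steps; return the total reward."""
    obs, total, steps, done = env.reset(), 0.0, 0, False
    while not done and steps < max_steps:
        obs, r, done = env.step(policy(obs))
        total += r
        steps += 1
    return total

def benchmark(env, train_policy, greedy_policy,
              n_runs=30, n_train=10000, n_test=200, max_steps=10000):
    """Average the greedy-test return over independent runs, as in the
    competition's evaluation scheme (defaults match the Octopus task)."""
    test_means = []
    for _ in range(n_runs):
        for _ in range(n_train):                       # training phase
            run_episode(env, train_policy, max_steps)
        returns = [run_episode(env, greedy_policy, max_steps)
                   for _ in range(n_test)]             # greedy testing phase
        test_means.append(sum(returns) / n_test)
    return sum(test_means) / n_runs
```

With the -1-per-step reward and no discounting, the averaged test return is simply the negated average number of steps the greedy policy needs to reach the goal.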

For the Mountain-car task, we will use a sensory delay value of 10. We will record performance during 5000 training episodes, then during 100 test episodes, using the greedy policy. Results will be averaged over 30 runs. The episodes will be truncated at 5000 time steps.
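One common way to implement a sensory delay, assuming it means the agent observes the state from `delay` steps earlier (the task's web page defines the exact semantics), is a fixed-length buffer. The `SensoryDelay` helper below is a hypothetical sketch, not part of the competition code.

```python
from collections import deque

class SensoryDelay:
    """Buffers observations so the agent sees the one from `delay`
    steps ago; until enough steps have passed, it sees the initial
    observation. (Hypothetical helper; the competition's exact delay
    semantics are defined on the task's web page.)"""
    def __init__(self, delay, initial_obs):
        # Hold delay+1 items so the oldest entry is `delay` steps old.
        self.buf = deque([initial_obs] * (delay + 1), maxlen=delay + 1)

    def observe(self, obs):
        """Record the true observation; return the delayed one."""
        self.buf.append(obs)
        return self.buf[0]
```

For example, with a delay of 2, the first two calls return the initial observation, and from then on each call returns the observation from two steps earlier.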

For the Cat and Mouse task, we will use 3 different task configurations.

Please also send a very brief (2-page max) description of your agent to

Organizers: Shie Mannor and Doina Precup, McGill University

Advisory committee

Technical organization committee


The code distributions and specifications are now available for:

For all the environments, the simulator communicates with RL agents using sockets. This has been done in order to be able to benchmark agents remotely, without requiring the agent and the environment simulator to run on the same machine. Of course, during the training process, teams can also run both on the same machine.

The communication protocol mimics the one used in the RL-Glue framework. However, instead of having a benchmark program which essentially runs both the environment and the agent, we use the environment as a server; the agent connects to the environment as a client. The protocol is similar to RL-Glue in terms of the types of messages exchanged. The content of the messages, however, is task-specific, and is described on each task's web page. Random agents in C, Java and Python which implement this protocol are available; any programming language that supports sockets should work as well.

The Blackjack and Cart-pole domains are still in their original form using pipes. Modified socket versions will be posted by May 25. The current versions are available at: RL repository
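To illustrate the client-server arrangement described above, a minimal agent client might look like the sketch below. The host, port, line-oriented framing, and `episode_end` marker are illustrative assumptions on our part; the actual message content is task-specific and described on each task's web page.

```python
import random
import socket

def choose_action(observation):
    # Placeholder policy: a real agent would map the task-specific
    # observation message to a task-specific action message.
    return random.choice(["0", "1"])

def run_agent(host="localhost", port=4200):
    """Connect to the environment server as a client and answer each
    observation line with an action line, until the server signals
    the end of the interaction. (Message format is assumed here.)"""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        while True:
            msg = f.readline().strip()
            if not msg or msg == "episode_end":   # assumed end marker
                break
            f.write(choose_action(msg) + "\n")
            f.flush()
```

Because only a socket connection is required, the same structure carries over directly to the C and Java random agents.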


In the last two years, the reinforcement learning community has moved towards the establishment of standard benchmark problems. It has been widely acknowledged that the community would benefit not only from having a collection of such problems, but also from having competitive events.

The first reinforcement learning benchmarking event was held in conjunction with NIPS 2005 and was a big success, attracting an international field of more than 30 participants. Our goal is to maintain this momentum by organizing a competition to be held in conjunction with ICML 2006.

The competition will focus on controlling an octopus arm. This is a new, large-scale domain. It has several parameters which can be used to generate a variety of problem instances.

Additionally, we will run evaluations for several well-known domains: Blackjack, Cat-mouse, Mountain-car, Cart-pole.

Unlike in the previous event, we will use a client-server architecture: the environment server will run at McGill, while the participants' agent clients run on their own machines.

Additionally, there will be a demonstration event, featuring tasks that may be used in the next competition.

Small prizes will be awarded.

Important dates

Note that all participants in the competition must register for the workshop.

Call for submissions

For all teams who participate in the event, we solicit a 2-page submission (in ICML format) describing the approach taken. All teams will have poster space to present their approach. The top ranking teams will also get an oral presentation. Additionally, we solicit 2-page opinion papers on good domains to use for future events, as well as how such events should be organized in the future.