Schedule
8-9: Breakfast
9-9:30: Summary of competition: Tasks and results
9:30-10:50: Presentations from the participating teams and discussion
10:50-11:20: Coffee break
11:20-12:30: General discussion: New problems, the future of the RL competition
12:30: Lunch
There will be no afternoon session.
Please send your executable agent to dan.cast@mail.mcgill.ca. Benchmarking will start Tuesday morning at 9am. Unfortunately, due to a technical problem, we will not be able to benchmark over the Internet this time, so we will run your agent on our machine.
For the Octopus task, we will use 8 compartments and two problem instances, one in which the goal is easier to reach and one in which it is harder. The reward is -1 per time step, with no discounting. We will record performance during 10000 training episodes, and then during 200 testing episodes, in which the greedy policy should be used. We will average the results over 30 runs. The episodes will be truncated at 10000 time steps.
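The protocol above can be sketched roughly as follows. This is only an illustration, not the organizers' benchmarking code: the `reset`/`step`/`act`/`learn` interface, the `ToyEnv` and `ToyAgent` classes, and the `benchmark` function are all hypothetical names introduced for this example. Only the constants come from the task description.

```python
from statistics import mean

# Protocol constants taken from the Octopus task description.
TRAIN_EPISODES = 10000
TEST_EPISODES = 200
NUM_RUNS = 30
MAX_STEPS = 10000
STEP_REWARD = -1.0


class ToyEnv:
    """Hypothetical stand-in environment: the goal is reached after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, STEP_REWARD, self.t >= self.length


class ToyAgent:
    """Hypothetical agent interface; a real agent would learn from experience."""

    def act(self, obs, greedy):
        return 0

    def learn(self, obs, reward, done):
        pass


def run_episode(env, agent, greedy, max_steps):
    """Run one episode and return its undiscounted return."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):  # episodes are truncated at max_steps
        action = agent.act(obs, greedy=greedy)
        obs, reward, done = env.step(action)
        if not greedy:  # assumption: no learning during test episodes
            agent.learn(obs, reward, done)
        total += reward
        if done:
            break
    return total


def benchmark(make_env, make_agent, train_episodes=TRAIN_EPISODES,
              test_episodes=TEST_EPISODES, num_runs=NUM_RUNS,
              max_steps=MAX_STEPS):
    """Return (per-episode training returns averaged over runs, mean test return)."""
    train_curves, test_means = [], []
    for run in range(num_runs):
        env, agent = make_env(run), make_agent(run)
        train_curves.append([run_episode(env, agent, False, max_steps)
                             for _ in range(train_episodes)])
        test_means.append(mean(run_episode(env, agent, True, max_steps)
                               for _ in range(test_episodes)))
    avg_train = [mean(episode) for episode in zip(*train_curves)]
    return avg_train, mean(test_means)
```

With a reward of -1 per step and no discounting, the return of an episode is simply the negative of its length, so an episode truncated at 10000 steps scores -10000.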
For the mountain-car task, we will use a sensory delay value of 10. We will record performance during 5000 training episodes, then during 100 test episodes, using the greedy policy. Results will be averaged over 30 runs. The episodes will be truncated at 5000 time steps.
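One possible reading of the sensory delay is that the agent receives, at each time step, the observation generated 10 steps earlier. The sketch below illustrates that reading only. It assumes that rewards and termination signals are not delayed and that the agent keeps seeing the initial observation until enough steps have elapsed; the actual competition environment may behave differently. `DelayedObservations` and `CounterEnv` are hypothetical names used for illustration.

```python
from collections import deque


class CounterEnv:
    """Hypothetical stand-in environment whose observation is the step count."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, -1.0, False


class DelayedObservations:
    """Wrap an environment so observations arrive `delay` steps late."""

    def __init__(self, env, delay=10):
        self.env = env
        self.delay = delay
        self.buffer = deque(maxlen=delay + 1)

    def reset(self):
        first = self.env.reset()
        # Assumption: pad with the initial observation until the delay has elapsed.
        self.buffer = deque([first] * (self.delay + 1), maxlen=self.delay + 1)
        return self.buffer[0]

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.buffer.append(obs)  # the oldest entry is dropped automatically
        # Assumption: only the observation is delayed, not reward or done.
        return self.buffer[0], reward, done
```

Under this reading, the agent has to choose actions based on stale sensor readings, which is what makes the delayed variant harder than the standard mountain-car task.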
For the Cat and Mouse task, we will use 3 different task configurations.
Please also send a very brief (2-page max) description of your agent to dprecup@cs.mcgill.ca.
Organizers: Shie Mannor and Doina Precup, McGill University
Advisory committee
In the last two years, the reinforcement learning community has moved towards the establishment of standard benchmark problems. It has been widely acknowledged that the community would benefit not only from having a collection of such problems, but also from having competitive events.
The first reinforcement learning benchmarking event was held in conjunction with NIPS 2005 and was a big success, attracting an international field of more than 30 participants. Our goal is to maintain this momentum by organizing a competition to be held in conjunction with ICML 2006.
The competition will focus on controlling an octopus arm. This is a new, large-scale domain. Its parameters can be varied to generate multiple problem instances.
Additionally, we will run evaluations for several well-known domains: Blackjack, Cat-mouse, Mountain-car, Cart-pole.
Unlike in the previous event, we will use a client-server architecture: the benchmarking will be performed with the environment server running at McGill and the clients running on the participants' own machines.
Additionally, there will be a demonstration event, featuring tasks that may be used in the next competition.
Small prizes will be awarded.
For all teams who participate in the event, we solicit a 2-page submission (in ICML format) describing the approach taken. All teams will have poster space to present their approach. The top ranking teams will also get an oral presentation. Additionally, we solicit 2-page opinion papers on good domains to use for future events, as well as how such events should be organized in the future.