School of Computer Science
Computer Science 308-655
Parallel and Distributed Simulation
Course meets on:Monday,Wednesday 11:30-1:00
Office hours: TBA
In the area of computer networks, an on-going effort is
being undertaken to provide a parallel simulation of the Internet. The intention of the
project is to provide a realistic test-bed for the development of new Internet protocols and to locate performance problems
in aras such as routing algorithms and congestion control algorithms. To date, it has been necessary to do experiments with
actual networks to achieve the same goals. A detailed parallel simulation provides an environment in which the experimental conditions can be controlled
and in which it is possible to replicate experiments.
Air-traffic simulation is making use of distributed simulation as is the design
of large manufacturing environments for VLSI wafers. The paralllel simulation of VLSI circuitry can give an enormous competitive advantage to a circuit manufacturer. This is an area
being actively pursued at McGill.
Military simulations are benefitting a great deal from advances in distributed simulation. A major goal in this area is the unification of
simulations of different aspects of combat together e.g. tank warfare, close air support and infantry warfare. The inclusion of live
exercises into the scenario then provides a formidable training environment. The SPEEDES simulation environment is being used in major
The High Level Architecture(HLA) has been proposed as the common framework for DoD simulation applications.
The fundamental design guideline of the HLA is that the local time management mechanism used in each simulation federate (simulation component)
should not be visible to other federates. Hence all forms of time management (event-driven, time-stepped, parallel discrete-event simulation and
real-time driven simulation..) may be linked together. The field of Distributed Interactive Simulation is devoted to these
advances. A counterpart to this military training simulation environment is a virtual laboratory
, in which scientists interact with a simulation in a virtual environment.
The scientist can make changes to a model of the system which he is studying, causing it to be rolled back to the
point which is dictated by the changes and to resume its execution from that point. An example of this environment
is the CAVE Automatic Virtual Environment.
Like any distributed program, a distributed simulation is composed of processes which communicate with one another. This communication may occur
via shared memory in a shared memory multiprocessor or via message passing in a distributed environment. The processes
in distributed simulation are each intended to simulate a portion of the system being modeled and are referred to as Logical Processes
(LPs). An LP creates events, sends events to other LPs and receives events from other LPs in the course of a simulation. Associated with each
LP are input queues used to store messages from other LPs, state variables which reflect the state of the
physical process which is being modeled,
and a local simulation
clock. The purpose of the simulation clock is to capture the advance of simulation time at each LP.
The treatment of time in a distributed system is of fundamental importance to understanding and building distributed systems.
A distributed simulation poses a particularly vexing problem. In a uniprocessor, the events in a discrete event simulation are stored in a priority queue
nd simulated in the order of the smallest timestamp first. This is easily done in a uniprocessor because all of the events in the simulation are stored in
memory. In a distributed simulation, however, the events are spread out among the processors involved in the simulation.
A causality violation occurs
if we simulate an event at an LP and an event generated by
another LP with a smaller timestamp arrives afterwards in real time.
This late arrival can easily cause incorrect results in the simulation.
In a military simulation it matters in which order the two events "aim the gun" and "fire
the gun" are executed.
The central problem of distributed simulation is the development of synchronization algorithms which are capable of maintaining causality
and which do so with minimal overhead. This overhead can be in the form of increased memory demands
and increased execution time.
Two primary approaches to synchronization algorithms have been developed, the conservative and the optimistic classes of algorithms.
A conservative algorithm is characterized by its blocking behavior. In a conservative simulation, if one of the input queues at an LP is empty, the LP
blocks, awaiting a message from another LP. This behavior insures causality.
However, it does so a price. To begin with, when an LP suspends processing there may be an increase in the execution time of the
n addition, it is entirely possible for deadlocks to form. This occurs if a group of LPs is arranged in the form of
a cycle such that each LP in the cycle is awaiting a message from its predecessor in the cycle. More generally, a deadlock occurs if a knot
of LPs forms.
Hence, if a conservative synchronization algorithm is used in a distributed simulation, it becomes necessary to
either find means to avoid deadlocks or to detect and break them. There are a plethora of algorithms for each of these approaches.
The other major class of synchronization algorithm is known as optimistic, the prime example of which is
Time Warp. In optimistic algorithms,
LPs maintain one input queue. All of the events which arrive from other LPs are stored in the queue in
increasing order of their timestamps and are processed without any concern for the arrival of events with
smaller timestamps. If such a "straggler" event arrives at an LP, the LP restores its state just prior to the
arrival of the straggler and continues with the simulation from that point. This process is known as rolling back.
In order to restore previous states, the LP must checkpoint its state prior to the processing of an event.
It is also necessary to cancel messages which
were produced subsequent to the straggler. This is accomplished via the use
of so-called anti-messages, which cancel, or annihilate the original messages.
In this course, we plan to focus on the synchronization algorithms at the
heart of distributed simulation and on the applications of these algorithms
to real-world problems. The course will be conducted as a seminar course;
we will read papers prior to each class and dissect them in class, trying
to get a feeling for their strengths and weaknesses. There will be one
programming project which will involve the construction of a distributed
simulation on a network of workstations, making use of MPI
to connect them.
For a brief summary article on distributed simulation, you can consult
survey article written by me.
It is expected that the students maintain an adequate supply of
coffee for the professor during class.
A good time will be had by all.
The course will be in the form of a research seminar. This means
that students will be responsible for presenting papers and
also participating in discussions of the papers. Part of your grade
will be the extent and quality of this participation.
We will begin the course by covering the fundamentals of the area by going through a text,
and will then read selected papers in the area. The text is
Parallel and Distributed Simulation Systems, by Richard Fujimoto, Wiley Interscience Publsihers
It is available in the bookstore and may also be purchased at www.amazon.com.
The papers may be found at