Introduction to the problem
A good measure of distance between rhythmic patterns is essential to many basic tasks in music information retrieval (MIR). Rhythmic features can be used to help machines distinguish among musical genres, match queries from users against databases of recordings, and follow along with performers, whose real-world rhythms exhibit more variation than the idealised patterns notated on musical scores. Moreover, a large amount of research in music cognition has investigated human perceptions of rhythm, and these researchers can benefit from formal mathematical distances to use as seeds for or comparisons against perceptual measures of distance. This tutorial will focus on one particular distance measure, the cyclic swap distance, and an efficient algorithm for computing it developed by Yoan José Pinzón Ardila, Raphaël Clifford, and Manal Mohamed (2005), dubbed the ‘necklace swap problem’.
Notions and notations
We need to invoke a number of definitions and notations before learning about cyclic swap distances and the algorithms for computing them. First, we must define our notion of rhythm itself. For the purposes of this demonstration and the paper on which it is based, a rhythm is a pattern of strokes repeating at regularly spaced intervals of time. In the literature, there pattern are often called timelines.
· Rhythmic notation ·
There are three important visual representations of rhythm for this project.
Western notation
The most common form for notating such patterns when they are intended for performance by trained musicians in the West is Western musical notation. An example of the standard son rhythm, usually associated with the clave, appears below in Western notation:

Time unit box system
Because Western musical notation makes many cultural assumptions and is unfamiliar to many musicians and anthropologists across the world, a common alternative is the time unit box system (TUBS). In TUBS, rhythms are represented as o time-ordered series of boxes, which correspond to the smallest subdivision of the musical metre. The boxes corresponding to the metric moments of the rhythmic strokes are filled, the rest left empty. The son rhythm above, for example, would be transcribed:

Polygonal notation
Because the patterns we are studying repeat, we also can visualise them as polygons inscribed within a circle. Each box of TUBS is spread evenly around a circle, as if it were a clock, and the rhythmic strokes correspond to the vertices of the circumscribed polygon. The son rhythm looks as follows:

· Computational representations ·
The latter two styles of notation in particular suggest several representations appropriate for computation on rhythmic patterns.
Binary
Binary representations follow naturally from TUBS: filled boxes become ones and open boxes zeros. In binary notation, the son rhythm is:
1 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0
Beat count
Sometimes this representation is abbreviated to a string of non-negative inteagers corresponding the beats with strokes:
0 3 6 10 12
Inter-onset intervals
To save even more space, the lengths of time between each stroke, known as inter-onset-intervals (IOIs), can be recorded instead of the complete binary representation:
3 3 4 2 4
These values are analagous to the lengths of the line segments in polygonal notation. This notation also has the advantage of encoding the total number of beats in the pattern as the sum of all IOIs.
Permutations
When rhythms are permuted, silent beats, or zeros in the binary notation, are skipped. The first permutation of the son rhythm is not:
0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1
but:
1 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0
These work out to be ordinary permutations of the IOIs:
3 4 2 4 3
Rotations, on the other hand, do incorporate the silent beats.
· Measures of rhythmic similarity ·
Researchers have used many difference measures of distance (or similarity) between rhythmic timelines, including the Euclidean distance, Hamming distance, interval ratio distance, swap distance, chronotonic distance, and many perceptual meterics. See the Toussaint 2004 and Desain and Honing 2003 for more details.
This tutorial treats the cyclic swap distance. The standard swap distance is the minimum the number of swaps, or exchanges of neighbouring bits, necessary to transform one rhythm into another. The cyclic swap distance further minimises the swap distance over all possible rotations of each rhythm. In the simplest case, the number of beats and the number of strokes must be the same for both rhythms.
Algorithms
In the following discussion of algorithms for computing the cyclic swap distance, the number of beats in each timeline is denoted by n and the the number of strokes by k. The standard swap distance between two timelines can be computed in O(k) time. Refer to the paper for proofs of the lemmas referenced.
· Naïve algorithm ·
A mapping is a bijective function from the strokes of one timelines to the strokes of another. Only mappings that do not cross need to be considered (Lemma 3), yielding a total of k. The most naïve approach to computing the cyclic swap distance would be to check all n rotations for each of the k mappings for a total time of O(nk2).
naive(x, y)
Compute IOIs of x.
s* := infinity
for i := 0 to k-1 do
for j := 0 to n-1 do
xi := permutation i of x
yj := rotation j of y
if s* < d(xi, yj) then
s* := d(xi, yj)
return s*
Note that this computation time is quadratic in the number of strokes rather than the number of time units, which is essential for work with live performances (where the number of quantised ‘beats’ would be very large).
· Computation in cubic time ·
In fact, it is not necessary to check all n rotations (Lemma 2). It sufficient to check the k permutations of the second vector under each mapping, for a total time of O(k3).
cubic(x, y)
Compute IOIs of x and y.
s* := infinity
for i := 0 to k-1 do
for j := 0 to k-1 do
xi := permutation i of x
yj := permutation j of y
if s* < d(xi, yj) then
s* := d(xi, yj)
return s*
· Computation in quadratic time ·
It is possible to do still better (Lemma 1) by adjusting one timeline by the median of the residual vector of differences between the beat counts of all permutations and finding the minimum among only these k swap distances. Because the median can be computed in linear time, the total time for this variant is only O(k2).
quadratic(x, y)
Compute IOIs of x and y.
x0 := permutation 0 of x
s* := infinity
for h := 0 to k-1 do
yh := permutation h of y
ch := (yh - x0) - median(yh - x0)
if s* < sum(abs(ch)) then
s* := sum(abs(ch))
return s*
Examples
The applet below to demonstrates the quadratic-time algorithm. Experiment with different rhythmic pairs and permutations. For each permutation, an animation shows the adjustment dictated by the median of the residuals. You can experiment yourself or use the (re)compute button to test every permutation exhaustively and return to the best match; in both cases the current minimum swap distance reflects the best swap distance seen so far for the most recent pair of rhythms.
You can also tap your own pair of rhythms. Select ‘Record custom’ from one of the drop-down menus, set the number of beats and strokes, and then tap your timeline on the ‘Tap’button. You will need to tap the first stroke of the first repetition of the timeline, too, in order for the applet to calculate your tempo. Try setting a high number of beats (around 1000) to simulate real-time quantisation; these values also tend to illustrate how well the median adjustment minimises swap distance.
Conclusions
This solution to the necklace swap problem is efficient and robust to the fine quantisations of time that are necessary to study rhythms in live performances; because it also enables such a space-efficient implementation, it is superior to alternative distance metrics in many regards. The authors of the algorithm are interested in extending it to be able to compare rhythms with different numbers of strokes, i.e., vectors of different lengths. Its primary weakness is its inability to treat non-cyclic rhythms, although this limitation is less extreme than it may seem for the purposes of MIR: if the goal is to compute features for classification or retrieval, it may be safe to assume that rhythmic substrings of longer pieces of music are cyclic for the purpose of determining whether they share general rhythmic vocabulary. Follow the references below for more information.
References
- Ardila, Yoan José Pinzón, Raphaël Clifford, and Manal Mohamed. 2005. The necklace swap problem for rhythmic similarity. In Lecture Notes in Computer Science 3772, edited by M. Consens and G. Navarro. Berlin: Springer.
- Desain, Peter and Henkjan Honing. 2003. The formation of rhythmic categories and metric priming. Perception 32:341–65.
- Iliopoulos, Costas S., Gad M. Landau, Manel Mohamed, and Yoan Pinzon. 2005. Approximation algorithm for the cyclic swap problem. To appear in Proceedings of the Prague Stringology Conference.
- Toussaint, Godfried T. 2004. A comparison of rhythmic similarity measures. In Proceedings of the 5th International Conference for Music Information Retrieval. Barcelona, Spain: University Pompeu Fabra.
About the author
John Ashley Burgoyne is a doctoral student in music technology at the Schulich School of Music of McGill University. His research interests include optical music recognition and machine-learned music theories.