# Background for Bit Commitment

## Contents

This is part of a collection of pages which describe a specific protocol for a general cryptographic task called bit commitment. On this page you will find background helpful in understanding the demonstration and explanation of the protocol. This protocol, due to Crépeau, uses the properties of noisy channels and is unconditionnally secure with high probability. Puzzled? Lost? Read on to know more.

## What is Bit Commitment?

[Top] [Overview] [Main demo] [Bibliography]

Consider the following instructive story (from [Kil89]):

Alice and Bob were going to play a game and they wanted to decide who would go first. To be fair, they decided to flip a coin, but they soon realized that neither of them had a coin to flip. Alice suggested a solution: "In our heads, we each pick either a 0 or a 1 at random. Then you tell me what you picked and I tell you what I picked. If we both pick the same number, you get to go first. If we pick different numbers, I get to go first."

Bob thought about this for a moment, looked at Alice suspiciously and replied "But you could change your mind when you hear what I picked. How would I know that what you tell me is really what you originally picked?".

Alice frowned at how easily her seemingly cunning plan had been detected. However, she soon had another (somewhat more honest) idea. "I've got it!" she exclaims. "I'll write my choice down on a piece of paper, put it in an envelope and give it to you. Then you tell me your choice before we open the envelope together. After that we can open the envelope. If the two choices are the same, you get to go first, otherwise I go first. Is this fair enough?"

After thinking it over, Bob agreed. They ran the protocol as described (for it was fair) and played happily ever after.

The end.

What Alice suggested is a simple form of bit commitment. Alice has a secret bit (i.e. a 0 or a 1) in her head which she wants to commit to Bob. This means she wants to be able to reveal it to him later in such a way that (1) he will be convinced that the bit she reveals later is really the bit she has in mind now and so that (2) Bob cannot guess what her bit is before she reveals it to him. Formally, a Bit Commitment protocol has 2 parts:

1. Committing: Alice commits her secret bit
2. Unveiling: Alice reveals her secret bit to Bob (later)

As mentioned earlier, there are 2 essential aspects to these protocols:

1. Once Alice commits her bit, she cannot change her mind and reveal to Bob a different bit when she comes to unveil.
In our example, once she put the paper in the envelope and gave it to Bob, she could not change the contents of the envelope.
2. Bob cannot know what Alice's bit is until she unveils it.
In our example, Bob couldn't know the contents of the envelope until they opened it together.

Remark: The names Alice and Bob are standard in cryptography (it's better than "Person A", "Person B"). We will use them throughout this demonstration.

## Why is Bit Commitment Useful?

[Top] [Overview] [Main demo] [Bibliography]

Traditionnally, cryptography has been the science of protecting communications from eavesdropping and tampering. However, recently cryptography has branched off into a large number of other cryptographic tasks. These usually involve two or more parties, some of which may be dishonest.

Bit commitment is a building block used to construct and execute such complex crytpographic tasks. Here are some applications in which bit commitment is useful.

Coin-flipping
Two parties want to generate a common sequence of bits which they both know to be completely random. The story above is an example of just such a situation. The solution Alice and Bob agreed on will always work assuming that the two parties have a way of executing Bit Commitment and at least one of the two parties is honest (if neither is honest, we don't care what happens: they deserve everything they get).
Multi-party computation
Several parties want to calculate a result (which they will all learn) based on input they want to keep private. For example, this could allow electronic voting: each citizen would have a private input (his vote) and the result would be total number of votes for each candidate. Another example is compilation of statistics in which the census bureau learns only the global statistical information it needs (i.e. average proportion of revenue donated to charity by Canadians) without learning anything information which individuals wish to keep private.
Zero-knowledge proofs
Alice wants to prove to Bob that she knows some key piece of information without actually revealing it to him. For example, she might want to convince him that she knows a ten-line proof of the Reimann Hypothesis without him learning anything more than the length of her proof.

We will not get into the details of how these last two tasks are actually performed, but you can see that the variety of applications in which bit commitment is useful justifies trying to find ways to implement it as securely as possible.

For more information on bit commitment, see [Sch96], [Kil89], [CGT95], and [BCC88]

## Why This Protocol?

[Top] [Overview] [Main demo] [Bibliography]

The protocol we describe here is interesting for two main reasons:

• It is unconditionally secure
• It requires nothing more than a noisy channel

These two points are described below.

## Computational vs. Unconditional Security

[Top] [Overview] [Main demo] [Bibliography]

Clearly, it is important that any cryptographic protocol be secure (in the case of bit commitment security covers the two aspects defined above). However, security can be defined in different ways.

Computational security
Computational security is based on the amount of work required to crack a cryptographic protocol (i.e. cheat) by the best currently known methods. A protocol is considered computationally secure if cracking the protocol would require an infeasible amount of time. For example, many protocols based on number theory can be broken if the person trying to cheat is able to factor large integers. However, the best known methods would require years to factor an integer of only a few hundred digits.

Computational security is limited, though, by the evolution of new algorithms and technologies. What would take thousands of years to do on today's computers may take only a few hours on the computers of tomorrow. A system or protocol which is computationally secure is by no means guaranteed to remain so.

Unconditional security
Also called information-theoretic security, unconditional security is based on information theory, developped by Shannon in the 1940's and 50's. Proofs of unconditional security make no assumptions about an adversary's computing power and so they can't "go out of date" the way that proofs of computational security can.

In the case of bit commitment, traditional unconditional security would require that Alice have absolutely no way of changing her mind about her bit and that Bob have absolutely no way of learning anything at all about Alice's secret bit before she reveals it to him. However, this is not practical in our situation, because we rely on a noisy channel (described below) which is inherently unpredictable. There is always a small chance of failure.

Probabilistic security
For an unconditionally secure protocol to be practical, we must relax our definition of security. Specifically, we must allow for an arbitrarily small probability of error. This means making protocols which can be slightly modified so that the probability of error can be brought as low as we want it. This usually involves a tradeoff in terms of speed - the smaller the probability of error, the longer it takes to run the protocol.

It is important to realize that the security of such a protocol is still information-theoretic in that we make no assumptions about the computing power of someone trying to cheat. We simply allow a probability of error which can be as tiny as we want to make it.

The protocol described in these pages is of this last type: by choosing certain parameters appropriately, we can make the probability of error as small as we like, while making no assumptions about the intelligence or the computing power of Alice and Bob. The only assumptions made are about the properties of the noisy channel connecting Alice to Bob.

For further explanation of the distinction between computational and unconditional security, see [Mau93a] and [Mau93b].

## Noisy Channels

[Top] [Overview] [Main demo] [Bibliography]

This protocol is also interesting because of its use of noisy channels. Most of the channels we use daily for communication such as telephone lines and fiber optic cables are imperfect: they introduce errors into the information they transmit. However, they can be made reliable by the use of error-correcting codes (discussed below). This is normally incorporated into the hardware used for transmission and so we do not notice it. However, if used directly (without error-correcting devices), the properties of noisy channels can be used for cryptographic purposes.

The use of noisy channels in cryptography was introduced by Wyner (and later generalized by Maurer in [Mau93b]) in the context of protecting communications against eavesdropping. This protocol is significant in that it is the first use of noisy channels for a multy-party protocol.

In our protocol, the noisy channel is used by sending a string of bits (which somehow represent Alice's secret bit) from Alice over the noisy channel to Bob, who receives a corrupted version. He gets just enough information to make sure Alice isn't cheating when she unveils her bit but not enough information to guess the value of her bit.

The specific model used for the channel is explained below.

## The Binary Symmetric Channel (BSC)

[Top] [Overview] [Main demo] [Bibliography]

The mathematical model generally used for a noisy channel is the binary symmetric channel (BSC). It is a channel which transmits bits (0's or 1's) only. When a bit is sent there is a fixed probability that the bit will be changed. This probability is often denoted by p. The probability that a given bit will be flipped is independent of the actual value of the bit and of what happenned to the bits which preceded it. It's important to note that neither the sender nor the receiver knows which bits get flipped and which are sent faithfully.

## BSC's and the Law of Large Numbers

[Top] [Overview] [Main demo] [Bibliography]

From a mathematical viewpoint, the binary symmetric channel is a good model because of its simplicity. Even though we don't know what happens to each bit which goes through the channel, we know that on average, the proportion of errors which occur (i.e. the number of errors divided by the total length of the message) will be about p (the probability of error).

The following charts give the probability that a given number of errors will happen, given that p = 0.3 and we send 50, 250 or 1000 bits through the channel. Notice the peaks centered at 15 (= 0.3 * 50), 75 (= 0.3 * 250) and 300 ( = 0.3 * 1000).

Probablilty that a given number of errors will occur over a BSC of error probability 0.3   n = 50n = 250n = 1000

An important theorem in statistics, called the law of large numbers, tells us that when you send many bits through the channel, the probability that the proportion of errors will be very different fro m p is very small. In fact, as the number of bits sent through the channel increases, that probability (that the proportion of errors won't be close to p) gets small very quickly. This result is the key to preventing Alice from cheating in the bit commitment protocol. Exactly why is explained in the main demonstration page.

## Distance of Bit Strings

[Top] [Overview] [Main demo] [Bibliography]

A convenient notion that is important for understanding why this protocol works is that of distance of strings. If we have two strings of bits of the same length, the distance between them is the number of positions in which they differ. For example, the distance between 00011 and 10101 is 3.

We can restate the point of the previous section by saying that if we send a string of length n through a BSC of error probability p we expect the distance between what was sent and what was received to be roughly p*n.

Say Alice sends an n-bit string called X through a BSC of error probability p to Bob, who receives X'. With very high probability, the distance between X and X' will be between (p-e)n and (p+e)n, where e is some very small number. If we picture all possible n-bit strings as points in the plane, then we can say Alice expects Bob's bits to be somewhere on the red circle in the picture on the right:

The real situation What Alice sees  Conversely, Bob, who sees only X', expects Alice's original word X to be on the blue circle in this picture:

Bob expects Alice's codeword x to be on the blue circle Remark: These pictures can be a little misleading because they flatten the situation out into 2 dimensions. All the same, they provide a decent mental image of what is really happening.

## Operations on Bit Strings

[Top] [Overview] [Main demo] [Bibliography]

There are a few operations it is often convenient to do on bit strings. They are based on the two following bit operations:

AND
The AND of 2 bits is their product (the result of multiplying them together). 1 AND 1 = 1, but 1 AND 0 = 0.
XOR
The XOR of several bits is 0 if their sum is even and 1 if their sum is odd. For example, the XOR of 0,1,1,0 and 1 gives 1 (because their sum is 3). The XOR of 1 and 1 gives 0 (because their sum is 2). XOR is addition modulo 2.

Using these 2 operations, we can define (for 2 strings u and v of the same length):

Bitwise XOR ( u + v )
The bitwise XOR of two bit strings is a third string made by taking hte XOR of the first bit of each string, the XOR of the second bit of each string, etc. 10011 + 01010 = 11001.
Bitwise AND
The bitwise AND of two bit strings is a third string of the same length formed by taking the AND of first two strings bit by bit. The bitwise AND of 10011 and 01010 is 00010.
Parity
The parity of a string is simply the XOR of all the bits in the string. 01101 has parity 1 while 11110 has parity 0.
Scalar product ( u · v )
The scalar product of two strings is the parity of their bitwise AND. 11110 · 01010 = 0.

If you look at AND and XOR as defining multiplication and addition of bits, then the scalar product defined above is exactly the same as the scalar product taught in linear algebra classes except using bits instead of real numbers.

## Error-Correcting Codes

[Top] [Overview] [Main demo] [Bibliography]

In our context, a code of length n is just a subset of all the possible strings of bits of length n. What makes a code "able to correct errors" is the fact that the codewords (strings) are chosen so no two codewords resemble each other too much. The best codes are the ones for which any pair of codewords will differ in quite a few positions. This means that when a codeword is sent through a noisy channel, the chances that the receiver will receive another codeword are quite slim and so the errors can be detected.

Based on that idea we can define the minimum distance of a code to be the distance (i.e. number of differences) between the two closest codewords. If the minimum distance of a code is 10, we can detect up to 9 errors when a codeword is sent through a noisy channel.

A linear code is a code (set of bit strings) with a special additional property. The bitwise XOR of any two codewords gives another codeword. An example of a linear code of length 6 is the following set of words (strings):

```000000
000111
111000
111111
```

A linear code with up to 2k codewords can be created based on a list of k different words of the same length by taking all the possible words which can be found by XORing together words from the list. For example, the code listed above was generated by the two words 000111 and 111000. When the code has 2k codewords, we call k the dimension of the code.

Traditionally we write the list in the form of a matrix (a table of 0's and 1's) where each word in the list is one row of the matrix. It turns out that choosing a matrix more or less at random yields codes with a very good minimum distance. This is the construction used in this protocol to create codes.

You now have enough background to understand the protocol and why it works. To go straight to the demonstration, click here. If you want a little more formal mathematical background, read on !

# For the mathematically inclined...

[Top] [Overview] [Main demo] [Bibliography]

Advisory: The following sections contain strong mathematical language.

### The Algebra of Bits

The operations AND and XOR as defined in Operations on Bits equip the set {0,1} with the structure of a algebraic field. This means that we can define a linear algebra on bits the same way we can for real numbers, complex numbers, or the elements of any other field. From now on, we will see strings as row vectors of bits.

### Linear Codes

Seen in this light, a linear code of length n is simply a linear subspace of the vector space {0,1}n. That is, it is closed under addition (XOR) and under multiplication by scalars (there are only two scalars here: 0 and 1. Multiplication by 0 yields the zero vector and multiplication by 1 leaves the vector untouched). This has two interesting consequences:

• The zero vector is always a codeword in a linear code.
• All linear codes are of size 2k for some integer k.

This last statement is true because a linear subspace can always be seen as the span of some basis, say of size k. Each vector in the subset can uniquely be written as a linear combination of the k basis vectors. There are exactly 2k such linear combinations and hence 2k codewords. k is the dimension of the code.

In fact, if we take the vectors of a basis of our subspace and write them down as the rows in a k × n matrix, we get what is called a generator matrix for the linear code. When seen as a linear transformation from {0,1}k into {0,1}n a genrator matrix yields the linear code as its image. This means that codewords can be quickly genrated by taking row vectors of length k and multiplying them by a generator matrix.

Another useful matrix is a parity matrix of a linear code. This is a (n-k) × n matrix such that the code is its null space. A parity matrix is useful because it allows one to very quickly verify if a given string is a codeword by simply multiplying the matrix by the vector (seen as a column vector).

### Random Linear Codes

One construction of linear codes involves picking a k × n generator matrix at random and taking the code to be simply the span of the rows of that matrix. This construction gives codes which are very likely to have good minimum distance.

A simple improvement on this method involves taking a generator matrix whose first part is a k × k identity matrix and whose second half is taken at random. This still yields good minimum distance while providing the added bonus of being very easy to handle. The proof of these results, based on a counting argument, is given in an analysis of the protocol [DVI, PS].

## More on the Law of Large Numbers

The name "law of large numbers" is attributed to several theorems having to do with the behaviour of very large collections of related random variables. In our case, we deal with a "law of large numbers" generally attributed to Bernstein. It describes the behaviour of the sum of a large number of random variables for which we know both the expected value and the variance (or, equivalently, the standard deviation).

In the case of variables taking on the value 1 with probability p and 0 with propability 1 - p, the probability that the actual average value of n such variables will stray from its expected value (p ) by any fixed amount is exponentially small in n.

This theorem is of capital importance to the security of the protocol presented here because it allows us to predict with great accuracy the behaviour of the binary symmetric channel. For more details on how it is used in the proof of the security of the protocol, see the analysis provided in the bibliography.