This is part of a collection of pages which describe a specific protocol for a general cryptographic task called bit commitment. On this page you will find background helpful in understanding the demonstration and explanation of the protocol. This protocol, due to Crépeau, uses the properties of noisy channels and is unconditionally secure with high probability. Puzzled? Lost? Read on to know more.
Consider the following instructive story (from [Kil89]):
Alice and Bob were going to play a game and they wanted to decide who would go first. To be fair, they decided to flip a coin, but they soon realized that neither of them had a coin to flip. Alice suggested a solution: "In our heads, we each pick either a 0 or a 1 at random. Then you tell me what you picked and I tell you what I picked. If we both pick the same number, you get to go first. If we pick different numbers, I get to go first."
Bob thought about this for a moment, looked at Alice suspiciously and replied "But you could change your mind when you hear what I picked. How would I know that what you tell me is really what you originally picked?".
Alice frowned at how easily her seemingly cunning plan had been detected. However, she soon had another (somewhat more honest) idea. "I've got it!" she exclaimed. "I'll write my choice down on a piece of paper, put it in an envelope and give it to you. Then you tell me your choice, and after that we open the envelope together. If the two choices are the same, you get to go first; otherwise I go first. Is this fair enough?"
After thinking it over, Bob agreed. They ran the protocol as described (for it was fair) and played happily ever after.
The end.
What Alice suggested is a simple form of bit commitment. Alice has a secret bit (i.e. a 0 or a 1) in her head which she wants to commit to Bob. This means she wants to be able to reveal it to him later in such a way that (1) he will be convinced that the bit she reveals later is really the bit she has in mind now and (2) Bob cannot guess what her bit is before she reveals it to him. Formally, a Bit Commitment protocol has 2 parts: a commit phase, in which Alice fixes her bit and gives Bob some evidence of her choice (the sealed envelope in the story), and an unveil phase, in which she reveals the bit and Bob checks it against that evidence.
As mentioned earlier, there are 2 essential aspects to these protocols: the commitment must be binding (Alice cannot change her bit once she has committed) and concealing (Bob can learn nothing about the bit before it is unveiled).
Remark: The names Alice and Bob are standard in cryptography (it's better than "Person A", "Person B"). We will use them throughout this demonstration.
Traditionally, cryptography has been the science of protecting communications from eavesdropping and tampering. Recently, however, cryptography has branched out into a large number of other cryptographic tasks. These usually involve two or more parties, some of which may be dishonest.
Bit commitment is a building block used to construct and execute such complex cryptographic tasks. Here are some applications in which bit commitment is useful.
We will not get into the details of how these tasks are actually performed, but you can see that the variety of applications in which bit commitment is useful justifies trying to find ways to implement it as securely as possible.
For more information on bit commitment, see [Sch96], [Kil89], [CGT95], and [BCC88].
The protocol we describe here is interesting for two main reasons:
These two points are described below.
Clearly, it is important that any cryptographic protocol be secure (in the case of bit commitment, security covers the two aspects defined above). However, security can be defined in different ways.
Computational security is limited, though, by the evolution of new algorithms and technologies. What would take thousands of years to do on today's computers may take only a few hours on the computers of tomorrow. A system or protocol which is computationally secure is by no means guaranteed to remain so.
In the case of bit commitment, traditional unconditional security would require that Alice have absolutely no way of changing her mind about her bit and that Bob have absolutely no way of learning anything at all about Alice's secret bit before she reveals it to him. However, this is not practical in our situation, because we rely on a noisy channel (described below) which is inherently unpredictable. There is always a small chance of failure.
It is important to realize that the security of such a protocol is still information-theoretic in that we make no assumptions about the computing power of someone trying to cheat. We simply allow a probability of error which can be as tiny as we want to make it.
The protocol described in these pages is of this last type: by choosing certain parameters appropriately, we can make the probability of error as small as we like, while making no assumptions about the intelligence or the computing power of Alice and Bob. The only assumptions made are about the properties of the noisy channel connecting Alice to Bob.
For further explanation of the distinction between computational and unconditional security, see [Mau93a] and [Mau93b].
This protocol is also interesting because of its use of noisy channels. Most of the channels we use daily for communication such as telephone lines and fiber optic cables are imperfect: they introduce errors into the information they transmit. However, they can be made reliable by the use of error-correcting codes (discussed below). This is normally incorporated into the hardware used for transmission and so we do not notice it. However, if used directly (without error-correcting devices), the properties of noisy channels can be used for cryptographic purposes.
The use of noisy channels in cryptography was introduced by Wyner (and later generalized by Maurer in [Mau93b]) in the context of protecting communications against eavesdropping. This protocol is significant in that it is the first use of noisy channels for a multi-party protocol.
In our protocol, the noisy channel is used by sending a string of bits (which somehow represent Alice's secret bit) from Alice over the noisy channel to Bob, who receives a corrupted version. He gets just enough information to make sure Alice isn't cheating when she unveils her bit but not enough information to guess the value of her bit.
The specific model used for the channel is explained below.
The mathematical model generally used for a noisy channel is the binary symmetric channel (BSC). It is a channel which transmits bits (0's or 1's) only. When a bit is sent there is a fixed probability that the bit will be changed. This probability is often denoted by p. The probability that a given bit will be flipped is independent of the actual value of the bit and of what happened to the bits which preceded it.
It's important to note that neither the sender nor the receiver knows which bits get flipped and which are sent faithfully.
From a mathematical viewpoint, the binary symmetric channel is a good model because of its simplicity. Even though we don't know what happens to each bit which goes through the channel, we know that on average, the proportion of errors which occur (i.e. the number of errors divided by the total length of the message) will be about p (the probability of error).
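To make the model concrete, here is a short Python sketch (illustrative only, not part of the protocol itself) that simulates sending a string of bits through a binary symmetric channel:

```python
import random

def bsc_transmit(bits, p, rng=random):
    """Send a list of bits through a BSC: flip each bit
    independently with probability p."""
    return [b ^ (1 if rng.random() < p else 0) for b in bits]

rng = random.Random(42)  # arbitrary seed, for reproducibility
sent = [0] * 10000
received = bsc_transmit(sent, 0.3, rng)
errors = sum(s != r for s, r in zip(sent, received))
print(errors / len(sent))  # close to 0.3, as expected
```

Note that neither side can tell, for any single bit, whether it was flipped; only the overall error rate is predictable.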
The following charts give the probability that a given number of errors will happen, given that p = 0.3 and we send 50, 250 or 1000 bits through the channel. Notice the peaks centered at 15 (= 0.3 * 50), 75 (= 0.3 * 250) and 300 ( = 0.3 * 1000).
[Charts: probability that a given number of errors will occur over a BSC of error probability 0.3, for n = 50, n = 250 and n = 1000.]
An important theorem in statistics, called the law of large numbers, tells us that when you send many bits through the channel, the probability that the proportion of errors will be very different from p is very small. In fact, as the number of bits sent through the channel increases, that probability (that the proportion of errors won't be close to p) shrinks very quickly. This result is the key to preventing Alice from cheating in the bit commitment protocol. Exactly why is explained in the main demonstration page.
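This concentration can be seen empirically with a short simulation (illustrative Python with an arbitrary seed): as n grows, the observed error fractions cluster more and more tightly around p.

```python
import random

def error_fraction(n, p, rng):
    # Simulate n bits through a BSC and return the observed error fraction.
    return sum(1 for _ in range(n) if rng.random() < p) / n

rng = random.Random(0)  # arbitrary seed
p = 0.3
for n in (50, 250, 1000, 100000):
    fracs = [error_fraction(n, p, rng) for _ in range(200)]
    # The spread between the smallest and largest observed
    # fraction shrinks as n grows.
    print(n, min(fracs), max(fracs))
```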
A convenient notion that is important for understanding why this protocol works is that of distance of strings. If we have two strings of bits of the same length, the distance between them is the number of positions in which they differ. For example, the distance between 00011 and 10101 is 3.
We can restate the point of the previous section by saying that if we send a string of length n through a BSC of error probability p we expect the distance between what was sent and what was received to be roughly p*n.
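For concreteness, here is the distance function as a few lines of Python (representing strings of bits as Python strings of '0' and '1', which is just a convenient choice for illustration):

```python
def distance(u, v):
    """Hamming distance: the number of positions where u and v differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

print(distance("00011", "10101"))  # 3, as in the example above
```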
Say Alice sends an n-bit string called X through a BSC of error probability p to Bob, who receives X'. With very high probability, the distance between X and X' will be between (p-e)n and (p+e)n, where e is some very small number. If we picture all possible n-bit strings as points in the plane, then we can say Alice expects Bob's bits to be somewhere on the red circle in the picture on the right:
[Pictures: the real situation (left) and what Alice sees (right).]
Conversely, Bob, who sees only X', expects Alice's original word X to be on the blue circle in this picture:
[Picture: Bob expects Alice's codeword X to be on the blue circle.]
Remark: These pictures can be a little misleading because they flatten the situation out into 2 dimensions. All the same, they provide a decent mental image of what is really happening.
There are a few operations it is often convenient to do on bit strings. They are based on the two following bit operations: AND, which gives 1 only when both input bits are 1 (and 0 otherwise), and XOR (exclusive or), which gives 1 exactly when the two input bits differ.
Using these 2 operations, we can define (for 2 strings u and v of the same length) the bitwise AND and bitwise XOR of u and v, obtained by applying the operation position by position, as well as the scalar product of u and v, obtained by XORing together the ANDs of corresponding bits.
If you look at AND and XOR as defining multiplication and addition of bits, then the scalar product defined above is exactly the same as the scalar product taught in linear algebra classes except using bits instead of real numbers.
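These operations can be sketched in a few lines of Python (again representing bit strings as strings of '0' and '1', purely for illustration):

```python
def bit_and(u, v):
    # Bitwise AND of two equal-length bit strings.
    return "".join(str(int(a) & int(b)) for a, b in zip(u, v))

def bit_xor(u, v):
    # Bitwise XOR of two equal-length bit strings.
    return "".join(str(int(a) ^ int(b)) for a, b in zip(u, v))

def scalar_product(u, v):
    # Mod-2 scalar product: XOR together the ANDs of corresponding bits.
    result = 0
    for a, b in zip(u, v):
        result ^= int(a) & int(b)
    return result

print(bit_xor("00011", "10101"))       # 10110
print(scalar_product("1101", "1011"))  # (1&1) ^ (1&0) ^ (0&1) ^ (1&1) = 0
```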
In our context, a code of length n is just a subset of all the possible strings of bits of length n. What makes a code "able to correct errors" is the fact that the codewords (strings) are chosen so no two codewords resemble each other too much. The best codes are the ones for which any pair of codewords will differ in quite a few positions. This means that when a codeword is sent through a noisy channel, the chances that the receiver will receive another codeword are quite slim and so the errors can be detected.
Based on that idea we can define the minimum distance of a code to be the distance (i.e. number of differences) between the two closest codewords. If the minimum distance of a code is 10, we can detect up to 9 errors when a codeword is sent through a noisy channel.
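As a quick illustration, the minimum distance of a small code can be computed by brute force (the example code here is the four-word code discussed in the next section):

```python
from itertools import combinations

def distance(u, v):
    # Hamming distance between two equal-length bit strings.
    return sum(a != b for a, b in zip(u, v))

def minimum_distance(code):
    # Smallest distance over all pairs of distinct codewords.
    return min(distance(u, v) for u, v in combinations(code, 2))

code = ["000000", "000111", "111000", "111111"]
print(minimum_distance(code))  # 3: this code can detect up to 2 errors
```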
A linear code is a code (set of bit strings) with a special additional property. The bitwise XOR of any two codewords gives another codeword. An example of a linear code of length 6 is the following set of words (strings):
000000 000111 111000 111111
A linear code with up to 2^{k} codewords can be created based on a list of k different words of the same length by taking all the possible words which can be found by XORing together words from the list. For example, the code listed above was generated by the two words 000111 and 111000. When the code has 2^{k} codewords, we call k the dimension of the code.
Traditionally we write the list in the form of a matrix (a table of 0's and 1's) where each word in the list is one row of the matrix. It turns out that choosing a matrix more or less at random yields codes with a very good minimum distance. This is the construction used in this protocol to create codes.
You now have enough background to understand the protocol and why it works. To go straight to the demonstration, click here. If you want a little more formal mathematical background, read on!
Advisory: The following sections contain strong mathematical language.
The operations AND and XOR as defined in Operations on Bits equip the set {0,1} with the structure of an algebraic field. This means that we can do linear algebra with bits the same way we can with real numbers, complex numbers, or the elements of any other field. From now on, we will see strings as row vectors of bits.
Seen in this light, a linear code of length n is simply a linear subspace of the vector space {0,1}^{n}. That is, it is closed under addition (XOR) and under multiplication by scalars (there are only two scalars here: 0 and 1. Multiplication by 0 yields the zero vector and multiplication by 1 leaves the vector untouched). This has two interesting consequences: the all-zero string is always a codeword, and the number of codewords is always a power of 2.
This last statement is true because a linear subspace can always be seen as the span of some basis, say of size k. Each vector in the subspace can be written uniquely as a linear combination of the k basis vectors. There are exactly 2^{k} such linear combinations and hence 2^{k} codewords. k is the dimension of the code.
In fact, if we take the vectors of a basis of our subspace and write them down as the rows of a k × n matrix, we get what is called a generator matrix for the linear code. When seen as a linear transformation from {0,1}^{k} into {0,1}^{n}, a generator matrix yields the linear code as its image. This means that codewords can be quickly generated by taking row vectors of length k and multiplying them by a generator matrix.
Another useful matrix is a parity matrix of a linear code. This is an (n-k) × n matrix such that the code is its null space. A parity matrix is useful because it allows one to verify very quickly whether a given string is a codeword: simply multiply the matrix by the vector (seen as a column vector) and check that the result is the zero vector.
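As a small worked example, here is a code of length 5 and dimension 2 with a generator matrix and a matching parity matrix. The matrices themselves are hypothetical, chosen only for illustration; the arithmetic is mod-2 throughout:

```python
def vec_mat(v, M):
    # Row vector times matrix, over {0,1} with mod-2 arithmetic.
    return [sum(v[i] * M[i][j] for i in range(len(M))) % 2
            for j in range(len(M[0]))]

def mat_vec(M, v):
    # Matrix times column vector, over {0,1} with mod-2 arithmetic.
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

# Hypothetical generator matrix G = [I_2 | A] for a code of length 5
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
# Matching parity matrix H = [A^T | I_3]; the code is its null space
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]

codeword = vec_mat([1, 1], G)
print(codeword)              # [1, 1, 1, 0, 1]
print(mat_vec(H, codeword))  # [0, 0, 0]: codewords are sent to zero
```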
One construction of linear codes involves picking a k × n generator matrix at random and taking the code to be simply the span of the rows of that matrix. This construction gives codes which are very likely to have good minimum distance.
A simple improvement on this method involves taking a generator matrix whose first part is a k × k identity matrix and whose second part is chosen at random. This still yields good minimum distance while providing the added bonus of being very easy to handle. The proof of these results, based on a counting argument, is given in an analysis of the protocol [DVI, PS].
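Such a matrix is straightforward to build; here is a minimal Python sketch (the function name and seed are our own, for illustration):

```python
import random

def random_systematic_generator(k, n, rng=random):
    """Build G = [I_k | R] with R a random k x (n-k) matrix over {0,1}."""
    return [[1 if j == i else 0 for j in range(k)] +
            [rng.randint(0, 1) for _ in range(n - k)]
            for i in range(k)]

rng = random.Random(7)  # arbitrary seed
for row in random_systematic_generator(3, 8, rng):
    print(row)
```

Because the first k columns form an identity matrix, the first k bits of every codeword are simply the message bits themselves, which is what makes these codes so easy to handle.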
The name "law of large numbers" is attributed to several theorems having to do with the behaviour of very large collections of related random variables. In our case, we deal with a "law of large numbers" generally attributed to Bernstein. It describes the behaviour of the sum of a large number of random variables for which we know both the expected value and the variance (or, equivalently, the standard deviation).
In the case of variables taking on the value 1 with probability p and 0 with probability 1 - p, the probability that the actual average value of n such variables will stray from its expected value (p) by any fixed amount is exponentially small in n.
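One common way to make this quantitative is a Bernstein-type bound (the exact constants vary from source to source, so take this form as representative rather than the one used in the protocol's analysis):

```latex
\Pr\left[\,\left|\frac{1}{n}\sum_{i=1}^{n} X_i - p\right| \ge \varepsilon\,\right]
\;\le\; 2\exp\left(-\frac{n\varepsilon^{2}}{2\bigl(p(1-p) + \varepsilon/3\bigr)}\right)
```

The point to retain is that the right-hand side decreases exponentially in n, so even a modest number of transmitted bits pins the observed error fraction very close to p.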
This theorem is crucial to the security of the protocol presented here because it allows us to predict with great accuracy the behaviour of the binary symmetric channel. For more details on how it is used in the proof of the security of the protocol, see the analysis provided in the bibliography.