Why do we use XOR with One Time Pad - Visual example

One Time Pad(OTP) is a cipher that provides perfect secrecy given that the key is as long as the message and it is  The aim of this post is to first intuitively explain why OTP uses XOR operator and then explain it more formally.

The XOR Operator

We can simply define the XOR($\oplus$) operator by using a truth table:

$a$ $b$ $a \oplus b$
0 0 0
0 1 1
1 0 1
1 1 0

Note that this is equal to $a \oplus b = a + b \mod 2$.

Also $x \oplus x = 0$ and $ x \oplus 0  = x$, thus if we define a function $f_k(x) = x \oplus k$, then $f_k$ would be involutory meaning that $f_k(f_k(x)) = x$ since $(x \oplus k) \oplus k  = x \oplus (k \oplus k) = x \oplus 0 = x$.

Additionally we have $P[a \oplus b = 1] = P[a \oplus b = 0] = \frac{1}{2}$. This is going to be more important later.

How One Time Pad Works?

OTP is one of the simplest yet powerful ciphers. But it is not practical the use it given the constraints for perfect secrecy above. Let's see how it works:

$$ Enc_k(m) = m \oplus k = c \\ Dec_k(c) = c \oplus k = m $$

We need show that this is correct meaning that $Dec_k(Enc_k(m)) = m$ so if we encrypt a message $m$ with a key $k$ and decrypt using the same key, then we should obtain the same message.

Here, you can see that $Dec_k$ and $Enc_k$ is the same function with different inputs. Then the proof of correctness is trivial because $Enc_k$ is equivalent to the function $f_k$ that we have defined in the previous function, then we have $Dec_k(Enc_k(m)) = Enc_k(Enc_k(m))$ and since $Enc_k$ is involutory, we have $ Enc_k(Enc_k(m)) = m$.

Why Do We Use XOR?

Before explaining this formally, I am going to use some visual examples to compare and contrast what we obtain by using other logical operators and by using xor. I will be using Python PILLOW Library to apply logical operators to two images.

Here we will use a tux image as $m$ and a random noise as $k$ each 512x512 px:

1) XOR:

By writing the following code, we can xor two images:

from PIL import Image, ImageChops
m = Image.open("m.jpg")
k = Image.open("key.png")
m = m.convert("1")
k = k.convert("1")

c = ImageChops.logical_xor(m, k)
c.show()

And we have the resulting ciphertext $c$ as:

Resulting image with $c = m \oplus k$

2) AND:

Now let's write the same code for and:

from PIL import Image, ImageChops
m = Image.open("m.jpg")
k = Image.open("key.png")
m = m.convert("1")
k = k.convert("1")

c = ImageChops.logical_and(m, k)
c.show()

And we have the resulting ciphertext $c$ as:

Resulting image with $c = m \oplus k$

As you can see this does not look like we are really hiding something. Let's see the truth table for AND

$a$ $b$ $a \wedge b$
0 0 0
0 1 0
1 0 0
1 1 1

We see that the result of AND operation is 1 if and only if $a,b=1$. Thus we learn 1-bit of information from both the message and the key for each 1 in the output. This leak of information makes the operator not suitable.

Notice that $P[a \wedge b = 1] = \frac{1}{4} \neq P[a \wedge b = 0] = \frac{3}{4}$. Previously these two probabilities where equal and that's why we didn't have such information leakage while using XOR. This can be explained more formally using Shannon Entropy but above explanation is enough for the general idea of this post.

XOR with Uniform Distribution

Now let's prove a property of the XOR operation to see why we have obtained a more random ciphertext when we have used XOR.

Proposition: Let $m,k \in \{0,1\}^n$ and $k$ is sampled from a Uniform random variable $K$ and $m$ is sampled from the random variable $M$ . If $M and K$ are independent, then $c = m \oplus k$ follows a uniform distribution.

If this proposition is true, then the consequences are easy to understand, since we have randomly sampled our key from a uniform distribution, then our resulting ciphertext will be sampled from a uniform distribution as well. Let's prove that this is true.

Let $k$ be uniformly distributed over $\{0,1\}^n$ and our message m is a constant that lies in $\{0,1\}^n$.

$$
\begin{align*}
P[C = c] = P[m\oplus k = c] &= \sum_{\ell} P[m\oplus k = c \wedge m = \ell) \\
&= \sum_{\ell} P[k = m \oplus c]P[m = \ell] \\
&= \sum_{\ell} \frac{1}{2^n}P[m = \ell] \text{($k$ follows uniform)}\\
&= \frac{1}{2^n}\sum_{\ell} P[m = \ell]  \\
&= \frac{1}{2^n} \cdot 1 \text{(sum over all $\ell \in M$)}\\
&= \frac{1}{2^n} \\
\end{align*}
$$

Thus the ciphertext c also follows a uniform distribution since the probability of obtaining a specific n-bit bitstring c is $\frac{1}{2^n}$.

Other Logical Operators

We have only looked two of the operators which are XOR and AND. Then we have seen that why AND does not really work for us. But there are more logical operators that we haven't consider yet. With 2 operators, we can have 16 distinct truth tables $ \{0000, 0001, 0010, ..., 1111\}$. Notice that there are only 6 possible truth tables with the property $P[a \text{ op } b = 1] =  P[a \text{ op } b = 0] = \frac{1}{2}$. These are:

  1. $0011 = a$
  2. $0101 = b$
  3. $0110 = a \oplus b$ (XOR)
  4. $1001 = \overline{a \oplus b}$ (XNOR)
  5. $1010 = \bar{b}$
  6. $1100 = \bar{a}$

Let's kill these options one by one.

  • 1 and 2 can not be used since 1st one leaks the message itself and 2nd one leaks the key itself.
  • Similarly 5 and 6 can not be used since 5th leaks the inverse of key and 6th leaks the inverse of the message.
  • Number 3 is XOR and it is what we already use. The only alternative is XNOR, actually it can be used as an alternative to XOR and will have the same properties. But mainly it is algebraically inconsistent if we are using 0 as a neutral element since  $\overline{0 \oplus 1} = 0$. There are also hardware dependent reasons but it changes hardware to hardware.

Conclusion

In this post, we have examined why we are using the XOR operation in One Time Pad and tried to understand the reasons behind this decision both intuitively and formally. I hope that it was enjoyable for you to read and you have learned something new. See you in another post.

Show Comments