Exploiting the Diffie-Hellman bug in socat

A few days ago socat, a popular networking tool, issued a curious sounding security advisory:

"In the OpenSSL address implementation the hard coded 1024 bit DH p parameter was not prime. The effective cryptographic strength of a key exchange using these parameters was weaker than the one one could get by using a prime p. Moreover, since there is no indication of how these parameters were chosen, the existence of a trapdoor that makes possible for an eavesdropper to recover the shared secret from a key exchange that uses them cannot be ruled out."

More background information on this vulnerability can be found on Ars Technica and Hacker News, in this post I want to focus on building an exploit.

The patch shows that

p = 143319364394905942617148968085785991039146683740268996579566827015580969124702493833109074343879894586653465192222251909074832038151585448034731101690454685781999248641772509287801359980318348021809541131200479989220793925941518568143721972993251823166164933334796625008174851430377966394594186901123322297453

It has been discovered that

p = 271 * 13597 * 38894884397634366007356454548332370646972724268802781973440208895542936165564656473524541403310393405820598366261673173802130771236325314878371830363723788045821711985461441675679316058246609104355161134470046705337593170498462616195650378975298117141144096886684800236261920005248055422089305813639519

The last factor, let's denote it F3889, is a 1002-bit non-prime integer, whose factors remain unknown. By trial division we know that its smallest factors are larger than $2^{40}$. If we want to go further, we'll need to deploy a proper factorization algorithm. In that case I'll choose Lenstra elliptic curve factorization, whose running time depends on the size of smallest factor of F3889 rather than the size of the number itself. If the smallest factor is smaller than $2^{250}$, the Lenstra's algorithm would recover it before too long. The other factors can then be recovered by either Lenstra's or the general number field sieve algorithm.

On the other hand if F3889 is a product of two 500-bit primes, chances are we might never be able to factor it (without spending a million of dollars or so). This is very unlikely, if $p$ was indeed randomly generated. Thus, it's reasonable to assume that anyone determined and lucky enough can factor F3889 completely. It'll be a fun project, let me know if you want to work on it :).

But, you might ask, why do we care so much about the factors of $p$? It seems to have nothing to do with solving the discrete log problem (DLP) on $Z_p$, doesn't it? The answer is yes, knowing the factors of $p$, and if they are small enough, allows one to solve DLP on $Z_p$, thanks to the Chinese Remainder Theorem (CRT).

As I wrote before on this blog, CRT is one of the most powerful cryptanalysis tools ever. I've seen countless systems broken because of it. Whenever analyzing or designing a new system, ask yourself if you can break it using this simple trick, and you'll be surprised that most of the times the answer is yes. Pohlig and Hellman probably asked this question themselves, and eventually figured out that if the order of a group is a product of small primes (i.e., a smooth number), one can solve DLP in the subgroups, which is easier, and combine the results using CRT to obtain the discrete log in the original group.

Let's look at an example. Suppose that we want to solve for $x$, given $g$ and $h = g^x \pmod{n}$, where the order of group $Z_n$ is $\phi(n) = q_1 * q_2$ with $q_1$ and $q_2$ are prime. We can break this problem into three smaller sub-problems:
1/ Find $x_1$ such that $h^{q_1} = (g^{q_1})^{x_1} \pmod{n}$
2/ Find $x_2$ such that $h^{q_2} = (g^{q_2})^{x_2} \pmod{n}$
3/ We can prove that $x = x_1 \pmod{q_2}$ and $x = x_2 \pmod{q_1}$, and thus can figure out $x$ with CRT.

Note that 1/ and 2/ are themselves DLP, but they are in subgroups of order $q_1$ and $q_2$, respectively. Hence, the Pohlig-Hellman algorithm reduces DLP in group of order $q_1 * q_2$ to DLP in group of order $q_1$ or $q_2$. If $q_1$ or $q_2$ or small, we can brute-force for $x_1$ or $x_2$. If they are larger, we can deploy the Pollard's rho or the index calculus algorithm. It has been estimated that an academic team can break discrete log if $q_1$ or $q_2$ is a 768-bit prime and that a nation-state can break a 1024-bit prime.

I hope that it's clearer now why we need to factor $p$. We want to calculate $\phi(p)$ and its factors, which we need to deploy the Pohlig-Hellman attack. Is it surprising that the factors of the order of the group determines how hard DLP is on that group?

If the largest factor of $p$ is smaller than 2^800 it's reasonable to assume that anyone knowing this factor can solve DLP on the multiplicative group $Z_p$. That is, they can find $x$ given $g$ and $h$, where $g^x = h \pmod{p}$; this in turn allows them recovering the shared secret just by passively eavesdropping on the Diffie-Hellman key exchange. Note that if the larger factors of $p$ are not safe prime, i.e., not in the form of $2 * q + 1$ where $q$ is also a prime, the computation cost is even smaller.

In summary you can exploit this bug by following these steps:
1/ Using Lenstra elliptic curve factorization and the general number field sieve to factor $p$ completely, and
2/ Using Pohlig-Hellman and CRT to reduce DLP on the multiplicative group $Z_p$ to the multiplicative group $Z_{p'}$ where $p'$ is the largest factor of $\phi(p)$, and
3/ Using Pollar's rho or index calculus to solve DLP on $Z_{p'}$, and
4/ Sniffing socat traffic and recovering the DH shared secret, and
5/ Profit!

If this is a backdoor, it's trivial for its creators to exploit it, because they can skip 1/ and pre-compute most of 2/ and 3/. Even if this is not a backdoor, I suspect that it doesn't cost more than \$10K on AWS or Google Cloud Platform to perform step 1/, 2/ and 3/. If we're so lucky that F3889 is a product of 3 primes, 2 of which are 250-bit, step 1/ might cost less than \$2K, and pre-computation for step 2/ and step 3/ might cost even less.

Post a Comment

Previous Post Next Post

Labels Max-Results No.

Boxed(True/False)