natechoe.dev The blog Contact info Other links The github repo

Computer Science Shower Thoughts: Breaking a plaintext OTP with LLMs

I tried to explain the Vigenère cipher for this article, I'm not good enough at computer science communication to do that. Just read the Wikipedia article, you'll be fine.

An interesting note about the Vigenère cipher is that if the key and data are the same length, it's equivalent to a one-time pad (again, just read the Wikipedia article).

Of course, this assumes that each letter of the key is randomly generated, which they usually aren't with Vigenère ciphers. This leads to an interesting puzzle: Can you break a one-time pad with the knowledge that both the key and data are English text?

The big problem here is that we're combining two documents into a single new document. If I have 1 megabyte of encrypted data, 2 megabytes of unencrypted data went into it: 1 megabyte for the original data, and 1 megabyte for the key. There are competitions for designing plaintext compression programs that get far better compression factors, so theoretically this isn't impossible, but our scheme is designed for encryption, not compression.

I think that this is a perfect use for LLMs. We generate some plausible key, see what the corresponding plaintext would be, then see if that plaintext would be plausible. The traditional tools of cryptography would not work here. I'd wager that even the best frequency analysis would get you less than 0.1% of the original data, most likely signifcantly less. You'd need some way to interpolate between the pieces that you have, and I think that AI is the best tool for that.