I've always wondered whether the assumption that they never use the same pad twice would be enough to let you, in practice, always use the same one (assuming the data length never changes).
Not sure what you're asking? If you use the same key over and over again, then you're using the same key much more than just twice.
The idea of a one-time-pad is that you're basically just randomly flipping the bits of your input, which means the output of an OTP cipher is indistinguishable from random data.
If you're using the same key multiple times though, then the output of the cipher (considered over time) won't be random, and you'll be able to detect patterns from the original input in the cipher output (e.g., the shape of an image, frequency of certain letters).
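Here is a minimal Python sketch of a byte-wise XOR one-time pad that makes the "randomly flipping the bits" idea concrete. It assumes the pad comes from Python's `secrets` module and is used exactly once; the helper name `otp_xor` is made up for illustration. Decryption is the same XOR, because XORing with the same pad twice cancels it out.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of data with the pad byte at the same position."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"ATTACK AT DAWN"
pad = secrets.token_bytes(len(message))  # fresh, uniformly random, used once

ciphertext = otp_xor(message, pad)   # every bit of the message flipped (or not) at random
recovered = otp_xor(ciphertext, pad) # XOR with the same pad undoes the encryption
print(recovered)
```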
> The output is always random, regardless of how often the key has been used.
The output of any _one_ use of the pad is, yes, but the point is to consider all of the data that an attacker may have. If you reuse the key multiple times, then the entire set of ciphertexts an attacker has is not random. (See also: https://xkcd.com/221/)
> But how do you know they're using the same one? Or how can you be sure they're not?
Note that the illustration (Tux_ECB) demonstrates a different problem, namely that ECB ciphers may expose patterns across blocks, rather than reused one-time pads. One-time pads will always produce random images as their output.
Eh, it's sort of the same? You can imagine each pixel as its own message: the point is that repeatedly transforming things in a consistent but "random" way isn't actually random. The ciphertext of each pixel is "random", but the pattern when looking at all the pixels is clear.
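A toy version of the per-pixel view, assuming a hypothetical 4x4 black-and-white image in which every pixel is treated as its own one-byte message and the same key byte (an arbitrary `0x5A`) is reused for all of them:

```python
# Hypothetical 4x4 image: 0 = background, 255 = foreground (a plus-like shape)
image = [
    [0, 255, 255, 0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [0, 255, 255, 0],
]

KEY_BYTE = 0x5A  # one key byte, wrongly reused for every pixel

cipher_image = [[pixel ^ KEY_BYTE for pixel in row] for row in image]

def mask(img, value):
    """1 where the pixel equals value, 0 elsewhere."""
    return [[1 if p == value else 0 for p in row] for row in img]

# Each encrypted pixel looks arbitrary on its own, but equal pixels still map
# to equal ciphertext values, so the shape survives encryption.
print(mask(image, 255) == mask(cipher_image, 255 ^ KEY_BYTE))  # True
```

Using a fresh random byte for each pixel instead breaks this correlation, which is exactly the "one-time" requirement.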
Someone intercepts the data and has: EDGF and HGJI
And now?
Or maybe like this: since the pad and the data are interchangeable (they have matching lengths), isn't using the same OTP with different data essentially the same as using the same data with a different key?
The normal operation for an OTP is xor. Now if you reuse the random key K on messages A and B, you get encrypted messages A' = K xor A and B' = K xor B.
Now, an attacker who learns A' and B' just needs to compute A' xor B' = A xor K xor B xor K = A xor B. Since your input is not random but structured data like natural language, this is now relatively easy to break using cryptanalysis, since you essentially end up with something like "MEET AT DAWN" xor "I LIKE TRAINS".
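A short Python sketch of that cancellation, using the two example messages above. The pad is drawn from `secrets`, and `xor_bytes` is a helper name chosen for illustration. Since the messages have different lengths, the comparison only covers the overlapping 12 bytes.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # zip stops at the shorter input, so the result has the shorter length
    return bytes(x ^ y for x, y in zip(a, b))

a = b"MEET AT DAWN"
b = b"I LIKE TRAINS"
key = secrets.token_bytes(max(len(a), len(b)))  # one pad, (wrongly) used twice

a_enc = xor_bytes(a, key)
b_enc = xor_bytes(b, key)

# The attacker only has a_enc and b_enc, yet the key cancels out:
leak = xor_bytes(a_enc, b_enc)
print(leak == xor_bytes(a, b))  # True: A' xor B' == A xor B, no key involved
```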
Story time: the USSR once reused an OTP key (after years or even decades, I can't recall), but a US three-letter agency had kept the old ciphertext (A') and used it to break the new ciphertext (B'). They probably had some scheme with a broadcaster saying "use codebook 1234, the secret is GARBLED DATA". At least that's the story a cryptography lecturer told us (or the fragments of it I remember).
I'm reaching the limits of my stats knowledge, but you may be able to figure out, even from just those two ciphertexts, something about the input plaintexts. It's obviously harder with shorter inputs.
I guess one thing to note is that if what you were transmitting was just random noise to begin with, OTP reuse may not matter or even be evident. But essentially all data that people care about transmitting isn't random noise; it has some structure, and that structure comes through with OTP reuse (increasingly so the more often you reuse the pad and the more data you encrypt with it).
EDGF encrypted with HGJI is now the same as ABCD encrypted with DEFG. In this example, that means the distance between corresponding characters of the two encrypted messages is the same as the distance between the original messages.
From my limited knowledge of the matter, that alone doesn't break the cipher; you'll need to know additional information about the messages to do so (word statistics, conditional probabilities of letter sequences, etc.). But without the one reuse of your key, you couldn't apply these techniques.
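The distance claim can be checked with a small additive (mod 26) letter cipher, assuming the mapping A=0 ... Z=25. The plaintexts ABCD and DEFG come from the comment above; the key ECEC is not from the thread, it is just one key that happens to reproduce the intercepted EDGF and HGJI.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_nums(text):
    return [ALPHABET.index(c) for c in text]

def to_text(nums):
    return "".join(ALPHABET[n % 26] for n in nums)

def add(text, key):
    # Additive encryption: plaintext letter + key letter, mod 26
    return to_text(p + k for p, k in zip(to_nums(text), to_nums(key)))

def diff(x, y):
    # Letter-by-letter distance y - x, mod 26
    return to_text(b - a for a, b in zip(to_nums(x), to_nums(y)))

# Assumed key that maps ABCD -> EDGF and DEFG -> HGJI
key = "ECEC"
c1, c2 = add("ABCD", key), add("DEFG", key)
print(c1, c2)                              # EDGF HGJI
print(diff(c1, c2), diff("ABCD", "DEFG"))  # DDDD DDDD
```

The key never appears in `diff(c1, c2)`, which is why the ciphertext distances equal the plaintext distances.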
Look at an ASCII table. Each byte could be any value, but if you're sending text data, the 8th (most significant) bit is very likely to be 0. That means if you're sending, say, 10 different messages, the same pattern is going to show up 10 different times, making it clear something is going on.
That ASCII example is rather extreme, but all messages have patterns, and as long as you're given enough of them you can break a reused OTP.
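A rough Python sketch of the ASCII pattern, assuming three made-up 15-character ASCII messages all encrypted with the same random pad:

```python
import secrets

messages = [
    b"attack at dawn!",
    b"retreat at dusk",
    b"hold the bridge",
]
length = max(len(m) for m in messages)
key = secrets.token_bytes(length)  # one pad reused for every message

ciphertexts = [bytes(p ^ k for p, k in zip(m, key)) for m in messages]

# ASCII text has the top bit (0x80) clear, so at each position every
# ciphertext's top bit equals the key's top bit at that position.
for i in range(length):
    top_bits = {c[i] & 0x80 for c in ciphertexts if i < len(c)}
    assert len(top_bits) == 1  # the same bit shows up in every message

# XORing any two ciphertexts gives P1 ^ P2, whose top bits are always 0.
c1, c2 = ciphertexts[0], ciphertexts[1]
print(all((x ^ y) & 0x80 == 0 for x, y in zip(c1, c2)))  # True
```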
Obviously it's more complicated with text (where you have less information filled in); for that you have to do crib dragging, which is a bit more involved but not fundamentally difficult.
I'm confused: this comment and a blog post linked elsewhere in this thread both describe the OTP as XORing the data with the key.
I've always assumed it was just adding the key value to the data value, as described in the Wikipedia article[0].
And those two can't be the same: e.g. with both the data and key value 'a' (0x61), I'd get 0x00 with XOR and 0xc2 with addition.
EDIT: Ah, damnit, second paragraph, it says: "On July 22, 1919, U.S. Patent 1,310,719 was issued to Gilbert Vernam for the XOR operation used for the encryption of a one-time pad."
Everything I've posted in this thread about OTP was under the assumption of an additive cipher. My bad.
The Wikipedia article uses modular addition because it's more accessible to the layperson than binary XOR, but they're functionally equivalent for the purposes of cryptanalysis. XOR is just addition modulo 2 on single bits instead of on larger numbers.
Edit for your edit: all of this discussion applies to additive OTP just as it does to XOR OTP; once you have depths (multiple messages encrypted with the same key material), you can start applying the OTP (however it's done) “in reverse”, as it were, to begin extracting patterns in the data.
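A small Python sketch of the "XOR is addition on single bits" point. The hypothetical helper `xor_via_addition` adds two bytes bit by bit modulo 2, with no carries between bits, and matches `^` for every byte pair. It also works through the 'a' example: the character 'a' is 0x61, which gives 0x00 under XOR and 0xc2 under addition modulo 256.

```python
def xor_via_addition(a: int, b: int, bits: int = 8) -> int:
    """Add two bytes bit by bit, modulo 2, with no carries between bits."""
    result = 0
    for i in range(bits):
        bit = (((a >> i) & 1) + ((b >> i) & 1)) % 2
        result |= bit << i
    return result

# Bitwise mod-2 addition is exactly XOR, for every pair of byte values
assert all(xor_via_addition(a, b) == a ^ b for a in range(256) for b in range(256))

data = key = ord("a")            # 0x61
print(hex(data ^ key))           # 0x0  (XOR)
print(hex((data + key) % 256))   # 0xc2 (addition mod 256)

# Different outputs, but both are valid ciphers with an inverse
c_xor, c_add = data ^ key, (data + key) % 256
assert c_xor ^ key == data
assert (c_add - key) % 256 == data
```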
Both are one-time pads, there isn't one exact algorithm that's "the one-time pad". And XOR is really the same as addition, just done per-bit instead of per-byte/per-character. But as far as I know, XOR is the most commonly referenced example.
You can easily discover reused keys if you can guess any part of either plaintext. From that guessed fragment, you can recover both plaintexts and the entire key (pad) using the "Zig Zag" method.
(P=plaintext, C=ciphertext, K=reused_pad, ⊕=XOR)
If we capture two ciphertexts that reused the same key
C1 = P1 ⊕ K
C2 = P2 ⊕ K
Then combining the ciphertexts cancels the key
D = C1 ⊕ C2 = P1 ⊕ P2
The resulting D is simply the two plaintexts XORed together. If you can guess any part of either plaintext - a standard header or commonly used words (like "weather" or "Heil Hitler") - then XORing that guess with D reveals part of the other plaintext at the same position. Once a plausible match is found, the rest of the decryption is relatively easy: zig-zagging between the two plaintexts, guessing neighboring words and extending out from the original guess.
Professor Brailsford's explanation[1] of the method on Computerphile is a nice introduction to this type of cryptanalysis.
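A minimal crib-dragging sketch in Python of the first zig-zag steps, assuming two made-up 26-byte messages encrypted with one reused random pad. The `looks_like_text` filter is a deliberately crude, hypothetical heuristic, and the crib "weather" mirrors the example guess in the comment above.

```python
import secrets

# Crude, hypothetical filter: letters, spaces and a little punctuation only
ALLOWED = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,'")

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def looks_like_text(fragment: bytes) -> bool:
    return all(byte in ALLOWED for byte in fragment)

def crib_drag(d: bytes, crib: bytes):
    """Slide a guessed fragment along D = P1 xor P2.

    At each offset, crib xor D[offset:offset + len(crib)] is what the other
    plaintext would contain there if the guess were right at that offset.
    """
    for offset in range(len(d) - len(crib) + 1):
        yield offset, xor_bytes(crib, d[offset:offset + len(crib)])

# Made-up example: two equal-length messages sharing one pad
p1 = b"the weather today is clear"
p2 = b"send more supplies by noon"
pad = secrets.token_bytes(len(p1))
c1, c2 = xor_bytes(p1, pad), xor_bytes(p2, pad)

d = xor_bytes(c1, c2)  # all the attacker needs; the pad has cancelled out

# Step 1: drag the crib and keep offsets where the result reads like text
for offset, other in crib_drag(d, b"weather"):
    if looks_like_text(other):
        print(offset, other)  # offset 4 yields b' more s' (possibly among false hits)

# Step 2 (zig-zag): extend the revealed P2 fragment with a guess, XOR back into P1
guess_p2 = b" more supplies"
print(xor_bytes(guess_p2, d[4:4 + len(guess_p2)]))  # b'weather today '
```

Each confirmed guess on one side exposes more of the other side, and XORing a recovered plaintext with its ciphertext also yields the pad at those positions.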
It's not like it's hard to get K and K to cancel each other out using addition. Why are you generalizing from "the inverse of XORing K is XORing K again" to "the inverse of adding K is adding K again"? It isn't, but everyone knows what the inverse of adding K is.
C1 = P1 + K
C2 = P2 + K
C1 - C2 = P1 - P2
And if you can guess part of either plaintext, you'll see the same part of the other plaintext, and know what the key was, exactly the same as for XORing. As somebody else already pointed out, that's because XORing is a variety of addition anyway.
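The same crib recovery works for an additive pad. A Python sketch, assuming byte-wise addition modulo 256 and two made-up 12-byte messages; `add_bytes` and `sub_bytes` are illustrative helper names.

```python
import secrets

def add_bytes(a: bytes, b: bytes) -> bytes:
    return bytes((x + y) % 256 for x, y in zip(a, b))

def sub_bytes(a: bytes, b: bytes) -> bytes:
    return bytes((x - y) % 256 for x, y in zip(a, b))

p1 = b"MEET AT DAWN"
p2 = b"RUN AWAY NOW"
key = secrets.token_bytes(len(p1))  # one pad, reused

c1, c2 = add_bytes(p1, key), add_bytes(p2, key)

d = sub_bytes(c1, c2)  # = P1 - P2 (mod 256); the key is gone

# Guess that P1 starts with "MEET": the same positions of P2 fall out...
crib = b"MEET"
p2_part = sub_bytes(crib, d[:len(crib)])    # P2 = P1 - (P1 - P2)
# ...and so does the key at those positions
key_part = sub_bytes(c1[:len(crib)], crib)  # K = C1 - P1

print(p2_part)  # b'RUN '
```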