Talk:Basic Encryption Modules

From Second Life Wiki
Revision as of 14:01, 1 March 2009 by Xaviar Czervik (talk | contribs) (Formating)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

I am suspect of the math involved in this function and would at some point like to fix it. In the current state it is possible for the randomizer to cause the output to change between runs on the same input. This could be solved by reverse sorting the letters in the decrypter or by properly setting the characters on powers of two in the hash table. Strife Onizuka 18:31, 5 March 2007 (PST)

Thanks, it was not really made for anything practicle, it was just something that someone had asked to see if/how it could be done. It was mostly just intended to be a demo, and sufficiently hard enough to crack that it would be unlikely for anyone intercepting the message to understand it. Not to mention, it's pretty rare to need to send floats. I would think that for most communications, simply swapping letters in a string and changing channels would be sufficient. For those that need more, using multiple strings with different character runs and adding a 3rd field to indicate the run to use for the next decryption would probably work too. Perhaps one of these days, if i were to actually need it for something, i might work on it again. As far as the randomizer causing errors, i have not seen any, all the differences that I have seen come from how the float is rounded. If you llOwnerSay the float before encryption, and after decryption it is always the same. BUT it is sometimes rounded differently. there are other scripts available though forbreaking up floats to retain their correct value which could be used in conjunction to retain the correct value for the float. Thanks :) Bev 17:30 10 March 2007 (PST)

Limited Cryptanalysis

Introduction

I (Xaviar Czervik) have managed three different attacks against this encryption module, each with different requirements. What makes this encryption (as does any encryption) "secure" is the key -- the order of the characters in the variable letters. Since there are 49 characters, there are a total of 49! possible keys, or 210-odd bits. This is quite secure by any standard, yet key length is rarely what makes a cipher secure.

For the moment, I assume that the cracker can intercept any and all messages. This is not an unreasonable assumption: the channel is randomly distributed across the range [-9388608, -1000000], a difference of 8388608. Any one script can have 64 listeners, which means that 131,072 scripts would be required. If each prim of a 128 prim object had 128 scripts (not an unreasonable assumption), only eight objects would be required to listen in on every channel that this encryption can talk on. This method will listen in on all of the messages sent back and fourth; however that is hardly necessary. Assume ten messages are sent every minute, if we only need a total of 100 random ciphertexts, then we could only intercept 10% of all messages for 100 minutes. Or we could only intercept 7 of every thousand messages sent for an entire day. Using this example, only 7 prims with 128 scripts is required. The general idea here is that it is not unreasonable to assume a list of ciphertexts can be obtained.

For the remainder of this page, I discuss any number as a sum of characters. For example, from the default setup of characters, the number 15 is encoded as UVWX. This is because U=8, V=4, W=2, and X=1, and the sum is 15. Another way I will write this is U+V+W+X=15.

Finally, this is just something I worked up by my self over the last day and a half, so make sure to check anything I write a second time before trusting it. Also, forgive my lack of explanation in parts: I like doing the cryptanalysis, but not so much the writing it up. However, what should be taken away from this is that this encryption is, for all intensive purposes, broken.


Known-Plaintext Attack

If some plaintexts are known, then it is trivial to then determine the values associated with each character. No more than 50 plaintexts should be required, but it mostly depends on the luck of what is encrypted. A series of equations can be gained by writing the encrypted value as a sum of variables, and that sum equals the known plaintext. These equations can be solved simultaneously (using matrices or otherwise), and the key will be broken. All further messages can then be decrypted.

Take an example of the simplified set of possible values 4, 2, 1, 0.75, and 0.5. The associated characters are V, W, X, Y and Z; but it is not known which relate to the other. Assume the following are intercepted known plaintext-ciphertext pairs: 1.5=XZ, 3=ZW and 6=YW. The only character that is the same between 1.5 and 3 is Z, so therefore Z must equal one. That means X=.5, and W=2. If W=2, then Y=4. The only values not broken now is V, so V must equal .75 (the only remaining value). Here I break a total of 5!, or 120, possible keys with only 3 known plaintexts.

This attack is highly probable since in every message the channel that the next message will be sent on is appended. If two consecutive messages are obtained, the channel section of the first message can now be associated to a known value, the actual channel the second message was sent on. Assume 50 consecutive message-pairs are required, and again ten messages are encrypted per minute. Should the cracker wishes to obtain the key within one day, then the attacker need only listen to 6% of all communications. This means the attacker will receive consecutive pairs .36% of the time, and should, in a one day period, then receive 51 plaintext-ciphertext pairs.

As well as that, there may be other ways (depending on the protocol) to discover the plaintext after it has been used, and it may be possible to discover even more plaintexts this way.


Chosen-Plaintext Attack

This is the least likely type of attack, but most effective. Fewer than ten chosen plaintexts are required to break the key. This is an extension of the Known-Plaintext Attack. In this case, the perfect set of values are encrypted such that the equation can be solved simultaneously with the least effort. This is done by divide and conquer, essentially. Each equation is chosen such that half of the values are present. In the example above, the values I 'intercepted' were not actual random, I chose them to make the computation simple; which is how a chosen-plaintext attack works.

There are very few ways to mount a chosen-plaintext attack, and there is little reason to do so as a Known-Plaintext attack is fairly simple, and requires so few known plaintexts.


Known-Ciphertext Attack

This is the most probable attack, but hardest attack to mount. This attack requires many thousands of ciphertexts, but again this is not unreasonable. While this attack does not break the key totally, it reduces the work that must be done to 10^41 instead of 10^62. This Known-Ciphertext attack works very well with the Known-Plaintext attack described above, and can reduce the number of Known-Plaintexts required by a significant portion.

The first step is to determine which characters are numbers greater than one, and which are less than one. This is done by taking the list of ciphertexts, and separating out the chat channel from them. Then, the number of occurrences of every character is recorded. The character which represents 8388608 can be determined right away. Since the algorithm for determining the next chat channel is done by taking a random number from 0 to 8388608, and adding 100,000, only 1.2% of all numbers will be greater than 8388608, and hence contain that character. There are now 23 values which have a probability of near 50%, and 25 values which have a probability of 0. This is because a channel is an integer, and hence never will contain a character that is a float.

More information can be gained from the value that is encrypted. Worst case, this value will be totally random; and I will assume as such for the remainder. In an ideal encryption system, encrypting a random float will cause half of the characters to appear for any one encryption. However, this is not the case. The number of values that will be encrypted that fall between .04 and .05 is significantly smaller than the number of values that fall between .02 and .04, or from .01 to .02. While this may seem counter intuitive, it is the same as if there was a value of 60 in between 32 and 64. The 60 character would only show up if the value was 60, 61, 62, or 63. However there are many more values for which the 32 and 64 would show up. The eight characters with the lowest probability (after the character with 1.2%) will be .000004, .00004, .0004, .004, .04, .4, .5, and .75. This is because they are not half of the previous value, and are only a fraction of the previous value.

While only knowing the ciphertext is not enough to break the encryption, combining it with a Known-Plaintext attack is very efficient. When the equations are learned, it will be known which of the values are the .000004, .00004, .0004, .004, .04, .4, .5, and .75 set, and which are the other numbers less than 1, and which are greater than one, and which is the highest one. Since in the known-plaintext attack described above, only a very small percent of the time (6%) will a plaintext that has just been intercepted be the next sequential plaintext, there will be plenty of unknown-plaintexts, and hence plenty of known-ciphertexts to use.


Conclusion

While this encryption method is certainly better than not encrypting data at all, it has several flaws. Values that are encrypted do not become totally random, and this weakness can be used to break it. As well, any kind of known-plaintexts can be used to break the cipher in negligible time.