Semeir Encryption

Sunday, January 6, 2008

Encryption Thoughts #2

Continuing earlier thoughts, I have a pre/post encryption algorythmn, that completely changes the internal structure as it encrypts/decrypts; Using very unique and simple algorythmns. If it starts getting complicated it slows the algorythmn down, the number of pre/post peices alter in every step how it changes, so only by having the original key/RNG string could you duplicate it.

Earlier i commented on using prime numbers, I which i highly suggest in use of this process, since it can greatly cut down on the number required subkeys to be considered 'secure', and thereby increase encryption. 2,3,5,7,11,13, each increasing by 1 till it passes of it's max, it then turns around and starts at 1, or some other predetermined number (which could be random). which totals at 30,030 combinations, but i have seen primes in the hundreds; increasing complexity, even for 8bits of combinations very complex.

However, for the main key itself, primes are not applicable, or useful, since you might select the Alphabet only, or a combination, or all of them at random. Since the length is unique, it's internal counters and rythmns follow different rules, using odd numbers for even keys, and even for odd keys; And failure to follow the rules can remove anywhere from 1/4 to 1/2 of it's potential. The stepping rules are different as well, adding/subtracting any number.

There are further ways to greatly increase complexity and strength, however i have made limits under certain rules. Following these rules, i have tried to even out at high strength without high memory/CPU cost.

1) The bit sizes for the codes never exceed 12 bits, and never go under 8 bits.

Reasons: 8bits has much complexity, but not as much as i'd like. 12 is the inbetween number between 8/16, allowing for middle areas (3bytes for example) to increase greatly in key combinations and complexity. If Memory requirements increase too large, it will be difficult to implement in chips and devices. Memory used for keys is as follows.

8bit 4K per key, 6K for speedups
9bit 8K per key, 12K
10bit 16K per key, 24K
11bit 32K per key, 48K
12bit 64K per key, 96K

Using a 32bit key, straight up, in memory would require some 48Gigs, i consider it too long to create, work with, and not worth it in exchange for it's encryption abilities. (2^32 * 4 * 3) 2^32 * 4 for actual key combinations, and 2 or more for the keys, and 1 for pre/post encryption. With this in mind, i perfer to use several smaller keys to create the complexity without memory consumption requirements.

Era

Monday, December 3, 2007

Strong Cipher thoughts

Over the years, since i originally started, the cipher that i began with started with something incredibly simple, yet with obvious patterns, i began to break down it's structure and add elements that can't be figured out without intimate knowlage of the current key.

Example. Any symbol replacement which is done, on a 1-for-1 character replacing, assuming it's text, can be broken by means of frequency analysis, saying that a,e, and t are far more common than x and z. By using a similar counting method, used in compression, you simply take the weights, and then switch them in the same order, or replace them with same weights as the frequency suggests. Then Obvious words, and some incorrect words will appear, but since the majority of the key is broken, it is easy to simply fix the words by replacing the letters that are wrong, and find the correct leftover frequency, leaving the whole original text intact.

First, for the cipher to be secure, it must involve sizes larger than 1 byte (8bits). When i say that, for example, the encryption can happen at 8bits, but must not be fixed, rather can change or has it's own little algorythmn that changes as it goes.

Two, Cipher Feedback. This i have noticed is a must. What cipher Feedback is, it will take the original unencrypted text, and use it according to a table, or something else, and then encrypt the next character or set. Then that character is encrypted normally, but it's original is noted before cipher feedback takes place, leaving 2 levels of complexity that won't easily be solved.

Three, Overlapping. I have tried and successfully implimented this in semeir, where it starts in one place, and encrypts 90% new, and 10% overlap, and does so over the whole part, even the original beginning. Example

1111 - text
xxx1 - x notes the working encryption area
x1xx
xx1x
1xxx

It might indeed go through an additional layer on the original encrypting area, just to be safe. Depending on the size of the variable, assuming 16bit chunks, and say a 48bit area (6 bytes) 2 new chunks are encrypted, and 1 chunk is overlapped, leaving 2^16 for an exact unencrypted carryover.

Finally. Levels. I've incorporated a multiple level algorythmn that builds itself according to the needs, and go anywhere from 2 levels, up to 511 levels. The more levels means the slower the algorythmn, but each level is a key, and the extreame overlapping would cause a far larger encryption key than can ever be made by a normal human.

Example: Using any sort of common block encryption, and say you have 16 levels, randomly generated keys by means of a special key. The key can be any size, but the final result is in the area of 128^16 = 2^24 bits of encryption strength, and only for 16 levels. (i believe this is correct, but i will re-check it later). The ending result is a far stronger encryption, that has too many levels to break with brute force, since the result could be anything.

One could also do an experimental algorythmn, which follows similar lines. It would rotate the data x number of elements and encrypt the block using a new keyset, creating a larger blocksize than it's original blocksize, and then reverse the whole set to get the original texts, although it is quite a bit slower.

Era

Tuesday, November 13, 2007

Compression #4

Further thoughts on compression.

I've been looking through more interesting information, including the Huffman coding, which is merely a format for creating a prefix free code, which is variable in size.

I've also tried some experiements that are based on that code, and look very promising. By taking the basic format, and modifying it in very simple ways i've tested random numbers (32bits, experimental using 10base), phrases, and some generic patterns.

My findings, say that a compression that would yeild 32bits in regular huffman code, would yeild between 29 and 25 bits with the modified code. The tree structure is quite different, so storing the tree in it's new format (for 8bit characters) could originally be stored 10bits for every 8bit, but mine would take 12bits for every 8 bits. (expected, not tested)

With that in mind, the great use for it would be when it's used as an adaptive code, quite simply every single character starts off with a 1 weight, and is divided up. As they are added on, the weights would change a bit at a time. (No table needs to be stored, but it doesn't have as good compression originally starting)

The downside with the experimental format, is it requires large INT's (32-1024bits), or that it requires actual math, instead of pasting x bits in a specific pattern. But in that way, it can also work. With text, it could be compessed, X elements at a time, or as many as possible before it overflows. Then output it as a straight 32bit word. In that way, it wouldn't need a huge library, and could reasonably be added into most standard applications with little to no problem.

Current testing and building of the new experimental huffman code, (ideas welcome for name) are incomplete, but show promising results.

Depending on how LZMA works, it might also be implimented with little to no problem using the same ideal background, gaining an additional 5-30% compression, but this is just a thrown idea till testing and proff come forward.

A current test, which didn't turn out very well due to setup and formatting, but was promising was a x3 format. This format MIGHT have worked in certain times, but not always. I might speak more on this later when i have more time.

But at this moment, no compression can be yielded from encryption, rather compression then encryption is the only real option. For a real but-kicker, add a 2byte offset information, and 2byte size (4bytes). Then up till that offset, fill with random numbers, and afterwards fill with random numbers till it fills the original message size, then encrypt the whole thing. That will make it many many many levels harder to break, since they are trying to break random data or comrpessed (dense) data, and can't confirm nor test with. But that's just another random thought :P

Era

Monday, November 5, 2007

Compression #3 - Continued.. Part B

Further notes. This will take a while, however it's somewhere to start.
I've compiled a list i will give later on specifics, on how many bits it takes to do x number, and then to encode that number. I've seen it jump from 1 bit to 12 bits, or 180 bits. However the larger it gets, the more data i need to try and represent that. Not only that, it doesn't include required numbers of how many times it's used. If failed, the compression rate is much depreciated, not to mention the header information is a failure as well.

I'm also working on the following thoughts. Exact way to encode to get smallest size in all ways is currently up for ideas, and requires a lot of math, or large INT's that require some type of library. I will submit it later, although there are others avaliable for free, this is one i know works that i built, and is still in the process of being built.

K, the value is 5, using it in a 4bit (16) area. Simply, you add all other values on top, and when it goes too large to represent the number you minus it. (for 8bit, it was 8,28,56,70 Ect)

I've gotten numbers in the 12k (15bit) range, for starting, but representing 5-6 data parts (20-24 bits). Depending on what they were, 1-6 combinations, determined how it would be stored.

If you think of a better way to store the number, please tell me and i'll tell you if it will work. If i get this to work, GPL goes the compression idea :)

The 1 wouldn't include any further information since it's the same number 6 times.
The 2, would simply do the (2x2x2x2x2x2 = 64), which is within reach

The 3, (3x3x3x3x3x3 = 729)

The 4, (10 (4+6 for 1 and 2 reuses) x 6 (2 repeat inserts) x 2 (worse case, order of repeats) x 4x3x2x1 = 2880)

The 5, The best way to represent this is to give the proper output of (5x5 for duplicate and offset where to put it. Then 5x4x3x2x1 = 3000)

The 6, Simply each fall off as they go. (6x5x4x3x2x1 = 720)

Ok, So we compare Vs the worse case. if it's 24bit, then it can't exceed 16Mil, if it does, hopefully not higher than 32Mil, since that's 25+ bits.

Another way to help with getting the proper size, is get your max value to represent onbits in their combinations, then divide the area (16mil) against it. Then try to keep all your other represented values less than this value, and if you can do it you are guarenteed compression :)

I've also tried where i dropped the lowest value (1) and left an extra value in, if that was hit, we manually determine the 4bit value, which increases the value from 12bit, to about 17-18 bit, however it's still less than 24bit. However likely hood of hitting. Not high. (We're talking a 3byte combination here) But it did lower the main hit numbers quite a bit, but not nearly enough.

I am increasing the values, but verifying them can be very hard, considering the math involved, and manually building all the possible combinations for worse case senario (which is the only one i compare with).

A Note. If the higher chance of hitting a <=1 bitsize rather than a >=1 bitsize, it is a success, and will be built and tested to see if it is worth it. But don't expect huge compression in small files, rather a decent compression over a large area, but even that can fail.

Era

Thursday, November 1, 2007

Compression #3 - Continued

Further with compression i've attempted, and although theoretically will work in certain situations, won't guarentee endless compression as of yet. But i'm working on it.

1) High Low. Ever played the gambling game where you flip a card up, and you guess if the next card is high or low? Well, using this compression type has some limited success.

You need a previous history of your last value, and compare it against your next value. The odds are in your favor it's in the more likely in the larger area. But when it fails you refer to a fallback value, and re-encode the correct value using the smaller value. so.. Using very large INT's.

Tends to make 8bits to 9.5, 4bits gets to 8.8 bits? but not worth it.

2) Attempted, encoding X of unique values in a row, and specify a failure (non unique), which we then encode from our list. There are of course success on all levels, EXCEPT the highest level. Using this knowlage we can then encode using a lesser form of 1bits (combinations of a certain number of 1's on, example is 2 on's in a 8bit area is 28).

There are problems.

Problem 1) Can only go to 5 on a 8bit, and 6 on 16bit. Higher values, equal less compression.
Problem 2) highest value doesn't guarentee a repeat or failure, so our last value can't get compression for a smaller bitvalue.

Tried a straight 2different 3bit unique bytetype, which works. 8 x 7 = 56 of 64. (or 28, *2*1) However, there's the problem and chance of 1 in 8 that they are identicle. The failure rate or references (3-5bits) for successful non-matches in a row nearly negate any worth of value. IF it's every 8 (on average) then the 3bits cover itself, if larger or in smaller than 8, the bitsizes are lost on big math and it isn't worth it.

I've also tested using a 2level compression for bit sizes, and getting counts on types. However it's not yet successful, but not finnished testing either.

Era

Wednesday, October 24, 2007

Compression #3

A few ideas have come up as of late, a few ideas i thought were pretty good and am currently working on, but haven't been perfected.

First, is using the previously mentioned multiplication to add together several values hidden inside a number, and by using the appropriate divisor and getting it's remainder, you can rebuild the original data. In this way, you could in the 1-8 combinations, store the data in 15.7 bits, instead of 18-24 bits.

The first idea, involves using the above mentioned, and getting appropriate data. I'm still fiddling with it so i will de-classify it as i go. I have a 1-4 matches (2bits) and then can rebuild accordingly, with the next few bits for which matches we are working with. When a match is made, we only need to reference the internal match, therefore saving a lot of space. Example.

1 unique, 2 long (4 * 8) : 0,0
2 unique, 3 long (4*28*2*2) : 1,2,2
3 unique, 4 long (4*56*3*2*3) 5,4,3,4
4 unique, 4 long (4*70*4*3*2) 1,2,3,4

All levels offer some compression, except the fourth one, which takes up about 12.7 bits, and it's source is 12 bits. (I will give exact bit details and how i get the 8,28,56,70 numbers later)

I'm also testing with two current ideas. The first is +1, which goes something like this. Let's say i'm working with 3bits (8), so the multiplier is always +1 larger on all of them. The reason for this, is the final nth is a breakout protocol, and then afterwards is a number telling me what the repeat is. So if the example goes 1,2,5,4,3,2, this covers 3*6=18bits. The max number, is then 9*8*7*6*5*4*6 362880 (58980h) which is about 19 something bits. Obviously we can't get compression, but if the match was one larger, it would be equal or smaller in size, and then can be saved.

In order to save more bits, a very powerful large infinate int library would do well (i have a homemade one, which is working)

I will say more when i have more to say.

Era

Sunday, October 21, 2007

Compression #2 - notes

Here are notes on maximum bits in a larger area. This for example, says that if you have 12 active (on bits) in a 32bit area, there will be X combinations.

32:1 - 32 (5)
32:2 - 496 (9)
32:3 - 4960 (13)
32:4 - 35960 (16)
32:5 - 201376 (18)
32:6 - 906192 (20)
32:7 - 3365856 (22)
32:8 - 10518300 (24)
32:9 - 28048800 (25)
32:10 - 64512240 (26)
32:11 - 129024480 (27)
32:12 - 225792840 (28)
32:13 - 347373600 (29)
32:14 - 471435600 (29)
32:15 - 565722720 (30)
32:16 - 601080390 (30)
32:17 - 565722720 (30)
32:18 - 471435600 (29)
32:19 - 347373600 (29)
32:20 - 225792840 (28)
32:21 - 129024480 (27)
32:22 - 64512240 (26)
32:23 - 28048800 (25)
32:24 - 10518300 (24)
32:25 - 3365856 (22)
32:26 - 906192 (20)
32:27 - 201376 (18)
32:28 - 35960 (16)
32:29 - 4960 (13)
32:30 - 496 (9)
32:31 - 32 (5)
32:32 - 1 (0)

These are the values i've come across using a set of functions. 32: is the bits currently referred to, :x is the bits on. - ?? is the combinations, and (x) is the number of bits to encode it.
In order to get any compression, assuming you need to tell the 5bits (32), you would need to get the on bits between the 1-10 or 22-32. However there might be other things or way to use this, in much larger numbers it might be more useful, but are usually 2-3 bits away from it's middle ground.

full ref 4-64 bits.

Era