Monday, December 3, 2007

Strong Cipher thoughts

Over the years since I originally started, the cipher I began with was something incredibly simple, with obvious patterns. Over time I began to break down its structure and add elements that can't be figured out without intimate knowledge of the current key.

Example: any symbol replacement done as a 1-for-1 character substitution, assuming it's text, can be broken by frequency analysis, since letters like a, e, and t are far more common than x and z. Using a counting method similar to the one used in compression, you take the weights of the ciphertext symbols and match them, in order, against the expected letter frequencies. Obvious words (and some incorrect ones) will appear, but since the majority of the key is now known, it's easy to fix the remaining words by swapping the letters that are wrong and assigning the leftover frequencies correctly, recovering the whole original text.
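A minimal sketch of that attack in Python, assuming English text and a plain letter-for-letter substitution (the ciphertext and the frequency order here are only for illustration):

```python
from collections import Counter

# Approximate order of English letters, most to least frequent.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_guess(ciphertext):
    """Guess a 1-for-1 substitution key by lining up ciphertext letter
    frequencies with typical English letter frequencies."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    cipher_order = [letter for letter, _ in counts.most_common()]
    # Most common ciphertext letter -> 'e', next -> 't', and so on.
    mapping = dict(zip(cipher_order, ENGLISH_ORDER))
    return "".join(mapping.get(ch, ch) for ch in ciphertext.lower())

# The guess is rarely perfect, but enough of the key falls out that the
# remaining wrong letters can be fixed by eye.
print(frequency_guess("wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj"))
```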

First, for the cipher to be secure, it must involve sizes larger than 1 byte (8 bits). By that I mean the encryption can still happen at 8 bits, but the substitution must not be fixed; it has to change, or have its own little algorithm that changes as it goes.

Two, cipher feedback. This, I have noticed, is a must. What cipher feedback does is take the original unencrypted text and use it, according to a table or something else, to influence the encryption of the next character or set. That character is then encrypted normally, but its original value is noted before cipher feedback takes place, leaving two levels of complexity that won't easily be solved.
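A minimal sketch of the idea, assuming a toy byte cipher where the previous plaintext byte feeds into the next one; this is only an illustration, not the Semeir algorithm:

```python
def encrypt_with_feedback(plaintext: bytes, key: bytes) -> bytes:
    """Toy cipher: each byte is mixed with the key and with the
    PREVIOUS plaintext byte, so identical bytes encrypt differently."""
    out = bytearray()
    feedback = 0  # seed value before any plaintext has been seen
    for i, b in enumerate(plaintext):
        k = key[i % len(key)]
        out.append((b + k + feedback) & 0xFF)
        feedback = b  # note the ORIGINAL byte before moving on
    return bytes(out)

def decrypt_with_feedback(ciphertext: bytes, key: bytes) -> bytes:
    out = bytearray()
    feedback = 0
    for i, c in enumerate(ciphertext):
        k = key[i % len(key)]
        b = (c - k - feedback) & 0xFF
        out.append(b)
        feedback = b
    return bytes(out)

msg = b"attack at dawn"
key = b"key"
assert decrypt_with_feedback(encrypt_with_feedback(msg, key), key) == msg
```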

Three, overlapping. I have tried and successfully implemented this in Semeir. It starts in one place and encrypts 90% new data and 10% overlap, and does so over the whole buffer, even back over the original beginning. Example:

1111 - text
xxx1 - x marks the working encryption area
x1xx
xx1x
1xxx

It might indeed go through an additional layer on the original encryption area, just to be safe. Depending on the size of the variable, assuming 16-bit chunks and, say, a 48-bit area (6 bytes), two new chunks are encrypted and one chunk is overlapped, leaving 2^16 possibilities for an exact unencrypted carryover.
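A minimal sketch of such an overlapping pass, assuming 2-byte chunks, a 3-chunk window that advances 2 chunks at a time, and a toy XOR keystream standing in for the real cipher:

```python
import itertools

def overlap_pass(data: bytearray, keystream, chunk=2, window=3, step=2):
    """XOR-encrypt `data` in windows of `window` chunks, advancing `step`
    chunks per window so one chunk overlaps, wrapping back over the start."""
    n_chunks = len(data) // chunk
    pos = 0
    for _ in range(0, n_chunks, step):
        for c in range(window):
            start = ((pos + c) % n_chunks) * chunk
            for j in range(chunk):
                data[start + j] ^= next(keystream)
        pos += step
    return data

buf = bytearray(b"0123456789ABCDEF")       # 8 chunks of 2 bytes
ks = itertools.cycle(b"\x5a\xa3\x17\xc4")  # toy keystream, not a real RNG
overlap_pass(buf, ks)
# Because XOR is self-inverse, re-running the pass with an identical
# keystream (restarted from the same point) decrypts it again.
```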

Finally, levels. I've incorporated a multiple-level algorithm that builds itself according to the needs, and can go anywhere from 2 levels up to 511 levels. More levels mean a slower algorithm, but each level is a key, and the extreme overlapping would create a far larger effective encryption key than could ever be made by a normal human.

Example: use any sort of common block encryption, and say you have 16 levels with keys randomly generated by means of a special key. The key can be any size, but the final result is in the area of 128^16 = (2^7)^16 = 2^112 combinations, and that's only for 16 levels. The end result is a far stronger encryption, with too many levels to break by brute force, since the intermediate result could be anything.
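A minimal sketch of the layering idea, assuming per-level keys derived from one master key with SHA-256 and a toy XOR pass standing in for the real per-level cipher (again, an illustration, not Semeir):

```python
import hashlib

def level_keys(master: bytes, levels: int, key_len: int = 16):
    """Derive one key per level from a single master key."""
    return [hashlib.sha256(master + bytes([i])).digest()[:key_len]
            for i in range(levels)]

def apply_levels(data: bytes, master: bytes, levels: int = 16) -> bytes:
    """Run the data through `levels` passes, each with its own key.
    A toy XOR pass stands in for the real per-level cipher."""
    buf = bytearray(data)
    for key in level_keys(master, levels):
        for i in range(len(buf)):
            buf[i] ^= key[i % len(key)]
    return bytes(buf)

ct = apply_levels(b"hello world", b"master secret", levels=16)
# The toy XOR passes are symmetric, so applying the same levels again decrypts.
assert apply_levels(ct, b"master secret", levels=16) == b"hello world"
```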

One could also do an experimental algorithm along similar lines. It would rotate the data by x elements and encrypt the block using a new keyset, creating a larger effective block size than the original, and then reverse the whole set of operations to get the original text back, although it is quite a bit slower.

Era

Tuesday, November 13, 2007

Compression #4

Further thoughts on compression.

I've been looking through more interesting information, including Huffman coding, which is essentially a method for creating a variable-length, prefix-free code.

I've also tried some experiments based on that code, and they look very promising. By taking the basic format and modifying it in very simple ways, I've tested random numbers (32 bits, experimental using base 10), phrases, and some generic patterns.

My findings say that data which would yield 32 bits in regular Huffman code would yield between 25 and 29 bits with the modified code. The tree structure is quite different, so storing the tree in its new format (for 8-bit characters) changes too: a regular tree could be stored at about 10 bits per 8-bit symbol, but mine would take about 12 bits per 8-bit symbol. (Expected, not tested.)

With that in mind, the great use for it would be as an adaptive code: quite simply, every single character starts off with a weight of 1, and the code space is divided up accordingly. As characters are added, the weights change a bit at a time. (No table needs to be stored, but the compression isn't as good at the start.)
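A minimal sketch of the adaptive idea using an ordinary Huffman tree, assuming every symbol starts at weight 1 and the tree is rebuilt as the weights grow (rebuilding per symbol is slow, but keeps the sketch short):

```python
import heapq

def huffman_lengths(weights):
    """Return a {symbol: code length} map for the given weights."""
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def adaptive_bit_count(text):
    """Encode cost when every byte starts at weight 1 and weights are
    bumped after each symbol; no table has to be transmitted."""
    weights = {chr(b): 1 for b in range(256)}
    total = 0
    for ch in text:
        total += huffman_lengths(weights)[ch]
        weights[ch] += 1          # adapt: this symbol just got more likely
    return total

print(adaptive_bit_count("abracadabra alakazam"), "bits")
```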

The downside of the experimental format is that it requires large INTs (32-1024 bits), or rather actual math, instead of just pasting x bits in a specific pattern. But it can work that way too. With text, it could be compressed X elements at a time, or as many as possible before the value overflows, then output as a straight 32-bit word. That way it wouldn't need a huge library, and could reasonably be added to most standard applications with little to no problem.
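A minimal sketch of that packing, assuming each symbol has a known small range (its radix): symbols are multiplied into one integer until the next one would overflow 32 bits, and the word is then flushed:

```python
def pack_words(symbols, radixes, word_bits=32):
    """Multiply (symbol, radix) pairs into one integer at a time,
    flushing a word whenever the next symbol would not fit in 32 bits."""
    limit = 1 << word_bits
    words, value, span = [], 0, 1
    for sym, radix in zip(symbols, radixes):
        if span * radix > limit:      # next symbol would overflow the word
            words.append(value)
            value, span = 0, 1
        value = value * radix + sym
        span *= radix
    words.append(value)
    return words

def unpack_word(value, radixes_in_word):
    """Recover the symbols of one word via repeated divmod."""
    out = []
    for radix in reversed(radixes_in_word):
        value, sym = divmod(value, radix)
        out.append(sym)
    return list(reversed(out))

syms  = [3, 17, 4, 250, 1, 5]        # values with small individual ranges
bases = [6, 20, 6, 256, 3, 9]        # the range (radix) of each value
packed = pack_words(syms, bases)
assert unpack_word(packed[0], bases) == syms
print(packed)
```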

Current testing and building of the new experimental Huffman code (ideas for a name are welcome) is incomplete, but shows promising results.

Depending on how LZMA works, the same idea might also be implemented behind it with little to no problem, gaining an additional 5-30% compression, but this is just a thrown-out idea until testing and proof come forward.

A recent test, which didn't turn out very well due to setup and formatting but was still promising, was an x3 format. This format MIGHT have worked in certain cases, but not always. I might say more on this later when I have more time.

But at this moment, no compression can be gained from already-encrypted data; compressing first and then encrypting is the only real option. For a real butt-kicker, add 2 bytes of offset information and 2 bytes of size (4 bytes total). Then fill everything up to that offset with random numbers, fill with random numbers afterwards until it reaches the original message size, then encrypt the whole thing. That makes it many, many levels harder to break, since the attacker is trying to break random data or compressed (dense) data, and can't confirm or test anything. But that's just another random thought :P
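A minimal sketch of that framing, assuming a 2-byte offset and 2-byte length header, random padding on both sides, and a placeholder where the real encryption step would go (the names and layout are only illustrative):

```python
import os
import struct

def frame_message(payload: bytes, total_size: int) -> bytes:
    """Hide `payload` at a random offset inside `total_size` bytes of
    random padding, with a 4-byte header giving offset and length."""
    max_offset = total_size - len(payload)
    offset = int.from_bytes(os.urandom(2), "big") % (max_offset + 1)
    header = struct.pack(">HH", offset, len(payload))
    body = os.urandom(offset) + payload + os.urandom(max_offset - offset)
    return header + body          # in real use: encrypt(header + body)

def unframe_message(frame: bytes) -> bytes:
    offset, length = struct.unpack(">HH", frame[:4])
    return frame[4:][offset:offset + length]

msg = b"meet at midnight"
frame = frame_message(msg, total_size=256)
assert unframe_message(frame) == msg
```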

Era

Monday, November 5, 2007

Compression #3 - Continued.. Part B

Further notes. This will take a while, but it's somewhere to start.
I've compiled a list (specifics later) of how many bits it takes to represent x values, and then to encode that number. I've seen it jump from 1 bit to 12 bits, or to 180 bits. The larger it gets, the more data I need to represent it. On top of that, it doesn't include the required counts of how many times each value is used. If that fails, the compression rate suffers badly, and the header information becomes a failure as well.

I'm also working on the following thoughts. The exact way to encode for the smallest size in all cases is still up for ideas, and requires a lot of math, or large INTs that need some kind of library. I will post mine later; although others are available for free, this is one I built and know works, and it is still being developed.

OK, say the value is 5, used in a 4-bit (16-value) area. Simply put, you add all the other values on top, and when the total gets too large to represent you subtract it back out. (For 8 bits, the combination counts were 8, 28, 56, 70, etc.)
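Those counts look like the number of ways to place k on-bits in an 8-bit area, i.e. binomial coefficients; a quick check of the quoted series:

```python
import math

# Ways to choose k "on" bit positions out of 8.
for k in range(1, 5):
    print(f"C(8,{k}) = {math.comb(8, k)}")
# Prints 8, 28, 56, 70 - the same series quoted above.
```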

I've gotten numbers in the 12k (15-bit) range for the starting case, while representing 5-6 data parts (20-24 bits). Depending on what they were (1-6 distinct values), that determined how the part would be stored.

If you think of a better way to store the number, please tell me and I'll tell you if it will work. If I get this to work, the compression idea goes GPL :)

The 1 case wouldn't include any further information, since it's the same number 6 times.
The 2 case would simply be 2x2x2x2x2x2 = 64, which is within reach.

The 3 case: 3x3x3x3x3x3 = 729.

The 4 case: 10 (4+6 for the 1 and 2 reuses) x 6 (2 repeat inserts) x 2 (worst case, order of repeats) x 4x3x2x1 = 2880.

The 5 case: the best way to represent this is 5x5 for the duplicate and the offset of where to put it, then 5x4x3x2x1, for 3000.

The 6 case: each factor simply falls off as it goes: 6x5x4x3x2x1 = 720.

OK, so we compare against the worst case. If it's 24 bits, the total can't exceed 16 million; if it does, hopefully not higher than 32 million, since that's 25+ bits.

Another way to help get the proper size: find your maximum value for representing the on-bits in their combinations, then divide the area (16 million) by it. Then try to keep all your other represented values under that quotient; if you can do that, you are guaranteed compression :)

I've also tried dropping the lowest case (1) and leaving an extra escape value in; if that is hit, we spell out the 4-bit value manually, which increases the size from 12 bits to about 17-18 bits, but that is still less than 24 bits. However, the likelihood of hitting it is not high (we're talking a 3-byte combination here). It did lower the main hit numbers quite a bit, but not nearly enough.

I am increasing the values, but verifying them can be very hard, considering the math involved and having to manually build all the possible combinations for the worst-case scenario (which is the only one I compare with).

A note: if there's a higher chance of hitting a case that saves a bit than one that costs an extra bit, it's a success, and it will be built and tested to see if it's worth it. But don't expect huge compression on small files; rather, a decent compression over a large area, and even that can fail.

Era

Thursday, November 1, 2007

Compression #3 - Continued

Going further with compression, I've made more attempts; although they theoretically work in certain situations, they don't guarantee endless compression as of yet. But I'm working on it.

1) High/Low. Ever played the gambling game where you flip a card up and guess whether the next card is higher or lower? Using this as a compression type has had some limited success.

You keep a history of your last value and compare it against your next value. The odds are in your favor that the next value falls in the larger of the two regions. When that fails, you refer to a fallback value and re-encode the correct value using the smaller region. So... this needs very large INTs.

It tends to turn 8 bits into about 9.5, and 4 bits into about 8.8 bits? Not worth it.
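A minimal sketch of one simple high/low variant, assuming byte values, a 1-bit high/low flag per value, and the chosen sub-range coded with just enough bits; it roughly reproduces the break-even-or-worse behaviour above:

```python
import math

def highlow_cost(values, bits=8):
    """Bits needed when each value is coded as a 1-bit high/low flag
    plus the value's position inside the chosen sub-range."""
    total, prev = 0, 1 << (bits - 1)   # start the history at mid-range
    for v in values:
        low_size = prev                       # values 0 .. prev-1
        high_size = (1 << bits) - prev        # values prev .. 2**bits - 1
        region = high_size if v >= prev else low_size
        total += 1 + math.ceil(math.log2(max(region, 1)))
        prev = v
    return total

data = [12, 200, 198, 40, 41, 250, 7]
print(highlow_cost(data), "bits vs", len(data) * 8, "bits plain")
```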

2) Attempted: encoding X unique values in a row, and specifying a failure (a non-unique value), which we then encode from our list. There are of course successes at every level EXCEPT the highest level. Using this knowledge we can encode with a smaller form based on on-bit counts (combinations of a certain number of 1s being on; for example, 2 on-bits in an 8-bit area gives 28 combinations).

There are problems.

Problem 1) It can only go to 5 on 8 bits, and 6 on 16 bits. Higher values mean less compression.
Problem 2) The highest value doesn't guarantee a repeat or a failure, so our last value can't be compressed to a smaller bit value.

I tried a straight 2-different 3-bit unique byte type, which works: 8 x 7 = 56 out of 64 (or 28 x 2 x 1). However, there's the problem, and a 1-in-8 chance, that they are identical. The failure rate, or the references (3-5 bits) needed for successful non-matches in a row, nearly negates any value. IF a failure happens every 8 values on average, the 3 bits cover themselves; if the spacing is larger or smaller than 8, the bit savings get lost in the big math and it isn't worth it.

I've also tested using a 2-level compression for bit sizes and getting counts on types. It's not successful yet, but it's not finished testing either.

Era

Wednesday, October 24, 2007

Compression #3

A few ideas have come up of late; a few I thought were pretty good and am currently working on, but they haven't been perfected.

First is using the previously mentioned multiplication to pack several values together inside one number; by using the appropriate divisor and taking the remainder, you can rebuild the original data. In this way, for the 1-8 case, you could store the data in about 15.3 bits instead of 18-24 bits.
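A minimal sketch of that multiply-and-divide packing for the all-different case, assuming the eight 3-bit values form a permutation of 0-7, so the whole thing fits in ceil(log2(8!)) = 16 bits rather than 24:

```python
import math

def rank_permutation(perm):
    """Pack a permutation of 0..n-1 into a single integer (Lehmer code)."""
    value, remaining = 0, list(range(len(perm)))
    for p in perm:
        idx = remaining.index(p)
        value = value * len(remaining) + idx
        remaining.pop(idx)
    return value

def unrank_permutation(value, n):
    """Rebuild the permutation by repeated divmod with the right divisors."""
    digits = []
    for radix in range(1, n + 1):          # divisors 1, 2, ..., n
        value, d = divmod(value, radix)
        digits.append(d)
    remaining, perm = list(range(n)), []
    for idx in reversed(digits):
        perm.append(remaining.pop(idx))
    return perm

p = [3, 1, 7, 0, 6, 2, 5, 4]
r = rank_permutation(p)
print(r, "fits in", math.ceil(math.log2(math.factorial(8))), "bits")
assert unrank_permutation(r, 8) == p
```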

The first idea involves using the above and extracting the appropriate data. I'm still fiddling with it, so I will describe it as I go. I have a 1-4 matches field (2 bits) and can then rebuild accordingly, with the next few bits saying which matches we are working with. When a match is made, we only need to reference the internal match, therefore saving a lot of space. Example:

1 unique, 2 long (4 * 8) : 0,0
2 unique, 3 long (4*28*2*2) : 1,2,2
3 unique, 4 long (4*56*3*2*3) 5,4,3,4
4 unique, 4 long (4*70*4*3*2) 1,2,3,4

All levels offer some compression except the fourth one, which takes up about 12.7 bits while its source is 12 bits. (I will give exact bit details, and how I get the 8, 28, 56, 70 numbers, later.)

I'm also testing two current ideas. The first is +1, which goes something like this: say I'm working with 3 bits (8 values), so the multiplier is always 1 larger on all of them. The reason is that the final nth value is a breakout code, after which a number tells me what the repeat is. So if the example goes 1,2,5,4,3,2, that covers 3*6 = 18 bits. The max number is then 9*8*7*6*5*4*6 = 362880 (58980h), which needs about 19 bits. Obviously we can't get compression there, but if the match were one longer, it would be equal or smaller in size and could then be saved.

In order to save more bits, a very powerful, large infinite-INT library would do well (I have a homemade one, which is working).

I will say more when i have more to say.

Era

Sunday, October 21, 2007

Compression #2 - notes

Here are notes on the maximum number of combinations for a given count of on-bits in a larger area. For example, if you have 12 active (on) bits in a 32-bit area, there will be X combinations.

32:1 - 32 (5)
32:2 - 496 (9)
32:3 - 4960 (13)
32:4 - 35960 (16)
32:5 - 201376 (18)
32:6 - 906192 (20)
32:7 - 3365856 (22)
32:8 - 10518300 (24)
32:9 - 28048800 (25)
32:10 - 64512240 (26)
32:11 - 129024480 (27)
32:12 - 225792840 (28)
32:13 - 347373600 (29)
32:14 - 471435600 (29)
32:15 - 565722720 (30)
32:16 - 601080390 (30)
32:17 - 565722720 (30)
32:18 - 471435600 (29)
32:19 - 347373600 (29)
32:20 - 225792840 (28)
32:21 - 129024480 (27)
32:22 - 64512240 (26)
32:23 - 28048800 (25)
32:24 - 10518300 (24)
32:25 - 3365856 (22)
32:26 - 906192 (20)
32:27 - 201376 (18)
32:28 - 35960 (16)
32:29 - 4960 (13)
32:30 - 496 (9)
32:31 - 32 (5)
32:32 - 1 (0)

These are the values I've come across using a set of functions. 32: is the area size in bits, :x is the number of on-bits, the next number is the count of combinations, and (x) is the number of bits needed to encode it.
In order to get any compression, given that you also need to transmit the 5-bit on-bit count (for 32), you would need the number of on-bits to fall between 1-10 or 22-32. There might be other ways to use this, though; in much larger areas it might be more useful, but the counts are usually only 2-3 bits away from the middle ground.
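A minimal sketch that reproduces the table above and shows how one specific combination can be turned into an index of that size, assuming the standard combinatorial number system is the intended mechanism:

```python
import math

# Reproduce the table: combinations of k on-bits in a 32-bit area,
# and the bits needed to pick one of them.
# (c - 1).bit_length() equals ceil(log2(c)) for c >= 1.
for k in range(1, 33):
    c = math.comb(32, k)
    print(f"32:{k} - {c} ({(c - 1).bit_length()})")

def rank_combination(positions, n=32):
    """Index of one specific set of on-bit positions among all C(n,k)
    sets (the combinatorial number system)."""
    rank, k, prev = 0, len(positions), -1
    for i, p in enumerate(sorted(positions)):
        for q in range(prev + 1, p):       # positions skipped at slot i
            rank += math.comb(n - q - 1, k - i - 1)
        prev = p
    return rank

bits_on = [0, 3, 7, 12, 20]                # 5 on-bits in a 32-bit word
print(rank_combination(bits_on), "of", math.comb(32, 5), "combinations")
```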

(A full reference covering 4-64 bit areas to come.)

Era

Compression #2 - Continued....

More on the topic.

I've tried a few different combinations. One was a pair encoding, which is a unique idea. On dense data it failed to be useful, but it did successfully compress a text file to half its size.

Its format is simple: a header with the bit sizes and the number of different values we are working with. Then a 0 bit says we are using a number already mentioned, encoded at the best possible size. When a 1 bit is used, it encodes a new number and says how many times it will be used. The new number is then used right away once, and tracked from there.

Once all the different values have been referenced, the 0/1 flag is no longer needed, and values are encoded directly instead. When a value's remaining count drops to 0, it is removed from the set, so the last X values compress better than the rest.

-

Another experiment I will go into is trying to find the maximum number of values in a range that are used only once. It has a chance to succeed, assuming the header information can be encoded correctly and at a decent size (preferably half or less of all the combinations; for example, 7 bits has 128 combinations, so try to say which ones are used in 64 bits or less).

Later on I'll give specifics and other useful information regarding using multiplication to maximize saved space (bits).

An example of this is the 1-8 case (24 bits, 3 bits each, all different). You can easily encode that without much expense in 18 bits, but using multiplication (about 40k combinations) it can be compressed to about 15.3 bits. But if not all the fields are used, and you need to specify which ones you do or don't use, it becomes 16+8 = 24 bits, breaking even with the original encoding and not helping (though not making it worse either).

I'll get back with more thoughts and details later.

Era

Tuesday, October 16, 2007

Compression #2 - Continued

As a continuation of my previous post, I shall describe the exact details of the universal code I'm experimenting with for use in this compression.

The gamma coding would easily follow this pattern. Bold marks the ending bits, italic marks the 'continued' bits, and normal is the rest of the code.

0 - 0
1 - 10 - 1 already assumed at highest point
2 - 1100
3 - 1110
4 - 110100
5 - 110110

The second encoding uses more bits per bit reference, giving higher compression, but loses the first bit since it runs in reverse instead. For example, to encode 1101 you would write it as 1011 and the decoder would reverse it.

0 - 0
1 - 1100
2 - 1010
3 - 1110
4 - 1001100
5 - 1101100

Other codes are being considered, but will be tested and experimented with later. However, the tests do suggest that if I can compress the heading information (the ints used) correctly, it could compress the actual data from 1024 bytes down to 300 bytes, meaning I need to fit the header into 700 bytes or less. Real fun.

Era

Monday, October 15, 2007

Compression #2

I'm making a second attempt, using the following tests, and hopefully I'll get something to work.

1) It will use a variant of Elias gamma coding, http://en.wikipedia.org/wiki/Elias_gamma_coding , but instead of using it the way it is described on Wikipedia, it makes two changes. I specify how many bits will be used for every first true bit. (The leading 1 can be extra condensed: since the first true bit is assumed to be 1, you would never reference a 0, so the lowest numbers save the most space.)
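For reference, a minimal sketch of the standard Elias gamma code from that link (the variant described above differs, but this is the baseline it modifies):

```python
def gamma_encode(n: int) -> str:
    """Standard Elias gamma: floor(log2 n) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma coding starts at 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode(bits: str) -> int:
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:zeros + zeros + 1], 2)

for n in range(1, 8):
    print(n, gamma_encode(n))
# 1 -> 1, 2 -> 010, 3 -> 011, 4 -> 00100, ...
```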

2) For a single bit as a separator, I can reference all combinations (in a 9-13 bit size) and find the ones that are used only once. The compressor will see and decrease those numbers, and in the end reduce the size later required to compress it. The second run-through would then have to either a) give the counts of all the values, or b) guarantee all other values never go lower, which could be bad. (When it exceeds 50%, it references the ones that are not present instead.)

In cases where, no matter how it compresses, it exceeds the number of bits for the combinations (10 bits is 1024), it can fall back to referencing everything as a plain on/off bitmap, though no numbering index would be available that way.

3) It uses a variant giving the highest compression for a combination of bits (not weighted or self-modifying like gz, which is really good). Say there's a maximum of 6 values: it would determine the lowest number of bits needed to represent all of the values roughly equally. Example:

0-5 (6 combinations)

type 1: 000, 001, 010, 011, 10, 11 (4x3 = 12, plus 2x2 = 4, total 16 bits)
type 2: 00, 01, 100, 101, 110, 1110, 1111 (2x2 = 4, plus 3x3 = 9, plus 2x4 = 8, total 21 bits)

Some combinations alter which wins, but for lower counts type 1 works, and for larger ones type 2 works, giving extra savings over a larger number of bits. Once decided, it always uses that choice.

Some tests have shown promise, but the compression was done on a file that I was able to shrink by an additional 300 bytes by gzipping it. Truly random, incompressible data needs to be re-tested with it.

Monday, October 8, 2007

Compression?

I've attempted an experimental compression idea, although it doesn't do very well.

Here's an overview of my thoughts. I would take a stream of dense encrypted data, made using my OTP pad, then find all matches (and long matches) and use a structure to handle them. It goes through several rotations before it just comes up kind of dead. The way I was considering using it, although it seems to be failing, is the following format.

1 bit - 1 for encoded, 0 for skip and come back later in a different window
(length is not known ahead of time, so it's kinda... eh?)
12 bits - lookup table back-reference
2 bits - match length of 2-5 bytes
= 15 bits total.

Each 0 on the first bit increases the bit size by one on the next pass, so after 6 rotations it's eventually 21 bits, just shy of 3 bytes.

In small chunks (4k and less), about half the numbers are 1 byte, 100-200 are 2 bytes, and the rest are 3-4 bytes (1-3 overall).

In order to test its usefulness I've attempted a bit count, using its best-case compression output, to express how big the result would be when done. When it says 8000 bits original, it comes back 10,000-12,000 bits. Occasionally I get it closer to or equal to the original value, but at that point there's no compression, or it is simply far too much work (16 MB of data for a 24-26 bit data structure, hoping for 3-6 byte matches). It doesn't seem to work.
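A minimal sketch of that bit count, assuming the 15-bit token above (1 flag + 12-bit back-reference + 2-bit length), one extra skip flag per round for every byte left as a literal, and purely hypothetical match counts:

```python
def encoded_size(matched_tokens, literal_bytes, rounds):
    """Rough bit count: every matched token costs 15 bits
    (1 flag + 12-bit offset + 2-bit length); every literal byte costs
    8 bits plus one skip flag per round it was passed over."""
    return matched_tokens * 15 + literal_bytes * (8 + rounds)

# Hypothetical 1000-byte block of dense data with sparse matches over 6 rounds:
print(encoded_size(matched_tokens=40, literal_bytes=880, rounds=6),
      "bits vs", 1000 * 8, "bits original")
# With few matches, the skip flags alone push it well past the original size.
```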

I will consider different forms of compression to add to the cipher, but dense data simply means the probability of 0 and 1 is 50% each, equally. No matter how many levels it goes through, it stays about even, and when tested on very large and differently sized samples, it's very equal and nearly exactly what the value should be.

Era

Sunday, September 30, 2007

Semeir Encryption

Greetings,

I have been working on a cipher for many years, on and off. It currently holds many options and very large strength in its encryption, as well as random number generation properties that are probably ahead of their time.

I'm not quite ready to boast that it's 'the last cipher you'll ever need', but it does offer many options not available in current ciphers. You can use it for your email, pictures and videos, multi-layered chat rooms, business transactions, securing data, or generating random numbers for just about anything.

I've tested the RNG with Diehard, and was unable to test with Diehard2 (due to minor problems that I will figure out soon). In my personal tests, it accepts the largest pseudo-RNG seed input, and key size, that I know of. RNG tests suggest (though this is not fully tested) that it can produce on the order of 2^65540 bytes of random data from a single stream, and each stream can easily be expanded beyond that without having to re-seed it.

Other options: it includes common and uncommon block sizes, OTP (one-time pad), streaming, and an MD5-like hashing function that uses the encryption to generate unique 64-?? bit hexadecimal hashes for comparison.

I will leave it at that for now, and I await any and all interest in the current project.

Era Scarecrow