Monday, November 5, 2007

Compression #3 - Continued.. Part B

Further notes. This will take a while, but it's somewhere to start.
I've compiled a list (specifics later) of how many bits it takes to handle a given number of values and then to encode that number. I've seen it jump from 1 bit to 12 bits, or even 180 bits. The larger it gets, the more data I need to represent it. On top of that, it doesn't include the counts of how many times each value is used. If that part fails, the compression rate drops badly, and the header information is a loss as well.

I'm also working through the following thoughts. The exact way to encode for the smallest size in all cases is still open for ideas, and it requires a lot of math, or large INTs that need some kind of big-integer library. I will post mine later; there are others available for free, but this is one I know works because I built it, and it's still being built.

OK, say the value is 5, used in a 4-bit (16-value) area. Simply, you add all the other values on top, and when it gets too large to represent the number, you subtract it. (For 8-bit, the counts were 8, 28, 56, 70, etc.)
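As a rough illustration: the 8-bit counts above (8, 28, 56, 70) are the binomial coefficients C(8,1) through C(8,4), i.e. how many bytes have exactly that many bits on. Below is a minimal Python sketch of ranking a value inside its on-bit class via the combinatorial number system; the function name is a placeholder, not code from the project.

```python
from math import comb

def rank_in_onbit_class(value, width):
    """Rank `value` among all `width`-bit numbers with the same number
    of on-bits (combinatorial number system): for set bit positions
    p1 < p2 < ... < pk, the rank is C(p1,1) + C(p2,2) + ... + C(pk,k)."""
    positions = [i for i in range(width) if (value >> i) & 1]
    return sum(comb(p, k) for k, p in enumerate(positions, start=1))

# Class sizes for 8-bit values: how many bytes have 1..4 bits on.
print([comb(8, k) for k in range(1, 5)])      # [8, 28, 56, 70]

# Rank of 0b00010110 (bits 1, 2, 4 set) inside the 56-entry "3 bits on" class.
print(rank_in_onbit_class(0b00010110, 8))     # 6
```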

I've gotten numbers in the 12k (15-bit) range to start, while representing 5-6 data parts (20-24 bits). How many distinct values were involved, 1 through 6, determined how it would be stored.

If you can think of a better way to store the number, please tell me and I'll tell you whether it will work. If I get this working, the compression idea goes GPL :)

The 1 case wouldn't include any further information, since it's the same number 6 times.
The 2 case would simply be 2x2x2x2x2x2 = 64, which is within reach.

The 3 case: 3x3x3x3x3x3 = 729.

The 4 case: 10 (4+6, covering the one-value and two-value reuses) x 6 (the two repeat inserts) x 2 (worst case, the order of the repeats) x 4x3x2x1 = 2880.

The 5 case: the best way to represent this is 5x5 (which value is duplicated and the offset where to put it), then 5x4x3x2x1, for 3000 total.

The 6 case: simply, each falls off as they go: 6x5x4x3x2x1 = 720. (The sketch below tallies these counts.)
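A quick sanity check in Python, tallying the worst-case counts from the six cases above to see how many bits a combined index needs; the figures are the ones from the list, and the cost of storing which distinct values occur (and how often) is left out here, just as it is above.

```python
from math import ceil, log2

# Worst-case arrangement counts per number of distinct values (1..6),
# copied from the list above for a group of six 4-bit parts.
case_counts = {1: 1, 2: 64, 3: 729, 4: 2880, 5: 3000, 6: 720}

total = sum(case_counts.values())
print(total)              # 7394 arrangements across all six cases
print(ceil(log2(total)))  # 13 bits for a combined index

# The raw block is 6 x 4 bits = 24 bits (a ceiling of 16,777,216), so the
# index alone fits easily; the header/value information is the part that
# still has to stay under that ceiling.
```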

OK, so we compare against the worst case. If it's 24-bit, the total can't exceed 16,777,216 (2^24); if it does, hopefully it's not higher than roughly 33 million (2^25), since that's 25+ bits.

Another way to help get the proper size: take your max value for representing on-bits in their combinations, then divide the area (roughly 16.7 million) by it. Then try to keep all your other represented values below that value; if you can do it, you are guaranteed compression :)
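One way to read that rule, sketched below with made-up names: divide the 24-bit area by the largest case count to get a budget of how many same-sized cases would fit; if every case count stays at or below that largest one and the number of cases stays within the budget, the combined index can't spill past 24 bits.

```python
AREA = 2 ** 24               # the 24-bit block being replaced: 16,777,216

max_count = 3000             # largest case count from the list (5 distinct values)
budget = AREA // max_count   # how many same-sized cases the area could hold
print(budget)                # 5592

# Six cases, each no bigger than max_count, and 6 <= budget, so the total
# can never exceed 6 * 3000 = 18,000 -- comfortably inside the 24-bit area.
counts = (1, 64, 729, 2880, 3000, 720)
print(len(counts) <= budget and all(c <= max_count for c in counts))  # True
```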

I've also tried dropping the lowest case (1) and leaving an extra value in; if that case is hit, we manually spell out the 4-bit value, which raises the size from 12 bits to about 17-18 bits, but it's still less than 24 bits. The likelihood of hitting it, though, is not high (we're talking a 3-byte combination here). It did lower the main hit numbers quite a bit, but not nearly enough.
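A rough sketch of how that escape could look, assuming the dropped case is the "same nibble six times" one and using made-up sizes (a 13-bit main code plus a 4-bit literal) rather than the real table:

```python
ESCAPE = 0x1FFF                  # reserved 13-bit escape code (made-up value)

def encode_block(nibbles, table):
    """Encode six 4-bit values.  `table` maps a tuple of nibbles to its
    compact index (the normal ~13-bit path).  The dropped case -- all six
    nibbles equal -- is not in the table, so it takes the escape code plus
    the single repeated nibble: about 17 bits, still under the raw 24."""
    key = tuple(nibbles)
    if key in table:
        return 13, (table[key],)              # normal path
    assert len(set(nibbles)) == 1             # only the dropped case escapes
    return 13 + 4, (ESCAPE, nibbles[0])       # escape path, ~17 bits

# Example with a toy one-entry table (purely illustrative):
table = {(1, 2, 3, 4, 5, 6): 0}
print(encode_block([1, 2, 3, 4, 5, 6], table))   # (13, (0,))
print(encode_block([7, 7, 7, 7, 7, 7], table))   # (17, (8191, 7))
```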

I am increasing the values, but verifying them can be very hard, considering the math involved and the need to manually build all the possible combinations for the worst-case scenario (which is the only one I compare against).

A note: if it's more likely to land at least 1 bit under the original size than at least 1 bit over it, it's a success, and it will be built and tested to see if it's worth it. But don't expect huge compression on small files; expect a decent compression over a large area instead, and even that can fail.

Era
