Sunday, October 21, 2007

Compression #2 - Continued....

More on the topic.

I've tried a few different combinations. I've tried a pair encoding, which is a unique idea. In dense code it failed to be useful, but did successfully compress a text file to half it's size.

It's format is simple, a header with bitsizes, and number of different numbers we are working with. Then it encodes with a 0 to say we are using a number already mentioned. Encoded to best size. When a 1 is used, it encodes a new number, and says how many times it is used. It is then used right away once, and noted as it goes.

If you have already refferenced all different types, the 0/1 is no longer used, and is instead encoded with it's current values instead. When those values reduce to 0 left, they are removed, so the last x types will compress better than the rest.

-

Another experiment i will go into, trying to find the max number of values you can encode in a range while only being used once. It has a chance to succeed, assuming the header information can be encoded right, and a decent size (preferably, half or less than all the combinations. Example, 7bit has 128 combinations, and try to say if they are used with 64 bits or less.)

I'll later on give specifics on other useful information reguarding multiplication to maximize on saved space (bits)

And example of this, is 1-8 conbimations (24 bits, 3 each, all different). Using this, you can easily encode it without a lot of expense to 18 bits, but using multiplication (40k combinations) can be compressed to 15.6 bits. but if all the fields aren't used, and you need to specify which ones you do/don't use. 16+8=24, breaking even with the original encoding and not helping. (not making it worse though either.)

I'll get back with more thoughts and details later.

Era

No comments: