Monday, October 15, 2007

Compression #2

I'm making a second attempt, using the following tests and hopefully get something to work.

1) will use a variant on the Elias gamma coding, http://en.wikipedia.org/wiki/Elias_gamma_coding , but instead of using it the way it is referenced in wikipedia, it uses 2 changes. I specify how many bits will be used per every first true bit. (the 1, can be extra condensed when the first true will assume 1, since you wouldn't reference a 0, the lowest numbers save the most space)

2) For a single bit, as a separator i can reference all combinations, (in a 9-13 bitsize) and get the ones that are only used once. The compressor will see and decrease those numbers, and in the end reduce the size later required to compress it. The second run through would actually either have to a) give all numbers of the values, or b) all other values never go lower, which could be bad. (when it exceeds 50%, it references ones that are not present instead.)

In cases where no matter how it compresses it, it exceeds the number of bits for the combinations (10 bit is 1024), then it can use an alternate to reference all on/off that way, but no numbering index would be avaliable.

3) Uses a variant highest compression for combination of bits (not weighted or self modifying like gz, which is really good). This would yeild, say there's a max of 6 numbers, it would determine the lowest number of bytes to represent all of the numbers equally. Example:

0-5 (6 combinations)

type 1: 000, 001, 010, 011, 10, 11 (4*3 =12 + 4=16 bits)
type 2: 00, 01, 100, 101, 110 , 1110, 1111 (2*2=4 + 3*3=(4+9)13 + (4*2) =(8+13) 23 bits)

Some combinations alter, but in lower combinations type 1 works, later type 2 works for larger ones, which give extra savings of a larger number of bits. When decided it always uses it.

Some tests have shown promise, but at the same time the compression was done on a file that i was able to save an additional 300 bytes, from gziping it. Truely random, uncompressable data needs to be re-tested with it.

No comments: