Better Compression On Modern Systems?

Discussion in 'Software' started by HarryPotter, Sep 16, 2023.

  1. HarryPotter

    HarryPotter MajorGeek

    I'm working on some better compression techniques and am doing pretty well at it: so far, I'm doing much better than my competition but still significantly short of my goals, as a recent bug fix cost me a lot of ground. :( If I reach my goal, is there any call on modern computers for a much better compression technique? Unfortunately, my main reason for this is to compress floppies and Zip disks, although I would be better off using flash drives and my hard drives. Any input would be appreciated.
     
  2. Sir Humphrey Appleby

    Sir Humphrey Appleby Private E-2

    I would say there is a call for better compression techniques, but "better" depends on the use case. Compression ratio is just one factor, albeit a very important one. Compression/decompression time, the ability to stream data into an archive, and random access to its contents are other factors in deciding which compression technique to use. A permissive license (MIT/BSD) would also be beneficial.

    Zstandard and LZFSE, from Facebook and Apple respectively, were both released in 2015. Zstandard has been widely adopted.
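
    For reference, Zstandard's one-shot C API is about as simple as it gets. A minimal sketch using the documented simple API (error handling kept short; link with -lzstd):

        /* One-shot compression with Zstandard's simple API. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <zstd.h>

        int main(void)
        {
            const char *src = "example example example example";
            size_t srcSize = strlen(src) + 1;

            size_t bound = ZSTD_compressBound(srcSize);  /* worst-case dst size */
            void *dst = malloc(bound);
            if (!dst) return 1;

            size_t written = ZSTD_compress(dst, bound, src, srcSize, 3 /* level */);
            if (ZSTD_isError(written)) {
                fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(written));
                free(dst);
                return 1;
            }
            printf("%zu -> %zu bytes\n", srcSize, written);
            free(dst);
            return 0;
        }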
     
  3. HarryPotter

    HarryPotter MajorGeek

    Thank you. :) My main compression techniques are going to be both slower and larger than my competition's, as I'm doing a lot more than they are. I have other, lesser techniques, though, and if I'm right about a theory concerning LZ77 optimizations, they might even be slightly faster in some cases. BTW, some of the lesser techniques are planned to go open source early, so I'm not too nervous about revealing some of their ideas. If I give you a few of my ideas here, would you give me tips on how to improve them?
     
  4. Sir Humphrey Appleby

    Sir Humphrey Appleby Private E-2

    I would be interested in your ideas, but I'm far from an expert on such things. The compression libraries I looked at for my software were primarily for high-throughput compression/decompression of streamed text-based data (such as JSON). In the end, I decided other optimisations were more beneficial than compression for my particular needs.
     
  5. HarryPotter

    HarryPotter MajorGeek

    I'm also working on text compression for 8-bit systems, for text adventures where each string can't refer to previous strings. Right now, it sucks. :( The compression ratio is poor, and you have to compress the strings yourself. I'm working on a better version. As for my techniques, some of them follow:

    * Adaptive Huffman
    * LZ77 variants
    * Last16, where a repeat of a recent LZ77 block is shortened to the number of blocks ago where it occurred (see the first sketch after this list)
    * Placement Offset Basic, where some literals that occurred recently are shortened to an offset to the last occurrence
    * A modification of Elias coding
    * A BPE-like technique, where a recent two-byte LZ77 occurrence is shortened further
    * On some techniques, I use a method to shorten numbers whose range is not a power of 2 (the second sketch below shows the textbook version of this)
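
    To make the Last16 idea concrete, here's roughly how it could work, treating a "block" as an LZ77 match offset: keep the 16 most recent offsets in a ring, and when a match reuses one of them, emit a 4-bit index instead of a full offset. This is a simplified sketch with made-up names and sizes, not my actual code:

        /* Sketch of a "Last16" offset cache: if an LZ77 match reuses one
           of the 16 most recent offsets, it can be coded as a 4-bit index
           instead of a full offset. Names and sizes are illustrative. */
        #include <stdio.h>
        #include <stdint.h>

        #define LAST_N 16

        static uint32_t recent[LAST_N];  /* ring buffer of recent offsets */
        static int head = 0, count = 0;

        /* How many matches ago this offset was last used (0 = most
           recent), or -1 if it is not in the cache. */
        static int find_recent(uint32_t offset)
        {
            for (int i = 0; i < count; i++)
                if (recent[(head - 1 - i + LAST_N) % LAST_N] == offset)
                    return i;
            return -1;
        }

        static void remember(uint32_t offset)
        {
            recent[head] = offset;
            head = (head + 1) % LAST_N;
            if (count < LAST_N) count++;
        }

        int main(void)
        {
            /* Hypothetical offsets of successive LZ77 matches. */
            uint32_t offsets[] = { 300, 1250, 300, 77, 1250 };
            for (int i = 0; i < 5; i++) {
                int idx = find_recent(offsets[i]);
                if (idx >= 0)
                    printf("offset %u: repeat, emit 4-bit index %d\n",
                           (unsigned)offsets[i], idx);
                else
                    printf("offset %u: new, emit full offset\n",
                           (unsigned)offsets[i]);
                remember(offsets[i]);
            }
            return 0;
        }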
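
    And for the last item: the textbook method for values whose range n is not a power of two is truncated binary encoding, where the first 2^(k+1) - n codes get k bits and the rest get k + 1 bits. A sketch of the same idea (not necessarily exactly what I'm doing):

        /* Truncated binary encoding for a value v in [0, n), where n need
           not be a power of two: with k = floor(log2(n)) and
           u = 2^(k+1) - n, the first u values use k bits and the rest use
           k + 1 bits (coded as v + u). Bits go to stdout as '0'/'1'. */
        #include <stdio.h>
        #include <stdint.h>

        static void emit_bits(uint32_t value, int nbits)
        {
            for (int i = nbits - 1; i >= 0; i--)
                putchar(((value >> i) & 1) ? '1' : '0');
        }

        static void truncated_binary(uint32_t v, uint32_t n)
        {
            int k = 0;
            while ((1u << (k + 1)) <= n)   /* k = floor(log2(n)) */
                k++;
            uint32_t u = (1u << (k + 1)) - n;  /* number of short codes */
            if (v < u)
                emit_bits(v, k);
            else
                emit_bits(v + u, k + 1);
        }

        int main(void)
        {
            /* n = 5: 0..2 -> 00, 01, 10; 3 -> 110; 4 -> 111. */
            for (uint32_t v = 0; v < 5; v++) {
                printf("%u -> ", (unsigned)v);
                truncated_binary(v, 5);
                putchar('\n');
            }
            return 0;
        }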

    I'm sure there are others, but they escape me at the moment.

    BTW, I'm using MTF (move-to-front) on some of my lesser compression techniques, but it resulted in poor yields.
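
    For anyone unfamiliar, MTF keeps a table of byte values ordered by how recently they were seen and replaces each byte with its current position in the table, so recently repeated bytes become small numbers. A quick sketch of the transform:

        /* Move-to-front transform: each byte is replaced by its current
           index in a recency-ordered table, then moved to the front, so
           recently repeated bytes map to small numbers. Sketch only. */
        #include <stdio.h>
        #include <string.h>

        static void mtf_encode(const unsigned char *src,
                               unsigned char *dst, size_t len)
        {
            unsigned char table[256];
            for (int i = 0; i < 256; i++)
                table[i] = (unsigned char)i;

            for (size_t i = 0; i < len; i++) {
                unsigned char c = src[i];
                int pos = 0;
                while (table[pos] != c)
                    pos++;
                dst[i] = (unsigned char)pos;
                memmove(table + 1, table, pos);  /* shift entries back */
                table[0] = c;
            }
        }

        int main(void)
        {
            const unsigned char in[] = "banana";
            unsigned char out[sizeof in];
            size_t len = strlen((const char *)in);
            mtf_encode(in, out, len);
            for (size_t i = 0; i < len; i++)
                printf("%d ", out[i]);  /* repeats come out small */
            putchar('\n');
            return 0;
        }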

    BTW, should I reveal what I have regarding text compression? It's for the cc65 C compiler, but it might be readable enough to be adaptable.
     
  6. Sir Humphrey Appleby

    Sir Humphrey Appleby Private E-2

    You almost had me convinced to try out some of these techniques, but I've got too many other projects to finish. There are likely more appropriate forums with a larger developer community that would be interested in the source and in discussing the techniques in more detail.

    The only compression algorithm I recall implementing myself is the one used in DNS, which uses a placement offset, as records often refer multiple times to the same domain name or sub-domain. Many years ago, I did some language analysis to determine the most common two- and three-byte combinations in English. This was used for a spam filter implementation. At the time, a lot of spam just contained random characters to make the content unique, and this detected spam quite well. That could probably have served as the basis of a simple compression algorithm.
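
    If memory serves (it's RFC 1035's message compression), a name is stored as length-prefixed labels, and a later occurrence is replaced by a two-byte pointer whose top two bits are set and whose low 14 bits give the offset of the earlier copy in the message. A rough sketch of the encoding side:

        /* Sketch of DNS-style name compression (RFC 1035 section 4.1.4):
           a name is a sequence of length-prefixed labels ending in a zero
           byte; a repeated name is replaced by a 2-byte pointer with the
           top two bits set and the low 14 bits holding the offset of the
           earlier copy within the message. */
        #include <stdio.h>
        #include <stdint.h>
        #include <string.h>

        /* Write a compression pointer; offset must fit in 14 bits. */
        static void write_pointer(uint8_t *out, uint16_t offset)
        {
            out[0] = (uint8_t)(0xC0 | (offset >> 8));
            out[1] = (uint8_t)(offset & 0xFF);
        }

        int main(void)
        {
            uint8_t msg[64];
            int n = 0;

            /* First occurrence of "example.com" at offset 0, stored in
               full as \x07example\x03com\x00 (13 bytes). */
            int name_off = n;
            const char *labels[] = { "example", "com" };
            for (int i = 0; i < 2; i++) {
                size_t len = strlen(labels[i]);
                msg[n++] = (uint8_t)len;
                memcpy(msg + n, labels[i], len);
                n += (int)len;
            }
            msg[n++] = 0;  /* root label terminates the name */

            /* A later record naming "example.com" takes only 2 bytes. */
            write_pointer(msg + n, (uint16_t)name_off);
            n += 2;

            printf("full name: 13 bytes, pointer: 2 bytes, total: %d\n", n);
            return 0;
        }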
     
  7. HarryPotter

    HarryPotter MajorGeek

    I made some significant progress with file compression. I used to have some variants of the Stac technique, but they were doing far worse than even Deflate. Then, I forget exactly how, but I gained a lot of ground there, to the point that they were doing better than my main technique. Moreover, the new techniques were also simpler and smaller than what I had. I found some bugs to fix, but the fixes didn't cost me much ground. Should I keep updating you about my compression techniques?
     
  8. HarryPotter

    HarryPotter MajorGeek

    Good news! I gained over 1% on several Stac variants! :D Right now, the results look too good to be true, though, so I suspect a bug. Some of my compression techniques are going to be open source, so if I succeed, I can share some of my secrets. :)
     
  9. Sir Humphrey Appleby

    Sir Humphrey Appleby Private E-2

    When you have something ready, I'd like to see some sample data and comparisons of compression ratio and compress/decompress time.
     
  10. HarryPotter

    HarryPotter MajorGeek

    I was wrong: there was a bug. :( I'd better stop updating you for now.
     
