Pop songs compressed by the Lempel-Ziv algorithm
Lempel-Ziv is a lossless compression algorithm. PNG, GIF, and gzip are all powered by variants of Lempel-Ziv. The algorithm works by exploiting repeated sequences of characters (or bytes). Pop songs happen to have lots of these, so LZ compresses them very efficiently.
For more context, and a brief explanation of Lempel-Ziv compression, check out my essay on Measuring repetitiveness of song lyrics.
Size reduction is calculated by counting matches (i.e. the blue circles) as taking up the same space as 3 characters. For example, replacing a string of length 4 saves 1 character, replacing a length 10 string saves 7 characters etc. A typical implementation of LZ encodes a match as an 8-bit length and a 15-bit distance, so 3 bytes is pretty realistic. (A compressor optimized for pop songs would likely use fewer bits for distance, given that songs are usually shorter than 215 characters in total.)
The compressions shown here were generated by zopfli, which is similar to gzip but produces slightly more efficient compressions (at the cost of speed).
In real applications (like gzip) the output of the LZ algorithm is usually further compressed using Huffman coding - this combo is embodied in the DEFLATE algorithm/format. That step isn't illustrated here, and isn't taken into account when calculating the size reduction. (I wanted to use compressibility as a proxy for measuring lyrical repetitiveness. The LZ algorithm has a clear connection with repeated words and phrases, but Huffman coding does not - it exploits differences in character frequencies.) I used infgen to parse out the pre-Huffman'd data from DEFLATE files generated by zopfli/gzip.