I came across this prize for compressing human knowledge.
The challenge is to compress the 1GB file enwik9 to below the current record of about 114MB.
This contest is motivated by the idea that "being able to compress well is closely related to intelligence", and by the goal of encouraging the development of intelligent compressors/programs as a path to AGI.
Why use Wikipedia? Because it is an extensive snapshot of human knowledge, and if you can compress the first 1GB of it better than your predecessors, your (de)compressor likely has to be smarter.
This FAQ page goes into a lot of detail about what compression is, why you need AI to do it well, what lossless compression means, etc.
I liked this snippet about what compressors are good for:
- storage
- transmission
- prediction
- understanding
- induction
- intelligence
Some resources for getting started with compression:
- Read Data Compression Explained
- If that's too dense, read chapters 1-6 of the Handbook of Data Compression
- Understand current SOTA such as PAQ
- Learn information theory, ML, and probability and statistics, since most modern compression algorithms are built on arithmetic coding driven by estimated probabilistic predictions (see the first sketch after this list)
- Start by implementing simple compression schemes like run-length encoding, LZ77, and LZ78 (a toy RLE example follows the list)
- Move on to context tree weighting (CTW); the last sketch below shows the estimator CTW is built on
- Look at past winners (2023 and 2021), and this page of Data Compression Programs
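To make the prediction-to-compression link concrete, here is a minimal Python sketch (not a real coder): it sums the ideal code length, -log2(p), that an arithmetic coder would approach when driven by a simple adaptive order-0 byte model. The Laplace-smoothed model and the example string are my own illustrative choices, not anything from the contest.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Bits an arithmetic coder would approach when driven by a simple
    adaptive order-0 model (Laplace-smoothed byte counts).
    Illustrative sketch only; a real coder adds a few bits of overhead."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for symbol in data:
        # probability the model assigns to this byte *before* seeing it
        p = (counts[symbol] + 1) / (seen + 256)
        total_bits += -math.log2(p)  # ideal code length for this symbol
        counts[symbol] += 1          # the decoder updates its model identically
        seen += 1
    return total_bits

text = b"the better the prediction, the fewer the bits " * 10
print(f"~{ideal_code_length_bits(text) / 8:.0f} bytes vs {len(text)} raw bytes")
```

The better the model's next-symbol predictions, the smaller that sum gets, which is exactly why stronger predictors (and, the contest argues, smarter programs) compress better.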
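For the "start simple" item, a toy run-length encoder/decoder shows the basic shape of a lossless scheme; LZ77/LZ78 generalize the idea by referencing earlier substrings instead of single repeated bytes. This is just a sketch, not a serious format.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encoding: collapse runs of a repeated byte into (count, byte) pairs."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][1] == b and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    return bytes(b for count, b in runs for _ in range(count))

sample = b"aaaabbbbbbbbcd"
encoded = rle_encode(sample)
assert rle_decode(encoded) == sample
print(encoded)  # [(4, 97), (8, 98), (1, 99), (1, 100)]
```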
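For context tree weighting, a full implementation is too much for a note like this, but the Krichevsky-Trofimov estimator that CTW runs at each node of its context tree is tiny. This sketch only shows how that estimator sequentially assigns probabilities to a bit string (the example bits are arbitrary); the tree and the weighting/mixing step are omitted.

```python
import math

def kt_code_length_bits(bits: list[int]) -> float:
    """Ideal code length the Krichevsky-Trofimov estimator assigns to a bit
    sequence: it predicts P(next bit = 1) = (ones + 0.5) / (zeros + ones + 1).
    CTW runs one such estimator per context-tree node and mixes their
    predictions; the mixing itself is not shown here."""
    zeros = ones = 0
    total = 0.0
    for bit in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)
        p = p_one if bit == 1 else 1.0 - p_one
        total += -math.log2(p)
        if bit == 1:
            ones += 1
        else:
            zeros += 1
    return total

# Well below 12 bits for this biased 12-bit string.
print(kt_code_length_bits([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))
```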