5.2. Types of Compression
There are different types of compression:
- Run-length compression
Run-length compression, for example, changes the sequence AAAAA to 5A.
Escape characters are needed to distinguish between plain text and compressed parts. Run-length compression is simple, but it pays off only for byte sequences that contain long runs of identical values.
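The idea can be sketched in a few lines of Python. This is only an illustration under assumed conventions, not a standard format: the escape byte `ESC`, the three-byte layout (escape, count, value), and the threshold `min_run` are all choices made for this example. The sequence AAAAA therefore becomes escape, 5, A rather than the bare 5A shown above.

```python
ESC = 0x1B  # assumed escape byte; any rarely used value would do


def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace runs of at least min_run equal bytes with ESC, count, value."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        # Measure the run, capped at 255 so the count fits into one byte.
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if run >= min_run or byte == ESC:
            # A literal ESC is always written in escaped form,
            # so the decoder never confuses it with a marker.
            out += bytes([ESC, run, byte])
        else:
            # Short runs are cheaper to copy unchanged.
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            count, byte = data[i + 1], data[i + 2]
            out += bytes([byte]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

Text without long runs, such as `b"ABC"`, passes through unchanged, which is why this method helps only for a few kinds of data.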
- Frequency-dependent coding of single characters
In an English text document, the characters ‘e,’ ‘n,’ ‘r,’ and ‘s’ occur more often than other characters such as ‘q’ or ‘z’. The space character is also very frequent. Normally, every character is encoded in 8 bits, giving 256 possible representations. With frequency-dependent coding, a frequent character such as ‘e’ is encoded in fewer bits, for example, 5 or 6. So that all 256 possible values can still be encoded, the less frequently used characters are encoded in more than 8 bits. Because the short codes are used far more often than the long ones, the total number of bits needed for the entire document is reduced.
Shannon-Fano coding is one well-known method of this kind; Huffman coding is another.
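As a rough illustration of frequency-dependent coding, the following Python sketch builds a Shannon-Fano style code table: the symbols are sorted by frequency, the list is split into two parts of nearly equal total frequency, one part receives a 0 bit and the other a 1 bit, and the procedure repeats within each part. The function name `shannon_fano` and the tie-breaking at the split point are assumptions made for this example.

```python
from collections import Counter


def shannon_fano(text: str) -> dict[str, str]:
    """Return a table mapping each character to a string of '0'/'1' bits."""
    # Symbols with their counts, most frequent first.
    freq = Counter(text).most_common()
    codes = {sym: "" for sym, _ in freq}

    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        codes[freq[0][0]] = "0"
        return codes

    def split(group: list[tuple[str, int]]) -> None:
        if len(group) < 2:
            return
        total = sum(count for _, count in group)
        # Find the split point that balances both halves best.
        running = 0
        best_idx, best_diff = 1, total
        for idx in range(1, len(group)):
            running += group[idx - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_idx, best_diff = idx, diff
        left, right = group[:best_idx], group[best_idx:]
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        split(left)
        split(right)

    split(freq)
    return codes


codes = shannon_fano("aaaabbc")
bits = "".join(codes[ch] for ch in "aaaabbc")
print(codes, len(bits))
```

For the input `"aaaabbc"`, the frequent ‘a’ receives a 1-bit code and the rarer ‘b’ and ‘c’ receive 2-bit codes, so the seven characters need 10 bits instead of 56. No code is a prefix of another, which lets a decoder read the bit stream without separators.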