Free Samples

Ideas about zLib

my name is Klaus Brandstätter, I was for 40 years CEO, chief software architect and senior programmer of the mid-size German software company HOB. HOB had around 60 software developers. It products use zLib and also other compression routines.

HOB has an implementation of V.42bis which is Lempel-Ziv-Welch. V.42bis needs less resources compared to Lempel-Ziv, but compresses not as good and has also implementations of MPPC which is used in Windows Terminal Server products.

HOB, for testing, has run-length compression and also a dummy compression routine.As HOB has many different compression- and de-compression routines, it uses its own header files.

These are hob-cd-file-2.h for the so-called file-oriented compression; FLUSH is not available. And hob-cd-record-2.h for the so-called record-oriented compression; FLUSH may be used. HOB develops big projects with many functions. So HOB tries to keep the number of source files small, even if they get bigger.

All header and source files of zLib have been put to a single file, xs-cdx-zlib-3.pre. zLib version 1.2.11 is used.

HOB PRECOMP is used, similar to the C/C++ precompiler. HOB PRECOMP gets active thru the control-character “percent”, %, and HOB PRECOMP becames active only after a present-sign. The C/C++ precompiler because active after the control-character “hash”, #, and may become active everywhere.

There is GEN-CDX-ZLIB-3.bat to generate the two source file from the .pre-file. The original zLib has single-linked-lists for every single byte in the dictionary. HOB changed zLib so that there are single-linked-lists only for every second byte in the dictionary.

This is enough since there bytes are the shortest sequence for matches. De-compression stays unchanged. These changes aim to reduce the number of CPU-cycles needed for compression.

As side-effect the size of the memory needed for compression is reduced. Compression-ratio nearly stays the same. The changes are in the file deflate.c, in the function deflate_slow().

In the original zLib-source you can find:

if (s->match_length <= s->max_insert_length && s->lookahead >= MIN_MATCH) {
s->match_length–; /* string at strstart already in table */
 do {
s->strstart++;
INSERT_STRING(s, s->strstart, hash_head);
/* strstart never exceeds WSIZE-MAX_MATCH, so there are
* always MIN_MATCH bytes ahead.
*/
    } while (–s->match_length != 0);
s->strstart++;

In the corresponding modified HOB zLib-source you instead find:

/* Insert in hash table all strings up to the end of the match.
* strstart-1 and strstart are already inserted. If there is not
* enough lookahead, the last two strings are not inserted in
* the hash table.
*/
s->lookahead -= s->prev_length1;
s->prev_length -= 2;

do {
  if (++s->strstart <= max_insert) {
  // INSERT_STRING(s, s->strstart, hash_head);
  /* INSERT_STRING() start -PRECOMP- */
  (s->ins_h = (((s->ins_h)<<s->hash_shift)
  ^ (s->window[(s->strstart) + (31)])) & s->hash_mask);
  hash_head = s->prev[(s->strstart) & s->w_mask] = s->head[s->ins_h];
  s->head[s->ins_h] = (Pos)(s->strstart);
  /* INSERT_STRING() end -PRECOMP- */
  #ifdef TRACEHL1
      printf( “xs-cdx-zlib-3.pre-l%05d deflate_slow() max_insert=%p.\n, __LINE__, max_insert );
      printf( “xs-cdx-zlib-3.pre-l%05d deflate_slow() -2- s->strstart=%p s->prev[(s->strstart) & s->w_mask]=%p.\n, __LINE__, s->strstart, s->prev[(s->strstart) & s->w_mask] );
  #endif
  }
} while (–s->prev_length != 0);
s->match_available = 0;
s->match_length = MIN_MATCH-1;
s->strstart++;

HOB, for testing of the different compression-routines, has developed programs to test the compression. One of these is xbt-cdf-comp-3.cpp with the Windows-Visual-Sudio-project xbt-cdf-comp-3-zlib-3.sln / .vcxproj.

xbt-cdf-comp-3 opens a file, in pieces reads this file, compresses a chunk and then de-compresses the result. The original content and the output of the de-compression are compared.

When these are equal, xbt-cdf-comp-3 proceeds. Otherwise it gives an error-message and stops. xbt-cdf-comp-3-zlib-3 works fine with most input-files, but with some input-files it shows that the compression in HOB’s zLib-compression does not work well.

I have spent some hours to debug xbt-cdf-comp-3-zlib-3.exe. zLib-compression contains macros, and in some of these macros there is “goto”.

So it is very hard to debug zLib because you cannot single-step in these macros nor can you set breakpoints, at least not with Microsoft-Visual-Studio that I use. So I gave it up.

zLib gives problems when the output area is less than six bytes in size. This should be changed.

In the 1990ies I worked with LZHUF, see https://github.com/topics/lzhuf. LZHUF is an implementation of Lempel-Ziv-Hufman, just like zLib.

I modified LZHUF so that only every second character in the dictionary is checked, the same what I later did with zLib.

This results in less CPU-cycles needed for compression, and also less memory is needed. To my surprise, the modified LZHUF compressed slidely better. zLib uses singly-liked-lists to access the dictionary, LZHUF uses binary trees.

Binary trees need less CPU-cycles than singly-liked-lists, so zLib should be modified also to use binary trees.

Binary trees may get un-balanced. Against that is the usage of AVL-trees.

When zLib would use AVL-trees, the number of CPU-cycles to access the trees would get smaller, but extra instructions are needed for the AVL.

So I cannot predict if zLib would get faster thru AVL-trees, compared to un-balanced trees. This has to be tried out.

About Author

Klaus Brandstätter

Author/Writer

The Author, Klaus Brandstätter, was born in 1954 in Nuremberg, Germany.

During his studies, Klaus Brandstätter worked with many different computer systems and many different software development projects using Fortran and Assembler programming languages.

Klaus worked in the young computer department. After finishing University, he began his career with HOB, where he worked from 1981 till 2018 as CEO and Chief Software Architect at HOB. The first HOB project was hardware terminals for IBM Mainframes and software managing these terminals. In the 1980s and 1990s, HOB concentrated on products for IBM Mainframes.

Later, HOB developed products for Windows Terminal Servers and Secure Remote Access.
Klaus Brandstätter himself worked with the software developers and developed software in the programming languages Assembler, C/C++, and Java.

Get a copy