Bzip2 in Rust - Basic infrastructure and CRC32 computation

- bzip2, rust

I have started a little experiment in porting bits of the widely-used bzip2/bzlib to Rust. I hope this can serve to refresh bzip2, which had its last release in 2010 and has been nominally unmaintained for years.

I hope to make several posts detailing how this port is done. In this post, I'll talk about setting up a Rust infrastructure for bzip2 and my experiments in replacing the C code that does CRC32 computations.

Super-quick summary of how librsvg was ported to Rust

  • Add the necessary autotools infrastructure to build a Rust sub-library that gets linked into the main public library.

  • Port bit by bit to Rust. Add unit tests as appropriate. Refactor endlessly.

  • MAINTAIN THE PUBLIC API/ABI AT ALL COSTS so callers don't notice that the library is being rewritten under their feet.

I have no idea of how bzip2 works internally, but I do know how to maintain ABIs, so let's get started.

Bzip2's source tree

As a very small project that just builds a library and couple of executables, bzip2 was structured with all the source files directly under a toplevel directory.

The only tests in there are three reference files that get compressed, then uncompressed, and then compared to the original ones.

As the rustification proceeds, I'll move the files around to better places. The scheme from librsvg worked well in this respect, so I'll probably be copying many of the techniques and organization from there.

Deciding what to port first

I looked a bit at the bzip2 sources, and the code to do CRC32 computations seemed isolated enough from the rest of the code to port easily.

The CRC32 code was arranged like this. First, a lookup table in crc32table.c:

UInt32 BZ2_crc32Table[256] = {
   0x00000000L, 0x04c11db7L, 0x09823b6eL, 0x0d4326d9L,
   0x130476dcL, 0x17c56b6bL, 0x1a864db2L, 0x1e475005L,
   ...
}

And then, three macros in bzlib_private.h which make up all the CRC32 code in the library:

extern UInt32 BZ2_crc32Table[256];

#define BZ_INITIALISE_CRC(crcVar)              \
{                                              \
   crcVar = 0xffffffffL;                       \
}

#define BZ_FINALISE_CRC(crcVar)                \
{                                              \
   crcVar = ~(crcVar);                         \
}

#define BZ_UPDATE_CRC(crcVar,cha)              \
{                                              \
   crcVar = (crcVar << 8) ^                    \
            BZ2_crc32Table[(crcVar >> 24) ^    \
                           ((UChar)cha)];      \
}

Initially I wanted to just remove this code and replace it with one of the existing Rust crates to do CRC32 computations, but first I needed to know which variant of CRC32 this is.

Preparing the CRC32 port so it will not break

I needed to set up tests for the CRC32 code so the replacement code would compute exactly the same values as the original:

Then I needed a test that computed the CRC32 values of several strings, so I could capture the results and make them part of the test.

static const UChar buf1[] = "";
static const UChar buf2[] = " ";
static const UChar buf3[] = "hello world";
static const UChar buf4[] = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, ";

int
main (void)
{
    printf ("buf1: %x\n", crc32_buffer(buf1, strlen(buf1)));
    printf ("buf2: %x\n", crc32_buffer(buf2, strlen(buf2)));
    printf ("buf3: %x\n", crc32_buffer(buf3, strlen(buf3)));
    printf ("buf4: %x\n", crc32_buffer(buf4, strlen(buf4)));
    // ...
}

This computes the CRC32 values of some strings using the original algorithm, and prints their results. Then I could cut&paste those results, and turn the printf into assert — and that gives me a test.

int
main (void)
{
    assert (crc32_buffer (buf1, strlen (buf1)) == 0x00000000);
    assert (crc32_buffer (buf2, strlen (buf2)) == 0x29d4f6ab);
    assert (crc32_buffer (buf3, strlen (buf3)) == 0x44f71378);
    assert (crc32_buffer (buf4, strlen (buf4)) == 0xd31de6c9);
    // ...
}

Setting up a Rust infrastructure for bzip2

Two things made this reasonably easy:

I.e. "copy and paste from somewhere that I know works well". Wonderful!

This is the commit that adds a Rust infrastructure for bzip2. It does the following:

  1. Create a Cargo workspace (a Cargo.toml in the toplevel) with a single member, a bzlib_rust directory where the Rustified parts of the code will live.
  2. Create bzlib_rust/Cargo.toml and bzlib_rust/src for the Rust sources. This will generate a staticlib for libbzlib_rust.a, that can be linked into the main libbz2.la.
  3. Puts in automake hooks so that make clean, make check, etc. all do what you expect for the Rust part.

As a side benefit, librsvg's Autotools+Rust infrastructure already handled things like cross-compilation correctly, so I have high hopes that this will be good enough for bzip2.

Can I use a Rust crate for CRC32?

There are many Rust crates to do CRC computations. I was hoping especially to be able to use crc32fast, which is SIMD-accelerated.

I wrote a Rust version of the "CRC me a buffer" test from above to see if crc32fast produced the same values as the C code, and of course it didn't. Eventually, after asking on Mastodon, Kepstin figured out what variant of CRC32 is being used in the original code.

It turns out that this is directly doable in Rust with the git version of the crc crate. This crate lets one configure the CRC32 polynomial and the mode of computation; there are many variants of CRC32 and I wasn't fully aware of them.

The magic incantation is this:

let mut digest = crc32::Digest::new_custom(crc32::IEEE, !0u32, !0u32, crc::CalcType::Normal);

With that, the Rust test produces the same values as the C code. Yay!

But it can't be that easy

Bzlib stores its internal state in the EState struct, defined in bzlib_private.h.

That struct stores several running CRC32 computations, and the state for each one of those is a single UInt32 value. However, I cannot just replace those struct fields with something that comes from Rust, since the C code does not know the size of a crc32::Digest from Rust.

The normal way to do this (say, like in librsvg) would be to turn UInt32 some_crc into void *some_crc and heap-allocate that on the Rust side, with whatever size it needs.

However!

It turns out that bzlib lets the caller define a custom allocator so that bzlib doesn't use plain malloc() by default.

Rust lets one define a global, custom allocator. However, bzlib's concept of a custom allocator includes a bit of context:

typedef struct {
    // ...

    void *(*bzalloc)(void *opaque, int n, int m);
    void (*bzfree)(void *opaque, void *ptr);
    void *opaque;
} bz_stream;

The caller sets up bzalloc/bzfree callbacks and an optional opaque context for the allocator. However, Rust's GlobalAlloc is set up at compilation time, and we can't pass that context in a good, thread-safe fashion to it.

Who uses the bzlib custom allocator, anyway?

If one sets bzalloc/bzfree to NULL, bzlib will use the system's plain malloc()/free() by default. Most software does this.

I am looking in Debian's codesearch for where bzalloc gets set, hoping that I can figure out if that software really needs a custom allocator, or if they are just dressing up malloc() with logging code or similar (ImageMagick seems to do this; Python seems to have a genuine concern about the Global Interpreter Lock). Debian's codesearch is a fantastic tool!

The first rustified code

I cut&pasted the CRC32 lookup table and fixed it up for Rust's syntax, and also ported the CRC32 computation functions. I gave them the same names as the original C ones, and exported them, e.g.

const TABLE: [u32; 256] = [
   0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
   ...
};

#[no_mangle]
pub unsafe extern "C" fn BZ2_update_crc(crc_var: &mut u32, cha: u8) {
    *crc_var = (*crc_var << 8) ^ TABLE[((*crc_var >> 24) ^ u32::from(cha)) as usize];
}

This is a straight port of the C code. Rust is very strict about integer sizes, and arrays can only be indexed with a usize, not any random integer — hence the explicit conversions.

And with this, and after fixing the linkage, the tests pass!

First pass at rustifying CRC32: done.

But that does one byte at a time

Indeed; the original C code to do CRC32 only handled one byte at a time. If I replace this with a SIMD-enabled Rust crate, it will want to process whole buffers at once. I hope the code in bzlib can be refactored to do that. We'll see!

How to use an existing Rust crate for this

I just found out that one does not in fact need to use a complete crc32::Digest to do equivalent computations; one can call crc32::update() by hand and maintain a single u32 state, just like the original UInt32 from the C code.

So, I may not need to mess around with a custom allocator just yet. Stay tuned.

In the meantime, I've filed a bug against crc32fast to make it possible to use a custom polynomial and order and still get the benefits of SIMD.