Librsvg, Rust, and non-mainstream architectures

Translations: es - Tags: librsvg, rust

Almost five years ago librsvg introduced Rust into its source code. Around the same time, Linux distributions started shipping the first versions of Firefox that also required Rust. I unashamedly wanted to ride that wave: distros would have to integrate a new language in their build infrastructure, or they would be left without Firefox. I was hoping that having a working Rust toolchain would make it easier for the rustified librsvg to get into distros.

Two years after that, someone from Debian complained that this made it hard or impossible to build librsvg (and all the software that depends on it, which is A Lot) on all the architectures that Debian builds on — specifically, on things like HP PA-RISC or Alpha, which even Debian marks as "discontinued" now.

Recently there was a similar kerfuffle, this time from someone from Gentoo, specifically about how Python's cryptography package now requires Rust. So, it doesn't build for platforms that Rust/LLVM don't support, like hppa, alpha, and Itanium. It also doesn't build for platforms for which there are no Rust packages from Gentoo yet (mips, s390x, riscv among them).

Memories of discontinued architectures

Let me reminisce about a couple of discontinued architectures. If I'm reading Wikipedia correctly, the DEC Alpha ceased to be developed in 2001, and HP, who purchased Compaq, who purchased DEC, stopped selling Alpha systems in 2007. Notably, Compaq phased out the Alpha in favor of the Itanium, which stopped being developed in 2017.

I used an Alpha machine in 1997-1998, back at the University. Miguel kindly let me program and learn from him at the Institute where he worked, and the computer lab there got an Alpha box to let the scientists run mathematical models on a machine with really fast floating-point. This was a time when people actually regularly ssh'ed into machines to run X11 applications remotely — in their case, I think it was Matlab and Mathematica. Good times.

The Alpha had fast floating point, much faster than Intel x86 CPUs, and I was delighted to do graphics work on it. That was the first 64-bit machine I used, and it let me learn how to fix code that assumed everything was 32 bits. It had a really picky floating-point unit. Whereas x86 would happily throw you a NaN if you used uninitialized memory as floats, the Alpha would properly fault and crash the program. I fixed so many bugs thanks to that!

I also have fond memories of the 32-bit SPARC boxes at the University and their flat-screen fixed-frequency CRT displays, but you know, I haven't seen one of those machines since 1998. Because I was doing graphics work, I used the single SPARC machine in the computer lab at the Institute that had 24-bit graphics, with a humongous 21" CRT display. PCs at the time still had 8-bit video cards and shitty little monitors.

At about the same time that the Institute got its Alpha, it also got one of the first 64-bit UltraSPARCs from Sun — a very expensive machine definitely not targeted to hobbyists. I think it had two CPUs! Multicore did not exist!

I think I saw a single Itanium machine in my life, probably around 2002-2005. The Ximian/Novell office in Mexico City got one, for QA purposes — an incredibly loud and unstable machine. I don't think we ever did any actual development on that box; it was a "can you reproduce this bug there" kind of thing. I think Ximian/Novell had a contract with HP to test the distro there, I don't remember.

Unsupported architectures at the LLVM level

Platforms that Rust/LLVM don't support, like the Alpha and Itanium, are dead in the water. The compiler cannot target them: Rust generates machine code via LLVM, and LLVM doesn't support them.

I don't know why distributions maintained by volunteers give themselves the responsibility to keep their software running on platforms that have not been manufactured for years, and that were never even hobbyist machines.

I read the other day, and now I regret not keeping the link, something like this: don't assume that your hobby computing entitles you to free labor on the part of compiler writers, software maintainers, and distro volunteers. (If someone helps me find the source, I'll happily link to it and quote it properly.)

Non-tier-1 platforms and "$distro does not build Rust there yet"

I think people are discovering these once again:

  • Writing and supporting a compiler for a certain architecture takes Real Work.

  • Supporting a distro for a certain architecture takes Real Work.

  • Fixing software to work on a certain architecture takes Real Work.

Rust divides its support for different platforms into tiers, going from tier 1, the most supported, to tier 3, the least supported. Or, I should say, the best taken care of: how well a platform fares is a combination of how many people actually have the hardware in question, and whether the general CI and build tooling is prepared to deal with it as effectively as it does for tier 1 platforms.

In other words: there are more people capable of paying attention to, and testing things on, x86_64 PCs than there are for sparc-unknown-linux-gnu.

Some anecdotes from Suse

At Suse we actually support IBM's s390x big iron; those mainframes run Suse Linux Enterprise Server. You have to pay a lot of money to get a machine like that and support for it. It's a room-sized beast that requires professional babysitting.

When librsvg and Firefox started getting rustified, there was of course concern about getting Rust to work properly on the s390x. I worked sporadically with the people who made the distro work there, and who had to deal with building Rust and Firefox on it (librsvg was a non-issue after getting Rust and Firefox to work).

I think all the LLVM work for the s390x was done at IBM. There were probably a couple of miscompilations that affected Firefox; they got fixed.

One would expect bugs in software for IBM mainframes to be fixed by IBM or its contractors, not by volunteers maintaining a distro in their spare time.

Giving computing time on mainframes to volunteers in distros could seem like a good Samaritan move, or a trap to extract free labor from unsuspecting people.

Endianness bugs

Firefox's problems on the s390x were more around big-endian bugs than anything. You see, all the common architectures these days (x86_64 and arm64) are little-endian. However, s390x is big-endian, which means that all multi-byte numbers in memory are stored backwards from what most software expects.

It is easy to write software that assumes little-endian or big-endian throughout; it takes a little care to write software that works on either.
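A small Rust sketch of what "stored backwards" means in practice (s390x is used in the comments only as the canonical big-endian example):

```rust
fn main() {
    let x: u32 = 0x11223344;

    // Little-endian layout: least-significant byte first in memory.
    // This is what x86_64 and (in its usual mode) arm64 use.
    assert_eq!(x.to_le_bytes(), [0x44, 0x33, 0x22, 0x11]);

    // Big-endian layout: most-significant byte first, as on s390x.
    assert_eq!(x.to_be_bytes(), [0x11, 0x22, 0x33, 0x44]);

    // Code that reads raw bytes back into integers must say which
    // layout it expects; from_le_bytes/from_be_bytes make that explicit,
    // instead of silently assuming the host's byte order.
    assert_eq!(u32::from_le_bytes([0x44, 0x33, 0x22, 0x11]), x);
}
```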

Most of the software that volunteers and paid people write assumes little-endian CPUs, because that is likely what they are targeting. It is a pain in the ass to encounter code that works incorrectly on big-endian — a pain because knowing where to look for evidence of bugs is tricky, and fixing existing code to work with either endianness can be either very simple, or a major adventure in refactoring and testing.

Two cases in point:

Firefox. When Suse started dealing with Rust and Firefox on the s390x, there were endianness bugs in the graphics code in Firefox that deals with pixel formats. Whether pixels get stored in memory as ARGB/ABGR/RGBA/etc. is a platform-specific thing, and is generally a combination of the graphics hardware for that platform, plus the actual CPU architecture. At that time, it looked like the C++ code in Firefox that deals with pixels had been rewritten/refactored, and had lost big-endian support along the way. I don't know the current status (not a single big-endian CPU in my vicinity), but I haven't seen related bugs come into the Suse bug tracker? Maybe it's fixed now?

Librsvg had two root causes of bugs for big-endian. One was in the old code for SVG filter effects that was written in C; it never supported big-endian. The initial port to Rust inherited the same bug (think of a line-by-line port, although it wasn't exactly like that), but it got fixed when my Summer of Code intern Ivan Molodetskikh refactored the code to have a Pixel abstraction that works for little-endian and big-endian, and wraps Cairo's funky requirements.

The other endian-related bug in librsvg was when computing masks. Again, a little refactoring with that Pixel abstraction fixed it.
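To illustrate the technique, here is a hypothetical sketch (not librsvg's actual Pixel type, just the general idea). Cairo's ARGB32 format stores each pixel as one native-endian 32-bit word, so unpacking channels with shifts on that word, instead of indexing into the raw byte buffer, behaves the same on little-endian and big-endian machines:

```rust
// Hypothetical endian-safe pixel abstraction for Cairo-style ARGB32 data.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pixel {
    a: u8,
    r: u8,
    g: u8,
    b: u8,
}

impl Pixel {
    // Unpack from one native-endian ARGB32 word. Shifts on an integer
    // are endian-independent; indexing into the byte buffer is not.
    fn from_argb32(word: u32) -> Pixel {
        Pixel {
            a: (word >> 24) as u8,
            r: (word >> 16) as u8,
            g: (word >> 8) as u8,
            b: word as u8,
        }
    }

    // Pack back into a native-endian ARGB32 word.
    fn to_argb32(self) -> u32 {
        (self.a as u32) << 24
            | (self.r as u32) << 16
            | (self.g as u32) << 8
            | self.b as u32
    }
}

fn main() {
    let p = Pixel { a: 0xff, r: 0x80, g: 0x40, b: 0x20 };
    let word = p.to_argb32();
    assert_eq!(word, 0xff804020);
    assert_eq!(Pixel::from_argb32(word), p);
}
```

Byte-indexing code, by contrast, has to special-case endianness (on little-endian the bytes of that word sit in memory as B, G, R, A; on big-endian as A, R, G, B), which is exactly where the filter-effects and mask bugs came from.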

I knew that the original C code for SVG filter effects didn't work on big-endian. But even back then, at Suse we never got reports of it producing incorrect results on the s390x... maybe people don't use their mainframes to run rsvg-convert? I was hoping that the port to Rust of that code would automatically fix that bug, and it kind of happened that way through Ivan's careful work.

And the code for masks? There were two bugs reported with that same root cause: one from Debian as a failure in librsvg's test suite (yay, it caught that bug!), and one from someone running an Apple PowerBook G4 with a MATE desktop and seeing incorrectly-rendered SVG icons.

And you know what? I am delighted to see people trying to keep those lovely machines alive. A laptop that doesn't get warm enough to burn your thighs, what a concept. A perfectly serviceable 32-bit laptop with a maximum of about 1 GB of RAM and a 40 GB hard drive (it didn't have HDMI!)... But you know, it's the same kind of delight I feel when people talk about doing film photography on a Rollei 35. A lot of nostalgia for hardware of days past, and a lot of mixed feelings about not throwing out working things and creating more trash.

As a graphics programmer I feel the responsibility to write code that works on little-endian and big-endian, but you know, it's not exactly an everyday concern anymore. The last big-endian machines I used on an everyday basis were the SPARCs at the university, more than 20 years ago.

Who gets paid to fix this?

That's the question. Suse got paid to support Firefox on the s390x; I suppose IBM has an interest in fixing LLVM there; both actually have the people, hardware, and money for it.

Within Suse, I am by default responsible for keeping librsvg working for the s390x as well — it gets built as part of the distro, after all. I have never gotten an endianness bug report from the Suse side of things.

Which leads me to suspect that, probably similar to Debian and Gentoo, we build a lot of software because it's in the build chain, but we don't run it to its fullest extent. Do people run GNOME desktops on s390x virtual machines? Maybe? Did they not notice endianness bugs because they were not in the code path that most GNOME icons actually use? Who knows!

I'm thankful to Simon from the Debian bug for pointing out the failure in librsvg's test case for masks, and to Mingcong for actually showing a screenshot of a MATE desktop running on a PPC PowerBook. Those were useful things for them to do.

Also — they were kind about it. It was a pleasure to interact with them.