Federico's Blog

  1. On responsible vulnerability disclosure

    - kde, security

    Recently KDE had an unfortunate event. Someone found a vulnerability in the code that processes .desktop and .directory files, through which an attacker could create a malicious file that causes shell command execution (analysis). They went for immediate, full disclosure, where KDE didn't even get a chance of fixing the bug before it was published.

    There are many protocols for disclosing vulnerabilities in a coordinated, responsible fashion, but the gist of them is this:

    1. Someone finds a vulnerability in some software through studying some code, or some other mechanism.

    2. They report the vulnerability to the software's author through some private channel. For free software in particular, researchers can use Openwall's recommended process for researchers, which includes notifying the author/maintainer and distros and security groups. Free software projects can follow a well-established process.

    3. The author and reporter agree on a deadline for releasing a public report of the vulnerability, or in semi-automated systems like Google's Project Zero, a deadline is automatically established.

    4. The author works on fixing the vulnerability.

    5. The deadline is reached; the patch has been publicly released, the appropriate people have been notified, systems have been patched. If there is no patch, the author and reporter can agree on postponing the date, or the reporter can publish the vulnerability report, thus creating public pressure for a fix.

    The steps above gloss over many practicalities and issues from the real world, but the idea is basically this: the author or maintainer of the software is given a chance to fix a security bug before information on the vulnerability is released to the hostile world. The idea is to keep harm from being done by not publishing unpatched vulnerabilities until there is a fix for them (... or until the deadline expires).

    What happened instead

    Around the beginning of July, the reporter posts about looking for bugs in KDE.

    On July 30, he posts a video with the proof of concept.

    On August 3, he makes a Twitter poll about what to do with the vulnerability.

    On August 4, he publishes the vulnerability.

    KDE is left with having to patch this in emergency mode. On August 7, KDE releases a security advisory in perfect form:

    • Description of exactly what causes the vulnerability.

    • Description of how it was solved.

    • Instructions on what to do for users of various versions of KDE libraries.

    • Links to easy-to-cherry-pick patches for distro vendors.

    Now, distro vendors are, in turn, in emergency mode, as they must apply the patch, run it through QA, release their own advisories, etc.

    What if this had been done with coordinated disclosure?

    The bug would have been fixed, probably in the same way, but it would not be in emergency mode. KDE's advisory contains this:

    Thanks to Dominik Penner for finding and documenting this issue (we wish however that he would have contacted us before making the issue public) and to David Faure for the fix.

    This is an extremely gracious way of thanking the reporter.

    I am not an infosec person...

    ... but some behaviors in the infosec sphere are deeply uncomfortable to me. I don't like it when security "research" is hard to tell from vandalism. "Excuse me, you left your car door unlocked" vs. "Hey everyone, this car is unlocked, have at it".

    I don't know the details of the discourse in the infosec sphere around full disclosure against irresponsible vendors of proprietary software or services. However, KDE is free software! There is no need to be an asshole to them.

  2. Constructors

    - GObject, rust

    Have you ever had these annoyances with GObject-style constructors?

    • From a constructor, calling a method on a partially-constructed object is dangerous.

    • A constructor needs to set up "not quite initialized" values in the instance struct until a construct-time property is set.

    • You actually need to override GObjectClass::constructed (or was it ::constructor?) to take care of construct-only properties which need to be considered together, not individually.

    • Constructors can't report an error, unless you derive from GInitable, which is not in gobject, but in gio instead. (Also, why does that force the constructor to take a GCancellable...?)

    • You need more than one constructor, but that needs to be done with helper functions.

    This article, Perils of Constructors, explains all of these problems very well. It is not centered on GObject, but rather on constructors in object-oriented languages in general.

    (Spoiler: Rust does not have constructors or partially-initialized structs, so these problems don't really exist there.)

    (Addendum: that makes it somewhat awkward to convert GObject code in C to Rust, but librsvg was able to solve it nicely with <buzzword>the typestate pattern</buzzword>.)

  3. Removing rsvg-view

    - librsvg

    I am preparing the 2.46.0 librsvg release. This will no longer have the rsvg-view-3 program.

    History of rsvg-view

    Rsvg-view started out as a 71-line C program to aid development of librsvg. It would just render an SVG file to a pixbuf, stick that pixbuf in a GtkImage widget, and show a window with that.

    Over time, it slowly acquired most of the command-line options that rsvg-convert supports. And I suppose, as a way of testing the Cairo-ification of librsvg, it also got the ability to print SVG files to a GtkPrintContext. At last count, it was a 784-line C program that is not really the best code in the world.

    What makes rsvg-view awkward?

    Rsvg-view requires GTK. But GTK requires librsvg, indirectly, through gdk-pixbuf! There is not a hard circular dependency because GTK goes, "gdk-pixbuf, load me this SVG file" without knowing how it will be loaded. In turn, gdk-pixbuf initializes the SVG loader provided by librsvg, and that loader reads/renders the SVG file.

    Ideally librsvg would only depend on gdk-pixbuf, so it would be able to provide the SVG loader.

    The rsvg-view source code still has a few calls to GTK functions which are now deprecated. The program emits GTK warnings during normal use.

    Rsvg-view is... not a very good SVG viewer. It doesn't even start up with the window scaled properly to the SVG's dimensions! If used for quick testing during development, it cannot even aid in viewing the transparent background regions which the SVG does not cover. It just sticks a lousy custom widget inside a GtkScrolledWindow, and does not have the conventional niceties to view images like zooming with the scroll wheel.

    EOG is a much better SVG viewer than rsvg-view, and people actually invest effort in making it pleasant to use.

    Removal of rsvg-view

    So, the next version of librsvg will not provide the rsvg-view-3 binary. Please update your packages accordingly. Distros may be able to move the compilation of librsvg to a more sensible place in the platform stack, now that it doesn't depend on GTK being available.

    What can you use instead? Any other image viewer. EOG works fine; there are dozens of other good viewers, too.

  4. Bzip2 1.0.7 is released

    - bzip2

    Bzip2 1.0.7 has been released by Mark Wielaard. We have a slight change of plans since my last post:

    • The 1.0.x series is in strict maintenance mode and will not change build systems. This is targeted towards embedded use, as in projects which already embed the bzip2-1.0.6 sources and undoubtedly patch the build system. Right now this series, and the tagged 1.0.7 release, live in the sourceware repository for bzip2.

    • The 1.1.x series has Meson and CMake build systems, and a couple of extra changes that modernize the C code but were not fit for the 1.0.7 release. This is targeted towards operating system distributions. This lives in the master branch of the gitlab repository for bzip2.

    Distros and embedded users should start using bzip2-1.0.7 immediately. The patches they already have for bzip2's traditional build system should still apply. The release includes bug fixes and security fixes that have accumulated over the years, including the new CVE-2019-12900.

    Once 1.1.0 is released, distributions should be able to remove their patches to the build system and just start using Meson or CMake. You may want to monitor the 1.1.0 milestone; help fixing the issues there is appreciated, so we can make the first release with the new build systems!

  5. Preparing the bzip2-1.0.7 release

    - bzip2

    ATTENTION ALL DISTRIBUTIONS: this is for you. THE SONAME MAY CHANGE!

    I am preparing a bzip2-1.0.7 release. You can see the release notes, which should be of interest:

    • Many historical patches from various distributions are integrated now.

    • We have a new fix for the just-published CVE-2019-12900, courtesy of Albert Astals Cid.

    • Bzip2 has moved to Meson for its preferred build system, courtesy of Dylan Baker. For special situations, a CMake build system is also provided, courtesy of Micah Snyder.

    What's with the soname?

    From bzip2-1.0.1 (from the year 2000), until bzip2-1.0.6 (from 2010), release tarballs came with a special Makefile-libbz2_so to generate a shared library instead of a static one.

    This never used libtool or anything; it specified linker flags by hand. Various distributions either patched this special makefile, or replaced it with another one, or outright replaced the complete build system with a different one.

    Some things to note:

    • This hand-written Makefile-libbz2_so used a link line like $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.6. This means, make the DT_SONAME field inside the ELF file be libbz2.so.1.0 (note the two digits in 1.0), and make the filename of the shared library be libbz2.so.1.0.6.

    • Fedora patched the soname in a patch called "saneso" to just be libbz2.so.1.

    • Stanislav Brabec, from openSUSE, replaced the hand-written makefiles with autotools, which meant using libtool. It has this interesting note:

    Incompatible changes:

    soname change. Libtool has no support for two parts soname suffix (e. g. libbz2.so.1.0). It must be a single number (e. g. libbz2.so.1). That is why soname must change. But I see not a big problem with it. Several distributions already use the new number instead of the non-standard number from Makefile-libbz2_so.

    (In fact, if I do objdump -x /usr/lib64/*.so | grep SONAME, I see that most libraries have single-digit sonames.)
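
    To make the difference between the two soname styles concrete, here is a minimal sketch that splits a soname into its library stem and its version numbers. The function names parse_soname and has_single_number_soname are made up for this illustration; nothing like them exists in bzip2 or in any build system.

```rust
/// Splits a soname such as "libbz2.so.1.0" into its library stem and the
/// numeric version components that follow ".so.".
/// Returns None if there is no ".so." part or a component is not a number.
/// (Hypothetical helper, for illustration only.)
pub fn parse_soname(soname: &str) -> Option<(&str, Vec<u32>)> {
    let (stem, version) = soname.split_once(".so.")?;
    let parts = version
        .split('.')
        .map(|p| p.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some((stem, parts))
}

/// True when the soname carries exactly one version number,
/// like the libbz2.so.1 that Fedora and openSUSE use.
pub fn has_single_number_soname(soname: &str) -> bool {
    matches!(parse_soname(soname), Some((_, parts)) if parts.len() == 1)
}
```

    With this, the soname from Makefile-libbz2_so, libbz2.so.1.0, parses into two version numbers, while libbz2.so.1 parses into one.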

    In my experience, both Fedora and openSUSE are very strict, and correct, about obscure things like library sonames.

    With the switch to Meson, bzip2 no longer uses libtool. It will have a single-digit soname — this is not in the meson.build yet, but expect it to be there within the next couple of days.

    I don't know what distros which decided to preserve the 1.0 soname will need to do; maybe they will need to patch meson.build on their own.

    Fortunately, the API/ABI are still exactly the same. You can preserve the old soname which your distro was using and linking libbz2 will probably keep working as usual.

    (This is a C-only release as usual. The Rust branch is still experimental.)

  6. Bzip2 in Rust: porting the randomization table

    - bzip2, rust

    Here is a straightforward port of some easy code.

    randtable.c has a lookup table with seemingly-random numbers. This table is used by the following macros in bzlib_private.h:

    extern Int32 BZ2_rNums[512];
    
    #define BZ_RAND_DECLS                          \
       Int32 rNToGo;                               \
       Int32 rTPos                                 \
    
    #define BZ_RAND_INIT_MASK                      \
       s->rNToGo = 0;                              \
       s->rTPos  = 0                               \
    
    #define BZ_RAND_MASK ((s->rNToGo == 1) ? 1 : 0)
    
    #define BZ_RAND_UPD_MASK                       \
       if (s->rNToGo == 0) {                       \
          s->rNToGo = BZ2_rNums[s->rTPos];         \
          s->rTPos++;                              \
          if (s->rTPos == 512) s->rTPos = 0;       \
       }                                           \
       s->rNToGo--;
    

    Here, BZ_RAND_DECLS is used to declare two fields, rNToGo and rTPos, in two structs (1, 2). Both are similar to this:

    typedef struct {
       ...
       Bool     blockRandomised;
       BZ_RAND_DECLS
       ...
    } DState;
    

    Then, the code that needs to initialize those fields calls BZ_RAND_INIT_MASK, which expands into code to set the two fields to zero.

    At several points in the code, BZ_RAND_UPD_MASK gets called, which expands into code that updates the randomization state, or something like that, and uses BZ_RAND_MASK to get a useful value out of the randomization state.

    I have no idea yet what the state is about, but let's port it directly.

    Give things a name

    It's interesting to see that no code except for those macros uses the fields rNToGo and rTPos, which are declared via BZ_RAND_DECLS. So, let's make up a type with a name for that. Since I have no better name for it, I shall call it just RandState. I added that type definition in the C code, and replaced the macro-which-creates-struct-fields with a RandState-typed field:

    -#define BZ_RAND_DECLS                          \
    -   Int32 rNToGo;                               \
    -   Int32 rTPos                                 \
    +typedef struct {
    +   Int32 rNToGo;
    +   Int32 rTPos;
    +} RandState;
    
    ...
    
    -      BZ_RAND_DECLS;
    +      RandState rand;
    

    Since the fields now live inside a sub-struct, I changed the other macros to use s->rand.rNToGo instead of s->rNToGo, and similarly for the other field.

    Turn macros into functions

    Now, three commits (1, 2, 3) to turn the macros BZ_RAND_INIT_MASK, BZ_RAND_MASK, and BZ_RAND_UPD_MASK into functions.

    And now that the functions live in the same C source file as the lookup table they reference, the table can be made static const to avoid having it as read/write unshared data in the linked binary.

    Premature optimization concern: doesn't de-inlining those macros cause performance problems? At first, we will get the added overhead from a function call. When the whole code is ported to Rust, the Rust compiler will probably be able to figure out that those tiny functions can be inlined (or we can #[inline] them by hand if we have proof, or if we have more hubris than faith in LLVM).

    Port functions and table to Rust

    The functions are so tiny, and the table so cut-and-pasteable, that it's easy to port them to Rust in a single shot:

    #[no_mangle]
    pub unsafe extern "C" fn BZ2_rand_init() -> RandState {
        RandState {
            rNToGo: 0,
            rTPos: 0,
        }
    }
    
    #[no_mangle]
    pub unsafe extern "C" fn BZ2_rand_mask(r: &RandState) -> i32 {
        if r.rNToGo == 1 {
            1
        } else {
            0
        }
    }
    
    #[no_mangle]
    pub unsafe extern "C" fn BZ2_rand_update_mask(r: &mut RandState) {
        if r.rNToGo == 0 {
            r.rNToGo = RAND_TABLE[r.rTPos as usize];
            r.rTPos += 1;
            if r.rTPos == 512 {
                r.rTPos = 0;
            }
        }
        r.rNToGo -= 1;
    }
    

    Also, we define the RandState type as a Rust struct with a C-compatible representation, so it will have the same layout in memory as the C struct. This is what allows us to have a RandState in the C struct, while in reality the C code doesn't access it directly; it is just used as a struct field.

    // Keep this in sync with bzlib_private.h:
    #[repr(C)]
    pub struct RandState {
        rNToGo: i32,
        rTPos: i32,
    }
    

    See the commit for the corresponding extern declarations in bzlib_private.h. With those functions and the table ported to Rust, we can remove randtable.c. Yay!
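
    One cheap way to gain confidence in the "same layout" claim is to check the size and alignment of the Rust struct. This is only a sketch: it assumes that Int32 in the C code is a 32-bit int, and the helper rand_state_layout is a name I made up for the example.

```rust
use std::mem::{align_of, size_of};

// Same shape as the struct shown above; the C field names are kept,
// so the naming lint is silenced.
#[allow(non_snake_case)]
#[repr(C)]
pub struct RandState {
    pub rNToGo: i32,
    pub rTPos: i32,
}

/// Returns (size, alignment) of RandState in bytes.
/// (Hypothetical helper, for illustration only.)
pub fn rand_state_layout() -> (usize, usize) {
    (size_of::<RandState>(), align_of::<RandState>())
}
```

    Two i32 fields with #[repr(C)] take 8 bytes with no padding, which is what a C struct with two 32-bit ints takes as well. A check like this does not replace keeping the two definitions in sync by hand, but it catches the grossest mistakes.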

    A few cleanups

    After moving to another house one throws away useless boxes; we have to do some cleanup in the Rust code after the initial port, too.

    Rust prefers snake_case fields rather than camelCase ones, and I agree. I renamed the fields to n_to_go and table_pos.

    Then, I discovered that the EState struct doesn't actually use the fields for the randomization state. I just removed them.

    Exegesis

    What is that randomization state all about?

    And why does DState (the struct used during decompression) need the randomization state, but EState (used during compression) doesn't need it?

    I found this interesting comment:

          /*-- 
             Now a single bit indicating (non-)randomisation. 
             As of version 0.9.5, we use a better sorting algorithm
             which makes randomisation unnecessary.  So always set
             the randomised bit to 'no'.  Of course, the decoder
             still needs to be able to handle randomised blocks
             so as to maintain backwards compatibility with
             older versions of bzip2.
          --*/
          bsW(s,1,0);
    

    Okay! So compression no longer uses randomization, but decompression has to support files which were compressed with randomization. Here, bsW(s,1,0) always writes a 0 bit to the file.

    However, the decompression code actually reads the blockRandomised bit from the file so that it can see whether it is dealing with an old-format file:

    GET_BITS(BZ_X_RANDBIT, s->blockRandomised, 1);
    

    Later in the code, this s->blockRandomised field gets consulted; if the bit is on, the code calls BZ2_rand_update_mask() and friends as appropriate. If one is using files compressed with Bzip2 0.9.5 or later, those randomization functions are not even called.
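
    Here is a simplified sketch of what that consultation amounts to, assuming that the mask is XORed into each decoded byte when the bit is on. The real decoder interleaves this with its run-length decoding, so this is not the actual code. The names unrandomise and next_mask are made up, and the table passed in the example is a tiny stand-in with invented values, not the 512-entry table from randtable.c.

```rust
pub struct RandState {
    n_to_go: i32,
    table_pos: usize,
}

impl RandState {
    pub fn new() -> Self {
        RandState { n_to_go: 0, table_pos: 0 }
    }

    /// Does what BZ_RAND_UPD_MASK followed by BZ_RAND_MASK do in the C code,
    /// but against a caller-supplied table (hypothetical, for illustration).
    pub fn next_mask(&mut self, table: &[i32]) -> u8 {
        if self.n_to_go == 0 {
            self.n_to_go = table[self.table_pos];
            self.table_pos += 1;
            if self.table_pos == table.len() {
                self.table_pos = 0;
            }
        }
        self.n_to_go -= 1;
        if self.n_to_go == 1 { 1 } else { 0 }
    }
}

/// Undoes the randomisation of a block, but only if the block says it was randomised.
pub fn unrandomise(block: &[u8], block_randomised: bool, table: &[i32]) -> Vec<u8> {
    if !block_randomised {
        // Files from bzip2 0.9.5 or later take this path; the table is never touched.
        return block.to_vec();
    }
    let mut state = RandState::new();
    block.iter().map(|&b| b ^ state.next_mask(table)).collect()
}
```

    Since the mask is either 0 or 1, only the lowest bit of a byte can ever be flipped, and applying the function twice with the same table gives back the original bytes.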

    Talk about preserving compatibility with the past.

    Explanation, or building my headcanon

    Bzip2's compression starts by running a Burrows-Wheeler Transform on a block of data to compress, which is a wonderful algorithm that I'm trying to fully understand. Part of the BWT involves sorting all the string rotations of the block in question.
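
    To see what "sorting all the string rotations" means, here is the textbook version of the transform as a minimal sketch. This is not bzip2's sorting code, which is far more elaborate; the function name naive_bwt is mine.

```rust
/// Naive Burrows-Wheeler Transform: sorts every rotation of `block` and
/// returns the last column of the sorted rotations, plus the row in which
/// the original block ended up.
pub fn naive_bwt(block: &[u8]) -> (Vec<u8>, usize) {
    let n = block.len();
    // Rotation i is the block read starting at byte i, wrapping around.
    let mut starts: Vec<usize> = (0..n).collect();
    starts.sort_by(|&a, &b| {
        let ra = block[a..].iter().chain(block[..a].iter());
        let rb = block[b..].iter().chain(block[..b].iter());
        ra.cmp(rb)
    });
    // The last byte of the rotation starting at s is the byte just before s.
    let last_column: Vec<u8> = starts.iter().map(|&s| block[(s + n - 1) % n]).collect();
    let origin_row = starts.iter().position(|&s| s == 0).unwrap_or(0);
    (last_column, origin_row)
}
```

    For the block "banana", the sorted rotations are abanan, anaban, ananab, banana, nabana, nanaba, so the last column is "nnbaaa" and the original block sits in row 3. Each comparison in this naive version can walk the whole block when the rotations share a long prefix, which is presumably the kind of extreme case that makes sorting slow.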

    Per the comment I cited, really old versions of bzip2 used a randomization helper to make sorting perform well in extreme cases, but not-so-old versions fixed this.

    This explains why the decompression struct DState has a blockRandomised bit, but the compression struct EState doesn't need one. The fields that the original macro was pasting into EState were just a vestige from 1999, which is when Bzip2 0.9.5 was released.

  7. Bzip2 uses Meson and Autotools now — and a plea for help

    - bzip2, meson

    There is a lot of activity in the bzip2 repository!

    Perhaps the most exciting thing is that Dylan Baker made a merge request to add Meson as a build system for bzip2; this is merged now into the master branch.

    The current status is this:

    • Both Meson and Autotools are supported.
    • We have CI runs for both build systems.

    A plea for help: add CI runners for other platforms!

    Do you use *BSD / Windows / Solaris / etc. and know how to make Gitlab's CI work for them?

    The only runners we have now for bzip2 are for well-known Linux distros. I would really like to keep bzip2 working on non-Linux platforms. If you know how to make Gitlab CI runners for other systems, please send a merge request!

    Why two build systems?

    Mainly uncertainty on my part. I haven't used Meson extensively; people tell me that it works better than Autotools out of the box for Windows.

    Bzip2 runs on all sorts of ancient systems, and I don't know whether Meson or Autotools will be a better fit for them. Time will tell. Hopefully in the future we can have only a single supported build system for bzip2.

  8. Bzip2 repository reconstructed

    - bzip2

    I have just done a git push --force-with-lease to bzip2's master branch, which means that if you had a previous clone of this repository, you'll have to re-fetch it and rebase any changes you may have on top.

    I apologize for the inconvenience!

    But I have a good excuse: Julian Seward pointed me to a repository at sourceware where Mark Wielaard reconstructed a commit history for bzip2, based on the historical tarballs starting from bzip2-0.1. Bzip2 was never maintained under revision control, so the reconstructed repository should be used mostly for historical reference (go look for bzip2.exe in the initial commit!).

    I have rebased all the post-1.0.6 commits on top of Mark's repository; this is what is in the master branch now.

    There is a new rustify branch as well, based on master, which is where I will do the gradual port to Rust.

    I foresee no other force-pushes to the master branch in the future. Apologies again if this disrupts your workflow.

    Update: Someone did another reconstruction. If they weave the histories together, I'll do another force-push, the very last one, I promise. If you send merge requests, I'll rebase them myself if that happens.

  9. Maintaining bzip2

    - bzip2

    Today I had a very pleasant conversation with Julian Seward, of bzip2 and Valgrind fame. Julian has kindly agreed to cede the maintainership of bzip2 to me.

    Bzip2 has not had a release since 2010. In the meantime, Linux distros have accumulated a number of bug/security fixes for it. Seemingly every distributor of bzip2 patches its build system. The documentation generation step is a bit creaky. There is no source control repository, nor bug tracker. I hope to fix these things gradually.

    This is the new repository for bzip2.

    Ways in which you can immediately help by submitting merge requests:

    • Look at the issues; currently they are around auto-generating the version number.

    • Create a basic continuous integration pipeline that at least builds the code and runs the tests.

    • Test the autotools setup, courtesy of Stanislav Brabec, and improve it as you see fit.

    The rustification will happen in a separate branch for now, at least until the Autotools setup settles down.

    I hope to have a 1.0.7 release soon, but this really needs your help. Let's revive this awesome little project.
