Federico's Blog

  1. Moving gnome-shell's styles to Rust

    - gnome, gnome-shell, rust

    Gnome-shell uses CSS processing code that dates from HippoCanvas, a CSS-aware canvas from around 2006. It uses libcroco to parse CSS, and implements selector matching by hand in C.

    This code is getting rather dated, and libcroco is unmaintained.

    I've been reading the code for StTheme and StThemeNode, and it looks very feasible to port it gradually to Rust, by using the same crates that librsvg uses, and eventually removing libcroco altogether: gnome-shell is the last module that uses libcroco in distro packages.

    Strategy

    StTheme and StThemeNode use libcroco to load CSS stylesheets and keep them in memory. The values of individual properties are just tokenized and kept around as a linked list of CRTerm; this struct represents a single token.

    Later, the drawing code uses functions like st_theme_node_lookup_color(node, "property_name") or st_theme_node_lookup_length() to query the various properties that it needs. It is then that the type of each property gets determined: prior to that step, property values are just tokenized, not parsed into usable values.

    I am going to start by porting the individual parsers to Rust, similar to what Paolo and I did for librsvg. It turns out that there's some code we can share.

    So far I have the parser for colors implemented in Rust. This removes a little bunch of code from the C parsers, and replaces it with a little Rust code, since the cssparser crate can already parse CSS colors with alpha with no extra work — libcroco didn't support alpha.

    As a bonus, this supports hsl() colors in addition to rgb() ones out of the box!

    After all the parsers are done, the next step would be to convert the representation of complete stylesheets into pure Rust code.

    What can we expect?

    A well-maintained CSS stack. Firefox and Servo both use the crates in question, so librsvg and gnome-shell should get maintenance of a robust CSS stack "for free", for the foreseeable future.

    Speed. Caveat: I have no profile data for gnome-shell yet, so I don't know how much time it spends doing CSS parsing and cascading, but it looks like the Rust version has a good chance of being more efficient.

    The selectors crate has some very interesting optimizations from Mozilla Servo, and it is also now used in Firefox. It supports doing selector matching using Bloom filters, and can also avoid re-cascading child nodes if a change to a parent would not cause its children to change.

    All the parsing is done with zero-copy parsers thanks to Rust's string slices; without so many malloc() calls in the parsing code path, the parsing stage should really fly.

    More CSS features. The selectors crate can do matching on basically all kinds of selectors as defined by recent CSS specs; one just has to provide the correct hooks into the calling code's representation of the DOM tree. The kind of matching that StTheme can do is somewhat limited; the rustification should make it match much more closely to what people expect from CSS engines in web browsers.

    A well-defined model of property inheritance. StThemeNode's model for CSS property inheritance is a bit ad-hoc and inconsistent. I haven't quite tested it, but from looking at the code, it seems that not all properties get inherited in the same way. I hope to move it to something closer to what librsvg already does, which should make it match people's expectations from the web.

    In the meantime

    I have a merge request ready to simply move the libcroco source code directly inside gnome-shell's source tree. This should let distros remove their libcroco package as soon as possible. That MR does not require Rust yet.

    My playground is here:

    This does not compile yet! I'll plug things together tomorrow.

    (Oh, yes, the project to redo Firefox's CSS stack in Rust used to be called Stylo. I'm calling this Stylish, as in Styles for the Shell.)

  2. Refactoring the Length type

    - librsvg, rust

    CSS length values have a number and a unit, e.g. 5cm or 6px. Sometimes the unit is a percentage, like 50%, and SVG says that lengths with percentage units should be resolved with respect to a certain rectangle. For example, consider this circle element:

    <circle cx="50%" cy="75%" r="4px" fill="black"/>
    

    This means, draw a solid black circle whose center is at 50% of the width and 75% of the height of the current viewport. The circle should have a 4-pixel radius.

    The process of converting that kind of units into absolute pixels for the final drawing is called normalization. In SVG, percentage units sometimes need to be normalized with respect to the current viewport (a local coordinate system), or with respect to the size of another object (e.g. when a clipping path is used to cut the current shape in half).

    One detail about normalization is that it can be with respect to the horizontal dimension of the current viewport, the vertical dimension, or both. Keep this in mind: at normalization time, we need to be able to distinguish between those three modes.

    The original C version

    I have talked about the original C code for lengths before; the following is a small summary.

    The original C code had this struct to represent lengths:

    typedef struct {
        double length;
        char factor;
    } RsvgLength;
    

    The parsing code would set the factor field to a character depending on the length's unit: 'p' for percentages, 'i' for inches, etc., and '\0' for the default unit, which is pixels.

    Along with that, the normalization code needed to know the direction (horizontal, vertical, both) to which the length in question refers. It did this by taking another character as an argument to the normalization function:

    double
    _rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, char dir)
    {
        if (in->factor == '\0')            /* pixels, no need to normalize */
            return in->length;
        else if (in->factor == 'p') {      /* percentages; need to consider direction */
            if (dir == 'h')                                     /* horizontal */
                return in->length * ctx->vb.rect.width;
            if (dir == 'v')                                     /* vertical */
                return in->length * ctx->vb.rect.height;
            if (dir == 'o')                                     /* both */
                return in->length * rsvg_viewport_percentage (ctx->vb.rect.width,
                                                              ctx->vb.rect.height);
        } else { ... }
    }
    

    The original post talks about how I found a couple of bugs with how the directions are identified at normalization time. The function above expects one of 'h'/'v'/'o' for horizontal/vertical/both, and one or two places in the code passed the wrong character.

    Making the C version cleaner

    Before converting that code to Rust, I removed the pesky characters and made the code use proper enums to identify a length's units.

    +typedef enum {
    +    LENGTH_UNIT_DEFAULT,
    +    LENGTH_UNIT_PERCENT,
    +    LENGTH_UNIT_FONT_EM,
    +    LENGTH_UNIT_FONT_EX,
    +    LENGTH_UNIT_INCH,
    +    LENGTH_UNIT_RELATIVE_LARGER,
    +    LENGTH_UNIT_RELATIVE_SMALLER
    +} LengthUnit;
    +
     typedef struct {
         double length;
    -    char factor;
    +    LengthUnit unit;
     } RsvgLength;
    

    Then, do the same for the normalization function, so it will get the direction in which to normalize as an enum instead of a char.

    +typedef enum {
    +    LENGTH_DIR_HORIZONTAL,
    +    LENGTH_DIR_VERTICAL,
    +    LENGTH_DIR_BOTH
    +} LengthDir;
    
     double
    -_rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, char dir)
    +_rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, LengthDir dir)
    

    Making the C version easier to get right

    While doing the last change above, I found a place in the code that used the wrong direction by mistake, probably due to a cut&paste error. Part of the problem here is that the code was specifying the direction at normalization time.

    I decided to change it so that each direction value carried its own direction since initialization, so that subsequent code wouldn't have to worry about that. Hopefully, initializing a width field should make it obvious that it needed LENGTH_DIR_HORIZONTAL.

     typedef struct {
         double length;
         LengthUnit unit;
    +    LengthDir dir;
     } RsvgLength;
    

    That is, so that instead of

      /* at initialization time */
      foo.width = _rsvg_css_parse_length (str);
    
      ...
    
      /* at rendering time */
      double final_width = _rsvg_css_normalize_length (&foo.width, ctx, LENGTH_DIR_HORIZONTAL);
    

    we would instead do this:

      /* at initialization time */
      foo.width = _rsvg_css_parse_length (str, LENGTH_DIR_HORIZONTAL);
    
      ...
    
      /* at rendering time */
      double final_width = _rsvg_css_normalize_length (&foo.width, ctx);
    

    This made the drawing code, which deals with a lot of coordinates at the same time, a lot less noisy.

    Initial port to Rust

    To recap, this was the state of the structs after the initial refactoring in C:

    typedef enum {
        LENGTH_UNIT_DEFAULT,
        LENGTH_UNIT_PERCENT,
        LENGTH_UNIT_FONT_EM,
        LENGTH_UNIT_FONT_EX,
        LENGTH_UNIT_INCH,
        LENGTH_UNIT_RELATIVE_LARGER,
        LENGTH_UNIT_RELATIVE_SMALLER
    } LengthUnit;
    
    typedef enum {
        LENGTH_DIR_HORIZONTAL,
        LENGTH_DIR_VERTICAL,
        LENGTH_DIR_BOTH
    } LengthDir;
    
    typedef struct {
        double length;
        LengthUnit unit;
        LengthDir dir;
    } RsvgLength;
    

    This ported to Rust in a straightforward fashion:

    pub enum LengthUnit {
        Default,
        Percent,
        FontEm,
        FontEx,
        Inch,
        RelativeLarger,
        RelativeSmaller
    }
    
    pub enum LengthDir {
        Horizontal,
        Vertical,
        Both
    }
    
    pub struct RsvgLength {
        length: f64,
        unit: LengthUnit,
        dir: LengthDir
    }
    

    It got a similar constructor that took the direction and produced an RsvgLength:

    impl RsvgLength {
        pub fn parse (string: &str, dir: LengthDir) -> RsvgLength { ... }
    }
    

    (This was before using Result; remember that the original C code did very little error checking!)

    The initial Parse trait

    It was at that point that it seemed convenient to introduce a Parse trait, which all CSS value types would implement to parse themselves from string.

    However, parsing an RsvgLength also needed an extra piece of data, the LengthDir. My initial version of the Parse trait had an associated called Data, through which one could pass an extra piece of data during parsing/initialization:

    pub trait Parse: Sized {
        type Data;
        type Err;
    
        fn parse (s: &str, data: Self::Data) -> Result<Self, Self::Err>;
    }
    

    This was explicitly to be able to pass a LengthDir to the parser for RsvgLength:

    impl Parse for RsvgLength {
        type Data = LengthDir;
        type Err = AttributeError;
    
        fn parse (string: &str, dir: LengthDir) -> Result <RsvgLength, AttributeError> { ... }
    }
    

    This was okay for lengths, but very noisy for everything else that didn't require an extra bit of data. In the rest of the code, the helper type was Data = () and there was a pair of extra parentheses () in every place that parse() was called.

    Removing the helper Data type

    Introducing one type per direction

    Over a year later, that () bit of data everywhere was driving me nuts. I started refactoring the Length module to remove it.

    First, I introduced three newtypes to wrap Length, and indicate their direction at the same time:

    pub struct LengthHorizontal(Length);
    pub struct LengthVertical(Length);
    pub struct LengthBoth(Length);
    

    This was done with a macro because now each wrapper type needed to know the relevant LengthDir.

    Now, for example, the declaration for the <circle> element looked like this:

    pub struct NodeCircle {
        cx: Cell<LengthHorizontal>,
        cy: Cell<LengthVertical>,
        r: Cell<LengthBoth>,
    }
    

    (Ignore the Cell everywhere; we got rid of that later.)

    Removing the dir field

    Since now the information about the length's direction is embodied in the LengthHorizontal/LengthVertical/LengthBoth types, this made it possible to remove the dir field from the inner Length struct.

     pub struct RsvgLength {
         length: f64,
         unit: LengthUnit,
    -    dir: LengthDir
     }
    
    +pub struct LengthHorizontal(Length);
    +pub struct LengthVertical(Length);
    +pub struct LengthBoth(Length);
    +
    +define_length_type!(LengthHorizontal, LengthDir::Horizontal);
    +define_length_type!(LengthVertical, LengthDir::Vertical);
    +define_length_type!(LengthBoth, LengthDir::Both);
    

    Note the use of a define_length_type! macro to generate code for those three newtypes.

    Removing the Data associated type

    And finally, this made it possible to remove the Data associated type from the Parse trait.

     pub trait Parse: Sized {
    -    type Data;
         type Err;
    
    -    fn parse(parser: &mut Parser<'_, '_>, data: Self::Data) -> Result<Self, Self::Err>;
    +    fn parse(parser: &mut Parser<'_, '_>) -> Result<Self, Self::Err>;
     }
    

    The resulting mega-commit removed a bunch of stray parentheses () from all calls to parse(), and the code ended up a lot easier to read.

    Removing the newtypes

    This was fine for a while. Recently, however, I figured out that it would be possible to embody the information for a length's direction in a different way.

    But to get there, I first needed a temporary refactor.

    Replacing the macro with a trait with a default implementation

    Deep in the guts of length.rs, the key function that does something different based on LengthDir is its scaling_factor method:

    enum LengthDir {
        Horizontal,
        Vertical,
        Both,
    }
    
    impl LengthDir {
        fn scaling_factor(self, x: f64, y: f64) -> f64 {
            match self {
                LengthDir::Horizontal => x,
                LengthDir::Vertical => y,
                LengthDir::Both => viewport_percentage(x, y),
            }
        }
    }
    

    That method gets passed, for example, the width/height of the current viewport for the x/y arguments. The method decides whether to use the width, height, or a combination of both.

    And of course, the interesting part of the define_length_type! macro was to generate code for calling LengthDir::Horizontal::scaling_factor()/etc. as appropriate depending on the LengthDir in question.

    First I made a trait called Orientation with a scaling_factor method, and three zero-sized types that implement that trait. Note how each of these three implementations corresponds to one of the match arms above:

    pub trait Orientation {
        fn scaling_factor(x: f64, y: f64) -> f64;
    }
    
    pub struct Horizontal;
    pub struct Vertical;
    pub struct Both;
    
    impl Orientation for Horizontal {
        fn scaling_factor(x: f64, _y: f64) -> f64 {
            x
        }
    }
    
    impl Orientation for Vertical {
        fn scaling_factor(_x: f64, y: f64) -> f64 {
            y
        }
    }
    
    impl Orientation for Both {
        fn scaling_factor(x: f64, y: f64) -> f64 {
            viewport_percentage(x, y)
        }
    }
    

    Now most of the contents of the define_length_type! macro can go in the default implementation of a new trait LengthTrait. Crucially, this trait has an Orientation associated type, which it uses to call into the Orientation trait:

    pub trait LengthTrait: Sized {
        type Orientation: Orientation;
    
        ...
    
        fn normalize(&self, values: &ComputedValues, params: &ViewParams) -> f64 {
            match self.unit() {
                LengthUnit::Px => self.length(),
    
                LengthUnit::Percent => {
                    self.length() *
                    <Self::Orientation>::scaling_factor(params.view_box_width, params.view_box_height)
                }
    
                ...
        }
    }
    

    Note that the incantation is <Self::Orientation>::scaling_factor(...) to call that method on the associated type.

    Now the define_length_type! macro is shrunk a lot, with the interesting part being just this:

    macro_rules! define_length_type {
        {$name:ident, $orient:ty} => {
            pub struct $name(Length);
    
            impl LengthTrait for $name {
                type Orientation = $orient;
            }
        }
    }
    
    define_length_type! { LengthHorizontal, Horizontal }
    define_length_type! { LengthVertical, Vertical }
    define_length_type! { LengthBoth, Both }
    

    We moved from having three newtypes of length-with-LengthDir to three newtypes with dir-as-associated-type.

    Removing the newtypes and the macro

    After that temporary refactoring, we had the Orientation trait and the three zero-sized types Horizontal, Vertical, Both.

    I figured out that one can use PhantomData as a way to carry around the type that Length needs to normalize itself, instead of using an associated type in an extra LengthTrait. Behold!

    pub struct Length<O: Orientation> {
        pub length: f64,
        pub unit: LengthUnit,
        orientation: PhantomData<O>,
    }
    
    impl<O: Orientation> Length<O> {
        pub fn normalize(&self, values: &ComputedValues, params: &ViewParams) -> f64 {
            match self.unit {
                LengthUnit::Px => self.length,
    
                LengthUnit::Percent => {
                    self.length 
                        * <O as Orientation>::scaling_factor(params.view_box_width, params.view_box_height)
                }
    
                ...
            }
        }
    }
    

    Now the incantation is <O as Orientation>::scaling_factor() to call the method on the generic type; it is no longer an associated type in a trait.

    With that, users of lengths look like this; here, our <circle> element from before:

    pub struct Circle {
        cx: Length<Horizontal>,
        cy: Length<Vertical>,
        r: Length<Both>,
    }
    

    I'm very happy with the readability of all the code now. I used to think of PhantomData as a way to deal with wrapping pointers from C, but it turns out that it is also useful to keep a generic type around should one need it.

    The final Length struct is this:

    pub struct Length<O: Orientation> {
        pub length: f64,
        pub unit: LengthUnit,
        orientation: PhantomData<O>,
    }
    

    And it only takes up as much space as its length and unit fields; PhantomData is zero-sized after all.

    (Later, we renamed Orientation to Normalize, but the code's structure remained the same.)

    Summary

    Over a couple of years, librsvg's type that represents CSS lengths went from a C representation along the lines of "all data in the world is an int", to a Rust representation that uses some interesting type trickery:

    • C struct with char for units.

    • C struct with a LengthUnits enum.

    • C struct without an embodied direction; each place that needs to normalize needs to get the orientation right.

    • C struct with a built-in direction as an extra field, done at initialization time.

    • Same struct but in Rust.

    • An ugly but workable Parse trait so that the direction can be set at parse/initialization time.

    • Three newtypes LengthHorizontal, LengthVertical, LengthBoth with a common core. A cleaned-up Parse trait. A macro to generate those newtypes.

    • Replace the LengthDir enum with an Orientation trait, and three zero-sized types Horizontal/Vertical/Both that implement the trait.

    • Replace most of the macro with a helper trait LengthTrait that has an Orientation associated type.

    • Replace the helper trait with a single Length<T: Orientation> type, which puts the orientation as a generic parameter. The macro disappears and there is a single implementation for everything.

    Refactoring never ends!

  3. CSS in librsvg is now in Rust, courtesy of Mozilla Servo

    - gnome, librsvg, rust

    Summary: after an epic amount of refactoring, librsvg now does all CSS parsing and matching in Rust, without using libcroco. In addition, the CSS engine comes from Mozilla Servo, so it should be able to handle much more complex CSS than librsvg ever could before.

    This is the story of CSS support in librsvg.

    Introduction

    The first commit to introduce CSS parsing in librsvg dates from 2002. It was as minimal as possible, written to support a small subset of what was then CSS2.

    Librsvg handled CSS stylesheets more "piecing them apart" than "parsing them". You know, when g_strsplit() is your best friend. The basic parsing algorithm was to turn a stylesheet like this:

    rect { fill: blue; }
    
    .classname {
        fill: green;
        stroke-width: 4;
    }
    

    Into a hash table whose keys are strings like rect and .classname, and whose values are everything inside curly braces.

    The selector matching phase was equally simple. The code only handled a few possible match types as follows. If it wanted to match a certain kind of CSS selector, it would say, "what would this selector look like in CSS syntax", it would make up a string with that syntax, and compare it to the key strings it had stored in the hash table from above.

    So, to match an element name selector, it would sprintf("%s", element->name), obtain something like rect and see if the hash table had such a key.

    To match a class selector, it would sprintf(".%s", element->class), obtain something like .classname, and look it up in the hash table.

    This scheme supported only a few combinations. It handled tag, .class, tag.class, and a few combinations with #id in them. This was enough to support very simple stylesheets.

    The value corresponding to each key in the hash table was the stuff between curly braces in the stylesheet, so the second rule from the example above would contain fill: green; stroke-width: 4;. Once librsvg decided that an SVG element matched that CSS rule, it would re-parse the string with the CSS properties and apply them to the element's style.

    I'm amazed that so little code was enough to deal with a good number of SVG files with stylesheets. I suspect that this was due to a few things:

    • While people were using complex CSS in HTML all the time, it was less common for SVG...

    • ... because CSS2 was somewhat new, and the SVG spec was still being written...

    • ... and SVGs created with illustration programs don't really use stylesheets; they include the full style information inside each element instead of symbolically referencing it from a stylesheet.

    From the kinds of bugs that librsvg has gotten around "CSS support is too limited", it feels like SVGs which use CSS features are either hand-written, or machine-generated from custom programs like data plotting software. Illustration programs tend to list all style properties explicitly in each SVG element, and don't use CSS.

    Libcroco appears

    The first commit to introduce libcroco was to do CSS parsing, from March 2003.

    At the same time, libcroco was introducing code to do CSS matching. However, this code never got used in librsvg; it still kept its simple string-based matcher. Maybe libcroco's API was not ready?

    Libcroco fell out of maintainership around the first half of 2005, and volunteers have kept fixing it since then.

    Problems with librsvg's string matcher for CSS

    The C implementation of CSS matching in librsvg remained basically untouched until 2018, when Paolo Borelli and I started porting the surrounding code to Rust.

    I had a lot of trouble figuring out the concepts from the code. I didn't know all the terminology of CSS implementations, and librsvg didn't use it, either.

    I think that librsvg's code suffered from what the refactoring literature calls primitive obsession. Instead of having a parsed representation of CSS selectors, librsvg just stored a stringified version of them. So, a selector like rect#classname really was stored with a string like that, instead of an actual decomposition into structs.

    Moreover, things were misnamed. This is the field that stored stylesheet data inside an RsvgHandle:

        GHashTable *css_props;
    

    From just looking at the field declaration, this doesn't tell me anything about what kind of data is stored there. One has to grep the source code for where that field is used:

    static void
    rsvg_css_define_style (RsvgHandle * ctx,
                           const gchar * selector,
                           const gchar * style_name,
                           const gchar * style_value,
                           gboolean important)
    {
        GHashTable *styles;
    
        styles = g_hash_table_lookup (ctx->priv->css_props, selector);
    

    Okay, it looks up a selector by name in the css_props, and it gives back... another hash table styles? What's in there?

            g_hash_table_insert (styles,
                                 g_strdup (style_name),
                                 style_value_data_new (style_value, important));
    

    Another string key called style_name, whose key is a StyleValueData; what's in it?

    typedef struct _StyleValueData {
        gchar *value;
        gboolean important;
    } StyleValueData;
    

    The value is another string. Strings all the way!

    At the time, I didn't really figure out what each level of nested hash tables was supposed to mean. I didn't understand why we handled style properties in a completely different part of the code, and yet this part had a css_props field that didn't seem to store properties at all.

    It took a while to realize that css_props was misnamed. It wasn't storing a mapping of selector names to properties; it was storing a mapping of selector names to declaration lists, which are lists of property/value pairs.

    So, when I started porting the CSS parsing code to Rust, I started to create real types with for each concept.

    // Maps property_name -> Declaration
    type DeclarationList = HashMap<String, Declaration>;
    
    pub struct CssStyles {
        selectors_to_declarations: HashMap<String, DeclarationList>,
    }
    

    Even though the keys of those HashMaps are still strings, because librsvg didn't have a better way to represent their corresponding concepts, at least those declarations let one see what the hell is being stored without grepping the rest of the code. This is a part of the code that I didn't really touch very much, so it was nice to have that reminder.

    The first port of the CSS matching code to Rust kept the same algorithm as the C code, the one that created strings with element.class and compared them to the stored selector names. Ugly, but it still worked in the same limited fashion.

    Rustifying the CSS parsers

    It turns out that CSS parsing is divided in two parts. One can have a style attribute inside an element, for example

    <rect x="0" y="0" width="100" height="100"
          style="fill: green; stroke: magenta; stroke-width: 4;"/>
    

    This is a plain declaration list which is not associated to any selectors, and which is applied directly to just the element in which it appears.

    Then, there is the <style> element itself, with a normal-looking CSS stylesheet

    <style type="text/css">
      rect {
        fill: green;
        stroke: magenta;
        stroke-width: 4;
      }
    </style>
    

    This means that all <rect> elements will get that style applied.

    I started to look for existing Rust crates to parse and handle CSS data. The cssparser and selectors crates come from Mozilla, so I thought they should do a pretty good job of things.

    And they do! Except that they are not a drop-in replacement for anything. They are what gets used in Mozilla's Servo browser engine, so they are optimized to hell, and the code can be pretty intimidating.

    Out of the box, cssparser provides a CSS tokenizer, but it does not know how to handle any properties/values in particular. One must use the tokenizer to implement a parser for each kind of CSS property one wants to support — Servo has mountains of code for all of HTML's style properties, and librsvg had to provide a smaller mountain of code for SVG style properties.

    Thus started the big task of porting librsvg's string-based parsers for CSS properties into ones based on cssparser tokens. Cssparser provides a Parser struct, which extracts tokens out of a CSS stream. Out of this, librsvg defines a Parse trait for parsable things:

    use cssparser::Parser;
    
    pub trait Parse: Sized {
        type Err;
    
        fn parse(parser: &mut Parser<'_, '_>) -> Result<Self, Self::Err>;
    }
    

    What's with those two default lifetimes in Parser<'_, '_>? Cssparser tries very hard to be a zero-copy tokenizer. One of the lifetimes refers to the input string which is wrapped in a Tokenizer, which is wrapped in a ParserInput. The other lifetime is for the ParserInput itself.

    In the actual implementation of that trait, the Err type also uses the lifetime that refers to the input string. For example, there is a BasicParseErrorKind::UnexpectedToken(Token<'i>), which one returns when there is an unexpected token. And to avoid copying the substring into the error, one returns a slice reference into the original string, thus the lifetime.

    I was more of a Rust newbie back then, and it was very hard to make sense of how cssparser was meant to be used.

    The process was more or less this:

    • Port the C parsers to Rust; implement types for each CSS property.

    • Port the &str-based parsers into ones that use cssparser.

    • Fix the error handling scheme to match what cssparser's high-level traits expect.

    This last point was... hard. Again, I wasn't comfortable enough with Rust lifetimes and nested generics; in the end it was all right.

    Moving declaration lists to Rust

    With the individual parsers for CSS properties done, and with them already using a different type for each property, the next thing was to implement cssparser's traits to parse declaration lists.

    Again, a declaration list looks like this:

    fill: blue;
    stroke-width: 4;
    

    It's essentially a key/value list.

    The trait that cssparser wants us to implement is this:

    pub trait DeclarationParser<'i> {
        type Declaration;
        type Error: 'i;
    
        fn parse_value<'t>(
            &mut self,
            name: CowRcStr<'i>,
            input: &mut Parser<'i, 't>,
        ) -> Result<Self::Declaration, ParseError<'i, Self::Error>>;
    }
    

    That is, define a type for a Declaration, and implement a parse_value() method that takes a name and a Parser, and outputs a Declaration or an error.

    What this really means is that the type you implement for Declaration needs to be able to represent all the CSS property types that you care about. Thus, a struct plus a big enum like this:

    pub struct Declaration {
        pub prop_name: String,
        pub property: ParsedProperty,
        pub important: bool,
    }
    
    pub enum ParsedProperty {
        BaselineShift(SpecifiedValue<BaselineShift>),
        ClipPath(SpecifiedValue<ClipPath>),
        ClipRule(SpecifiedValue<ClipRule>),
        Color(SpecifiedValue<Color>),
        ColorInterpolationFilters(SpecifiedValue<ColorInterpolationFilters>),
        Direction(SpecifiedValue<Direction>),
        ...
    }
    

    This gives us declaration lists (the stuff inside curly braces in a CSS stylesheet), but it doesn't give us qualified rules, which are composed of selector names plus a declaration list.

    Refactoring towards real CSS concepts

    Paolo Borelli has been steadily refactoring librsvg and fixing things like the primitive obsession I mentioned above. We now have real concepts like a Document, Stylesheet, QualifiedRule, Rule, AtRule.

    This refactoring took a long time, because it involved redoing the XML loading code and its interaction with the CSS parser a few times.

    Implementing traits from the selectors crate

    The selectors crate contains Servo's code for parsing CSS selectors and doing matching. However, it is extremely generic. Using it involves implementing a good number of concepts.

    For example, this SelectorImpl trait has no methods, and is just a collection of types that refer to your implementation of an element tree. How do you represent an attribute/value? How do you represent an identifier? How do you represent a namespace and a local name?

    pub trait SelectorImpl {
        type ExtraMatchingData: ...;
        type AttrValue: ...;
        type Identifier: ...;
        type ClassName: ...;
        type PartName: ...;
        type LocalName: ...;
        type NamespaceUrl: ...;
        type NamespacePrefix: ...;
        type BorrowedNamespaceUrl: ...;
        type BorrowedLocalName: ...;
        type NonTSPseudoClass: ...;
        type PseudoElement: ...;
    }
    

    A lot of those can be String, but Servo has smarter things in store. I ended up using the markup5ever crate, which provides a string interning framework for markup and XML concepts like a LocalName, a Namespace, etc. This reduces memory consumption, because instead of storing string copies of element names everywhere, one just stores tokens for interned strings.

    (In the meantime I had to implement support for XML namespaces, which the selectors code really wants, but which librsvg never supported.)

    Then, the selectors crate wants you to say how your code implements an element tree. It has a monster trait Element:

    pub trait Element {
        type Impl: SelectorImpl;
    
        fn opaque(&self) -> OpaqueElement;
    
        fn parent_element(&self) -> Option<Self>;
    
        fn parent_node_is_shadow_root(&self) -> bool;
    
        ...
    
        fn prev_sibling_element(&self) -> Option<Self>;
        fn next_sibling_element(&self) -> Option<Self>;
    
        fn has_local_name(
            &self, 
            local_name: &<Self::Impl as SelectorImpl>::BorrowedLocalName
        ) -> bool;
    
        fn has_id(
            &self,
            id: &<Self::Impl as SelectorImpl>::Identifier,
            case_sensitivity: CaseSensitivity,
        ) -> bool;
    
        ...
    }
    

    That is, when you provide an implementation of Element and SelectorImpl, the selectors crate will know how to navigate your element tree and ask it questions like, "does this element have the id #foo?"; "does this element have the name rect?". It makes perfect sense in the end, but it is quite intimidating when you are not 100% comfortable with webs of traits and associated types and generics with a bunch of trait bounds!

    I tried implementing that trait twice in the last year, and failed. It turns out that its API needed a key fix that landed last June, but I didn't notice until a couple of weeks ago.

    So?

    Two days ago, Paolo and I committed the last code to be able to completely replace libcroco.

    And, after implementing CSS specificity (which was easy now that we have real CSS concepts and a good pipeline for the CSS cascade), a bunch of very old bugs started falling down (1 2 3 4 5 6).

    Now it is going to be easy to implement things like letting the application specify a user stylesheet. In particular, this should let GTK remove the rather egregious hack it has to recolor SVG icons while using librsvg indirectly.

    Conclusion

    This will appear in librsvg 2.47.1 — that version will no longer require libcroco.

    As far as I know, the only module that still depends on libcroco (in GNOME or otherwise) is gnome-shell. It uses libcroco to parse CSS and get the basic structure of selectors so it can implement matching by hand.

    Gnome-shell has some code which looks awfully similar to what librsvg had when it was written in C:

    • StTheme has the high-level CSS stylesheet parser and the selector matching code.

    • StThemeNode has the low-level CSS property parsers.

    ... and it turns out that those files come all the way from HippoCanvas, the CSS-aware canvas that Mugshot used! Mugshot was a circa-2006 pre-Facebook aggregator for social media data like blogs, Flickr pictures, etc. HippoCanvas also got used in Sugar, the GUI for One Laptop Per Child. Yes, our code is that old.

    Libcroco is unmaintained, and has outstanding CVEs. I would be very happy to assist someone in porting gnome-shell's CSS code to Rust :)

  4. Gdk-pixbuf modules - call for help

    - gdk-pixbuf

    I've been doing a little refactoring of gdk-pixbuf's crufty code, to see if the gripes from my braindump can be solved. For things where it is not obvious how to proceed, I've started taking more detailed notes in a gdk-pixbuf survey.

    Today I was looking at which gdk-pixbuf modules are implemented by third parties, that is, which external projects provide their own image codecs pluggable into gdk-pixbuf.

    And there are not that many!

    The only four that I found are libheif, libopenraw, libwmf, librsvg (this last one, of course).

    Update 2019/Sep/12 - Added apng, exif-raw, psd, pvr, vtf, webp, xcf.

    All of those use the gdk-pixbuf module API in a remarkably similar fashion. Did they cut&paste each other's code? Did they do the simplest thing that didn't crash in gdk-pixbuf's checks for buggy loaders, which happens to be exactly what they do? Who knows! Either way, this makes future API changes in the modules a lot easier, since they all do the same right now.

    I'm trying to decide between these:

    • Keep modules as they are; find a way to sandbox them from gdk-pixbuf itself. This is hard because the API is "chatty"; modules and calling code go back and forth peeking at each other's structures.

    • Decide that third-party modules are only useful for thumbnailers; modify them to be thumbnailers instead of generic gdk-pixbuf modules. This would mean that those formats would stop working automatically in gdk-pixbuf based viewers like EOG.

    • Have "blessed" codecs inside gdk-pixbuf which are not modules so their no longer have API/ABI stability constraints. Keep third-party modules separate. Sandbox the internal ones with a non-chatty API.

    • If all third-party modules work indeed as I found, the module API can be simplified quite a lot since no third-party modules implement animations or saving. If so, simplify the module API and the gdk-pixbuf internals rather drastically.

    Do you know any other image formats which provide gdk-pixbuf modules? Mail me, please!

  5. On responsible vulnerability disclosure

    - kde, security

    Recently KDE had an unfortunate event. Someone found a vulnerability in the code that processes .desktop and .directory files, through which an attacker could create a malicious file that causes shell command execution (analysis). They went for immediate, full disclosure, where KDE didn't even get a chance of fixing the bug before it was published.

    There are many protocols for disclosing vulnerabilities in a coordinated, responsible fashion, but the gist of them is this:

    1. Someone finds a vulnerability in some software through studying some code, or some other mechanism.

    2. They report the vulnerability to the software's author through some private channel. For free softare in particular, researchers can use Openwall's recommended process for researchers, which includes notifying the author/maintainer and distros and security groups. Free software projects can follow a well-established process.

    3. The author and reporter agree on a deadline for releasing a public report of the vulnerability, or in semi-automated systems like Google Zero, a deadline is automatically established.

    4. The author works on fixing the vulnerability.

    5. The deadline is reached; the patch has been publically released, the appropriate people have been notified, systems have been patched. If there is no patch, the author and reporter can agree on postponing the date, or the reporter can publish the vulnerability report, thus creating public pressure for a fix.

    The steps above gloss over many practicalities and issues from the real world, but the idea is basically this: the author or maintainer of the software is given a chance to fix a security bug before information on the vulnerability is released to the hostile world. The idea is to keep harm from being done by not publishing unpatched vulnerabilities until there is a fix for them (... or until the deadline expires).

    What happened instead

    Around the beginning of July, the reporter posts about looking for bugs in KDE.

    On July 30, he posts a video with the proof of concept.

    On August 3, he makes a Twitter poll about what to do with the vulnerability.

    On August 4, he publishes the vulnerability.

    KDE is left with having to patch this in emergency mode. On August 7, KDE releases a security advisory in perfect form:

    • Description of exactly what causes the vulnerability.

    • Description of how it was solved.

    • Instructions on what to do for users of various versions of KDE libraries.

    • Links to easy-to-cherry-pick patches for distro vendors.

    Now, distro vendors are, in turn, in emergency mode, as they must apply the patch, run it through QA, release their own advisories, etc.

    What if this had been done with coordinated disclosure?

    The bug would have been fixed, probably in the same way, but it would not be in emergency mode. KDE's advisory contains this:

    Thanks to Dominik Penner for finding and documenting this issue (we wish however that he would have contacted us before making the issue public) and to David Faure for the fix.

    This is an extremely gracious way of thanking the reporter.

    I am not an infosec person...

    ... but some behaviors in the infosec sphere are deeply uncomfortable to me. I don't like it when security "research" is hard to tell from vandalism. "Excuse me, you left your car door unlocked" vs. "Hey everyone, this car is unlocked, have at it".

    I don't know the details of the discourse in the infosec sphere around full disclosure against irresponsible vendors of proprietary software or services. However, KDE is free software! There is no need to be an asshole to them.

  6. Constructors

    - GObject, rust

    Have you ever had these annoyances with GObject-style constructors?

    • From a constructor, calling a method on a partially-constructed object is dangerous.

    • A constructor needs to set up "not quite initialized" values in the instance struct until a construct-time property is set.

    • You actually need to override GObjectClass::constructed (or was it ::constructor?) to take care of construct-only properties which need to be considered together, not individually.

    • Constructors can't report an error, unless you derive from GInitable, which is not in gobject, but in gio instead. (Also, why does that force the constructor to take a GCancellable...?)

    • You need more than one constructor, but that needs to be done with helper functions.

    This article, Perils of Constructors, explains all of these problems very well. It is not centered on GObject, but rather on constructors in object-oriented languages in general.

    (Spoiler: Rust does not have constructors or partially-initialized structs, so these problems don't really exist there.)

    (Addendum: that makes it somewhat awkward to convert GObject code in C to Rust, but librsvg was able to solve it nicely with <buzzword>the typestate pattern</buzzword>.)

  7. Removing rsvg-view

    - librsvg

    I am preparing the 2.46.0 librsvg release. This will no longer have the rsvg-view-3 program.

    History of rsvg-view

    Rsvg-view started out as a 71-line C program to aid development of librsvg. It would just render an SVG file to a pixbuf, stick that pixbuf in a GtkImage widget, and show a window with that.

    Over time, it slowly acquired most of the command-line options that rsvg-convert supports. And I suppose, as a way of testing the Cairo-ification of librsvg, it also got the ability to print SVG files to a GtkPrintContext. At last count, it was a 784-line C program that is not really the best code in the world.

    What makes rsvg-view awkward?

    Rsvg-view requires GTK. But GTK requires librsvg, indirectly, through gdk-pixbuf! There is not a hard circular dependency because GTK goes, "gdk-pixbuf, load me this SVG file" without knowing how it will be loaded. In turn, gdk-pixbuf initializes the SVG loader provided by librsvg, and that loader reads/renders the SVG file.

    Ideally librsvg would only depend on gdk-pixbuf, so it would be able to provide the SVG loader.

    The rsvg-view source code still has a few calls to GTK functions which are now deprecated. The program emits GTK warnings during normal use.

    Rsvg-view is... not a very good SVG viewer. It doesn't even start up with the window scaled properly to the SVG's dimensions! If used for quick testing during development, it cannot even aid in viewing the transparent background regions which the SVG does not cover. It just sticks a lousy custom widget inside a GtkScrolledWindow, and does not have the conventional niceties to view images like zooming with the scroll wheel.

    EOG is a much better SVG viewer than rsvg-view, and people actually invest effort in making it pleasant to use.

    Removal of rsvg-view

    So, the next version of librsvg will not provide the rsvg-view-3 binary. Please update your packages accordingly. Distros may be able to move the compilation of librsvg to a more sensible place in the platform stack, now that it doesn't depend on GTK being available.

    What can you use instead? Any other image viewer. EOG works fine; there are dozens of other good viewers, too.

  8. Bzip2 1.0.7 is released

    - bzip2

    Bzip2 1.0.7 has been released by Mark Wielaard. We have a slight change of plans since my last post:

    • The 1.0.x series is in strict maintenance mode and will not change build systems. This is targeted towards embedded use, as in projects which already embed the bzip2-1.0.6 sources and undoubtedly patch the build system. Right now this series, and the tagged 1.0.7 release, live in the sourceware repository for bzip2.

    • The 1.1.x series has Meson and CMake build systems, and a couple of extra changes to modernize the C code but which were not fit for the 1.0.7 release. This is targeted towards operating system distributions. This lives in the master branch of the gitlab repository for bzip2.

    Distros and embedded users should start using bzip2-1.0.7 immediately. The patches they already have for the bzip2's traditional build system should still apply. The release includes bug fixes and security fixes that have accumulated over the years, including the new CVE-2019-12900.

    Once 1.1.0 is released, distributions should be able to remove their patches to the build system and just start using Meson or CMake. You may want to monitor the 1.1.0 milestone — help is appreciated fixing the issues there so we can make the first release with the new build systems!

  9. Preparing the bzip2-1.0.7 release

    - bzip2

    ATTENTION ALL DISTRIBUTIONS: this is for you. THE SONAME MAY CHANGE!

    I am preparing a bzip2-1.0.7 release. You can see the release notes, which should be of interest:

    • Many historical patches from various distributions are integrated now.

    • We have a new fix for the just-published CVE-2019-12900, courtesy of Albert Astals Cid.

    • Bzip2 has moved to Meson for its preferred build system, courtesy of Dylan Baker. For special situations, a CMake build system is also provided, courtesy of Micah Snyder.

    What's with the soname?

    From bzip2-1.0.1 (from the year 2000), until bzip2-1.0.6 (from 2010), release tarballs came with a special Makefile-libbz2_so to generate a shared library instead of a static one.

    This never used libtool or anything; it specified linker flags by hand. Various distributions either patched this special makefile, or replaced it by another one, or outright replaced the complete build system for a different one.

    Some things to note:

    • This hand-written Makefile-libbz2_so used a link line like $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.6. This means, make the DT_SONAME field inside the ELF file be libbz2.so.1.0 (note the two digits in 1.0), and make the filename of the shared library be libbz2.so.1.0.6.

    • Fedora patched the soname in a patch called "saneso" to just be libbz2.so.1.

    • Stanislav Brabec, from openSUSE, replaced the hand-written makefiles with autotools, which meant using libtool. It has this interesting note:

    Incompatible changes:

    soname change. Libtool has no support for two parts soname suffix (e. g. libbz2.so.1.0). It must be a single number (e. g. libbz2.so.1). That is why soname must change. But I see not a big problem with it. Several distributions already use the new number instead of the non-standard number from Makefile-libbz2_so.

    (In fact, if I do objdump -x /usr/lib64/*.so | grep SONAME, I see that most libraries have single-digit sonames.)

    In my experience, both Fedora and openSUSE are very strict, and correct, about obscure things like library sonames.

    With the switch to Meson, bzip2 no longer uses libtool. It will have a single-digit soname — this is not in the meson.build yet, but expect it to be there within the next couple of days.

    I don't know what distros which decided to preserve the 1.0 soname will need to do; maybe they will need to patch meson.build on their own.

    Fortunately, the API/ABI are still exactly the same. You can preserve the old soname which your distro was using and linking libbz2 will probably keep working as usual.

    (This is a C-only release as usual. The Rust branch is still experimental.)

Page 1 / 7 »