Federico's Blog

  1. How glib-rs works, part 3: Boxed types

    - gnome, rust

    (First part of the series, with index to all the articles)

    Now let's get on and see how glib-rs handles boxed types.

    Boxed types?

    Let's say you are given a sealed cardboard box with something, but you can't know what's inside. You can just pass it on to someone else, or burn it. And since computers are magic duplication machines, you may want to copy the box and its contents... and maybe some day you will get around to opening it.

    That's a boxed type. You get a pointer to something, who knows what's inside. You can just pass it on to someone else, burn it — I mean, free it — or since computers are magic, copy the pointer and whatever it points to.

    That's exactly the API for boxed types.

    typedef gpointer (*GBoxedCopyFunc) (gpointer boxed);
    typedef void (*GBoxedFreeFunc) (gpointer boxed);
    
    GType g_boxed_type_register_static (const gchar   *name,
                                        GBoxedCopyFunc boxed_copy,
                                        GBoxedFreeFunc boxed_free);
    

    Simple copying, simple freeing

    Imagine you have a color...

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
    } Color;
    

    If you had a pointer to a Color, how would you copy it? Easy:

    Color *copy_color (Color *a)
    {
        Color *b = g_new (Color, 1);
        *b = *a;
        return b;
    }
    

    That is, allocate a new Color, and essentially memcpy() the contents.

    And to free it? A simple g_free() works — there are no internal things that need to be freed individually.
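
    With those two operations in hand, registering Color as a boxed type is just a call to the g_boxed_type_register_static() we saw above. Roughly like this (a sketch following the usual naming conventions, not code from an actual library):

    GType
    color_get_type (void)
    {
        static GType type = 0;

        if (type == 0) {
            type = g_boxed_type_register_static ("Color",
                                                 (GBoxedCopyFunc) copy_color,
                                                 (GBoxedFreeFunc) g_free);
        }

        return type;
    }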

    Complex copying, complex freeing

    And if we had a color with a name?

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
        char *name;
    } ColorWithName;
    

    We can't just do *b = *a here, as we actually need to copy the string name. Okay:

    ColorWithName *copy_color_with_name (ColorWithName *a)
    {
        ColorWithName *b = g_new (ColorWithName, 1);
        b->r = a->r;
        b->g = a->g;
        b->b = a->b;
        b->name = g_strdup (a->name);
        return b;
    }
    

    The corresponding free_color_with_name() would g_free(b->name) and then g_free(b), of course.
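
    Spelled out, that free function is just this (a sketch):

    void
    free_color_with_name (ColorWithName *b)
    {
        g_free (b->name);
        g_free (b);
    }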

    Glib-rs and boxed types

    Let's look at this by parts. First, a BoxedMemoryManager trait to define the basic API to manage the memory of boxed types. This is what defines the copy and free functions, like above.

    pub trait BoxedMemoryManager<T>: 'static {
        unsafe fn copy(ptr: *const T) -> *mut T;
        unsafe fn free(ptr: *mut T);
    }
    

    Second, the actual representation of a Boxed type:

    pub struct Boxed<T: 'static, MM: BoxedMemoryManager<T>> {
        inner: AnyBox<T>,
        _dummy: PhantomData<MM>,
    }
    

    This struct is generic over T, the actual type that we will be wrapping, and MM, something which must implement the BoxedMemoryManager trait.

    Inside, it stores inner, an AnyBox, which we will see shortly. The _dummy: PhantomData<MM> is a Rust-ism to indicate that although this struct doesn't actually store a memory manager, it acts as if it does — it does not concern us here.

    The actual representation of boxed data

    Let's look at that AnyBox that is stored inside a Boxed:

    enum AnyBox<T> {
        Native(Box<T>),
        ForeignOwned(*mut T),
        ForeignBorrowed(*mut T),
    }
    

    We have three cases:

    • Native(Box<T>) - this boxed value T comes from Rust itself, so we know everything about it!

    • ForeignOwned(*mut T) - this boxed value T came from the outside, but we own it now. We will have to free it when we are done with it.

    • ForeignBorrowed(*mut T) - this boxed value T came from the outside, but we are just borrowing it temporarily: we don't want to free it when we are done with it.

    For example, if we look at the implementation of the Drop trait for the Boxed struct, we will indeed see that it calls the BoxedMemoryManager::free() only if we have a ForeignOwned value:

    impl<T: 'static, MM: BoxedMemoryManager<T>> Drop for Boxed<T, MM> {
        fn drop(&mut self) {
            unsafe {
                if let AnyBox::ForeignOwned(ptr) = self.inner {
                    MM::free(ptr);
                }
            }
        }
    }
    

    If we had a Native(Box<T>) value, it means it came from Rust itself, and Rust knows how to Drop its own Box<T> (i.e. a chunk of memory allocated in the heap).

    But for external resources, we must tell Rust how to manage them. Again: in the case where the Rust side owns the reference to the external boxed data, we have a ForeignOwned and Drop it by free()ing it; in the case where the Rust side is just borrowing the data temporarily, we have a ForeignBorrowed and don't touch it when we are done.

    Copying

    When do we have to copy a boxed value? For example, when we transfer from Rust to Glib with full transfer of ownership, i.e. the to_glib_full() pattern that we saw before. This is how that trait method is implemented for Boxed:

    impl<'a, T: 'static, MM: BoxedMemoryManager<T>> ToGlibPtr<'a, *const T> for Boxed<T, MM> {
        fn to_glib_full(&self) -> *const T {
            use self::AnyBox::*;
            let ptr = match self.inner {
                Native(ref b) => &**b as *const T,
                ForeignOwned(p) | ForeignBorrowed(p) => p as *const T,
            };
            unsafe { MM::copy(ptr) }
        }
    }
    

    See the MM::copy(ptr) in the last line? That's where the copy happens. The lines above just get the appropriate pointer to the data from the AnyBox and cast it.

    There is extra boilerplate in boxed.rs which you can look at; it's mostly a bunch of trait implementations to copy the boxed data at the appropriate times (e.g. the FromGlibPtrNone trait), also an implementation of the Deref trait to get to the contents of a Boxed / AnyBox easily, etc. The trait implementations are there just to make it as convenient as possible to handle Boxed types.

    Who implements BoxedMemoryManager?

    Up to now, we have seen things like the implementation of Drop for Boxed, which uses BoxedMemoryManager::free(), and the implementation of ToGlibPtr which uses ::copy().

    But those are just the trait's "abstract" methods, so to speak. What actually implements them?

    Glib-rs has a general-purpose macro to wrap Glib types. It can wrap boxed types, shared pointer types, and GObjects. For now we will just look at boxed types.

    That macro is called glib_wrapper!(), and it can be used in different ways. You can use it to automatically write the boilerplate for a boxed type like this:

    glib_wrapper! {
        pub struct Color(Boxed<ffi::Color>);
    
        match fn {
            copy => |ptr| ffi::color_copy(mut_override(ptr)),
            free => |ptr| ffi::color_free(ptr),
            get_type => || ffi::color_get_type(),
        }
    }
    

    This expands to an internal glib_boxed_wrapper!() macro that does a few things. We will only look at particularly interesting bits.

    First, the macro creates a newtype around a tuple with 1) the actual data type you want to box, and 2) a memory manager. In the example above, the newtype would be called Color, and it would wrap an ffi::Color (say, a C struct).

            pub struct $name(Boxed<$ffi_name, MemoryManager>);
    

    Aha! And that MemoryManager? The macro defines it as a zero-sized type:

            pub struct MemoryManager;
    

    Then it implements the BoxedMemoryManager trait for that MemoryManager struct:

            impl BoxedMemoryManager<$ffi_name> for MemoryManager {
                #[inline]
                unsafe fn copy($copy_arg: *const $ffi_name) -> *mut $ffi_name {
                    $copy_expr
                }
    
                #[inline]
                unsafe fn free($free_arg: *mut $ffi_name) {
                    $free_expr
                }
            }
    

    There! This is where the copy/free methods are implemented, based on the bits of code with which you invoked the macro. In the call to glib_wrapper!() we had this:

            copy => |ptr| ffi::color_copy(mut_override(ptr)),
            free => |ptr| ffi::color_free(ptr),
    

    In the impl above, the $copy_expr will expand to ffi::color_copy(mut_override(ptr)) and $free_expr will expand to ffi::color_free(ptr), which defines our implementation of a memory manager for our Color boxed type.

    Zero-sized what?

    Within the macro's definition, let's look again at the definitions of our boxed type and the memory manager object that actually implements the BoxedMemoryManager trait. Here is what the macro would expand to with our Color example:

            pub struct Color(Boxed<ffi::Color, MemoryManager>);
    
            pub struct MemoryManager;
    
            impl BoxedMemoryManager<ffi::Color> for MemoryManager {
                unsafe fn copy(...) -> *mut ffi::Color { ... }
                unsafe fn free(...) { ... }
            }
    

    Here, MemoryManager is a zero-sized type. This means it doesn't take up any space in the Color tuple! When a Color is allocated in the heap, it is really as if it contained an ffi::Color (the C struct we are wrapping) and nothing else.

    All the knowledge about how to copy/free ffi::Color lives only in the compiler thanks to the trait implementation. When the compiler expands all the macros and monomorphizes all the generic functions, the calls to ffi::color_copy() and ffi::color_free() will be inlined at the appropriate spots. There is no need to have auxiliary structures taking up space in the heap, just to store function pointers to the copy/free functions, or anything like that.

    Next up

    You may have seen that our example call to glib_wrapper!() also passed in an ffi::color_get_type() function. We haven't talked about how glib-rs wraps Glib's GType, GValue, and all of that. We are getting closer and closer to being able to wrap GObject.

    Stay tuned!

  2. Initial posts about librsvg's C to Rust conversion

    - librsvg, rust

    The initial articles about librsvg's conversion to Rust are in my old blog, so they may be a bit hard to find from this new blog. Here is a list of those posts, just so they are easier to find:

    Within this new blog, you can look for articles with the librsvg tag.

  3. The Magic of GObject Introspection

    - gnome, gobject-introspection, rust

    Before continuing with the glib-rs architecture, let's take a detour and look at GObject Introspection. Although it can seem like an obscure part of the GNOME platform, it is an absolutely vital part of it: it is what lets people write GNOME applications in any language.

    Let's start with a bit of history.

    Brief history of language bindings in GNOME

    When we started GNOME in 1997, we didn't want to write all of it in C. We had some inspiration from elsewhere.

    Prehistory: GIMP and the Procedural Database

    There was already good precedent for software written in a combination of programming languages. Emacs, the flagship text editor of the GNU project, was written with a relatively small core in C, and the majority of the program in Emacs Lisp.

    In similar fashion, we were very influenced by the design of the GIMP, which was very innovative at that time. The GIMP has a large core written in C. However, it supports plug-ins or scripts written in a variety of languages. Initially the only scripting language available for the GIMP was Scheme.

    The GIMP's plug-ins and scripts run as separate processes, so they don't have immediate access to the data of the image being edited, or to the core functions of the program like "paint with a brush at this location". To let plug-ins and scripts access these data and these functions, the GIMP has what it calls a Procedural Database (PDB). This is a list of functions that the core program or plug-ins wish to export. For example, there are functions like gimp-scale-image and gimp-move-layer. Once these functions are registered in the PDB, any part of the program or plug-ins can call them. Scripts are often written to automate common tasks — for example, when one wants to adjust the contrast of photos and scale them in bulk. Scripts can call functions in the PDB easily, irrespective of the programming language they are written in.

    We wanted to write GNOME's core libraries in C, and write a similar Procedural Database to allow those libraries to be called from any programming language. Eventually it turned out that a PDB was not necessary, and there were better ways to go about enabling different programming languages.

    Enabling sane memory management

    GTK+ started out with a very simple scheme for memory management: a container owned its child widgets, and so on recursively. When you freed a container, it would be responsible for freeing its children.

    However, consider what happens when a widget needs to hold a reference to another widget that is not one of its children. For example, a GtkLabel with an underlined mnemonic ("_N_ame:") needs to have a reference to the GtkEntry that should be focused when you press Alt-N. In the very earliest versions of GTK+, how to do this was undefined: C programmers were already used to having shared pointers everywhere, and they were used to being responsible for managing their memory.

    Of course, this was prone to bugs. If you have something like

    typedef struct {
        GtkWidget parent;
    
        char *label_string;
        GtkWidget *widget_to_focus;
    } GtkLabel;
    

    then if you are writing the destructor, you may simply want to

    static void
    gtk_label_free (GtkLabel *label)
    {
        g_free (label->label_string);
        gtk_widget_free (label->widget_to_focus);   /* oops, we don't own this */
    
        free_parent_instance (&label->parent);
    }
    

    Say you have a GtkBox with the label and its associated GtkEntry. Then, freeing the GtkBox would recursively free the label with that gtk_label_free(), and then the entry with its own function. But by the time the entry gets freed, the line gtk_widget_free (label->widget_to_focus) has already freed the entry, and we get a double-free bug!

    Madness!

    That is, we had no idea what we were doing. Or rather, our understanding of widgets had not evolved to the point of acknowledging that a widget tree is not simply a tree, but rather a directed graph of container-child relationships, plus random-widget-to-random-widget relationships. And of course, other parts of the program which are not even widget implementations may need to keep references to widgets and free them or not as appropriate.

    I think Marius Vollmer was the first person to start formalizing this. He came from the world of GNU Guile, a Scheme interpreter, and so he already knew how garbage collection and seas of shared references ought to work.

    Marius implemented reference-counting for GTK+ — that's where gtk_object_ref() and gtk_object_unref() come from; they eventually got moved to the base GObject class, so we now have g_object_ref() and g_object_unref() and a host of functions to have weak references, notification of destruction, and all the things required to keep garbage collectors happy.
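
    With reference counting in place, the label from the earlier example can take a proper reference on the widget it points to, and release it when the label itself goes away. Something along these lines (a simplified sketch, not the actual GtkLabel code; the function names are made up to match the earlier struct):

    void
    gtk_label_set_widget_to_focus (GtkLabel *label, GtkWidget *widget)
    {
        label->widget_to_focus = g_object_ref (widget);
    }

    static void
    gtk_label_dispose (GtkLabel *label)
    {
        g_clear_object (&label->widget_to_focus);  /* drops our reference and NULLs the pointer; no double-free */
    }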

    The first language bindings

    The very first language bindings were written by hand. The GTK+ API was small, and it seemed feasible to take

    void gtk_widget_show (GtkWidget *widget);
    void gtk_widget_hide (GtkWidget *widget);
    
    void gtk_container_add (GtkContainer *container, GtkWidget *child);
    void gtk_container_remove (GtkContainer *container, GtkWidget *child);
    

    and just wrap those functions in various languages, by hand, on an as-needed basis.

    Of course, there is a lot of duplication when doing things that way. As the C API grows, one needs to do more and more manual work to keep up with it.

    Also, C structs with public fields are problematic. If we had

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
    } GdkColor;
    

    and we expect program code to fill in a GdkColor by hand and pass it to a drawing function like

    void gdk_set_foreground_color (GdkDrawingContext *gc, GdkColor *color);
    

    then it is no problem to do that in C:

    GdkColor magenta = { 255, 0, 255 };
    
    gdk_set_foreground_color (gc, &magenta);
    

    But to do that in a high level language? You don't have access to C struct fields! And back then, libffi wasn't generally available.

    Authors of language bindings had to write some glue code, in C, by hand, to let people access a C struct and then pass it on to GTK+. For example, for Python, they would need to write something like

    PyObject *
    make_wrapped_gdk_color (PyObject *args, PyObject *kwargs)
    {
        GdkColor *g_color;
        PyObject *py_color;
    
        g_color = g_new (GdkColor, 1);
        /* ... fill in g_color->r, g, b from the Python args */
    
        py_color = wrap_g_color (g_color);
        return py_color;
    }
    

    Writing that by hand is an incredible amount of drudgery.

    What language bindings needed was a description of the API in a machine-readable format, so that the glue code could be written by a code generator.

    The first API descriptions

    I don't remember if it was the GNU Guile people, or the PyGTK people, who started to write descriptions of the GNOME API by hand. For ease of parsing, it was done in a Scheme-like dialect. A description may look like

    (class GtkWidget
           ;;; void gtk_widget_show (GtkWidget *widget);
           (method show
                   (args nil)
                   (retval nil))
    
           ;;; void gtk_widget_hide (GtkWidget *widget);
           (method hide
                   (args nil)
                   (retval nil)))
    
    (class GtkContainer
           ;;; void gtk_container_add (GtkContainer *container, GtkWidget *child);
           (method add
                   (args GtkWidget)
                   (retval nil)))
    
    (struct GdkColor
            (field r (type 'guchar))
            (field g (type 'guchar))
            (field b (type 'guchar))) 
    

    Again, writing those descriptions by hand (and keeping up with the C API) was a lot of work, but the glue code to implement the binding could be done mostly automatically. The generated code may need subsequent tweaks by hand to deal with details that the Scheme-like descriptions didn't contemplate, but it was better than writing everything by hand.

    Glib gets a real type system

    Tim Janik took over the parts of Glib that implement objects/signals/types, and added a lot of things to create a good type system for C. This is where things like GType, GValue, GParamSpec, and fundamental types come from.

    For example, a GType is an identifier for a type, and a GValue is a type plus, well, a value of that type. You can ask a GValue, "are you an int? are you a GObject?".
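
    For example, working with a GValue that holds an int looks like this (a small sketch using the stock GValue API):

    GValue value = G_VALUE_INIT;

    g_value_init (&value, G_TYPE_INT);
    g_value_set_int (&value, 42);

    if (G_VALUE_HOLDS_INT (&value)) {          /* "are you an int?" */
        g_print ("%d\n", g_value_get_int (&value));
    }

    g_value_unset (&value);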

    You can register new types: for example, there would be code in Gdk that registers a new GType for GdkColor, so you can ask a value, "are you a color?".

    Registering a type involves telling the GObject system things like how to copy values of that type, and how to free them. For GdkColor this may be just g_new() / g_free(); for reference-counted objects it may be g_object_ref() / g_object_unref().

    Objects can be queried about some of their properties

    A widget can tell you when you press a mouse button on it: it will emit the button-press-event signal. When GtkWidget's implementation registers this signal, it calls something like

        g_signal_new ("button-press-event",
            gtk_widget_get_type(), /* type of object for which this signal is being created */
            ...
            G_TYPE_BOOLEAN,  /* type of return value */
            1,               /* number of arguments */
            GDK_TYPE_EVENT); /* type of first and only argument */
    

    This tells GObject that GtkWidget will have a signal called button-press-event, with a return type of G_TYPE_BOOLEAN, and with a single argument of type GDK_TYPE_EVENT. This lets GObject do the appropriate marshalling of arguments when the signal is emitted.

    But also! You can query the signal for its argument types! You can run g_signal_query(), which will then tell you all the details of the signal: its name, return type, argument types, etc. A language binding could run g_signal_query(), automatically generate a description of the signal in the Scheme-like description language, and then generate the binding from that.
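
    For example, querying the signal we just registered could look like this (a sketch using g_signal_lookup() and g_signal_query(); error checking omitted):

    GSignalQuery query;
    guint signal_id;

    signal_id = g_signal_lookup ("button-press-event", GTK_TYPE_WIDGET);
    g_signal_query (signal_id, &query);

    /* query.signal_name    -> "button-press-event"
     * query.return_type    -> G_TYPE_BOOLEAN
     * query.n_params       -> 1
     * query.param_types[0] -> GDK_TYPE_EVENT
     */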

    Not all of an object's properties can be queried

    Unfortunately, although GObject signals and properties can be queried, methods can't be. C doesn't have classes with methods, and GObject does not really have any provisions to implement them.

    Conventionally, for a static method one would just do

    void
    gtk_widget_set_flags (GtkWidget *widget, GtkWidgetFlags flags)
    {
        /* modify a struct field within "widget" or whatever */
        /* repaint or something */
    }
    

    And for a virtual method one would put a function pointer in the class structure, and provide a convenient way to call it:

    typedef struct {
        GtkObjectClass parent_class;
    
        void (* draw) (GtkWidget *widget, cairo_t *cr);
    } GtkWidgetClass;
    
    void
    gtk_widget_draw (GtkWidget *widget, cairo_t *cr)
    {
        GtkWidgetClass *klass = find_widget_class (widget);
    
        (* klass->draw) (widget, cr);
    }
    

    And GObject has no idea about this method — there is no way to query it; it just exists in C-space.

    Now, historically, GTK+'s header files have been written in a very consistent style. It is quite possible to write a tool that will take a header file like

    /* gtkwidget.h */
    typedef struct {
        GtkObjectClass parent_class;
    
        void (* draw) (GtkWidget *widget, cairo_t *cr);
    } GtkWidgetClass;
    
    void gtk_widget_set_flags (GtkWidget *widget, GtkWidgetFlags flags);
    void gtk_widget_draw (GtkWidget *widget, cairo_t *cr);
    

    and parse it, even if it is with a simple parser that does not completely understand the C language, and have heuristics like

    • Is there a class_name_foo() function prototype with no corresponding foo field in the Class structure? It's probably a static method.

    • Is there a class_name_bar() function with a bar field in the Class structure? It's probably a virtual method.

    • Etc.

    And in fact, that's what we had. C header files would get parsed with those heuristics, and the Scheme-like description files would get generated.

    Scheme-like descriptions get reused, kind of

    Language binding authors started reusing the Scheme-like descriptions. Sometimes they would cannibalize the descriptions from PyGTK, or Guile (again, I don't remember where the canonical version was maintained) and use them as they were.

    Other times they would copy the files, modify them by hand some more, and then use them to generate their language binding.

    C being hostile

    From just reading/parsing a C function prototype, you cannot know certain things. If one function argument is of type Foo *, does it mean:

    • the function gets a pointer to something which it should not modify ("in" parameter)

    • the function gets a pointer to uninitialized data which it will set ("out" parameter)

    • the function gets a pointer to initialized data which it will use and modify ("inout" parameter)

    • the function will copy that pointer and hold a reference to the pointed data, and not free it when it's done

    • the function will take over the ownership of the pointed data, and free it when it's done

    • etc.

    Sometimes people would include these annotations in the Scheme-like description language. But wouldn't it be better if those annotations came from the C code itself?

    GObject Introspection appears

    For GNOME 3, we wanted a unified solution for language bindings:

    • Have a single way to extract the machine-readable descriptions of the C API.

    • Have every language binding be automatically generated from those descriptions.

    • In the descriptions, have all the information necessary to generate a correct language binding...

    • ... including documentation.

    We had to do a lot of work to accomplish this. For example:

    • Remove C-isms from the public API. Varargs functions, those that have foo (int x, ...), can't be easily described and called from other languages. Instead, have something like foov (int x, int num_args, GValue *args_array) that can be easily consumed by other languages.

    • Add annotations throughout the code so that the ad-hoc C parser can know about in/out/inout arguments, and whether pointer arguments are borrowed references or a full transfer of ownership.

    • Take the in-line documentation comments and store them as part of the machine-readable description of the API.

    • When compiling a library, automatically do all the things like g_signal_query() and spit out machine-readable descriptions of those parts of the API.

    So, GObject Introspection is all of those things.

    Annotations

    If you have looked at the C code for a GNOME library, you may have seen something like this:

    /**
     * gtk_widget_get_parent:
     * @widget: a #GtkWidget
     *
     * Returns the parent container of @widget.
     *
     * Returns: (transfer none) (nullable): the parent container of @widget, or %NULL
     **/
    GtkWidget *
    gtk_widget_get_parent (GtkWidget *widget)
    {
        ...
    }
    

    See that "(transfer none) (nullable)" in the documentation comments? The (transfer none) means that the return value is a pointer whose ownership does not get transferred to the caller, i.e. the widget retains ownership. Finally, the (nullable) indicates that the function can return NULL, when the widget has no parent.

    A language binding will then use this information as follows:

    • It will not unref() the parent widget when it is done with it.

    • It will deal with a NULL pointer in a special way, instead of assuming that references are not null.

    Every now and then someone discovers a public function which is lacking an annotation of that sort — for GNOME's purposes this is a bug; fortunately, it is easy to add that annotation to the C sources and regenerate the machine-readable descriptions.

    Machine-readable descriptions, or repository files

    So, what do those machine-readable descriptions actually look like? They moved away from a Scheme-like language and got turned into XML, because early XXIst century.

    The machine-readable descriptions are called GObject Introspection Repository files, or GIR for short.

    Let's look at some parts of Gtk-3.0.gir, which your distro may put in /usr/share/gir-1.0/Gtk-3.0.gir.

    <repository version="1.2" ...>
    
      <namespace name="Gtk"
                 version="3.0"
                 shared-library="libgtk-3.so.0,libgdk-3.so.0"
                 c:identifier-prefixes="Gtk"
                 c:symbol-prefixes="gtk">
    

    For the toplevel "Gtk" namespace, this says which .so libraries implement it. All identifiers have "Gtk" or "gtk" prefixes.

    A class with methods and a signal

    Let's look at the description for GtkEntry...

        <class name="Entry"
               c:symbol-prefix="entry"
               c:type="GtkEntry"
               parent="Widget"
               glib:type-name="GtkEntry"
               glib:get-type="gtk_entry_get_type"
               glib:type-struct="EntryClass">
    
          <doc xml:space="preserve">The #GtkEntry widget is a single line text entry
    widget. A fairly large set of key bindings are supported
    by default. If the entered text is longer than the allocation
    ...
           </doc>
    

    This is the start of the description for GtkEntry. We already know that everything is prefixed with "Gtk", so the name is just given as "Entry". Its parent class is Widget and the function which registers it against the GObject type system is gtk_entry_get_type.

    Also, there are the toplevel documentation comments for the Entry class.

    Onwards!

          <implements name="Atk.ImplementorIface"/>
          <implements name="Buildable"/>
          <implements name="CellEditable"/>
          <implements name="Editable"/>
    

    GObject classes can implement various interfaces; this is the list that GtkEntry supports.

    Next, let's look at a single method:

          <method name="get_text" c:identifier="gtk_entry_get_text">
            <doc xml:space="preserve">Retrieves the contents of the entry widget. ... </doc>
    
            <return-value transfer-ownership="none">
              <type name="utf8" c:type="const gchar*"/>
            </return-value>
    
            <parameters>
              <instance-parameter name="entry" transfer-ownership="none">
                <type name="Entry" c:type="GtkEntry*"/>
              </instance-parameter>
            </parameters>
          </method>
    

    The method get_text and its corresponding C symbol. Its return value is a UTF-8 encoded string, and ownership of the memory for that string is not transferred to the caller.

    The method takes a single parameter which is the entry instance itself.
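
    On the C side, this is the familiar pattern (this snippet is just for illustration):

    const gchar *text = gtk_entry_get_text (entry);
    /* transfer-ownership="none": the entry keeps owning the string, so we must not free it */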

    Now, let's look at a signal:

          <glib:signal name="activate" when="last" action="1">
            <doc xml:space="preserve">The ::activate signal is emitted when the user hits
    the Enter key. ...</doc>
    
            <return-value transfer-ownership="none">
              <type name="none" c:type="void"/>
            </return-value>
          </glib:signal>
    
        </class>
    

    The "activate" signal takes no arguments, and has a return value of type void, i.e. no return value.

    A struct with public fields

    The following comes from Gdk-3.0.gir; it's the description for GdkRectangle.

        <record name="Rectangle"
                c:type="GdkRectangle"
                glib:type-name="GdkRectangle"
                glib:get-type="gdk_rectangle_get_type"
                c:symbol-prefix="rectangle">
    
          <field name="x" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="y" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="width" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="height" writable="1">
            <type name="gint" c:type="int"/>
          </field>
    
        </record>
    

    So that's the x/y/width/height fields in the struct, in the same order as they are defined in the C code.

    And so on. The idea is for the whole API exported by a GObject library to be describable by that format. If something can't be described, it's a bug in the library, or a bug in the format.

    Making language bindings start up quickly: typelib files

    As we saw, the GIR files are the XML descriptions of GObject APIs. Dynamic languages like Python would prefer to generate the language binding on the fly, as needed, instead of pre-generating a huge binding.

    However, GTK+ is a big API: Gtk-3.0.gir is 7 MB of XML. Parsing all of that just to be able to generate gtk_widget_show() on the fly would be too slow. Also, there are GTK+'s dependencies: Atk, Gdk, Cairo, etc. You don't want to parse everything just to start up!

    So, we have an extra step that compiles the GIR files down to binary .typelib files. For example, /usr/lib64/girepository-1.0/Gtk-3.0.typelib is about 600 KB on my machine. Those files get mmap()ed for fast access, and can be shared between processes.

    How dynamic language bindings use typelib files

    GObject Introspection comes with a library that language binding implementors can use to consume those .typelib files. The libgirepository library has functions like "list all the classes available in this namespace", or "call this function with these values for arguments, and give me back the return value here".
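
    For example, listing everything in the Gtk namespace could look like this (a sketch using libgirepository's C API; error handling omitted):

    #include <girepository.h>

    void
    list_gtk_infos (void)
    {
        GIRepository *repo = g_irepository_get_default ();
        gint i, n;

        /* this mmap()s the compiled Gtk-3.0.typelib */
        g_irepository_require (repo, "Gtk", "3.0", 0, NULL);

        n = g_irepository_get_n_infos (repo, "Gtk");
        for (i = 0; i < n; i++) {
            GIBaseInfo *info = g_irepository_get_info (repo, "Gtk", i);
            g_print ("%s\n", g_base_info_get_name (info));
            g_base_info_unref (info);
        }
    }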

    Internally, libgirepository uses libffi to actually call the C functions in the dynamically-linked libraries.

    So, when you write foo.py and do

    import gi
    gi.require_version('Gtk', '3.0')
    from gi.repository import Gtk
    win = Gtk.Window()
    

    what happens is that pygobject calls libgirepository to mmap() the .typelib, and sees that the constructor for Gtk.Window is a C function called gtk_window_new(). After seeing how that function wants to be called, it calls the function using libffi, wraps the result with a PyObject, and that's what you get on the Python side.

    Static languages

    A static language like Rust prefers to have the whole language binding pre-generated. This is what the various crates in gtk-rs do.

    The gir crate takes a .gir file (i.e. the XML descriptions) and does two things:

    • Reconstructs the C function prototypes and C struct declarations, but in a way Rust can understand them. This gets output to the sys crate.

    • Creates idiomatic Rust code for the language binding. This gets output to the various crates; for example, the gtk one.

    When reconstructing the C structs and prototypes, we get stuff like

    #[repr(C)]
    pub struct GtkWidget {
        pub parent_instance: gobject::GInitiallyUnowned,
        pub priv_: *mut GtkWidgetPrivate,
    }
    
    extern "C" {
        pub fn gtk_entry_new() -> *mut GtkWidget;
    }
    

    And the idiomatic bindings? Stay tuned!

  4. Librsvg's build infrastructure: Autotools and Rust

    - autotools, gnome, librsvg, rust

    Today I released librsvg 2.41.1, and it's a big release! Apart from all the Rust goodness, and the large number of bug fixes, I am very happy with the way the build system works these days. I've found it invaluable to have good examples of Autotools incantations to copy&paste, so hopefully this will be useful to someone else.

    There are some subtleties that a "good" autotools setup demands, and so far I think librsvg is doing well:

    • The configure script checks for cargo and rustc.

    • "make distcheck" works. This means that the build can be performed with builddir != srcdir, and also that make check runs the available tests and they all pass.

    • The rsvg_internals library is built with Rust, and our Makefile.am calls cargo build with the correct options. It is able to handle debug and release builds.

    • "make clean" cleans up the Rust build directories as well.

    • If you change a .rs file and type make, only the necessary stuff gets rebuilt.

    • Etcetera. I think librsvg feels like a normal autotool'ed library. Let's see how this is done.

    Librsvg's basic autotools setup

    Librsvg started out with a fairly traditional autotools setup with a configure.ac and Makefile.am. For historical reasons the .[ch] source files live in the toplevel librsvg/ directory, not in a src subdirectory or something like that.

    librsvg
    ├ configure.ac
    ├ Makefile.am
    ├ *.[ch]
    ├ src/
    ├ doc/
    ├ tests/
    └ win32/
    

    Adding Rust to the build

    The Rust source code lives in librsvg/rust; that's where Cargo.toml lives, and of course there is the conventional src subdirectory with the *.rs files.

    librsvg
    ├ configure.ac
    ├ Makefile.am
    ├ *.[ch]
    ├ src/
    ├ rust/         <--- this is new!
    │ ├ Cargo.toml
    │ └ src/
    ├ doc/
    ├ tests/
    └ win32/
    

    Detecting the presence of cargo and rustc in configure.ac

    This goes in configure.ac:

    AC_CHECK_PROG(CARGO, [cargo], [yes], [no])
    AS_IF(test x$CARGO = xno,
        AC_MSG_ERROR([cargo is required.  Please install the Rust toolchain from https://www.rust-lang.org/])
    )
    AC_CHECK_PROG(RUSTC, [rustc], [yes], [no])
    AS_IF(test x$RUSTC = xno,
        AC_MSG_ERROR([rustc is required.  Please install the Rust toolchain from https://www.rust-lang.org/])
    )
    

    These two try to execute cargo and rustc, respectively, and abort with an error message if they are not present.

    Supporting debug or release mode for the Rust build

    One can call cargo like "cargo build --release" to turn on expensive optimizations, or normally like just "cargo build" to build with debug information. That is, the latter is the default: if you don't pass any options, cargo does a debug build.

    Autotools and C compilers normally work a bit differently; one must call the configure script like "CFLAGS='-g -O0' ./configure" for a debug build, or "CFLAGS='-O2 -fomit-frame-pointer' ./configure" for a release build.

    Linux distros already have all the infrastructure to pass the appropriate CFLAGS to configure. We need to be able to pass the appropriate flag to Cargo. My main requirement for this was:

    • Distros shouldn't have to substantially change their RPM specfiles (or whatever) to accommodate the Rust build.
    • I assume that distros will want to make release builds by default.
    • I as a developer am comfortable with passing extra options to make debug builds on my machine.

    The scheme in librsvg lets you run "configure --enable-debug" to make it call a plain cargo build, or a plain "configure" to make it use cargo build --release instead. The CFLAGS are passed as usual through an environment variable. This way, distros don't have to change their packaging to keep on making release builds as usual.

    This goes in configure.ac:

    dnl Specify --enable-debug to make a development release.  By default,
    dnl we build in public release mode.
    
    AC_ARG_ENABLE(debug,
                  AC_HELP_STRING([--enable-debug],
                                 [Build Rust code with debugging information [default=no]]),
                  [debug_release=$enableval],
                  [debug_release=no])
    
    AC_MSG_CHECKING(whether to build Rust code with debugging information)
    if test "x$debug_release" = "xyes" ; then
        AC_MSG_RESULT(yes)
        RUST_TARGET_SUBDIR=debug
    else
        AC_MSG_RESULT(no)
        RUST_TARGET_SUBDIR=release
    fi
    AM_CONDITIONAL([DEBUG_RELEASE], [test "x$debug_release" = "xyes"])
    
    AC_SUBST([RUST_TARGET_SUBDIR])
    

    This defines an Automake conditional called DEBUG_RELEASE, which we will use in Makefile.am later.

    It also causes @RUST_TARGET_SUBDIR@ to be substituted in Makefile.am with either debug or release; we will see what these are about.

    Adding Rust source files

    The librsvg/rust/src directory has all the *.rs files, and cargo tracks their dependencies and whether they need to be rebuilt if one changes. However, since that directory is not tracked by make, it won't rebuild things if a Rust source file changes! So, we need to tell our Makefile.am about those files:

    RUST_SOURCES =                   \
            rust/build.rs            \
            rust/Cargo.toml          \
            rust/src/aspect_ratio.rs \
            rust/src/bbox.rs         \
            rust/src/cnode.rs        \
            rust/src/color.rs        \
            ...
    
    RUST_EXTRA +=                    \
            rust/Cargo.lock
    
    EXTRA_DIST += $(RUST_SOURCES) $(RUST_EXTRA)
    

    It's a bit unfortunate that the change tracking is duplicated in the Makefile, but we are already used to listing all the C source files in there, anyway.

    Most notably, the rust subdirectory is not listed in the SUBDIRS in Makefile.am, since there is no rust/Makefile at all!

    Cargo release or debug build?

    if DEBUG_RELEASE
    CARGO_RELEASE_ARGS=
    else
    CARGO_RELEASE_ARGS=--release
    endif
    

    We will call cargo build with that argument later.

    Verbose or quiet build?

    Librsvg uses AM_SILENT_RULES([yes]) in configure.ac. This lets you just run "make" for a quiet build, or "make V=1" to get the full command lines passed to the compiler. Cargo supports something similar, so let's add it to Makefile.am:

    CARGO_VERBOSE = $(cargo_verbose_$(V))
    cargo_verbose_ = $(cargo_verbose_$(AM_DEFAULT_VERBOSITY))
    cargo_verbose_0 =
    cargo_verbose_1 = --verbose
    

    This expands the V variable to empty, 0, or 1. The result of expanding that gives us the final command-line argument in the CARGO_VERBOSE variable.

    What's the filename of the library we are building?

    RUST_LIB=@abs_top_builddir@/rust/target/@RUST_TARGET_SUBDIR@/librsvg_internals.a
    

    Remember our @RUST_TARGET_SUBDIR@ from configure.ac? If you call plain "cargo build", it will put the binaries in rust/target/debug. But if you call "cargo build --release", it will put the binaries in rust/target/release.

    With the bit above, the RUST_LIB variable now has the correct path for the built library. The @abs_top_builddir@ makes it work when the build directory is not the same as the source directory.

    Okay, so how do we call cargo?

    @abs_top_builddir@/rust/target/@RUST_TARGET_SUBDIR@/librsvg_internals.a: $(RUST_SOURCES)
        cd $(top_srcdir)/rust && \
        CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo build $(CARGO_VERBOSE) $(CARGO_RELEASE_ARGS)
    

    We make the funky library filename depend on $(RUST_SOURCES). That's what will cause make to rebuild the Rust library if one of the Rust source files changes.

    We override the CARGO_TARGET_DIR with Automake's preference, and call cargo build with the correct arguments.

    Linking into the main C library

    librsvg_@RSVG_API_MAJOR_VERSION@_la_LIBADD = \
            $(LIBRSVG_LIBS)                      \
            $(LIBM)                              \
            $(RUST_LIB)
    

    This expands our $(RUST_LIB) from above into our linker line, along with librsvg's other dependencies.

    make check

    This is our hook so that make check will cause cargo test to run:

    check-local:
            cd $(srcdir)/rust && \
            CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo test
    

    make clean

    Same thing for make clean and cargo clean:

    clean-local:
            cd $(top_srcdir)/rust && \
            CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo clean
    

    Vendoring dependencies

    Linux distros probably want Rust packages to come bundled with their dependencies, so that they can replace them later with newer/patched versions.

    Here is a hook so that make dist will cause cargo vendor to be run before making the tarball. That command creates a rust/vendor directory with a copy of all the Rust crates that librsvg depends on.

    RUST_EXTRA += rust/cargo-vendor-config
    
    dist-hook:
        (cd $(distdir)/rust && \
        cargo vendor -q && \
        mkdir .cargo && \
        cp cargo-vendor-config .cargo/config)
    

    The tarball needs to have a rust/.cargo/config to know where to find the vendored sources (i.e. the embedded dependencies), but we don't want that in our development source tree. Instead, we generate it from a rust/cargo-vendor-config file in our source tree:

    # This is used after `cargo vendor` is run from `make dist`.
    #
    # In the distributed tarball, this file should end up in
    # rust/.cargo/config
    
    [source.crates-io]
    registry = 'https://github.com/rust-lang/crates.io-index'
    replace-with = 'vendored-sources'
    
    [source.vendored-sources]
    directory = './vendor'
    

    One last thing

    If you put this in your Cargo.toml, release binaries will be a lot smaller. This turns on link-time optimizations (LTO), which removes unused functions from the binary.

    [profile.release]
    lto = true
    

    Summary and thanks

    I think the above is some good boilerplate that you can put in your configure.ac / Makefile.am to integrate a Rust sub-library into your C code. It handles make-y things like make clean and make check; debug and release builds; verbose and quiet builds; builddir != srcdir; all the goodies.

    I think the only thing I'm missing is to check for the cargo-vendor binary. I'm not sure how to only check for that if I'm the one making tarballs... maybe an --enable-maintainer-mode flag?

    This would definitely not have been possible without prior work. Thanks to everyone who figured out Autotools before me, so I could cut&paste your goodies:

  5. How Glib-rs works, part 2: Transferring lists and arrays

    - gnome, rust

    (First part of the series, with index to all the articles)

    In the first part, we saw how glib-rs provides the FromGlib and ToGlib traits to let Rust code convert from/to Glib's simple types, like to convert from a Glib gboolean to a Rust bool and vice-versa. We also saw the special needs of strings; since they are passed by reference and are not copied as simple values, we can use FromGlibPtrNone and FromGlibPtrFull depending on what kind of ownership transfer we want, none for "just make it look like we are using a borrowed reference", or full for "I'll take over the data and free it when I'm done". Going the other way around, we can use ToGlibPtr and its methods to pass things from Rust to Glib.

    In this part, we'll see the tools that glib-rs provides to do conversions of more complex data types. We'll look at two cases: passing arrays from Glib to Rust, and passing GLists to Rust.

    And one final case just in passing: passing containers from Rust to Glib.

    Passing arrays from Glib to Rust

    We'll look at the case for transferring null-terminated arrays of strings, since it's an interesting one. There are other traits to convert from Glib arrays whose length is known, not implied with a NULL element, but for now we'll only look at arrays of strings.

    Null-terminated arrays of strings

    Look at this function for GtkAboutDialog:

    /**
     * gtk_about_dialog_add_credit_section:
     * @about: A #GtkAboutDialog
     * @section_name: The name of the section
     * @people: (array zero-terminated=1): The people who belong to that section
     * ...
     */
    void
    gtk_about_dialog_add_credit_section (GtkAboutDialog  *about,
                                         const gchar     *section_name,
                                         const gchar    **people)
    

    You would use this like

    const gchar *translators[] = {
        "Alice <alice@example.com>",
        "Bob <bob@example.com>",
        "Clara <clara@example.com>",
        NULL
    };
    
    gtk_about_dialog_add_credit_section (my_about_dialog, _("Translators"), translators);
    

    The function expects an array of gchar *, where the last element is a NULL. Instead of passing an explicit length for the array, it's done implicitly by requiring a NULL pointer after the last element. The gtk-doc annotation says (array zero-terminated=1). When we generate information for the GObject-Introspection Repository (GIR), this is what comes out:

     1  <method name="add_credit_section"
     2          c:identifier="gtk_about_dialog_add_credit_section"
     3          version="3.4">
     4    ..
     5      <parameter name="people" transfer-ownership="none">
     6        <doc xml:space="preserve">The people who belong to that section</doc>
     7        <array c:type="gchar**">
     8          <type name="utf8" c:type="gchar*"/>
     9        </array>
    10      </parameter>
    

    You can see the transfer-ownership="none" in line 5. This means that the function will not take ownership of the passed array; it will make its own copy instead. By convention, GIR assumes that arrays of strings are NULL-terminated, so there is no special annotation for that here. If we were implementing this function in Rust, how would we read that C array of UTF-8 strings and turn it into a Rust Vec<String> or something? Easy:

    let c_char_array: *mut *mut c_char = ...; // comes from Glib
    let rust_translators = FromGlibPtrContainer::from_glib_none(c_char_array);
    // rust_translators is a Vec<String>
    

    Let's look at how this bad boy is implemented.

    First stage: impl FromGlibPtrContainer for Vec<T>

    We want to go from a "*mut *mut c_char" (in C parlance, a "gchar **") to a Vec<String>. Indeed, there is an implementation of the FromGlibPtrContainer trait for Vecs here. These are the first few lines:

    impl <P: Ptr, PP: Ptr, T: FromGlibPtrArrayContainerAsVec<P, PP>> FromGlibPtrContainer<P, PP> for Vec<T> {
        unsafe fn from_glib_none(ptr: PP) -> Vec<T> {
            FromGlibPtrArrayContainerAsVec::from_glib_none_as_vec(ptr)
        }
    

    So... that from_glib_none() will return a Vec<T>, which is what we want. Let's look at the first few lines of FromGlibPtrArrayContainerAsVec:

    1  impl FromGlibPtrArrayContainerAsVec<$ffi_name, *mut $ffi_name> for $name {
    2      unsafe fn from_glib_none_as_vec(ptr: *mut $ffi_name) -> Vec<Self> {
    3          FromGlibContainerAsVec::from_glib_none_num_as_vec(ptr, c_ptr_array_len(ptr))
    4      }
    

    Aha! This is inside a macro, thus the $ffi_name garbage. It's done like that so the same trait can be implemented for const and mut pointers to c_char.

    See the call to c_ptr_array_len() in line 3? That's what figures out where the NULL pointer is at the end of the array: it figures out the array's length.

    Second stage: impl FromGlibContainerAsVec::from_glib_none_num_as_vec()

    Now that the length of the array is known, the implementation calls FromGlibContainerAsVec::from_glib_none_num_as_vec()

     1  impl FromGlibContainerAsVec<$ffi_name, *const $ffi_name> for $name {
     2      unsafe fn from_glib_none_num_as_vec(ptr: *const $ffi_name, num: usize) -> Vec<Self> {
     3          if num == 0 || ptr.is_null() {
     4              return Vec::new();
     5          }
     6  
     7          let mut res = Vec::with_capacity(num);
     8          for i in 0..num {
     9              res.push(from_glib_none(ptr::read(ptr.offset(i as isize)) as $ffi_name));
    10          }
    11          res
    12      }
    

    Lines 3/4: If the number of elements is zero, or the array is NULL, return an empty Vec.

    Line 7: Allocate a Vec of suitable size.

    Lines 8/9: For each of the pointers in the C array, call from_glib_none() to convert it from a *const c_char to a String, like we saw in the first part.

    Done! We started with a *mut *mut c_char or a *const *const c_char and ended up with a Vec<String>, which is what we wanted.

    Passing GLists to Rust

    Some functions don't give you an array; they give you a GList or GSList. There is an implementation of FromGlibPtrArrayContainerAsVec that understands GList:

    impl<T> FromGlibPtrArrayContainerAsVec<<T as GlibPtrDefault>::GlibType, *mut glib_ffi::GList> for T
    where T: GlibPtrDefault + FromGlibPtrNone<<T as GlibPtrDefault>::GlibType> + FromGlibPtrFull<<T as GlibPtrDefault>::GlibType> {
    
        unsafe fn from_glib_none_as_vec(ptr: *mut glib_ffi::GList) -> Vec<T> {
            let num = glib_ffi::g_list_length(ptr) as usize;
            FromGlibContainer::from_glib_none_num(ptr, num)
        }
    

    The impl declaration is pretty horrible, so just look at the method: from_glib_none_as_vec() takes in a GList, then calls g_list_length() on it, and finally calls FromGlibContainer::from_glib_none_num() with the length it computed.

    I have a Glib container and its length

    In turn, that from_glib_none_num() goes here:

    impl <P, PP: Ptr, T: FromGlibContainerAsVec<P, PP>> FromGlibContainer<P, PP> for Vec<T> {
        unsafe fn from_glib_none_num(ptr: PP, num: usize) -> Vec<T> {
            FromGlibContainerAsVec::from_glib_none_num_as_vec(ptr, num)
        }
    

    Okay, getting closer to the actual implementation.

    Give me a vector already

    Finally, we get to the function that walks the GList:

     1  impl<T> FromGlibContainerAsVec<<T as GlibPtrDefault>::GlibType, *mut glib_ffi::GList> for T
     2  where T: GlibPtrDefault + FromGlibPtrNone<<T as GlibPtrDefault>::GlibType> + FromGlibPtrFull<<T as GlibPtrDefault>::GlibType> {
     3  
     4      unsafe fn from_glib_none_num_as_vec(mut ptr: *mut glib_ffi::GList, num: usize) -> Vec<T> {
     5          if num == 0 || ptr.is_null() {
     6              return Vec::new()
     7          }
     8          let mut res = Vec::with_capacity(num);
     9          for _ in 0..num {
    10              let item_ptr: <T as GlibPtrDefault>::GlibType = Ptr::from((*ptr).data);
    11              if !item_ptr.is_null() {
    12                  res.push(from_glib_none(item_ptr));
    13              }
    14              ptr = (*ptr).next;
    15          }
    16          res
    17      }
    

    Again, ignore the horrible impl declaration and just look at from_glib_none_num_as_vec().

    Line 4: that function takes in a ptr to a GList, and a num with the list's length, which we already computed above.

    Line 5: Return an empty vector if we have an empty list.

    Line 8: Allocate a vector of suitable capacity.

    Line 9: For each element, convert it with from_glib_none() and push it to the array.

    Line 14: Walk to the next element in the list.

    Passing containers from Rust to Glib

    This post is getting a bit long, so I'll just mention this briefly. There is a trait ToGlibContainerFromSlice that takes a Rust slice, and can convert it to various Glib types.

    • To GSList and GList. These have methods like to_glib_none_from_slice() and to_glib_full_from_slice()

    • To an array of fundamental types. Here, you can choose between to_glib_none_from_slice(), which gives you a Stash like we saw the last time. Or, you can use to_glib_full_from_slice(), which gives you back a g_malloc()ed array with copied items. Finally, to_glib_container_from_slice() gives you back a g_malloc()ed array of pointers to values rather than plain values themselves. Which function you choose depends on which C API you want to call.

    I hope this post gives you enough practice to be able to "follow the traits" for each of those if you want to look at the implementations.

    Next up

    Passing boxed types, like public structs.

    Passing reference-counted types.

    How glib-rs wraps GObjects.

  6. How Glib-rs works, part 1: Type conversions

    - gnome, rust

    During the GNOME+Rust hackfest in Mexico City, Niko Matsakis started the implementation of gnome-class, a procedural macro that will let people implement new GObject classes in Rust and export them to the world. Currently, if you want to write a new GObject (e.g. a new widget) and put it in a library so that it can be used from language bindings via GObject-Introspection, you have to do it in C. It would be nice to be able to do this in a safe language like Rust.

    How would it be done by hand?

    In a C implementation of a new GObject subclass, one calls things like g_type_register_static() and g_signal_new() by hand, while being careful to specify the correct GType for each value, and being super-careful about everything, as C demands.

    In Rust, one can in fact do exactly the same thing. You can call the same, low-level GObject and GType functions. You can use #[repr(C)] for the instance and class structs that GObject will allocate for you, and which you then fill in.

    You can see an example of this in gst-plugins-rs. This is where it implements a Sink GObject, in Rust, by calling Glib functions by hand: struct declarations, class_init() function, registration of type and interfaces.

    How would it be done by a machine?

    That's what Niko's gnome-class is about. During the hackfest it got to the point of being able to generate the code to create a new GObject subclass, register it, and export functions for methods. The syntax is not finalized yet, but it looks something like this:

    gobject_gen! {
        class Counter {
            struct CounterPrivate {
                val: Cell<u32>
            }
    
            signal value_changed(&self);
    
            fn set_value(&self, v: u32) {
                let private = self.private();
                private.val.set(v);
                // private.emit_value_changed();
            }
    
            fn get_value(&self) -> u32 {
                let private = self.private();
                private.val.get()
            }
        }
    }
    

    I started adding support for declaring GObject signals — mainly being able to parse them from what goes inside gobject_gen!() — and then being able to call g_signal_newv() at the appropriate time during the class_init() implementation.

    Types in signals

    Creating a signal for a GObject class is basically like specifying a function prototype: the object will invoke a callback function with certain arguments and return value when the signal is emitted. For example, this is how GtkButton registers its button-press-event signal:

      button_press_event_id =
        g_signal_new (I_("button-press-event"),
                      ...
                      G_TYPE_BOOLEAN,    /* type of return value */
                      1,                 /* how many arguments? */
                      GDK_TYPE_EVENT);   /* type of first and only argument */
    

    g_signal_new() creates the signal and returns a signal id, an integer. Later, when the object wants to emit the signal, it uses that signal id like this:

    GtkEventButton event = ...;
    gboolean return_val;
    
    g_signal_emit (widget, button_press_event_id, 0, event, &return_val);
    

    In the nice gobject_gen!() macro, if I am going to have a signal declaration like

    signal button_press_event(&self, event: &ButtonPressEvent) -> bool;
    

    then I will need to be able to translate the type names for ButtonPressEvent and bool into something that g_signal_newv() will understand: I need the GType values for those. Fundamental types like gboolean get constants like G_TYPE_BOOLEAN. Types that are defined at runtime, like GDK_TYPE_EVENT, get GType values generated at runtime, too, when one registers the type with g_type_register_*().

    Rust type   GType
    ---------   --------------
    i32         G_TYPE_INT
    u32         G_TYPE_UINT
    bool        G_TYPE_BOOLEAN
    etc.        etc.

    Glib types in Rust

    How does glib-rs, the Rust binding to Glib and GObject, handle types?

    Going from Glib to Rust

    First we need a way to convert Glib's types to Rust, and vice-versa. There is a trait to convert simple Glib types into Rust types:

    pub trait FromGlib<T>: Sized {
        fn from_glib(val: T) -> Self;
    }
    

    This means, if I have a T which is a Glib type, this trait will give you a from_glib() function which will convert it to a Rust type which is Sized, i.e. a type whose size is known at compilation time.

    For example, this is how it is implemented for booleans:

    impl FromGlib<glib_ffi::gboolean> for bool {
        #[inline]
        fn from_glib(val: glib_ffi::gboolean) -> bool {
            !(val == glib_ffi::GFALSE)
        }
    }
    

    and you use it like this:

    let my_gboolean: glib_ffi::gboolean = g_some_function_that_returns_gboolean ();
    
    let my_rust_bool: bool = from_glib (my_gboolean);
    

    Booleans in glib and Rust have different sizes, and also different values. Glib's booleans use the C convention: 0 is false and anything else is true, while in Rust booleans are strictly false or true, and the size is undefined (with the current Rust ABI, it's one byte).

    Going from Rust to Glib

    And to go the other way around, from a Rust bool to a gboolean? There is this trait:

    pub trait ToGlib {
        type GlibType;
    
        fn to_glib(&self) -> Self::GlibType;
    }
    

    This means, if you have a Rust type that maps to a corresponding GlibType, this will give you a to_glib() function to do the conversion.

    This is the implementation for booleans:

    impl ToGlib for bool {
        type GlibType = glib_ffi::gboolean;
    
        #[inline]
        fn to_glib(&self) -> glib_ffi::gboolean {
            if *self { glib_ffi::GTRUE } else { glib_ffi::GFALSE }
        }
    }
    

    And it is used like this:

    let my_rust_bool: bool = true;
    
    g_some_function_that_takes_gboolean (my_rust_bool.to_glib ());
    

    (If you are thinking "a function call to marshal a boolean" — note how the functions are inlined, and the optimizer basically compiles them down to nothing.)

    Pointer types - from Glib to Rust

    That's all very nice for simple types like booleans and ints. Pointers to other objects are slightly more complicated.

    GObject-Introspection allows one to specify how pointer arguments to functions are handled by using a transfer specifier.

    (transfer none)

    For example, if you call gtk_window_set_title(window, "Hello"), you would expect the function to make its own copy of the "Hello" string. In Rust terms, you would be passing it a simple borrowed reference. GObject-Introspection (we'll abbreviate it as GI) calls this GI_TRANSFER_NOTHING, and it's specified by using (transfer none) in the documentation strings for function arguments or return values.

    The corresponding trait to bring in pointers from Glib to Rust, without taking ownership, is this. It's unsafe because it will be used to de-reference pointers that come from the wild west:

    pub trait FromGlibPtrNone<P: Ptr>: Sized {
        unsafe fn from_glib_none(ptr: P) -> Self;
    }
    

    And you use it via this generic function:

    #[inline]
    pub unsafe fn from_glib_none<P: Ptr, T: FromGlibPtrNone<P>>(ptr: P) -> T {
        FromGlibPtrNone::from_glib_none(ptr)
    }
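
    For example, a binding might bring in a borrowed C string like this (the C function is made up for illustration):

    // Hypothetical C function that returns a (transfer none) string.
    let c_ptr: *const c_char = unsafe { some_c_function_that_returns_a_string() };
    
    // We get our own copy as a Rust String; the C-side string stays untouched.
    let s: String = unsafe { from_glib_none(c_ptr) };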
    

    Let's look at how this works. Here is the FromGlibPtrNone trait implemented for strings.

    1  impl FromGlibPtrNone<*const c_char> for String {
    2      #[inline]
    3      unsafe fn from_glib_none(ptr: *const c_char) -> Self {
    4          assert!(!ptr.is_null());
    5          String::from_utf8_lossy(CStr::from_ptr(ptr).to_bytes()).into_owned()
    6      }
    7  }
    

    Line 1: given a pointer to a c_char, the conversion to String...

    Line 4: check for NULL pointers

    Line 5: Wrap the C pointer in a CStr, like we looked at last time, then convert its bytes into an owned String, replacing any invalid UTF-8 sequences along the way.

    Unfortunately, there's a copy involved in the last step. It may be possible to use a Cow<str> there instead to avoid a copy if the char* from Glib is indeed valid UTF-8.

    (transfer full)

    And how about transferring ownership of the pointed-to value? There is this trait:

    pub trait FromGlibPtrFull<P: Ptr>: Sized {
        unsafe fn from_glib_full(ptr: P) -> Self;
    }
    

    And the implementation for strings is as follows. In Glib's scheme of things, "transferring ownership of a string" means that the recipient of the string must eventually g_free() it.

    1  impl FromGlibPtrFull<*const c_char> for String {
    2      #[inline]
    3      unsafe fn from_glib_full(ptr: *const c_char) -> Self {
    4          let res = from_glib_none(ptr);
    5          glib_ffi::g_free(ptr as *mut _);
    6          res
    7      }
    8  }
    

    Line 1: given a pointer to a c_char, the conversion to String...

    Line 4: Do the conversion with from_glib_none() with the trait we saw before, put it in res.

    Line 5: Call g_free() on the original C string.

    Line 6: Return the res, a Rust string which we own.
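
    Usage looks just like the (transfer none) case, assuming the analogous generic from_glib_full() helper, and with a made-up C function:

    // Hypothetical C function documented as (transfer full): it gives us a string
    // that we are responsible for freeing.
    let c_ptr: *const c_char = unsafe { some_c_function_that_returns_a_new_string() };
    
    // We end up with an owned Rust String; the original C string has been g_free()d.
    let s: String = unsafe { from_glib_full(c_ptr) };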

    Pointer types - from Rust to Glib

    Consider the case where you want to pass a String from Rust to a Glib function that takes a *const c_char — in C parlance, a char *, without the Glib function acquiring ownership of the string. For example, assume that the C version of gtk_window_set_title() is in the gtk_ffi module. You may want to call it like this:

    fn rust_binding_to_window_set_title(window: &Gtk::Window, title: &String) {
        gtk_ffi::gtk_window_set_title(..., make_c_string_from_rust_string(title));
    }
    

    Now, what would that make_c_string_from_rust_string() look like?

    • We have: a Rust String — UTF-8, known length, no nul terminator

    • We want: a *const char — nul-terminated UTF-8

    So, let's write this:

    1  fn make_c_string_from_rust_string(s: &String) -> *const c_char {
    2      let cstr = CString::new(&s[..]).unwrap();
    3      let ptr = cstr.into_raw() as *const c_char;
    4      ptr
    5  }
    

    Line 1: Take in a &String; return a *const c_char.

    Line 2: Build a CString like we saw a few days ago: this allocates a byte buffer with space for a nul terminator, and copies the string's bytes. We unwrap() for this simple example, because CString::new() will return an error if the String contains nul characters in the middle, which C doesn't understand.

    Line 3: Call into_raw() to get a pointer to the byte buffer, and cast it to a *const c_char. We'll need to free this value later.

    But this kind of sucks, because we then have to call this function, pass the pointer to a C function, and then reconstitute the CString so it can free the byte buffer:

    let buf = make_c_string_from_rust_string(my_string);
    unsafe { c_function_that_takes_a_string(buf); }
    let _ = CString::from_raw(buf as *mut c_char);
    

    The solution that Glib-rs provides for this is very Rusty, and rather elegant.

    Stashes

    We want:

    • A temporary place to put a piece of data
    • A pointer to that buffer
    • Automatic memory management for both of those

    Glib-rs defines a Stash for this:

    pub struct Stash<'a,                                 // we have a lifetime
                     P: Copy,                            // the pointer must be copy-able
                     T: ?Sized + ToGlibPtr<'a, P>> (     // Type for the temporary place
        pub P,                                           // We store a pointer...
        pub <T as ToGlibPtr<'a, P>>::Storage             // ... to a piece of data with that lifetime ...
    );
    

    ... and the piece of data must be of the associated type ToGlibPtr::Storage, which we will see shortly.

    This struct Stash goes along with the ToGlibPtr trait:

    pub trait ToGlibPtr<'a, P: Copy> {
        type Storage;
    
        fn to_glib_none(&'a self) -> Stash<'a, P, Self>;  // returns a Stash whose temporary storage
                                                          // has the lifetime of our original data
    }
    

    Let's unpack this by looking at the implementation of the "transfer a String to a C function while keeping ownership":

    1  impl <'a> ToGlibPtr<'a, *const c_char> for String {
    2      type Storage = CString;
    3  
    4      #[inline]
    5      fn to_glib_none(&self) -> Stash<'a, *const c_char, String> {
    6          let tmp = CString::new(&self[..]).unwrap();
    7          Stash(tmp.as_ptr(), tmp)
    8      }
    9  }
    

    Line 1: We implement ToGlibPtr<'a, *const c_char> for String, declaring the lifetime 'a for the Stash.

    Line 2: Our temporary storage is a CString.

    Line 6: Make a CString like before.

    Line 7: Create the Stash with a pointer to the CString's contents, and the CString itself.

    (transfer none)

    Now, we can use ".0" to extract the first field from our Stash, which is precisely the pointer we want to a byte buffer:

    let my_string = ...;
    unsafe { c_function_which_takes_a_string(my_string.to_glib_none().0); }
    

    Now Rust knows that the temporary buffer inside the Stash has the lifetime of my_string, and it will free it automatically when the string goes out of scope. If we can accept the .to_glib_none().0 incantation for "lending" pointers to C, this works perfectly.

    (transfer full)

    And for transferring ownership to the C function? The ToGlibPtr trait has another method:

    pub trait ToGlibPtr<'a, P: Copy> {
        ...
    
        fn to_glib_full(&self) -> P;
    }
    

    And here is the implementation for strings:

    impl <'a> ToGlibPtr<'a, *const c_char> for String {
        fn to_glib_full(&self) -> *const c_char {
            unsafe {
                glib_ffi::g_strndup(self.as_ptr() as *const c_char,
                                    self.len() as size_t)
                    as *const c_char
            }
        }
    }
    

    We basically g_strndup() the Rust string's contents from its byte buffer and its len(), and we can then pass this on to C. That code will be responsible for g_free()ing the C-side string.
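
    The call site then looks like the (transfer none) one above, except that the C side keeps the string (the C function here is made up):

    let my_string = String::from("Hello");
    
    // The C function receives a g_strndup()ed copy and must g_free() it eventually.
    unsafe { c_function_which_takes_ownership_of_a_string(my_string.to_glib_full()); }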

    Next up

    Transferring lists and arrays. Stay tuned!

  7. Correctness in Rust: building strings

    - rust

    Rust tries to follow the "make illegal states unrepresentable" mantra in several ways. In this post I'll show several things related to the process of building strings, from bytes in memory, or from a file, or from char * things passed from C.

    Strings in Rust

    The easiest way to build a string is to do it directly at compile time:

    let my_string = "Hello, world!";
    

    In Rust, strings are UTF-8. Here, the compiler checks our string literal is valid UTF-8. If we try to be sneaky and insert an invalid character...

    let my_string = "Hello \xf0";
    

    We get a compiler error:

    error: this form of character escape may only be used with characters in the range [\x00-\x7f]
     --> foo.rs:2:30
      |
    2 |     let my_string = "Hello \xf0";
      |                              ^^
    

    Rust strings know their length, unlike C strings. They can contain a nul character in the middle, because they don't need a nul terminator at the end.

    let my_string = "Hello \x00 zero";
    println!("{}", my_string);
    

    The output is what you expect:

    $ ./foo | hexdump -C
    00000000  48 65 6c 6c 6f 20 00 20  7a 65 72 6f 0a           |Hello . zero.|
    0000000d                    ^ note the nul char here
    $
    

    So, to summarize, in Rust:

    • Strings are encoded in UTF-8
    • Strings know their length
    • Strings can have nul chars in the middle

    This is a bit different from C:

    • Strings don't exist!

    Okay, just kidding. In C:

    • A lot of software has standardized on UTF-8.
    • Strings don't know their length - a char * is a raw pointer to the beginning of the string.
    • Strings conventionally have a nul terminator, that is, a zero byte that marks the end of the string. Therefore, you can't have nul characters in the middle of strings.

    Building a string from bytes

    Let's say you have an array of bytes and want to make a string from them. Rust won't let you just cast the array, like C would. First you need to do UTF-8 validation. For example:

     1  fn convert_and_print(bytes: Vec<u8>) {
     2      let result = String::from_utf8(bytes);
     3      match result {
     4          Ok(string) => println!("{}", string),
     5          Err(e) => println!("{:?}", e)
     6      }
     7  }
     8  
     9  fn main() {
    10      convert_and_print(vec![0x48, 0x65, 0x6c, 0x6c, 0x6f]);
    11      convert_and_print(vec![0x48, 0x65, 0xf0, 0x6c, 0x6c, 0x6f]);
    12  }
    

    In lines 10 and 11, we call convert_and_print() with different arrays of bytes; the first one is valid UTF-8, and the second one isn't.

    Line 2 calls String::from_utf8(), which returns a Result, i.e. something with a success value or an error. In lines 3-5 we unpack this Result. If it's Ok, we print the converted string, which has been validated for UTF-8. Otherwise, we print the debug representation of the error.

    The program prints the following:

    $ ~/foo
    Hello
    FromUtf8Error { bytes: [72, 101, 240, 108, 108, 111], error: Utf8Error { valid_up_to: 2, error_len: Some(1) } }
    

    Here, in the error case, the Utf8Error tells us that the bytes are valid UTF-8 only up to index 2 (valid_up_to); that is the first problematic index. We also get some extra information (error_len) which lets the program know whether the problematic sequence was incomplete and truncated at the end of the byte array, or whether it's a complete, invalid sequence in the middle.

    And for a "just make this printable, pls" API? We can use String::from_utf8_lossy(), which replaces invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER:

    fn convert_and_print(bytes: Vec<u8>) {
        let string = String::from_utf8_lossy(&bytes);
        println!("{}", string);
    }
    
    fn main() {
        convert_and_print(vec![0x48, 0x65, 0x6c, 0x6c, 0x6f]);
        convert_and_print(vec![0x48, 0x65, 0xf0, 0x6c, 0x6c, 0x6f]);
    }
    

    This prints the following:

    $ ~/foo
    Hello
    He�llo
    

    Reading from files into strings

    Now, let's assume you want to read chunks of a file and put them into strings. Let's go from the low-level parts up to the high level "read a line" API.

    Single bytes and single UTF-8 characters

    When you open a File, you get an object that implements the Read trait. In addition to the usual "read me some bytes" method, it can also give you back an iterator over bytes, or an iterator over UTF-8 characters.

    The Read.bytes() method gives you back a Bytes iterator, whose next() method returns Result<u8, io::Error>. When you ask the iterator for its next item, that Result means you'll get a byte out of it successfully, or an I/O error.

    In contrast, the Read.chars() method gives you back a Chars iterator, and its next() method returns Result<char, CharsError>, not io::Error. This extended CharsError has a NotUtf8 case, which you get back when next() tries to read the next UTF-8 sequence from the file and the file has invalid data. CharsError also has a case for normal I/O errors.
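
    For instance, here is a minimal sketch using the Bytes iterator described above; each item is a Result, so I/O errors can surface at every step:

    use std::fs::File;
    use std::io::Read;
    
    fn count_bytes(path: &str) -> std::io::Result<usize> {
        let file = File::open(path)?;
    
        let mut count = 0;
        for byte in file.bytes() {
            let _b: u8 = byte?;   // each item is a Result<u8, io::Error>
            count += 1;
        }
    
        Ok(count)
    }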

    Reading lines

    While you could build a UTF-8 string one character at a time, there are more efficient ways to do it.

    You can create a BufReader, a buffered reader, out of anything that implements the Read trait. BufReader has a convenient read_line() method, to which you pass a mutable String and it returns a Result<usize, io::Error> with either the number of bytes read, or an error.

    That method is declared in the BufRead trait, which BufReader implements. Why the separation? Because other concrete structs also implement BufRead, such as Cursor — a nice wrapper that lets you use a vector of bytes like an I/O Read or Write implementation, similar to GMemoryInputStream.

    If you prefer an iterator rather than the read_line() function, BufRead also gives you a lines() method, which gives you back a Lines iterator.

    In both cases (the read_line() method or the Lines iterator), the error that you get back can be of kind ErrorKind::InvalidData, which indicates that there was an invalid UTF-8 sequence in the line to be read. It can also be a normal I/O error, of course.
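
    Here is a minimal sketch using the Lines iterator; both I/O errors and invalid UTF-8 come back through the same Result:

    use std::fs::File;
    use std::io::{BufRead, BufReader};
    
    fn print_lines(path: &str) -> std::io::Result<()> {
        let reader = BufReader::new(File::open(path)?);
    
        for line in reader.lines() {
            // Each item is a Result<String, io::Error>; a line with invalid UTF-8
            // shows up as an error of kind ErrorKind::InvalidData.
            println!("{}", line?);
        }
    
        Ok(())
    }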

    Summary so far

    There is no way to build a String, or a &str slice, from invalid UTF-8 data. All the methods that let you turn bytes into string-like things perform validation, and return a Result to let you know if your bytes validated correctly.

    The exceptions are in the unsafe methods, like String::from_utf8_unchecked(). You should really only use them if you are absolutely sure that your bytes were validated as UTF-8 beforehand.
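
    For completeness, this is what the unchecked escape hatch looks like; the unsafe block marks that you are taking the validation burden on yourself:

    // We know these bytes are ASCII, hence valid UTF-8, so skipping validation is fine.
    let bytes = vec![0x48, 0x69];
    let s = unsafe { String::from_utf8_unchecked(bytes) };
    assert_eq!(s, "Hi");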

    There is no way to bring in data from a file (or anything file-like, that implements the Read trait) and turn it into a String without going through functions that do UTF-8 validation. There is not an unsafe "read a line" API without validation — you would have to build one yourself, but the I/O hit is probably going to be slower than validating data in memory, anyway, so you may as well validate.

    C strings and Rust

    For unfortunate historical reasons, C flings around char * to mean different things. In the context of Glib, it can mean

    • A valid, nul-terminated UTF-8 sequence of bytes (a "normal string")
    • A nul-terminated file path, which has no meaningful encoding
    • A nul-terminated sequence of bytes, not validated as UTF-8.

    What a particular char * means depends on which API you got it from.

    Bringing a string from C to Rust

    From Rust's viewpoint, getting a raw char * from C (a "*const c_char" in Rust parlance) means that it gets a pointer to a buffer of unknown length.

    Now, that may not be entirely accurate:

    • You may indeed only have a pointer to a buffer of unknown length
    • You may have a pointer to a buffer, and also know its length (i.e. the offset at which the nul terminator is)

    The Rust standard library provides a CStr object, which means, "I have a pointer to an array of bytes, and I know its length, and I know the last byte is a nul".

    CStr provides an unsafe from_ptr() constructor which takes a raw pointer, and walks the memory to which it points until it finds a nul byte. You must give it a valid pointer, and you had better guarantee that there is a nul terminator, or CStr will walk until the end of your process' address space looking for one.

    Alternatively, if you know the length of your byte array, and you know that it has a nul byte at the end, you can call CStr::from_bytes_with_nul(). You pass it a &[u8] slice; the function will check that a) the last byte in that slice is indeed a nul, and b) there are no nul bytes in the middle.

    The unsafe version of this last function is unsafe CStr::from_bytes_with_nul_unchecked(): it also takes an &[u8] slice, but you must guarantee that the last byte is a nul and that there are no nul bytes in the middle.
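
    Here is a small sketch of the safe constructor, showing both the success case and the "interior nul" failure:

    use std::ffi::CStr;
    
    fn main() {
        // The slice must end in a nul byte and contain no other nul bytes.
        let ok = CStr::from_bytes_with_nul(b"hello\0");
        assert!(ok.is_ok());
    
        // A nul byte in the middle is rejected.
        let bad = CStr::from_bytes_with_nul(b"he\0llo\0");
        assert!(bad.is_err());
    }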

    I really like that the Rust documentation tells you when functions are not "instantaneous" and must instead walk an array, for example to do validation or to look for the nul terminator as above.

    Turning a CStr into a string-like

    Now, the above indicates that a CStr is a nul-terminated array of bytes. We have no idea what the bytes inside look like; we just know that they don't contain any other nul bytes.

    There is a CStr::to_str() method, which returns a Result<&str, Utf8Error>. It performs UTF-8 validation on the array of bytes. If the array is valid, the function just returns a slice of the validated bytes minus the nul terminator (i.e. just what you expect for a Rust string slice). Otherwise, it returns an Utf8Error with the details like we discussed before.

    There is also CStr::to_string_lossy() which does the replacement of invalid UTF-8 sequences like we discussed before.
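
    A small sketch of turning a CStr into string-like things, with the UTF-8 validation made explicit:

    use std::ffi::CStr;
    
    fn main() {
        // "café" as UTF-8 bytes, nul-terminated.
        let cstr = CStr::from_bytes_with_nul(b"caf\xc3\xa9\0").unwrap();
    
        match cstr.to_str() {
            Ok(s) => println!("valid UTF-8: {}", s),      // prints "valid UTF-8: café"
            Err(e) => println!("invalid UTF-8: {:?}", e),
        }
    
        // The lossy variant never fails; invalid sequences become U+FFFD.
        println!("{}", cstr.to_string_lossy());
    }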

    Conclusion

    Strings in Rust are UTF-8 encoded, they know their length, and they can have nul bytes in the middle.

    To build a string from raw bytes, you must go through functions that do UTF-8 validation and tell you if it failed. There are unsafe functions that let you skip validation, but then of course you are on your own.

    The low-level functions which read data from files operate on bytes. On top of those, there are convenience functions to read validated UTF-8 characters, lines, etc. All of these tell you when there was invalid UTF-8 or an I/O error.

    Rust lets you wrap a raw char * that you got from C into something that can later be validated and turned into a string. Anything that manipulates a raw pointer is unsafe; this includes the "wrap me this pointer into a C string abstraction" API, and the "build me an array of bytes from this raw pointer" API. Later, you can validate those as UTF-8 and build real Rust strings — or know if the validation failed.

    Rust builds these little "corridors" through the API so that illegal states are unrepresentable.

  8. GUADEC 2017 presentation

    - gnome, guadec, librsvg, rust, talks

    During GUADEC this year I gave a presentation called Replacing C library code with Rust: what I learned with librsvg. This is the PDF file; be sure to scroll past the full-page presentation pages until you reach the speaker's notes, especially for the code sections!

    Replacing C library code with Rust - link to PDF

    You can also get the ODP file for the presentation. This is released under a CC-BY-SA license.

    For the presentation, my daughter Luciana made some drawings of Ferris, the Rust mascot, also released under the same license:

    [Images: Ferris says hi · Ferris busy at work · Ferris makes a mess · Ferris presents her work]

  9. Surviving a rust-cssparser API break

    - gnome, librsvg, rust

    Yesterday I looked into updating librsvg's Rust dependencies. There have been some API breaks (!!!) in the unstable libraries that it uses since the last time I locked them. This post is about an interesting case of API breakage.

    rust-cssparser is the crate that Servo uses for parsing CSS. Well, more like tokenizing CSS: you give it a string, it gives you back tokens, and you are supposed to compose CSS selector information or other CSS values from the tokens.

    Librsvg uses rust-cssparser now for most of the micro-languages in SVG's attribute values, instead of its old, fragile C parsers. I hope to be able to use it in conjunction with Servo's rust-selectors crate to fully parse CSS data and replace libcroco.

    A few months ago, rust-cssparser's API looked more or less like the following. This is the old representation of a Token:

    pub enum Token<'a> {
        // an identifier
        Ident(Cow<'a, str>),
    
        // a plain number
        Number(NumericValue),
    
        // a percentage value normalized to [0.0, 1.0]
        Percentage(PercentageValue),
    
        WhiteSpace(&'a str),
        Comma,
    
        ...
    }
    

    That is, a Token can be an Identifier with a string name, or a Number, a Percentage, whitespace, a comma, and many others.

    On top of that is the old API for a Parser, which you construct with a string and then it gives you back tokens:

    impl<'i> Parser<'i> {
        pub fn new(input: &'i str) -> Parser<'i, 'i> { ... }
    
        pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
        ...
    }
    

    This means the following. You create the parser out of a string slice with new(). Each call to next() then gives you a Result: a Token on success, or an empty () error value. The parser uses a lifetime 'i on the string from which it is constructed: the Tokens that return identifiers, for example, could return sub-string slices that come from the original string, and the parser has to be marked with a lifetime so that it does not outlive its underlying string.

    A few commits later, rust-cssparser got changed to return detailed error values, so that instead of () you get a BasicParseError with sub-cases like UnexpectedToken or EndOfInput.
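
    Roughly, the error type now has sub-cases like these (a sketch based only on the cases mentioned above; the real enum has more variants):

    // Sketch only — based on the variants mentioned in the text, not the full definition.
    pub enum BasicParseError<'i> {
        UnexpectedToken(Token<'i>),
        EndOfInput,
        // ... other variants omitted
    }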

    After the changes to the error values for results, I didn't pay much attention to rust-cssparser for a while. Yesterday I wanted to update librsvg to use the newest rust-cssparser, and had some interesting problems.

    First, Parser::new() was changed from taking just a &str slice to taking a ParserInput struct. This is an implementation detail which lets the parser cache the last token it saw. Not a big deal:

    // instead of constructing a parser like
    let mut parser = Parser::new (my_string);
    
    // you now construct it like
    let mut input = ParserInput::new (my_string);
    let mut parser = Parser::new (&mut input);
    

    I am not completely sure why this is exposed to the public API, since Rust won't allow you to have two mutable references to a ParserInput, and the only consumer of a (mutable) ParserInput is the Parser, anyway.

    However, the parser.next() function changed:

    // old version
    pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
    // new version
    pub fn next(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {... }
    // note this bad boy here -------^
    

    The successful Result from next() is now a reference to a Token, not a plain Token value which you now own. The parser is giving you a borrowed reference to its internally-cached token.

    My parsing functions for the old API looked similar to the following. This is a function that parses a string into an angle; it can look like "45deg" or "1.5rad", for example.

     1  pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut parser = Parser::new (s);
     3  
     4      let token = parser.next ()
     5          .map_err (|_| ParseError::new ("expected angle"))?;
     6  
     7      match token {
     8          Token::Number (NumericValue { value, .. }) => Ok (value as f64),
     9  
    10          Token::Dimension (NumericValue { value, .. }, unit) => {
    11              let value = value as f64;
    12  
    13              match unit.as_ref () {
    14                  "deg"  => Ok (value),
    15                  "grad" => Ok (value * 360.0 / 400.0),
    16                  "rad"  => Ok (value * 180.0 / PI),
    17                  _      => Err (ParseError::new ("expected angle"))
    18              }
    19          },
    20  
    21          _ => Err (ParseError::new ("expected angle"))
    22      }.and_then (|r|
    23                  parser.expect_exhausted ()
    24                  .map (|_| r)
    25                  .map_err (|_| ParseError::new ("expected angle")))
    26  }
    

    This is a bit ugly, but it was the first version that passed the tests. Lines 4 and 5 mean, "get the first token or return an error". Line 17 means, "anything except deg, grad, or rad for the units causes the match expression to generate an error". Finally, I was feeling very proud of using and_then() in line 22, with parser.expect_exhausted(), to ensure that the parser would not find any more tokens after the angle/units.

    However, in the new version of rust-cssparser, Parser.next() gives back a Result with a &Token success value — a reference to a token —, while the old version returned a plain Token. No problem, I thought, I'm just going to de-reference the value in the match and be done with it:

        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;
    
        match *token {
        //    ^ dereference here...
            Token::Number { value, .. } => value as f64,
    
            Token::Dimension { value, ref unit, .. } => {
        //                            ^ avoid moving the unit value
    

    The compiler complained elsewhere. The whole function now looked like this:

     1  pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut parser = Parser::new (s);
     3  
     4      let token = parser.next ()
     5          .map_err (|_| ParseError::new ("expected angle"))?;
     6  
     7      match token {
     8          // ...
     9      }.and_then (|r|
    10                  parser.expect_exhausted ()
    11                  .map (|_| r)
    12                  .map_err (|_| ParseError::new ("expected angle")))
    13  }
    

    But in line 4, token is now a reference to something that lives inside parser, and parser is therefore borrowed mutably. The compiler didn't like that line 10 (the call to parser.expect_exhausted()) was trying to borrow parser mutably again.

    I played a bit with creating a temporary scope around the assignment to token so that it would only borrow parser mutably inside that scope. Things ended up like this, without the call to and_then() after the match:

     1  pub fn angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut input = ParserInput::new (s);
     3      let mut parser = Parser::new (&mut input);
     4  
     5      let angle = {
     6          let token = parser.next ()
     7              .map_err (|_| ParseError::new ("expected angle"))?;
     8  
     9          match *token {
    10              Token::Number { value, .. } => value as f64,
    11  
    12              Token::Dimension { value, ref unit, .. } => {
    13                  let value = value as f64;
    14  
    15                  match unit.as_ref () {
    16                      "deg"  => value,
    17                      "grad" => value * 360.0 / 400.0,
    18                      "rad"  => value * 180.0 / PI,
    19                      _      => return Err (ParseError::new ("expected 'deg' | 'grad' | 'rad'"))
    20                  }
    21              },
    22  
    23              _ => return Err (ParseError::new ("expected angle"))
    24          }
    25      };
    26  
    27      parser.expect_exhausted ().map_err (|_| ParseError::new ("expected angle"))?;
    28  
    29      Ok (angle)
    30  }
    

    Lines 5 through 25 are basically

        let angle = {
            // parse out the angle; return if error
        };
    

    And after that is done, I test for parser.expect_exhausted(). There is no chaining of results with helper functions; instead it's just going through each token linearly.

    The API break was annoying to deal with, but fortunately the calling code ended up cleaner, and I didn't have to change anything in the tests. I hope rust-cssparser can stabilize its API for consumers that are not Servo.

  10. Legacy Systems as Old Cities

    - gnome, recompiler, urbanism

    I just realized that I only tweeted about this a couple of months ago, but never blogged about it. Shame on me!

    I wrote an article, Legacy Systems as Old Cities, for The Recompiler magazine. Is GNOME, now at 20 years old, legacy software? Is it different from mainframe software because "everyone" can change it? Does long-lived software have the same patterns of change as cities and physical artifacts? Can we learn from the building trades and urbanism for maintaining software in the long term? Could we turn legacy software into a good legacy?

    You can read the article here.

    Also, let me take this opportunity to recommend The Recompiler magazine. It is the most enjoyable technical publication I read. Their podcast is also excellent!

    Update 2017/06/10 - Spanish version of the article, Los Sistemas Heredados como Ciudades Viejas
