Federico's Blog

  1. A mini-rant on the lack of string slices in C

    - librsvg, rust

    Porting of librsvg to Rust goes on. Yesterday I started porting the C code that implements SVG's <text> family of elements. I have also been replacing the little parsers in librsvg with Rust code.

    And these days, the lack of string slices in C is bothering me a lot.

    What if...

    It feels like it should be easy to just write something like

    typedef struct {
        const char *ptr;
        size_t len;
    } StringSlice;
    

    And then a whole family of functions. The starting point, where you slice a whole string:

    StringSlice
    make_slice_from_string (const char *s)
    {
        StringSlice slice;
    
        assert (s != NULL);
    
        slice.ptr = s;
        slice.len = strlen (s);
        return slice;
    }
    

    But that wouldn't keep track of the lifetime of the original string. Okay, this is C, so you are used to keeping track of that yourself.

    Onwards. Substrings?

    StringSlice
    make_sub_slice(StringSlice slice, size_t start, size_t len)
    {
        StringSlice sub;
    
        assert (len <= slice.len);
        assert (start <= slice.len - len);  /* Not "start + len <= slice.len" or it can overflow. */
                                            /* The subtraction can't underflow because of the previous assert */
        sub.ptr = slice.ptr + start;
        sub.len = len;
        return sub;
    }
    

    Then you could write a million wrappers for g_strsplit() and friends, or equivalents to them, to give you slices instead of C strings. But then:

    • You have to keep track of lifetimes yourself.

    • You have to wrap every function that returns a plain "char *"...

    • ... and every function that takes a plain "char *" as an argument, without a length parameter, because...

    • You CANNOT take slice.ptr and pass it to a function that just expects a plain "char *", because your slice does not include a nul terminator (the '\0' byte at the end of a C string). This is what kills the whole plan.

    Even if you had a helper library that implements C string slices like that, you would have a mismatch every time you needed to call a C function that expects a conventional C string in the form of a "char *". You need to put a nul terminator somewhere, and if you only have a slice, you need to allocate memory, copy the slice into it, and slap a 0 byte at the end. Then you can pass that to a function that expects a normal C string.

    There is hacky C code that needs to pass a substring to another function, so it overwrites the byte after the substring with a 0, passes the substring, and overwrites the byte back. This is horrible, and doesn't work with strings that live in read-only memory. But that's the best that C lets you do.

    I'm very happy with string slices in Rust, which work exactly like the StringSlice above, but &str is actually at the language level and everything knows how to handle it.
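
    To make the comparison concrete, here is a rough Rust sketch of the same two ideas: taking a sub-slice with bounds checking, and what it takes to hand a slice to C. The function names sub_slice and to_c_string are made up for this illustration; str::get() and std::ffi::CString come from the standard library.

```rust
use std::ffi::CString;

/// Counterpart of make_sub_slice(); returns None instead of asserting.
fn sub_slice(s: &str, start: usize, len: usize) -> Option<&str> {
    // checked_add() guards against the overflow that the C asserts work around.
    let end = start.checked_add(len)?;
    // get() returns None if the range is out of bounds or splits a UTF-8 character.
    s.get(start..end)
}

/// A slice carries no nul terminator, so passing it to C still means
/// allocating a buffer, copying the bytes, and appending a 0 byte.
fn to_c_string(slice: &str) -> Option<CString> {
    // CString::new() fails if the slice contains an interior 0 byte.
    CString::new(slice).ok()
}
```

    The allocation does not go away in Rust either; the difference is that the slice type and the conversion are standard, so nobody has to write the wrappers by hand.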

    The glib-rs crate has conversion traits to go from Rust strings or slices into C, and vice-versa. We already saw some of those in the blog post about conversions in Glib-rs.

    Sizes of things

    Rust uses usize to specify the size of things. It is an unsigned integer, 32 bits wide on 32-bit machines and 64 bits wide on 64-bit machines, much like C's size_t.

    In the Glib/C world, we have an assortment of types to represent the sizes of things:

    • gsize, the same as size_t. This is an unsigned integer; it's okay.

    • gssize, a signed integer of the same size as gsize. This is okay if used to represent a negative offset, and really funky in the Glib functions like g_string_new_len (const char *str, gssize len), where len == -1 means "call strlen(str) for me because I'm too lazy to compute the length myself".

    • int - broken, as in libxml2, but we can't change the API. On 64-bit machines, an int to specify a length means you can't pass objects bigger than 2 GB.

    • long - marginally better than int, since it has a better chance of actually being the same size as size_t, but still funky. Probably okay for negative offsets; problematic for sizes which should really be unsigned.

    • etc.
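
    As an illustration of the gssize convention, here is a small sketch of how a "length, or -1 for nul-terminated" parameter could be interpreted on the Rust side. The function bytes_with_gssize_len is hypothetical and works on a safe byte slice rather than a raw pointer; isize stands in for gssize.

```rust
/// Interprets (buf, len) the way a Glib function taking a gssize would:
/// len == -1 means "stop at the first 0 byte", otherwise len is a byte count.
fn bytes_with_gssize_len(buf: &[u8], len: isize) -> Option<&[u8]> {
    match len {
        -1 => {
            // Equivalent of calling strlen(): look for the terminator.
            let end = buf.iter().position(|&b| b == 0)?;
            Some(&buf[..end])
        }
        // A non-negative length must fit inside the buffer.
        n if n >= 0 => buf.get(..n as usize),
        // Any other negative value is treated as an error in this sketch.
        _ => None,
    }
}
```

    With an Option<usize> or a plain slice, the magic -1 value is not needed at all, which is roughly why usize alone is enough in Rust APIs.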

    I'm not sure how old size_t is in the C standard library, but it can't have been there since the beginning of time — otherwise people wouldn't have been using int to specify the sizes of things.

  2. Code Hospitality

    - software with living structure

    Recently on the Greater than Code podcast there was an episode called "Code Hospitality", by Nadia Odunayo.

    Nadia talks about thinking of how to make people comfortable in your code and in your team/organization/etc., and does it in terms of thinking about host/guest relationships. Have you ever stayed in an AirBnB where the host carefully prepares some "welcome instructions" for you, or puts little notes in their apartment to orient/guide you, or gives you basic guidance around their city's transportation system? We can think in similar ways of how to make people comfortable with code bases.

    This of course hit me on so many levels, because in the past I've written about analogies between software and urbanism/architecture. Software that has the Quality Without A Name talks about Christopher Alexander's architecture/urbanism patterns in the context of software, based on Richard Gabriel's ideas, and Nikos Salingaros's formalization of the design process. Legacy Systems as Old Cities talks about how GNOME evolved parts of its user-visible software, and makes an analogy with cities that evolve over time instead of being torn down and rebuilt, based on urbanism ideas by Jane Jacobs, and architecture/construction ideas by Stewart Brand.

    I definitely intend to do some thinking on Nadia's ideas for Code Hospitality and try to connect them with this.

    In the meantime, I've just rewritten the README in gnome-class to make it suitable as an introduction to hacking there.

  3. Rust+GNOME Hackfest in Berlin, 2017

    - Berlin, gnome, hackfests, rust

    Last weekend I was in Berlin for the second Rust+GNOME Hackfest, kindly hosted at the Kinvolk office. This is in a great location, half a block away from the Kottbusser Tor station, right at the entrance of the trendy Kreuzberg neighborhood, which is full of interesting people, incredible graffiti, and good, diverse food.

    Rug of Kottbusser Tor

    My goals for the hackfest

    Over the past weeks I had been converting gnome-class from the old lalrpop-based parser into the new Procedural Macros framework for Rust, or proc-macro2 for short. To do this, the parser for the gnome-class mini-language has to be rewritten: instead of being specified as a lalrpop grammar, it now uses Rust's syn crate.

    Syn is a parser for Rust source code, written as a set of nom combinator parser macros. For gnome-class we want to extend the Rust language with a few conveniences to be able to specify GObject classes/subclasses, methods, signals, properties, interfaces, and all the goodies that GObject Introspection would expect.

    During the hackfest, Alex Crichton, from the Rust core team, kindly took over my baby steps in compiler writing and made everything much more functional. It was invaluable to have him there to reason about macro hygiene (we are generating an unhygienic macro!), bugs in the quoting system, and general Rust-iness of the whole thing.

    I was also able to talk to Sebastian Dröge about his work in writing GObjects in Rust by hand, for GStreamer, and what sort of things gnome-class could make easier. Sebastian knows GObject very well, and has been doing awesome work in making it easy to derive GObjects by hand in Rust, without lots of boilerplate — something with which gnome-class can certainly help.

    I was also looking forward to talking again with Guillaume Gomez, one of the maintainers of gtk-rs, and who does so much work in the Rust ecosystem that I can't believe he has time for it all.

    Graffiti heads

    Extend the Rust language for GObject? Like Vala?

    Yeah, pretty much.

    Except that instead of a wholly new language, we use Rust as-is, and we just add syntactic constructs that make it easy to write GObjects without boilerplate. For example, this works right now:

    #![feature(proc_macro)]
    
    extern crate gobject_gen;
    
    #[macro_use]
    extern crate glib;
    use gobject_gen::gobject_gen;
    
    gobject_gen! {
        // Derives from GObject
        class One {
        }
    
        impl One {
            // non-virtual method
            pub fn one(&self) -> u32 {
                1
            }
    
            virtual fn get(&self) -> u32 {
                1
            }
        }
    
        // Inherits from our other class
        class Two: One {
        }
    
        impl One for Two {
            // overrides the virtual method
            // maybe we should use "override" instead of "virtual" here?
            virtual fn get(&self) -> u32 {
                2
            }
        }
    }
    
    #[test]
    fn test() {
        let one = One::new();
        let two = Two::new();
    
        assert!(one.one() == 1);
        assert!(one.get() == 1);
        assert!(two.one() == 1);
        assert!(two.get() == 2);
    }
    

    This expands to a little boatload of generated code, including a good number of unsafe calls to GObject functions like g_type_register_static_simple(). It also creates all the traits and paraphernalia that Glib-rs would create for the Rust binding of a normal GObject written in C.

    The idea is that from the outside world, your generated GObject classes are indistinguishable from GObjects implemented in C.

    The broader goal is to write GObject libraries in a better language than C, which can then be consumed from language bindings.

    Current status of gnome-class

    Up to about two weeks before the hackfest, the syntax for this mini-language was totally ad-hoc and limited. After a very productive discussion on the mailing list, we came up with a better syntax that definitely looks more Rust-like. It is also easier to implement, since the Rust parser in syn can be mostly reused as-is, or pruned down for the parts where we only support GObject-like methods, and not all the Rust bells and whistles (generics, lifetimes, trait bounds).

    Gnome-class supports deriving classes directly from the basic GObject, or from other GObject subclasses in the style of glib-rs.

    You can define virtual and non-virtual methods. You can override virtual methods from your superclasses.

    Not all argument types are supported. In the end we should support argument types which are convertible from Rust to C types. We need to finish figuring out the annotations for ownership transfer of references.

    We don't support GObject signals yet; I think that's my next task.

    We don't support GObject properties yet.

    We don't support defining new GType interfaces yet, but it is planned. It should be easy to support implementing existing interfaces, as it is pretty much the same as implementing a subclass.

    The best way to see what works right now is probably to look at the examples, which also work as tests.

    Digression on macro hygiene

    Rust macros are hygienic, unlike C macros which work just through textual substitution. That is, names declared inside Rust macros will not clash with names in the calling code.
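
    A tiny example of what hygiene means for local variables, using macro_rules! rather than a procedural macro. The macro double_it and the function hygiene_demo are made up for this illustration.

```rust
macro_rules! double_it {
    ($e:expr) => {{
        // This `x` belongs to the macro; it cannot capture the caller's `x`.
        let x = $e;
        x * 2
    }};
}

fn hygiene_demo() -> i32 {
    let x = 10;
    // Inside the macro, `x + 1` still refers to the caller's x, so this is (10 + 1) * 2.
    let doubled = double_it!(x + 1);
    // The caller's x is untouched by the macro's own `let x`.
    x + doubled
}
```

    With a C macro doing textual substitution, the inner declaration would shadow the outer x and the expression would refer to the wrong variable.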

    One peculiar thing about gnome-class is that the user gives us a few names, like a class name Foo and some things inside it, say, a method name bar, and a signal baz and a property qux. From there we want to generate a bunch of boilerplate for GObject registration and implementation. Some of the generated names in that boilerplate would be

    Foo              // base name
    FooClass         // generated name for the class struct
    Foo::bar()       // A method
    Foo::emit_baz()  // Generated from the signal name
    Foo::set_qux()   // Generated property setter
    foo_bar()        // Generated C function for a method call
    foo_get_type()   // Generated C function that all GObjects have
    

    However, if we want to actually generate those names inside our gnome-class macro and make them visible to the caller, we need to do so unhygienically. Alex started a very interesting discussion on macro hygiene, so expect some news in the Rust world soon.

    TL;DR: there is a difference between a code generator, which gnome-class mostly intends to be, and a macro system which is just an aid in typing repetitive code.

    Fuck wars

    People for whom to be thankful

    During the hackfest, Nirbheek has been porting librsvg from Autotools to the Meson build system, and dealing with Rust peculiarities along the way. This is exactly what I needed! Thanks, Nirbheek!

    Sebastian answered many of my questions about GObject internals and how to use them from the Rust side.

    Zeeshan took us to a bunch of good restaurants. Korean, ramen, Greek, excellent pizza... My stomach is definitely thankful.

    Berlin

    I love Berlin. It is a cosmopolitan, progressive, LGBTQ-friendly city, with lots of things to do, vast distances to be traveled, with good public transport and bike lanes, diverse food to be eaten along the way...

    But damnit, it's also cold at this time of the year. I don't think the weather was ever above 10°C while we were there, and mostly in a constant state of not-quite-rain. This is much different from the Berlin in the summer that I knew!

    Hackers at Kimchi Princess

    This is my third time visiting Berlin. The first one was during the Desktop Summit in 2011, and the second one was when my family and I visited the city two years ago. It is a city that I would definitely like to know better.

    Thanks to the GNOME Foundation...

    ... for sponsoring my travel and accommodation during the hackfest.

    Sponsored by the GNOME Foundation

  4. Compilation notifications in Emacs

    - emacs

    Here is a little Emacs Lisp snippet that I've started using. It makes Emacs pop up a desktop-wide notification when a compilation finishes, i.e. after "M-x compile" is done. Let's see if that keeps me from wasting time on the web when I launch a compilation.

    (setq compilation-finish-functions
          (append compilation-finish-functions
              '(fmq-compilation-finish)))
    
    (defun fmq-compilation-finish (buffer status)
      (call-process "notify-send" nil nil nil
            "-t" "0"
            "-i" "emacs"
            "Compilation finished in Emacs"
            status))
    
  5. How glib-rs works, part 3: Boxed types

    - gnome, rust

    (First part of the series, with index to all the articles)

    Now let's get on and see how glib-rs handles boxed types.

    Boxed types?

    Let's say you are given a sealed cardboard box with something, but you can't know what's inside. You can just pass it on to someone else, or burn it. And since computers are magic duplication machines, you may want to copy the box and its contents... and maybe some day you will get around to opening it.

    That's a boxed type. You get a pointer to something, who knows what's inside. You can just pass it on to someone else, burn it — I mean, free it — or since computers are magic, copy the pointer and whatever it points to.

    That's exactly the API for boxed types.

    typedef gpointer (*GBoxedCopyFunc) (gpointer boxed);
    typedef void (*GBoxedFreeFunc) (gpointer boxed);
    
    GType g_boxed_type_register_static (const gchar   *name,
                                        GBoxedCopyFunc boxed_copy,
                                        GBoxedFreeFunc boxed_free);
    

    Simple copying, simple freeing

    Imagine you have a color...

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
    } Color;
    

    If you had a pointer to a Color, how would you copy it? Easy:

    Color *copy_color (Color *a)
    {
        Color *b = g_new (Color, 1);
        *b = *a;
        return b;
    }
    

    That is, allocate a new Color, and essentially memcpy() the contents.

    And to free it? A simple g_free() works — there are no internal things that need to be freed individually.

    Complex copying, complex freeing

    And if we had a color with a name?

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
        char *name;
    } ColorWithName;
    

    We can't just *b = *a here, as we actually need to copy the string name. Okay:

    ColorWithName *copy_color_with_name (ColorWithName *a)
    {
        ColorWithName *b = g_new (ColorWithName, 1);
        b->r = a->r;
        b->g = a->g;
        b->b = a->b;
        b->name = g_strdup (a->name);
        return b;
    }
    

    The corresponding free_color_with_name() would g_free(b->name) and then g_free(b), of course.
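
    For comparison, this is roughly what the same two structs look like as plain Rust types, outside of any Glib machinery. It is only a sketch to contrast with the C versions; the Rust function copy_color_with_name is a made-up counterpart of the C one.

```rust
// The simple case: a bitwise copy is enough, so the type can be Copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

// The complex case: Clone duplicates the heap-allocated name as well,
// which is what g_strdup() does in the C version.
#[derive(Clone, Debug, PartialEq)]
struct ColorWithName {
    r: u8,
    g: u8,
    b: u8,
    name: String,
}

/// Counterpart of the C copy function: a new heap allocation with a deep copy.
fn copy_color_with_name(a: &ColorWithName) -> Box<ColorWithName> {
    Box::new(a.clone())
}
```

    There is no free function to write: when the Box goes out of scope, Rust drops the String first and then the allocation itself.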

    Glib-rs and boxed types

    Let's look at this by parts. First, a BoxedMemoryManager trait to define the basic API to manage the memory of boxed types. This is what defines the copy and free functions, like above.

    pub trait BoxedMemoryManager<T>: 'static {
        unsafe fn copy(ptr: *const T) -> *mut T;
        unsafe fn free(ptr: *mut T);
    }
    

    Second, the actual representation of a Boxed type:

    pub struct Boxed<T: 'static, MM: BoxedMemoryManager<T>> {
        inner: AnyBox<T>,
        _dummy: PhantomData<MM>,
    }
    

    This struct is generic over T, the actual type that we will be wrapping, and MM, something which must implement the BoxedMemoryManager trait.

    Inside, it stores inner, an AnyBox, which we will see shortly. The _dummy: PhantomData<MM> is a Rust-ism to indicate that although this struct doesn't actually store a memory manager, it acts as if it does — it does not concern us here.

    The actual representation of boxed data

    Let's look at that AnyBox that is stored inside a Boxed:

    enum AnyBox<T> {
        Native(Box<T>),
        ForeignOwned(*mut T),
        ForeignBorrowed(*mut T),
    }
    

    We have three cases:

    • Native(Box<T>) - this boxed value T comes from Rust itself, so we know everything about it!

    • ForeignOwned(*mut T) - this boxed value T came from the outside, but we own it now. We will have to free it when we are done with it.

    • ForeignBorrowed(*mut T) - this boxed value T came from the outside, but we are just borrowing it temporarily: we don't want to free it when we are done with it.

    For example, if we look at the implementation of the Drop trait for the Boxed struct, we will indeed see that it calls the BoxedMemoryManager::free() only if we have a ForeignOwned value:

    impl<T: 'static, MM: BoxedMemoryManager<T>> Drop for Boxed<T, MM> {
        fn drop(&mut self) {
            unsafe {
                if let AnyBox::ForeignOwned(ptr) = self.inner {
                    MM::free(ptr);
                }
            }
        }
    }
    

    If we had a Native(Box<T>) value, it means it came from Rust itself, and Rust knows how to Drop its own Box<T> (i.e. a chunk of memory allocated in the heap).

    But for external resources, we must tell Rust how to manage them. Again: in the case where the Rust side owns the reference to the external boxed data, we have a ForeignOwned and Drop it by free()ing it; in the case where the Rust side is just borrowing the data temporarily, we have a ForeignBorrowed and don't touch it when we are done.
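
    The owned/borrowed distinction can be tried out in isolation. The following is a simplified, self-contained sketch and not the glib-rs code: the names MemoryManager, CountingManager, Ownership and Wrapper are made up here, and the "foreign" memory is simulated with Box so that the example runs without any C library.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to free() so the effect of Drop can be observed.
static FREE_CALLS: AtomicUsize = AtomicUsize::new(0);

trait MemoryManager<T> {
    unsafe fn free(ptr: *mut T);
}

struct CountingManager;

impl MemoryManager<u32> for CountingManager {
    unsafe fn free(ptr: *mut u32) {
        // Assumption for this sketch: the pointer came from Box::into_raw().
        drop(unsafe { Box::from_raw(ptr) });
        FREE_CALLS.fetch_add(1, Ordering::SeqCst);
    }
}

enum Ownership<T> {
    Owned(*mut T),
    Borrowed(*mut T),
}

struct Wrapper<T, MM: MemoryManager<T>> {
    inner: Ownership<T>,
    _dummy: PhantomData<MM>,
}

impl<T, MM: MemoryManager<T>> Drop for Wrapper<T, MM> {
    fn drop(&mut self) {
        // Only data that we own gets freed; borrowed data is left alone.
        if let Ownership::Owned(ptr) = self.inner {
            unsafe { MM::free(ptr) };
        }
    }
}

/// Wraps a fresh heap value, drops the wrapper, and reports how many times free() ran.
fn free_calls_when_dropped(owned: bool) -> usize {
    let raw = Box::into_raw(Box::new(42u32));
    let before = FREE_CALLS.load(Ordering::SeqCst);
    let inner = if owned { Ownership::Owned(raw) } else { Ownership::Borrowed(raw) };
    drop(Wrapper::<u32, CountingManager> { inner, _dummy: PhantomData });
    let after = FREE_CALLS.load(Ordering::SeqCst);
    if !owned {
        // The data was only borrowed, so its real owner releases it here.
        drop(unsafe { Box::from_raw(raw) });
    }
    after - before
}
```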

    Copying

    When do we have to copy a boxed value? For example, when we transfer from Rust to Glib with full transfer of ownership, i.e. the to_glib_full() pattern that we saw before. This is how that trait method is implemented for Boxed:

    impl<'a, T: 'static, MM: BoxedMemoryManager<T>> ToGlibPtr<'a, *const T> for Boxed<T, MM> {
        fn to_glib_full(&self) -> *const T {
            use self::AnyBox::*;
            let ptr = match self.inner {
                Native(ref b) => &**b as *const T,
                ForeignOwned(p) | ForeignBorrowed(p) => p as *const T,
            };
            unsafe { MM::copy(ptr) }
        }
    }
    

    See the MM::copy(ptr) in the last line? That's where the copy happens. The lines above just get the appropriate pointer to the data from the AnyBox and cast it.
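
    The pointer juggling is easier to follow in a standalone version. This sketch is not the glib-rs implementation: it keeps only two variants, and copy_with_box is a made-up stand-in for MM::copy() that clones the value into a new Box.

```rust
enum AnyBox<T> {
    Native(Box<T>),
    ForeignBorrowed(*mut T),
}

/// Stand-in for MM::copy(): allocates a new T holding a clone of *ptr.
unsafe fn copy_with_box<T: Clone>(ptr: *const T) -> *mut T {
    Box::into_raw(Box::new(unsafe { (*ptr).clone() }))
}

/// "Full transfer": the caller receives a brand-new copy and must free it.
fn to_full<T: Clone>(inner: &AnyBox<T>) -> *mut T {
    let ptr: *const T = match inner {
        // b is a &Box<T>; **b is the T inside the box, so &**b is a &T.
        AnyBox::Native(b) => &**b as *const T,
        // A foreign pointer only needs a cast from *mut to *const.
        AnyBox::ForeignBorrowed(p) => *p as *const T,
    };
    unsafe { copy_with_box(ptr) }
}
```

    In both cases the original value stays valid after the call, since only the copy is handed over.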

    There is extra boilerplate in boxed.rs which you can look at; it's mostly a bunch of trait implementations to copy the boxed data at the appropriate times (e.g. the FromGlibPtrNone trait), also an implementation of the Deref trait to get to the contents of a Boxed / AnyBox easily, etc. The trait implementations are there just to make it as convenient as possible to handle Boxed types.

    Who implements BoxedMemoryManager?

    Up to now, we have seen things like the implementation of Drop for Boxed, which uses BoxedMemoryManager::free(), and the implementation of ToGlibPtr which uses ::copy().

    But those are just the trait's "abstract" methods, so to speak. What actually implements them?

    Glib-rs has a general-purpose macro to wrap Glib types. It can wrap boxed types, shared pointer types, and GObjects. For now we will just look at boxed types.

    Glib-rs comes with a macro, glib_wrapper!(), that can be used in different ways. You can use it to automatically write the boilerplate for a boxed type like this:

    glib_wrapper! {
        pub struct Color(Boxed<ffi::Color>);
    
        match fn {
            copy => |ptr| ffi::color_copy(mut_override(ptr)),
            free => |ptr| ffi::color_free(ptr),
            get_type => || ffi::color_get_type(),
        }
    }
    

    This expands to an internal glib_boxed_wrapper!() macro that does a few things. We will only look at particularly interesting bits.

    First, the macro creates a newtype around a tuple with 1) the actual data type you want to box, and 2) a memory manager. In the example above, the newtype would be called Color, and it would wrap an ffi::Color (say, a C struct).

            pub struct $name(Boxed<$ffi_name, MemoryManager>);
    

    Aha! And that MemoryManager? The macro defines it as a zero-sized type:

            pub struct MemoryManager;
    

    Then it implements the BoxedMemoryManager trait for that MemoryManager struct:

            impl BoxedMemoryManager<$ffi_name> for MemoryManager {
                #[inline]
                unsafe fn copy($copy_arg: *const $ffi_name) -> *mut $ffi_name {
                    $copy_expr
                }
    
                #[inline]
                unsafe fn free($free_arg: *mut $ffi_name) {
                    $free_expr
                }
            }
    

    There! This is where the copy/free methods are implemented, based on the bits of code with which you invoked the macro. In the call to glib_wrapper!() we had this:

            copy => |ptr| ffi::color_copy(mut_override(ptr)),
            free => |ptr| ffi::color_free(ptr),
    

    In the impl above, the $copy_expr will expand to ffi::color_copy(mut_override(ptr)) and $free_expr will expand to ffi::color_free(ptr), which defines our implementation of a memory manager for our Color boxed type.
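
    To see the mechanism at a smaller scale, here is a toy macro in the same spirit. It is a sketch, not the real glib_wrapper!(): the macro name boxed_wrapper, its syntax, and the RawColor type are invented for this example, and the copy/free expressions use Box instead of C functions so that it runs on its own.

```rust
use std::marker::PhantomData;

trait MemoryManager<T> {
    unsafe fn copy(ptr: *const T) -> *mut T;
    unsafe fn free(ptr: *mut T);
}

macro_rules! boxed_wrapper {
    (
        $name:ident($inner:ty), $manager:ident;
        copy => |$copy_arg:ident| $copy_expr:expr,
        free => |$free_arg:ident| $free_expr:expr $(,)?
    ) => {
        // The zero-sized memory manager for this particular type.
        struct $manager;

        impl MemoryManager<$inner> for $manager {
            // The caller's argument name and expression are pasted in as-is.
            unsafe fn copy($copy_arg: *const $inner) -> *mut $inner {
                unsafe { $copy_expr }
            }
            unsafe fn free($free_arg: *mut $inner) {
                unsafe { $free_expr }
            }
        }

        // The newtype that ties the data pointer to its memory manager.
        #[allow(dead_code)]
        struct $name(*mut $inner, PhantomData<$manager>);
    };
}

#[derive(Clone, Debug, PartialEq)]
struct RawColor {
    r: u8,
    g: u8,
    b: u8,
}

boxed_wrapper! {
    Color(RawColor), ColorManager;
    copy => |ptr| Box::into_raw(Box::new((*ptr).clone())),
    free => |ptr| drop(Box::from_raw(ptr)),
}

/// Copies a color through the generated manager, reads the copy, and frees it.
fn copy_then_free(original: &RawColor) -> RawColor {
    let raw = original as *const RawColor;
    unsafe {
        let copy = ColorManager::copy(raw);
        let value = (*copy).clone();
        ColorManager::free(copy);
        value
    }
}
```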

    Zero-sized what?

    Within the macro's definition, let's look again at the definitions of our boxed type and the memory manager object that actually implements the BoxedMemoryManager trait. Here is what the macro would expand to with our Color example:

            pub struct Color(Boxed<ffi::Color, MemoryManager>);
    
            pub struct MemoryManager;
    
            impl BoxedMemoryManager<ffi::Color> for MemoryManager {
                unsafe fn copy(...) -> *mut ffi::Color { ... }
                unsafe fn free(...) { ... }
            }
    

    Here, MemoryManager is a zero-sized type. This means it doesn't take up any space in the Color tuple! When a Color is allocated in the heap, it is really as if it contained an ffi::Color (the C struct we are wrapping) and nothing else.

    All the knowledge about how to copy/free ffi::Color lives only in the compiler thanks to the trait implementation. When the compiler expands all the macros and monomorphizes all the generic functions, the calls to ffi::color_copy() and ffi::color_free() will be inlined at the appropriate spots. There is no need to have auxiliary structures taking up space in the heap, just to store function pointers to the copy/free functions, or anything like that.
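
    The size claim can be checked with std::mem::size_of. In this sketch the struct names WithMarker and WithFunctionPointers are made up; the second one shows the alternative design that stores the copy/free function pointers in every value.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// A unit struct has no fields, so it occupies zero bytes.
struct MemoryManager;

// A pointer plus a marker for the memory manager type.
#[allow(dead_code)]
struct WithMarker<T> {
    ptr: *mut T,
    _dummy: PhantomData<MemoryManager>,
}

// The alternative: carry the function pointers around at run time.
#[allow(dead_code)]
struct WithFunctionPointers<T> {
    ptr: *mut T,
    copy: unsafe fn(*const T) -> *mut T,
    free: unsafe fn(*mut T),
}

/// Returns the sizes of the manager, the marker-based struct, and the fat struct.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<MemoryManager>(),
        size_of::<WithMarker<u8>>(),
        size_of::<WithFunctionPointers<u8>>(),
    )
}
```

    On the usual platforms the marker-based struct is exactly one pointer wide, while the other one has to pay for two extra function pointers per value.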

    Next up

    You may have seen that our example call to glib_wrapper!() also passed in a ffi::color_get_type() function. We haven't talked about how glib-rs wraps Glib's GType, GValue, and all of that. We are getting closer and closer to being able to wrap GObject.

    Stay tuned!

  6. Initial posts about librsvg's C to Rust conversion

    - librsvg, rust

    The initial articles about librsvg's conversion to Rust are in my old blog, so they may be a bit hard to find from this new blog. Here is a list of those posts, just so they are easier to find:

    Within this new blog, you can look for articles with the librsvg tag.

  7. The Magic of GObject Introspection

    - gnome, gobject-introspection, rust

    Before continuing with the glib-rs architecture, let's take a detour and look at GObject Introspection. Although it can seem like an obscure part of the GNOME platform, it is an absolutely vital part of it: it is what lets people write GNOME applications in any language.

    Let's start with a bit of history.

    Brief history of language bindings in GNOME

    When we started GNOME in 1997, we didn't want to write all of it in C. We had some inspiration from elsewhere.

    Prehistory: GIMP and the Procedural Database

    There was already good precedent for software written in a combination of programming languages. Emacs, the flagship text editor of the GNU project, was written with a relatively small core in C, and the majority of the program in Emacs Lisp.

    In similar fashion, we were very influenced by the design of the GIMP, which was very innovative at that time. The GIMP has a large core written in C. However, it supports plug-ins or scripts written in a variety of languages. Initially the only scripting language available for the GIMP was Scheme.

    The GIMP's plug-ins and scripts run as separate processes, so they don't have immediate access to the data of the image being edited, or to the core functions of the program like "paint with a brush at this location". To let plug-ins and scripts access these data and these functions, the GIMP has what it calls a Procedural Database (PDB). This is a list of functions that the core program or plug-ins wish to export. For example, there are functions like gimp-scale-image and gimp-move-layer. Once these functions are registered in the PDB, any part of the program or plug-ins can call them. Scripts are often written to automate common tasks — for example, when one wants to adjust the contrast of photos and scale them in bulk. Scripts can call functions in the PDB easily, irrespective of the programming language they are written in.

    We wanted to write GNOME's core libraries in C, and write a similar Procedural Database to allow those libraries to be called from any programming language. Eventually it turned out that a PDB was not necessary, and there were better ways to go about enabling different programming languages.

    Enabling sane memory management

    GTK+ started out with a very simple scheme for memory management: a container owned its child widgets, and so on recursively. When you freed a container, it would be responsible for freeing its children.

    However, consider what happens when a widget needs to hold a reference to another widget that is not one of its children. For example, a GtkLabel with an underlined mnemonic ("_N_ame:") needs to have a reference to the GtkEntry that should be focused when you press Alt-N. In the very earliest versions of GTK+, how to do this was undefined: C programmers were already used to having shared pointers everywhere, and they were used to being responsible for managing their memory.

    Of course, this was prone to bugs. If you have something like

    typedef struct {
        GtkWidget parent;
    
        char *label_string;
        GtkWidget *widget_to_focus;
    } GtkLabel;
    

    then if you are writing the destructor, you may simply want to

    static void
    gtk_label_free (GtkLabel *label)
    {
        g_free (label->label_string);
        gtk_widget_free (label->widget_to_focus);   /* oops, we don't own this */
    
        free_parent_instance (&label->parent);
    }
    

    Say you have a GtkBox with the label and its associated GtkEntry. Then, freeing the GtkBox would recursively free the label with that gtk_label_free(), and then the entry with its own function. But by the time the entry gets freed, the gtk_widget_free() call in gtk_label_free() has already freed the entry, and we get a double-free bug!

    Madness!

    That is, we had no idea what we were doing. Or rather, our understanding of widgets had not evolved to the point of acknowledging that a widget tree is not simply a tree, but rather a directed graph of container-child relationships, plus random-widget-to-random-widget relationships. And of course, other parts of the program which are not even widget implementations may need to keep references to widgets and free them or not as appropriate.

    I think Marius Vollmer was the first person to start formalizing this. He came from the world of GNU Guile, a Scheme interpreter, and so he already knew how garbage collection and seas of shared references ought to work.

    Marius implemented reference-counting for GTK+ — that's where gtk_object_ref() and gtk_object_unref() come from; they eventually got moved to the base GObject class, so we now have g_object_ref() and g_object_unref() and a host of functions to have weak references, notification of destruction, and all the things required to keep garbage collectors happy.
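
    As an analogy only, the label/entry situation can be modeled with Rust's Rc, where Rc::clone() plays the role of g_object_ref() and dropping an Rc plays the role of g_object_unref(). The Entry, Label and Container types below are made up for this sketch and have nothing to do with the real GTK+ structs.

```rust
use std::rc::Rc;

struct Entry {
    text: String,
}

struct Label {
    #[allow(dead_code)]
    text: String,
    // A shared reference to a widget that is not the label's child.
    widget_to_focus: Rc<Entry>,
}

struct Container {
    label: Label,
    entry: Rc<Entry>,
}

fn build() -> Container {
    let entry = Rc::new(Entry { text: String::new() });
    let label = Label {
        text: String::from("_Name:"),
        // "ref": the label takes its own reference to the entry.
        widget_to_focus: Rc::clone(&entry),
    };
    Container { label, entry }
}
```

    Dropping the container's reference just decrements the count; the entry is freed exactly once, when the last reference goes away, so the double free from the earlier destructor cannot happen.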

    The first language bindings

    The very first language bindings were written by hand. The GTK+ API was small, and it seemed feasible to take

    void gtk_widget_show (GtkWidget *widget);
    void gtk_widget_hide (GtkWidget *widget);
    
    void gtk_container_add (GtkContainer *container, GtkWidget *child);
    void gtk_container_remove (GtkContainer *container, GtkWidget *child);
    

    and just wrap those functions in various languages, by hand, on an as-needed basis.

    Of course, there is a lot of duplication when doing things that way. As the C API grows, one needs to do more and more manual work to keep up with it.

    Also, C structs with public fields are problematic. If we had

    typedef struct {
        guchar r;
        guchar g;
        guchar b;
    } GdkColor;
    

    and we expect program code to fill in a GdkColor by hand and pass it to a drawing function like

    void gdk_set_foreground_color (GdkDrawingContext *gc, GdkColor *color);
    

    then it is no problem to do that in C:

    GdkColor magenta = { 255, 0, 255 };
    
    gdk_set_foreground_color (gc, &magenta);
    

    But to do that in a high level language? You don't have access to C struct fields! And back then, libffi wasn't generally available.

    Authors of language bindings had to write some glue code, in C, by hand, to let people access a C struct and then pass it on to GTK+. For example, for Python, they would need to write something like

    PyObject *
    make_wrapped_gdk_color (PyObject *args, PyObject *kwargs)
    {
        GdkColor *g_color;
        PyObject *py_color;
    
        g_color = g_new (GdkColor, 1);
        /* ... fill in g_color->r, g, b from the Python args */
    
        py_color = wrap_g_color (g_color);
        return py_color;
    }
    

    Writing that by hand is an incredible amount of drudgery.

    What language bindings needed was a description of the API in a machine-readable format, so that the glue code could be written by a code generator.

    The first API descriptions

    I don't remember if it was the GNU Guile people, or the PyGTK people, who started to write descriptions of the GNOME API by hand. For ease of parsing, it was done in a Scheme-like dialect. A description may look like

    (class GtkWidget
           ;;; void gtk_widget_show (GtkWidget *widget);
           (method show
                   (args nil)
                   (retval nil))
    
           ;;; void gtk_widget_hide (GtkWidget *widget);
           (method hide
                   (args nil)
                   (retval nil)))
    
    (class GtkContainer
           ;;; void gtk_container_add (GtkContainer *container, GtkWidget *child);
           (method add
                   (args GtkWidget)
                   (retval nil)))
    
    (struct GdkColor
            (field r (type 'guchar))
            (field g (type 'guchar))
            (field b (type 'guchar))) 
    

    Again, writing those descriptions by hand (and keeping up with the C API) was a lot of work, but the glue code to implement the binding could be done mostly automatically. The generated code may need subsequent tweaks by hand to deal with details that the Scheme-like descriptions didn't contemplate, but it was better than writing everything by hand.
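
    To give an idea of why a Scheme-like syntax was convenient for tools, here is a minimal sketch in Python of a reader for descriptions like the one above. The names parse_sexpr and c_symbols, and the rule that turns GtkWidget into gtk_widget, are assumptions made for illustration; they are not taken from the generators that were actually in use.

```python
import re

DESCRIPTION = """
(class GtkWidget
       ;;; void gtk_widget_show (GtkWidget *widget);
       (method show
               (args nil)
               (retval nil))

       ;;; void gtk_widget_hide (GtkWidget *widget);
       (method hide
               (args nil)
               (retval nil)))

(struct GdkColor
        (field r (type 'guchar))
        (field g (type 'guchar))
        (field b (type 'guchar)))
"""


def parse_sexpr(text):
    """Parse parenthesized forms into nested Python lists of strings."""
    text = re.sub(r";[^\n]*", "", text)          # strip ';' comments first
    tokens = re.findall(r"[()]|[^\s()]+", text)  # parens and bare atoms
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # consume the closing ")"
        return items

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def c_symbols(forms):
    """Rebuild C function names for each (class ... (method ...)) form."""
    symbols = []
    for form in forms:
        if form[0] != "class":
            continue
        # GtkWidget -> gtk_widget (assumed naming rule, for illustration)
        prefix = re.sub(r"(?<!^)(?=[A-Z])", "_", form[1]).lower()
        for item in form[2:]:
            if item[0] == "method":
                symbols.append(f"{prefix}_{item[1]}")
    return symbols


forms = parse_sexpr(DESCRIPTION)
print(c_symbols(forms))
```

    A code generator would walk the same nested lists to emit glue code instead of just printing names.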

    Glib gets a real type system

    Tim Janik took over the parts of Glib that implement objects/signals/types, and added a lot of things to create a good type system for C. This is where things like GType, GValue, GParamSpec, and fundamental types come from.

    For example, a GType is an identifier for a type, and a GValue is a type plus, well, a value of that type. You can ask a GValue, "are you an int? are you a GObject?".

    You can register new types: for example, there would be code in Gdk that registers a new GType for GdkColor, so you can ask a value, "are you a color?".

    Registering a type involves telling the GObject system things like how to copy values of that type, and how to free them. For GdkColor this may be just g_new() / g_free(); for reference-counted objects it may be g_object_ref() / g_object_unref().
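
    As a rough mental model only, the idea of "a registered type knows how to copy and free its values, and a value carries its type" can be sketched in a few lines of Python. Everything here (TypeInfo, register_type, Value) is a hypothetical toy, not the GObject API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TypeInfo:
    name: str
    copy_func: Callable[[Any], Any]   # how to duplicate a value of this type
    free_func: Callable[[Any], None]  # how to release a value of this type


_types = {}  # type id -> TypeInfo


def register_type(name, copy_func, free_func):
    """Register a type and return its identifier (the toy 'GType')."""
    type_id = len(_types) + 1
    _types[type_id] = TypeInfo(name, copy_func, free_func)
    return type_id


@dataclass
class Value:
    """A type identifier plus a value of that type (the toy 'GValue')."""
    type_id: int
    data: Any

    def holds(self, type_id):
        return self.type_id == type_id

    def dup(self):
        return Value(self.type_id, _types[self.type_id].copy_func(self.data))


TYPE_INT = register_type("gint", lambda v: v, lambda v: None)
TYPE_COLOR = register_type("GdkColor", lambda c: dict(c), lambda c: None)

magenta = Value(TYPE_COLOR, {"r": 255, "g": 0, "b": 255})
print(magenta.holds(TYPE_COLOR), magenta.holds(TYPE_INT))
```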

    Objects can be queried about some of their properties

    A widget can tell you when you press a mouse button on it: it will emit the button-press-event signal. When GtkWidget's implementation registers this signal, it calls something like

        g_signal_new ("button-press-event",
            gtk_widget_get_type(), /* type of object for which this signal is being created */
            ...
            G_TYPE_BOOLEAN,  /* type of return value */
            1,               /* number of arguments */
            GDK_TYPE_EVENT); /* type of first and only argument */
    

    This tells GObject that GtkWidget will have a signal called button-press-event, with a return type of G_TYPE_BOOLEAN, and with a single argument of type GDK_TYPE_EVENT. This lets GObject do the appropriate marshalling of arguments when the signal is emitted.

    But also! You can query the signal for its argument types! You can run g_signal_query(), which will then tell you all the details of the signal: its name, return type, argument types, etc. A language binding could run g_signal_query() and automatically generate a description of the signal in the Scheme-like description language, and then generate the binding from that.
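
    The register-then-query flow can be modelled with a toy registry. This is a sketch in Python, not the GObject API: signal_new, signal_query and describe are stand-ins, and the "(signal ...)" output form is invented for illustration, since the post does not show how signals were written in the Scheme-like files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalQuery:
    name: str
    owner: str
    return_type: str
    param_types: tuple


_signals = {}  # (owner type, signal name) -> SignalQuery


def signal_new(name, owner, return_type, *param_types):
    """Record a signal's shape at registration time."""
    _signals[(owner, name)] = SignalQuery(name, owner, return_type, param_types)


def signal_query(owner, name):
    """Return everything that was recorded about the signal."""
    return _signals[(owner, name)]


def describe(query):
    """Emit a description in a hypothetical Scheme-like form."""
    args = " ".join(query.param_types) or "nil"
    return (f"(signal {query.name}\n"
            f"        (of-object {query.owner})\n"
            f"        (args {args})\n"
            f"        (retval {query.return_type}))")


signal_new("button-press-event", "GtkWidget", "gboolean", "GdkEvent")
print(describe(signal_query("GtkWidget", "button-press-event")))
```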

    Not all of an object's properties can be queried

    Unfortunately, although GObject signals and properties can be queried, methods can't be. C doesn't have classes with methods, and GObject does not really have any provisions to implement them.

    Conventionally, for a static method one would just do

    void
    gtk_widget_set_flags (GtkWidget *widget, GtkWidgetFlags flags)
    {
        /* modify a struct field within "widget" or whatever */
        /* repaint or something */
    }
    

    And for a virtual method one would put a function pointer in the class structure, and provide a convenient way to call it:

    typedef struct {
        GtkObjectClass parent_class;
    
        void (* draw) (GtkWidget *widget, cairo_t *cr);
    } GtkWidgetClass;
    
    void
    gtk_widget_draw (GtkWidget *widget, cairo_t *cr)
    {
        GtkWidgetClass *klass = find_widget_class (widget);
    
        (* klass->draw) (widget, cr);
    }
    

    And GObject has no idea about this method — there is no way to query it; it just exists in C-space.

    Now, historically, GTK+'s header files have been written in a very consistent style. It is quite possible to write a tool that will take a header file like

    /* gtkwidget.h */
    typedef struct {
        GtkObjectClass parent_class;
    
        void (* draw) (GtkWidget *widget, cairo_t *cr);
    } GtkWidgetClass;
    
    void gtk_widget_set_flags (GtkWidget *widget, GtkWidgetFlags flags);
    void gtk_widget_draw (GtkWidget *widget, cairo_t *cr);
    

    and parse it, even if it is with a simple parser that does not completely understand the C language, and have heuristics like

    • Is there a class_name_foo() function prototype with no corresponding foo field in the Class structure? It's probably a static method.

    • Is there a class_name_bar() function with a bar field in the Class structure? It's probably a virtual method.

    • Etc.

    And in fact, that's what we had. C header files would get parsed with those heuristics, and the Scheme-like description files would get generated.
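
    The two heuristics above fit in a handful of lines. This is a minimal sketch in Python, assuming the header follows the consistent style shown earlier; classify_methods and its regular expressions are illustrative and are not the actual tool that was used.

```python
import re

HEADER = """
typedef struct {
    GtkObjectClass parent_class;

    void (* draw) (GtkWidget *widget, cairo_t *cr);
} GtkWidgetClass;

void gtk_widget_set_flags (GtkWidget *widget, GtkWidgetFlags flags);
void gtk_widget_draw (GtkWidget *widget, cairo_t *cr);
"""


def classify_methods(header, symbol_prefix):
    """Guess 'static' vs 'virtual' for each prefix_name() prototype."""
    # Function-pointer fields such as "void (* draw) (...)" in the class struct.
    vfuncs = set(re.findall(r"\(\s*\*\s*(\w+)\s*\)\s*\(", header))
    # Prototypes such as "void gtk_widget_draw (...)".
    protos = re.findall(rf"\b{re.escape(symbol_prefix)}_(\w+)\s*\(", header)
    return {name: ("virtual" if name in vfuncs else "static") for name in protos}


print(classify_methods(HEADER, "gtk_widget"))
```

    A parser this naive only works because the headers are so regular, which is exactly the point of the heuristics.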

    Scheme-like descriptions get reused, kind of

    Language binding authors started reusing the Scheme-like descriptions. Sometimes they would cannibalize the descriptions from PyGTK, or Guile (again, I don't remember where the canonical version was maintained) and use them as they were.

    Other times they would copy the files, modify them by hand some more, and then use them to generate their language binding.

    C being hostile

    From just reading/parsing a C function prototype, you cannot know certain things. If one function argument is of type Foo *, does it mean:

    • the function gets a pointer to something which it should not modify ("in" parameter)

    • the function gets a pointer to uninitialized data which it will set ("out" parameter)

    • the function gets a pointer to initialized data which it will use and modify ("inout" parameter)

    • the function will copy that pointer and hold a reference to the pointed data, and not free it when it's done

    • the function will take over the ownership of the pointed data, and free it when it's done

    • etc.

    Sometimes people would include these annotations in the Scheme-like description language. But wouldn't it be better if those annotations came from the C code itself?

    GObject Introspection appears

    For GNOME 3, we wanted a unified solution for language bindings:

    • Have a single way to extract the machine-readable descriptions of the C API.

    • Have every language binding be automatically generated from those descriptions.

    • In the descriptions, have all the information necessary to generate a correct language binding...

    • ... including documentation.

    We had to do a lot of work to accomplish this. For example:

    • Remove C-isms from the public API. Varargs functions, those that have foo (int x, ...), can't be easily described and called from other languages. Instead, have something like foov (int x, int num_args, GValue *args_array) that can be easily consumed by other languages.

    • Add annotations throughout the code so that the ad-hoc C parser can know about in/out/inout arguments, and whether pointer arguments are borrowed references or a full transfer of ownership.

    • Take the in-line documentation comments and store them as part of the machine-readable description of the API.

    • When compiling a library, automatically do all the things like g_signal_query() and spit out machine-readable descriptions of those parts of the API.

    So, GObject Introspection is all of those things.

    Annotations

    If you have looked at the C code for a GNOME library, you may have seen something like this:

    /**
     * gtk_widget_get_parent:
     * @widget: a #GtkWidget
     *
     * Returns the parent container of @widget.
     *
     * Returns: (transfer none) (nullable): the parent container of @widget, or %NULL
     **/
    GtkWidget *
    gtk_widget_get_parent (GtkWidget *widget)
    {
        ...
    }
    

    See that "(transfer none) (nullable)" in the documentation comments? The (transfer none) means that the return value is a pointer whose ownership does not get transferred to the caller, i.e. the widget retains ownership. Finally, the (nullable) indicates that the function can return NULL, when the widget has no parent.

    A language binding will then use this information as follows:

    • It will not unref() the parent widget when it is done with it.

    • It will deal with a NULL pointer in a special way, instead of assuming that references are not null.
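
    As a sketch of how a tool might pick those annotations out of a documentation comment and turn them into binding decisions, here is a small Python example. The functions parse_return_annotations and binding_plan are hypothetical; the real scanner in GObject Introspection handles many more cases than this.

```python
import re

DOC = """/**
 * gtk_widget_get_parent:
 * @widget: a #GtkWidget
 *
 * Returns the parent container of @widget.
 *
 * Returns: (transfer none) (nullable): the parent container of @widget, or %NULL
 **/"""


def parse_return_annotations(comment):
    """Extract the '(key value)' groups that follow 'Returns:'."""
    match = re.search(r"^\s*\*\s*Returns:\s*((?:\([^)]*\)\s*)+):",
                      comment, re.MULTILINE)
    if match is None:
        return {}
    annotations = {}
    for group in re.findall(r"\(([^)]*)\)", match.group(1)):
        key, _, value = group.partition(" ")
        annotations[key] = value or True  # bare flags like (nullable) become True
    return annotations


def binding_plan(annotations):
    """Decide what a binding should do with the returned pointer."""
    return {
        "unref_when_done": annotations.get("transfer") == "full",
        "may_be_none": bool(annotations.get("nullable", False)),
    }


annotations = parse_return_annotations(DOC)
print(annotations, binding_plan(annotations))
```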

    Every now and then someone discovers a public function which is lacking an annotation of that sort — for GNOME's purposes this is a bug; fortunately, it is easy to add that annotation to the C sources and regenerate the machine-readable descriptions.

    Machine-readable descriptions, or repository files

    So, what do those machine-readable descriptions actually look like? They moved away from a Scheme-like language and got turned into XML, because early XXIst century.

    The machine-readable descriptions are called GObject Introspection Repository files, or GIR for short.

    Let's look at some parts of Gtk-3.0.gir, which your distro may put in /usr/share/gir-1.0/Gtk-3.0.gir.

    <repository version="1.2" ...>
    
      <namespace name="Gtk"
                 version="3.0"
                 shared-library="libgtk-3.so.0,libgdk-3.so.0"
                 c:identifier-prefixes="Gtk"
                 c:symbol-prefixes="gtk">
    

    For the toplevel "Gtk" namespace, this is what the .so library is called. All identifiers have "Gtk" or "gtk" prefixes.

    A class with methods and a signal

    Let's look at the description for GtkEntry...

        <class name="Entry"
               c:symbol-prefix="entry"
               c:type="GtkEntry"
               parent="Widget"
               glib:type-name="GtkEntry"
               glib:get-type="gtk_entry_get_type"
               glib:type-struct="EntryClass">
    
          <doc xml:space="preserve">The #GtkEntry widget is a single line text entry
    widget. A fairly large set of key bindings are supported
    by default. If the entered text is longer than the allocation
    ...
           </doc>
    

    This is the start of the description for GtkEntry. We already know that everything is prefixed with "Gtk", so the name is just given as "Entry". Its parent class is Widget and the function which registers it against the GObject type system is gtk_entry_get_type.

    Also, there are the toplevel documentation comments for the Entry class.

    Onwards!

          <implements name="Atk.ImplementorIface"/>
          <implements name="Buildable"/>
          <implements name="CellEditable"/>
          <implements name="Editable"/>
    

    GObject classes can implement various interfaces; this is the list that GtkEntry supports.

    Next, let's look at a single method:

          <method name="get_text" c:identifier="gtk_entry_get_text">
            <doc xml:space="preserve">Retrieves the contents of the entry widget. ... </doc>
    
            <return-value transfer-ownership="none">
              <type name="utf8" c:type="const gchar*"/>
            </return-value>
    
            <parameters>
              <instance-parameter name="entry" transfer-ownership="none">
                <type name="Entry" c:type="GtkEntry*"/>
              </instance-parameter>
            </parameters>
          </method>
    

    This is the method get_text and its corresponding C symbol. Its return value is a UTF-8 encoded string, and ownership of the memory for that string is not transferred to the caller.

    The method takes a single parameter which is the entry instance itself.
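
    Since GIR is plain XML, any XML library can read it. The following is a minimal sketch with Python's standard xml.etree module, run on a trimmed copy of the snippet above. The xmlns declarations are written out here because the excerpt elides them with "..."; describe_methods is a hypothetical helper, not part of any binding.

```python
import xml.etree.ElementTree as ET

GIR_SNIPPET = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Gtk" version="3.0">
    <class name="Entry" c:type="GtkEntry" parent="Widget">
      <method name="get_text" c:identifier="gtk_entry_get_text">
        <return-value transfer-ownership="none">
          <type name="utf8" c:type="const gchar*"/>
        </return-value>
        <parameters>
          <instance-parameter name="entry" transfer-ownership="none">
            <type name="Entry" c:type="GtkEntry*"/>
          </instance-parameter>
        </parameters>
      </method>
    </class>
  </namespace>
</repository>
"""

NS = {"core": "http://www.gtk.org/introspection/core/1.0"}
C_IDENTIFIER = "{http://www.gtk.org/introspection/c/1.0}identifier"


def describe_methods(gir_xml):
    """List each method with its C symbol, return type and ownership transfer."""
    root = ET.fromstring(gir_xml)
    methods = []
    for cls in root.iterfind(".//core:class", NS):
        for method in cls.iterfind("core:method", NS):
            ret = method.find("core:return-value", NS)
            methods.append({
                "class": cls.get("name"),
                "name": method.get("name"),
                "c_identifier": method.get(C_IDENTIFIER),
                "return_type": ret.find("core:type", NS).get("name"),
                "transfer": ret.get("transfer-ownership"),
            })
    return methods


print(describe_methods(GIR_SNIPPET))
```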

    Now, let's look at a signal:

          <glib:signal name="activate" when="last" action="1">
            <doc xml:space="preserve">The ::activate signal is emitted when the user hits
    the Enter key. ...</doc>
    
            <return-value transfer-ownership="none">
              <type name="none" c:type="void"/>
            </return-value>
          </glib:signal>
    
        </class>
    

    The "activate" signal takes no arguments, and has a return value of type void, i.e. no return value.

    A struct with public fields

    The following comes from Gdk-3.0.gir; it's the description for GdkRectangle.

        <record name="Rectangle"
                c:type="GdkRectangle"
                glib:type-name="GdkRectangle"
                glib:get-type="gdk_rectangle_get_type"
                c:symbol-prefix="rectangle">
    
          <field name="x" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="y" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="width" writable="1">
            <type name="gint" c:type="int"/>
          </field>
          <field name="height" writable="1">
            <type name="gint" c:type="int"/>
          </field>
    
        </record>
    

    So that's the x/y/width/height fields in the struct, in the same order as they are defined in the C code.
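
    Because the fields come with names, types and order, a binding can rebuild the struct layout without any hand-written glue. Here is a sketch using Python's ctypes on a condensed copy of the record above; the GIR_TO_CTYPES table and struct_from_record are assumptions for illustration and only cover gint.

```python
import ctypes
import xml.etree.ElementTree as ET

RECORD_XML = """\
<record xmlns="http://www.gtk.org/introspection/core/1.0"
        xmlns:c="http://www.gtk.org/introspection/c/1.0"
        name="Rectangle" c:type="GdkRectangle">
  <field name="x" writable="1"><type name="gint" c:type="int"/></field>
  <field name="y" writable="1"><type name="gint" c:type="int"/></field>
  <field name="width" writable="1"><type name="gint" c:type="int"/></field>
  <field name="height" writable="1"><type name="gint" c:type="int"/></field>
</record>
"""

NS = {"core": "http://www.gtk.org/introspection/core/1.0"}
GIR_TO_CTYPES = {"gint": ctypes.c_int}  # deliberately tiny mapping


def struct_from_record(record_xml):
    """Build a ctypes.Structure subclass whose layout follows the <field> order."""
    record = ET.fromstring(record_xml)
    fields = [
        (field.get("name"), GIR_TO_CTYPES[field.find("core:type", NS).get("name")])
        for field in record.iterfind("core:field", NS)
    ]
    return type(record.get("name"), (ctypes.Structure,), {"_fields_": fields})


Rectangle = struct_from_record(RECORD_XML)
rect = Rectangle(10, 20, 300, 200)
print(rect.x, rect.y, rect.width, rect.height)
```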

    And so on. The idea is for the whole API exported by a GObject library to be describable by that format. If something can't be described, it's a bug in the library, or a bug in the format.

    Making language bindings start up quickly: typelib files

    As we saw, the GIR files are the XML descriptions of GObject APIs. Dynamic languages like Python would prefer to generate the language binding on the fly, as needed, instead of pre-generating a huge binding.

    However, GTK+ is a big API: Gtk-3.0.gir is 7 MB of XML. Parsing all of that just to be able to generate gtk_widget_show() on the fly would be too slow. Also, there are GTK+'s dependencies: Atk, Gdk, Cairo, etc. You don't want to parse everything just to start up!

    So, we have an extra step that compiles the GIR files down to binary .typelib files. For example, /usr/lib64/girepository-1.0/Gtk-3.0.typelib is about 600 KB on my machine. Those files get mmap()ed for fast access, and can be shared between processes.

    How dynamic language bindings use typelib files

    GObject Introspection comes with a library that language binding implementors can use to consume those .typelib files. The libgirepository library has functions like "list all the classes available in this namespace", or "call this function with these values for arguments, and give me back the return value here".

    Internally, libgirepository uses libffi to actually call the C functions in the dynamically-linked libraries.

    So, when you write foo.py and do

    import gi
    gi.require_version('Gtk', '3.0')
    from gi.repository import Gtk
    win = Gtk.Window()
    

    what happens is that pygobject calls libgirepository to mmap() the .typelib, and sees that the constructor for Gtk.Window is a C function called gtk_window_new(). After seeing how that function wants to be called, it calls the function using libffi, wraps the result with a PyObject, and that's what you get on the Python side.
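
    The "look up a C function by name, learn its signature from data, then call it" step can be imitated with Python's ctypes module, which is itself built on libffi. This sketch does not use pygobject or libgirepository: the STRLEN_INFO dictionary is a hypothetical stand-in for what a typelib entry would provide, and ctypes.CDLL(None) assumes a Unix-like system where that exposes the already-loaded C library.

```python
import ctypes

# Hypothetical stand-in for the information a typelib entry carries.
STRLEN_INFO = {
    "symbol": "strlen",
    "argtypes": [ctypes.c_char_p],
    "restype": ctypes.c_size_t,
}


def invoke(library, info, *args):
    """Resolve the symbol by name, apply the described signature, and call it."""
    func = getattr(library, info["symbol"])
    func.argtypes = info["argtypes"]
    func.restype = info["restype"]
    return func(*args)


libc = ctypes.CDLL(None)  # assumption: Unix-like platform
print(invoke(libc, STRLEN_INFO, b"GtkWindow"))
```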

    Static languages

    A static language like Rust prefers to have the whole language binding pre-generated. This is what the various crates in gtk-rs do.

    The gir crate takes a .gir file (i.e. the XML descriptions) and does two things:

    • Reconstructs the C function prototypes and C struct declarations, but in a way Rust can understand them. This gets output to the sys crate.

    • Creates idiomatic Rust code for the language binding. This gets output to the various crates; for example, the gtk one.

    When reconstructing the C structs and prototypes, we get stuff like

    #[repr(C)]
    pub struct GtkWidget {
        pub parent_instance: gobject::GInitiallyUnowned,
        pub priv_: *mut GtkWidgetPrivate,
    }
    
    extern "C" {
        pub fn gtk_entry_new() -> *mut GtkWidget;
    }
    

    And the idiomatic bindings? Stay tuned!

  8. Librsvg's build infrastructure: Autotools and Rust

    - autotools, gnome, librsvg, rust

    Today I released librsvg 2.41.1, and it's a big release! Apart from all the Rust goodness, and the large number of bug fixes, I am very happy with the way the build system works these days. I've found it invaluable to have good examples of Autotools incantations to copy&paste, so hopefully this will be useful to someone else.

    There are some subtleties that a "good" autotools setup demands, and so far I think librsvg is doing well:

    • The configure script checks for cargo and rustc.

    • "make distcheck" works. This means that the build can be performed with builddir != srcdir, and also that make check runs the available tests and they all pass.

    • The rsvg_internals library is built with Rust, and our Makefile.am calls cargo build with the correct options. It is able to handle debug and release builds.

    • "make clean" cleans up the Rust build directories as well.

    • If you change a .rs file and type make, only the necessary stuff gets rebuilt.

    • Etcetera. I think librsvg feels like a normal autotool'ed library. Let's see how this is done.

    Librsvg's basic autotools setup

    Librsvg started out with a fairly traditional autotools setup with a configure.ac and Makefile.am. For historical reasons the .[ch] source files live in the toplevel librsvg/ directory, not in a src subdirectory or something like that.

    librsvg
    ├ configure.ac
    ├ Makefile.am
    ├ *.[ch]
    ├ src/
    ├ doc/
    ├ tests/
    └ win32/
    

    Adding Rust to the build

    The Rust source code lives in librsvg/rust; that's where Cargo.toml lives, and of course there is the conventional src subdirectory with the *.rs files.

    librsvg
    ├ configure.ac
    ├ Makefile.am
    ├ *.[ch]
    ├ src/
    ├ rust/         <--- this is new!
    │ ├ Cargo.toml
    │ └ src/
    ├ doc/
    ├ tests/
    └ win32/
    

    Detecting the presence of cargo and rustc in configure.ac

    This goes in configure.ac:

    AC_CHECK_PROG(CARGO, [cargo], [yes], [no])
    AS_IF(test x$CARGO = xno,
        AC_MSG_ERROR([cargo is required.  Please install the Rust toolchain from https://www.rust-lang.org/])
    )
    AC_CHECK_PROG(RUSTC, [rustc], [yes], [no])
    AS_IF(test x$RUSTC = xno,
        AC_MSG_ERROR([rustc is required.  Please install the Rust toolchain from https://www.rust-lang.org/])
    )
    

    These two checks look for cargo and rustc in the PATH, respectively, and make configure abort with an error message if they are not present.

    Supporting debug or release mode for the Rust build

    One can call cargo like "cargo build --release" to turn on expensive optimizations, or normally like just "cargo build" to build with debug information. That is, the latter is the default: if you don't pass any options, cargo does a debug build.

    Autotools and C compilers normally work a bit differently; one must call the configure script like "CFLAGS='-g -O0' ./configure" for a debug build, or "CFLAGS='-O2 -fomit-frame-pointer' ./configure" for a release build.

    Linux distros already have all the infrastructure to pass the appropriate CFLAGS to configure. We need to be able to pass the appropriate flag to Cargo. My main requirements for this were:

    • Distros shouldn't have to substantially change their RPM specfiles (or whatever) to accommodate the Rust build.
    • I assume that distros will want to make release builds by default.
    • I as a developer am comfortable with passing extra options to make debug builds on my machine.

    The scheme in librsvg lets you run "configure --enable-debug" to make it call a plain cargo build, or a plain "configure" to make it use cargo build --release instead. The CFLAGS are passed as usual through an environment variable. This way, distros don't have to change their packaging to keep on making release builds as usual.

    This goes in configure.ac:

    dnl Specify --enable-debug to make a development release.  By default,
    dnl we build in public release mode.
    
    AC_ARG_ENABLE(debug,
                  AC_HELP_STRING([--enable-debug],
                                 [Build Rust code with debugging information [default=no]]),
                  [debug_release=$enableval],
                  [debug_release=no])
    
    AC_MSG_CHECKING(whether to build Rust code with debugging information)
    if test "x$debug_release" = "xyes" ; then
        AC_MSG_RESULT(yes)
        RUST_TARGET_SUBDIR=debug
    else
        AC_MSG_RESULT(no)
        RUST_TARGET_SUBDIR=release
    fi
    AM_CONDITIONAL([DEBUG_RELEASE], [test "x$debug_release" = "xyes"])
    
    AC_SUBST([RUST_TARGET_SUBDIR])
    

    This defines an Automake conditional called DEBUG_RELEASE, which we will use in Makefile.am later.

    It also causes @RUST_TARGET_SUBDIR@ to be substituted in Makefile.am with either debug or release; we will see what these are about.

    Adding Rust source files

    The librsvg/rust/src directory has all the *.rs files, and cargo tracks their dependencies and whether they need to be rebuilt if one changes. However, since that directory is not tracked by make, it won't rebuild things if a Rust source file changes! So, we need to tell our Makefile.am about those files:

    RUST_SOURCES =                   \
            rust/build.rs            \
            rust/Cargo.toml          \
            rust/src/aspect_ratio.rs \
            rust/src/bbox.rs         \
            rust/src/cnode.rs        \
            rust/src/color.rs        \
            ...
    
    RUST_EXTRA =                     \
            rust/Cargo.lock
    
    EXTRA_DIST += $(RUST_SOURCES) $(RUST_EXTRA)
    

    It's a bit unfortunate that the change tracking is duplicated in the Makefile, but we are already used to listing all the C source files in there, anyway.

    Most notably, the rust subdirectory is not listed in the SUBDIRS in Makefile.am, since there is no rust/Makefile at all!

    Cargo release or debug build?

    if DEBUG_RELEASE
    CARGO_RELEASE_ARGS=
    else
    CARGO_RELEASE_ARGS=--release
    endif
    

    We will call cargo build with that argument later.

    Verbose or quiet build?

    Librsvg uses AM_SILENT_RULES([yes]) in configure.ac. This lets you just run "make" for a quiet build, or "make V=1" to get the full command lines passed to the compiler. Cargo supports something similar, so let's add it to Makefile.am:

    CARGO_VERBOSE = $(cargo_verbose_$(V))
    cargo_verbose_ = $(cargo_verbose_$(AM_DEFAULT_VERBOSITY))
    cargo_verbose_0 =
    cargo_verbose_1 = --verbose
    

    This expands the V variable to empty, 0, or 1. The result of expanding that gives us the final command-line argument in the CARGO_VERBOSE variable.
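
    The nested variable lookup can be hard to read at first. Here is a small Python model of how make resolves it, assuming AM_DEFAULT_VERBOSITY is 0 as it would be with AM_SILENT_RULES([yes]); the function cargo_verbose is only an illustration of the expansion, not something that exists in the build.

```python
def cargo_verbose(v="", am_default_verbosity="0"):
    """Model the expansion of $(cargo_verbose_$(V)) for a given V."""
    variables = {
        "cargo_verbose_0": "",
        "cargo_verbose_1": "--verbose",
    }
    # "cargo_verbose_" (V unset or empty) redirects to the configured default.
    variables["cargo_verbose_"] = variables.get(
        f"cargo_verbose_{am_default_verbosity}", "")
    # Undefined make variables expand to the empty string.
    return variables.get(f"cargo_verbose_{v}", "")


for v in ("", "0", "1"):
    print(repr(v), "->", repr(cargo_verbose(v)))
```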

    What's the filename of the library we are building?

    RUST_LIB=@abs_top_builddir@/rust/target/@RUST_TARGET_SUBDIR@/librsvg_internals.a
    

    Remember our @RUST_TARGET_SUBDIR@ from configure.ac? If you call plain "cargo build", it will put the binaries in rust/target/debug. But if you call "cargo build --release", it will put the binaries in rust/target/release.

    With the bit above, the RUST_LIB variable now has the correct path for the built library. The @abs_top_builddir@ makes it work when the build directory is not the same as the source directory.

    Okay, so how do we call cargo?

    @abs_top_builddir@/rust/target/@RUST_TARGET_SUBDIR@/librsvg_internals.a: $(RUST_SOURCES)
        cd $(top_srcdir)/rust && \
        CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo build $(CARGO_VERBOSE) $(CARGO_RELEASE_ARGS)
    

    We make the funky library filename depend on $(RUST_SOURCES). That's what will cause make to rebuild the Rust library if one of the Rust source files changes.

    We override the CARGO_TARGET_DIR with Automake's preference, and call cargo build with the correct arguments.
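
    To make the pieces concrete, this Python sketch computes what the rule above amounts to for a given configuration: the working directory, the environment override, the cargo command line, and the path of the resulting library. The function cargo_invocation and the example paths are hypothetical; only the directory layout and flags come from the Makefile fragments above.

```python
def cargo_invocation(top_srcdir, abs_top_builddir, debug=False, verbose=False):
    """Describe the cargo call made by the Makefile rule for one configuration."""
    target_dir = f"{abs_top_builddir}/rust/target"

    argv = ["cargo", "build"]
    if verbose:
        argv.append("--verbose")   # $(CARGO_VERBOSE) with V=1
    if not debug:
        argv.append("--release")   # $(CARGO_RELEASE_ARGS) without --enable-debug

    subdir = "debug" if debug else "release"  # @RUST_TARGET_SUBDIR@
    return {
        "cwd": f"{top_srcdir}/rust",
        "env": {"CARGO_TARGET_DIR": target_dir},
        "argv": argv,
        "rust_lib": f"{target_dir}/{subdir}/librsvg_internals.a",
    }


# Example: out-of-tree release build with hypothetical paths.
plan = cargo_invocation("/src/librsvg", "/build/librsvg")
print(plan["argv"], plan["rust_lib"])
```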

    Linking into the main C library

    librsvg_@RSVG_API_MAJOR_VERSION@_la_LIBADD = \
            $(LIBRSVG_LIBS)                      \
            $(LIBM)                              \
            $(RUST_LIB)
    

    This expands our $(RUST_LIB) from above into our linker line, along with librsvg's other dependencies.

    make check

    This is our hook so that make check will cause cargo test to run:

    check-local:
            cd $(srcdir)/rust && \
            CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo test
    

    make clean

    Same thing for make clean and cargo clean:

    clean-local:
            cd $(top_srcdir)/rust && \
            CARGO_TARGET_DIR=@abs_top_builddir@/rust/target cargo clean
    

    Vendoring dependencies

    Linux distros probably want Rust packages to come bundled with their dependencies, so that they can replace them later with newer/patched versions.

    Here is a hook so that make dist will cause cargo vendor to be run before making the tarball. That command will create a rust/vendor directory with a copy of all the Rust crates that librsvg depends on.

    RUST_EXTRA += rust/cargo-vendor-config
    
    dist-hook:
        (cd $(distdir)/rust && \
        cargo vendor -q && \
        mkdir .cargo && \
        cp cargo-vendor-config .cargo/config)
    

    The tarball needs to have a rust/.cargo/config to know where to find the vendored sources (i.e. the embedded dependencies), but we don't want that in our development source tree. Instead, we generate it from a rust/cargo-vendor-config file in our source tree:

    # This is used after `cargo vendor` is run from `make dist`.
    #
    # In the distributed tarball, this file should end up in
    # rust/.cargo/config
    
    [source.crates-io]
    registry = 'https://github.com/rust-lang/crates.io-index'
    replace-with = 'vendored-sources'
    
    [source.vendored-sources]
    directory = './vendor'
    

    One last thing

    If you put this in your Cargo.toml, release binaries will be a lot smaller. This turns on link-time optimizations (LTO), which removes unused functions from the binary.

    [profile.release]
    lto = true
    

    Summary and thanks

    I think the above is some good boilerplate that you can put in your configure.ac / Makefile.am to integrate a Rust sub-library into your C code. It handles make-y things like make clean and make check; debug and release builds; verbose and quiet builds; builddir != srcdir; all the goodies.

    I think the only thing I'm missing is to check for the cargo-vendor binary. I'm not sure how to only check for that if I'm the one making tarballs... maybe an --enable-maintainer-mode flag?

    This would definitely not have been possible without prior work. Thanks to everyone who figured out Autotools before me, so I could cut&paste your goodies:

    Update 2017/Nov/11: Fixed the initialization of RUST_EXTRA; thanks to Tobias Mueller for catching this.

  9. How Glib-rs works, part 2: Transferring lists and arrays

    - gnome, rust

    (First part of the series, with index to all the articles)

    In the first part, we saw how glib-rs provides the FromGlib and ToGlib traits to let Rust code convert from/to Glib's simple types, like to convert from a Glib gboolean to a Rust bool and vice-versa. We also saw the special needs of strings; since they are passed by reference and are not copied as simple values, we can use FromGlibPtrNone and FromGlibPtrFull depending on what kind of ownership transfer we want, none for "just make it look like we are using a borrowed reference", or full for "I'll take over the data and free it when I'm done". Going the other way around, we can use ToGlibPtr and its methods to pass things from Rust to Glib.

    In this part, we'll see the tools that glib-rs provides to do conversions of more complex data types. We'll look at two cases:

    And one final case just in passing:

    Passing arrays from Glib to Rust

    We'll look at the case for transferring null-terminated arrays of strings, since it's an interesting one. There are other traits to convert from Glib arrays whose length is known, not implied with a NULL element, but for now we'll only look at arrays of strings.

    Null-terminated arrays of strings

    Look at this function for GtkAboutDialog:

    /**
     * gtk_about_dialog_add_credit_section:
     * @about: A #GtkAboutDialog
     * @section_name: The name of the section
     * @people: (array zero-terminated=1): The people who belong to that section
     * ...
     */
    void
    gtk_about_dialog_add_credit_section (GtkAboutDialog  *about,
                                         const gchar     *section_name,
                                         const gchar    **people)
    

    You would use this like

    const gchar *translators[] = {
        "Alice <alice@example.com>",
        "Bob <bob@example.com>",
        "Clara <clara@example.com>",
        NULL
    };
    
    gtk_about_dialog_add_credit_section (my_about_dialog, _("Translators"), translators);
    

    The function expects an array of gchar *, where the last element is a NULL. Instead of passing an explicit length for the array, it's done implicitly by requiring a NULL pointer after the last element. The gtk-doc annotation says (array zero-terminated=1). When we generate information for the GObject-Introspection Repository (GIR), this is what comes out:

     1  <method name="add_credit_section"
     2          c:identifier="gtk_about_dialog_add_credit_section"
     3          version="3.4">
     4    ..
     5      <parameter name="people" transfer-ownership="none">
     6        <doc xml:space="preserve">The people who belong to that section</doc>
     7        <array c:type="gchar**">
     8          <type name="utf8" c:type="gchar*"/>
     9        </array>
    10      </parameter>
    

    You can see the transfer-ownership="none" in line 5. This means that the function will not take ownership of the passed array; it will make its own copy instead. By convention, GIR assumes that arrays of strings are NULL-terminated, so there is no special annotation for that here. If we were implementing this function in Rust, how would we read that C array of UTF-8 strings and turn it into a Rust Vec<String> or something? Easy:

    let c_char_array: *mut *mut c_char = ...; // comes from Glib
    // from_glib_none() is an unsafe fn, and the target type must be spelled out
    let rust_translators: Vec<String> =
        unsafe { FromGlibPtrContainer::from_glib_none(c_char_array) };
    

    Let's look at how this bad boy is implemented.

    First stage: impl FromGlibPtrContainer for Vec<T>

    We want to go from a "*mut *mut c_char" (in C parlance, a "gchar **") to a Vec<String>. Indeed, there is an implementation of the FromGlibPtrContainer trait for Vecs here. These are the first few lines:

    impl <P: Ptr, PP: Ptr, T: FromGlibPtrArrayContainerAsVec<P, PP>> FromGlibPtrContainer<P, PP> for Vec<T> {
        unsafe fn from_glib_none(ptr: PP) -> Vec<T> {
            FromGlibPtrArrayContainerAsVec::from_glib_none_as_vec(ptr)
        }
    

    So... that from_glib_none() will return a Vec<T>, which is what we want. Let's look at the first few lines of FromGlibPtrArrayContainerAsVec:

    1      impl FromGlibPtrArrayContainerAsVec<$ffi_name, *mut $ffi_name> for $name {
    2          unsafe fn from_glib_none_as_vec(ptr: *mut $ffi_name) -> Vec<Self> {
    3              FromGlibContainerAsVec::from_glib_none_num_as_vec(ptr, c_ptr_array_len(ptr))
    4          }
    

    Aha! This is inside a macro, thus the $ffi_name garbage. It's done like that so the same trait can be implemented for const and mut pointers to c_char.

    See the call to c_ptr_array_len() in line 3? It walks the array until it finds the NULL pointer at the end, and that gives us the array's length.

    Second stage: impl FromGlibContainerAsVec::from_glib_none_num_as_vec()

    Now that the length of the array is known, the implementation calls FromGlibContainerAsVec::from_glib_none_num_as_vec()

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
        impl FromGlibContainerAsVec<$ffi_name, *const $ffi_name> for $name {
            unsafe fn from_glib_none_num_as_vec(ptr: *const $ffi_name, num: usize) -> Vec<Self> {
                if num == 0 || ptr.is_null() {
                    return Vec::new();
                }
    
                let mut res = Vec::with_capacity(num);
                for i in 0..num {
                    res.push(from_glib_none(ptr::read(ptr.offset(i as isize)) as $ffi_name));
                }
                res
            }
    

    Lines 3/4: If the number of elements is zero, or the array is NULL, return an empty Vec.

    Line 7: Allocate a Vec of suitable size.

    Lines 8/9: For each of the pointers in the C array, call from_glib_none() to convert it from a *const c_char to a String, like we saw in the first part.

    Done! We started with a *mut *mut c_char or a *const *const c_char and ended up with a Vec<String>, which is what we wanted.
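
    To see the two stages without the macros and traits in the way, here is a minimal sketch that uses only the standard library. The names null_terminated_len(), strings_from_c_array() and demo_translators() are made up for this example; they are not part of glib-rs, and the sketch only mirrors the shape of the code above.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Counts the entries that precede the terminating NULL pointer.
/// Hypothetical stand-in for the c_ptr_array_len() helper.
unsafe fn null_terminated_len(array: *const *const c_char) -> usize {
    if array.is_null() {
        return 0;
    }
    let mut len = 0;
    loop {
        // Safety: the caller promises that the array ends with a NULL pointer.
        let item = unsafe { *array.add(len) };
        if item.is_null() {
            break;
        }
        len += 1;
    }
    len
}

/// Copies every C string in the array into an owned Rust String,
/// leaving the C array untouched (the "transfer none" case).
unsafe fn strings_from_c_array(array: *const *const c_char) -> Vec<String> {
    let num = unsafe { null_terminated_len(array) };
    let mut res = Vec::with_capacity(num);
    for i in 0..num {
        let item = unsafe { CStr::from_ptr(*array.add(i)) };
        res.push(String::from_utf8_lossy(item.to_bytes()).into_owned());
    }
    res
}

/// Builds a NULL-terminated array the way a C caller might, then converts it.
fn demo_translators() -> Vec<String> {
    let owned: Vec<CString> = ["Alice", "Bob"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let mut pointers: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    pointers.push(ptr::null());
    unsafe { strings_from_c_array(pointers.as_ptr()) }
}
```

    The length is computed first and the copy happens second, just like in the two stages above. A NULL array simply yields an empty vector.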

    Passing GLists to Rust

    Some functions don't give you an array; they give you a GList or GSList. There is an implementation of FromGlibPtrArrayContainerAsVec that understands GList:

    impl<T> FromGlibPtrArrayContainerAsVec<<T as GlibPtrDefault>::GlibType, *mut glib_ffi::GList> for T
    where T: GlibPtrDefault + FromGlibPtrNone<<T as GlibPtrDefault>::GlibType> + FromGlibPtrFull<<T as GlibPtrDefault>::GlibType> {
    
        unsafe fn from_glib_none_as_vec(ptr: *mut glib_ffi::GList) -> Vec<T> {
            let num = glib_ffi::g_list_length(ptr) as usize;
            FromGlibContainer::from_glib_none_num(ptr, num)
        }
    

    The impl declaration is pretty horrible, so just look at the method: from_glib_none_as_vec() takes in a GList, then calls g_list_length() on it, and finally calls FromGlibContainer::from_glib_none_num() with the length it computed.

    I have a Glib container and its length

    In turn, that from_glib_none_num() goes here:

    impl <P, PP: Ptr, T: FromGlibContainerAsVec<P, PP>> FromGlibContainer<P, PP> for Vec<T> {
        unsafe fn from_glib_none_num(ptr: PP, num: usize) -> Vec<T> {
            FromGlibContainerAsVec::from_glib_none_num_as_vec(ptr, num)
        }
    

    Okay, getting closer to the actual implementation.

    Give me a vector already

    Finally, we get to the function that walks the GList:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    impl<T> FromGlibContainerAsVec<<T as GlibPtrDefault>::GlibType, *mut glib_ffi::GList> for T
    where T: GlibPtrDefault + FromGlibPtrNone<<T as GlibPtrDefault>::GlibType> + FromGlibPtrFull<<T as GlibPtrDefault>::GlibType> {
    
        unsafe fn from_glib_none_num_as_vec(mut ptr: *mut glib_ffi::GList, num: usize) -> Vec<T> {
            if num == 0 || ptr.is_null() {
                return Vec::new()
            }
            let mut res = Vec::with_capacity(num);
            for _ in 0..num {
                let item_ptr: <T as GlibPtrDefault>::GlibType = Ptr::from((*ptr).data);
                if !item_ptr.is_null() {
                    res.push(from_glib_none(item_ptr));
                }
                ptr = (*ptr).next;
            }
            res
        }
    

    Again, ignore the horrible impl declaration and just look at from_glib_none_num_as_vec().

    Line 4: that function takes in a ptr to a GList, and a num with the list's length, which we already computed above.

    Line 5: Return an empty vector if we have an empty list.

    Line 8: Allocate a vector of suitable capacity.

    Lines 9 to 13: For each element, convert it with from_glib_none() and push it to the vector, skipping elements whose data pointer is NULL.

    Line 14: Walk to the next element in the list.
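
    The same walk can be sketched with the standard library alone. Node below is a hypothetical stand-in with the same three fields as a GList node, and list_length() plays the role of g_list_length(); neither comes from glib-rs.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_void};
use std::ptr;

/// Hypothetical stand-in for a GList node: data, next, prev.
#[allow(dead_code)]
#[repr(C)]
struct Node {
    data: *mut c_void,
    next: *mut Node,
    prev: *mut Node,
}

/// Walks the list once to count its nodes, as g_list_length() does.
unsafe fn list_length(mut node: *mut Node) -> usize {
    let mut len = 0;
    while !node.is_null() {
        len += 1;
        node = unsafe { (*node).next };
    }
    len
}

/// Copies the string in each node into a Vec<String>, skipping NULL data.
unsafe fn strings_from_list(mut node: *mut Node) -> Vec<String> {
    let num = unsafe { list_length(node) };
    let mut res = Vec::with_capacity(num);
    for _ in 0..num {
        let item = unsafe { (*node).data } as *const c_char;
        if !item.is_null() {
            let text = unsafe { CStr::from_ptr(item) };
            res.push(String::from_utf8_lossy(text.to_bytes()).into_owned());
        }
        node = unsafe { (*node).next };
    }
    res
}

/// Builds a three-node list on the stack; the middle node has NULL data.
/// The prev pointers are left NULL because the walk never follows them.
fn demo_list() -> Vec<String> {
    let one = CString::new("one").unwrap();
    let two = CString::new("two").unwrap();
    let mut third = Node {
        data: two.as_ptr() as *mut c_void,
        next: ptr::null_mut(),
        prev: ptr::null_mut(),
    };
    let mut second = Node {
        data: ptr::null_mut(),
        next: &mut third,
        prev: ptr::null_mut(),
    };
    let mut first = Node {
        data: one.as_ptr() as *mut c_void,
        next: &mut second,
        prev: ptr::null_mut(),
    };
    unsafe { strings_from_list(&mut first) }
}
```

    Note that the resulting vector can be shorter than the list: the capacity is reserved for every node, but only nodes with non-NULL data are pushed.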

    Passing containers from Rust to Glib

    This post is getting a bit long, so I'll just mention this briefly. There is a trait ToGlibContainerFromSlice that takes a Rust slice, and can convert it to various Glib types.

    • To GSList and GList. These have methods like to_glib_none_from_slice() and to_glib_full_from_slice().

    • To an array of fundamental types. Here, you can choose among three functions. to_glib_none_from_slice() gives you a Stash like we saw the last time. to_glib_full_from_slice() gives you back a g_malloc()ed array with copied items. Finally, to_glib_container_from_slice() gives you back a g_malloc()ed array of pointers to values rather than plain values themselves. Which function you choose depends on which C API you want to call.
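
    To give a feel for the "lend a slice to C" direction, here is a rough sketch with only the standard library. StringArrayStash is a hypothetical type invented for this example; it imitates the idea of a Stash (a pointer plus the storage that keeps it valid) and is not how glib-rs spells it.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical Stash-like holder: the NULL-terminated pointer array, plus
/// the CStrings that keep the pointed-to buffers alive.
struct StringArrayStash {
    pointers: Vec<*const c_char>,
    _storage: Vec<CString>,
}

impl StringArrayStash {
    fn from_slice(items: &[&str]) -> StringArrayStash {
        let storage: Vec<CString> = items
            .iter()
            .map(|s| CString::new(*s).expect("string has no interior nul"))
            .collect();
        // Each CString owns a heap buffer, so these pointers stay valid
        // when the Vec<CString> is moved into the struct below.
        let mut pointers: Vec<*const c_char> = storage.iter().map(|s| s.as_ptr()).collect();
        pointers.push(ptr::null());
        StringArrayStash { pointers, _storage: storage }
    }

    fn as_ptr(&self) -> *const *const c_char {
        self.pointers.as_ptr()
    }
}

/// Stand-in for a C function that reads a NULL-terminated array of strings.
unsafe fn count_bytes(array: *const *const c_char) -> usize {
    let mut total = 0;
    let mut i = 0;
    loop {
        let item = unsafe { *array.add(i) };
        if item.is_null() {
            break;
        }
        total += unsafe { CStr::from_ptr(item) }.to_bytes().len();
        i += 1;
    }
    total
}

fn demo_lend_array() -> usize {
    let stash = StringArrayStash::from_slice(&["ab", "cde"]);
    // The array and its strings are freed when `stash` goes out of scope.
    unsafe { count_bytes(stash.as_ptr()) }
}
```

    This corresponds to the "none" flavor: Rust keeps ownership and the C side only borrows. The "full" and "container" flavors would instead hand out memory that the C side has to free.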

    I hope this post gives you enough practice to be able to "follow the traits" for each of those if you want to look at the implementations.

    Next up

    Passing boxed types, like public structs.

    Passing reference-counted types.

    How glib-rs wraps GObjects.

  10. How Glib-rs works, part 1: Type conversions

    - gnome, rust

    During the GNOME+Rust hackfest in Mexico City, Niko Matsakis started the implementation of gnome-class, a procedural macro that will let people implement new GObject classes in Rust and export them to the world. Currently, if you want to write a new GObject (e.g. a new widget) and put it in a library so that it can be used from language bindings via GObject-Introspection, you have to do it in C. It would be nice to be able to do this in a safe language like Rust.

    How would it be done by hand?

    In a C implementation of a new GObject subclass, one calls things like g_type_register_static() and g_signal_new() by hand, while being careful to specify the correct GType for each value, and being super-careful about everything, as C demands.

    In Rust, one can in fact do exactly the same thing. You can call the same, low-level GObject and GType functions. You can use #[repr(C)] for the instance and class structs that GObject will allocate for you, and which you then fill in.

    You can see an example of this in gst-plugins-rs. This is where it implements a Sink GObject, in Rust, by calling Glib functions by hand: struct declarations, class_init() function, registration of type and interfaces.

    How would it be done by a machine?

    That's what Niko's gnome-class is about. During the hackfest it got to the point of being able to generate the code to create a new GObject subclass, register it, and export functions for methods. The syntax is not finalized yet, but it looks something like this:

    gobject_gen! {
        class Counter {
            struct CounterPrivate {
                val: Cell<u32>
            }
    
            signal value_changed(&self);
    
            fn set_value(&self, v: u32) {
                let private = self.private();
                private.val.set(v);
                // private.emit_value_changed();
            }
    
            fn get_value(&self) -> u32 {
                let private = self.private();
                private.val.get()
            }
        }
    }
    

    I started adding support for declaring GObject signals — mainly being able to parse them from what goes inside gobject_gen!() — and then being able to call g_signal_newv() at the appropriate time during the class_init() implementation.

    Types in signals

    Creating a signal for a GObject class is basically like specifying a function prototype: the object will invoke a callback function with certain arguments and return value when the signal is emitted. For example, this is how GtkButton registers its button-press-event signal:

      button_press_event_id =
        g_signal_new (I_("button-press-event"),
                      ...
                      G_TYPE_BOOLEAN,    /* type of return value */
                      1,                 /* how many arguments? */
                      GDK_TYPE_EVENT);   /* type of first and only argument */
    

    g_signal_new() creates the signal and returns a signal id, an integer. Later, when the object wants to emit the signal, it uses that signal id like this:

    GdkEventButton event = ...;
    gboolean return_val;
    
    g_signal_emit (widget, button_press_event_id, 0, event, &return_val);
    

    In the nice gobject_gen!() macro, if I am going to have a signal declaration like

    signal button_press_event(&self, event: &ButtonPressEvent) -> bool;
    

    then I will need to be able to translate the type names for ButtonPressEvent and bool into something that g_signal_newv() will understand: I need the GType values for those. Fundamental types like gboolean get constants like G_TYPE_BOOLEAN. Types that are defined at runtime, like GDK_TYPE_EVENT, get GType values generated at runtime, too, when one registers the type with g_type_register_*().

    Rust type    GType
    i32          G_TYPE_INT
    u32          G_TYPE_UINT
    bool         G_TYPE_BOOLEAN
    etc.         etc.
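
    One way to picture that mapping is a trait with an associated constant, so that a macro could look up the GType for each Rust type in a signal declaration. This is only a sketch: FundamentalType, StaticGType, SignalTypes and signal_with_one_arg() are names invented here, and the enum stands in for the real G_TYPE_* constants.

```rust
/// Hypothetical stand-in for the fundamental G_TYPE_* constants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FundamentalType {
    Int,     // G_TYPE_INT
    UInt,    // G_TYPE_UINT
    Boolean, // G_TYPE_BOOLEAN
}

/// Maps a Rust type to the GType that describes it.
trait StaticGType {
    const GTYPE: FundamentalType;
}

impl StaticGType for i32 {
    const GTYPE: FundamentalType = FundamentalType::Int;
}

impl StaticGType for u32 {
    const GTYPE: FundamentalType = FundamentalType::UInt;
}

impl StaticGType for bool {
    const GTYPE: FundamentalType = FundamentalType::Boolean;
}

/// What a signal registration needs: the return type and the argument types.
#[derive(Debug, PartialEq)]
struct SignalTypes {
    return_type: FundamentalType,
    arg_types: Vec<FundamentalType>,
}

/// Collects the types for a signal with one argument of type A that returns R.
fn signal_with_one_arg<A: StaticGType, R: StaticGType>() -> SignalTypes {
    SignalTypes {
        return_type: R::GTYPE,
        arg_types: vec![A::GTYPE],
    }
}
```

    For example, signal_with_one_arg::<u32, bool>() yields a Boolean return type and a single UInt argument. Types registered at runtime would need a function instead of a constant, since their GType is not known at compile time.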

    Glib types in Rust

    How does glib-rs, the Rust binding to Glib and GObject, handle types?

    Going from Glib to Rust

    First we need a way to convert Glib's types to Rust, and vice-versa. There is a trait to convert simple Glib types into Rust types:

    pub trait FromGlib<T>: Sized {
        fn from_glib(val: T) -> Self;
    }
    

    This means that if you have a T which is a Glib type, this trait gives you a from_glib() function that converts it to a Rust type which is Sized, i.e. a type whose size is known at compilation time.

    For example, this is how it is implemented for booleans:

    impl FromGlib<glib_ffi::gboolean> for bool {
        #[inline]
        fn from_glib(val: glib_ffi::gboolean) -> bool {
            !(val == glib_ffi::GFALSE)
        }
    }
    

    and you use it like this:

    let my_gboolean: glib_ffi::gboolean = g_some_function_that_returns_gboolean ();
    
    let my_rust_bool: bool = from_glib (my_gboolean);
    

    Booleans in glib and Rust have different sizes, and also different values. Glib's booleans use the C convention: 0 is false and anything else is true, while in Rust booleans are strictly false or true, and the size is undefined (with the current Rust ABI, it's one byte).

    Going from Rust to Glib

    And to go the other way around, from a Rust bool to a gboolean? There is this trait:

    pub trait ToGlib {
        type GlibType;
    
        fn to_glib(&self) -> Self::GlibType;
    }
    

    This means, if you have a Rust type that maps to a corresponding GlibType, this will give you a to_glib() function to do the conversion.

    This is the implementation for booleans:

    impl ToGlib for bool {
        type GlibType = glib_ffi::gboolean;
    
        #[inline]
        fn to_glib(&self) -> glib_ffi::gboolean {
            if *self { glib_ffi::GTRUE } else { glib_ffi::GFALSE }
        }
    }
    

    And it is used like this:

    let my_rust_bool: bool = true;
    
    g_some_function_that_takes_gboolean (my_rust_bool.to_glib ());
    

    (If you are thinking "a function call to marshal a boolean" — note how the functions are inlined, and the optimizer basically compiles them down to nothing.)
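
    Here are both directions put together in a sketch that compiles without glib-rs. The gboolean alias and the GFALSE/GTRUE constants are re-declared locally as stand-ins for the ones in glib_ffi, and round_trip() is a hypothetical helper added just to exercise the two traits.

```rust
use std::os::raw::c_int;

// Local stand-ins for the definitions that glib_ffi provides.
#[allow(non_camel_case_types)]
type gboolean = c_int;
const GFALSE: gboolean = 0;
const GTRUE: gboolean = 1;

trait FromGlib<T>: Sized {
    fn from_glib(val: T) -> Self;
}

trait ToGlib {
    type GlibType;

    fn to_glib(&self) -> Self::GlibType;
}

impl FromGlib<gboolean> for bool {
    #[inline]
    fn from_glib(val: gboolean) -> bool {
        // C convention: anything that is not zero counts as true.
        val != GFALSE
    }
}

impl ToGlib for bool {
    type GlibType = gboolean;

    #[inline]
    fn to_glib(&self) -> gboolean {
        if *self { GTRUE } else { GFALSE }
    }
}

/// Generic helper so callers can write from_glib(x) and let inference pick T.
fn from_glib<G, T: FromGlib<G>>(val: G) -> T {
    FromGlib::from_glib(val)
}

/// Converts a C boolean to Rust and back, which normalizes it to 0 or 1.
fn round_trip(raw: gboolean) -> gboolean {
    let b: bool = from_glib(raw);
    b.to_glib()
}
```

    A value like 42 is "true" in C; after the round trip it comes back as GTRUE, which shows that the Rust side only ever sees a strict true or false.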

    Pointer types - from Glib to Rust

    That's all very nice for simple types like booleans and ints. Pointers to other objects are slightly more complicated.

    GObject-Introspection allows one to specify how pointer arguments to functions are handled by using a transfer specifier.

    (transfer none)

    For example, if you call gtk_window_set_title(window, "Hello"), you would expect the function to make its own copy of the "Hello" string. In Rust terms, you would be passing it a simple borrowed reference. GObject-Introspection (we'll abbreviate it as GI) calls this GI_TRANSFER_NOTHING, and it's specified by using (transfer none) in the documentation strings for function arguments or return values.

    The corresponding trait to bring in pointers from Glib to Rust, without taking ownership, is this. It's unsafe because it will be used to de-reference pointers that come from the wild west:

    pub trait FromGlibPtrNone<P: Ptr>: Sized {
        unsafe fn from_glib_none(ptr: P) -> Self;
    }
    

    And you use it via this generic function:

    #[inline]
    pub unsafe fn from_glib_none<P: Ptr, T: FromGlibPtrNone<P>>(ptr: P) -> T {
        FromGlibPtrNone::from_glib_none(ptr)
    }
    

    Let's look at how this works. Here is the FromGlibPtrNone trait implemented for strings.

    1
    2
    3
    4
    5
    6
    7
    impl FromGlibPtrNone<*const c_char> for String {
        #[inline]
        unsafe fn from_glib_none(ptr: *const c_char) -> Self {
            assert!(!ptr.is_null());
            String::from_utf8_lossy(CStr::from_ptr(ptr).to_bytes()).into_owned()
        }
    }
    

    Line 1: given a pointer to a c_char, the conversion to String...

    Line 4: check for NULL pointers

    Line 5: Wrap the C pointer in a CStr, like we looked at last time, convert its bytes to UTF-8 (replacing any invalid sequences), and copy the result into an owned String.

    Unfortunately, there's a copy involved in the last step. It may be possible to use Cow<str> there instead to avoid a copy if the char* from Glib is indeed valid UTF-8.

    (transfer full)

    And how about transferring ownership of the pointed-to value? There is this trait:

    pub trait FromGlibPtrFull<P: Ptr>: Sized {
        unsafe fn from_glib_full(ptr: P) -> Self;
    }
    

    And the implementation for strings is as follows. In Glib's scheme of things, "transferring ownership of a string" means that the recipient of the string must eventually g_free() it.

    1
    2
    3
    4
    5
    6
    7
    8
    impl FromGlibPtrFull<*const c_char> for String {
        #[inline]
        unsafe fn from_glib_full(ptr: *const c_char) -> Self {
            let res = from_glib_none(ptr);
            glib_ffi::g_free(ptr as *mut _);
            res
        }
    }
    

    Line 1: given a pointer to a c_char, the conversion to String...

    Line 4: Do the conversion with from_glib_none() with the trait we saw before, put it in res.

    Line 5: Call g_free() on the original C string.

    Line 6: Return the res, a Rust string which we own.
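
    The difference between the two transfer modes can be tried out without Glib. In this sketch the "C" string is allocated with CString::into_raw(), so it is released with CString::from_raw() at the point where glib-rs calls g_free(); that substitution is an assumption made only to keep the example self-contained. The function names are invented for the example.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// (transfer none): copy the text and leave the C string alone.
unsafe fn string_from_none(ptr: *const c_char) -> String {
    assert!(!ptr.is_null());
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    String::from_utf8_lossy(borrowed.to_bytes()).into_owned()
}

/// (transfer full): copy the text, then release the original buffer.
/// Assumption: the buffer came from CString::into_raw(), so it is freed
/// with CString::from_raw() here, where glib-rs would call g_free().
unsafe fn string_from_full(ptr: *mut c_char) -> String {
    let res = unsafe { string_from_none(ptr) };
    drop(unsafe { CString::from_raw(ptr) });
    res
}

fn demo_transfer() -> (String, String) {
    // The lender keeps ownership; we only read through its pointer.
    let lender = CString::new("borrowed").unwrap();
    let none = unsafe { string_from_none(lender.as_ptr()) };

    // The giver releases ownership; string_from_full() must free the buffer.
    let giver: *mut c_char = CString::new("owned").unwrap().into_raw();
    let full = unsafe { string_from_full(giver) };

    (none, full)
}
```

    In both cases the caller ends up with an owned String; the only difference is who is responsible for the original buffer afterwards.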

    Pointer types - from Rust to Glib

    Consider the case where you want to pass a String from Rust to a Glib function that takes a *const c_char (in C parlance, a const char *), without the Glib function acquiring ownership of the string. For example, assume that the C version of gtk_window_set_title() is in the gtk_ffi module. You may want to call it like this:

    fn rust_binding_to_window_set_title(window: &Gtk::Window, title: &String) {
        gtk_ffi::gtk_window_set_title(..., make_c_string_from_rust_string(title));
    }
    

    Now, what would that make_c_string_from_rust_string() look like?

    • We have: a Rust String (UTF-8, known length, no nul terminator)

    • We want: a *const c_char (nul-terminated UTF-8)

    So, let's write this:

    1
    2
    3
    4
    5
    fn make_c_string_from_rust_string(s: &String) -> *const c_char {
        let cstr = CString::new(&s[..]).unwrap();
        let ptr = cstr.into_raw() as *const c_char;
        ptr
    }
    

    Line 1: Take in a &String; return a *const c_char.

    Line 2: Build a CString like we saw a few days ago: this allocates a byte buffer with space for a nul terminator, and copies the string's bytes. We unwrap() for this simple example, because CString::new() will return an error if the String contained nul characters in the middle of the string, which C doesn't understand.

    Line 3: Call into_raw() to get a pointer to the byte buffer, and cast it to a *const c_char. We'll need to free this value later.

    But this kind of sucks, because we then have to use this function, pass the pointer to a C function, and then reconstitute the CString so it can free the byte buffer:

    let buf = make_c_string_from_rust_string(my_string);
    unsafe { c_function_that_takes_a_string(buf); }
    let _ = unsafe { CString::from_raw(buf as *mut c_char) }; // from_raw() is unsafe; dropping the CString frees the buffer
    

    The solution that Glib-rs provides for this is very Rusty, and rather elegant.

    Stashes

    We want:

    • A temporary place to put a piece of data
    • A pointer to that buffer
    • Automatic memory management for both of those

    Glib-rs defines a Stash for this:

    1
    2
    3
    4
    5
    6
    pub struct Stash<'a,                                 // we have a lifetime
                     P: Copy,                            // the pointer must be copy-able
                     T: ?Sized + ToGlibPtr<'a, P>> (     // Type for the temporary place
        pub P,                                           // We store a pointer...
        pub <T as ToGlibPtr<'a, P>>::Storage             // ... to a piece of data with that lifetime ...
    );
    

    ... and the piece of data must be of the associated type ToGlibPtr::Storage, which we will see shortly.

    This struct Stash goes along with the ToGlibPtr trait:

    pub trait ToGlibPtr<'a, P: Copy> {
        type Storage;
    
        fn to_glib_none(&'a self) -> Stash<'a, P, Self>;  // returns a Stash whose temporary storage
                                                          // has the lifetime of our original data
    }
    

    Let's unpack this by looking at the implementation of the "transfer a String to a C function while keeping ownership":

    1
    2
    3
    4
    5
    6
    7
    8
    9
    impl <'a> ToGlibPtr<'a, *const c_char> for String {
        type Storage = CString;
    
        #[inline]
        fn to_glib_none(&self) -> Stash<'a, *const c_char, String> {
            let tmp = CString::new(&self[..]).unwrap();
            Stash(tmp.as_ptr(), tmp)
        }
    }
    

    Line 1: We implement ToGlibPtr<'a, *const c_char> for String, declaring the lifetime 'a for the Stash.

    Line 2: Our temporary storage is a CString.

    Line 6: Make a CString like before.

    Line 7: Create the Stash with a pointer to the CString's contents, and the CString itself.

    (transfer none)

    Now, we can use ".0" to extract the first field from our Stash, which is precisely the pointer we want to a byte buffer:

    let my_string = ...;
    unsafe { c_function_which_takes_a_string(my_string.to_glib_none().0); }
    

    Now Rust knows that the temporary buffer inside the Stash has the lifetime of my_string, and it will free it automatically when the string goes out of scope. If we can accept the .to_glib_none().0 incantation for "lending" pointers to C, this works perfectly.
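
    If you want to play with the idea outside of glib-rs, the following sketch reduces it to the bare minimum. The Stash here is a simplified, hypothetical version (a pointer, its storage, and a marker for the lifetime), lend_c_string() plays the role of to_glib_none(), and c_string_length() is a stand-in for a C function that only reads the string.

```rust
use std::ffi::{CStr, CString};
use std::marker::PhantomData;
use std::os::raw::c_char;

/// Simplified, hypothetical Stash: a pointer plus the storage it points into.
/// The storage is never read directly; it exists to keep the pointer valid.
#[allow(dead_code)]
struct Stash<'a, P: Copy, S>(P, S, PhantomData<&'a ()>);

/// Plays the role of to_glib_none(): builds a nul-terminated copy and
/// returns it together with a pointer to its bytes.
fn lend_c_string<'a>(s: &'a str) -> Stash<'a, *const c_char, CString> {
    let tmp = CString::new(s).expect("string has no interior nul");
    // The CString's bytes live on the heap, so the pointer stays valid
    // when `tmp` is moved into the Stash.
    Stash(tmp.as_ptr(), tmp, PhantomData)
}

/// Stand-in for a C function that reads a string without keeping it.
unsafe fn c_string_length(ptr: *const c_char) -> usize {
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}

fn demo_stash() -> usize {
    let title = String::from("Hello");
    // The temporary Stash is still alive while c_string_length() runs,
    // and it is dropped (freeing the CString) once the call has returned.
    let len = unsafe { c_string_length(lend_c_string(&title).0) };
    len
}
```

    The important part is that the pointer and its storage travel together: as long as the Stash exists, the pointer in field 0 is safe to hand to C.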

    (transfer full)

    And for transferring ownership to the C function? The ToGlibPtr trait has another method:

    pub trait ToGlibPtr<'a, P: Copy> {
        ...
    
        fn to_glib_full(&self) -> P;
    }
    

    And here is the implementation for strings:

    impl <'a> ToGlibPtr<'a, *const c_char> for String {
        fn to_glib_full(&self) -> *const c_char {
            unsafe {
                glib_ffi::g_strndup(self.as_ptr() as *const c_char, 
                                    self.len() as size_t)
                    as *const c_char
            }
        }
    

    We basically g_strndup() the Rust string's contents from its byte buffer and its len(), and we can then pass this on to C. That code will be responsible for g_free()ing the C-side string.
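
    The ownership hand-off can be mimicked with the standard library alone. In this sketch CString::into_raw() stands in for g_strndup() and CString::from_raw() stands in for the g_free() that the C side would call; both substitutions are assumptions for the sake of a self-contained example, and the function names are made up.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hands out a freshly allocated, nul-terminated copy that the receiver owns.
/// Assumption: CString::into_raw() stands in for g_strndup() here.
fn give_c_string(s: &str) -> *mut c_char {
    CString::new(s)
        .expect("string has no interior nul")
        .into_raw()
}

/// Stand-in for the C side: uses the string, then frees it.
/// Assumption: CString::from_raw() stands in for g_free().
unsafe fn consume_c_string(ptr: *mut c_char) -> usize {
    let owned = unsafe { CString::from_raw(ptr) };
    owned.as_bytes().len()
    // `owned` is dropped here, which releases the buffer.
}

fn demo_give() -> (usize, String) {
    let original = String::from("Hello");
    let len = unsafe { consume_c_string(give_c_string(&original)) };
    // The original String is untouched: the receiver only got a copy.
    (len, original)
}
```

    Unlike the Stash case, nothing on the Rust side keeps track of the copy after it is handed out; forgetting to free it on the receiving side is a leak.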

    Next up

    Transferring lists and arrays. Stay tuned!
