Federico's Blog

  1. GUADEC 2017 presentation

    - gnome, guadec, librsvg, rust, talks

    During GUADEC this year I gave a presentation called Replacing C library code with Rust: what I learned with librsvg. This is the PDF file; be sure to scroll past the full-page slides until you reach the speaker's notes, especially for the code sections!

    Replacing C library code with Rust - link to PDF

    You can also get the ODP file for the presentation. This is released under a CC-BY-SA license.

    For the presentation, my daughter Luciana made some drawings of Ferris, the Rust mascot, also released under the same license:

    Drawings: "Ferris says hi" · "Ferris busy at work" · "Ferris makes a mess" · "Ferris presents her work"

  2. Surviving a rust-cssparser API break

    - gnome, librsvg, rust

    Yesterday I looked into updating librsvg's Rust dependencies. There have been some API breaks (!!!) in the unstable libraries that it uses since the last time I locked them. This post is about an interesting case of API breakage.

    rust-cssparser is the crate that Servo uses for parsing CSS. Well, more like tokenizing CSS: you give it a string, it gives you back tokens, and you are supposed to compose CSS selector information or other CSS values from the tokens.

    Librsvg uses rust-cssparser now for most of the micro-languages in SVG's attribute values, instead of its old, fragile C parsers. I hope to be able to use it in conjunction with Servo's rust-selectors crate to fully parse CSS data and replace libcroco.

    A few months ago, rust-cssparser's API looked more or less like the following. This is the old representation of a Token:

    pub enum Token<'a> {
        // an identifier
        Ident(Cow<'a, str>),
    
        // a plain number
        Number(NumericValue),
    
        // a percentage value normalized to [0.0, 1.0]
        Percentage(PercentageValue),
    
        WhiteSpace(&'a str),
        Comma,
    
        ...
    }
    

    That is, a Token can be an Identifier with a string name, or a Number, a Percentage, whitespace, a comma, and many others.

    On top of that is the old API for a Parser, which you construct with a string and then it gives you back tokens:

    impl<'i> Parser<'i> {
        pub fn new(input: &'i str) -> Parser<'i> { ... }
    
        pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
        ...
    }
    

    This means the following. You create a parser out of a string slice with new(). Each call to next() then gives you a Result: a Token on success, or an empty () error value. The parser carries a lifetime 'i tied to the string from which it is constructed: the Tokens for identifiers, for example, can hold sub-string slices of that original string, so the parser has to be marked with a lifetime to ensure it does not outlive its underlying string.
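
    Driving that old API looked roughly like this toy example (not librsvg code; the helper name and the input string are made up):

    use cssparser::Parser;

    fn count_tokens (s: &str) -> usize {
        let mut parser = Parser::new (s);
        let mut count = 0;

        // next() keeps yielding Ok(token) until the input is exhausted,
        // at which point it returns the empty () error and the loop stops.
        while parser.next ().is_ok () {
            count += 1;
        }

        count
    }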

    A few commits later, rust-cssparser got changed to return detailed error values, so that instead of () you get a BasicParseError with sub-cases like UnexpectedToken or EndOfInput.
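
    That lets calling code distinguish the failure cases. Here is a minimal sketch of what that could look like, assuming the variant names mentioned above (the function and the messages are my own):

    use cssparser::BasicParseError;

    // Turn a BasicParseError into a human-readable message.
    fn describe_error (err: &BasicParseError) -> String {
        match *err {
            BasicParseError::UnexpectedToken (_) => "unexpected token".to_string (),
            BasicParseError::EndOfInput => "unexpected end of input".to_string (),
            _ => "parse error".to_string (),
        }
    }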

    After the changes to the error values for results, I didn't pay much attention to rust-cssparser for a while. Yesterday I wanted to update librsvg to use the newest rust-cssparser, and had some interesting problems.

    First, Parser::new() was changed from taking just a &str slice to taking a ParserInput struct. This is an implementation detail which lets the parser cache the last token it saw. Not a big deal:

    // instead of constructing a parser like
    let mut parser = Parser::new (my_string);
    
    // you now construct it like
    let mut input = ParserInput::new (my_string);
    let mut parser = Parser::new (&mut input);
    

    I am not completely sure why this is exposed to the public API, since Rust won't allow you to have two mutable references to a ParserInput, and the only consumer of a (mutable) ParserInput is the Parser, anyway.

    However, the parser.next() function changed:

    // old version
    pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
    // new version
    pub fn next(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {... }
    // note this bad boy here -------^
    

    The successful Result from next() is now a reference to a Token, not a plain Token value that you own, as in the old version. The parser is giving you a borrowed reference to its internally-cached token.

    My parsing functions for the old API looked similar to the following. This is a function that parses a string into an angle; it can look like "45deg" or "1.5rad", for example.

     1  pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut parser = Parser::new (s);
     3
     4      let token = parser.next ()
     5          .map_err (|_| ParseError::new ("expected angle"))?;
     6
     7      match token {
     8          Token::Number (NumericValue { value, .. }) => Ok (value as f64),
     9
    10          Token::Dimension (NumericValue { value, .. }, unit) => {
    11              let value = value as f64;
    12
    13              match unit.as_ref () {
    14                  "deg"  => Ok (value),
    15                  "grad" => Ok (value * 360.0 / 400.0),
    16                  "rad"  => Ok (value * 180.0 / PI),
    17                  _      => Err (ParseError::new ("expected angle"))
    18              }
    19          },
    20
    21          _ => Err (ParseError::new ("expected angle"))
    22      }.and_then (|r|
    23                  parser.expect_exhausted ()
    24                  .map (|_| r)
    25                  .map_err (|_| ParseError::new ("expected angle")))
    26  }
    

    This is a bit ugly, but it was the first version that passed the tests. Lines 4 and 5 mean, "get the first token or return an error". Line 17 means, "anything except deg, grad, or rad for the units causes the match expression to generate an error". Finally, I was feeling very proud of using and_then() in line 22, with parser.expect_exhausted(), to ensure that the parser would not find any more tokens after the angle/units.

    However, in the new version of rust-cssparser, Parser.next() gives back a Result whose success value is a &Token (a reference to a token), while the old version returned a plain Token. No problem, I thought, I'm just going to de-reference the value in the match and be done with it:

        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;
    
        match *token {
        //    ^ dereference here...
            Token::Number { value, .. } => value as f64,
    
            Token::Dimension { value, ref unit, .. } => {
        //                            ^ avoid moving the unit value
    

    The compiler complained elsewhere. The whole function now looked like this:

     1  pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut parser = Parser::new (s);
     3
     4      let token = parser.next ()
     5          .map_err (|_| ParseError::new ("expected angle"))?;
     6
     7      match token {
     8          // ...
     9      }.and_then (|r|
    10                  parser.expect_exhausted ()
    11                  .map (|_| r)
    12                  .map_err (|_| ParseError::new ("expected angle")))
    13  }
    

    But in line 4, token is now a reference to something that lives inside parser, and parser is therefore borrowed mutably. The compiler didn't like that line 10 (the call to parser.expect_exhausted()) was trying to borrow parser mutably again.

    I played a bit with creating a temporary scope around the assignment to token so that it would only borrow parser mutably inside that scope. Things ended up like this, without the call to and_then() after the match:

     1  pub fn angle_degrees (s: &str) -> Result <f64, ParseError> {
     2      let mut input = ParserInput::new (s);
     3      let mut parser = Parser::new (&mut input);
     4
     5      let angle = {
     6          let token = parser.next ()
     7              .map_err (|_| ParseError::new ("expected angle"))?;
     8
     9          match *token {
    10              Token::Number { value, .. } => value as f64,
    11
    12              Token::Dimension { value, ref unit, .. } => {
    13                  let value = value as f64;
    14
    15                  match unit.as_ref () {
    16                      "deg"  => value,
    17                      "grad" => value * 360.0 / 400.0,
    18                      "rad"  => value * 180.0 / PI,
    19                      _      => return Err (ParseError::new ("expected 'deg' | 'grad' | 'rad'"))
    20                  }
    21              },
    22
    23              _ => return Err (ParseError::new ("expected angle"))
    24          }
    25      };
    26
    27      parser.expect_exhausted ().map_err (|_| ParseError::new ("expected angle"))?;
    28
    29      Ok (angle)
    30  }
    

    Lines 5 through 25 are basically

        let angle = {
            // parse out the angle; return if error
        };
    

    And after that is done, I test for parser.expect_exhausted(). There is no chaining of results with helper functions; instead it's just going through each token linearly.

    The API break was annoying to deal with, but fortunately the calling code ended up cleaner, and I didn't have to change anything in the tests. I hope rust-cssparser can stabilize its API for consumers that are not Servo.

  3. Legacy Systems as Old Cities

    Translations: es - gnome, recompiler, urbanism

    I just realized that I only tweeted about this a couple of months ago, but never blogged about it. Shame on me!

    I wrote an article, Legacy Systems as Old Cities, for The Recompiler magazine. Is GNOME, now at 20 years old, legacy software? Is it different from mainframe software because "everyone" can change it? Does long-lived software have the same patterns of change as cities and physical artifacts? Can we learn from the building trades and urbanism for maintaining software in the long term? Could we turn legacy software into a good legacy?

    You can read the article here.

    Also, let me take this opportunity to recommend The Recompiler magazine. It is the most enjoyable technical publication I read. Their podcast is also excellent!

    Update 2017/06/10 - Spanish version of the article, Los Sistemas Heredados como Ciudades Viejas

  4. Setting Alt-Tab behavior in gnome-shell

    - gnome, gnome-shell

    After updating my distro a few months ago, I somehow lost my tweaks to the Alt-Tab behavior in gnome-shell.

    The default is to have Alt-Tab switch you between applications in the current workspace. One can use Alt-backtick (or whatever key you have above Tab) to switch between windows in the current application.

    I prefer a Windows-like setup, where Alt-Tab switches between windows in the current workspace, regardless of the application to which they belong.

    Many moons ago there was a gnome-shell extension to change this behavior, but these days (GNOME 3.24) it can be done without extensions. It is a bit convoluted.

    With the GUI

    If you are using X instead of Wayland, this works:

    1. Unset the Switch applications command. To do this, run gnome-control-center, go to Keyboard, and find the Switch applications command. Click on it, and hit Backspace in the dialog that prompts you for the keyboard shortcut. Click on the Set button.

    2. Set the Switch windows command. While still in the Keyboard settings, find the Switch windows command. Click on it, and hit Alt-Tab. Click Set.

    That should be all you need, unless you are in Wayland. In that case, you need to do it on the command line.

    With the command line, or in Wayland

    The kind people on #gnome-hackers tell me that as of GNOME 3.24, changing Alt-Tab doesn't work on Wayland as in (2) above, because the compositor captures the Alt-Tab key when you type it inside the dialog that prompts you for a keyboard shortcut. In that case, you have to change the configuration keys directly instead of using the GUI:

    gsettings set org.gnome.desktop.wm.keybindings switch-applications "[]"
    gsettings set org.gnome.desktop.wm.keybindings switch-applications-backward "[]"
    gsettings set org.gnome.desktop.wm.keybindings switch-windows "['<Alt>Tab', '<Super>Tab']"
    gsettings set org.gnome.desktop.wm.keybindings switch-windows-backward  "['<Alt><Shift>Tab', '<Super><Shift>Tab']"
    

    Of course, the above works in X, too.

    Changing windows across all workspaces

    If you'd like to switch between windows in all workspaces, rather than only in the current workspace, find the org.gnome.shell.window-switcher current-workspace-only GSettings key and set it to false. You can do this in dconf-editor, or on the command line with

    gsettings set org.gnome.shell.window-switcher current-workspace-only false
    
  5. Exploring Rust's standard library: system calls and errors

    - rust

    In this post I'll show you the code path that Rust takes inside its standard library when you open a file. I wanted to learn how Rust handles system calls and errno, and all the little subtleties of the POSIX API. This is what I learned!

    The C side of things

    When you open a file, or create a socket, or do anything else that returns an object that can be accessed like a file, you get a file descriptor in the form of an int.

    /* All of these return an int with a file descriptor, or
     * -1 in case of error.
     */
    int open(const char *pathname, int flags, ...);
    int socket(int domain, int type, int protocol);
    

    You get a nonnegative integer in case of success, or -1 in case of an error. If there's an error, you look at errno, which gives you an integer error code.

    int fd;
    
    retry_open:
    fd = open ("/foo/bar/baz.txt", 0);
    if (fd == -1) {
        if (errno == ENOENT) {
            /* File doesn't exist */
        } else if (errno == ...) {
            ...
        } else if (errno == EINTR) {
            goto retry_open; /* interrupted system call; let's retry */
        }
    }
    

    Many system calls can return EINTR, which means "interrupted system call": something interrupted the kernel while it was doing your system call, and it returned control to userspace with the syscall unfinished. For example, your process may have received a Unix signal (e.g. you suspended it by pressing Ctrl-Z on a terminal, or you resized the terminal and it got a SIGWINCH). Most of the time EINTR simply means that you must retry the operation: if you Ctrl-Z a program to suspend it and then fg to continue it, and the program was in the middle of open()ing a file, you would expect it to continue at that exact point and actually open the file. Software that doesn't check for EINTR can fail in very subtle ways!

    Once you have an open file descriptor, you can read from it:

    ssize_t
    read_five_bytes (int fd, void *buf)
    {
        ssize_t result;
    
        retry:
        result = read (fd, buf, 5);
        if (result == -1) {
            if (errno == EINTR) {
                goto retry;
            } else {
                return -1; /* the caller should check errno */
            }
        } else {
            return result; /* success */
        }
    }
    

    ... and one has to remember that if read() returns 0, it means we were already at end-of-file; if it returns fewer bytes than requested, we got a short read (for a regular file that means the end of the file is near; for a pipe or socket it just means less data was available); and if this is a nonblocking descriptor and the call fails with EWOULDBLOCK or EAGAIN, one must decide whether to retry the operation right away or wait and try again later.

    There is a lot of buggy software written in C that tries to use the POSIX API directly, and gets these subtleties wrong. Most programs written in high-level languages use the I/O facilities provided by their language, which hopefully make things easier.

    I/O in Rust

    Rust makes error handling convenient and safe. If you decide to ignore an error, the code looks like it is ignoring the error (e.g. you can grep for unwrap() and find lazy code). The code actually looks better if it doesn't ignore the error and properly propagates it upstream (e.g. you can use the ? shortcut to propagate errors to the calling function).
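
    For example, here is the contrast in a tiny made-up snippet (the file name is arbitrary): the lazy version screams unwrap(), while the careful one just sprinkles ? and returns an io::Result:

    use std::fs::File;
    use std::io::{self, Read};

    // Lazy: panics on any I/O error, but at least the unwrap()s are easy to grep for.
    fn read_config_lazy () -> String {
        let mut s = String::new ();
        File::open ("config.txt").unwrap ().read_to_string (&mut s).unwrap ();
        s
    }

    // Careful: every ? propagates the io::Error to the caller.
    fn read_config () -> io::Result<String> {
        let mut s = String::new ();
        File::open ("config.txt")?.read_to_string (&mut s)?;
        Ok (s)
    }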

    I keep recommending this article on error models to people; it discusses POSIX-like error codes vs. exceptions vs. more modern approaches like Haskell's and Rust's - definitely worth studying over a few days (also, see Miguel's valiant effort to move C#'s I/O away from exceptions).

    So, what happens when one opens a file in Rust, from the toplevel API down to the system calls? Let's go down the rabbit hole.

    You can open a file like this:

    use std::fs::File;
    
    fn main () {
        let f = File::open ("foo.txt");
        ...
    }
    

    This does not give you a raw file descriptor; it gives you an io::Result<fs::File> (that is, a Result<fs::File, io::Error>), which you must pick apart to see if you actually got back a File that you can operate on, or an error.
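
    Picking the Result apart can be as simple as matching on it; a small example (nothing librsvg-specific, the handling is made up):

    use std::fs::File;
    use std::io::ErrorKind;

    fn main () {
        match File::open ("foo.txt") {
            Ok (f) => println! ("opened it: {:?}", f),
            Err (ref e) if e.kind () == ErrorKind::NotFound => println! ("foo.txt does not exist"),
            Err (e) => println! ("could not open foo.txt: {}", e),
        }
    }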

    Let's look at the implementation of File::open() and File::create().

    impl File {
        pub fn open<P: AsRef<Path>>(path: P) -> io::Result<File> {
            OpenOptions::new().read(true).open(path.as_ref())
        }
    
        pub fn create<P: AsRef<Path>>(path: P) -> io::Result<File> {
            OpenOptions::new().write(true).create(true).truncate(true).open(path.as_ref())
        }
        ...
    }
    

    Here, OpenOptions is an auxiliary struct that implements a "builder" pattern. Instead of passing bit flags for the various O_CREAT/O_APPEND/etc. flags from the open(2) system call, one builds a struct with the desired options, and finally calls .open() on it.
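
    You can drive the builder directly in your own code, too; this made-up snippet is morally the same as File::create() above:

    use std::fs::{File, OpenOptions};
    use std::io;

    fn open_for_logging () -> io::Result<File> {
        // The same options that File::create() sets, spelled out with the builder.
        OpenOptions::new ()
            .write (true)
            .create (true)
            .truncate (true)
            .open ("transactions.log")
    }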

    So, let's look at the implementation of OpenOptions.open():

        pub fn open<P: AsRef<Path>>(&self, path: P) -> io::Result<File> {
            self._open(path.as_ref())
        }
    
        fn _open(&self, path: &Path) -> io::Result<File> {
            let inner = fs_imp::File::open(path, &self.0)?;
            Ok(File { inner: inner })
        }
    

    See that fs_imp::File::open()? That's what we want: it's the platform-specific wrapper for opening files. Let's look at its implementation for Unix:

        pub fn open(path: &Path, opts: &OpenOptions) -> io::Result<File> {
            let path = cstr(path)?;
            File::open_c(&path, opts)
        }
    

    The first line, let path = cstr(path)? tries to convert a Path into a nul-terminated C string. The second line calls the following:

        pub fn open_c(path: &CStr, opts: &OpenOptions) -> io::Result<File> {
            let flags = libc::O_CLOEXEC |
                        opts.get_access_mode()? |
                        opts.get_creation_mode()? |
                        (opts.custom_flags as c_int & !libc::O_ACCMODE);
            let fd = cvt_r(|| unsafe {
                open64(path.as_ptr(), flags, opts.mode as c_int)
            })?;
            let fd = FileDesc::new(fd);
    
            ...
    
            Ok(File(fd))
        }
    

    Here, let flags = ... converts the OpenOptions we had in the beginning to an int with bit flags.

    Then, it does let fd = cvt_r (LAMBDA), and that lambda function calls the actual open64() from libc (a Rust wrapper for the system's libc): it returns a file descriptor, or -1 on error. Why is this done in a lambda? Let's look at cvt_r():

    pub fn cvt_r<T, F>(mut f: F) -> io::Result<T>
        where T: IsMinusOne,
              F: FnMut() -> T
    {
        loop {
            match cvt(f()) {
                Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
                other => return other,
            }
        }
    }
    

    Okay! Here f is the lambda that calls open64(); cvt_r() calls it in a loop and translates the POSIX-like result into something friendly to Rust. This loop is where it handles EINTR, which gets translated into ErrorKind::Interrupted. I suppose cvt_r() stands for convert_retry()? Let's look at the implementation of cvt(), which fetches the error code:

    pub fn cvt<T: IsMinusOne>(t: T) -> io::Result<T> {
        if t.is_minus_one() {
            Err(io::Error::last_os_error())
        } else {
            Ok(t)
        }
    }
    

    (The IsMinusOne shenanigans are just a Rust-ism to help convert multiple integer types without a lot of as casts.)

    The above means, if the POSIX-like result was -1, return an Err() from the last error returned by the operating system. That should surely be errno internally, correct? Let's look at the implementation for io::Error::last_os_error():

        pub fn last_os_error() -> Error {
            Error::from_raw_os_error(sys::os::errno() as i32)
        }
    

    We don't need to look at Error::from_raw_os_error(); it's just a conversion function from an errno value into a Rust enum value. However, let's look at sys::os::errno():

    pub fn errno() -> i32 {
        unsafe {
            (*errno_location()) as i32
        }
    }
    

    Here, errno_location() is an extern function defined in GNU libc (or whatever C library your Unix uses). It returns a pointer to the actual int which is the errno thread-local variable. Since non-C code can't use libc's global variables directly, there needs to be a way to get their addresses via function calls - that's what errno_location() is for.
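
    Just to tie the pieces together, here is how the same retry-on-EINTR-and-read-errno dance looks if you do it by hand with the libc crate. This is a toy sketch of the pattern, not how the standard library is actually structured:

    extern crate libc;

    use std::ffi::CString;
    use std::io;

    fn open_read_only (path: &str) -> io::Result<libc::c_int> {
        let c_path = CString::new (path).unwrap ();

        loop {
            let fd = unsafe { libc::open (c_path.as_ptr (), libc::O_RDONLY) };
            if fd != -1 {
                return Ok (fd);
            }

            // -1 means failure; last_os_error() reads errno for us.
            let err = io::Error::last_os_error ();
            if err.kind () != io::ErrorKind::Interrupted {
                return Err (err);
            }
            // EINTR: the call was interrupted; just retry.
        }
    }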

    And on Windows?

    Remember the internal File.open()? This is what it looks like on Windows:

        pub fn open(path: &Path, opts: &OpenOptions) -> io::Result<File> {
            let path = to_u16s(path)?;
            let handle = unsafe {
                c::CreateFileW(path.as_ptr(),
                               opts.get_access_mode()?,
                               opts.share_mode,
                               opts.security_attributes as *mut _,
                               opts.get_creation_mode()?,
                               opts.get_flags_and_attributes(),
                               ptr::null_mut())
            };
            if handle == c::INVALID_HANDLE_VALUE {
                Err(Error::last_os_error())
            } else {
                Ok(File { handle: Handle::new(handle) })
            }
        }
    

    CreateFileW() is the Windows API function to open files. The conversion of error codes inside Error::last_os_error() happens analogously - it calls GetLastError() from the Windows API and converts it.

    Can we not call C libraries?

    The Rust/Unix code above depends on the system's libc for open() and errno, which are entirely C constructs. Libc is what actually does the system calls. There are efforts to make the Rust standard library not use libc and use syscalls directly.

    As an example, you can look at the Rust standard library for Redox. Redox is a new operating system kernel entirely written in Rust. Fun times!

    Update: If you want to see what a C-less libstd would look like, take a look at steed, an effort to reimplement Rust's libstd without C dependencies.

    Conclusion

    Rust is very meticulous about error handling, but it succeeds in making it pleasant to read. I/O functions give you back an io::Result<>, which you piece apart to see if it succeeded or got an error.

    Internally, and for each platform it supports, the Rust standard library translates errno from libc into an io::ErrorKind Rust enum. The standard library also automatically handles Unix-isms like retrying operations on EINTR.
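
    That translation is visible from user code, too. For example, on a Unix system (using the libc crate just for the ENOENT constant):

    extern crate libc;

    use std::io::{Error, ErrorKind};

    fn main () {
        // Build an io::Error straight from an errno value...
        let err = Error::from_raw_os_error (libc::ENOENT);

        // ...and std maps it to the portable ErrorKind enum.
        assert_eq! (err.kind (), ErrorKind::NotFound);
        println! ("{}", err); // typically "No such file or directory (os error 2)"
    }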

    I've been enjoying reading the Rust standard library code: it has taught me many Rust-isms, and it's nice to see how the hairy/historical libc constructs are translated into clean Rust idioms. I hope this little trip down the rabbit hole for the open(2) system call lets you look in other interesting places, too.

  6. Moving to a new blog engine

    - meta

    In 2003 I wrote an Emacs script to write my blog and produce an RSS feed. Back then, I seemed to write multiple short blog entries in a day rather than longer articles (doing Mastodon before it was cool?). But my blogging patterns have changed. I've been wanting to add some more features to the script: moving to a page-per-post model, support for draft articles, tags, and syntax highlighting for code excerpts...

    This is a wheel that I do not find worth reinventing these days. After asking on Mastodon about static site generators (thanks to everyone who replied!), I've decided to give Pelican a try. I've reached the age where "obvious, beautiful documentation" is high on my list of things to look for when shopping for tools, and Pelican's docs are nice from the start.

    The old blog is still available in the old location.

    If you find broken links, or stuff that doesn't work correctly here, please mail me!
