Rust things I miss in C


Librsvg feels like it is reaching a tipping point, where suddenly it seems like it would be easier to port some major parts from C to Rust than to just add accessors for them. Also, more and more of the meat of the library is in Rust now.

I'm switching back and forth a lot between C and Rust these days, and C now feels very, very primitive.

A sort of elegy to C

I fell in love with the C language about 24 years ago. I learned the basics of it by reading a Spanish translation of the second edition of The C Programming Language by K&R. I had been using Turbo Pascal before in a reasonably low-level fashion, with pointers and manual memory allocation, and C felt refreshing and empowering.

K&R is a great book for its style of writing and its conciseness of programming. This little book even taught you how to implement a simple malloc()/free(), which was completely enlightening. Even low-level constructs that seemed part of the language could be implemented in the language itself!

I got good at C over the following years. It is a small language, with a small standard library. It was probably the perfect language to implement Unix kernels in 20,000 lines of code or so.

The GIMP and GTK+ taught me how to do fancy object orientation in C. GNOME taught me how to maintain large-scale software in C. 20,000 lines of C code started to seem like a project one could more or less fully understand in a few weeks.

But our code bases are not that small anymore. Our software now has huge expectations of the features available in the language's standard library.

Some good experiences with C

Reading the POV-Ray source code for the first time and learning how to do object orientation and inheritance in C.

Reading the GTK+ source code and learning a C style that was legible, maintainable, and clean.

Reading SIOD's source code, then the early Guile sources, and seeing how a Scheme interpreter can be written in C.

Writing the initial versions of Eye of Gnome and fine-tuning the microtile rendering.

Some bad experiences with C

In the Evolution team, when everything was crashing. We had to buy a Solaris machine just to be able to buy Purify; there was no Valgrind back then.

Debugging gnome-vfs threading deadlocks.

Debugging Mesa and getting nowhere.

Taking over the initial versions of Nautilus-share and seeing that it never free()d anything.

Trying to refactor code where I had no idea about the memory management strategy.

Trying to turn code into a library when it is full of global variables and no functions are static.

But anyway — let's get on with things in Rust I miss in C.

Automatic resource management

One of the first blog posts I read about Rust was "Rust means never having to close a socket". Rust borrows C++'s ideas about Resource Acquisition Is Initialization (RAII) and smart pointers, adds in the single-ownership principle for values, and gives you automatic, deterministic resource management in a very neat package.

  • Automatic: you don't free() by hand. Memory gets deallocated, files get closed, mutexes get unlocked when they go out of scope. If you are wrapping an external resource, you just implement the Drop trait and that's basically it. The wrapped resource feels like part of the language since you don't have to babysit its lifetime by hand.

  • Deterministic: resources get created (memory allocated, initialized, files opened, etc.), and they get destroyed when they go out of scope. There is no garbage collection: things really get terminated when you close a brace. You start to see your program's data lifetimes as a tree of function calls.
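As a minimal sketch of both points, here is a hypothetical Resource type (not taken from librsvg or any real library) whose Drop implementation records when it gets released. The shared log exists only so that the order of events can be observed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for an external resource (a file, a socket, a C handle).
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    // Runs automatically when a Resource goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Creates two nested resources and returns the order of events.
fn nested_scopes() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Resource { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Resource { name: "inner", log: Rc::clone(&log) };
            log.borrow_mut().push("end of inner scope".to_string());
        } // `_inner` is dropped here
        log.borrow_mut().push("end of outer scope".to_string());
    } // `_outer` is dropped here
    let events = log.borrow().clone();
    events
}
```

Each "release" entry lands right after the closing brace of the scope that owned the value, with no explicit cleanup call anywhere.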

After forgetting to free/close/destroy C objects all the time, or worse, figuring out where code that I didn't write forgot to do those things (or did them twice, incorrectly)... I don't want to do it again.

Generics

Vec<T> really is a vector whose elements are the size of T. It's not an array of pointers to individually allocated objects. It gets compiled specifically to code that can only handle objects of type T.
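A small sketch of what that means in practice. Point, stride_in_bytes and last_or are made-up names for illustration; the check only confirms that consecutive elements sit size_of::<Point>() bytes apart in the vector's buffer.

```rust
use std::mem::size_of;

/// Hypothetical element type with two f64 fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// Distance in bytes between the first two elements of a slice.
/// Panics if the slice has fewer than two elements.
fn stride_in_bytes<T>(items: &[T]) -> usize {
    let first = &items[0] as *const T as usize;
    let second = &items[1] as *const T as usize;
    second - first
}

/// True if consecutive Points sit right next to each other in the Vec's buffer.
fn points_are_stored_inline() -> bool {
    let points = vec![Point { x: 0.0, y: 0.0 }, Point { x: 1.0, y: 2.0 }];
    stride_in_bytes(&points) == size_of::<Point>()
}

/// Generic function; the compiler generates a separate copy for each T used.
fn last_or<T: Copy>(items: &[T], fallback: T) -> T {
    items.last().copied().unwrap_or(fallback)
}
```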

After writing many janky macros in C to do similar things... I don't want to do it again.

Traits are not just interfaces

Rust is not a Java-like object-oriented language. Instead it has traits, which at first seem like Java interfaces — an easy way to do dynamic dispatch, so that if an object implements Drawable then you can assume it has a draw() method.
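The interface-like use might look like the following sketch. Drawable comes from the sentence above; Circle, Label and draw_all are hypothetical names added for the example.

```rust
/// Hypothetical trait, named after the example in the text.
trait Drawable {
    fn draw(&self) -> String;
}

struct Circle {
    radius: f64,
}

struct Label {
    text: String,
}

impl Drawable for Circle {
    fn draw(&self) -> String {
        format!("circle with radius {}", self.radius)
    }
}

impl Drawable for Label {
    fn draw(&self) -> String {
        format!("label \"{}\"", self.text)
    }
}

/// Takes trait objects: which draw() gets called is decided at run time.
fn draw_all(shapes: &[Box<dyn Drawable>]) -> Vec<String> {
    shapes.iter().map(|shape| shape.draw()).collect()
}

fn example_scene() -> Vec<String> {
    let shapes: Vec<Box<dyn Drawable>> = vec![
        Box::new(Circle { radius: 2.5 }),
        Box::new(Label { text: "hello".to_string() }),
    ];
    draw_all(&shapes)
}
```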

However, traits are more powerful than that.

Associated types

Traits can have associated types. As an example, Rust provides the Iterator trait which you can implement:

pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

This means that whenever you implement Iterator for some iterable object, you also have to specify an Item type for the things that will be produced. If you call next() and there are more elements, you'll get back a Some(YourElementType). When your iterator runs out of items, it will return None.
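For instance, a hypothetical Countdown iterator (a made-up type, just to show the shape of an implementation) could look like this:

```rust
/// Hypothetical iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    // The associated type: what each call to next() produces.
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

/// Collects the whole countdown; collect() comes for free with Iterator.
fn countdown_from(n: u32) -> Vec<u32> {
    Countdown { remaining: n }.collect()
}
```

Implementing next() is all it takes to get the rest of the Iterator methods, such as collect() and sum().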

Associated types can refer to other traits.

For example, in Rust, you can use for loops on anything that implements the IntoIterator trait:

pub trait IntoIterator {
    /// The type of the elements being iterated over.
    type Item;

    /// Which kind of iterator are we turning this into?
    type IntoIter: Iterator<Item=Self::Item>;

    fn into_iter(self) -> Self::IntoIter;
}

When implementing this trait, you must provide both the type of the Item which your iterator will produce, and IntoIter, the actual type that implements Iterator and that holds your iterator's state.
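A sketch of such an implementation, using a hypothetical Playlist type. To keep it short, it borrows Vec's own iterator type for IntoIter instead of defining a new one.

```rust
/// Hypothetical collection type wrapping a Vec.
struct Playlist {
    songs: Vec<String>,
}

impl IntoIterator for Playlist {
    type Item = String;
    // Reuse Vec's iterator to hold the iteration state.
    type IntoIter = std::vec::IntoIter<String>;

    fn into_iter(self) -> Self::IntoIter {
        self.songs.into_iter()
    }
}

/// Because Playlist implements IntoIterator, a for loop works on it directly.
fn shout(playlist: Playlist) -> Vec<String> {
    let mut out = Vec::new();
    for song in playlist {
        out.push(song.to_uppercase());
    }
    out
}
```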

This way you can build webs of types that refer to each other. You can have a trait that says, "I can do foo and bar, but only if you give me a type that can do this and that".
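One way to read "only if you give me a type that can do this and that" is as trait bounds on an associated type. In this sketch, describe_total is a made-up function that accepts anything iterable, as long as the items can be added, have a default value, and can be printed.

```rust
use std::fmt::Display;
use std::ops::Add;

/// Adds up the items of any iterable whose Item type supports
/// addition, a default ("zero") value, and Display.
fn describe_total<I>(items: I) -> String
where
    I: IntoIterator,
    I::Item: Add<Output = I::Item> + Default + Display,
{
    let mut total: I::Item = Default::default();
    for item in items {
        total = total + item;
    }
    format!("total = {}", total)
}
```

The same function works for integers and floats, and the compiler rejects calls with item types that lack any of the three capabilities.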

Slices

I already posted about the lack of string slices in C and how this is a pain in the ass once you get used to having them.
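As a quick reminder of what slices buy you, here is a sketch with two made-up helper functions. Both return or use views into existing data, with no copying and no manual length bookkeeping.

```rust
/// Splits "key=value" into two slices that borrow from `input`.
/// Nothing is allocated or copied; both slices point into the original string.
fn split_key_value(input: &str) -> Option<(&str, &str)> {
    let eq = input.find('=')?;
    Some((&input[..eq], &input[eq + 1..]))
}

/// Sum of the elements of a slice, skipping the first and last ones.
fn sum_of_middle(values: &[i32]) -> i32 {
    if values.len() < 2 {
        return 0;
    }
    values[1..values.len() - 1].iter().sum()
}
```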

Modern tooling for dependency management

Instead of

  • Having to invoke pkg-config by hand or with Autotools macros
  • Wrangling include paths for header files...
  • ... and library files.
  • And basically depending on the user to ensure that the correct versions of libraries are installed,

You write a Cargo.toml file which lists the names and versions of your dependencies. These get downloaded from a well-known location, or from elsewhere if you specify.

You don't have to fight dependencies. It just works when you cargo build.

Tests

C makes it very hard to have unit tests for several reasons:

  • Internal functions are often static. This means they can't be called outside of the source file that defined them. A test program either has to #include the source file where the static functions live, or use #ifdefs to remove the statics only during testing.

  • You have to write Makefile-related hackery to link the test program to only part of your code's dependencies, or to only part of the rest of your code.

  • You have to pick a testing framework. You have to register tests against the testing framework. You have to learn the testing framework.

In Rust you write

#[test]
fn test_that_foo_works() {
    assert!(foo() == expected_result);
}

anywhere in your program or library, and when you type cargo test, IT JUST FUCKING WORKS. That code only gets linked into the test binary. You don't have to compile anything twice by hand, or write Makefile hackery, or figure out how to extract internal functions for testing.
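A slightly fuller sketch of the common layout, assuming a hypothetical private helper called clamp_percent: the tests live in a child module in the same file, so they can call the private function directly.

```rust
/// Hypothetical internal helper; private, much like a static function in C.
fn clamp_percent(value: i32) -> i32 {
    value.max(0).min(100)
}

// Only compiled when running `cargo test`.
#[cfg(test)]
mod unit_tests {
    use super::*;

    #[test]
    fn clamps_out_of_range_values() {
        assert_eq!(clamp_percent(-5), 0);
        assert_eq!(clamp_percent(150), 100);
        assert_eq!(clamp_percent(42), 42);
    }
}
```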

This is a killer feature for me.

Documentation, with tests

Rust generates documentation from comments in Markdown syntax. Code in the docs gets run as tests. You can illustrate how a function is used and test it at the same time:

/// Multiplies the specified number by two
///
/// ```
/// // `my_crate` is a placeholder for your crate's name.
/// use my_crate::multiply_by_two;
/// assert_eq!(multiply_by_two(5), 10);
/// ```
pub fn multiply_by_two(x: i32) -> i32 {
    x * 2
}

Your example code gets run as tests to ensure that your documentation stays up to date with the actual code.

Update 2018/Feb/23: QuietMisdreavus has posted how rustdoc turns doctests into runnable code internally. This is high-grade magic and thoroughly interesting.

Hygienic macros

Rust has hygienic macros that avoid all of C's problems with things in macros that inadvertently shadow identifiers in the code. You don't need to write macros where every symbol has to be in parentheses for max(5 + 3, 4) to work correctly.
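A sketch of what a max macro could look like with macro_rules!. The macro and the two helper functions are illustrative only; the standard library already has std::cmp::max for real use.

```rust
/// A sketch of a max macro; no defensive parentheses are needed.
macro_rules! max {
    ($a:expr, $b:expr) => {{
        // Each argument is evaluated exactly once.
        let a = $a;
        let b = $b;
        if a > b { a } else { b }
    }};
}

/// Arguments are whole expressions, so operator precedence is preserved.
fn precedence() -> i32 {
    2 * max!(5 + 3, 4)
}

/// The caller's `a` and `b` are distinct from the macro's internal ones,
/// even though the names are the same.
fn hygiene() -> i32 {
    let a = 3;
    let b = 7;
    max!(b - 5, a)
}
```

In hygiene(), the macro's internal a and b do not capture the caller's variables, so the call compares 2 and 3 as written.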

No automatic coercions

All the bugs in C that result from inadvertently converting an int to a short or char or whatever — Rust doesn't do them. You have to explicitly convert.
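A sketch of the explicit conversions the standard library offers. The function names are made up; From, TryFrom and the as operator are the real mechanisms. Writing `let y: i16 = x;` with an i32 x simply does not compile.

```rust
use std::convert::TryFrom;

/// Widening is lossless, so `From` is available.
fn widen(x: i32) -> i64 {
    i64::from(x)
}

/// Narrowing can fail, so this returns None when the value does not fit.
fn narrow(x: i32) -> Option<i16> {
    i16::try_from(x).ok()
}

/// `as` truncates, but you have to ask for it explicitly.
fn truncate(x: i32) -> u8 {
    x as u8
}
```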

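No undefined behavior on integer overflow

Signed integer overflow is undefined behavior in C. In Rust it is always defined: arithmetic that overflows panics in debug builds and wraps around in release builds by default, and methods such as checked_add(), wrapping_add(), and saturating_add() let you choose the behavior explicitly.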
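A minimal sketch of the standard library's explicit overflow-handling methods on integers. The wrapper function names are only illustrative.

```rust
/// Returns None instead of overflowing.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Wraps around modulo 256.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Clamps at the maximum value of the type.
fn add_saturating(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}
```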

Generally, no undefined behavior in safe Rust

In Rust, it is considered a bug in the language if something written in "safe Rust" (what you would be allowed to write outside unsafe {} blocks) results in undefined behavior. You can shift-right a negative integer and it will do exactly what you expect.
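For the shift example, a tiny sketch with made-up function names. Right-shifting a signed integer is an arithmetic shift on every platform, and you get a logical shift by casting to unsigned first.

```rust
/// Right shift on a signed integer keeps the sign (arithmetic shift).
fn halve(x: i32) -> i32 {
    x >> 1
}

/// Casting to unsigned first gives a logical shift instead.
fn logical_shift(x: i32) -> u32 {
    (x as u32) >> 1
}
```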

Pattern matching

You know how gcc warns you if you switch() on an enum but don't handle all values? That's like a little baby.

Rust has pattern matching in various places. It can do that trick for enums inside a match expression. It can do destructuring so you can return multiple values from a function:

// Signature of the standard library method on f64:
//     pub fn sin_cos(self) -> (f64, f64)

let angle: f64 = 42.0;
let (sin_angle, cos_angle) = angle.sin_cos();
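Going back to the enum case mentioned above, here is a sketch with a hypothetical Shape enum. Unlike gcc's warning, leaving out a variant is a compile error, and each arm can destructure the data its variant carries.

```rust
/// Hypothetical enum whose variants carry different data.
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Point,
}

fn area(shape: &Shape) -> f64 {
    // Removing any of these arms makes the program fail to compile.
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
        Shape::Point => 0.0,
    }
}
```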

You can match on strings. YOU CAN MATCH ON FUCKING STRINGS.

let color = "green";

match color {
    "red"   => println!("it's red"),
    "green" => println!("it's green"),
    _       => println!("it's something else"),
}

You know how this is illegible?

my_func(true, false, false)

How about this instead, with pattern matching on function arguments:

pub struct Fubarize(pub bool);
pub struct Frobnify(pub bool);
pub struct Bazificate(pub bool);

fn my_func(Fubarize(fub): Fubarize, 
           Frobnify(frob): Frobnify, 
           Bazificate(baz): Bazificate) {
    if fub {
        ...;
    }

    if frob && baz {
        ...;
    }
}

...

my_func(Fubarize(true), Frobnify(false), Bazificate(true));

Standard, useful error handling

I've talked at length about this. No more returning a boolean with no extra explanation for an error, no ignoring errors inadvertently, no exception handling with nonlocal jumps.
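As a short reminder of how this looks, here is a sketch with a made-up parse_size function and SizeError type. Result makes the failure cases part of the return type, and the ? operator propagates errors without nonlocal jumps.

```rust
use std::num::ParseIntError;

/// Hypothetical error type: says what went wrong, not just that something did.
#[derive(Debug, PartialEq)]
enum SizeError {
    MissingSeparator,
    BadNumber(ParseIntError),
}

impl From<ParseIntError> for SizeError {
    fn from(e: ParseIntError) -> Self {
        SizeError::BadNumber(e)
    }
}

/// Parses "WIDTHxHEIGHT", for example "640x480".
fn parse_size(s: &str) -> Result<(u32, u32), SizeError> {
    let x = s.find('x').ok_or(SizeError::MissingSeparator)?;
    // `?` returns early on failure, converting ParseIntError through From.
    let width = s[..x].parse::<u32>()?;
    let height = s[x + 1..].parse::<u32>()?;
    Ok((width, height))
}
```

A caller that ignores the returned Result gets a compiler warning, so errors are hard to drop by accident.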

#[derive(Debug)]

If you write a new type (say, a struct with a ton of fields), you can #[derive(Debug)] and Rust will know how to automatically print that type's contents for debug output. You no longer have to write a special function that you must call in gdb by hand just to examine a custom type.
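A minimal sketch, with a hypothetical Viewport struct:

```rust
/// Hypothetical struct; the derive generates the debug formatting code.
#[derive(Debug)]
struct Viewport {
    width: u32,
    height: u32,
    title: String,
}

fn debug_string() -> String {
    let v = Viewport { width: 640, height: 480, title: "main".to_string() };
    // {:?} uses the derived Debug implementation.
    format!("{:?}", v)
}
```

Here debug_string() produces `Viewport { width: 640, height: 480, title: "main" }`, and `{:#?}` gives a multi-line version for bigger types.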

Closures

No more passing function pointers and a user_data by hand.
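A sketch of the difference, with made-up function names. The closure captures the variables it needs from the enclosing scope, which is the job a void *user_data does in C.

```rust
/// Calls `callback` once per item; the Rust counterpart of a C function
/// that takes a function pointer plus a user_data pointer.
fn for_each_item<F: FnMut(i32)>(items: &[i32], mut callback: F) {
    for &item in items {
        callback(item);
    }
}

/// Sums the items that are greater than `threshold`.
fn sum_above(items: &[i32], threshold: i32) -> i32 {
    let mut total = 0;
    // The closure captures `threshold` and `total` directly.
    for_each_item(items, |item| {
        if item > threshold {
            total += item;
        }
    });
    total
}
```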

Conclusion

I haven't done the "fearless concurrency" bit yet, where the compiler is able to prevent data races in threaded code. I imagine it being a game-changer for people who write concurrent code on an everyday basis.

C is an old language with primitive constructs and primitive tooling. It was a good language for small uniprocessor Unix kernels that ran in trusted, academic environments. It's no longer a good language for the software of today.

Rust is not easy to learn, but I think it is completely worth it. It's hard because it demands a lot from your understanding of the code you want to write. I think it's one of those languages that make you a better programmer and that let you tackle more ambitious problems.