Stuff Michael Meeks is doing

This is my (in)activity log. You might like to visit Collabora Productivity a subsidiary of Collabora focusing on LibreOffice support and services for whom I work. Also if you have the time to read this sort of stuff you could enlighten yourself by going to Unraveling Wittgenstein's net or if you are feeling objectionable perhaps here. Failing that, there are all manner of interesting things to read on the LibreOffice Planet news feed.

Older items: 2021: ( J F M A M J J A S O N D ), 2019, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, legacy html


2010-10-04: Monday.

Re-factoring C++ and the vtable problem

The opportunity

Consider that you have large piece of C++, over many years, for reasons unknown, it has accumulated (at least) four boolean types: FASTBOOL, BOOL, sal_Bool and of course the native boolean type bool. Now consider a class with a virtual method in it:

class Base {       ... virtual void setValue (BOOL b); /* A */ }
class Sub : Base { ... virtual void setValue (BOOL b); /* B */ }

Please notice that these two classes are highly unlikely to be in the same header file, and may even be separated by many leagues of code, so you might never notice.

A small cleanup

Now imagine that we want to cleanup this obsolete 'BOOL' to a 'bool' - that would be nice and pretty; so - since we are hacking on 'Sub' - we change it there, and we change all the callers in Sub's implementation:

class Base {       ... virtual void setValue (BOOL b); /* A */ }
class Sub : Base { ... virtual void setValue (bool b); /* B2 */ }

At this point we have a problem - we suddenly broke the functionality of the Base class, and anyone calling the virtual method A. This is because without realising it we introduced an entirely new method with the same name and a different signature - by the miracle of polymorphism. ie. now we have two virtual 'setValue' methods, and B2 does not override A. That is not good.

Of course, because this is a feature, the compile will just love it, and suddenly, magically compile a completely different piece of code, and better - since the change is almost invisible to the casual reader this will be a nightmare to debug. Interestingly changes such as the above may seem rare, but are only the top of the iceberg - a more common incidence of these crops up in 64bit porting, where there is some mix of different types down the virtual function chain - someone overrides with a different typedef - all of which previously resolved to 'int'.

How to avoid some grief

The (rather dumb) tool I hacked up today tries to avoid this sort of problem. It does this by noticing that when we make this sort of slip, we add a new virtual method to Sub's v-table. Thus - if we simply print-out all the vtables sizes (in slots) before, and then after a change, we can rapidly check if they match. If there is a disparity - we can find the class that changed size, wind back to (all of) its parents, and work out which one is missing a suitable change. This technique was remarkably succcessful in finding 64bit portability problems (by comparing 64bit vs. 32bit builds) when Kendy was working on the original OpenOffice.org port to 64bit architectures. The tool is here

How to re-introduce grief

One final deep joy about C++ is that there is no real need to mark an overriding method as virtual (though most people, most of the time tend to). So we can write the original problem just as well as:

class Base { ... virtual void setValue (BOOL b); /* A */ }
class Sub : Base {    ... void setValue (BOOL b); /* B3 */ }

It is worth noting at this point, C#'s much better, and more readable syntax, mandating an 'override' for B3. Then lets re-add the great change we made before:

class Base { ... virtual void setValue (BOOL b); /* A */ }
class Sub : Base {    ... void setValue (bool b); /* B4 */ }

So - now we have an even nastier beastie - in this case by changing the type, we suddenly stop overriding A, but instead of creating a new virtual method, we create a normal method B4. Worse - this doesn't change the size of the vtable - so it is even harder to notice. At this point, at the moment - you loose.

All is not lost (in future)

Fortunately, this shouldn't be entirely the end of the story; in subsequent iterations, it should be easy to add a 'method count' that can be calculated for each class - and compared before and after, this would allow B4 to be detected rather easily. Perhaps tomorrow I'll expand the tool to do that too. Probably I am missing some other obvious gotchas, but getting this wrong - particularly in deeply inherited hierarchies, with complex interactions can create some particularly nasty problems to debug.


My content in this blog and associated images / data under images/ and data/ directories are (usually) created by me and (unless obviously labelled otherwise) are licensed under the public domain, and/or if that doesn't float your boat a CC0 license. I encourage linking back (of course) to help people decide for themselves, in context, in the battle for ideas, and I love fixes / improvements / corrections by private mail.

In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences or fun.

Michael Meeks (michael.meeks@collabora.com)

pyblosxom