Stuff Michael Meeks is doing

This is my (in)activity log. You might like to visit Collabora Productivity, a subsidiary of Collabora focusing on LibreOffice support and services, for whom I work. Also, if you have the time to read this sort of stuff, you could enlighten yourself by going to Unraveling Wittgenstein's net, or if you are feeling objectionable perhaps here. Failing that, there are all manner of interesting things to read on the LibreOffice Planet news feed.



LibreOffice under the hood: progress to 4.3.0

Today we release LibreOffice 4.3.0, packed with a load of new features for people to enjoy; you can read all the great news about the user-visible features from so many hardy developers. But there are also contributors whose work is primarily behind the scenes, in places that are not so easy to see, and that work is still vitally important to the project. It can be hard to pick it out from the more than fourteen thousand commits since LibreOffice 4.2 was branched, so let me expand:

User Interface Dialog / Layout

The UI migration to Glade-based layout of VCL widgets is finally approaching the home straight: more than two hundred dialogs were converted this release, leaving the final dialogs rather hard to find (help appreciated). Many thanks to Caolán McNamara (Red Hat) for his incredible work here, and also to Szymon Kłos, Michal Siedlaczek, Olivier Hallot (EDX), Andras Timar (Collabora), Jan Holesovsky (Collabora), Katarina Behrens, Thomas Arnhold, Maxim Monastirsky, Manal Alhassoun, Palenik Mihály, and many others. Thanks also to our translators who helped in the migration of strings.

Graph of progress in UI layout conversion

If you'd like to get involved in driving this to 100%, check out Caolán's how-to and his great "99 to go" update blog post (now only 65 remain), as illustrated in the graph above.

Build improvements

We've improved a lot this cycle in terms of buildability and ease of comprehension, both important for new contributors.

Visual Studio support

Not only did Jesus Corrius add initial support for Visual Studio 2013, but we also had a major win from Honza Havlíček who, building on Bjoern Michaelsen (Canonical)'s similar KDevelop work, implemented generating a Visual Studio project file. That allows much improved build and debugging support (see the video), or just run: make vs2012-ide-integration.

OpenGL as a run-time dependency

In the past, when we needed an OpenGL code-path, we would link a separate shared library against OpenGL and then dynamically load that component, as for the OpenGL slideshow. In 4.3 we unified all of our OpenGL code to use GLEW, and now have a central VCL API for initializing and linking in OpenGL, making it much easier to use in future. Another benefit of GLEW is the ability to check for particular extensions dynamically at run-time, so we can better adapt to your platform's capabilities rather than having to code against a fixed baseline.
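As a rough illustration of run-time adaptation (not LibreOffice's actual VCL API), the sketch below parses a space-separated extension list, the shape returned by the legacy glGetString(GL_EXTENSIONS) call, and picks a code path from it. GLEW offers the same idea through glewIsSupported(); here the extension string is passed in directly so the example builds without an OpenGL context, and the RenderPath names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical rendering paths; the real decision in LibreOffice is made elsewhere.
enum class RenderPath { Baseline, FramebufferObjects };

// Split a space-separated extension list (the format of the legacy
// glGetString(GL_EXTENSIONS) result) into a set for quick lookup.
std::set<std::string> parseExtensions(const std::string& extList)
{
    std::set<std::string> result;
    std::istringstream in(extList);
    std::string name;
    while (in >> name)
        result.insert(name);
    return result;
}

// Choose the richer path only when the driver reports support at run time,
// instead of compiling against a fixed baseline.
RenderPath chooseRenderPath(const std::set<std::string>& exts)
{
    if (exts.count("GL_ARB_framebuffer_object"))
        return RenderPath::FramebufferObjects;
    return RenderPath::Baseline;
}
```

For example, a driver string containing GL_ARB_framebuffer_object selects the richer path, while one without it falls back to the baseline.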

Pre-compiled-headers / PCH updates

Thomas Arnhold discovered that our PCH files (used to accelerate Windows builds) had bit-rotted, and did a fine cleanup sweep across them. That significantly reduced build time for a number of modules.

Graph of compile-time speedup from improving pre-compiled headers

Mobile code-size reduction

A lot of work was put into LibreOffice 4.3 to allow us to shrink the code to fit a mobile footprint nicely. Thanks to Matus Kukan (Collabora) for splitting a large number of UNO components into individual factory functions, allowing the linker to garbage-collect unused components. Matus also created a Python script, solenv/bin/native-code.py, to share the building of lists of components to statically link in for various combinations of functionality. Tor Lillqvist (Collabora) did some re-work on ICU to package its rather large data tables as a file instead of code. Vincent Saunders (Collabora) worked away to improve dwarfprofile, to identify the larger pieces of object files and where they came from. Jan Holesovsky de-coupled lots of accessibility code, and removed lots of static variables dragging in unneeded code. Miklos Vajna turned the OOXML custom shape preset definitions (oox::drawingml::CustomShapeProperties::PresetsMap) from generated code into generated data, which allowed the removal of 50k lines of code. Thanks to Tsahi Glik / CloudOn for funding this work.
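The idea behind per-component factory functions can be sketched roughly as follows. Instead of one monolithic factory that references every component (so the linker must keep them all), each component gets its own free function, and a generated table lists only the ones a given build needs; with section garbage collection (for example -ffunction-sections plus --gc-sections), unreferenced factories and the code they pull in can then be dropped. The component names and the table are hypothetical, not the actual output of native-code.py.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Minimal stand-in for a UNO component interface (hypothetical).
struct Component {
    virtual ~Component() = default;
    virtual std::string name() const = 0;
};

struct WriterFilter : Component { std::string name() const override { return "WriterFilter"; } };
struct CalcFilter   : Component { std::string name() const override { return "CalcFilter"; } };
struct ChartEngine  : Component { std::string name() const override { return "ChartEngine"; } };

// One free factory function per component: if nothing references
// ChartEngine_create, the linker is free to discard it and ChartEngine's code.
Component* WriterFilter_create() { return new WriterFilter; }
Component* CalcFilter_create()   { return new CalcFilter; }
Component* ChartEngine_create()  { return new ChartEngine; }

// A table like this would be generated per product configuration;
// this one deliberately omits ChartEngine, e.g. for a lean viewer build.
struct FactoryEntry { const char* name; Component* (*create)(); };
const FactoryEntry kStaticFactories[] = {
    { "WriterFilter", WriterFilter_create },
    { "CalcFilter",   CalcFilter_create },
};

// Look up a component by name among the statically linked ones only.
std::unique_ptr<Component> createComponent(const char* name)
{
    for (const auto& entry : kStaticFactories)
        if (std::strcmp(entry.name, name) == 0)
            return std::unique_ptr<Component>(entry.create());
    return nullptr; // not linked into this build
}
```

The design choice is simply to make each component reachable through exactly one symbol, so that "is it referenced from the table?" decides whether it ends up in the binary.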

Code quality work

There has been a lot of work on code quality and on improving the maintainability and cleanliness of the code. Another 75 or so commits fixing cppcheck errors are thanks to Julien Nabet, along with the huge scad of daily commits to build without any compiler warnings (-Werror -Wall -Wextra) on every platform, thanks primarily to Tor Lillqvist (Collabora), Caolán McNamara (Red Hat), and Thomas Arnhold.

Assert usage

Another tool developers use to avoid introducing new bugs is the assertion. Historically the OOo code base had custom assertion facilities that could easily be ignored, and so most developers did just that. Thanks to Stephan Bergmann (Red Hat), we have started to use the standard assert() macro in LibreOffice, which has the important advantage that it actually aborts the program: if an assertion fails, developers see a crash, which is rather harder to ignore than some text printed on the terminal. Thanks to all who asserted the truth.
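The difference can be sketched in a few lines. SOFT_ENSURE below is a hypothetical stand-in for the old style of assertion (print a message, carry on), not the actual legacy OOo macro; the standard assert() from <cassert> aborts the program when its condition is false, unless NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical "soft" assertion in the old style: it reports the problem
// on stderr and then carries on, so it is easy to miss in a busy terminal.
#define SOFT_ENSURE(cond, msg) \
    do { if (!(cond)) std::fprintf(stderr, "ENSURE failed: %s\n", msg); } while (0)

// Old style: complain, then limp on with a fallback value.
int legacyElementAt(const std::vector<int>& v, std::size_t i)
{
    SOFT_ENSURE(i < v.size(), "index out of range");
    return i < v.size() ? v[i] : 0;
}

// New style: in debug builds a bad index aborts at once, pointing straight
// at the broken caller instead of hiding the bug behind a fallback.
int elementAt(const std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    return v[i];
}
```

Calling legacyElementAt with a bad index merely prints a line and returns 0; the same mistake with elementAt stops a debug build on the spot.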

Graph of number of run-time assertions

Rocking Coverity

We have been chewing through the huge amount of analysis from the Coverity Scan; in particular Caolán McNamara (Red Hat) has done an awesome job here, and his blog post on that is typically modest.

We now have a defect density of 0.08 per thousand lines of code, meaning 8 defects found by static checking in every 100,000 lines of code. This compares rather favourably with the average open source project of this size, which has 65 per 100,000 lines. Perhaps the most useful thing now is Coverity's report on new issues, many of which are rather more serious than the last few lowest-priority, un-triaged reports.

This was achieved by 2679 commits, 88% of them from Caolán, followed by Norbert Thiebaud, Miklos Vajna (Collabora), Noel Grandin, Stephan Bergmann (Red Hat), Chris Sherlock, David Tardon (Red Hat), Thomas Arnhold, Steve Yin (IBM), Kohei Yoshida (Collabora), Jan Holesovsky (Collabora), Eike Rathke (Red Hat), Markus Mohrhard (Collabora) and Julien Nabet.

Import and now export testing

Markus Mohrhard's great import/export crash testing has been expanded to 55,000+ problem/bug documents, now also covering the PDF importer, and our crash and validation problem counts continue to drop. We import each of these documents and then export it into each export format that we support for its document type; e.g. an ODS would be re-exported as ODS, XLS, XLSX, etc. Markus also re-wrote and simplified the test script in Python. However, this test (which runs for 5 days and consumes a beefy machine) routinely locks up Linux across several distributions and kernel versions, on both virtual and real hardware, which reduces its usefulness.
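The shape of the round-trip test can be sketched as follows: each input is mapped to its document family, and then scheduled for re-export to every format of that family. The format lists, the family detection by file extension, and the planExports helper are illustrative assumptions; the real test is a Python script driving LibreOffice itself, and the actual load/convert step is left out here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative export targets per document family (not the complete list).
const std::map<std::string, std::vector<std::string>> kExportFormats = {
    { "spreadsheet",  { "ods", "xls", "xlsx" } },
    { "text",         { "odt", "doc", "docx", "rtf" } },
    { "presentation", { "odp", "ppt", "pptx" } },
};

// Guess the family from the file extension (hypothetical; a real harness
// would let LibreOffice's filter detection decide).
std::string familyOf(const std::string& path)
{
    static const std::map<std::string, std::string> byExt = {
        { "ods", "spreadsheet" }, { "xls", "spreadsheet" }, { "xlsx", "spreadsheet" },
        { "odt", "text" }, { "doc", "text" }, { "docx", "text" }, { "rtf", "text" },
        { "odp", "presentation" }, { "ppt", "presentation" }, { "pptx", "presentation" },
    };
    auto dot = path.rfind('.');
    if (dot == std::string::npos)
        return "";
    auto it = byExt.find(path.substr(dot + 1));
    return it == byExt.end() ? "" : it->second;
}

// For every input document, list the (document, target format) jobs:
// import once, then export to each format of the same family.
std::vector<std::pair<std::string, std::string>>
planExports(const std::vector<std::string>& docs)
{
    std::vector<std::pair<std::string, std::string>> jobs;
    for (const auto& doc : docs) {
        auto it = kExportFormats.find(familyOf(doc));
        if (it == kExportFormats.end())
            continue; // unknown family: skipped in this sketch
        for (const auto& fmt : it->second)
            jobs.emplace_back(doc, fmt);
    }
    return jobs;
}
```

With tens of thousands of inputs and several targets each, the job list grows quickly, which goes some way to explaining why a full run takes days.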

Re-factoring big objects

In some cases LibreOffice has classes that seem to do 'everything' and include the kitchen sink too. Thanks to Valentin Kettner, Michael Stahl (RedHat) and Bjoern Michaelsen (Canonical) for helping to re-factor these. As an example SwDoc (a writer document) now inherits from only nine classes instead of nineteen, and the header file shrunk by more than three hundred lines.
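One common way to shrink such an inheritance list, sketched here on the assumption that it resembles (but is not copied from) the SwDoc work, is to move each interface implementation into a small helper object that the document owns and hands out on request. All class names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Two of the narrow interfaces a "kitchen sink" document class might implement.
struct ISettingsAccess {
    virtual ~ISettingsAccess() = default;
    virtual bool get(std::size_t id) const = 0;
    virtual void set(std::size_t id, bool value) = 0;
};
struct IRevisionCounter {
    virtual ~IRevisionCounter() = default;
    virtual int revision() const = 0;
    virtual void bump() = 0;
};

// Before: class Doc : public ISettingsAccess, public IRevisionCounter, ... { ... };
// After: each interface is implemented by its own small, focused class.
class SettingsManager : public ISettingsAccess {
    std::array<bool, 8> flags_{};   // all false initially; .at() throws on an unknown id
public:
    bool get(std::size_t id) const override { return flags_.at(id); }
    void set(std::size_t id, bool value) override { flags_.at(id) = value; }
};

class RevisionCounter : public IRevisionCounter {
    int revision_ = 0;
public:
    int revision() const override { return revision_; }
    void bump() override { ++revision_; }
};

// The document owns the helpers and hands out the interfaces on request,
// instead of inheriting from every one of them.
class Doc {
    SettingsManager settings_;
    RevisionCounter revisions_;
public:
    ISettingsAccess& getSettingsAccess() { return settings_; }
    IRevisionCounter& getRevisionCounter() { return revisions_; }
};
```

Callers keep using the same interfaces, but each concern now lives in its own class that can be read, tested and changed without touching the document class.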

Valgrind fixes

Valgrind continued to be a wonderful tool for finding and isolating leaks and poor behaviour in various bits of code, although the normal code-paths are by now rather valgrind-clean. Dave Richards from Largo very kindly donated some CPU time on his new 80-CPU Linux machine to burn it in. We used that to run Markus' import/export testing under valgrind, and found and fixed a number of issues; the valgrind logs are available. We would be most happy to help others with their boxes in need of load testing.

Address / Leak Sanitizer

There are some great new ways of doing code sanitisation with compiler-inserted instrumentation, and thanks to Stephan Bergmann (Red Hat) we're using them enthusiastically. -fsanitize is available for Clang and gcc 4.9; it lets us do memory checking (like valgrind) but with visibility into stack corruption, and very significantly faster. Some details on using -fsanitize with LibreOffice are available. Lots of leaks and badness have been fixed using the tool, thanks too to Markus Mohrhard and Caolán McNamara.

Unit testing

We also built and executed more unit tests with LibreOffice 4.3 to avoid regressions as we change the code. Grepping for CPPUNIT_TEST() and CPPUNIT_ASSERT as last time shows that we continued the trend of growth here:

Graph of number of unit tests and assertions

Our ideal is that every bug that is fixed gets a unit test to stop it ever recurring. With 1100 commits and over eighty committers to the unit tests in 4.3, it is hard to list everyone involved here, apologies for that; what follows is a sorted list of those with more than 20 commits to the qa/ directories: Miklos Vajna (Collabora), Kohei Yoshida (Collabora), Caolán McNamara (Red Hat), Stephan Bergmann (Red Hat), Jacobo Aragunde Pérez (Igalia), Tomaž Vajngerl (Collabora), Markus Mohrhard (Collabora), Zolnai Tamás (Collabora), Tor Lillqvist (Collabora), Michael Stahl (Red Hat), Alexander Wilms.
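The grep-based counting behind the unit test graph is simple; a minimal C++ sketch of the same kind of measurement (plain substring counts over .cxx files) might look like this. The function names are made up, and restricting the walk to .cxx files is an assumption about where the tests live.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

// Count non-overlapping occurrences of `needle` in `text`. This is a plain
// substring match, so "CPPUNIT_ASSERT" also counts CPPUNIT_ASSERT_EQUAL etc.
std::size_t countOccurrences(const std::string& text, const std::string& needle)
{
    if (needle.empty())
        return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size()))
        ++count;
    return count;
}

// Walk a source tree and total the occurrences across all .cxx files.
std::size_t countInTree(const std::filesystem::path& root, const std::string& needle)
{
    std::size_t total = 0;
    for (const auto& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file() || entry.path().extension().string() != ".cxx")
            continue;
        std::ifstream in(entry.path());
        std::ostringstream contents;
        contents << in.rdbuf();
        total += countOccurrences(contents.str(), needle);
    }
    return total;
}
```

Running countInTree over a checkout once per release for "CPPUNIT_TEST(" and "CPPUNIT_ASSERT" would give two data points per release for a graph of this kind.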

SAL_OVERRIDE and more

Traditionally C++ has allowed significant ambiguity in overriding methods: the 'virtual' keyword may be omitted in overrides, and a method can override (or fail to override) a base-class method by accident. To prepare for the new C++ standard, we've annotated all of our virtual methods that are overridden in sub-classes with the SAL_OVERRIDE macro, to ensure that we are building our vtables correctly. Many thanks to Noel Grandin and Stephan Bergmann (Red Hat) for building a clang plugin to help add these annotations, and another to verify that the result stays consistent. That fixed several long-standing bugs. As a bonus, when you read the code it is much easier to find the base virtual method declaration: it's the one that is not marked with SAL_OVERRIDE.
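A minimal sketch of the pattern follows. The macro definition here is a local stand-in for illustration, assuming SAL_OVERRIDE expands to the C++11 override keyword where the compiler supports it; in LibreOffice the real macro comes from the sal headers. Shape and Circle are hypothetical classes.

```cpp
#include <cassert>
#include <string>

// Stand-in definition for this sketch only.
#ifndef SAL_OVERRIDE
#define SAL_OVERRIDE override
#endif

struct Shape {
    virtual ~Shape() {}
    // Base declaration: the one method in the hierarchy without SAL_OVERRIDE.
    virtual std::string describe(int /*detail*/) const { return "shape"; }
};

struct Circle : public Shape {
    // Marked as an override: if the base signature ever changes (say, to
    // describe(long) const), this line stops compiling instead of silently
    // declaring a new, unrelated virtual method.
    virtual std::string describe(int detail) const SAL_OVERRIDE
    {
        return detail > 0 ? "circle (detailed)" : "circle";
    }
};
```

Calls through a Shape reference reach Circle::describe, and the compiler now guarantees that relationship rather than leaving it to convention.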

QA / bugzilla

This release the QA team has grown, and done some amazing work both triaging bugs and closing them, getting us back well under the totemic one thousand un-triaged bug barrier. There are currently ~750 unconfirmed bugs, the lowest in over two years. Thanks to everyone for their great work there; sadly it is rather hard to extract credits for confirming bugs, but the respective hero list overlaps with the non-developer top closers listed below.

We also had one of our best bug-hunting weekends ever around 4.3; see Joel Madero's write-up. The QA team is also doing an excellent job with our bibisect git repositories, isolating regressions to small blocks of commits, which makes life significantly easier for developers.
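The bisection idea behind bibisect can be sketched as a binary search over an ordered list of pre-built binaries: given that the oldest build is known good and the newest known bad, each probe halves the range until the first bad build is found. The isBad callback stands in for a person (or a script) testing one build; it is an assumption for this sketch, not part of the bibisect repositories.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Return the index of the first "bad" build in [0, count), assuming
// build 0 is good, build count-1 is bad, and a regression never goes away
// once introduced (the usual bisection precondition).
std::size_t firstBadBuild(std::size_t count,
                          const std::function<bool(std::size_t)>& isBad)
{
    assert(count >= 2);
    std::size_t good = 0;          // known good
    std::size_t bad = count - 1;   // known bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (isBad(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

Because each probe halves the range, even a thousand builds need only about ten manual tests to pin a regression down to the commits between two adjacent builds.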

One metric we watch in the ESC call is who is in the top ten in the freedesktop Weekly bug summary. Here is a list of the top twenty people who have appeared most frequently in the weekly list of top ten bug closers in order of frequency of appearance: Jorendc, Kohei Yoshida (Collabora), Maxim Monastirsky, tommy27, Joel Madero, Caolán McNamara (RedHat), Foss, Jay Philips, m.a.riosv, Julien Nabet, Sophie Gautier (TDF), Cor Nouws, Michael Stahl (RedHat), Jean-Baptiste Faure, Andras Timar (Collabora), Adolfo Jayme, ign_christian, Markus Mohrhard (Collabora), Eike Rathke (RedHat), Urmas. And thanks to the many others that helped to close so many bugs for this release.

Bjoern Michaelsen (Canonical) also wrote up a nice taxonomy of our twenty-five thousand reported bugs so far, and provided the data for this nice breakdown:

Graph of bug stats over the development of 4.3

Code cleanup

Code that is dirty should be cleaned up - so we did a lot of that.

The final death of UniString

While we killed our last tools/ string class in 4.2 and switched to clean, uniform OUStrings everywhere, we were still using some 16-bit quantities to describe text offsets elsewhere. Thanks to Caolán McNamara (Red Hat) for finally enabling Writer to have paragraphs longer than 64k characters, a long-requested feature for a certain type of user; see the related blog post.
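Why a 16-bit offset caps a paragraph at 65,535 characters is easy to see in a tiny sketch: converting a larger position into an unsigned 16-bit value wraps it around modulo 65,536. The helper names are made up for illustration; the 32-bit variant mirrors the fact that OUString lengths and indices are sal_Int32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With a 16-bit offset type, positions past 65535 cannot be represented:
// the conversion wraps around modulo 65536.
std::uint16_t offset16(std::size_t pos) { return static_cast<std::uint16_t>(pos); }

// A 32-bit signed offset has room for paragraphs of up to 2^31 - 1 code units.
std::int32_t offset32(std::size_t pos) { return static_cast<std::int32_t>(pos); }
```

A cursor at character 70,000 would silently become character 4,464 with the 16-bit type, which is why the limit had to be lifted by widening the offsets rather than by patching individual call sites.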

VCL code / structure cleanup

The Visual Class Library (VCL), LibreOffice's native toolkit, has not been given the love it deserves in recent years. Many thanks to Chris Sherlock for several hundred commits starting to clean up VCL. That involves lots of good things: giving the code a more logical structure so that methods are easy to find, systematically writing Doxygen documentation for API methods, ensuring that API methods have sensible, descriptive names, and starting to unwind some poor legacy design decisions; much appreciated.

Ongoing German Comment redux

We continued to make some progress on translating our last lingering German comments across the codebase to good, crisp technical English. Many thanks to Luc Castermans, Sven Wehner, Christian M. Heller, Philipp Weissenbacher, Stefan Ring, Philipp Riemer, Tobias Mueller, Chris Sherlock, Alexander Wilms and others. We also reduced the number of false positives and accelerated the bin/find-german-comments tool in this cycle.

Graph of remaining lines of German comment to translate

Automated code re-factoring using Clang

One hero of code cleaning is Noel Grandin, who is constantly improving the code in many ways, e.g. replacing unnecessary duplicate code with standard wrappers such as SimpleReferenceObject. Noel has been heavily involved in Clang plugins to re-write a lot of our error-prone binary file format stream operators: pStream >> nVar seems like a great idea until you realise that an unexpected change to the type of nVar, far away, tweaks the file format. These operators have now all been re-written to explicit ReadFloat-style methods, making the code more robust to such changes. Noel also created plugins to inline simple member functions, and to detect inefficient passing of uno::Sequence and OUString.

Stephan Bergmann (Red Hat) also wrote a number of advanced linting tools: checks for de-referencing NULL pointers, catching quickly on Linux the inlining problems that cause most grief on Windows, and re-writing unnecessary uses of sal_Bool to bool. Stephan also wrote a plugin to find unused functions, including unused functions in templates, as well as one warning on illicit conversions of literals to bool, e.g. if (n == KIND_FOO || KIND_BAR). All of this improves the readability, consistency, reliability and in some cases performance of the code.
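Two of the patterns mentioned above can be sketched in plain C++. ByteReader is a hypothetical stand-in for a binary stream class, and readInt32 only echoes the spirit of the explicit Read* methods, not their exact signatures; the Kind enum is likewise made up. The first part shows why a type-driven operator>> is fragile, the second why the literal-to-bool check matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical byte reader (host byte order, for the sketch only).
class ByteReader {
    const std::vector<unsigned char>& buf_;
    std::size_t pos_ = 0;
public:
    explicit ByteReader(const std::vector<unsigned char>& buf) : buf_(buf) {}
    std::size_t position() const { return pos_; }

    // Fragile style: how many bytes are consumed depends on the type of the
    // variable at the call site, which may be declared far away.
    template <typename T> ByteReader& operator>>(T& value)
    {
        if (pos_ + sizeof(T) > buf_.size())
            throw std::out_of_range("read past end of buffer");
        std::memcpy(&value, buf_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return *this;
    }

    // Explicit style: the on-disk width is spelled out. If the caller's
    // variable later becomes 64-bit, this call no longer compiles.
    ByteReader& readInt32(std::int32_t& value) { return *this >> value; }
};

enum Kind { KIND_FOO = 1, KIND_BAR = 2, KIND_BAZ = 3 };

// Buggy: parsed as (n == KIND_FOO) || KIND_BAR, and the non-zero KIND_BAR
// converts to true, so this returns true for every n.
bool isFooOrBarBuggy(int n) { return n == KIND_FOO || KIND_BAR; }

// Intended meaning.
bool isFooOrBar(int n) { return n == KIND_FOO || n == KIND_BAR; }
```

Reading into an int64_t with operator>> quietly consumes eight bytes where the format has four, whereas readInt32 refuses to compile with the wrong type; similarly, isFooOrBarBuggy accepts KIND_BAZ, which is exactly the mistake such a plugin flags.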

Improving lifecycle

Takeshi Abe invested lots of time this cycle in improving our often unhelpful object lifecycle management. Using smart pointers not only makes the code more readable and often shorter, but also exception-safe, which is very useful.
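A small before/after sketch of why this matters for exception safety. Widget and mayThrow are hypothetical, and std::unique_ptr is used here as the example smart pointer; the actual changes may have used other smart pointer types.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live objects so the sketch can show whether anything leaked.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

void mayThrow(bool fail)
{
    if (fail)
        throw std::runtime_error("load failed");
}

// Before (leaks if mayThrow() throws, because the delete is skipped):
//     Widget* w = new Widget;
//     mayThrow(fail);
//     delete w;

// After: the unique_ptr destructor runs during stack unwinding, so the
// Widget is freed on every path, and there is no delete to forget.
void smartOwner(bool fail)
{
    std::unique_ptr<Widget> w(new Widget);
    mayThrow(fail);
}
```

After smartOwner(true) throws and the exception is caught, Widget::live is back to zero; the raw-pointer version above would leave it at one.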

DocTok cleanup

This cleanup, thanks to Miklos Vajna (Collabora), removed nearly 80k lines of code and made the codebase much simpler to understand; you can see the before & after pictures in his blog.

Holding the line on performance

Performance is one of those hard things to keep solid: it has an alarming habit of bit-rotting when your back is turned. That's why Matus Kukan (Collabora) has built a test machine that routinely builds LibreOffice and runs a suite of document loads, conversions etc. under callgrind. Callgrind's simulated CPU has the beautiful property of giving repeatable results, which makes any small regression or improvement in performance noticeable and fixable. This is easy to see in a graph: admire the crisp flatness between significant events. The X axis is time (annotating the axis with git hashes is not so photogenic).

Graph of various documents performance

Often we only check performance just before a release; it's interesting to see here the big orange hump from a performance fragility found and fixed as a direct result of these tests. Raw callgrind data is made available for trivial examination of the latest traces, along with a flat ODS of the previous runs.
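A rough sketch of how such repeatable numbers could be checked automatically: compare each run's instruction count with the previous one and flag relative changes above a threshold. The findSteps name and the 0.5% threshold used in the example are assumptions for illustration, not the actual tooling on the test machine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given one instruction count per run (in run order), return the indices of
// runs whose count moved by more than `threshold` (a fraction, e.g. 0.005
// for 0.5%) relative to the previous run. Because callgrind simulates the
// CPU, unchanged code gives (nearly) identical counts, so small moves stand out.
std::vector<std::size_t> findSteps(const std::vector<std::uint64_t>& counts,
                                   double threshold)
{
    std::vector<std::size_t> steps;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        double prev = static_cast<double>(counts[i - 1]);
        double cur = static_cast<double>(counts[i]);
        if (prev > 0 && std::fabs(cur - prev) / prev > threshold)
            steps.push_back(i);
    }
    return steps;
}
```

On a wall-clock timer a 0.5% threshold would fire constantly from noise; with simulated instruction counts it only fires at the kind of steps, like the orange hump, that deserve a closer look.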

Getting involved

I hope you get the idea that more developers continue to find a home at LibreOffice and work together to complete some rather significant work, both under the hood and on the surface. If you want to get involved, there are plenty of great people to meet and work alongside. As you can see, individuals make a huge impact on the diversity of LibreOffice (the colour legends on the right should be read left to right, top to bottom, which maps to top down in the chart):

Graph showing individual code committers per month

And in terms of diversity of code commits, we love to see the volume of contributions from unaffiliated volunteers, though clearly the volume and balance change with the season, release cycle, and volunteers' vacation / business plans:

Graph of number of commits per month by affiliation

Naturally we maintain a list of small, bite-sized tasks which you can use to get involved at our Easy Hacks page, with simple build / setup instructions. It is extremely easy to build LibreOffice; each easy hack should have code pointers and be a nicely self-contained task that is easy to solve. In addition, some of them are really nice-to-have features or performance improvements. Please do consider getting stuck in with something.

Graph of progress closing easy hacks over time

Another thing that really helps is running pre-release builds and reporting bugs: just grab and install a pre-release and you're ready to contribute alongside the rest of the development team.

Conclusion

LibreOffice 4.3 is the next in a series of releases that incrementally improve not only the features, but also the foundation of the Free Software office suite. Please be patient; it is just the first in a long series of monthly 4.3.x releases, which will bring a stream of bug fixes and quality improvements over the coming months as we start working in earnest on LibreOffice 4.4.

I hope you enjoy LibreOffice 4.3.0, thanks for reading, and thank you for supporting LibreOffice.

Raw data for many of the above graphs is available.

A French translation of much of this is available.


My content in this blog and associated images / data under images/ and data/ directories are (usually) created by me and (unless obviously labelled otherwise) are licensed under the public domain, and/or if that doesn't float your boat a CC0 license. I encourage linking back (of course) to help people decide for themselves, in context, in the battle for ideas, and I love fixes / improvements / corrections by private mail.

In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences or fun.

Michael Meeks (michael.meeks@collabora.com)
