Stuff Michael Meeks is doing

Today we release LibreOffice 5.0.0, a new foundation for ongoing work over the next months and years. It also has a fine suite of new features for people to enjoy - you can read and enjoy all the great news about the user visible features from so many great hackers, but there are, as always, many contributors whose work is primarily behind the scenes, and a lot of work that is more technical than user-facing. That work is, of course, still vitally important to the project. It can be hard to extract those from around eleven thousand commits since LibreOffice 4.4 was branched, so let me try to expand:

VCL - Toolkit Improvements

One of the largest areas of work in LibreOffice 5.0 is in the VCL toolkit, the graphics toolkit LibreOffice uses for all the widgets and rendering. 5.0 means modernizing and improving several aspects of it and bringing them into line with other cross-platform toolkits.

Mainloop / idle handling

This is a rather major change that landed in 5.0, and is a vital under-pinning to the ongoing attempts to make VCL and LibreOffice more efficient and performant, thanks to Jennifer Liebel and Tobias Madl (interview). The essential problem with our previous approach was that deciding what to do next in LibreOffice (eg. should I do some more word-counting ? or process some deferred window re-sizing work ? or re-paint a window's contents ?) was governed by a rather arbitrary set of millisecond timeouts eg. 30ms for a re-paint, 50ms for a re-size - which was not only race prone, but also horribly inefficient - there being no solid basis to these pseudo-random numbers.

Thankfully in LibreOffice 5.0 we have a new 'idle' concept that prioritizes tasks we want to get completed and allows them to be executed in order at top speed. This combined with Jan Holesovsky (Collabora)'s work to ensure we can queue sub 10ms timeouts on Windows means we finally have a reasonably useful mainloop.
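To make the idea concrete, here is a minimal sketch of priority-ordered idle handling. It is not the VCL implementation: the IdleScheduler class, the TaskPriority levels and their ordering are all assumptions made up for illustration. The point is only that queued work is drained by priority with no guessed-at delays in between.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical priority levels; a lower value means more urgent.
enum class TaskPriority { Resize = 0, Repaint = 1, WordCount = 2 };

// A toy main loop: instead of arming a timer with a guessed delay, callers
// queue work with a priority and the loop drains it in priority order.
class IdleScheduler
{
public:
    void schedule(TaskPriority ePriority, std::function<void()> aTask)
    {
        assert(aTask && "an idle task needs a callback");
        m_aQueue.push(Entry{ ePriority, m_nNextSequence++, std::move(aTask) });
    }

    // Run everything that is pending, most urgent first, with no artificial
    // delay between tasks. Returns the number of tasks executed.
    std::size_t runPending()
    {
        std::size_t nExecuted = 0;
        while (!m_aQueue.empty())
        {
            Entry aEntry = m_aQueue.top();
            m_aQueue.pop();
            aEntry.aTask();
            ++nExecuted;
        }
        return nExecuted;
    }

private:
    struct Entry
    {
        TaskPriority ePriority;
        std::size_t nSequence; // keeps equal-priority tasks in FIFO order
        std::function<void()> aTask;
    };

    // Ordering for std::priority_queue: "less" here means "runs later".
    struct LessUrgent
    {
        bool operator()(const Entry& rLeft, const Entry& rRight) const
        {
            if (rLeft.ePriority != rRight.ePriority)
                return rLeft.ePriority > rRight.ePriority;
            return rLeft.nSequence > rRight.nSequence;
        }
    };

    std::priority_queue<Entry, std::vector<Entry>, LessUrgent> m_aQueue;
    std::size_t m_nNextSequence = 0;
};
```

Scheduling a word-count, a repaint and a resize in that order and then calling runPending() executes the resize first and the word-count last, regardless of the order in which they were queued.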

This has also helped us to find some power-draining bad behavior that was previously less visible - since frequently executed (say every 30ms) shortish tasks that wastefully woke the CPU without making any progress, now cause a 100% CPU spike - and can be addressed. Thanks to Ashod Nakashian for attacking several of these.

Lifecycle re-work (VclPtr)

For much of its lifetime, VCL widget lifecycle was a bit of a mystery, even to VCL itself. Widgets could be heap allocated, stack allocated, or be members of other widgets. If heap allocated they could be wrapped in various flavours of shared pointers. As such predicting when a widget would be destroyed, and/or following its lifecycle across the code was non-trivial. Inside VCL we often used dog-tags: special listeners that would turn null when an object was destroyed to try to avoid referencing an object involved in several back-to-back callbacks. Unfortunately this support was rather incomplete, and lots of code would end up deferring deleting heap allocated widgets until idle in an attempt to avoid problems.

In an attempt to solve all of this mess, we now have a single smart pointer type: VclPtr to reference-count all Window (and OutputDevice) sub-classes, which are now always heap allocated. This gives a consistent lifecycle mechanism, which is even documented. We moved to a 'dispose' mechanism to break reference-cycles, replacing the previous explicit or implicit 'delete' mechanism, and have made lots of methods safe to call even on disposed widgets. This should, in the end, provide a predictable lifecycle and much less fragile destruction code paths, making it easier to safely re-factor code. In the meantime we continue to iron out problems, thanks to Noel Grandin (Peralex) for his invaluable help to me with this work, and Caolán McNamara (RedHat) and Julien Nabet among others for helping to fixup some of the aftermath. It is hoped that (ultimately) nearly all long-lived VCL types will use a similar lifecycle mechanism. This work was made possible by Caolán's huge re-factor to use VclBuilder for all dialogs.
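The dispose idea can be shown with a toy model. This sketch deliberately uses std::shared_ptr and an invented Widget class rather than VclPtr and the real Window hierarchy, so every name in it is an assumption. It shows a parent/child reference cycle that plain reference counting would never free, a dispose() call that breaks it, and a method that stays safe to call afterwards.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy widget, NOT the real VCL Window: it only models the lifecycle idea.
class Widget
{
public:
    explicit Widget(std::string aName)
        : m_aName(std::move(aName)), m_bDisposed(false) {}
    ~Widget() { ++s_nDestroyed; }

    // Parent and child hold strong references to each other: a cycle that
    // reference counting alone can never free.
    static void adopt(const std::shared_ptr<Widget>& pParent,
                      const std::shared_ptr<Widget>& pChild)
    {
        assert(pParent && pChild);
        pParent->m_aChildren.push_back(pChild);
        pChild->m_pParent = pParent;
    }

    // dispose() drops every reference this widget holds, breaking the cycle.
    // It is idempotent. The caller is expected to hold its own reference to
    // the widget for the duration of the call.
    void dispose()
    {
        if (m_bDisposed)
            return;
        m_bDisposed = true;
        std::vector<std::shared_ptr<Widget>> aChildren;
        aChildren.swap(m_aChildren);
        for (const auto& pChild : aChildren)
            pChild->dispose();
        m_pParent.reset();
    }

    bool isDisposed() const { return m_bDisposed; }

    // Stays safe after dispose(): it degrades to returning an empty string.
    std::string getText() const { return m_bDisposed ? std::string() : m_aName; }

    static int s_nDestroyed; // counts destructor calls, for demonstration

private:
    std::string m_aName;
    bool m_bDisposed;
    std::shared_ptr<Widget> m_pParent;
    std::vector<std::shared_ptr<Widget>> m_aChildren;
};

int Widget::s_nDestroyed = 0;
```

After adopt() links a dialog and a button, calling dispose() on the dialog releases both directions of the cycle, so both objects are destroyed as soon as the last outside reference goes away. Without the dispose() call the two would keep each other alive indefinitely.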

Modern rendering: RenderContext

A bold attempt to switch the code-base from immediate rendering to deferred rendering was initiated. LibreOffice previously rendered what is seen on the screen in one of two ways - either immediately: ie. when you press an 'A' it tries to nail the pixels for 'A' immediately to the screen; or via a very deferred (30+ms delay) idle rendering callback.

This situation is really non-ideal for modern rendering hardware and APIs - where we want to ensure the scene is fully and perfectly painted as a whole before showing it on-screen. Happily with the new idle handling work, there is no longer a hard-coded delay before deferred rendering can occur; so we started the task of removing immediate rendering, and replacing it with deferred rendering. This means replacing explicit rendering calls with area invalidation to queue this area for later re-rendering. In many cases this can remove any visible flickering and other intermediate rendering artifacts as the UI refreshes. Many thanks to Tomaž Vajngerl (Collabora), Miklos Vajna (Collabora), with help and fixing from Krisztian Pinter, Noel Grandin (Peralex), Jan Holesovsky (Collabora), Caolán McNamara (RedHat), Laszlo Nemeth (Collabora).
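The switch from explicit rendering calls to area invalidation can be sketched as follows. The Rect and DeferredCanvas types are invented for this illustration and are not VCL classes. A real implementation would track a proper region rather than a single bounding box, and would actually render in paintPending().

```cpp
#include <algorithm>
#include <cassert>

// Half-open rectangle: covers [nLeft, nRight) x [nTop, nBottom).
struct Rect
{
    int nLeft, nTop, nRight, nBottom;
    bool isEmpty() const { return nLeft >= nRight || nTop >= nBottom; }
};

// Smallest rectangle containing both arguments.
Rect boundingBox(const Rect& rA, const Rect& rB)
{
    return Rect{ std::min(rA.nLeft, rB.nLeft), std::min(rA.nTop, rB.nTop),
                 std::max(rA.nRight, rB.nRight), std::max(rA.nBottom, rB.nBottom) };
}

class DeferredCanvas
{
public:
    DeferredCanvas()
        : m_aDamage(Rect{ 0, 0, 0, 0 })
        , m_aLastPainted(Rect{ 0, 0, 0, 0 })
        , m_nPaintCount(0)
    {
    }

    // What used to be an immediate draw call now only records the damage.
    void invalidate(const Rect& rArea)
    {
        if (rArea.isEmpty())
            return;
        m_aDamage = m_aDamage.isEmpty() ? rArea : boundingBox(m_aDamage, rArea);
    }

    // Called from the main loop when idle: paints all accumulated damage in
    // one pass. Returns false if there was nothing to do.
    bool paintPending()
    {
        if (m_aDamage.isEmpty())
            return false;
        m_aLastPainted = m_aDamage; // a real backend would render here
        m_aDamage = Rect{ 0, 0, 0, 0 };
        ++m_nPaintCount;
        assert(m_aDamage.isEmpty());
        return true;
    }

    int paintCount() const { return m_nPaintCount; }
    const Rect& lastPainted() const { return m_aLastPainted; }

private:
    Rect m_aDamage;
    Rect m_aLastPainted;
    int m_nPaintCount;
};
```

Two overlapping invalidate() calls followed by one paintPending() produce a single paint covering both areas, which is why intermediate states never reach the screen.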

Gtk3 backend: Wayland

A very rough, initial gtk3 port was hacked together long ago by yours truly to prototype LibreOffice online via gdk-broadway. However it is thanks to Caolán McNamara (RedHat), who has done the 80% of the hard work to finish this, that we now have a polished and complete VCL backend for gtk3. His blog entry focuses on the importance of this for running LibreOffice natively under wayland - the previous gtk2 backend was heavily tied to raw X11 rendering, while the new gtk3 backend uses CPU rendering via the VCL headless backend, of which more below.

OpenGL rendering improvements

The OpenGL rendering backend also significantly matured in this version, allowing us to talk directly to the hardware to accelerate much of our rendering, with large numbers of bug fixes and improvements. Many thanks to Louis-Francis Ratté-Boulianne (Collabora), Markus Mohrhard, Luboš Luňák (Collabora), Tomaž Vajngerl (Collabora), Jan Holesovsky (Collabora), Tor Lillqvist (Collabora), Chris Sherlock and others. It is hoped that with the ongoing bug-fixing here, this can be enabled by default as a late feature, after suitable review, for LibreOffice 5.0.1 or at the outside 5.0.2.

LibreOfficeKit Improvements

LibreOfficeKit provides an easy way to re-use the rendering, file-format and now editing core from LibreOffice. In the last six months it has gone from being primarily useful for file format conversion, to being the foundation of LibreOffice on Android, and Online.

Headless rendering improvements

LibreOfficeKit re-uses our headless rendering backend, which allows us to render documents without underlying OS assistance, ie. without X11, Windows, OS/X etc. A number of performance and other rendering fixes were implemented here as part of the gtk3 and online work (headless rendering is also used on Android while our GL backend is maturing for that platform). Thanks to Caolán McNamara (RedHat) and Michael Meeks (Collabora).

Android editing extensions

Android editing builds on top of the LibreOfficeKit editing features, and provides the user with the Android equivalent of the gtktiledviewer feature list, like native cursor, text and graphic selection, resizing and more. With thanks to The Document Foundation & their generous donors, these significant API extensions and core work come from Miklos Vajna, Tor Lillqvist, Andrzej Hunt, Siqi Liu, Mihai Varga, Tomaž Vajngerl and Jan Holesovsky all of Collabora, as well as work from Pranav Kant (GSOC), and cleanups from Stephan Bergmann (RedHat).

LibreOffice Online bits

LibreOfficeKit (alongside an adapted leaflet) is the basis for the new work targeting LibreOffice at the Cloud; check out the code and a presentation. Huge amounts of tangled heavy lifting here were done thanks to: Tor Lillqvist, Mihai Varga, Jan Holesovsky, Henry Castro and Miklos Vajna, all of Collabora. With thanks to IceWarp for funding this work.

Conversion performance improvements

LibreOfficeKit provides a nice simple, clean API for loading and saving (ie. converting) documents. Thanks to Laszlo Nemeth (Collabora) and Mihai Varga (Collabora) we now have a new filter attribute: SkipImages to allow a significant acceleration for the use-case of converting any file type to HTML. This is really useful for re-using the wide range of LibreOffice filters to do document text indexing - giving a very significant speedup for large and complex documents. Another vital win here was to avoid doing an accurate word-count before export (for document statistics). Document conversion to text with this option should be significantly quicker for certain documents.

Build / platform improvements

Clawing back compilation time

With increasing template use in headers, compile times have taken a turn for the slower, so thanks to Michael Stahl (Red Hat) who created a nice bin/includebloat script to locate the largest and most problematic headers to be removed. As an example dropping boost/utility.hpp from several places removes ~830Mb of boost/preprocessor/seq/fold_left.hpp pre-processing.

Win64 porting action

The 5.0 release debuts a Win64 build - with many thanks to David Ostrovsky (CIB) with help from Thorsten Behrens (CIB), Norbert Thiebaud, Stephan Bergmann (RedHat) and others fixing and cleaning up a number of nasty platform-specific corner-cases across the suite. While we have had many 64bit platforms for years, the Windows LLP64 model can create issues.
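A short sketch of why LLP64 bites: on 64-bit Linux and Mac (LP64) a long is as wide as a pointer, but on 64-bit Windows (LLP64) a long stays at 32 bits while pointers grow to 64. Code that stores a pointer or a large size in a long therefore works everywhere except Win64. The function names below are made up for the example, and the 32-bit truncation is simulated with std::uint32_t so that the behaviour is the same on every platform.

```cpp
#include <cassert>
#include <cstdint>

// Portable: std::intptr_t is defined to be wide enough for a pointer.
static_assert(sizeof(std::intptr_t) >= sizeof(void*),
              "intptr_t must be able to hold a pointer");

// Round-trips a pointer through an integer without losing bits.
const void* roundTrip(const void* pOriginal)
{
    const std::intptr_t nStored = reinterpret_cast<std::intptr_t>(pOriginal);
    const void* pRestored = reinterpret_cast<const void*>(nStored);
    assert(pRestored == pOriginal);
    return pRestored;
}

// True on LP64 platforms, false on LLP64 (64-bit Windows).
bool longCanHoldPointer() { return sizeof(long) >= sizeof(void*); }

// Simulates what happens to a 64-bit value squeezed through a 32-bit
// 'unsigned long', as on LLP64: the upper 32 bits are silently dropped.
std::uint64_t throughThirtyTwoBits(std::uint64_t nValue)
{
    return static_cast<std::uint32_t>(nValue);
}
```

Passing 0x100000001 through throughThirtyTwoBits() yields 1, which is the kind of quiet corruption that only shows up with large values on the one platform where long is narrower than a pointer.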

Code quality work

Work is ongoing around code quality in many areas, with 120 or so cppcheck fixes thanks to Caolán McNamara (RedHat), Michael Weghorn, Julien Nabet, Noel Grandin (Peralex), and others, along with the daily commits to build without any compile warnings -Werror -Wall -Wextra etc. on many platforms with thanks primarily to Tor Lillqvist (Collabora) and Caolán McNamara (Red Hat) - this category of problems however is shrinking thanks to the increasing use of CI.

Coverity at ~zero

Having hit nearly zero coverity issues, Caolán McNamara (RedHat) (with some help from others) does an awesome job of keeping the count at (or nearly at) zero each week with ~360 commits this cycle. We routinely have a few new issues in each build and fix a few others, the total being currently two issues (of 6+ million lines analyzed). Hopefully keeping the numbers at zero is a reasonably achievable goal:

Graph of coverity static checking issues

PVS-Studio

The company OOO "Program Verification Systems" develops the PVS-Studio static analysis tool and made results of a one-time analysis run available to LibreOffice developers. Dozens of reported issues were fixed by Caolán McNamara (RedHat), Michael Stahl (RedHat), David Tardon (RedHat), and Markus Mohrhard. You can read more about that (with cartoon) here.

Import and export testing

Thanks to the new TDF donor funded crash-testing hardware, combined with a significant effort from Caolán McNamara (RedHat), Michael Stahl (RedHat), Markus Mohrhard and several others, we have got the number of (paranoid) assertions and/or crashes on import of our significant bugzilla document corpus (of 75k+ dodgy bug documents) down to effectively zero. It's wonderful to be able to catch commits that cause regressions here and nail them within days on master, before they have a chance to escape into the user-base.

Ongoing work here is to compile the crash-testing binaries with Address Sanitizer as well as starting to fuzz various document types and expanding the set of input file-types.

Clang plugins / checkers

We have continued to add to our clang compiler plugins; a quick git grep for 'Registration' in compilerplugins shows that we've gone from 38 to 59 in the last six months (double the growth of last release). These are used to check for all manner of nasty gotchas, and also to automatically re-write various problematic bits of code. Many are run automatically by tinderboxes to catch badness. Thanks to: Stephan Bergmann (Red Hat) and Noel Grandin (Peralex) for their hard work on these checkers this cycle.

The new plugins do all sorts of things, and usually come complete with a set of relevant fixes for the underlying code; here are some examples:

loplugin:loopvartoosmall - checks that the bit width of a loop index variable is at least the size of the thing it is indexing over. In the case of unsigned values, this can prevent infinite loops. In other cases, it simply avoids us truncating data.
loplugin:staticmethods - looks for methods that can be declared as static. This is both more efficient and makes the code more understandable, because it clearly indicates that the method does not depend on any object state.
loplugin:vclwidget - enforces the various rules surrounding the usage of our new VclPtr ref-counting smart pointer for VCL objects. Ref-counting classes can be tricky to use in corner cases - so having a checker that validates at compile-time as much of the otherwise implicit rules is very useful.
loplugin:constantfunction - looks for functions that should be removed/inlined, since they always return the same value. That is useful for finding old code that has become redundant due to refactoring.
simplifybool - this detects and de-tangles particularly tortured boolean logic expressions to simplify them. Some examples are converting a ? false : true to !a.
cstylecast and redundantcast - these detect and warn about C-style casts eg. class Foo; Foo *pFoo = (Foo *)pBaa; ie. when a type is incomplete. These should really be safer static_casts. Also we detected and removed un-necessary casts to make the code easier to understand.
de-virtualization - this detects virtual methods that are never over-ridden, to replace them with better performing non-virtual methods.
loplugin:deletedspecial - finds special member function declarations that are left undefined, which should actually be marked as = delete to entail further compiler optimizations and warnings.
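To give a flavour of what some of the checkers in the list above are after, here is a small hand-written sketch of the "after" state for four of them. None of this is LibreOffice code or plugin output. The function and class names are invented, and the problematic "before" forms appear only in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// loopvartoosmall: with a 16-bit index such as
//     for (std::uint16_t i = 0; i < rValues.size(); ++i)
// the index wraps from 65535 back to 0, so the loop never ends once the
// vector holds more than 65535 elements. The index below is wide enough.
std::size_t countNonZero(const std::vector<int>& rValues)
{
    std::size_t nCount = 0;
    for (std::size_t i = 0; i < rValues.size(); ++i)
        if (rValues[i] != 0)
            ++nCount;
    return nCount;
}

// simplifybool: 'bEmpty ? false : true' is just a long-winded '!bEmpty'.
bool hasContent(bool bEmpty) { return !bEmpty; }

class Handle
{
public:
    explicit Handle(int nId) : m_nId(nId) { assert(nId != invalidId()); }

    // deletedspecial: rather than declaring these and never defining them,
    // say explicitly that copying is not allowed.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    int id() const { return m_nId; }

    // staticmethods: this touches no object state, so it is declared static.
    static int invalidId() { return -1; }

private:
    int m_nId;
};

// With '= delete' a copy attempt is a clear compile-time error.
static_assert(!std::is_copy_constructible<Handle>::value,
              "Handle must not be copyable");
```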

Other sets of cleanups were also clang assisted, such as Noel's attack on cleaning up, making consistent and nicely scoping our enumerations; Stephan's drive to detect and remove implicit bool conversion, switching many inline methods from sal_Bool (really an unsigned char) to a true 'bool' wherever possible; and several other helpful plugins.
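One reason an unsigned char makes a poor boolean can be shown in a few lines. LegacyBool below is a stand-in typedef, not the real sal_Bool definition, and the two helper functions are hypothetical. The sketch assumes 8-bit chars, which the static_assert checks.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// Stand-in for a char-sized legacy boolean type.
typedef unsigned char LegacyBool;

// Storing a flags word in a char-sized "boolean" keeps only the low 8 bits,
// so a value such as 0x100 silently becomes 0, ie. "false".
bool legacyIsSet(int nFlags)
{
    assert(nFlags >= 0);
    LegacyBool bSet = static_cast<LegacyBool>(nFlags);
    return bSet != 0;
}

// Converting to a real bool maps every non-zero value to true.
bool realIsSet(int nFlags)
{
    bool bSet = nFlags != 0;
    return bSet;
}
```

For the input 0x100 the legacy version reports false while the bool version reports true. For small values such as 1 the two agree, which is exactly why this sort of bug can hide for a long time.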

Unit testing

We also built and executed more unit tests with LibreOffice 5.0 to avoid regressions as we change the code. Grepping for the relevant TEST and ASSERT macros we continue to grow the number of unit tests:

Graph of number of unit tests and assertions

Our ideal is that every bug that is fixed gets a unit test to stop it ever recurring. With around 800 commits, and over seventy committers to the unit tests in 5.0 it is hard to list everyone involved here, apologies for that; what follows is a sorted list of those with over ten commits to the relevant qa/ directories: Miklos Vajna (Collabora), Markus Mohrhard, Caolán McNamara (RedHat), Stephan Bergmann (RedHat), Noel Grandin (Peralex), Michael Meeks (Collabora), Michael Stahl (RedHat), Zolnai Tamás, Tor Lillqvist (Collabora), Bjoern Michaelsen (Canonical), Eike Rathke (RedHat), Takeshi Abe, Andras Timar (Collabora), PriyankaGaikwad (Synerzip).

Windows Testing

While we have had a subset of unit tests that we run at compile time on Windows, our larger battery of make check tests has been hindered by strange thread-affine behavior on Windows related to handling various Window and event resources. Thanks to various locking, and inter-thread messaging fixes from Michael Stahl (RedHat), and Stephan Bergmann (RedHat) we now have far more robust and reliable unit testing on Windows.

QA / bugzilla

One metric we watch in the ESC call is who is in the top ten in the freedesktop Weekly bug summary. Here is a list of the people who have appeared more than five times in the weekly list of top bug closers in order of frequency of appearance: Adolfo Jayme, Beluga, Caolán McNamara (RedHat), raal, Julien Nabet, Jean-Baptiste Faure, Markus Mohrhard, m.a.riosv, Gordo, V Stuart Foote, Eike Rathke (RedHat), Andras Timar (Collabora), Alex Thurgood, Yousuf (Jay) Philips, Miklos Vajna (Collabora), Joel Madero, Cor Nouws, Michael Stahl (RedHat), Michael Meeks (Collabora), Matthew Francis, David Tardon (Redhat), tommy27, Timur, Robinson Tryon (qubit) (TDF). And thanks to the many others that helped to close and triage so many bugs for this release.

Jenkins / CI

Thanks to Norbert Thiebaud - we now have some rather excellent Jenkins / CI integration with gerrit, to allow us to test-build all incoming patches across our three major platforms. Using CI to test patches before pushing them to master has become another valuable tool to increase the quality of master (and thus its accessibility to casual builders), and to allow those without access to Windows & Mac devices to check their code builds there. Thanks to ByteMark and TDF donors we hope to have even more, fast hardware to throw at the CI build farm soon making this an even more attractive route to test submitted code. With over 25,000 builds from 13 build slaves since the beginning of the year (which compares favourably with the around 11,000 commits), it is hoped that with enough hardware we can compile and run tests vs. all incoming commits in future without introducing excessive latency.

Also for the next development cycle we have enabled tests beyond those run during compile. We enable a slew of extra assertions in a dbgutil build and run make check at least on Linux to apply a much larger set of extra tests to each individual commit.

Expanded bibisect

In this cycle we expanded the great Bi(nary)Bisect(ion) repositories - which contain thousands of compressed pre-built binaries to allow end-users to quickly ascertain almost down to a single commit that introduced a regression long after the date - to include Mac and Windows builds for the 5.0 epoch (ie. the range from the 4.4 branch to 5.0 branching). The 5.1 epoch is being built and refreshed reasonably regularly. Many thanks to Norbert Thiebaud, Matthew Jay Francis & Robinson Tryon (qubit) (TDF).

Code cleanup

Code that is dirty should be cleaned up - so we did a lot of that left & right:

Upgrading to (a) C++11 subset

In the 5.0 release we started to move more aggressively to the subset of C++11 we can now use with our updated compiler baselines: features such as variadic templates, simpler initializations, and more. Work also involved removing deprecated std:: functions such as std::ptr_fun, using std::any_of & std::none_of, and adopting other newer constructs such as auto. Thanks goes to many hackers cleaning the code including Stephan Bergmann (RedHat), Takeshi Abe, Nathan Yee, Bjoern Michaelsen (Canonical) and others.
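For readers less familiar with these C++11 features, a small sketch of the kind of modernisation involved follows. The functions are invented examples, not code from the LibreOffice tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

bool isBlank(const std::string& rText) { return rText.empty(); }

// Older code tended to spell this as a std::find_if call with a
// std::ptr_fun adaptor and a comparison against end(); std::any_of says
// what is meant directly and takes the function as-is.
bool anyBlank(const std::vector<std::string>& rLines)
{
    return std::any_of(rLines.begin(), rLines.end(), isBlank);
}

bool noneBlank(const std::vector<std::string>& rLines)
{
    return std::none_of(rLines.begin(), rLines.end(), isBlank);
}

// Variadic template: accepts any number of arguments of any types.
template <typename... Args> std::size_t countArguments(Args&&...)
{
    return sizeof...(Args);
}

// 'auto' spares us spelling out the iterator type.
std::string longestLine(const std::vector<std::string>& rLines)
{
    assert(!rLines.empty());
    auto aLongest = rLines.begin();
    for (auto it = rLines.begin(); it != rLines.end(); ++it)
        if (it->size() > aLongest->size())
            aLongest = it;
    return *aLongest;
}
```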

Framework Cleanup

Thanks to Maxim Monastirsky we saved many hundreds of lines of duplicate code from the framework, by creating nice generic controllers that could be controlled via small, clean XML configuration descriptions - great to see such cleanups.

Expanding integer id types

A number of legacy structures in LibreOffice have used 16bit indices, and stored / serialized these to various structures for many years. This can cause problems with very large mail merges - such as those in-use at Munich City. Thanks to Katarina Behrens (CIB) - Writer in 5.0 allows more than 64k of: Page Descriptions, Sections and Style Names.
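The arithmetic behind the 64k limit is simple enough to show directly. This is a generic sketch of unsigned 16-bit wrap-around with invented helper names, not the Writer code that was changed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of distinct values a 16-bit index can represent.
const std::size_t kSixteenBitLimit = 65536;

// What a 16-bit field actually stores: the index modulo 65536.
std::uint16_t narrowIndex(std::size_t nIndex)
{
    return static_cast<std::uint16_t>(nIndex);
}

// True if the index survives being stored in a 16-bit field unchanged.
bool fitsSixteenBits(std::size_t nIndex)
{
    return static_cast<std::size_t>(narrowIndex(nIndex)) == nIndex;
}

// Narrowing that refuses (in debug builds) to corrupt an index silently.
std::uint16_t checkedNarrow(std::size_t nIndex)
{
    assert(fitsSixteenBits(nIndex));
    return static_cast<std::uint16_t>(nIndex);
}
```

Item number 70,000 of a large mail merge would be stored as 70000 - 65536 = 4464, so it would quietly alias an unrelated earlier entry rather than fail loudly.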

Ongoing German Comment redux

We continued to make progress, but somehow the last ~5000 lines of comment persistently appear to defy translation. Answers by E-mail postcard from German speakers much appreciated. Many thanks to: Michael Weghorn, Michael Jaumann (Munich), Daniel Sikeler (Munich), Albert Thuswaldner, Christian M. Heller, Philipp Weissenbacher. There are now only the following eight modules left to do: include, reportdesign, rsc, sc, sfx2, stoc, svx, sw

Graph of remaining lines of German comment to translate

std:: containers

A systematic set of improvements to our usage of the std:: containers has been going on through the code. Things like avoiding inheritance from std::vector, changing std::deque to std::vector and starting to use the newer C++ constructs for iteration like for (auto& it : aTheContainer) { ... }. There are many people to credit here, thanks to Stephan Bergmann (Red Hat), Takeshi Abe, Tor Lillqvist (Collabora), Caolán McNamara (Red Hat), Michaël Lefèvre, and many others.
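As an illustration of two of these cleanups, the sketch below wraps a std::vector instead of inheriting from it and iterates with a range-based for loop. StyleNames is a hypothetical class invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Rather than 'class StyleNames : public std::vector<std::string>', which
// exposes every vector method and invites deletion through a base pointer
// with no virtual destructor, hold the vector as a member and expose only
// what callers need.
class StyleNames
{
public:
    typedef std::vector<std::string>::const_iterator const_iterator;

    void add(std::string aName)
    {
        assert(!aName.empty());
        m_aNames.push_back(std::move(aName));
    }

    std::size_t size() const { return m_aNames.size(); }

    // begin()/end() are all a range-based for loop requires.
    const_iterator begin() const { return m_aNames.begin(); }
    const_iterator end() const { return m_aNames.end(); }

private:
    std::vector<std::string> m_aNames;
};

std::size_t longestName(const StyleNames& rNames)
{
    std::size_t nLongest = 0;
    for (const auto& rName : rNames)
        if (rName.size() > nLongest)
            nLongest = rName.size();
    return nLongest;
}
```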

Writer

Thanks to Bjoern Michaelsen (Canonical) we have had a few key, long desired writer cleanups in 5.0. These include:

Improvement and re-factoring of a number of core Writer UNO implementations around tables, reducing the code by some 20% and eliminating some code duplication. Unit tests have been added and it should now be easier to add further, including tests that check the writer core.
Clean-up some very old classes implementing the observer pattern in a clunky way (SwClient/SwModify), also adding a test harness to clarify its interface. Ultimately, the goal is to move away from this implementation towards one of the more modern implementations we use elsewhere. This work should help find a migration path later.
Consolidated multiple ad-hoc implementations of intrusive double-linked lists into one sw::Ring and adding tests to clarify its interface.
Use compiler plugins to hunt for both the deepest cascading conditional expressions and assignments happening during evaluation of conditionals, which is error-prone, and expand the worst offenders into something that is more readable and debuggable.
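For those who have not met an intrusive double-linked ring before, a minimal sketch follows. It is not the sw::Ring interface: RingNode and its method names are assumptions for illustration. The links live inside the element itself, so joining or leaving a ring needs no allocation, and a lone element is simply a ring of one.

```cpp
#include <cassert>
#include <cstddef>

class RingNode
{
public:
    // A fresh node forms a ring containing only itself.
    RingNode() : m_pNext(this), m_pPrev(this) {}
    ~RingNode() { unlink(); }

    RingNode(const RingNode&) = delete;
    RingNode& operator=(const RingNode&) = delete;

    // Leave the current ring and join rPosition's ring, directly before it.
    void moveTo(RingNode& rPosition)
    {
        unlink();
        m_pNext = &rPosition;
        m_pPrev = rPosition.m_pPrev;
        m_pPrev->m_pNext = this;
        rPosition.m_pPrev = this;
        assert(m_pNext->m_pPrev == this && m_pPrev->m_pNext == this);
    }

    // Remove this node from its ring; afterwards it is a ring of one.
    void unlink()
    {
        m_pPrev->m_pNext = m_pNext;
        m_pNext->m_pPrev = m_pPrev;
        m_pNext = this;
        m_pPrev = this;
    }

    bool isAlone() const { return m_pNext == this; }

    // Number of nodes in the ring this node belongs to.
    std::size_t ringSize() const
    {
        std::size_t nSize = 1;
        for (const RingNode* pNode = m_pNext; pNode != this; pNode = pNode->m_pNext)
            ++nSize;
        return nSize;
    }

private:
    RingNode* m_pNext;
    RingNode* m_pPrev;
};
```

Because the destructor unlinks the node, an element that goes out of scope removes itself from whatever ring it was in, leaving the remaining nodes consistent.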

writerfilter's resourcemodel

The resourcemodel building block of writerfilter (that handles Writer’s DOCX and RTF import in LibreOffice) was basically a bucket of old and unused stuff. The few still needed pieces from it are now moved into the relevant mapper/tokenizer/filter parts, and the rest is now removed. You can read more detail thanks to Miklos Vajna (Collabora).

Other wins

We had a number of other wins that are somewhat difficult to categorize, but well worth noting:

OOXML vs. MS Office 2007

MS Office 2007 has an unhelpfully different set of default values for many of its attributes - ie. the same XML (with an attribute omitted) can produce different results in Office 2007 and later versions. Clearly this is a little irritating. Thanks to Markus Mohrhard for adding some infrastructure (and a set of fixes) for known problematic attributes in this regard. This should improve our interoperability with the zoo of documents out there.

Android - file-system abstraction

Thanks to TDF's donors and Jacobo Aragunde Pérez (Igalia) we implemented an abstract file-system API for Android - to allow arbitrary file-system backends to be plugged in (in a separate thread). An example OwnCloud backend was implemented to show-case this.

Base bits

Thanks to Matthew Nicholls we removed a couple of thousand lines of redundant wrappers in svx's dbtoolsclient - which was duplicated elsewhere in connectivity. Great to see this much cruft leave the code-base.

Getting involved

I hope you get the idea that more developers continue to find a home at LibreOffice and work together to complete some rather significant work both under the hood, and also on the surface. If you want to get involved there are plenty of great people to meet and work alongside. As you can see individuals make a huge impact to the diversity of LibreOffice (the colour legends on the right should be read left to right, top to bottom, which maps to top down in the chart):

Graph showing individual code committers per month

And also in terms of diversity of code commits, we love to see the unaffiliated volunteers' contribution by volume, though clearly the volume and balance changes with the season, release cycle, and volunteers' vacation / business plans:

Graph of number of commits per month by affiliation

Naturally we maintain a list of small, bite-sized tasks which you can use to get involved at our Easy Hacks page, with simple build / setup instructions. It is extremely easy to build LibreOffice, each easy-hack should have code pointers and be a nicely self contained task that is easy to solve. In addition some of them are really nice-to-have features or performance improvements. Please do consider getting stuck in with something.

Another thing that really helps is running pre-release builds and reporting bugs: just grab and install a pre-release and you're ready to contribute alongside the rest of the development team.

Conclusion

LibreOffice 5.0 is a great new foundation for building the next series of releases which will incrementally improve not only features, but also the foundation of the Free Software office suite. It is of course not perfect yet, this is the first in a long series of monthly 5.0.x releases, and six monthly 5.x releases which will bring a stream of bug fixes and quality improvements over the next months and years.

I hope you enjoy LibreOffice 5.0.0, thanks for reading, don't forget to check out the user visible feature page and thank you for supporting LibreOffice.

Raw data for many of the above graphs is available.

My content in this blog and associated images / data under images/ and data/ directories are (usually) created by me and (unless obviously labelled otherwise) are licensed under the public domain, and/or if that doesn't float your boat a CC0 license. I encourage linking back (of course) to help people decide for themselves, in context, in the battle for ideas, and I love fixes / improvements / corrections by private mail.

In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy.