Stuff Michael Meeks is doing

This is my (in)activity log. You might like to visit Collabora Productivity, a subsidiary of Collabora focusing on LibreOffice support and services, for whom I work. Also, if you have the time to read this sort of stuff, you could enlighten yourself by going to Unraveling Wittgenstein's net or, if you are feeling objectionable, perhaps here. Failing that, there are all manner of interesting things to read on the LibreOffice Planet news feed.



LibreOffice's under-the-hood progress in 4.1.0 (beta)

Rather soon we will be releasing LibreOffice 4.1. We're currently in the Beta phase, and we appreciate people getting stuck in and helping with testing. You can download builds from pre-releases or, if you'd like some up-to-the-hour builds, from dev-builds.

( This post is also available in French )

We're still building our list of features and credits. We have a number of new visible features of course with credits against them. Cor has made a pair of beautiful blog entries highlighting UI improvement and the Photo Album features in 4.1. That made me think of the many developers who have been working extremely hard on things that are under the covers and not so easily seen, but are still really important. I'd like to explain just some highlights of that here, (crediting the developers' employer where there is one at the first mention). Often these are tasks that are easy to get involved with, and may seem trivial in isolation but cumulatively add up to a code-base that is far easier to understand and to contribute to.

Build system: configure / make

One of the tasks that has most irritated new developers, and distracted them from doing interesting feature work on the code-base over many years, has been our build system. At the start of LibreOffice there was an incomplete transition to GNU make, which required us to use both the horrible old dmake tool and gnumake. On top of that, configure used a Perl script to generate a shell script that set up a pile of environment variables, which had to be sourced into your shell in order to compile (making it impossible to re-configure from that shell). Finally, a Perl build script batched compilation with two layers of parallelism, forcing you to over- or under-commit on any modern builder. It looked something like this:

# old and awful
autoconf
./configure --enable-this-and-that
source LinuxIntelEnv.Set.sh
./bootstrap
cd instsetoo_native
build --all

Thanks to the sterling efforts of Björn Michaelsen (Canonical), David Tardon (Red Hat), Peter Foley, Norbert Thiebaud, Michael Stahl (Red Hat), Matúš Kukan, Tor Lillqvist (SUSE), Stephan Bergmann (Red Hat), Luboš Luňák (SUSE), Caolán McNamara (Red Hat), Mathias Bauer (Oracle), Jan Holesovsky (SUSE), Andras Timar (SUSE), David Ostrovsky, Hans-Joachim Lankenau (Oracle) and more (more details), the 126 thousand targets and 1700 makefiles are now fully converted to GNU make, so we have the significantly simpler:

# LibreOffice configure & make as of now:

./autogen.sh --enable-this-and-that
make

No shell pollution, no 'bootstrap' script, no Perl build wrapper, no obsolete 'dmake' required, just plain GNU make files—and incredible build parallelism—after generating headers, we could utilize a thousand CPUs. This is a clean-cut task with a clear boundary; like the process of removing dead code in previous releases, it is now complete—freeing up developers for more interesting things.

Graph of gnumake vs. dmake conversion by version

Build system: make dev-install

LibreOffice, in contrast to much other software, is fully relocatable: you can plonk it down where you like, and run it from there. As such we use make dev-install to create an install set in install/ that you can run in the build tree. This process has traditionally been performed by a Perl script using a convoluted set of pre-processed rules, to achieve what is (mostly) a copying operation. David Tardon has made some great progress moving this to use much simpler file-lists that we auto-generate. So nowadays we have an instdir/ top-level (on which these file-lists operate) that starts to mirror the install, the hope being to do away with the make install phase for running inside the build tree. So far we have more than 250 file lists, handling nearly 20k files.

This initiative makes it significantly easier to add or remove files from the install, and removes lots of zipping and un-zipping of sets of files that used to happen during the build. That makes packaging a build faster: the SDK packaging went from 90s to 30s or so, while also dropping lots of scp2/ rules. The hope is that, when this is complete, we will have an office suite that is runnable out of the box after a make, without an extra install phase.
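To give a feel for why a plain file-list is so much simpler than a rule-driven installer, here is a minimal sketch in Python. It is not the real build code: the one-path-per-line list format, the function names and the example file names are all assumptions made up for illustration.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def install_from_filelist(filelist: Path, src_root: Path, dest_root: Path) -> list[Path]:
    """Copy every file named in `filelist` from src_root into dest_root.

    Assumed (hypothetical) format: one relative path per line, blank lines ignored.
    Returns the destination paths that were written, in list order.
    """
    written = []
    for line in filelist.read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue
        target = dest_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_root / rel, target)
        written.append(target)
    return written


def demo() -> list[str]:
    """Build a tiny fake instdir/ and 'install' it from a file list."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        instdir = root / "instdir"
        (instdir / "program").mkdir(parents=True)
        (instdir / "program" / "soffice").write_text("#!/bin/sh\n")
        (instdir / "share").mkdir()
        (instdir / "share" / "readme.txt").write_text("hello\n")

        # Hypothetical file list; the blank line is skipped.
        filelist = root / "example.filelist"
        filelist.write_text("program/soffice\n\nshare/readme.txt\n")

        dest = root / "install"
        written = install_from_filelist(filelist, instdir, dest)
        return [p.relative_to(dest).as_posix() for p in written]
```

Adding or removing a file from the install then amounts to adding or removing one line from a list, which is the property the paragraph above is celebrating.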

Code cleanup / linting

A huge amount has been done here to make the code-base easier to understand. Doing this makes it easier and quicker for us to read the code, check it is correct, understand the flow—and so to add features or fixes.

sane includes
In the bad old days each module used to have an inc/<module> directory inside itself where its external include files were concealed. During the build of each module, these were copied to a separate artifacts directory (the 'solver') and the next module was compiled against those copies. This led to a number of problems: debuggers identifying the copies of headers, newbies editing the wrong (solver) headers, performance issues on Windows, and more. So thanks to Bjoern Michaelsen, Matúš Kukan and Michael Stahl for moving all the headers to a single top-level include/ directory and de-crufting the makefiles to make that nice.
tools cleanup
The tools/ module has a lot of duplicate functionality that is not needed. In this cycle we removed a complete duplicate file-system abstraction by writing it out of the code, thanks to Tomas Turek, Krisztian Pinter, Thomas Arnhold, Marcos Paulo de Souza & Andras Timar. It is always good for security to remove yet another duplicate, cross-platform, safe temporary file creation code-path.
String cleanups
We continued to make good progress on the removal of the obsolete UniString class, with a couple more method removals from Jean-Noël Rouvignac & Caolán McNamara. In addition Lubos Lunak did a mass removal of redundant rtl:: namespace prefixes all across the code for OUString and OString - making the code more readable, with a number of other significant performance, and cleanliness improvements. Large numbers of call sites were upgraded from UniString to OUString, had their redundant RTL_USTRING_CONSTASCII macro bloat removed, and used faster ways of concatenating strings—thanks to: Olivier Hallot, Christina Rossmanith, Stephan Bergmann, Chris Sherlock, Peter Foley, Marcos Paulo de Souza, José Guilherme Vanz, Jean-Noël Rouvignac, Markus Mohrhard, Ricardo Montania, Donizete Waterkemper, Sean Young, Thomas Arnhold, Rodolfo Ribeiro Gomes, Lionel Elie Mamane, Matteo Casalin, Janit Anjaria, Noel Grandin, Tomaž Vajngerl, Krisztian Pinter, Fridrich Strba (SUSE), Gergő Mocsi, Prashant, Ádám Csaba Király, Kohei Yoshida—and more I missed in the log (mail me).
component service registration
Noel Grandin continued his indomitable work to clean up the call-sites that create components so that they use new-style service constructors, with lots of other associated improvements: around two hundred and fifty new commits in 4.1.

Code quality work

Perhaps the least visible kind of improvement is crasher bugs that are not there anymore. Clearly the goal is never crashing, but how do we get there? Markus Mohrhard worked on a lovely set of automated tests to load over twenty-four thousand files of the most evil and twisted kind, i.e. the contents of all the bugzillas we could scrape. Thanks to some great work from Markus, Fridrich Strba (SUSE), Michael Stahl and Eike Rathke (Red Hat) in fixing the results, we hope users will enjoy fewer sightings of our ugly crash dialog.

Another source of significant improvement was the use of static checking tools to increase code quality, and hence reliability. In this release a team started systematically going through the Coverity data. This yielded nearly three hundred commits, thanks to: Markus Mohrhard, Julien Nabet, Norbert Thiebaud, Caolán McNamara, Marc-André Laverdière (TCS), and others. In addition Julien Nabet got over sixty fixes from the cppcheck tool included. Lastly, lint-wise, we continue to use Clang and Lubos' nice plugins to find and remove questionable code as it appears.

Another great tool that has improved here is bibisect, which gives us a git repository with binaries from every few dozen previous commits included inside it. This allows end-user testers to find very precisely where a given bug was introduced into the product, by bisecting lots of binary builds crammed into a single git repository. Thanks to Bjoern Michaelsen & Canonical's QA labs for more build hardware here.
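The reason bisection is so effective is that each test halves the remaining range. The sketch below shows just that idea in Python; it is not how bibisect itself is implemented (there git does the driving), and the function name and the integer "builds" are illustrative assumptions.

```python
def first_bad(builds, is_bad):
    """Return the index of the first build for which is_bad(build) is true.

    Assumes builds are in commit order, builds[0] is known good,
    builds[-1] is known bad, and the bug stays once it has appeared.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # bug already present: look earlier
        else:
            lo = mid  # still good: look later
    return hi


def demo():
    """Bisect 100 pretend builds where the bug first appears in build 37."""
    tested = []

    def is_bad(build):
        tested.append(build)  # stands in for a tester launching that binary
        return build >= 37

    return first_bad(list(range(100)), is_bad), len(tested)
```

With a hundred builds in the repository, a tester needs to launch only a handful of them rather than all hundred.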

We also built and executed more unit tests with LibreOffice 4.1 to avoid regressions as we change the code. These are rather hard to measure, since people like to pile up new tests inside existing unit test modules. By grepping for the CPPUNIT_TEST registration macro we can see that we added around a hundred such tests to 4.1; the majority of these were added to calc, with significant gains in writer, chart2, connectivity and impress. Thanks to Miklos Vajna (SUSE), Kohei Yoshida (SUSE), Noel Power (SUSE), Markus Mohrhard, Luboš Luňák, Stephan Bergmann, Michael Stahl, Noel Grandin, Eike Rathke, Julien Nabet, Caolán McNamara, Jan Holesovsky, Thomas Arnhold, Tor Lillqvist, David Ostrovsky, Pierre-Eric Pelloux-Prayer (Lanedo), Christina Rossmanith and others for working on the tests.
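For the curious, the counting can be approximated along these lines. This is a rough Python sketch, not the script actually used for the numbers above: the per-module grouping by top-level directory, the .cxx-only scan and the sample file names are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path

# Matches CPPUNIT_TEST(...) registrations, but not CPPUNIT_TEST_SUITE(...)
# or CPPUNIT_TEST_SUITE_END(), which merely bracket the list.
TEST_RE = re.compile(r"\bCPPUNIT_TEST\s*\(")


def count_tests(root: Path) -> dict:
    """Count CPPUNIT_TEST registrations per top-level directory under root."""
    counts = {}
    for path in sorted(root.rglob("*.cxx")):
        found = len(TEST_RE.findall(path.read_text(errors="replace")))
        if found:
            module = path.relative_to(root).parts[0]
            counts[module] = counts.get(module, 0) + found
    return counts


def demo() -> dict:
    """Run the counter over a tiny, made-up source tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sc" / "qa").mkdir(parents=True)
        (root / "sc" / "qa" / "example_calc.cxx").write_text(
            "CPPUNIT_TEST_SUITE(Test);\n"
            "CPPUNIT_TEST(testSheets);\n"
            "CPPUNIT_TEST(testFormula);\n"
            "CPPUNIT_TEST_SUITE_END();\n"
        )
        (root / "sw" / "qa").mkdir(parents=True)
        (root / "sw" / "qa" / "example_writer.cxx").write_text(
            "CPPUNIT_TEST_SUITE(SwTest);\n"
            "CPPUNIT_TEST(testReplace);\n"
            "CPPUNIT_TEST_SUITE_END();\n"
        )
        return count_tests(root)
```

Running the same count over two release branches and subtracting gives a figure for tests added, with the caveat already mentioned that a single registered test can hide a lot of new checks.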

Calc core refactoring

One of the reasons why Calc gained so many, badly needed, systematic unit tests for previously un-covered code, was the very significant re-factoring work going on in the core. For many years, calc was architected under the delusion that a spreadsheet is composed of cells - which created some serious scalability and performance problems. The end goal of this work is to kill ScBaseCell completely—and move to storage of spans of contiguous data of uniform type down a column. Some of the initial work for this is in place in 4.1, but the full benefit will have to wait at least until 4.2 or even later versions when we can make further adjustment to take full advantage of the new cell storage structure. The aim with 4.1 is to have no visible performance regression, perhaps some minor speedups and memory footprint reductions in some areas, but more importantly, better code maintainability thanks to the separation of cell broadcaster mechanism from the cell storage itself. Thanks to Kohei Yoshida for his great work here.
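To illustrate the difference between "a column is a pile of cell objects" and "a column is a few spans of uniform type", here is a toy model in Python. It is only a sketch of the idea and makes no claim about the real data structure in the Calc core; the Span class, the three type names and the helper functions are invented for this example.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Span:
    start: int  # first row covered by this span
    kind: str   # "empty", "number" or "string" (illustrative type names)
    values: list = field(default_factory=list)


def kind_of(value) -> str:
    if value is None:
        return "empty"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def build_spans(column) -> list:
    """Group contiguous values of the same kind into spans."""
    spans = []
    for row, value in enumerate(column):
        kind = kind_of(value)
        if spans and spans[-1].kind == kind:
            spans[-1].values.append(value)  # extend the current run
        else:
            spans.append(Span(row, kind, [value]))  # type changed: new span
    return spans


def lookup(spans, row):
    """Find the span containing `row` by binary search, then index into it."""
    starts = [span.start for span in spans]
    span = spans[bisect.bisect_right(starts, row) - 1]
    return span.values[row - span.start]


def demo():
    column = [1.0, 2.0, 3.0, None, None, "a", "b", 4.0]
    spans = build_spans(column)
    layout = [(s.start, s.kind, len(s.values)) for s in spans]
    round_trip = [lookup(spans, row) for row in range(len(column))]
    return layout, round_trip == column
```

Eight cells collapse into four spans here; a column holding a hundred thousand numbers would collapse into one, which is where the scalability win is expected to come from.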

German Comment Translation

It is always encouraging to build the metrics: in the last release cycle we lost approaching five thousand lines of German comment, all translated into English. That helps new developers get started on the code, understand it and get developing faster. The rough graph of this (which unfortunately includes a number of false positives for lines of German) looks like this:

Graph of remaining lines of German comment to translate

With many thanks to Urs Fässler, Christian M. Heller, Philipp Weissenbacher, Luc Castermans, David Verrier, Chris Sherlock, Joren De Cuyper, Thomas Arnhold, Philipp Riemer, and others. Help appreciated from German speakers with translating the last sixteen-thousand lines—it's a matter of checking the code out and running bin/find-german-comments on a module, translating a few lines and mailing a git diff to libreoffice At lists.freedesktop.org (no subscription required).

Completed Wizard conversion to python

Java remains an excellent, if not preferred environment for writing cross-platform extensions. All the existing Java support and APIs remain as before. Having said that—on some platforms Java is not available, and as such using our bundled, internal python runtime makes good sense for built in features.

In this release we completed porting the Java wizards, which can be used in the File->Wizards menu, to Python. This should give a better experience for Windows users who are not lucky enough to have a JRE installed. Many thanks to Xisco Fauli and Javier Fernandez (Igalia).

Linking & startup

One of the key features required to get the LibreOffice prototypes running on Android and iOS was to be able to link nearly all our code into a single shared library (Android) or executable (iOS). This work is re-used with an --enable-mergelibs configure option—which aggregates much of the common code of LibreOffice into a huge, single shared library: much as is done with Mozilla. This is increasingly the default choice for Linux distribution builds, and should yield improved seek and hence cold-start performance. Work remains to be done on code re-ordering, and PGO to further improve startup performance. Many thanks to Matúš Kukan (for the Raspberry Pi Foundation) and Tor Lillqvist for working on this.

Another startup performance feature, kindly funded by the Raspberry Pi Foundation, is to reduce the amount of configuration data pointlessly parsed during startup. One nice win in this area was removing fourteen thousand lines of data for printing sheets of labels from our configuration, and deferring that parsing until someone wants to print a label. Thanks to Matúš Kukan for that too.
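The pattern behind the label change is plain deferred loading: don't parse until somebody asks. A minimal Python sketch of that pattern follows. It says nothing about how the LibreOffice configuration code is actually structured; the LazySection class, the parse function and the label data are all hypothetical.

```python
class LazySection:
    """Hold a loader and only run it the first time the data is needed."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self.loaded = False

    def get(self):
        if not self.loaded:
            self._data = self._loader()  # the expensive parse happens here, once
            self.loaded = True
        return self._data


def demo():
    parse_calls = []

    def parse_labels():
        # Stand-in for parsing thousands of lines of label definitions.
        parse_calls.append("parsed")
        return {"Example label sheet": {"columns": 2, "rows": 7}}

    labels = LazySection(parse_labels)
    at_startup = len(parse_calls)  # nothing parsed yet: startup pays no cost
    labels.get()                   # user opens the label dialog
    labels.get()                   # and again: served from the cached result
    return at_startup, len(parse_calls)
```

Startup pays nothing, the first label print pays once, and every later use is free.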

New type format

The programming interfaces that are used in LibreOffice require type information to inform their work, particularly for scripting. In the past this was stored in some ancient, inefficient, legacy binary database. Thanks to Stephan Bergmann (Red Hat) we now have a new, more efficient and compressed binary format, with our main offapi.rdb shrinking ten-fold from 6.5MB to 0.65MB; there are more details in his Well Typed Uno talk at FOSDEM. So far this format is used only for private, internal type information, and we plan to remain fully backwards compatible for extensions that provide old-style type information. Documentation of the format is available in the source tree: nowadays we have increasingly detailed structural / overview documentation in each module's README file.

Miscellaneous

Other areas showed some great improvements:

Time
The resolution of the time-related datatypes in UNO (LibreOffice's API) has been increased to nanoseconds, from centi- and milliseconds. This is mainly useful in Base, where LibreOffice will no longer truncate times and timestamps to centiseconds, nor durations to milliseconds, in user data. Thanks to Lionel Elie Mamane.
Base
In a form, DatabaseListBox now exposes the selected value(s) (as opposed to the selected display strings) to the scripting interface. Thanks to Lionel Elie Mamane.
UI migration to Glade XML
The UI migration to Glade layout based XML files continued apace with contributions from many individuals, we managed to go from 64 .ui dialog descriptions in 4.0 to 230 in the 4.1 branch (so far): quite a jump towards completeness at five hundred dialogs—thanks to Caolán McNamara, Krisztian Pinter, Jack Leigh, Alia Almusaireae (KACST), Katarina 'Bubli' Behrens, Abdulaziz A Alayed (KACST), Jan Holesovsky, Faisal M. Al-Otaibi (KACST), Abdulmajeed Ahmed (KACST), Andras Timar, Manal Alhassoun (KACST), Bubli, Albert Thuswaldner, Olivier Hallot, Miklos Vajna, Abdulelah Alarifi (KACST), Gokul Swaminathan (KACST), Rene Engelhard, and others. It is also worth mentioning the great work done by translators to check & update strings here. The most significant benefit of the UI migration is finally making it extremely painless to tweak and improve the user interface.
Debugging output
There are new SAL_INFO and SAL_DEBUG macros which make it easy to add filtered, or temporary debugging output. Our git hooks warn if you leave any SAL_DEBUG statements around on commit too.
Gallery building
LibreOffice has been lumbered with a rather hideous format in which to store galleries. We generally ship the gallery images as standalone files, but have a set of binary resources containing thumbnails of these, and unique integers to refer to their translated names which ship in the libreoffice binary. In 4.1 we build most of these on each platform during compile, making them easy to extend (and avoiding having impenetrable binaries in git), and we translate the theme name with a new .desktop syntax file alongside. This also should make it easy for users to build their own galleries as extensions and ship them with translated names.
Intermediate SDF removal
While we removed SDF from our developer facing translation flow for 4.0—we still generated some SDF files as temporary build intermediates. Thanks to Tamás Zolnai for moving us to a pure .po solution.
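To put numbers on the Time item in the list above, this small Python sketch shows what truncating a nanosecond value to centiseconds or milliseconds throws away. It is plain arithmetic for illustration and does not use the UNO types themselves; the constant and function names are made up.

```python
NS_PER_CENTISECOND = 10_000_000
NS_PER_MILLISECOND = 1_000_000


def truncate(ns: int, unit_ns: int) -> int:
    """Drop everything finer than `unit_ns` from a nanosecond count."""
    return ns - ns % unit_ns


def demo():
    stamp_ns = 1_234_567_891  # 1.234567891 seconds
    return (
        truncate(stamp_ns, NS_PER_CENTISECOND),  # old time / timestamp resolution
        truncate(stamp_ns, NS_PER_MILLISECOND),  # old duration resolution
        stamp_ns,                                # kept in full at nanosecond resolution
    )
```

A database that stores microsecond timestamps would previously have come back with its last digits silently zeroed; at nanosecond resolution the value survives the round trip.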

Getting involved

I hope you get the idea that more developers continue to find a home at LibreOffice and work together to complete some rather significant work both under the hood, and also on the trim. I've enjoyed hacking on several of these improvements. Our hope is that as the on-ramp to the project gets less precipitous, people will join us, and find out how fun, and how much easier it is to improve the code these days. You'll also be in good company—first in terms of the number of code contributors to collaborate with:

Graph showing individual code committers per month

And also in terms of diversity of code commits: we love to see the unaffiliated volunteers' contribution by volume, though clearly the volume and balance changes with the season, release cycle, and time available for mentoring:

Graph of number of commits per month by affiliation

Of course, we maintain a list of small, bite-sized tasks which you can use to get involved at our Easy Hacks page, with simple build / setup instructions. We now have a cleaner, and safer environment to work on improving the code.

One of the easiest things to do is to help out with bug reporting and bug triage (confirming and quality checking other people's bug reports). You can be an effective triager with little experience, and good bug reports really help developers out: just grab and install a pre-release and you're ready to contribute alongside the rest of the development team. Even better, you could get involved with the fun QA Bug Triage Contest and win a prize.

Conclusion

LibreOffice 4.1 will be another milestone, and we hope a yet-higher watermark for code-quality, design improvement, and incrementally more solid foundations for improving the best office suite in the world. Of course, with so much changing, we really appreciate early testing of our betas and release candidates, which (we hope) should be useful for doing work with, though save regularly and generationally. If you haven't time to test our betas or release candidates, our time-based release plan predicts our final release date at the very end of July. Thank you for supporting LibreOffice.


My content in this blog and associated images / data under images/ and data/ directories are (usually) created by me and (unless obviously labelled otherwise) are licensed under the public domain, and/or if that doesn't float your boat a CC0 license. I encourage linking back (of course) to help people decide for themselves, in context, in the battle for ideas, and I love fixes / improvements / corrections by private mail.

In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences or fun.

Michael Meeks (michael.meeks@collabora.com)
