Stuff Michael Meeks is doing

This is my (in)activity log. You might like to visit my employer Novell which is an amazing company, and also Dell who in days of yore provided me with a free laptop for Gnome development / conferences. Also if you have the time to read this sort of stuff you could enlighten yourself by going to Unraveling Wittgenstein's net or if you are feeling objectionable perhaps here.

Older items: 2009: ( J F ), 2008: ( J F M A M J J A S O N D ), 2007: ( J F M A M J J A S O N D ), 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, legacy html


2008-10-31: Friday.

2008-10-30: Thursday.

Microsoft Office 14 for Web

It seems that MS are entering the hosted Office space with Office Live, Q&A, Video (hover over the downloads link). Here are some of my initial thoughts, at least crystallising my dislike of some of the 'cloud' computing hype.
Architecture
This appears to involve putting a full-blown instance of MS Office on the server, per user - this is (presumably) one of the reasons they are on a vast server-farm building spree: and who can blame them - running at least one full copy of Office on the server per user is going to be incredibly expensive.

This at least lets them expose the full feature set of MS Office, and makes it something I'd want to use - the downside being that things like the Excel 'no more limits' features mean a casual user can swallow gigabytes of server RAM fairly trivially:

Total amount of PC memory that Excel can use
Old Limit: 1GB
New Limit: Maximum allowed by Windows
I hope that works out well for whoever happened to get dropped into sharing the same server as the heavy spreadsheet user.

What about the web technologies ? - apparently this is not Silverlight only, though there seems to be an acknowledgement that if you want a version that performs well, you will need that, otherwise (reading between the lines) I suspect it's a lot of bitmaps, lag and server side rendering.

The exact technology mix doesn't seem to be public yet, but the MS Office code-base is mythically large and twisty. Having said that they seem to have a Model/View separation clean-up underway, unless the live collaboration is some grisly hack. I strongly suspect, where claims of perfect visual fidelity (6:05 into the video) are made, that this is a StarPortal model, or perhaps even more basic - with EMF+ or even an RDP-like protocol being used with a -very- thin client, ie. a model akin to Ulteo's embedded-Java/VNC style setup.

Of course, this type of architecture is really great for getting apps onto the web fast, and sharing that code-base with the fat client, but ultimately can never allow dis-connected operation. Then again, for large complex applications I've never believed the "re-write everything in JavaScript with Gears" mantra (after all, we've not yet finished the re-write of everything into Java), and it seems (to me) an expensively lame solution to the simpler deployment problems the web is supposed to solve.

Normal Web problems
This of course ties into the problem of payment - sure, in this world people can save money by buying some trivial piece of hardware, and running just Firefox on it; but sadly - unless money can be made simply by competing to give things away faster than Google, or by advertisements: someone has to pay for 10k new machines per month, and worse the electricity to run them. Is there some corresponding, functioning micro-payment / metering scheme to make a business model fly here ? and how does the transition to that work ?

Then of course, service level agreements tend to be an issue - particularly in the presence of the known pathological resource sharing problems. Of course, service wise - there needs to be some really good sand-boxing to isolate everyone from the next MS Office binary filter vulnerability, and no doubt there remain many of them to be discovered.

To overcome some of these problems, and the potential confidentiality / compliance issues, people can run this on their own Sharepoint server(s). Then, if you're not careful, this begins to look like some of the lamer desktop 'virtualisation' solutions that are essentially just an exercise in PC movement: all the PCs move into the data-centre, but you have just as many of them, perhaps at greater expense - at least they sit somewhere else. Still - that should make hardware manufacturers salivate I guess.

Of course, in the browser world, as I understand it, there are a host of evil problems with missing open standards and ergonomics around interacting with local devices, and exposing those back to the server: USB keys, printers, printer-quirks-and-settings, 3D acceleration etc. Perhaps many of these can be overcome with Silverlight, but no doubt eventually deploying and updating increasingly complex fat-client technologies starts to eat into the 'reduced deployment' rationale for all this work in the first place. It reminds me of our old XMS system (written in Java), that only worked (or was certified / supported) with Java 1.2.3.4-5.6, and only on Windows, and only in IE, such that everyone had to use Citrix to access it on a Windows cluster in Provo.

Win32 for the Web ?
If my guess is correct: that this is some very thin RDP-like layer to a fat client on a server - and indeed for eg. VBA macros, Excel analytics plugins etc. to work well this is basically necessary (since they can use any part of the Win32 API) - then life is interesting. It would suggest a possibility to extend the life of the Win32 APIs to become a 'web' application framework: if so, surely ~all Windows desktop apps can be "Webbed" in a similar way. If my guess is not correct, then at least some degree of embarrassing retraction wrt. the functionality available in the web version will be due soon, and/or some wholesale feature axing for Office 14.
Cross Platform
Does this mean Microsoft is finally shipping Office for GNU/Linux ? let's see how it performs there. I guess this is at least better than nothing for the last few percent scared of the GNU/Linux desktop and OpenOffice, and it'll be interesting to see if MS bothers sustaining their increasingly creaky Mac version: apparently the Web version on Mac will be more feature complete, albeit less 'beautiful'.

Summary
This is a smartish move by Microsoft. It will make, thanks to Miguel's prescience, MS Office available on the GNU/Linux desktop. However it will cost Microsoft a fortune in server hardware and electricity, and there are formidable problems around metering and managing the live service, particularly against a leaner, simpler free (beer) rival in Google. Of course, as soon as the half of computer users with laptops go dis-connected, or catch a flight (when can we expect high-bandwidth, low latency transatlantic internet services ? or even ubiquitous in-seat power ?) - they will have to use OpenOffice anyway, oh, and their portable hardware will need to be beefy enough to be capable of running that, so - why not author it there in the first place ?

This is a problem for Free software - traditionally it has been hard to fund even simple and lightweight shared services (eg. freenode) - never mind server computing on this scale. This is an architecture only giants can play with, as such there is much hope that it will come horribly and expensively un-stuck. It is yet more of a problem as RMS has pointed out because people have no control over their software - they surrender all their freedoms (if it is even implemented with Free software) to some all-knowing hosted provider.

A great Free-software response in my view is to work on adding collaboration features using existing protocols - I'm a fan of bootstrapping from existing IM services with Telepathy tubes - and sticking with the fat clients. That keeps your data local, and shares it only transiently with people you trust, and it also requires little-to-no server load, just bandwidth. We should also keep adding features that (so far) are not going to work well in the 2D fat-web space: eg. get some more sexy 3D transitions going, and better native Mac integration.

Finally - poor ideas don't die, they come back to haunt us; the idea that there would be a few large powerful computers with lots of terminals is a perennial, and goes in cycles. In the past it has foundered on the rocks of Moore's law - it turned out to be cheaper, easier and more reliable to put almost all the software, data, and processing on the local device. Enabling dis-connected operation will still mandate beefy thick clients. I'm optimistic that the trend will continue - eventually even for mobile devices; last time I looked, Google develops and deploys some fat clients.

I look forward to trying the tech. preview at the beginning of next year to corroborate my suspicions, or the final product in Office 14 in 2010 ?

2008-10-29: Wednesday.

2008-10-28: Tuesday.

2008-10-27: Monday.

2008-10-26: Sunday.

2008-10-25: Saturday.

2008-10-24: Friday.

2008-10-23: Thursday.

2008-10-22: Wednesday.

2008-10-21: Tuesday.

2008-10-20: Monday.

2008-10-19: Sunday.

2008-10-18: Saturday.

2008-10-17: Friday.

2008-10-16: Thursday.

2008-10-15: Wednesday.

Pitch Forkage

Bad, bad cutlery !

Apparently people are trying to stick 'forks' into each other (and in particular me) these days cf. Simon & Roy's noted enthusiasm for this. Apparently (particularly according to medieval, western, knife manufacturers everywhere) the fork is hell-spawn, not to mention economically damaging, particularly when bundled with rice. But, what is a fork ? Wikipedia's write-up is helpful but extremely broad.

Historical antecedents

Back in the day - I used Slackware Linux (cue nostalgia for antique hardware etc.) but now I read up on it, I seem to have been duped: Slackware seems to have been based on a fork of SLS! - I was cheated out of using the real thing. Worse than that, apparently SLS was forked again by that Ian Murdock guy into something called Debian. Sadly, since then, SLS appears to have gone the way of the "Save the Stegosaurus" campaign.

Of course - there are now hundreds of Linux Distributions, each with different choices built in, many with different packages. Of course, there is some nuance here - is Mandriva a fork of RedHat ? is Ubuntu a fork of Debian ? is Mepis a further fork of Ubuntu ? are these cycles of creative destruction: branching, tweaking, trail-blazing new directions, including new features, and/or failing and disappearing bad ? should we have a central planning committee wielding powerful legal tools to pro-actively stamp them out ? or are they a positive side-effect of open-ness: allowing bad ideas to be weeded out by survival-of-the-fittest ?

Waffling and yardsticks

What makes a fork ? Can a budding etymologist confirm the relation to the unix 'fork' system call ? If you use that then, in a miracle of copy-on-write-ness, post-fork anything you change is not shared with the parent process.
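That copy-on-write behaviour is easy to demonstrate - here is a minimal, purely illustrative Python sketch (POSIX only; the variable names are mine):

```python
# Illustrative sketch (POSIX only): after os.fork(), the child's
# writes land on copy-on-write pages, so the parent never sees them.
import os

state = ["set-before-fork"]

pid = os.fork()
if pid == 0:
    # Child process: mutate our copy; the parent's copy is untouched.
    state[0] = "changed-in-child"
    os._exit(0)

# Parent process: wait for the child, then observe our unchanged copy.
os.waitpid(pid, 0)
print(state[0])  # prints "set-before-fork"
```

The analogy with project forks is of course imperfect - post-fork, process memory diverges silently, whereas forked projects can (and should) merge changes back.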

Why are linux distributions not generally seen as forks ? I hypothesise that the answer is primarily that development is (for the most part) not independent - distros work on the same underlying code, and (hopefully) contribute their changes back to the common underlying code-pool. Of course, they tend to put different branding on top, have different configuration settings, and often include packages specific to their own distribution, some of them (sadly) not Free software.

What about branching ? is cvs tag -b gnome-2-22 a fork ? do projects go forking themselves regularly as wikipedia suggests ? again, usually people don't call this a fork - it's a branch, and code changes go into both sides of the branch. What then about adding components to a package ? is writing a new out-of-tree kernel module a fork ? what if you give it to some of your friends ? Does this apply more broadly ? is open-sourcing Solaris creating a fork of the GNU/Linux stack - by replacing a key component with a duplicate ? or is that just duplication ? or is Linux itself a fork/ duplication of PrimevalNix and thus intrinsically bad ? does licensing matter ? is duplication for licensing reasons acceptable ? how about purely for ownership ?

David A. Wheeler's forking appendix (to his interesting paper) postulates intent here; is the intent to become a competing or replacement project ? Well, certainly I'd like go-oo to become, as long as Sun pointlessly excludes our features, a replacement distribution (NB. not fork) of OO.o for the one Sun controls: simply because I'd like to get this injustice addressed; but of course, I'd far prefer to have a single up-stream, community-controlled source for OO.o.

Re-applying this to the world of OO.o, let's look first at StarOffice - was that a fork of OO.o ? in the past when it included proprietary closed-source pieces and custom tweaks, and clearly was intended to compete with OO.o, did Sun fork its 'own' project ? or was that a commercial re-distribution of OO.o with bits added ? How about ooo-build / go-oo ? that includes plenty of patches that for one reason or another (mostly the pain threshold) are not yet up-stream, the snapshot looking far worse than it is, since many patches have already gone up-stream over the years. Of course go-oo also includes some pieces (mostly well separated) that Sun simply refuses to accept up-stream because it wants to own them. Does that make it a fork ? if so, who is to blame for any negative effect ?

Conclusions

Some issues are great for endless charged debate - to quote Yes Minister, The problem is all my facts are statistics, and all your statistics are facts. I (of course) pass on useful and relevant information; it is other people that gossip. I take a lively interest in my surroundings; it's the other people that are nosy etc. People can call go-oo a fork certainly - it's a convenient way of putting it in a box marked 'evil' without mature consideration - but perhaps for consistency they need to call rather a lot of other things forks too, putting us in good company. Personally, I think forking is sometimes justified, but is go-oo really a fork ? you decide. Whatever your answer, it is totally facile to insinuate that, because we encourage people to work on go-oo, none of their work will get into Sun's OO.o - some of it will (we hope), and we'd love Sun to take the components too, under the copy-left license of their choice.

2008-10-14: Tuesday.

2008-10-13: Monday.

2008-10-12: Sunday.

2008-10-11: Saturday.

2008-10-10: Friday.

Measuring the true success of OpenOffice.org

What is success?

Is success measured in downloads, or up-loads? are bugs filed as good as bugs fixed? are volunteer marketers as valuable as volunteer developers? If we have lots of bugs filed and lots of volunteer management material is that success? is the pace of change important? Does successful QA exist to create process to slow and reject changes, or by accelerating inclusion of fixes improve quality? Is success having complete, up-to-date and detailed specifications for every feature? Is success getting everyone to slavishly obey laborious multi-step processes, before every commit? Alternatively does success come through attracting and empowering developers, who have such fun writing the code that they volunteer their life, allegiance and dreams to improve it?

I encourage people to download & use OpenOffice.org in one of its derivatives. I'm pleased when people file bugs, help with the QA burden, promote the project etc. However, in a Free Software project the primary production is developing and improving the software - ie. hacking. So the question is: how is OpenOffice.org doing in this area? Are we a success in attracting and retaining hackers? Is the project sufficiently fun to be involved in that lots of people actually want to be involved?

As we are finally on the brink of switching away from the creaking (22-year-old) CVS (provided by Collab.net), to an improved Sun-hosted Subversion (sadly not a DRCS) - Kohei and I created a set of scripts to crunch the raw RCS files as they go obsolete. They reveal an interesting picture.
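For a flavour of what that crunching involves - this is not the real parse_rcs.py, just a hypothetical sketch - the per-revision headers that `rlog` emits for an RCS file (`date: ...; author: ...; lines: +N -M`) can be aggregated into lines-added per author per month:

```python
# Hypothetical sketch (not the real scripts): aggregate lines added
# per (author, month) from rlog-style output, e.g. lines of the form:
#   date: 2006/12/15 10:03:22;  author: kohei;  state: Exp;  lines: +1200 -40
import re
from collections import defaultdict

REV_RE = re.compile(
    r"date: (\d{4})[/-](\d{2})[/-]\d{2}[^;]*;\s*"
    r"author: ([\w.]+);.*?lines: \+(\d+) -(\d+)")

def crunch(rlog_text):
    """Return {(author, 'YYYY-MM'): lines added or changed}."""
    totals = defaultdict(int)
    for year, month, author, added, removed in REV_RE.findall(rlog_text):
        totals[(author, "%s-%s" % (year, month))] += int(added)
    return dict(totals)
```

The real data then needs the clean-ups described below - filtering release-engineer accounts, dropping mass re-commit spikes and so on - which is where most of the effort goes.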

Caveats

As with any measurement task, we believe these numbers are fairly reasonable; and we try to make them meaningful. On the other hand perhaps there is some horrendous thinko in the analysis, bug reports appreciated. It'd also be nice to see if the internal Sun statistics match these.

Firstly - the data is dirty: since we're analysing RCS files, when files are moved to the binfilter, or even renamed, they have been simply re-committed - causing huge commit spikes. Similarly license changes, header guard removals and various other automated clean-ups, or check-ins of external projects, cause massive signal-swamping spikes. We have made some (incomplete) attempts to eliminate a few of these. In recentish times all work happens on a CVS branch, which is later merged by release engineers (who appear to have done ~50% of the commits themselves), so we filter their (invaluable) contribution out by account name (cf. rt's Ohloh score).

Secondly - another distorting factor is that we chart only lines added: in fact when you change a line it is flagged as an add and a remove; so the number is more correctly lines added or changed. This of course fails to capture some of the best hacking that is done: removing bloat, which should be a priority. In the Linux kernel case this metric also gives extra credit to bad citizens that dump large drivers packed with duplicated functionality, and worse it rewards cut & paste coding. I don't often agree with Bill Gates but:

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.
Still, at least the 'lines changed' facet should be helpful.

Thirdly - release cycles cause changes in contribution patterns, clearly frantic activity during feature development lapses into more bug-fixing later in the cycle. Thus we expect to see some sort of saw-shape effect.

Fourthly - working on OO.o is infernally difficult: getting code up-stream is extremely and unnecessarily painful. This results in many contributors leaving their code in patches attached to bugs in the issue tracker, and we make no account for these; such changes (if they are committed at all) would appear to be Sun commits. Thus it is possible that there is at least somewhat wider contribution than shown. Clearly we would hope that full-time contributors would tend to commit directly to CVS themselves.

Magnitude of contributions

This graph is more meaningless than it might first appear: the raw data still shows noise, like individuals committing obvious sillies - eg. copying chunks of OO.o to the binfilter. To some extent it is further distorted by us trying to clean this up for the past couple of years before giving up:

So the data is not that useful. Is it more useful to look at an individual to see if they are contributing something? If we threshold the data we can at least approximate an activity metric / boolean. The graph below shows two developers - the Sun developer Niklas Nebel, and the Novell hacker Kohei Yoshida. Both work primarily on calc, and you can see the large bar when Kohei committed his solver to a branch at the end of 2006.

It seems clear that we can at least approximate activity with some thresholding. More interesting than this though, we can see a most curious thing. Despite Calc (apparently) being the relative weakness of OO.o, and Niklas being the maintainer of the calc core engine, and the calc "Project Lead" (with special voting privileges for the 'community' council), in fact he hasn't committed any real amount of code recently. That jumps out in the comparison with (vote-less) Kohei in the last six months. It is very sad indeed to all but lose Niklas from the project, though at least we'll see him at OOoCon. Verifying this counter-intuitive result with bonsai reveals the same picture.

Activity graphs

Extending this metric to the entire project we see perhaps a more interesting picture. By thresholding contributions at one hundred lines of code added/changed per month, we can get a picture of the number of individuals committing code to OO.o. Why one hundred? why not? it's at least a sane floor. Clearly we get a metric that is very easy to game, but luckily that's hard to do retrospectively.
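The thresholding itself is trivial - sketched below assuming the per-author monthly totals described above (the names and data shape here are mine, not the real scripts'):

```python
# Sketch of the activity metric described above (assumed data shape):
# an author is 'active' in a month if they added/changed at least
# 100 lines that month; count active contributors per month.
from collections import defaultdict

ACTIVE_FLOOR = 100  # lines added/changed per month

def active_contributors(monthly_lines):
    """monthly_lines: {(author, 'YYYY-MM'): lines added or changed}.
    Returns {'YYYY-MM': number of contributors over the floor}."""
    active = defaultdict(int)
    for (author, month), lines in monthly_lines.items():
        if lines >= ACTIVE_FLOOR:
            active[month] += 1
    return dict(active)
```

As noted, this is easy to game going forward (a hundred lines of churn is cheap) - but hard to game retrospectively, which is what makes it usable on historical data.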

It is clear that the number of active contributors Sun brings to the project is continuing to shrink - which would be fine if this were being made up for by a matched increase in external contributors; sadly that seems not to be so. Splitting out just the external contributors we see some increase, but not enough:

Novell's up-stream contribution appears small in comparison with the fifteen engineers we have working on OO.o. This has perhaps two explanations: of course we continue to work on features that are apparently not welcome in Sun's build cf. the rejection of Kohei's solver late in 2007, and much of the rest of our work happens in ooo-build, personal git repositories, and is subsequently filed as patches in IZ.

A comparison

So, it should be clear that OO.o is a profoundly sick project, and worse, one that doesn't appear to be improving with age. But what does a real, living project look like? By patching Jonathan Corbet's gitdm I generated some similar activity statistics for the Linux kernel, another project of equivalent code size, and arguably complexity:


Graph showing number and affiliation of active kernel developers (contributing more than 100 lines per month).
Quick affiliation key, from bottom up: Unknown, No-Affiliation, IBM, RedHat, Novell, Intel ...

There are a number of points of comparison with the data pilot of active developers aggregated by affiliation for OO.o.

Similarities: both graphs show the release cycle - spikes of activity at the start, reducing towards release. Linux' cycle is a loose 3 months, vs. OO.o's 6 months.

Differences: most obviously, magnitude and trend: OO.o peaked at around 70 active developers in late 2004 and is trending downwards; the Linux kernel is nearer 300 active developers and trending upwards. Time range - this is drastically reduced for the Linux kernel, down to the sheer volume of changes: eighteen months of Linux' changes bust calc's row limit, where OO.o hit only 15k rows thus far. Diversity: the Linux graph omits an in-chart legend; this is a result of the 300+ organisations that actively contribute to Linux. Interestingly, a good third of contribution to Linux comes from external (or un-affiliated) developers, but the rest comes from corporates. What is stopping corporations investing similarly in OO.o?

Conclusions

Crude as they are - the statistics show a picture of slow disengagement by Sun, combined with a spectacular lack of growth in the developer community. In a healthy project we would expect to see a large number of volunteer developers involved; in addition we would expect to see a large number of peer companies contributing to the common code pool; we do not see this in OpenOffice.org. Indeed, quite the opposite: we appear to have the lowest number of active developers on OO.o since records began - 24 - which contrasts negatively with Linux's recent low of 160+. Even spun in the most positive way, OO.o is at best stagnating from a development perspective.

Does this matter? Of course, hugely ! Everyone that wants Free software to succeed on the desktop, needs to care about the true success of OpenOffice.org: it is a key piece here. Leaving the project to a single vendor to resource & carry will never bring us the gorgeous office suite that we need.

What can be done? I would argue that in order to kick-start the project, there is broadly a two step remedy:

Unfortunately, the chances of either of these points being addressed in full seem fairly remote - though, perhaps there will continue to be some grudging movement in these directions.

A half-hearted open-source strategy (or execution) that is not truly 'Open' runs a real risk of capturing the perceived business negatives of Free software: that people can copy your product for free, without capturing many of the advantages: that people help you develop it, and in doing so build a fantastic support and services market you can dominate. It's certainly possible to cruise along talking about all the marketing advantages of end-user communities, but in the end-game, without a focus on developers, and making OO.o truly fair and fun to contribute to - any amount of spin will not end up selling a dying horse.

Postscript

Why is my bug not fixed? why is the UI still so unpleasant? why is performance still poor? why does it consume more memory than necessary? why is it getting slower to start? why? why? - the answer lies with developers: Will you help us make OpenOffice.org better? if so, probably the best place to get started is by playing with go-oo.org and getting in touch, please mail us.

Finally - we invite you to repeat the analysis; the raw spreadsheet data (for data-miners) is here: ooo-stats.ods linux-stats.ods and the RCS parsing scripts parse_rcs.py with their dependencies in that same directory.

2008-10-09: Thursday.

2008-10-08: Wednesday.

2008-10-07: Tuesday.

2008-10-06: Monday.

2008-10-05: Sunday.

2008-10-04: Saturday.

2008-10-03: Friday.

2008-10-02: Thursday.

2008-10-01: Wednesday.


In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Novell, The Lithuanian Gov't or Arnold Schwarzenegger. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences, bio. or fun.
Michael Meeks (michael.meeks@novell.com)
Made with PyBlosxom