Analyzing and Improving GNOME Startup Time

Status: DRAFT
Last updated: 2005-10-11 15:09 CEST

Introduction

The time it takes to log in to GNOME has not improved much in recent years and does not compare favourably with other systems, especially closed-source systems. This work attempts to determine what causes GNOME startup to be slow and what can be done to address the problem.

Initial investigation

Startup is I/O bound

An initial investigation shows that GNOME startup time is heavily I/O bound: when running gnome-session from the shell on a fake X display, the time until the disk stops spinning is >30 seconds (bootchart) on the first run and ~5 seconds (bootchart) on subsequent runs, when most of the required data is already in the page cache. Improving startup time is therefore primarily a matter of reducing the amount, and the cost, of disk I/O.

Sequential I/O and random I/O

By far the most expensive I/O operation is a disk head seek. At the time of this writing, fast 7200 RPM desktop hard disks have a transfer rate of about 30-70 MB/s, but a random access time of about 10-15 ms, which equates to 60-100 reads/s. Therefore, while sequential reads might obtain close to the maximum transfer rate, a worst-case random read on a 4k-block filesystem would read at most 100 4k blocks per second, or 400kB/s, which is two orders of magnitude slower. Laptop hard disks, which usually run at 5400 RPM or even 4200 RPM, are even slower.

Methodology

Metrics

For the purposes of this work, startup time is defined as the time from when GNOME (or GDM) is launched to when the disk stops spinning.

Test setup

All tests were conducted on a Dell Latitude D400 with 512 MB RAM and a Toshiba MK3021GAS 30GB, 4200 RPM hard disk. The system was running Debian Unstable GNU/Linux with a custom 2.6.12.5 kernel and X.org 6.8.2. GNOME was checked out of CVS on about 2005-08-20 (soon before the 2.12.0 release) and built with jhbuild. To avoid disturbing the distribution-provided GNOME installation, the test GNOME version was installed under /usr/local/gnome.

Tools used

Bootchart

Example bootchart
Bootchart is a program developed by Ziga Mahkovec to monitor the Linux boot process. It consists of (i) a shell script daemon that monitors various system and process parameters at regular intervals and collects the output in a series of log files, and (ii) a Java program that analyzes and processes the log files and produces a chart that shows the use of system resources by various processes over time.

In order to monitor GNOME startup a few modifications had to be made to bootchart:

  1. Non-root user support
    Bootchart normally stores its logs under /var/log and its configuration files under /etc, neither of which is writable by a non-root user. It was modified to allow configurable log file and configuration file paths.
  2. Better single-process support
    Bootchart uses a series of heuristics to simplify the chart, merging multiple processes into one and eliminating processes which are not "interesting". This does not work well for GNOME startup. Bootchart does have a "single process tree" mode, in which it attempts to monitor only the processes spawned by a single command, but it cannot track processes spawned by a process that has lost its parent, which is the case for virtually all of the programs launched during GNOME startup.
  3. Milestone support
    Bootchart lacks a mechanism for marking on the graph the time at which a particular event occurred. However, this is very useful for monitoring specific applications and for finding out which phases in application startup take the most time. Therefore, support for "milestones" was added: the application monitored may write strings of the form:
    xxx.yy <string>
    to a file named milestones.log in the bootchart temporary directory, where xxx.yy is the system uptime taken from /proc/uptime and <string> is a text string. Bootchart then displays each string on the chart at the corresponding time, so marking GNOME startup phases is simply a matter of modifying the GNOME binaries to write a status line to milestones.log whenever a significant point in startup is reached (a sketch follows below).
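
For example, a shell fragment along the following lines could be used to log a milestone (a minimal sketch; the variable name and milestone text are hypothetical, and the real patches may work differently):

# Assumes $BOOTCHART_TMPDIR points at the bootchart temporary directory.
log_milestone() {
    uptime=$(cut -d ' ' -f 1 /proc/uptime)
    echo "$uptime $1" >> "$BOOTCHART_TMPDIR/milestones.log"
}
log_milestone "gnome-session: launching window manager"
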
The patches to bootchart are available here (warning: currently very raw).

Sysprof

Example Sysprof output
Sysprof is a Linux system profiler that uses a kernel module to profile the entire system as it runs. Sysprof was designed to profile CPU rather than I/O usage, but the author has also developed a kernel patch to profile block reads on ext3 filesystems (the patch requires the CVS version of sysprof). This has the advantage of monitoring all I/O that actually reaches the disk, rather than simply logging I/O requests or read() calls. Sysprof is a GUI application, so to monitor the GNOME startup process a console application, sysprof-text, was written (patch to sysprof CVS). sysprof-text non-interactively collects profile data to a file that can then be opened and analyzed in the Sysprof GUI.

Iolog

Iolog is a small kernel module, developed for this project, that uses the sysprof ext3 patch to log all ext3 block reads to the kernel log in the form:
READ: 1125508642.466082 (metacity/3773) 116 /usr/local/gnome/bin/metacity
where 1125508642.466082 is a timestamp, metacity/3773 is the name and PID of the process performing the read, 116 is the offset into the file in 4k blocks, and /usr/local/gnome/bin/metacity is the file being read. Simple scripts were then written to parse the logs. In particular, analyzereads.py is a Python script that parses the I/O log and coalesces contiguous read operations, producing output such as the following:
(gdm/2982): /usr/local/gnome/sbin/gdm-binary 0-7
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 0-7
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 687-718
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 653-684
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 34-65
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 8-33
(gdm-binary/2982): /usr/local/gnome/lib/libgtk-x11-2.0.so.0 515-546
which shows gdm reading the first 32k of /usr/local/gnome/sbin/gdm-binary and then gdm-binary itself performing various non-contiguous reads on /usr/local/gnome/lib/libgtk-x11-2.0.so.0. This makes it possible to distinguish contiguous I/O operations, which do not have a large impact on performance (unless the filesystem is fragmented), from non-contiguous I/O operations, which involve disk seeks and thus have a more significant impact on performance.

The scripts topseeks-byfile.sh and topseeks-byprocess.sh wrap analyzereads.py to produce lists of the files and of the processes that are responsible for the most non-contiguous reads. Using these scripts we can easily see, for example, that the process causing the most seeks is nautilus and the file that is read in the largest number of non-contiguous reads is the GTK library:

$ ./topseeks-byprocess.sh io.log.gnomeonly 5
    122 nautilus
    109 gnome-session.r
     74 bonobo-activati
     49 gnome-panel
     46 gnome-settings-

$ ./topseeks-byfile.sh io.log.gnomeonly 5

     26 /usr/local/gnome/lib/libgtk-x11-2.0.so.0
     10 /usr/local/gnome/lib/libxml2.so.2
      8 /usr/lib/libstdc++.so.6.0.5
      8 /usr/lib/libstdc++.so.5.0.7
      7 /usr/local/gnome/lib/libpoppler.so.0
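
The wrappers themselves can be quite simple. A sketch of what topseeks-byfile.sh might look like, inferred from the analyzereads.py output format shown above (this is an assumption, not the actual script):

#!/bin/sh
# Count coalesced (hence mutually non-contiguous) reads per file in the
# analyzereads.py output and print the top N offenders.
# Usage: topseeks-byfile.sh <iolog> <N>
./analyzereads.py "$1" | awk '{print $2}' | sort | uniq -c | sort -rn | head -n "$2"
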

Strace

Strace is a program that traces the system calls made by another process. It can be very useful for determining which files a process is opening, whether a process is blocking on a socket, whether it is timing out on a DNS lookup, and so on.

Strace is very simple to use: for example, the command:
strace -ttt -e trace=open cat file
will print all the open() system calls executed by the command cat file, with timestamps, to standard error. The output is similar to the following:

1126702167.515387 open("/etc/ld.so.cache", O_RDONLY) = 3
1126702167.516001 open("/lib/tls/libc.so.6", O_RDONLY) = 3
1126702167.516963 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
1126702167.563043 open("file", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[...]

The -p pid option requests strace to attach to the existing process having PID pid, while the -f option follows forks and outputs the system calls executed by child processes.

Benchmarking methods

Benchmarking gnome-session on a parallel X server

Initially, benchmarks were run by starting the test version of GNOME on a virtual X display (Xvfb) while the computer was already running another X server and the distribution-provided version of GNOME. Since the distribution version and the test version are in different directories and share no files, this approximates the behaviour of a cold boot, unless there is data in the cache from a previous benchmark run. The caches were empirically flushed by running commands such as:
cat largefile > /dev/null
between runs. This approximation allowed the most obvious bottlenecks to be identified and removed without having to reboot the system every time. However, the results obtained using this method are too noisy to allow the measurement of small differences in startup time; furthermore, they do not reflect the effect of applications that may already be in cache during normal startup but not during benchmarking (e.g. xrdb, which is started both by the X server startup scripts and by GNOME, and which incurs no measurable performance penalty when run by GNOME if GNOME is started immediately after X).

Benchmarking GDM and GNOME using reboots

The parallel X server method of benchmarking is too noisy to spot small differences in startup time and is obviously less realistic than benchmarking from a clean boot. Therefore, later benchmarks were performed by rebooting into single user mode and running a custom system startup script that loaded gdm, which was configured to autologin into GNOME.

Even when rebooting between runs, however, consecutive benchmarks showed that startup time was slightly higher the first time a given configuration was benchmarked than on successive trials, perhaps due to the hard disk's internal cache. Therefore, to obtain consistent results, every measurement was run twice and the first result discarded.

Detailed Analysis

I/O analysis

Loading dynamic libraries

Analysis of the I/O logs shows that most of the I/O during GNOME startup is caused by loading dynamic libraries, which are not read sequentially because they are loaded with mmap() and page faulted into memory as needed (see the example I/O log above). Indeed, once most other sources of non-contiguous I/O, such as reading gconf defaults, are removed, libraries account for about 64% of non-contiguous reads:
$ ./analyzereads.py io.log.gnomeonly | grep /lib/ | wc -l
302
$ ./analyzereads.py io.log.gnomeonly | grep -v /lib/ | wc -l
172
This is not something which is easy to address within GNOME itself. Efforts to consolidate libraries, such as Project Ridley, which plans to merge a number of smaller libraries into GTK+, would not help: since large libraries are already being read sparsely using mmap(), consolidating many small libraries into one large file does not significantly reduce the number of disk seeks. Preloading the libraries with readahead does help, since in that case the library files are read contiguously. Although this increases memory usage, it is probably a worthwhile tradeoff, since at login time the system uses only a small fraction of the RAM that is in use once applications are loaded.

Possible improvements and their effect on startup time

This section discusses the proposed changes and the benefit to startup time that each brings.

Merging gconf defaults and incremental processing

From a bootchart of an unmodified installation it is clear that one of the top I/O consumers is gconfd-2, which performs almost constant I/O at many points during startup. Running:
strace gconfd-2
showed gconf opening hundreds of files under /usr/local/gnome/etc/gconf/gconf.xml.defaults. This is because the gconf database is made up of hundreds of files (one per key) in directory trees that mirror the gconf trees. However, gconf already contains code to read whole configuration directory hierarchies from a flat file and a tool, gconf-merge-tree, to merge a configuration directory hierarchy into such a file. Merging the files in /usr/local/gnome/etc/gconf/gconf.xml.defaults into one file with:
gconf-merge-tree /usr/local/gnome/etc/gconf/gconf.xml.defaults
yields about a 7 second reduction in startup time.

However, further improvement is possible: as shown in the zoomed-in bootcharts below, when the files are merged, gconfd-2 first performs I/O for about 3 seconds and then uses 100% CPU time for about 4 seconds. This suggests that the flat file is being read into memory in its entirety and only processed afterwards. Therefore, a patch was written to process the file incrementally while reading it. With incremental reading, gconfd-2 performs I/O and uses the CPU simultaneously, and completes ~8 seconds instead of ~10 seconds after startup, yielding a further 2 second reduction in startup time:

Non-Incremental Incremental

The combined effect of the two changes yields a major reduction in startup time (9 seconds).

Stock Merged, non-incremental Merged, incremental

A patch was written to merge the configuration files automatically, but it was backed out because it would cause problems for users who also logged into pre-2.6 versions of GNOME, which did not support merged files. In view of the large benefit that the patch brings, and the low likelihood of users logging into both a pre-2.6 GNOME and GNOME 2.14, it appears to be time to apply that patch, possibly in conjunction with the incremental reading patch to obtain the full benefit to startup time.

Separating translated gconf descriptions from main locale

The gconf database contains the description of configuration keys in many languages, only one of which is likely to be in use by a given user. To evaluate the impact of translations, all non-English descriptions were removed from the flat file using a simple XSLT stylesheet (a sketch follows the chart below). This yielded a further two-second reduction in startup time:

With translations Without translations
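
A stylesheet along the following lines would do the job (a hypothetical sketch: the element and attribute names are assumptions about the gconf XML format, and the merged file name may differ):

cat > /tmp/strip-translations.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: copy everything through unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- ...except descriptions for locales other than "C", which are dropped -->
  <xsl:template match="local_schema[@locale != 'C']"/>
</xsl:stylesheet>
EOF
xsltproc /tmp/strip-translations.xsl \
    /usr/local/gnome/etc/gconf/gconf.xml.defaults/%gconf-tree.xml > /tmp/stripped.xml
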

Lazy-loading the screensaver

The screensaver is loaded by gnome-settings-daemon very soon after it starts. However, the screensaver is not needed immediately after login, and loading it competes for resources with more important startup tasks; it should therefore be loaded only after those tasks have completed. To evaluate the effect of lazy-loading the screensaver, it was wrapped in a simple shell script that loaded it after a fixed 25-second delay (see the sketch below). This yielded a 1 second reduction in startup time:

Loading xscreensaver on startup Loading xscreensaver lazily
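
The wrapper used for the experiment was along these lines (a sketch; the actual script may differ, and it assumes the real binary has been renamed xscreensaver.real under the test installation prefix):

#!/bin/sh
# Hypothetical wrapper invoked in place of xscreensaver: delay the real
# screensaver so that its I/O does not compete with login-time I/O.
( sleep 25 && exec /usr/local/gnome/bin/xscreensaver.real "$@" ) &
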

Note that loading xscreensaver causes a non-trivial amount of I/O, so care must be taken to load it when the system is otherwise idle:

I/O caused by screensaver startup

Using a smaller cpp implementation for xrdb

xrdb is called first by the gdm startup scripts in order to load .Xdefaults and .Xresources files, and then by gnome-settings-daemon in order to match the colors of non-GTK applications to the GTK theme. Since X resource files may contain C-style preprocessor statements, xrdb calls the system C preprocessor to process them. However, since the GNU preprocessor is integrated in the C compiler, this requires loading the whole compiler (about 4 MB). Patching xrdb to use a smaller C preprocessor implementation such as mcpp results in a ~1.5 second reduction in startup time.

xrdb calling cpp xrdb calling mcpp

A patch to xrdb to use mcpp by default has been attached to freedesktop.org bug 4325.
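
Note that the effect can also be reproduced without patching, since xrdb already accepts an alternative preprocessor on the command line (assuming mcpp is installed as /usr/bin/mcpp):

xrdb -cpp /usr/bin/mcpp -merge ~/.Xresources
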

Preloading libraries

More than half of the non-contiguous I/O performed during GNOME startup is caused by loading dynamic libraries. For example, during GNOME startup the GTK+ library is loaded in about 26 separate non-contiguous reads:

$ ./topseeks-byfile.sh io.log.gnomeonly 1
     26 /usr/local/gnome/lib/libgtk-x11-2.0.so.0

To evaluate the impact of loading libraries contiguously at runtime, the GTK+ library, the library responsible for the most seeks, was preloaded by running a simple:
cat /usr/local/gnome/lib/libgtk-x11-2.0.so.0 > /dev/null
command early in the startup process. This had the effect of loading the library into the buffer cache using fast sequential reads, so that when the dynamic linker performed random reads at runtime no seeks were necessary; this yielded a ~0.5 second decrease in startup time. To determine whether it was advantageous to preload other libraries too, all the libraries read in 5 or more non-contiguous reads during the whole startup process were preloaded in the same way. The libraries in question were:

Library name               # requests
libgtk-x11-2.0.so.0                32
libxml2.so.2                       10
libpoppler.so.0                     7
libnautilus-private.so.2            7
libgnomeui-2.so.0                   7
libglib-2.0.so.0                    7
libeel-2.so.2                       7
libgdk-x11-2.0.so.0                 6
libORBit-2.so.0                     5
libgstreamer-0.8.so.1               5
libecal-1.2.so.3                    5
libdbus-1.so.1                      5
libbonoboui-2.so.0                  5
libbonobo-2.so.0                    5
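
The preloading itself requires nothing more than a trivial script run early in startup; a sketch, using the paths of the test installation (only the first few libraries from the table are shown):

#!/bin/sh
# Pull the most seek-prone libraries into the page cache with fast
# sequential reads before the dynamic linker mmap()s them.
for lib in \
    /usr/local/gnome/lib/libgtk-x11-2.0.so.0 \
    /usr/local/gnome/lib/libgdk-x11-2.0.so.0 \
    /usr/local/gnome/lib/libxml2.so.2 \
    /usr/local/gnome/lib/libORBit-2.so.0 \
    /usr/local/gnome/lib/libpoppler.so.0; do
    cat "$lib" > /dev/null
done
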

Preloading the libraries in the table yielded a ~1 second improvement in startup time:

Without preloading Preloading GTK+ Preloading top libraries

Note the difference in I/O patterns, with the sequential reads reaching much higher transfer rates:

Without preloading Preload GTK+ Preload top libraries

This suggests that it would be useful to modify the dynamic linker so that it can be configured, at run time or compile time, to read a whole library at load time. Therefore, a one-line patch to the glibc dynamic linker was written. The patch reduces total startup time by ~3 seconds:

Stock ld.so Patched ld.so

The patch is Linux-specific, and may not be ideal for all situations, but it has been submitted to the glibc developers in the hope that it will be included at least as a configurable setting.

Caching bonobo-activation-server files

Analyzing the I/O log shows that bonobo-activation-server causes a sizable percentage of seeks:

$ ./topseeks-byprocess.sh io.log 5
    122 nautilus
    109 gnome-session.r
     94 Xorg
     74 bonobo-activati
     73 gdm-binary

Further analysis shows that almost all of these reads are of the files under lib/bonobo/servers/, 58 files in total:

$ ./analyzereads.py io.log | grep bonobo-activati
(bonobo-activati/3757): /usr/local/gnome/lib/bonobo/servers/GNOME_Gok.server 0-0
(bonobo-activati/3757): /usr/local/gnome/lib/bonobo/servers/GNOME_NetstatusApplet_Factory.server 0-5
(bonobo-activati/3757): /usr/local/gnome/lib/bonobo/servers/GNOME_GtikApplet.server 0-5
[...]
(bonobo-activati/3757): /usr/local/gnome/lib/bonobo/servers/GNOME_MixerApplet.server 0-2
$ ./analyzereads.py io.log | grep bonobo-activati | grep /lib/bonobo/servers/ | wc -l
58

This is a substantial amount of I/O: even on an otherwise idle system, reading the files takes more than 1.6 seconds:

$ time cat /usr/local/gnome/lib/bonobo/servers/*.server > /dev/null

real    0m1.626s
user    0m0.003s
sys     0m0.014s

Since the .server files do not change often, an obvious improvement to startup time would be to maintain a cache of all the .server files in a single file, rebuilt whenever a file is modified or added. This has not been tested, but it can be expected to save about 1 second of startup time: if the files are concatenated into a single file and the caches are flushed, reading that file takes less than 0.25 seconds:

$ time cat /tmp/bonobo.cache > /dev/null

real    0m0.238s
user    0m0.001s
sys     0m0.004s
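
A minimal sketch of such a cache, assuming a plain concatenation format and a rebuild whenever any .server file is newer than the cache (the cache location and format are hypothetical, and bonobo-activation-server would also need to be taught to read it):

#!/bin/sh
SERVERS=/usr/local/gnome/lib/bonobo/servers
CACHE=/usr/local/gnome/var/bonobo-servers.cache   # hypothetical location
# Rebuild the cache if it is missing or any .server file is newer.
if [ ! -f "$CACHE" ] || \
   [ -n "$(find "$SERVERS" -name '*.server' -newer "$CACHE" -print)" ]; then
    cat "$SERVERS"/*.server > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
fi
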

Summary of proposed changes

Proposed change                          Benefit (s)   Reference                  Status
Merging gconf defaults                   7             bug 316672
Separating gconf translations            2             bug 316672
gconf incremental reading                2             bug 316673                 FIXED
Lazy loading screensaver                 1             bug 316674
Using mcpp instead of cpp                1.5           fd.o bug 4325
Patching ld.so to read whole libraries   3             patch, libc-alpha thread
Caching bonobo-activation files          1?
Total                                    17.5

Effect of the icon cache

GTK+ includes an icon cache which aggregates all the icons in an icon theme into one binary file. The cache can operate in two modes: normal (full) mode, which stores the icon images themselves in the cache, and index mode, which stores only information about the icons. The effect of the icon cache on startup time is limited, because most icons are loaded lazily when required: startup with the index cache is about 0.5 seconds faster than without, and startup with the full cache is about the same as without a cache.

No Cache Index Cache Full Cache
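
For reference, the two cache variants can be generated with gtk-update-icon-cache (the theme paths are those of the test installation):

# Full cache: icon data is stored in the cache file itself.
gtk-update-icon-cache --force /usr/local/gnome/share/icons/gnome
# Index-only cache: only the icon index is stored.
gtk-update-icon-cache --force --index-only /usr/local/gnome/share/icons/gnome
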

The presence of the full icon cache is likely to improve performance when many icons need to be loaded, such as when the application menus are loaded, in the icon picker, etc.
<fixme> mclasen says the cache is mmaped
However, the obvious tradeoff is that loading all the icons together, whether they are necessary or not, consumes more memory than loading only those icons that are needed: the icon cache files for the gnome and hicolor themes (both of which are in use with the test settings) are 3.4MB and 5.9MB respectively.

Other factors contributing to slow perceived startup times

X server startup

Although the X server is not part of GNOME itself, it must be started in order to start a GNOME session, so bottlenecks in X server startup directly contribute to GNOME startup time.

X server startup was not studied in detail, but bootcharts showed the presence of a 5-second gap in which the system was apparently completely idle:

Font server configured

Running strace X revealed that the idle time was due to the X server attempting to connect to the font server: if the font path configured in the X server configuration file contains a reference to a font server and the font server is not running, the X server will attempt to connect to the font server 6 times, sleeping 1 second between attempts:

15:53:24.193820 socket(PF_FILE, SOCK_STREAM, 0) = 11
15:53:24.194258 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:24.194472 nanosleep({1, 0}, {1, 0}) = 0
15:53:25.212607 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:25.212881 nanosleep({1, 0}, {1, 0}) = 0
15:53:26.214778 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:26.215106 nanosleep({1, 0}, {1, 0}) = 0
15:53:27.216748 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:27.217252 nanosleep({1, 0}, {1, 0}) = 0
15:53:28.219593 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:28.220038 nanosleep({1, 0}, {1, 0}) = 0
15:53:29.222426 connect(11, {sa_family=AF_FILE, path="/tmp/.font-unix/fs7100"}, 24) = -1 ENOENT
15:53:29.222557 close(11)               = 0

Removing the reference to the font server in the X server configuration file eliminated the idle time:

Font server not configured

This is essentially a configuration error, albeit a common and innocent-looking one with an unexpectedly large penalty to startup time.
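
A quick way to check for the problem (a hedged example: the configuration file location and the exact font server entry vary between systems):

# Look for a font server entry in the X font path; if one is found and
# no font server is running, remove or comment out that FontPath line.
grep -n 'FontPath.*unix/:7100' /etc/X11/xorg.conf
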

Opening the main menu

Strictly speaking, the time it takes to open the main menu ("foot menu") does not count as startup time. However, the main menu takes several seconds to open, and since it is likely to be the first thing that a user does when GNOME starts, it is interesting to see if anything can be done to speed this up.

To determine what happens when the main menu is opened, gnome-panel was killed and, once it had been automatically restarted, strace was attached to the new gnome-panel process. Then the menu was opened:

$ killall gnome-panel
$ pidof gnome-panel
24726
$ strace -tt -p 24726 -e trace=open 2> /tmp/footmenu.strace

$ cat /tmp/footmenu.strace
[...]
15:27:42.988374 open("/etc/xdg/menus/applications.menu", O_RDONLY|O_LARGEFILE) = 19
15:27:43.034545 open("/etc/xdg/menus/debian-menu.menu", O_RDONLY|O_LARGEFILE) = 19
15:27:43.058884 open("/etc/xdg/menus/applications-merged", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
15:27:43.059678 open("/home/gnome/.config/menus/applications-merged", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
15:27:43.060491 open("/usr/local/gnome/share/applications", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 19
15:27:43.097767 open("/usr/local/gnome/share/applications/bonobo-browser.desktop", O_RDONLY|O_LARGEFILE) = 23
15:27:43.124535 open("/usr/local/gnome/share/applications/gnome-about.desktop", O_RDONLY|O_LARGEFILE) = 23
[...]
15:27:47.454771 open("/usr/local/gnome/share/icons/hicolor/24x24/apps/gnome-logout.png", O_RDONLY|O_LARGEFILE) = 40
$ wc -l /tmp/footmenu.strace
372 /tmp/footmenu.strace
$ grep -c /applications/ /tmp/footmenu.strace
285

The strace output shows that opening the foot menu takes about 4 seconds and requires opening 372 files, mostly desktop shortcuts. It would be worthwhile to reduce the number of files opened, and thus the number of seeks, perhaps by means of a desktop shortcut cache.

Conclusions

It is clear that GNOME startup time is I/O bound and dominated by disk seeks. The proof-of-concept modifications proposed here reduce GNOME startup time by close to 50%, showing that there is considerable room for improvement. However, some of these modifications are hacks and will take time to integrate into the main code base. It is hoped that they will be part of GNOME 2.14.

Further Work

Look at effect of stripped libraries
Look at effect of optimizing for size
Look at diskstat, http://ds9a.nl/diskstat/

TODO

Document prelink non-benefits
Do "all changes" benchmark run
Add i/o breakdown?
Think about graphing I/O:
  1. line graph of seeks (or I/O blocks read) over time, one line per process (suggested by krh)
  2. top I/O contributors shown as a graph?

Acknowledgements

Thanks go to the Google Summer of Code program, to Owen Taylor, my SoC mentor, who supported me above and beyond the call of duty, to Ziga Mahkovec for writing bootchart, and to Søren Sandmann and the other helpful people in #fedora-desktop: mclasen, jrb, and all the others I should remember but can't. Bugs and mistakes are my fault.