Go forward in time to October 2006.
Good Agile, Bad Agile. Excellent article on programming fads.
Wynton Marsalis is great hacking music.
I've integrated Rosanna Yuen's article on freedesktop.org standards into the GNOME Guide for ISVs. Check it out!
A while ago Ryan Lortie started a quest to fix poll()-happy applications. Applications that wake up several times per second, even if they are apparently idle, are almost surely doing something wrong. This wastes battery power and raises your electricity bill.
You can know if an application is waking up frequently by running strace on it. For example,
cacharro$ strace -ttt -p `pidof nautilus` Process 3339 attached - interrupt to quit 1158260315.503199 gettimeofday({1158260315, 503226}, NULL) = 0 1158260315.503290 ioctl(3, FIONREAD, [0]) = 0 1158260315.503357 gettimeofday({1158260315, 503369}, NULL) = 0 1158260315.503409 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 79) = 0 1158260315.581607 gettimeofday({1158260315, 581627}, NULL) = 0 1158260315.581680 ioctl(3, FIONREAD, [0]) = 0 1158260315.581736 gettimeofday({1158260315, 581749}, NULL) = 0 1158260315.581788 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 0) = 0 1158260315.581883 ioctl(3, FIONREAD, [0]) = 0 1158260315.581935 gettimeofday({1158260315, 581948}, NULL) = 0 1158260315.581987 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 71) = 0 1158260315.654240 gettimeofday({1158260315, 654307}, NULL) = 0
Something in Nautilus is waking up roughly every 0.08 seconds. That's 12 times per second! How do you find the culprit code?
In GNOME applications, the main place where poll() gets called is deep inside the bowels of Glib. Various file descriptors get poll()ed: the FD for the connection to the X server, any pipes and sockets which you may have opened, etc. In addition, this poll() at the core of the main loop is where things like timeouts and idle callbacks are scheduled.
When poll() returns (i.e. when your process wakes up), Glib's main loop code sees which callbacks it needs to call. Maybe there's a callback associated to input on a file descriptor, or maybe there's a timeout handler. So, how do you find which callback is getting called, so that you can fix the surrounding code to not schedule the callback if possible?
First, compile your Glib with debugging information, as you'll need to set breakpoints in its internals. Then, look at gmain.c and look for the code inside g_main_dispatch() that looks like this:
(*depth)++; need_destroy = ! dispatch (source, callback, user_data); (*depth)--;
Glib's main loop code is highly pluggable, so that you can define your own sources of events. All the normal things (file descriptors, timeouts, idle handlers) are sources of events. Each of these types of sources installs a dispatch handler, which takes care of invoking the user's callback as appropriate. If you called g_timeout_add (1000, my_callback, my_data), then the internal dispatch handler in glib will be g_timeout_dispatch(), and the user's callback will be my_callback().
The internal dispatchers in Glib are these:
For most applications, it is sufficient to set a breakpoint in each of these functions. When the breakpoint triggers, examine the "callback" argument in the dispatch handler. This is your program's callback, which lets you now the culprit code.
For example, this is what I get if I break in g_timeout_dispatch() in Nautilus.
(gdb) break g_timeout_dispatch Breakpoint 2 at 0xb6f32664: file gmain.c, line 3283. (gdb) cont Continuing. Breakpoint 2, g_timeout_dispatch (source=0x807d490, callback=0xb6c71965 <ik_process_eq_callback>, user_data=0x0) at gmain.c:3283 (gdb)
Aha! So now I can look for that ik_process_eq_callback function.
(gdb) break ik_process_eq_callback Breakpoint 7 at 0xb6c71977: file inotify-kernel.c, line 586. (gdb) cont Continuing. Breakpoint 5, ik_process_eq_callback (user_data=0x0) at inotify-kernel.c:586 586 G_LOCK(inotify_lock);
There we go. That inotify-kernel.c comes from gnome-vfs; I'm running GNOME 2.12 from SLED 10. Fortunately, this bug was fixed last June.
Summary: to find what's causing poll() to wake up, put breakpoints in Glib's dispatchers. Then, look for what user callback they are calling. The code which surrounds that callback is your culprit.
Two papers which everyone should read:
When should a test be automated? This gave me a bunch of epiphanies. Automated tests are often more expensive to write than manual tests, so when should you do automated tests versus manual ones? What bugs can manual tests catch that would be missed by automated tests, and vice-versa? What sort of automated tests do you want to have? Why is "feature code" easier to test, and less buggy than "support code"? How can you distinguish bad tests which become obsolete each time you change the code they are intended to test?
Classic testing mistakes. Do you test isolated functions, instead of related functions which a user will use together? Do you postpone stress-testing until the end? Do you lack an estimate of quality? Do you ignore usability problems — "Testers are restricted to checking whether the product does what was intended, not whether what was intended is useful"? Do you rely too much on beta testing, given that beta testers are early adopters and thus are more tolerant to bugs? Do you think your testing team is responsible for ensuring quality?
Women's Summer Outreach Program
The Women's Summer Outreach Program is over now. Cecilia González, my ass-kicking student, had a project to profile the time it takes for Evolution to switch between its various components (mailer, calendar, contacts, etc.).
Cecilia's report is online, and it's very interesting! In it you can read the details about the careful process she followed to track down and fix this performance problem in Evolution.
That toilet paper is made of ballots.
Today I wanted to upgrade the firmware on our Linksys WPS54G printer sharing dongle. Installing it had been enough of a pain, but it had worked more or less fine after that.
I'll emphasize the "more or less" part. Sometimes when you print pages with images, they fail to print. Originally I thought it was a problem with the dongle's memory getting full or something, but the pages printed fine if I switched my printer driver from Postscript to HPGL. So, what the hell? Is it like in the old days, when typing "+++" would hang up your modem? Other times, the dongle is just flaky and unpredictable. Something will screw it up when printing Postscript, and it will spew garbage to the printer.
Anyway, I downloaded new firmware, and used the WPS54G's administration web page to upload it to the device.
You upload it, and a blank window comes up in the web browser. The second time I tried it, another blank window appeared. The third time, nothing happened. The fourth time, the dongle reset itself properly and advertised the new firmware. Finally.
Does Linksys ever test this shit!?
Today is the last state of the union address by president Fox. The PRD legislators "took over" the legislative palace. Fox arrived to the palace, handed a printed version of his report, and left. Just like that. You could see on TV that his face changed to relief after he handed the report and then returned to his armored car: the kind of relief that says, "fuck this shit, I'm getting out of here".
The legislators will follow up on Tuesday. In the meantime, if you are feeling nostalgic, you can revive a bit of Mexican election fraud in 1988, so that you can see how things have not changed. That time, they never opened up the urns to do a recount, and they burned the ballots. Will it be the same this time?
One particularly ridiculous bit: Martha Hilda González Calderón, a PRI legislator, made a speech during the opening of the session. She said how six years ago it was the PRI itself who made possible an uncontested election, the one that gave the presidency to Fox. She went on and on about how the PRI was honest and the only means to achieve clean elections. Maybe she forgot that the PRI owned the presidency for 70 years.
Like in Animal Farm, we get the same kind of garbage, all over again.
Speaking of polling, Evolution does this about every two or three seconds:
1157159843.226184 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=67, events=POLLIN}, {fd=22, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}], 9, 0) = 0 1157159843.274366 ioctl(3, FIONREAD, [0]) = 0 1157159843.310148 gettimeofday({1157159843, 318320}, NULL) = 0 1157159843.334151 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=67, events=POLLIN}, {fd=22, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN, revents=POLLIN}], 9, 6989) = 1 1157159845.183995 gettimeofday({1157159845, 184071}, NULL) = 0 1157159845.184193 read(43, "l\4\1\0018\0\0\0I\17\0\0\205\0\0\0\1\1o\0\37\0\0\0/org"..., 2048) = 208 1157159845.184475 read(43, 0x81badd0, 2048) = -1 EAGAIN (Resource temporarily unavailable) 1157159845.184673 ioctl(3, FIONREAD, [0]) = 0 1157159845.184850 gettimeofday({1157159845, 184918}, NULL) = 0
And then, every 15 seconds or so it goes crazy:
1157159850.307707 poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=67, events=POLLIN}, {fd=22, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}], 9, 581) = 0 1157159850.891084 gettimeofday({1157159850, 891156}, NULL) = 0 1157159850.891272 stat64("/opt/gnome/share/pixmaps/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.891479 stat64("/opt/kde3/share/pixmaps/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.891667 stat64("/etc/opt/kde3/share/pixmaps/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory)[pid 17873] 1157159850.891857 stat64("/usr/share/pixmaps/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.892041 stat64("/usr/local/share/pixmaps/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.922625 stat64("/opt/gnome/share/icons/gnome", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 1157159850.962701 stat64("/opt/kde3/share/icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.983540 stat64("/etc/opt/kde3/share/icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.983754 stat64("/usr/share/icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.983939 stat64("/usr/local/share/icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.984124 stat64("/home/federico/.local/share/icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.984314 stat64("/home/federico/.icons/gnome", 0xbfeb0d30) = -1 ENOENT (No such file or directory) 1157159850.990432 stat64("/opt/gnome/share/pixmaps/Industrial", 0xbfeb0d30) = -1 ENOENT (No such file or directory) [... lots more stuff ...]
I've filed a bug for this.
UPDATE: All the stat()ing going on is the fault of a Novell patch to gtk+. I have no idea why the hell it does that.
Go backward in time to August 2006.
Federico Mena-Quintero <federico@gnome.org> Fri 2006/Sep/01 19:46:56 CDT