7.4. Case-insensitive String Comparing

As described above in the "GtkCellRendererText, UTF8, and pango markup" section, all strings that are to be displayed in the tree view must be encoded in UTF8. All ASCII strings are valid UTF8, but as soon as non-ASCII characters are involved, the character encoding matters and things get a bit tricky.

Comparing two ASCII strings while ignoring case is trivial and can be done with g_ascii_strcasecmp, for example. strcasecmp will usually do the same, except that it is also locale-aware to some extent. The problem is that strcasecmp interprets the bytes it compares according to the locale's character encoding, which for a lot of users is not UTF8, whereas the strings in our model always are. So strcasecmp does not take us very far.

g_utf8_collate will compare two strings in UTF8 encoding, but it does not ignore the case. In order to achieve at least half-way correct linguistic case-insensitive sorting, we need to take a two-step approach. For example, we could use g_utf8_casefold to convert the strings to compare into a form that is independent of case, and then use g_utf8_collate to compare those two strings (note that the strings returned by g_utf8_casefold will not resemble the original string in any recognisable way; they will work fine for comparisons though). Alternatively, one could use g_utf8_strdown on both strings and then compare the results again with g_utf8_collate.

Obviously, all this is not going to be very fast, and adds up if you have a lot of rows. To speed things up, you can create a 'collation key' with g_utf8_collate_key and store that in your model as well. A collation key is just a string that does not mean anything to us, but can be used with strcmp for string comparison purposes (which is a lot faster than g_utf8_collate).

It should be noted that the way g_utf8_collate sorts depends on the current locale. Make sure you are not working in the 'C' locale (the default when none is specified) before you start wondering about weird sort orders. Check with 'echo $LANG' on a command line what your current locale is set to.

Check out the "Unicode Manipulation" section in the GLib API Reference for more details.