5.5. GtkCellRendererText, UTF8, and pango markup

All text used in Gtk+-2.0 widgets needs to be in UTF8 encoding, and GtkCellRendererText is no exception. Text in plain ASCII is automatically valid UTF8, but as soon as your text contains characters that do not exist in plain ASCII (usually letters that are not part of the English alphabet, such as accented characters or umlauts), those characters need to be UTF8-encoded as well. There are many different character encodings, and each of them maps bytes to characters in its own way. Gtk+-2.0 uses UTF8, so whenever you have text in a different encoding, you need to convert it to UTF8 first, using one of the functions of GLib's g_convert family. If your text only comes from other Gtk+ widgets, you are on the safe side, as they return all text in UTF8 as well.

However, if you use 'external' sources of text input, you must convert that text from its encoding (or the user's locale encoding) to UTF8, or it will not be rendered correctly: either it will not be rendered at all, or it will be cut off after the first invalid character. Filenames are especially hard, because there is no indication whatsoever of which character encoding a filename is in (a file might have been created while the user was running a different locale, so filename encoding is basically unreliable and broken). In that case you may want to convert to UTF8 with fallback characters. You can check whether a string is valid UTF8 with g_utf8_validate. In this author's opinion at least, you should put such checks into your code at crucial places wherever they do not affect performance, especially if you are an English-speaking programmer with little experience of non-English locales. It will make it easier for others and yourself to spot problems with non-English locales later on.

In addition to the "text" property, GtkCellRendererText also has a "markup" property that takes text with pango markup as input. Pango markup allows you to place special tags into a text string that affect the style in which the text is rendered (see the pango documentation). Basically, everything you can achieve with the other properties you can also achieve with pango markup (although using properties is more efficient and less messy). Pango markup has one distinct advantage, though, that text cell renderer properties cannot provide: with pango markup you can change the text style in the middle of the text, so you could, for example, render one part of a text string in bold and the rest in normal weight. Here is an example of a string with pango markup:

"You can have text in <b>bold</b> or in a <span color='Orange'>different color</span>"

When using the "markup" property, you need to take into account that the "markup" and "text" properties do not seem to be mutually exclusive (this could arguably be called a bug). In other words: whenever you set "markup" (and have used the "text" property before), set the "text" property to NULL, and vice versa. Example:


  ...

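  void
  foo_cell_data_function (GtkTreeViewColumn *col,
                          GtkCellRenderer   *renderer,
                          GtkTreeModel      *model,
                          GtkTreeIter       *iter,
                          gpointer           user_data)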
  {
    ...
    if (foo->is_important)
      g_object_set(renderer, "markup", "<b>important</b>", "text", NULL, NULL);
    else
      g_object_set(renderer, "markup", NULL, "text", "not important", NULL);
    ...
  }

  ...

Another thing to keep in mind when using pango markup is that you might need to escape text if you construct markup strings on the fly from arbitrary input data. For example:


  ...

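  void
  foo_cell_data_function (GtkTreeViewColumn *col,
                          GtkCellRenderer   *renderer,
                          GtkTreeModel      *model,
                          GtkTreeIter       *iter,
                          gpointer           user_data)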
  {
    gchar *markuptxt;

    ...
    /* This might be problematic if artist_string or title_string
     *   contain markup characters/entities: */
    markuptxt = g_strdup_printf("<b>%s</b> - <i>%s</i>",
                                artist_string, title_string);
    ...
    g_object_set(renderer, "markup", markuptxt, "text", NULL, NULL);
    ...
    g_free(markuptxt);
  }

  ...

The above example will not work if, for example, artist_string is "Simon & Garfunkel", because & is one of the characters that have a special meaning in pango markup (as are < and >). Such characters need to be escaped, so that pango knows they do not refer to any pango markup but are just characters. In this case the string would need to be "Simon &amp; Garfunkel" in order to make sense within the pango markup it is pasted into. You can escape a string with g_markup_escape_text (and you will need to free the resulting newly-allocated string again with g_free).

It is possible to combine pango markup and text cell renderer properties. Both will be 'added' together to render the string in question, except that the text cell renderer properties will be applied to the whole string. If you set the "markup" property to normal text without any pango markup, it will render as normal text, just as if you had used the "text" property. However, unlike with the "text" property, special characters in "markup" property text still need to be escaped, even if you do not use any pango markup in the text.