Linux Screen Reader in Retrospect

Evolving the Architecture for an Open Access Engine

Authors: Peter Parente
Addresses: pparent@us.ibm.com
Date: 2007-06-20
Revision: 932
Status: final
Copyright: Copyright © 2007 IBM Corporation under the BSD License
Source: http://svn.gnome.org/svn/lsr/trunk/doc/retro

Abstract

The Linux Screen Reader (LSR) architecture is based on numerous IBM assistive technologies of the past: Screen Reader for DOS (SR DOS), Screen Reader for OS/2 (SR/2), Home Page Reader (HPR), and the Java Self-Voicing Development Kit (SVDK). The underlying design concepts have evolved over time as technologies have changed and best practices have been formulated.

The initial design for LSR closely matched that of SVDK. Over the past two years, some pieces have changed to account for features specific to the Python language, the need to provide multiple user interfaces, the need to produce accessible configuration panels per extension, the desire to support a cross-platform core, and the desire to support other functions besides screen reading. Some core concepts that could have changed to better support these and other requirements remained static, however, to avoid breaking backward compatibility with existing LSR extensions.

Now that LSR is no longer developed as a screen reader, breaking backward compatibility with the existing screen reader extensions is a non-issue. This document suggests further refinements to the LSR architecture as an open platform for building assistive technologies. Ideally, the recommendations listed herein should be considered and executed before the LSR core is re-used in new endeavors.

1   The Deep Core

1.1   pyLinAcc

pyLinAcc is our original Python wrapper for AT-SPI on GNOME. We included it in LSR because an official Python binding did not exist at the project outset. Such a binding does exist today: pyatspi.

pyLinAcc should be removed from LSR in favor of using this new module. Such a migration involves changing the code in Adapters/AT-SPI to use pyatspi instead of pyLinAcc. The decorators on the adapter methods can be removed as pyatspi now raises Python exceptions, not CORBA exceptions. Caching of AT-SPI interfaces can also be safely enabled when the AT-SPI adapters are imported, but we do not recommend enabling property caching.

In addition, the EventManager class need not queue raw AT-SPI events anymore as a way of decoupling LSR from the rest of the desktop. pyatspi subsumes this feature.

1.2   Event Manager

The EventManager class currently derives from pyLinAcc.Event.Manager. This close coupling prevents the use of other platform accessibility API bindings. This inheritance should be changed to composition. EventManager should try to load all possible platform API bindings on initialization, and use the one that succeeds. EventManager should provide an interface for other managers (e.g. ViewManager) to add and remove observers for platform-specific events. The other managers should continue to call methods on the IEventHandler interface to get the appropriate raw event identifiers for the given internal AEEvent type.
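As a rough illustration of the composition described above, the following sketch shows an EventManager that tries a list of candidate bindings and keeps the first one that loads. The binding classes, the BindingUnavailable exception, and the add_observer() and remove_observer() method names are all hypothetical stand-ins; no real pyatspi or pyLinAcc calls are made.

```python
class BindingUnavailable(Exception):
    """Raised by a binding that cannot run on this platform (hypothetical)."""


class FakeAtspiBinding:
    """Stand-in for a real binding such as pyatspi; not the real API."""
    name = 'at-spi'

    def __init__(self):
        self.observers = {}  # raw event name -> list of callbacks

    def add_observer(self, callback, raw_event):
        self.observers.setdefault(raw_event, []).append(callback)

    def remove_observer(self, callback, raw_event):
        self.observers[raw_event].remove(callback)


class MissingBinding:
    """Stand-in for a binding whose platform API is not installed."""
    name = 'other-platform'

    def __init__(self):
        raise BindingUnavailable(self.name)


class EventManager:
    """Owns a binding (composition) instead of deriving from one."""

    def __init__(self, candidates):
        self.binding = None
        for factory in candidates:
            try:
                self.binding = factory()
                break
            except BindingUnavailable:
                continue  # try the next platform binding
        if self.binding is None:
            raise RuntimeError('no platform accessibility binding available')

    def addObserver(self, callback, *raw_events):
        for raw in raw_events:
            self.binding.add_observer(callback, raw)

    def removeObserver(self, callback, *raw_events):
        for raw in raw_events:
            self.binding.remove_observer(callback, raw)


# The first candidate fails to load, so the second one is used.
manager = EventManager([MissingBinding, FakeAtspiBinding])
```

Other managers would only ever talk to addObserver() and removeObserver(), so swapping the underlying binding would not ripple through the rest of the core.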

This work on EventManager might be best accomplished when migrating to pyatspi as described above.

1.3   AccAdapt

AccAdapt selects the best Adapter object for the given AEInterface based on the result of a static when() method defined by the Adapter subclass. Methods of narrowing the number of Adapters to test based on toolkit and application names will improve performance and enhance the ability of Adapters to unify access across applications. The Adapter class might grow properties called when_toolkit and when_application to indicate the toolkits and applications to which it applies.
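One way the narrowing might work is sketched below. Only the names when(), when_toolkit, and when_application come from the text; the example Adapter subclasses, the dictionary used as the subject, and the candidates() and select() helpers are assumptions made for illustration.

```python
class Adapter:
    # None means "no restriction" on toolkit or application name.
    when_toolkit = None
    when_application = None

    @staticmethod
    def when(subject):
        return True


class GtkTableAdapter(Adapter):
    when_toolkit = 'GAIL'

    @staticmethod
    def when(subject):
        return subject.get('role') == 'table'


class FirefoxDocAdapter(Adapter):
    when_application = 'firefox'

    @staticmethod
    def when(subject):
        return subject.get('role') == 'document'


def candidates(adapters, toolkit, application):
    """Filter cheaply on names before any when() call is made."""
    for cls in adapters:
        if cls.when_toolkit not in (None, toolkit):
            continue
        if cls.when_application not in (None, application):
            continue
        yield cls


def select(adapters, subject, toolkit, application):
    """Return the first remaining Adapter whose when() accepts the subject."""
    for cls in candidates(adapters, toolkit, application):
        if cls.when(subject):
            return cls
    return None
```

With this arrangement, the potentially expensive when() test runs only for Adapters that already match the toolkit and application of the object in question.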

1.4   Default Adapters

Some existing AT-SPI Adapters do not define when() methods. AccAdapt registers these Adapters as defaults applying to all objects when no other more appropriate Adapter is available. The concept of a default is worth retaining, but the existing Adapters are not truly defaults. They do require that the objects being wrapped are PORs or accessibles for AT-SPI. These Adapters should grow when() methods to avoid strange errors when other objects are accidentally passed to the AEInterfaces or other platform APIs are adapted.

1.5   New Extension Types

The UIRegistrar recognizes five classes of user interface elements (UIEs) at present: Perk, AEMonitor, AEChooser, AEInput, AEOutput. All of these extensions are subclasses of UIElement, a class providing basic metadata.

Three additional extension points exist in LSR, but management of such extensions is not currently supported by the registrar. A new primary class, DevElement, should be defined allowing AETools, AEAdapters, and AEWalker classes to become known extension types managed by a Registrar (renamed from UIRegistrar).

1.5.1   Tool Extensions

Perks and Tasks use the Tools API to query accessible objects for properties, contact the UIRegistrar, register and chain Tasks, generate output, bind Tasks to input gestures, and so forth. Parts of the Tools API, especially the say() method and its derivatives, are provided specifically to aid screen reader development.

It makes sense to refactor the Tools API such that sets of Tools can be installed using the Registrar and loaded under certain profiles, depending on the purpose of the profile. The Tools API is already split into a number of classes: View, Input, Output, System, and Util. These should be further split and then placed under Registrar management.

1.5.2   Adapter Extensions

Adapters are hard-wired into the LSR system today through import statements in the Adapters package. Some projects may wish to provide their own Adapter classes without modifying the core code to import them directly. The Registrar should take over management of adapters, allowing them to be installed and used from any location on disk.
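The sketch below shows one way a registrar could load an Adapter class from an arbitrary path using the standard importlib machinery. The Registrar shown here is a toy with hypothetical install() and load() methods, not the existing UIRegistrar, and the adapter module is written to a temporary directory purely for demonstration.

```python
import importlib.util
import os
import tempfile


def load_extension(path, class_name):
    """Import the module at path and return the named class from it."""
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


class Registrar:
    """Toy registrar: remembers where each extension lives on disk."""

    def __init__(self):
        self.installed = {}  # extension name -> (path, class name)

    def install(self, name, path, class_name):
        self.installed[name] = (path, class_name)

    def load(self, name):
        path, class_name = self.installed[name]
        return load_extension(path, class_name)


# Demonstration: write a tiny adapter module outside any package.
source = "class MyAdapter:\n    provides = 'example'\n"
tmpdir = tempfile.mkdtemp()
adapter_path = os.path.join(tmpdir, 'my_adapter.py')
with open(adapter_path, 'w') as handle:
    handle.write(source)

registrar = Registrar()
registrar.install('my adapter', adapter_path, 'MyAdapter')
adapter_class = registrar.load('my adapter')
```

Because nothing in the core imports my_adapter directly, a project could ship its Adapters separately and register them at install time.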

1.5.3   Walker Extensions

Walkers are directly imported by the Tools API modules. The Tools API methods for navigating previous, next, etc. are hard-wired to use certain Adapters. These methods should change to allow a Perk to name the Walker it wishes to use when navigating over accessible objects. The Registrar should manage the installation and use of new Walkers from which Perks can select.

1.6   Chaining Performance

We put some optimization into how chained Tasks are computed and executed in the 0.5.3 release of LSR. We meant the existing code as a stepping stone for even more optimization, namely the caching of precomputed chains. Once a chain has been executed in response to an event or named Task invocation, the full ordered history of Task names that executed is available. This list should be hashed under the name of the event or Task that triggered it. The cached copy remains valid until a Perk or Task registers a new Task or creates a new chain. In response, the Tier should invalidate one or more of its cached chain histories.
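A minimal sketch of the caching idea follows. It assumes a much-simplified Tier in which chains are plain "run after" links and Tasks are callables; the method names and the depth-first ordering are illustrative only. For simplicity, the sketch invalidates every cached chain on any registration rather than only the affected ones.

```python
class Tier:
    def __init__(self):
        self.tasks = {}        # task name -> callable
        self.links = {}        # task name -> names chained after it
        self.chain_cache = {}  # trigger name -> ordered tuple of task names

    def registerTask(self, name, func):
        self.tasks[name] = func
        self.chain_cache.clear()  # any registration invalidates the cache

    def chainTask(self, name, after):
        self.links.setdefault(after, []).append(name)
        self.chain_cache.clear()

    def _compute(self, trigger):
        """Walk the links once to produce the full ordered chain."""
        order, pending = [], [trigger]
        while pending:
            name = pending.pop(0)
            if name in order:
                continue  # guard against accidental cycles
            order.append(name)
            pending[0:0] = self.links.get(name, [])
        return tuple(order)

    def execute(self, trigger):
        chain = self.chain_cache.get(trigger)
        if chain is None:
            # First execution: compute the chain and remember it.
            chain = self.chain_cache[trigger] = self._compute(trigger)
        return [self.tasks[name]() for name in chain]
```

A finer-grained version would track which triggers each Task participates in and drop only those entries.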

1.7   Continuations

Some Tasks are responsible for performing long operations. For instance, a Task might try searching a very long document in a Web browser for a heading. In the existing architecture, this operation must run to completion before the next event may be handled. If the ability to interrupt is desired, the work of the Task must be manually split across invocations of a Timer Task.

Python generators offer interesting prospects for cooperative multitasking in scripts. A Task could use the yield statement to cease processing temporarily, and return a value indicating whether no other Task should be allowed to run until this one completes, whether other Tasks should be allowed to run for this event before the paused Task completes, or whether all Tasks for any events should be allowed to run before the paused one completes. The Tier would later resume execution of the Task by reinvoking it as a generator. The Task could then choose to continue or stop processing based on what had happened in the meantime (e.g., user press of a cancellation key). Yielding in Tasks could also allow other main loop processes to run (e.g., processing raw accessibility events) while the long-running Task is paused.
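The following sketch illustrates the generator idea in isolation. The three yield constants, the find_heading() Task, and the run() loop are hypothetical; a real Tier would interleave other Tasks and events where this sketch simply calls a between_steps callback.

```python
import types

# Hypothetical yield values, one per case described above.
BLOCK_ALL = 'block'         # nothing else runs until this Task completes
ALLOW_THIS_EVENT = 'event'  # other Tasks for the same event may run
ALLOW_ANY_EVENT = 'any'     # Tasks for any event may run


def find_heading(lines, cancelled):
    """Long-running Task: scans a document, pausing every two lines."""
    for index, line in enumerate(lines):
        if line.startswith('#'):
            return index
        if index % 2 == 1:
            yield ALLOW_ANY_EVENT
            if cancelled():
                return None  # user asked to stop while we were paused
    return None


def run(task, between_steps):
    """Minimal Tier loop: resume the Task until it finishes."""
    if not isinstance(task, types.GeneratorType):
        return task  # an ordinary Task already ran to completion
    while True:
        try:
            policy = next(task)
        except StopIteration as stop:
            return stop.value
        if policy != BLOCK_ALL:
            between_steps()  # stand-in for handling other Tasks or events
```

Here the value passed to return inside the generator surfaces as StopIteration.value, which lets the Tier treat paused and ordinary Tasks uniformly.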

2   Scripting

2.1   Perks and Tasks

In the design carried over from SVDK, Perk extensions are modules containing a Perk subclass of the same name as the module and a number of Task subclasses acting as event handlers. Typically, a Task shares state with other Tasks across events by storing data in Perk instance variables. Tasks may maintain their own instance variables for data that is needed by individual Tasks alone.

This design is not very Pythonic. In Python, methods are objects themselves, and may be treated as objects in all operations. Tasks are nothing more than event handlers, which, in most event-driven programs, are methods, not custom class instances.

We suggest removing the idea of a Task in favor of having methods on a Perk or other object act as event, input, or named handlers. Having separate Task instances does improve separation of concerns in some cases, but, more often than not, it forces various Tasks to refer to the Perk object to store or retrieve data that must be shared among all Tasks. Such references are cumbersome and more expensive, as is the existence of N Task instances for nearly N different events across M Tiers. Consider the following code generated as a new Perk in the current architecture:

class Foobar(Perk.Perk):
  def init(self):
    self.registerTask(HandleFocusChange('read test focus'))
    self.registerTask(ReadPerkName('read perk name'))
    kbd = self.getInputDevice(None, 'keyboard')
    self.addInputModifiers(kbd, kbd.AEK_CAPS_LOCK)
    self.registerCommand(kbd, 'read perk name', False,
                       [kbd.AEK_CAPS_LOCK, kbd.AEK_A])

  def getDescription(self):
    return _('Does little of interest.')

class ReadPerkName(Task.InputTask):
  def execute(self, **kwargs):
    self.stopNow()
    self.sayInfo(text='%s handling an input gesture' % self.perk.getName())

class HandleFocusChange(Task.FocusTask):
  def executeGained(self, por, **kwargs):
    # stop current output
    self.mayStop()
    # don't stop this output with an immediate next event
    self.inhibitMayStop()
    self.sayInfo(text='%s handling a focus change' % self.perk.getName())

In the proposed architecture, the Tasks would become methods as shown in the following:

class Foobar(Perk.Perk):
  def init(self):
    # Tasks register under names only
    self.registerTask('read test focus', self.onFocusGained)
    self.registerTask('read perk name', self.onInput)

    # bind a Task to an input gesture
    kbd = self.getInputDevice(None, 'keyboard')
    self.addInputModifiers(kbd, kbd.AEK_CAPS_LOCK)
    self.bindToGesture('read perk name', kbd, False, [kbd.AEK_CAPS_LOCK,
                                                      kbd.AEK_A])

    # bind a task to an event
    self.bindToEvent('read test focus', AEEvent.FOCUS_GAINED)

  def getDescription(self):
    return _('Does little of interest.')

  def onInput(self, **kwargs):
    self.stopNow()
    self.sayInfo(text='%s handling an input gesture' % self.getName())

  def onFocusGained(self, por, **kwargs):
    # stop current output
    self.mayStop()
    # don't stop this output with an immediate next event
    self.inhibitMayStop()
    self.sayInfo(text='%s handling a focus change' % self.getName())

Tasks should be defined as methods with string identifiers. Task identifiers could then be bound to input gestures, to events identified by constants, to chooser signals, and to timer messages. Task methods could be chained or blocked as they are today. A second method provided to the registerTask() call could account for the ability the update() method provides today: execution when a prior Task returns False.
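To make the last point concrete, here is a small sketch of how a second method might be dispatched. It assumes that a handler returning False stops normal execution for later handlers bound to the same event, and that those later handlers then receive their update method instead. The dispatch() method, the FOCUS_GAINED stand-in constant, and the Demo handlers are hypothetical.

```python
FOCUS_GAINED = 'focus-gained'  # stand-in for AEEvent.FOCUS_GAINED


class Perk:
    def __init__(self):
        self.tasks = {}     # name -> (execute method, update method or None)
        self.bindings = {}  # event constant -> task names in bind order

    def registerTask(self, name, execute, update=None):
        self.tasks[name] = (execute, update)

    def bindToEvent(self, name, event):
        self.bindings.setdefault(event, []).append(name)

    def dispatch(self, event, **kwargs):
        propagate = True
        for name in self.bindings.get(event, []):
            execute, update = self.tasks[name]
            if propagate:
                # Anything other than an explicit False keeps propagating.
                propagate = execute(**kwargs) is not False
            elif update is not None:
                update(**kwargs)


class Demo(Perk):
    def __init__(self):
        super().__init__()
        self.log = []
        self.registerTask('announce', self.onAnnounce)
        self.registerTask('track', self.onTrack, self.onTrackUpdate)
        self.bindToEvent('announce', FOCUS_GAINED)
        self.bindToEvent('track', FOCUS_GAINED)

    def onAnnounce(self, **kwargs):
        self.log.append('announce')
        return False  # consume the event

    def onTrack(self, **kwargs):
        self.log.append('track')

    def onTrackUpdate(self, **kwargs):
        self.log.append('track-update')


demo = Demo()
demo.dispatch(FOCUS_GAINED, por=None)
```

After the dispatch, demo.log holds 'announce' followed by 'track-update': the second handler never executes normally but still gets a chance to update its state.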

2.2   Tool Objects

At present, Tasks and Perks derive from the Task.Tools.All base class, which, in turn, derives from all known tool classes. Inheritance is misused in this situation: neither a Task nor a Perk is a Tool; rather, they use Tools. Since the Tool base class maintains state, each Task needs to be pre-executed before it is executed on every event or named invocation.

A single Tool object should exist per Tier. The Tool object should be updated with new state when needed by certain events or Task actions. The Tool object should be provided to each Task as a parameter when it executes. This change from inheritance to composition facilitates Tool Extensions as well.

In addition, the task_por attribute of Task.Tools should be removed entirely. This extra state information was originally conceived as a convenience. In practice, it causes tremendous confusion as Perks and Tasks may not share the same task_por when their methods are invoked. All PORs given to Tool objects should be given explicitly as parameters.

2.3   Event Parameters

The number and type of parameters passed to Tasks differ based on the kind of event. All Tasks may also define optional keyword arguments to be provided by other Tasks when invoked using doTask. The lack of a standard for event Task parameters makes Task definition and invocation more complex than it needs to be.

Assuming Tasks are now methods, the standard signature should be the following:

taskName(self, tools, **kwargs)

The Tools object should be responsible for returning specific details about the last event received through methods such as getEventPOR(), getEventOffset(), getEventText(), and so on. A Task method may define other optional keyword parameters as it sees fit, but should always retain **kwargs for forward compatibility.
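A brief sketch of the proposed signature in use is shown below. The getEventPOR(), getEventOffset(), and getEventText() names come from the text; the setEvent() method, the Script class, and the echo keyword are assumptions added so the example is complete.

```python
class Tools:
    """One instance per Tier; holds details of the last event."""

    def __init__(self):
        self._event = {}

    def setEvent(self, **details):
        # Hypothetical hook the Tier would call before running Tasks.
        self._event = details

    def getEventPOR(self):
        return self._event.get('por')

    def getEventOffset(self):
        return self._event.get('offset')

    def getEventText(self):
        return self._event.get('text')


class Script:
    def __init__(self):
        self.spoken = []

    # Standard signature: tools first, optional keywords, then **kwargs.
    def onTextInserted(self, tools, echo=True, **kwargs):
        if echo:
            self.spoken.append((tools.getEventOffset(), tools.getEventText()))


tools = Tools()
script = Script()
tools.setEvent(por='por-1', text='hi', offset=4)
# An unknown keyword is absorbed by **kwargs instead of raising TypeError.
script.onTextInserted(tools, unknown_future_option=1)
```

Keeping **kwargs in every Task method is what allows the core to start passing new keyword arguments later without breaking existing scripts.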

3   Input and Output

3.1   Semantic Tags

We originally designed the tagging system such that a Perk could send text output to any device tagged with semantic information about what the text described (e.g. a widget role, the number of items in a container). The idea was that the same output would be sent to all devices, and the devices would be smart enough to know how to render the content based on the tag and the user settings for the tag.

The tagging system worked well for speech, but was limiting for a multi-interface configuration combining speech, Braille, and magnification. In most cases, we did not want to send the same content to all three devices. And in the case of magnification, we did not even want to tag the content as it was typically screen coordinates only.

The tag system also became a hassle for persisting and configuring user settings. For instance, some users wanted to use the same voice for all semantic tags rather than the default mappings of certain semantics to certain voices. We never got around to giving the user this level of control because of complications designing a UI showing the global settings and per semantic tag settings in a meaningful, usable way.

A much simpler and more effective design would be to remove the tag concept and move all responsibility for persisting and controlling device settings to Perks. One style is in effect at all times per AEOutput device per Tier. A Perk may programmatically modify the parameters of this style to affect output or it may expose settings a person may use to configure how the device behaves. The device need only apply a new style when either the style object received with output is different than the last one received or if the style object is marked dirty: essentially the same processing it does today.
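The apply-when-changed-or-dirty rule can be sketched as follows. The Style and SpeechDevice classes here are simplified stand-ins, not the existing AEOutput classes, and the set() and send() method names are assumptions.

```python
class Style:
    def __init__(self, **params):
        self.params = dict(params)
        self.dirty = False

    def set(self, name, value):
        """Change a parameter and mark the style dirty if it differs."""
        if self.params.get(name) != value:
            self.params[name] = value
            self.dirty = True


class SpeechDevice:
    """Stand-in for an AEOutput device; records what it would do."""

    def __init__(self):
        self.last_style = None
        self.applied = 0  # number of times a style was (re)applied
        self.sent = []

    def send(self, text, style):
        # Reconfigure only for a different style object or a dirty one.
        if style is not self.last_style or style.dirty:
            self.applied += 1
            self.last_style = style
            style.dirty = False
        self.sent.append(text)


style = Style(rate=200)
device = SpeechDevice()
device.send('one', style)
device.send('two', style)    # same clean style: nothing to reapply
style.set('rate', 250)       # a Perk adjusts the style programmatically
device.send('three', style)  # dirty style: applied again
```

The Perk holds the style and decides when to change it, so the device never needs to know why a parameter changed.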

The key benefit of this new design is that the objects that understand the context in which output is being generated have greater control over the style used to present them. This design does not preclude giving users complete control over presentation, as was our original intent. Perks just need to be written to give the user such control.

4   Monitors

4.1   Saving State

Monitors have no way to save state at present. This is a problem when a developer chooses a set of events to watch, restarts LSR, and wishes to continue the same monitoring. The developer must reconfigure all of the settings in the monitor.

Monitors should be able to associate with AEState objects, which are loaded when the monitor is created and saved when the monitor is destroyed, just like Perks. This feature would solve the problem of lost configuration time for developers.
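As a sketch of the load-on-create, save-on-destroy cycle, the example below persists a monitor's watched events to a JSON file. The MonitorState class is a hypothetical stand-in for an AEState object, and the JSON file format and the watch() and close() methods are assumptions rather than how LSR persists Perk state.

```python
import json
import os
import tempfile


class MonitorState:
    """Hypothetical stand-in for an AEState object."""

    def __init__(self, watched=()):
        self.watched = sorted(watched)


class Monitor:
    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            # Restore the configuration saved by a previous session.
            with open(path) as handle:
                self.state = MonitorState(json.load(handle)['watched'])
        else:
            self.state = MonitorState()

    def watch(self, event_name):
        if event_name not in self.state.watched:
            self.state.watched.append(event_name)

    def close(self):
        """Save state when the monitor is destroyed."""
        with open(self.path, 'w') as handle:
            json.dump({'watched': self.state.watched}, handle)


state_path = os.path.join(tempfile.mkdtemp(), 'monitor.json')
first = Monitor(state_path)
first.watch('focus')
first.close()
second = Monitor(state_path)  # picks up the saved configuration
```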

4.2   Building Menus

Monitors currently build their menus based on the subclasses of Tasks and AEEvents registered with their respective packages, as well as with constants representing the different kinds of AEOutput and AEInput messages. If Tasks are refactored as methods, building menus based on object types no longer holds. Either the kinds of information to log are hard-coded into the monitors, or all monitors must map sets of constants to human-readable names representing the kinds of events that may be logged.

5   Other Changes

5.1   Project Naming

The name Linux Screen Reader has always been a misnomer, and is even more so now. Only the extensions IBM was developing made this project a screen reader. The core is a reusable platform for building software that receives application and user events, and responds by producing output on one or more devices.

Since the event loop in the LSR core is termed the AccessEngine, we suggest Open Access Engine (OAE) as a decent moniker for the core project. Projects building off of OAE could provide their own names for their extensions.

5.2   Class Names

Everything should stick with the AE prefix naming convention. Perk should definitely become AEScript. Adapters should perhaps turn into AEAdapters. Walkers might be recoined as AEWalkers. Tools might become AETools. While having the prefix might seem silly, there are likely to be quite a few other modules in the Python path named Tools, Scripts, Adapters, and so forth. Worse yet, scripts and other extensions might want to distinguish these external modules from the internal ones, and import both.

5.4   Minor Updates

The LSR Bugzilla module contains a number of other reports about changes and enhancements that are also worth considering.

6   Revision History

2007-06-20 Peter Parente <pparent@us.ibm.com>

2007-06-20 Peter Parente <pparent@us.ibm.com>

2007-06-09 Peter Parente <pparent@us.ibm.com>

2007-06-08 Peter Parente <pparent@us.ibm.com>