Input to NCITS V2: A Multi-modal Computing Framework to Universal Access
Stéphane H. Maes, smaes@us.ibm.com
Shari Trewin, trewin@us.ibm.com
IBM
11/14/01
This input presentation addresses multi-modal computing concepts that are pertinent to the NCITS V2 activities and are currently being considered or developed within other standards forums:
o Device-independent authoring based on the introduction of an interaction logic layer for developing applications plus customization meta-data.
o A DOM-based MVC architecture to synchronize browsers, which therefore:
   o provides multi-modal access "audio and/or GUI" to applications.
   o implements the concept of a "remote console" as a multi-device browser.
We recommend that NCITS V2 consider these approaches in order to fit what we believe will be the evolution of the web (fixed and mobile).
We propose considering a device-independent framework (runtimes and authoring), which was also presented to the W3C Device Independence Working Group. As discussions of these activities are limited to W3C members, we will limit the discussion to IBM's view and proposals in this matter.
This framework relies on the introduction of an abstract layer, the interaction logic layer, that captures an abstract description of the interaction with the application as intended by the author. In our experience, this layer can be built re-using existing W3C specifications:
o XForms to describe the data model manipulated by the user
o XForms UI and some extensions (needed depending on the features that should be supported) to describe the interaction: how the user manipulates the data model.
o XHTML events
o XLink
o XPath
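As an illustration of the idea only, the following Python sketch models an interaction logic layer as a data model plus abstract controls bound to it, with no presentation details. It is not XForms syntax: the class names (`Input`, `Select1`, `InteractionLogic`) and the flight-booking fields are hypothetical, loosely inspired by XForms concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Input:
    """Abstract free-form entry bound to one data-model item (hypothetical)."""
    ref: str    # name of the data-model item this control manipulates
    label: str  # modality-neutral description of what is being asked


@dataclass
class Select1:
    """Abstract single choice among fixed options (hypothetical)."""
    ref: str
    label: str
    choices: List[str]


Control = Union[Input, Select1]


@dataclass
class InteractionLogic:
    """Data model plus the abstract interactions that manipulate it."""
    model: Dict[str, str]
    controls: List[Control] = field(default_factory=list)

    def set_value(self, ref: str, value: str) -> None:
        # Look up the abstract control bound to this data-model item.
        control = next((c for c in self.controls if c.ref == ref), None)
        if control is None or ref not in self.model:
            raise KeyError(ref)
        # A single-choice control only accepts one of its declared options.
        if isinstance(control, Select1) and value not in control.choices:
            raise ValueError(f"{value!r} is not a valid choice for {ref}")
        self.model[ref] = value


# Assumed example application: a minimal flight booking dialog.
booking = InteractionLogic(
    model={"destination": "", "seat": ""},
    controls=[
        Input("destination", "Destination"),
        Select1("seat", "Seat preference", ["aisle", "window"]),
    ],
)
booking.set_value("destination", "Paris")
booking.set_value("seat", "window")
```

Nothing in this description says whether the destination is typed, spoken or picked from a menu; that decision is left to whatever runtime renders it.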
Following the terminology introduced in the Device Independence Principles working draft, it is then possible to consider delivery systems (runtimes) that adapt the interaction logic layer representation into functional presentations for a particular delivery context. The concept of delivery context includes different user agents, different devices, different modalities, etc. The vocabularies and protocols used to exchange such delivery contexts are also important and remain incompletely specified today.
In general, in order to satisfy the user's and service provider's desire for an optimized user experience in a given delivery context, it is important to provide a means to produce customized presentations for that context, i.e. the meta-information that lets the author specify exactly which presentation the adaptation process should produce and send to the user agent.
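The adaptation step and the role of customization meta-data can be sketched as follows. This is a minimal Python sketch under stated assumptions: the `Control` class, the `adapt` function, the `"gui"` and `"voice"` context names and the `custom` override field are all hypothetical, and a real delivery system would target actual markup languages rather than these simplified strings.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List


@dataclass
class Control:
    """Abstract interaction element (hypothetical, not a W3C vocabulary)."""
    ref: str
    label: str
    choices: List[str] = field(default_factory=list)
    # Assumed form of customization meta-data: an author-supplied
    # presentation keyed by delivery context, used instead of the default.
    custom: Dict[str, str] = field(default_factory=dict)


def adapt(control: Control, context: str) -> str:
    """Produce a presentation of one control for one delivery context."""
    # Author customization takes precedence over automatic adaptation.
    if context in control.custom:
        return control.custom[context]
    if context == "gui":
        name = escape(control.ref)
        label = escape(control.label)
        if control.choices:
            options = "".join(
                f"<option>{escape(c)}</option>" for c in control.choices
            )
            return f'<label>{label} <select name="{name}">{options}</select></label>'
        return f'<label>{label} <input name="{name}"/></label>'
    if context == "voice":
        if control.choices:
            return f"{control.label}? Say one of: {', '.join(control.choices)}."
        return f"{control.label}?"
    raise ValueError(f"unknown delivery context: {context}")


destination = Control(
    "destination",
    "Destination",
    custom={"voice": "Where would you like to fly?"},
)
seat = Control("seat", "Seat", choices=["aisle", "window"])

gui_form = adapt(destination, "gui")
voice_prompt = adapt(destination, "voice")  # uses the author's override
seat_prompt = adapt(seat, "voice")          # uses the default adaptation
```

The same abstract control yields a form field for a GUI browser and a spoken prompt for a voice browser, while the override lets the author replace the default voice prompt for one control only.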
We define multi-modal computing as the capability to share different channels (i.e. delivery contexts), simultaneously or sequentially available to the user, to interact with the application.
With such a capability, we expect ultimately to let the user decide, at any time, which channel or channels are the most appropriate to perform a particular interaction in a particular situation. This may change based on user preferences, the nature of the interaction, the access mechanisms that are available, the environment and the activity of the user, etc.
The exact granularity of the synchronization (sequential, page level, block level, slot level, event level, totally merged) between channels and the degree of freedom to switch (imposed by author, shared, at any time) depend on the infrastructure and the authoring methodology.
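One possible reading of these granularity levels is as an ordered scale, where a finer setting synchronizes the channels on everything a coarser setting does, plus smaller interaction steps. The Python sketch below encodes that reading only; the ordering semantics, the `Granularity` enum and the `should_sync` function are assumptions made for illustration and are not defined by the framework.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Synchronization levels from the text, ordered coarse to fine."""
    SEQUENTIAL = 0  # one channel at a time, no concurrent synchronization
    PAGE = 1
    BLOCK = 2
    SLOT = 3
    EVENT = 4
    MERGED = 5      # channels behave as a single merged interface


def should_sync(configured: Granularity, change: Granularity) -> bool:
    """Decide whether a completed step is propagated to the other channels.

    `change` is the level of the step that just completed and is expected
    to be PAGE, BLOCK, SLOT or EVENT (assumed semantics).
    """
    if not Granularity.PAGE <= change <= Granularity.EVENT:
        raise ValueError(f"not an interaction step level: {change.name}")
    if configured is Granularity.SEQUENTIAL:
        return False
    # A step is propagated when it is no finer than the configured level.
    return change <= configured


# With slot-level synchronization, filling a slot updates the other
# channels, but individual keystrokes or partial utterances do not.
slot_filled = should_sync(Granularity.SLOT, Granularity.SLOT)
key_pressed = should_sync(Granularity.SLOT, Granularity.EVENT)
```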
Different authoring approaches can be considered for such applications. The interaction-logic-based approach described as part of the DI framework implicitly supports synchronization granularity and freedom to switch.
The proposed architecture relies on existing standard interfaces and work items: DOM and remote DOM (SOAP). As such, it can be used with existing browsers that implement at least DOM Level 2.
Standardization must be pursued to promote and specify a DOM interface for some categories of user agents, and its support by the infrastructure. Such work and proposals are ongoing at standards bodies such as ETSI, the WAP Forum, 3GPP and W3C. This may lead to wide adoption and interoperability of solutions based on the same architecture.
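The synchronization principle can be sketched in a few lines of Python. This is a simplified, in-process illustration, not the proposed implementation: the `View` and `Controller` classes are hypothetical, and plain method calls stand in for the DOM events and remote DOM (SOAP) manipulations that would connect real browsers.

```python
from typing import Dict, List, Optional


class View:
    """Stand-in for one browser (GUI, voice, or another device)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: Dict[str, str] = {}
        self.controller: Optional["Controller"] = None

    def user_input(self, ref: str, value: str) -> None:
        # The user acts in this view; in a real browser this would be
        # a DOM event captured and reported to the controller.
        self.fields[ref] = value
        if self.controller is not None:
            self.controller.on_event(self, ref, value)

    def set_field(self, ref: str, value: str) -> None:
        # The controller updates this view; in a real deployment this
        # would be a (possibly remote) DOM manipulation.
        self.fields[ref] = value


class Controller:
    """Holds the shared model and keeps every attached view in step."""

    def __init__(self) -> None:
        self.model: Dict[str, str] = {}
        self.views: List[View] = []

    def attach(self, view: View) -> None:
        view.controller = self
        self.views.append(view)
        # A view that joins later is brought up to date, which supports
        # sequential switching between channels or devices.
        for ref, value in self.model.items():
            view.set_field(ref, value)

    def on_event(self, source: View, ref: str, value: str) -> None:
        self.model[ref] = value
        for view in self.views:
            if view is not source:
                view.set_field(ref, value)


controller = Controller()
gui, voice = View("gui"), View("voice")
controller.attach(gui)
controller.attach(voice)

gui.user_input("destination", "Paris")  # typed in the GUI browser
voice.user_input("seat", "window")      # spoken to the voice browser

phone = View("phone")                   # a second device joins later
controller.attach(phone)
```

After these calls, all three views hold the same field values, whichever channel the user chose for each step.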
We illustrate different implementation alternatives for multi-modal browsers based on the proposed architecture. In particular, we discuss the value proposition of such approaches over mono-channel and multi-channel applications.
The proposed architecture is directly applicable to universal access and accessibility.
As defined earlier, multi-modal computing encompasses cases where we synchronize different devices or user agents instead of different modalities.
This has an interesting consequence: with the proposed authoring and infrastructure, it becomes very simple to let the user interact with a single application through different devices, either simultaneously and in a synchronized manner, or sequentially.
In this way, it is possible to implement remote consoles as envisaged in the NCITS V2 description document with today's DOM Level 2 compliant browsers.
The use of DOM and SOAP, the proposed MVC architecture and the authoring approach align this solution with the evolution of the web. As explained earlier, it has a significant chance of being widely supported in the future by numerous key fixed and wireless infrastructures.