MOBILE APPLICATIONS: The New Services Apple Cart

VoicenData Bureau
Human beings are multi-modal in nature: they like to see, read, or feel a

certain thing, depending on their comfort level and choice. Unfortunately,

communication devices are not multi-modal.


Consider the two scenarios below:

Scenario 1:
An area sales manager is about to clinch a big deal when the

client asks him for a special discount. The sales manager needs a go-ahead from

his CEO, who is attending a high-level conclave and cannot take calls.

Multi-modality Means...

Mobile Operators…

It will allow them to combine visuals, speech, text, and touch to offer a diverse range of mobile applications and new services.

It will act as a key driver for mass adoption of value-added services, leading to expansion of the user base.

Mobile Users…

Freedom to choose among multiple modes as they wish.

Use of value-added services.

A better and richer experience while using value-added services.


Scenario 2: The sales manager calls up his CEO on a device that runs a

multi-modal platform. Since the latter can’t listen to the call, he switches

over to the text mode. The speech is converted to text and the matter is

displayed on the screen. After quietly going through it, he types in his

instructions and sends them as an e-mail. The sales manager negotiates accordingly

and closes the deal.


When they first arrived, mobile phones catered to our desire to make and

receive calls while on the move. Then, gradually, data applications became hot

on wireless phones. In the next few years the world will have more people

accessing the Internet via mobile devices than via PCs. And

multi-modality will be a key enabler of value-added data services and the

wireless Internet, and the mass adoption of these applications.

As mobile operators in India, like those elsewhere in the world, strive to

create new revenue streams, multi-modality can play an important role in not

only differentiating their services from those of rival operators but also delivering value and

convenience to users.

What the Heck Is Multi-modality?

Just as multimedia represents the integration of media types (audio,

video, text, and images), multi-modality signifies the merging of input and

output modes in the user interface of an information appliance (touch-screen,

keypad and voice commands for input; text, images and sounds for output). So if

a mobile user wants information about the restaurants in his locality, he’ll

have the choice of sending the request in text or speech, and then receiving the

information as voice, visual or text, regardless of how the information was

originally created.
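
The restaurant example can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's platform: the names (Request, normalize, render, handle) and the sample listing are hypothetical, and speech recognition and text-to-speech are replaced by simple stand-ins. The point it shows is that every input mode is reduced to the same query, and the same result can then be presented in whichever output mode the user picks.

```python
from dataclasses import dataclass
from enum import Enum


class InputMode(Enum):
    TEXT = "text"
    SPEECH = "speech"


class OutputMode(Enum):
    TEXT = "text"
    VOICE = "voice"
    VISUAL = "visual"


@dataclass
class Request:
    mode: InputMode
    payload: str  # typed text, or a transcript standing in for recognized speech


# Hypothetical content store: the listing is kept in one mode-neutral form.
RESTAURANTS = {"downtown": ["Cafe A", "Diner B"]}


def normalize(request: Request) -> str:
    """Reduce any input mode to the same plain query string."""
    # A real platform would run speech recognition here; this sketch assumes
    # the payload already holds the recognized words.
    return request.payload.strip().lower()


def render(names: list[str], mode: OutputMode) -> str:
    """Present the same result in whichever output mode the user picked."""
    if mode is OutputMode.TEXT:
        return ", ".join(names)
    if mode is OutputMode.VOICE:
        # Stand-in for text-to-speech: the sentence that would be spoken.
        return "Nearby restaurants are " + " and ".join(names)
    # Stand-in for a visual display: one marker line per restaurant.
    return "\n".join(f"[map pin] {name}" for name in names)


def handle(request: Request, output: OutputMode) -> str:
    """Look up the locality and answer in the requested output mode."""
    names = RESTAURANTS.get(normalize(request), [])
    return render(names, output)
```

A spoken request can thus be answered as text, and a typed one as voice, because the lookup itself never depends on the mode at either end.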


Compared with the single-mode voice and visual applications that exist today,

multi-modal applications are easier and more intuitive to use.


They give users the power to change the method of interaction (visual,

speech, or touch) at any stage of the communication process without having to

end the session.


The choice of mode can vary from circumstance to circumstance. Consider the

example of a user who has subscribed to an hourly sports news update through his

service provider. Now, while sitting in a meeting, the user will prefer a

silent, visual display of the update, comprising text and graphics. On the other

hand, while driving, he will prefer to have the information in voice mode.

The choice of mode can also vary from person to person. Some people prefer

interacting in voice mode, while others are more comfortable with a text or

touch-pad interface. Multi-modality lets users define a preferred mode.
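
The sports-update example suggests a simple rule: the situation decides the mode when it matters, and the user's stated preference applies otherwise, all within one session. A minimal sketch of that rule follows, assuming a hypothetical Session class and an assumed mapping from situation to mode; neither comes from any real platform.

```python
from dataclasses import dataclass, field

# Assumed mapping from situation to mode, taken from the examples in the text.
CONTEXT_MODE = {"meeting": "visual", "driving": "voice"}


@dataclass
class Session:
    preferred_mode: str = "visual"  # the user's own default
    history: list = field(default_factory=list)

    def mode_for(self, context=None):
        """A known situation overrides the stored preference."""
        return CONTEXT_MODE.get(context, self.preferred_mode)

    def deliver(self, update, context=None):
        """Record the update in the chosen mode without ending the session."""
        mode = self.mode_for(context)
        self.history.append((mode, update))
        return mode
```

The same Session object carries every update, so the mode can change from one delivery to the next without the subscription being restarted.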

Do Mobile Operators Need It?

As value-added data services and the mobile Internet gather steam, and voice

increasingly becomes a secondary application, mobile operators will have to make

sure that services are accessible to users as conveniently as possible. While

the user will define ‘convenience’, multi-modality will be a key enabler.



A recent survey by the US-based In-Stat/MDR’s Wireless Internet Panel

indicated that multi-modality would be a key wireless Internet enabler. Defining

multi-modality as the ability to deliver and receive information either

visually, or as voice, or through combinations of both voice and visual modes,

the panel said that the key benefit of the technology to users was flexibility.

A majority of respondents to the In-Stat/MDR survey indicated that there were

times when they would like to choose the way information was delivered to them.

The survey found that given the situation of driving to a place they had

never been to before, half of the respondents preferred to view a map on the

mobile device. Similarly, three out of four respondents preferred to speak new

contact information into their cellphones rather than key it in. Two out of

three respondents said they would like to have the choice between reading or

listening to e-mail messages over their mobile devices. Half of the respondents

were interested in responding to an SMS or e-mail message by placing a call or sending a

recorded voice reply. Also, assuming that their mobile devices had the

capability, half of the respondents said they would like to view graphics or

text information while talking.

"For mobile operators, easy-to-use multi-modal interfaces and compelling

applications will result in increased subscriber application usage and adoption.

This will lead to increased revenues for carriers, and provide them a unique

competitive advantage," Inderpal Singh Mumick, founder and CEO, Kirusa,

observes. Kirusa, a US-based multi-modal platform provider, is conducting trials

of its multi-modal platform with France Telecom and Bouygues Telecom, which are

looking at offering multi-modal services.


There are many more mobile operators in Europe and North America doing

lab/user trials with multi-modal applications.




Platform Trials

According to Atul Suri, director (strategic marketing), V-Enable Inc, Orange

in Mumbai is the only operator in the world that has launched what he calls a ‘pseudo-multi-modal’

application. The application takes voice-based input and returns SMS-based output to

deliver score updates to cricket lovers in India over their phones. "The

multi-modal services that will ultimately be adopted will have a utilitarian

value," he observes. V-Enable Inc has developed veMail, a multi-modal

e-mail application in which the user can ‘see’ his e-mail and ‘listen’ to

a selected e-mail for better comprehension.


The Applications

Multi-modality can be used in a diverse set of applications to enrich user

experience. It can also add value and convenience to the use of e-mail,

voicemail, instant messaging, and MMS on mobile phones. Utility services like

directory assistance, telematics, and other information-based services also gain

from multi-modality.

Businesses can make use of multi-modality to facilitate more effective use of

mobile phones and other handheld devices by their sales people, thus leading to

increased productivity.

Hurdles in the Path

There are two main hurdles in the way of full multi-modality: the lack of a

uniform standard and the immaturity of speech recognition technologies.

The Multi-modal Interaction Working Group of W3C is developing markup

specifications for synchronization across multiple modalities and devices with a

wide range of capabilities. The specifications will be implemented on a

royalty-free basis. These specifications will be built on top of W3C’s

existing specifications, for instance, combining XHTML, SMIL, and XForms with

markup for speech synthesis and speech recognition. Alternatively, they can

provide mechanisms for loosely coupling visual interaction with voice dialogs

represented in VoiceXML. Additional work will focus on a means to provide the

ink component of Web-based multi-modal applications. "The W3C work

will take some time to complete as is inevitable, given the amount of work

needed to build a shared understanding and a strong consensus," Dave

Raggett of Openwave says. Raggett is actively involved with the standards

activities at W3C, including the multi-modal activities.



As of now, most of the multi-modal implementations are markup language-based

(for example, using VoiceXML and HTML to describe the voice and visual aspects

of the application interaction). Another industry body, SALT (founded by

SpeechWorks, Microsoft, Comverse, Philips, Intel, and Cisco), has offered an

approach to a multi-modal markup and architecture, while IBM and Opera have

suggested a different approach based on XHTML and VoiceXML, called X+V.

Both have been presented to the W3C for consideration. The recently formed

Open Mobile Alliance is also working on a common multi-modality standard.

While the industry is still at least a year away from a single standard,

platform suppliers do not see the lack of it as a hindrance. They are busy

developing platforms that support multiple markup languages. For example, Kirusa offers

WML for data content and VoiceXML for voice content. "The two standards

being proposed at W3C are X+V, championed by IBM; and SALT, being pushed by

Microsoft. As a multi-modal solutions provider, we currently support both the

standards on our platform, though we were favoring X+V initially," Suri of

V-Enable says.

Handset features will be a key to the success of multi-modal applications.

Today, only a few handsets have some kind of built-in multi-modal

support, and not many support, say, text-to-speech

conversion or vice versa. "Varied handset implementations and capabilities

by different vendors is one of the major challenges faced while deploying

multi-modal solutions," Pratapa Bernard, director, marketing, OnMobile,

points out. Also, the network capacity that was to be available with

next-generation networks is not yet there, which limits the nature of services

that can be offered.

AirTel and Orange are using OnMobile’s MMP2500 multi-modal system to offer

value-added services.

As and when the roadblocks are removed, mobile users will be eager

to embrace multi-modality. A large number of them do not use SMS simply because

they find keying in text complicated. Given the option, they would

be happy to use SMS in voice mode. And when mass adoption of

multi-modality takes place, won’t that spur the growth of mobile usage? Of

course it will, and it will help the subscriber base shoot up.

Ravi Shekhar Pandey