Voice as a primary user interface has shown flickers of promise but has never taken off. Today, with AI at hand, it is still failing to go mainstream.
Approximately 300 years ago, the Arabic folk tale of Ali Baba and the Forty Thieves became popular around the world. With it came perhaps the first-ever voice interface, when a fictional character commanded a cave to "Open, Sesame!"
Three centuries on, one might have expected voice interfaces to be an established part of our machines. Countless science fiction and comic-book films have imagined as much; look no further than Iron Man and his trusted AI assistant, Jarvis. Yet even in the era of Alexa, Siri and OK Google, and with generative artificial intelligence making waves, voice assistants remain basic, error-prone, intermittent, and stunted. Will voice interfaces ever take off?
A Brief History
Beyond fiction and folklore, the first machine that could listen to and understand (an elementary slice of) human speech was Audrey, short for Automatic Digit Recognition. Built by researchers at Bell Laboratories in New Jersey in 1952, Audrey could recognise only spoken digits, and only when spoken by specific individuals. Even so, Audrey was a roaring success: the first time humans could speak to a machine and be understood. The feat is all the more impressive for an era before general-purpose computers became commonplace.
In 1962, IBM introduced its experimental speech recognition machine, Shoebox, which could understand 16 English words. By 1976, a United States government-funded initiative at Carnegie Mellon University had produced a new machine called Harpy, which expanded Shoebox's 16-word vocabulary to a then-impressive 1,011 words.
In 1993, Apple brought to its Mac lineup of desktops a feature called 'speakable items', which allowed voice-enabled interface control based on specific keywords pre-saved in a folder. In 2002, Microsoft brought voice instructions to its workplace platform, Office. For the longest time, these two events marked the most significant advances in voice interfaces, showing how voice as a primary user interface has long remained short on innovation and progress.
About a decade ago, though, the floodgates briefly opened. In October 2011, Apple launched 'Siri', an AI-powered voice assistant, on its then-latest iPhone 4S. In July 2012, Google launched a voice search feature for its search engine called 'Google Now', which evolved into an integrated part of Android through Google Assistant in May 2016. In April 2014, Microsoft introduced Cortana across its Windows platforms, a direct counter to Apple. The most significant salvo, however, came from Amazon. In November 2014, having watched voice interfaces crop up everywhere, the company introduced its Echo lineup of cylindrical smart speakers, and with them Alexa, today one of the most recognised names in consumer electronics.
Since then, innovations have been few and far between. In the past decade, voice interfaces, including Alexa, have remained sketchy at best. In November last year, a Silicon Valley upstart called Humane introduced a device called ‘AI Pin’.
The Hype around the AI Pin
The Humane AI Pin is not the first consumer-grade gadget to promise to run purely on voice interactions. In the decade since Alexa's introduction, Amazon has fared reasonably well with Alexa-powered Echo speakers and devices.
But the reason for all the hype is that, for the first time, a legitimate venture appeared, at least on paper, to build a gadget that could end the smartphone era. To do so, the AI Pin runs on OpenAI's GPT-4-powered ChatGPT, using it as an operating system rather than a tool. It has no screen, but recognises gestures through a built-in 13MP camera.
The Pin takes all commands and inputs through voice, which justifies ChatGPT's use as an operating system. It also requires no app downloads for a wide range of services, such as hailing a cab or ordering food. That kicker fuelled the hype that Humane, through the AI Pin, could completely revolutionise how smartphones and consumer technology work today.
Why Did it Attract Bad Press?
Unfortunately, since Humane started selling the AI Pin, media reports have been heavily critical, questioning its purpose and pointing out flaws. Among the most common complaints is its failure to consistently recognise the conversational commands that generative AI platforms are built around. First-round users have also complained that the AI Pin's answers are often inaccurate.
Most important, however, is the question: who was the Humane AI Pin for?
What Does it Say About AI Today?
When Amazon introduced its Alexa-powered Echo speakers, the pitch was simple: they were still selling speakers, but now the speakers could talk to their owners. This created novelty and served a functional role as a smart home automation hub.
The Humane AI Pin fails to do any of this.
Instead, its pitch as a smartphone replacement overreaches, given that no applications or web APIs (interfaces that let siloed applications communicate with one another) are yet ready to run on it. The device has too many flaws to win favour among technology critics.
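The idea of replacing downloadable apps with web API calls can be sketched in a few lines. The keyword rules, service names and endpoints below are hypothetical illustrations, not Humane's actual stack; a real device would first transcribe speech to text, then route the transcript to a service roughly like this:

```python
# Hypothetical sketch: routing a transcribed voice command to a web API
# instead of a locally installed app. Service names and endpoints are
# invented for illustration only.

def route_command(transcript: str) -> dict:
    """Map a transcribed voice command to an (imagined) web API call."""
    text = transcript.lower()
    if "cab" in text or "ride" in text:
        # e.g. "Hail me a cab to the airport"
        return {"service": "ride-hailing", "endpoint": "/v1/rides", "method": "POST"}
    if "order" in text and "food" in text:
        # e.g. "Order some food for dinner"
        return {"service": "food-delivery", "endpoint": "/v1/orders", "method": "POST"}
    # Anything else falls through to the general-purpose assistant
    return {"service": "assistant", "endpoint": "/v1/chat", "method": "POST"}

print(route_command("Hail me a cab to the airport"))
# → {'service': 'ride-hailing', 'endpoint': '/v1/rides', 'method': 'POST'}
```

In practice, the hard part is everything this sketch hand-waves away: the keyword matching would be a large language model interpreting free-form speech, and each service would need an agreed-upon API for the device to call, which is precisely what does not yet exist for the AI Pin.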
This says a lot about AI today. Amid all the advancements, there is no denying that large pockets of AI remain rudimentary. For the most part, generative AI is still gimmicky in many ways, and much work remains before it can reach the maturity and finesse that an operating system or platform demands, let alone one day become the operating system on our devices.
The Future of Voice Interfaces
While the Humane AI Pin's stumble is mainly a failure of today's AI, the future of voice interfaces does not look promising either. In India, communicating via voice notes or issuing voice commands to AI assistants on smartphones and other gadgets remains low in adoption.
Beyond the complications of reliably recognising and executing cross-platform operations in vernacular languages, voice interfaces do not seem close to maturity. Alexa and Siri, the poster children of the voice interface push, remain bit players on our devices at best.
But developers will likely take note of the Humane AI Pin's rocky start (we do not yet know whether it will fail outright) and push voice interfaces further. With compute chips shrinking, it is not hard to imagine a pair of mixed or extended reality glasses that look just like ordinary sunglasses. Once such devices arrive, voice interfaces could re-emerge as serious candidates for the primary interface.
For businesses and consumers alike, this represents a massive opportunity. Humane must stand as both a lesson and a blueprint to build upon.
feedbackvnd@cybermedia.co.in