The Future of Voice
[Is telco voice innovation dead? Or will smartphones and LTE deliver a much wider breadth of voice applications? Guest author Dean Bubley argues that ‘voice’ is about to experience several discontinuities as it goes beyond our limited notion of ‘telephony’.]
Telecom operators are facing a huge problem: in developed markets, we are close to the point of “Peak Telephony” – or maybe even past it already. Peak Telephony – inspired by the notion of reaching “Peak Oil” production – refers to the point after which voice revenues will face terminal decline. Already the traditional fixed and mobile telecoms industry is potentially facing a bleak outlook as call volumes stagnate and prices are eroded. While fixed operators have long recognised the threat to their core telephony business, mobile networks are now facing the inevitable as well. Globally, over 70% of wireless operators’ revenues still come from voice services and SMS, so this is an existential threat – one which threatens profound change to business models and even extinction of some operators as we know them today.
To an extent, many older telcos had a ten-year extension granted to them by the rise of mass-market mobile services. These appeared at exactly the right point, just as fixed voice prices (especially long-distance) started suffering the competitive onslaught from early VoIP players. At a group level, declines in fixed-line profits were offset by the rise in mobile. The inherent value of mobility, and the convenience of handsets loaded with easy contact-lists and call-registers, postponed the onset of saturation and substitution.
But now, finally, a combination of Moore’s Law, devices and the Internet is catching up – mobile voice is about to experience several discontinuities and radical change in coming years.
The limitations of “distant voice”
For the past 100 years, we have pretty much only had three ways to communicate over long distance between people: letters, telegraph and telephone (from the Greek words for ‘distant sound’). The traditional phone call has been wonderfully transformative and yet, at the same time, very limiting – even in mobile guise. It has enabled revolutions in both commerce and society greater than virtually any other invention since the wheel and printing press.
But phone calls do not correspond to the way humans really communicate with each other. We don’t generally think of conversations as “sessions”, or measure their value in terms of their length.
In essence, we have surrendered our natural modes of communication to the restrictions of telephony. We have boiled down “distant voice” interactions into Person A calling Person B for X minutes, via numbered identifiers. Compare that to the more normal style of “close voice” of dropping in and out of conversation, with interruptions, breaks in the flow, background tasks, simultaneous interactions with other people and so forth – using our names.
Normal in-person conversation is enhanced by non-verbal communications, physical context and a multiplicity of other factors. We use a broad range of volume levels, tonal frequencies and gesticulation. Some conversations are synchronous, some asynchronous – people talking over each other, or speaking in turn, perhaps based on relative authority or another social construct. Some conventions are unique to the specific two people or particular cultures, others are accepted almost universally. In a crowded room, we might hear snippets of other conversations, by chance or deliberately, through eavesdropping.
The phone call has been an excellent lowest-common-denominator baseline for “distant voice”. Telecom operators have profited immensely from its enablement, especially with the enhancements of mobility and the “wrapper” of a cellphone and its user interface. But in doing so, they have provided us with a single speech product that is intended to span myriad use cases and social/business needs. Only a few other distant-voice technologies have emerged to address niches: push-to-talk, voice messaging, walkie-talkies, CB radio and private radio systems addressing fringe-cases such as taxi dispatch or public safety services.
But now, the landscape is shifting. The combination of smartphone platforms, thriving developer ecosystems, PCs and the Internet has enabled new communications formats to evolve. These formats can map much better onto natural human communications preferences. We no longer need to restrict our innate ways of interacting because of the constraints of a piece of wire (or air) and a switch. We can “politely interrupt” with a soft alert or IM before escalating to a call, locate team-mates in virtual worlds with stereo cues, or interact directly with a voicemail for simple tasks, rather than calling back.
We already have in-game voice chat between players, remote baby monitors, always-on voice telepresence, audio surveillance and all sorts of other voice applications which really are not calls, as such. Numerous other voice communication modes are evolving, especially those linked to social and messaging applications.
In a nutshell, we no longer need to shoehorn all of our “distant voice” communications needs into the unnatural format of a “phone call”. We are able to visualise, contextualise, obfuscate, interrupt, lie, drop in and out, waffle, multi-task, spy, listen, store, mumble, overhear, translate, declaim, announce and recall speech over a network in many, many different ways.
Not only that, but the supply of basic “phone calling” functionality has grown much faster than demand. If we do want to make a traditional A-B for X minutes call, we have many modern variants on the theme of a “piece of wire and switch”, now over mobile networks as well as fixed lines. It’s not that hard to do. Sure, numbering is a constraint, and ultimate quality may be a limit – but that is quality measured against the yardstick of the “telephony application”, and not a more general measurement of social communications. We don’t really complain about the QoS of speech in a noisy pub – or pay extra for a quieter venue.
Will LTE voice be “old telephony” again… or something new?
But the final kicker is the imminence of a major transition point – the adoption of LTE and all-IP mobile networks which are not yet optimised for telephony. Although various initiatives – notably the GSMA’s VoLTE (Voice over LTE) specifications – are developing carrier-grade LTE telephony, the likelihood is that it will take several years to get to the quality, reliability, scalability and cost/power performance of today’s basic GSM. 4G networks have not really been designed with voice in mind – or viewed more cynically, it has always been “someone else’s problem” to solve.
Nobody yet knows what happens when we have 1,000 mobile VoIP users in a cell, moving around, handing off to other cells, causing interference, audio glitches and so forth. Experience from fixed VoIP suggests that tuning networks to mass-market perfection takes a very long time, and it seems unlikely that the extra variables of RF and mobility will make the task easier.
This implies that smartphones on LTE networks – and, by extension, 3G networks as well – risk creating a vacuum, which could well be filled by other “non-telephony” voice applications, while we wait for “plain vanilla mobile calling” to catch up to the realities of wireless IP. The telecoms standards and market representation bodies (3GPP, GSMA and others) have made little effort to diversify efforts into the more generic “distant sound” world, instead focusing on replicating what we have today. Much-trumpeted enhancements such as “HD” (high-definition) codecs go only a tiny distance towards the more complex human-interaction models discussed above.
There is an argument that plain-old telephony (fixed or mobile) can be packaged up and “distributed” through various new “delivery” channels. Linked to the Web and appropriate call-control APIs, many operators are hoping to create new “cloud communications” platforms. But it is unclear whether the underlying telephony control mechanisms and the “session philosophy” of calling really represent the best possible basic ingredient. Add in the usual rigid telco attitudes towards numbering, security, pricing and specific acoustic mechanisms and it seems unlikely that telco-powered telephony will be the best way of creating all of the new “distant voice” applications that will emerge.
Filling the voice innovation gap
What will fill the gap, becoming the platform(s) of choice for the plethora of innovative voice apps and services? It is still too early to tell. It could be some of the larger VoIP incumbents such as Skype or Google, or an established software-client provider like CounterPath. But it could also be one of the new breed of speech-centric application developers such as Viber, Vivox or RebelVox. Major carrier-voice infrastructure vendors such as Cisco, Sonus, Acme Packet and Broadsoft also have roles to play, with some attempting to become more open platforms – although with an eye to their traditional operator customer base.
From a handset standpoint, things are likely to get quite complex. Ordinary phone calls are not going to disappear – but we will start to see multiple voice applications present on each device. This is already happening with Skype and GVoice apps, but looking further ahead, more fragmentation is probable. This will present huge challenges for UI and “contact” applications, as well as a debate about which voice and audio/acoustic components are best installed in the OS, on the baseband or apps processors, in individual apps or even in dedicated audio chips.
One thing is certain, however: making a clear and careful distinction between “voice” and “telephony” is a critical starting point for understanding the landscape. Telephony is what telcos do today. It’s a closely-defined service, subject to rules and regulations, and billed in a structured way. But increasingly, general voice applications will go beyond homogeneous “calls” – for example, chat between players of an online game. This will require new business models, new platforms, and new forms of user interaction. How the traditional telephony industry deals with these new voice innovations will be fascinating to watch.
Dean Bubley is the founder of research company Disruptive Analysis. He is currently developing a programme of “Future of Voice” master-classes together with communications industry visionary Martin Geddes. Dean can be reached at AT disruptive-analysis DOT com.