Since the introduction of Apple’s iconic voice assistant, Siri, with the launch of the iPhone 4S in 2011, voice devices have come a long way.
While we initially predicted that voice assistants would be mostly confined to our smartphones, that changed with the launch of the Amazon Echo in 2014, which in turn sparked off a smart speaker “arms race” amongst the major tech companies.
Smart speakers have achieved widespread popularity: according to voicebot.ai, an estimated 47.3 million people in the United States own a smart speaker, which works out to roughly one in five adults. In the UK, uptake has been slightly slower, but a recent report from Ofcom found that one in every eight households now has a smart speaker.
But it’s not the devices themselves so much as what they represent that’s important. Voice assistants are becoming embedded in everything: from your microwave – which Amazon now wants you to talk to – to your car. And smart speakers are designed to be connected to other components of your smart home, such as smart lighting, locks, and appliances, which are increasingly looking like the future of housing. All of these would be voice-controlled.
This means that in the future, we could find ourselves surrounded by voice-activated devices that are all connected to the internet. Will we be able to search with all of them? In 20 years’ time, maybe we will – but only if we see a significant shift in the way that voice search operates.
How voice search could work – really work
The number of actual internet searches conducted via voice is fairly small because voice can’t truly be used to explore the web.
It lacks an onward journey. Either a voice query produces a single answer, with no opportunity for the user to browse any of the other results returned for their search, or it sends the user to a regular page of results on their smartphone or PC. The latter is no different from typing the query, and is certainly no good when the user’s hands are occupied, which is one of the use cases voice is supposed to be perfect for.
Schema.org markup is often cited as key to optimising for voice search, and a new type of markup, “Speakable schema”, was recently introduced. It is designed to let website owners mark up the sections of their website that are particularly suitable for being read aloud by a voice assistant.
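To make the idea concrete, here is a minimal sketch of what Speakable markup might look like when generated as JSON-LD. The `speakable` property and the `SpeakableSpecification` type with `cssSelector` come from schema.org; the helper name `speakable_jsonld`, the example URL, and the `.headline` and `.summary` selectors are assumptions made purely for illustration, and any real page would need selectors that match its own HTML.

```python
import json


def speakable_jsonld(name, url, css_selectors):
    """Build a schema.org WebPage object carrying a SpeakableSpecification.

    `css_selectors` lists the parts of the page that are suitable for
    being read aloud (hypothetical selectors in this sketch).
    """
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
    }


# Illustrative values only: the headline, URL and selectors are made up.
markup = speakable_jsonld(
    "Example headline",
    "https://example.com/news/example-article",
    [".headline", ".summary"],
)

# JSON-LD is normally embedded in the page inside a script tag.
print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```

Because the specification is still pending, the exact properties a given voice assistant honours may change, so treat this as a starting point rather than a guaranteed recipe.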
An introduction to schema.org markup for voice
Speakable schema is still officially pending, meaning that it’s awaiting further feedback, but that hasn’t stopped Google from announcing a beta programme with select news publishers that enables news results to be read aloud in response to a voice query.
Users could ask their Google Home device, “Hey Google – what’s the news on NASA?” and receive a short audio summary of the top headline, followed by an invitation to listen to another article on the subject. The Google Assistant will also send links to two relevant answers to the user’s phone, allowing them to read the full content at their leisure.
It’s not yet clear how several parts of this process work, such as how Google decides which news articles “rank” top for this kind of voice query (does it use regular SEO ranking factors, or some other criteria?). Nonetheless, it’s a big deal for voice search, because it’s a real step towards what a “true” search experience with voice could look – or sound – like.
It also opens up some possibilities for how voice advertising, if it comes to pass, might work.
The possibilities for paid voice search
Thus far, major players like Amazon and Google have held back from introducing any kind of paid search advertising for voice, though rumours abound that Amazon is exploring ad options for Amazon Echo devices.
One of the reasons that companies have been cautious about wading into this area is that voice advertising has the potential to be a lot more intrusive and irritating than visual or text-based ads. There’s no option to skim or scroll past a voice ad. In early 2017, a number of Google Home devices surprised their owners by delivering what appeared to be an advertisement for the new live-action Beauty and the Beast. The backlash was instant, though Google stridently denied that the short plug was intended as an ad.
In all its forms so far, the voice “SERP” has had significantly fewer results than a regular search results page, because no-one wants to sit through ten search results being read aloud. This means that while sponsored voice search results would enjoy a lot more prominence than their text equivalents, they would also carry a real risk of damaging user trust.
How could voice advertising get around these obstacles? One possible solution would be to have sponsored results appear after organic results. The voice assistant would still read them out, ensuring visibility – or audibility – but searchers wouldn’t feel forced to sit through sponsored search results in order to hear the organic results.
A “Cost Per Consent” model?
Imagine this scenario: a person says to their Google Home device, “Hey, Google: what is performance marketing?”
“According to Wikipedia,” the Assistant replies, “Performance-based advertising, also known as pay for performance advertising, is a form of advertising in which the purchaser pays only when there are measurable results. I also have a relevant sponsored result for you. Would you like to hear it?”
If the person responds with “Yes” – if they consent to hearing the ad – then the Assistant reads out the sponsored result, and accompanies that by sending a link to the sponsored product or service to the person’s smartphone, in the same manner as Google’s news beta.
This would be the equivalent of a click in PPC (Pay-Per-Click) or CPC (Cost Per Click) advertising. Perhaps we could even see a “Cost Per Consent” model emerge, in which the advertising brand or publisher is charged for every searcher who consents to hearing a sponsored result.
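The scenario above can be sketched in a few lines of code. This is a purely hypothetical model, not a description of any real Google or Amazon ad product: the `SponsoredResult` class, the `answer_query` function, and the price of 40 pence per consent are all assumptions invented for illustration. The one rule it encodes is that the advertiser is charged only when the searcher agrees to hear the sponsored result.

```python
from dataclasses import dataclass


@dataclass
class SponsoredResult:
    advertiser: str
    text: str                     # what the assistant would read aloud
    link: str                     # link sent to the searcher's phone
    cost_per_consent_pence: int   # hypothetical price per consenting listener


CONSENT_PROMPT = (
    "I also have a relevant sponsored result for you. "
    "Would you like to hear it?"
)


def answer_query(organic_answer, sponsored, user_consents):
    """Return (spoken lines, links sent to phone, charge in pence).

    The organic answer is always read first. The sponsored result is
    read, and the advertiser charged, only if the user consents.
    """
    spoken = [organic_answer, CONSENT_PROMPT]
    links = []
    charge = 0
    if user_consents:
        spoken.append(sponsored.text)
        links.append(sponsored.link)
        charge = sponsored.cost_per_consent_pence
    return spoken, links, charge


# Illustrative data only.
ad = SponsoredResult(
    advertiser="Example Agency",
    text="Example Agency offers performance marketing audits.",
    link="https://example.com/offer",
    cost_per_consent_pence=40,
)

spoken, links, charge = answer_query(
    "According to Wikipedia, performance-based advertising is ...", ad, True
)
print(charge, links)
```

Calling `answer_query` with `user_consents=False` returns no links and a charge of zero, which is the sense in which consent plays the role of the click.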
Other ways of monetising voice search
Some argue that the nature of voice search means it will never be viable to monetise it through search advertising, and that companies like Google and Amazon will stick to more reliable methods of money-making, like ecommerce.
It’s true that Amazon already benefits plenty from purchases being made through Echo devices, but in my view, there will always be a limit to how well ecommerce works with a voice-only interface. For repeat purchases of known products, it makes sense, but in all other scenarios, a visual component is badly needed.
With that said, voice devices like the Amazon Echo and the Google Home are now available with screens (the Echo Show and the Home Hub, respectively). I made the argument in my Google Performance Firestarters presentation that we are unlikely to relinquish some kind of visual display for browsing the web, be that a smart speaker with a screen, a pair of smart glasses, or a smartphone linked up to wearable technology. Therefore, in most scenarios, it will still be possible to visually browse products.
There are also Alexa Skills and Google Assistant Actions – the voice equivalent of apps – which already give businesses the opportunity to have a brand presence on voice devices. Earlier this year, Amazon introduced the option for developers to monetise Alexa Skills, making things like games and premium audio content available for a fee – the equivalent of in-app purchases, or paid-for apps. Harry McCracken of Fast Company wrote of the development,
Monetisation of voice is still very much a developing area, and at the moment, all things are possible. Cynic that I am, I don’t believe that companies like Google and Amazon will pass up the opportunity to explicitly monetise voice search if it does take off, but we will see. My advice to the audience at Google Firestarters when it comes to both voice and visual search was to keep an eye on developments, experiment with the technology, and if budget permits, invest in ad options as they become available.