
Will Siri, Cortana and Alexa make the move from personal assistant to best friend?

  • Talking instead of typing, swiping, and tapping: the future is speech recognition
  • Machines are getting smarter, but can they be our friends?
  • In the future, smart homes may hold conversations with their owners

Plenty of us rely on smart assistants like Siri, Alexa, or Cortana to keep track of our schedules. As The Economist reports, no less than 20 per cent of internet searches on Android mobiles are initiated by voice, and Apple’s Siri receives a staggering two billion spoken queries every week. Numbers like these are evidence of the popularity of voice recognition software. That said, everyone’s frustrated by how dumb these supposedly smart assistants can be: they rarely understand anything more than a simple command, and you’d need to be stranded on a desert island to find conversation with any of them entertaining.

But that’s changing faster than you might think. Advances in natural language processing, machine learning, and facial recognition promise a future in which your personal assistant might be more like Her – a realistic artificial personality that relates to you, listens to you, and is more friend than cloud-based calendar.

Talking instead of typing, swiping, and tapping: the future is speech recognition

Voice recognition tech is an obvious next step for tech companies as it bypasses the need for a traditional user interface. And once the wrinkles are ironed out, voice recognition opens new possibilities for interaction, providing hands-free convenience for those who want it and increased independence for those who need it, including people with motor impairments or poor eyesight. But the real attraction isn’t just ease. For most of us, talking is how we communicate, especially with friends and family. We’re biologically driven – hardwired from birth – to be vocal, something which is strikingly obvious to any parent who remembers their child’s first words. Our lives are entangled in language, and the spoken word resonates deeply with almost everyone.

We connect with, and are connected to, a world of speech and conversation. When we’re alone, many of us leave the television on in the background just to hear someone talking, and there’s nothing that replaces a chat with a friend over coffee, tea, or something stronger. That’s why the future of smart assistants is voice recognition and social smarts, a substantive transformation from digital helper to best friend. Much of this depends, of course, on improvements to existing tech – more bad conversation isn’t what we’re looking for. We certainly don’t want the stilted, painfully dense responses that the current tech provides. We crave rich, intuitive, engaging conversation – the kind we have with humans. We can only accomplish this with advances in machine learning, natural language processing, and facial recognition software.


Machines are getting smarter, but can they be our friends?

The Holy Grail is real social intelligence, the understanding of context and cues that bring meaning to interaction. To get there, we’ll need artificial intelligence (AI) that knows the difference between sarcasm and seriousness, that connects the dots between what we say and what we mean, and that looks to our faces for clues about our mood. The good news is that we’re getting closer every day. As Matt Hunckler writes for Forbes, big names like Apple, Google, and Microsoft are now being challenged by crowdfunded startups like Mycroft, clearing the way for dramatic improvements in natural language processing. And when processing power starts to catch up with what’s possible – just think of IBM’s Watson on Jeopardy – we can expect some pretty amazing stuff. Watson’s machine learning, for instance, can contextualise wordplay, understand jokes, and pick up on double entendres. In just a few years, that kind of conversational savvy may be commonplace in our everyday devices.

And when joined with facial recognition, empathic communication is a real possibility. We already use machines to help children with autism learn to recognise and mimic emotional responses. As Colin Smith explains, “Specialist teaching and therapy for children with autism focuses on helping them to learn to convey their emotions and understand other people’s emotions. This is done by getting them to understand the conventional ways that people express themselves through a combination of facial expressions and gestures”. The robots that are helping these children learn to communicate are adept at reading facial cues by turning people’s faces into 3D maps. By recognising where key points on your face sit – say, the corners of your mouth or the ends of your eyebrows – they can interpret something about your emotional state, much like you do every time you talk to someone. Now, imagine the camera on your mobile reading your face for clues to your mood. By measuring what it sees and comparing that with an analysis of your voice, it could sense when you’re stressed, happy, excited, or nervous – steering the conversation to match your feelings and mood.
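To make the landmark idea concrete, here’s a deliberately simplified sketch of how geometry alone can hint at mood. The landmark names, coordinates, and thresholds are all hypothetical; real systems fit full 3D face models and use trained classifiers, not hand-set rules like these.

```python
def mood_from_landmarks(lm):
    """Guess a coarse mood from a few 2D facial landmarks.

    `lm` maps hypothetical landmark names to (x, y) image coordinates,
    with y increasing downward, as is conventional for images.
    """
    # Mouth corners sitting above the mouth centre suggest a smile.
    corner_lift = lm["mouth_center"][1] - (
        lm["mouth_left"][1] + lm["mouth_right"][1]) / 2

    # Brows pressed down close to the eyes suggest tension or a frown.
    brow_gap = ((lm["eye_left"][1] - lm["brow_left"][1]) +
                (lm["eye_right"][1] - lm["brow_right"][1])) / 2

    if corner_lift > 2:   # toy threshold, in pixels
        return "happy"
    if brow_gap < 8:      # toy threshold, in pixels
        return "tense"
    return "neutral"


# A hypothetical smiling face: mouth corners lifted above the mouth centre.
smiling = {
    "mouth_left": (40, 98), "mouth_right": (80, 98), "mouth_center": (60, 104),
    "brow_left": (42, 50), "brow_right": (78, 50),
    "eye_left": (44, 62), "eye_right": (76, 62),
}
print(mood_from_landmarks(smiling))  # prints "happy"
```

The point isn’t the thresholds, which a real system would learn from data, but that once a face is reduced to a map of points, mood becomes something software can measure rather than merely photograph.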

In fact, we already use voice analysis software, paired with AI, to alert call centre workers to the emotional states of their customers. For instance, Cogito Corporation’s algorithms measure key aspects of a customer’s voice during a call, using this data to better direct customer service. As Shaun Sutner explains, “Cogito’s algorithms parse each nanosecond of a caller’s speech over the phone and simultaneously flash recommendations and feedback to the customer service center worker mid-call. As in, ‘you’re speaking too much,’ ‘frequent overlaps,’ ‘breathe, pause between phrases,’ or ‘you’re tense’”. This kind of AI-based behavioural science is in its infancy, but it won’t be for long.
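To give a flavour of the kind of signal such tools work from, here is a minimal sketch of one feature mentioned in the quote above: detecting a caller who talks continuously without pausing for breath. The frame size, thresholds, and feedback string are assumptions for illustration, not Cogito’s actual algorithm.

```python
def pace_feedback(frame_energies, energy_threshold=0.1, min_pause_frames=5):
    """Flag a caller who talks non-stop, based on short-time energy.

    `frame_energies` is a non-empty list of per-frame loudness values
    (one per short audio frame, e.g. 20 ms). Frames below
    `energy_threshold` count as silence; a run of at least
    `min_pause_frames` silent frames counts as a pause for breath.
    """
    speech_frames = sum(e >= energy_threshold for e in frame_energies)
    talk_ratio = speech_frames / len(frame_energies)

    # Count silent runs long enough to qualify as pauses.
    pauses, run = 0, 0
    for e in frame_energies:
        if e < energy_threshold:
            run += 1
        else:
            if run >= min_pause_frames:
                pauses += 1
            run = 0
    if run >= min_pause_frames:
        pauses += 1

    if talk_ratio > 0.9 and pauses == 0:
        return "breathe, pause between phrases"
    return "ok"


# A caller who never stops talking triggers the feedback;
# one who pauses mid-call does not.
print(pace_feedback([0.5] * 100))                          # prints "breathe, pause between phrases"
print(pace_feedback([0.5] * 40 + [0.0] * 10 + [0.5] * 40))  # prints "ok"
```

Real systems layer many such features – pitch, overlap with the other speaker, speaking rate – and feed them to trained models, but the underlying idea is the same: turning the raw waveform into numbers a machine can act on mid-call.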


In the future, smart homes may hold conversations with their owners

When the tech matures, we’ll see a real shift. The days when Siri, Alexa, and Cortana were best left to simple Google searches and updating appointments will be behind us, and in their place will come a new world of synthetic friends and artificial confidants. When your mobile, your car, and maybe even your smart home can tell what you’re feeling and connect with you like a living, breathing person, the way we relate to technology is bound to change. Will this mean an end to loneliness? Will we redraw the lines between people and machines, rethinking what counts as a relationship as our social dynamics change? Will we really comprehend what it means to have an artificially intelligent friend, or will we naively assume that Siri is more like us than she is? Here, Her might be prophetic. But experts argue that we’ll get the AI we deserve as it matures through our interactions. Remember that the next time you get frustrated with ‘the ladies’.



