Designing Voice Driven Experiences
Alexa, the voice service that powers Amazon Echo, Echo Dot, Amazon Tap and Amazon Fire TV, provides a set of built-in abilities that enable customers to interact with devices in a more intuitive way using voice. Examples of these skills include the ability to play music, answer general questions, set an alarm or timer and more.
- Diving into Amazon’s machine learning model
- Making AI and machine learning more accessible to all developers, product managers and designers
- Proven best practices for designing voice user interfaces (VUI)
- How to maximise the usability of your voice experience
- How to create compelling voice experiences
- Building voice activated products - where products will turn to in the future
Noelle LaCharite, Sr Technical Program Manager, Amazon
Good to see you guys, what a nice crowd we have. We'll see how this goes. So you've seen this before? Yes. Raise your hand if you've seen an Alexa device. Okay, put them down. That was everybody. That's awesome. So you saw the Super Bowl then maybe or a commercial or two. How many of you own an Alexa device? A smaller number, but well done you guys. Love it. Okay, cool. So, as you can tell, right? This Alexa device actually has a nice red ring on it, which means that Alexa is not listening to me right now.
However, I have connected Alexa to the RDS Wi-Fi. Alright, cool. Let's see how this goes. Alexa, what time is it?
The time is 5:12 am.
In Boston, which is where I'm from. So if I'm not completely coherent, now you understand. Alexa, tell me a joke.
What's the pirate's favorite exercise? The plank.
Haha. And she's full of jokes just like that. So go run and get yours today. So I'm going to mute Alexa again. Oh, actually, well, I'll come back to this. But we have a couple of interesting things I want to share with you today. My name is Noelle LaCharite. I'm an early adopter of Alexa. I built some of her very first skills. I've been building for Alexa since, oh, 2014, I guess. And my skills, some of you might know them. If not, you certainly will want to. One is called Daily Affirmation: you just ask for an affirmation, and it will say you're awesome and wonderful and people like you, and you can go on your way. Do you guys know Saturday Night Live? Well, I'm not dating myself. But there's this guy, Stuart Smalley. Anyway, you can look him up on YouTube. I built it around that. But then it took off and people were actually seriously using it. So I got serious about it.
I then ended up building One Minute Mindfulness, which is exactly what it says: one minute of mindfulness. Yeah, I know all of you are thinking one minute, that's not nearly enough. Well, that's exactly what my reviewers have said. But I stand firm in my marketing, and I have not elongated my mindfulness skill, because I think if it's advertised as a minute, it should be a minute. But my customers, to the tune of, I don't know, 2,500 reviews, have said it should be longer than a minute. Don't fault people for doing exactly what they're telling you they're going to do. But I like it. So you should try that one. I've built many skills; probably 20-plus skills are published on Alexa that I have either built or helped to build. And so my goal over the next 20 minutes is really to just share with you my journey, what was very different about it from a UX perspective, and then hopefully give you some inspiration to try some of this fun stuff yourself.
So this is us. Really, right? I actually have been part of probably the majority of this slide. When I first became interested in software and software engineering, we were just on the cusp of moving from things like mainframes, which are still alive and well, thankfully, into a more user interface world. And then we were all part of that wonderful 2007 launch of the iPhone, which changed us forever. And now we're seeing another one of these kinds of big shifts, right? I remember early on I used to teach for IBM, teaching mainframe people how to write in the Java programming language. And at that point they had never used a mouse before. I'm teaching them object-oriented technology, and they had never used a mouse. It was a very bizarre time.
However, I'm in a similar time now. Right? I'm teaching people to write user interfaces, and when we say UI, people instantly think of a screen. And now we do have a screen in Alexa, but we very rarely design for it, right? If it happens to have a screen, awesome; if you happen to be able to show a card or some kind of display, even better. But very rarely do we consider that as part of our user interface. And I'll actually be on a panel later this afternoon, talking to you more about some of the intricacies of those decisions and how those decisions changed as we moved from kind of a mobile-centric world, even at Amazon, into VUI, or voice user interfaces.
So we do believe that voice... and I apologise, we can't read it all. But it's like one of those tests, right? Where we give you some of the words, but not all of them, and see if you can still understand it. But we believe voice represents kind of the next major disruption in computing. Similarly, right? Screens became a thing, and then little screens became a thing, and now no screens are a thing. And we actually found that it's extremely delightful for our customers. So I started with Amazon three years ago, which means I'm a dinosaur in that sense. I have worked for AWS. I now work for Alexa machine learning, and I work on the brain or the heart of Alexa. Right? How does Alexa get smarter every day? I can't really talk to you too much about all those pieces here, but I'm happy to discuss with you. I'll be here all day and on the panel this afternoon. So happy to answer questions, you know, over coffee or whatever, as I see you.
Oh, yeah, feel free to take Twitter pictures and stuff. I mean, I hope I get a copy of that picture. So I don't fully buy into this, but it just sets the stage for what we're talking about. Right? Soon touch could be gone, and all we do is talk. It's similar to what my dad used to think. He's old-school Asimov, like the golden age of science fiction; he loves, you know, the ideas that were presented in those works of fiction. But now he's in his 70s, and he's actually been hit by a car, so cognitively he's probably more in his 80s. But I bought him an Alexa device, and he uses it every single day to turn on his lights with his voice, right? And he finds it fascinating. Especially any Star Trek lovers. You don't have to raise your hands, but Star Trek lovers, I know you.
We now have the ability to use a wake word other than Alexa, which you heard me say. We also have "Computer", right? So now he can be like, "Computer, turn the lights on." He can even say, "Open the pod bay doors." I should totally try that for you. But it works. She has a response. But it's interesting because he never thought in his lifetime that would happen, right? He read those books when he was 12 and 15 years old. He never thought in his lifetime he would be speaking to his software and having things actually happen. He navigates music like the best of them, and now plays random classical music from composers I've never heard of, and I'm pretty much a classical buff. But he's like, "I went to that concert in 1952." And I'm like, wow, it's incredible. So it really opens up the world, not only for eldercare but also for little, little people, as well as anyone with disabilities who maybe can't see a screen, right? It just creates a huge opportunity.
So though you can imagine a day when we never have screens, Amazon's not looking to end the screen, even though the Fire Phone was a rough experience for us. I own one, I loved it. But we're not looking to end the screen. We're looking to really augment it, right? Add to it. And this certainly has turned out to be a pretty good experiment. So voice is going to be everywhere. You can imagine a time, and even today this happens. I used to live in Seattle before I moved to Cambridge, Massachusetts. And in Seattle, I was part of this wonderful ecosystem in Amazon where I could, at work, call Amazon Prime Now, which is our service that gets you anything within an hour. And I would say, alright, I'm on my way home. It's Tuesday. We do tacos on Tuesday. I think it's actually international taco day today. I'm just saying, you might want to Google it.
But so let's just say I want to make tacos. I would call or get on my phone and say, here's what I need for Taco Tuesday. You could imagine a time when Amazon's smart enough to know exactly what I mean by that. Right? There probably will be a time where it's, "Oh, you order the same exact 12 things every Tuesday or every Monday night or whatever." So it goes out, grabs that stuff and starts shipping it to my house within an hour, while I spend an hour in traffic getting home. By the time I get home, the stuff I ordered is sitting on my doorstep. I pick it up and walk inside and say, "Hey Alexa, I'm ready to cook." Alexa already knows I made that order. Alexa already knows what I want to do and says, "Great, don't forget to set the, you know, temperature on your stove." You can imagine a time when your stove is smart enough to hear you say that, right? And it can turn itself on. Scary stuff, I know.
But the smartphone was a very scary thing when we first got it. So it all starts scary and then it's cool. But you can see where I'm going, right? It becomes kind of a conversational state you're always in with your device to help you get through, in this case, meal preparation. As I mentioned, maybe there's a time where I forget to make the order when I'm at my desk with my phone. So instead, maybe I want my car to be able to take that order. Or I want my car to be able to navigate for me because I'm in traffic, and find a detour. And I don't have to open my phone, I don't have to click a button, I don't even want to have to click the button on my steering wheel. I know, first world problems.
But it can happen. And I do find it quite delightful. And actually, I have a Dot here, so I'll show it to you guys if you see me after. It's a little Dot, about hockey puck size, and it fits conveniently right in my cup holder. I tether it to my phone and Bluetooth it through my car speakers. And it's awesome. I can have it read me a Kindle book, I can have it do Audible. You know, like, it's amazing. And of course, I can have it randomly play any music I can think to ask for. So I think it's really cool.
The last part, that's like a living room, right, is really about bringing Alexa into the home. So today we have products like the Echo Show. I've actually become very fond of it because it allows you to see the time and allows you to see what recipe you're on. You can even use the screen to kind of scroll through. So it does give us the ability, in addition to voice, to design for what we now call a multimodal experience, and a multimodal experience is the future, right? The customer wants to be able to communicate in whatever way is super convenient for them at the time. Jeff Bezos always says, "We will never be in trouble because customers will never say, 'Oh, I don't want more features, and I really wish these were more expensive, and, gee, I wish it was less convenient,' right?" People always want better stuff, more convenient and available immediately.
So indexing on those desires will always be okay. And that's why our mission at Amazon is not about Alexa. It's not about books. It's not about really anything specifically; it's about customers. We want to be the Earth's most customer-centric company. So we look at our customers, and all of our roadmaps in our organisation, including UX, are designed by the customer. And that's really what changes the way we do business. So that's kind of the inside of the device. In case you were ever wondering, it does have seven microphones; we use echolocation.
And you'll see kind of over here, there's seven of them. When I start speaking to Alexa, you'll notice there's a light ring. I used to wear it on my shirt, I kind of have a light ring on my shirt. Alexa... you can kind of see this light ring; some of you in the back might have trouble. But you can see the light ring gets a little bit lighter. That's the microphone that's actually focused on my voice, beamforming what I'm saying into the device and sending it to the cloud. Eventually, Alexa realized I was not talking to Alexa and shut down. So there's what's called end pointing that occurs in the cloud that decides if you were actually speaking to this device, or were you accidentally saying it. I actually am on a panel, I think today or maybe next month, with someone named Alexa, which is always fun.
Anyway, that's just to give you an idea of what's in there. It's not much, right? That's all of Alexa right there. Alexa is actually the software in the cloud. This is just the Echo device. But the rest of that device, other than that little piece at the top, is the speaker. Right? So that's ultimately what you're paying for: a really good speaker.
So, speech advancement. This is another reason why UX is really important right now for voice. Even if you're not doing it, you should be thinking about this right now. Because if you were thinking about this 10 years ago, you probably, even with the technology that you had, wouldn't have had a great experience. I won't say the experiences we might have had back then, even just maybe 7 or 10 years ago, but our voice experiences were not delightful. Customers did not believe in them. They didn't end up working out very well. There's a reason why Alexa today is doing as well as Alexa is doing. And the reason is that our speech and language understanding has gotten exponentially better.
And now we have even more players in the market, right? Like Google and Siri. And there's others, I'm sure right. There's tons of startups that are building their own. There's this one I can't remember. It's like XAI or something like that. Anyway, but there's tons of machine learning tools that are being built, all of which require a good user experience and most of which were not created by people that excel at that. Right, such a huge opportunity. So in the store, one of the things that differentiates Alexa from any other platform really is the ability for us to improve Alexa and make Alexa smarter ourselves as individual contributors.
So as I mentioned, I was not working for Alexa when I started building apps for Alexa, or skills for Alexa. My Daily Affirmation skill was actually built as part of a challenge at work, but none of us worked for the team at that time. And then I was like, that was pretty fun. Now, at that time, I think there were 140 skills in the store. There are now over, I think, 16,000, and it's growing massively. It'll be 20,000 in a month; it'll just grow exponentially from here as well. But this is your opportunity, right? Just think about the App Store when it only had 10,000 apps in it. Now it has millions, and we're still leveraging it quite heavily every day, or at least I am. So our skills concept is how we build applications for Alexa, how we make Alexa smarter all the time.
So these are our devices. Some of you might be Kindle owners or Kindle Fire owners. One of the really interesting things that happened recently is that your Kindle Fire became a multimodal Alexa device. We released an over-the-air update, and that update enabled your device to operate this way. It wasn't what we call far-field, meaning you couldn't stand far away and be like, "Hey, Fire tablet, turn the TV on." You couldn't do that. What you could do, though, is go up to it, press a button and say, "Open Kindle," or "Start my latest audiobook," or "Resume my last Kindle read." And rather than you having to press a button and slide and click and go down a menu, right, it would just instantly do whatever you were asking of it.
And that's kind of what we're looking to provide: that contextual instant gratification of I ask, I get, as opposed to me having to navigate to any degree. So these are all of our devices. This is the one that I've got right here; it does have a battery base just so I can carry it around, but that's an accessory. Some company made some money building accessories for Alexa; I invested. The Tap is the one that is portable and small. But why would I carry a small little Alexa traveling internationally? I didn't do that. And then I do have a couple of Dots with me, so I can show you guys those as well. But that's just to give you an idea.
So this is just a little understanding of where we are with devices. This is the longer tail of what we're doing with Alexa. You can imagine Alexa on pretty much anything. Why can you do that? Because any company has the ability to leverage the Alexa APIs in their hardware. It's how I started at Alexa: I was a solutions architect for our Alexa Voice Service, which meant I worked with companies that were building in-car entertainment or infotainment systems, that were building refrigerators, that were building televisions. Today, we currently have, you know, I say Alexa, we currently have a collection of these devices, either in development with these partners or already on the market, that integrate Alexa. For example, of course, phones: apps are available that have Alexa built in. E-readers: our Kindles are also available. Alarm clocks: that's how I use my Dot. Fridges: I don't think we're there yet, but you could see where we're going. Right? It could be in anything. And ultimately, it's our Alexa everywhere, or really voice everywhere, methodology.
So how does this happen? It happens through skills. This is kind of, you know, rolling to the end, and you're probably thinking to yourself, I'm so excited. How do I start right now? I've opened my laptop, I'm ready to build a skill. If you're not, you will be. Skills, meaning the apps that you build for Alexa, are the one thing you can do today. Let's say you have a brand in mind, or let's say you work for a marketing agency or you work for a brand. What's one thing you could do right now that might be delightful for your customer? Try and think about how I'm going to answer the question, but think about what you might think. The customer will get a new way of interacting with your brand. Again, you might not even be providing a brand new flashy service, a brand new piece of functionality. It's something they might do every single day.
But instead, they get to do it with their voice. You'd be surprised how childlike people become when they realize that they can say something and it happens. Like I said with my dad: I have a dad who is, you know, 70-ish, and I have four children, and my youngest two are like two years old. And so the same level of excitement occurs when we turn the lights on in our house with our voice. And we've been doing it for a couple of years now, and still every time it's like, wow, it's so cool. And it is cool. But imagine how amazing it could be if you could provide that level of excitement, even for the smallest thing, to your customer. It's just that little thing; it creates a connection with your customer that they really can't get anywhere else. And like I said, it's pretty darn delightful.
We're creating what are called voice-first apps. We say voice first because, as I mentioned, we are doing multimodal devices. So it's not just voice, but it should be voice first. Because like I said, you just don't get that level of delight with any other thing than just being able to say, hey, rather than go three levels down in a menu, can I just ask for that, and being able to, you know, accommodate those requests. So this is the Alexa app. It's actually old; I think this image might be the one that's in the Kindle app store. But we have an app on a phone. If you own a device, you would have this app because you need it in order to play. So in order to set up your first device, you have to set up the Alexa app. And this is how you configure it; I just did it in order to configure Alexa to work here.
So this is what the store looks like. I didn't want to be self-gratuitous and have all of my apps on display up here; that would have been a little bit too much. But there are lots of them to choose from. And I know most of these developers. I built many of the tutorials that we currently have to help you. So if you go out and Google how to build an app for Alexa, chances are, if you're on Amazon's website and developer blog, you would find me. So you're welcome; now you and I are close friends. You can now email me or hit me up on Twitter, and I would be happy to answer any questions that you have. And I totally do it, people. You'll know if you are in my Twitter circle, I'm super responsive. But this is how people find us and how we find your brand. You'll notice there's this kind of big banner.
So every so often, if you create a relationship with Alexa, with our marketing team, they will actually feature you big time. I've never been featured that way. But it's just as meaningful to get featured in some of these smaller areas. It increases the utilization of your skill by like 900%. And I'm only speaking from experience on over 20 skills: at least 900%, sometimes even more. So it's nice to be able to get there. Again, you might ask, how do you get there if you're in with 16,000 to 20,000 other skills? Well, we have one thing we're looking for, and that's customer satisfaction. So all of our top-rated skills get featured; you'll notice they all pretty much are in the fours and fives.
Keep in mind, not a whole lot of the skills in the store have that because they didn't have some of your skill sets. Right? I know, this audience isn't totally UX people. But even like some of these developers, they weren't even developers, right? We gave them a handheld GitHub repo, and they just published something. So there's a lot of opportunity to even go to some of these skills and offer to assist them.
So, the framework that we use: we have the Alexa Skills Kit for you to build skills for your brands, and we have the Alexa Voice Service if you happen to have hardware associated. Let's say you're building your own ear pods, or let's say you're building your own tablet. Maybe, we tried the Fire Phone, so you know, you could do that. Try to go into any hardware market; you can build Alexa into it, and it'll instantly add a level of interestingness to it. The reason I say that is because we do lots of hackathons, thousands of them in a year, and almost always the one who wins has somehow integrated Alexa. Why? Because it's just cool. Customers think it's cool. And being able to leverage that relatively easily, without shifting any major active development in your organization, might be a useful thing to think about.
Alright, so just some things about skills that you should think about when you're building, right? High value. So here's my super sneaky trick. I built several brand-based skills. And what I did is I went into their websites, and you DevOps people might be able to help with this, and identified the most popular click trails, the most popular things on a website or mobile app that people click through. Pick one and then, like, get that working. Or in my case, I usually pick 10, because I like challenges, like demoing live, right? Nobody really does that. But I do, right? So pick those top 10 click trails, things that people all the time go click, click, click to get to on your website.
And I guarantee you, if you create an intent structure, in other words, the ability for someone to execute that same request with one question to Alexa, they might be happy to do that. It might work out really well. It has worked out for me. So that's kind of my first and foremost tip for building. But the other thing is to think about all the different ways that someone will talk to Alexa. And I'll talk more about this this afternoon in the panel, so I hope you'll join me if you're interested in this world. But there's a lot of work that goes into how we make sure that if someone uses a slightly different phrase, Alexa doesn't fail. So your skill should evolve, and your users should be able to speak naturally; that's part of accommodating, right? The difference with this UX is that it's not like a button, where the button looks pretty much the same to everyone. Everyone talks differently, and you have to accommodate them all. It's not called computer intelligence. It's called artificial intelligence, right? And guess who does the artificial thing?
That'd be us, right? So we build what makes it look intelligent to the user, and the developer does that work. So yay for us, job security. But it's tough. It's tough work; enjoyable and delightful, but not easy. So Alexa should understand most requests. The certification testing process will ask you to test your skill. And they're not just asking whether it works on your Alexa device, right? They're asking you to build a beta program and have a bunch of users test it. Those users should not get a script of what to read to Alexa; you should tell them what you want them to do and let them say it however they want. And I guarantee you, people will say things you don't expect. I was working with a financial firm and never knew that someone would use "bucks" in a transactional context on Alexa. But they do, like, "I want 20 bucks transferred to this account."
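The point about phrasing variations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the Alexa Skills Kit interaction model format and not how Alexa's language understanding works. The intent name `TransferMoneyIntent`, the slot names `amount` and `account`, and the sample utterances are all hypothetical. It shows one intent fed by several phrasings, including the "bucks" variant.

```python
import re

# Hypothetical sample utterances for a single intent.
# {amount} and {account} are slots the user fills in when speaking.
SAMPLE_UTTERANCES = [
    "transfer {amount} dollars to {account}",
    "transfer {amount} bucks to {account}",
    "i want {amount} bucks transferred to {account}",
    "move {amount} dollars into {account}",
    "send {amount} to {account}",
]


def compile_utterance(template):
    """Turn a template with {slot} placeholders into a regex with named groups."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal words
        else:
            pattern += f"(?P<{part}>.+?)"    # slot value
    return re.compile(f"^{pattern}$")


PATTERNS = [compile_utterance(u) for u in SAMPLE_UTTERANCES]


def match_transfer_intent(spoken):
    """Return the intent and slot values if any sample utterance matches, else None."""
    text = spoken.lower().strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"intent": "TransferMoneyIntent", "slots": match.groupdict()}
    return None


if __name__ == "__main__":
    print(match_transfer_intent("I want 20 bucks transferred to savings"))
    print(match_transfer_intent("What time is it"))
```

Every phrasing a beta tester comes up with becomes one more entry in the list, which is how a single request can grow to thousands of variations.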
Okay, so we added that. We ended up adding 2,700 variations on how to transfer money to different accounts. Skills should respond in an appropriate format. Really, if you're like, man, that sounds kind of cool, or you're super interested, or even if you're a little bit interested, you really should think about how to start with Alexa. We encourage a crawl, walk, run concept, right? We encourage you to go through the process of building an MVP. Anyone know what that means? What's that mean? Minimum Viable Product. But we don't like that, because viable is not good. And for our Amazon customers, viable doesn't fly.
So we call it the Minimum Remarkable Product. And that's why you go and figure out what your data tells you about the most important things people want, and build the minimum remarkable product. Build only one if you have to, but build one. And then add stuff over time. Like Daily Affirmation, I've evolved it, I don't know, 100 times since it came out. I still have an average of 20,000 users, you know, a week, which I think is crazy. And it's mostly at lunchtime, just to let you know. Your friends, like, need some love at lunchtime.
So it's just cool. We've got lots of metrics to display so you can decide how you're using and building your skills. But this is what we say: really, start simple, find something meaningful, and build for that. And then just remember, you're not building for the screen anymore, so you actually have to talk to people. But that's okay. You could start by talking with me. I'll be out there. We won't have a huge amount of time for questions. We're going to open up for a couple, but I will definitely be here all day, at coffee hour and again after the panel this afternoon. So let's go ahead.