Two Years of Voice-Based Assistants, Echo and Home

Waterluvian · on Dec 26, 2016

What I need is natural natural language processing to become so good that it just works. I don't respect voice as an input method because I feel uncomfortable using it, since I have to pre-plan my question rather than stream consciousness like I do with my family and peers. I want to be able to have this conversation:

"Computer, how much does a.. hmm what's it called.. uhh.. a loud keyboard, what's that called?"

"A mechanical keyboard?"

"Yes. The ones from the.. uh I forget the company name."

"Corsair? Razer?"

"Razer! How much does their newest mechanical keyboard cost?"

"The newest keyboard from Razer is...<etc>"

kalleboo · on Dec 26, 2016

Heck, screw helping you complete your thought; if it just handled umms and uhhs and pauses without going "I didn't get that" I would be 100% happier.

dqv · on Dec 26, 2016

>What I need is natural natural language processing to become so good that it just works.

That's supernatural language processing. You're asking the system to recognize a thought you have in your mind that you can't effectively communicate with words.

The loud keyboard - makes sense. But the company name? My first thought was IBM or Cherry.

I think it can be improved, but I'm never going to expect voice recognition to fill in the blanks for something I can't describe with my voice.

frankomonster · on Dec 26, 2016

Filling in the blanks is something that humans do all the time in conversation. Just the other day I was asking my girl what was the last name of that woman named Susan from Britain's Got Talent. Conversation went like this:

Me: Babe, do you remember the last name of Susan from Britain's Got Talent?

Her: Susan... kinda...

Me: I think it began with a "B"

Her: Boyle!

Maybe today's systems would understand me if I phrased the question in a context they understand, but why bother taking the time? I can just talk to a human.

It'll get interesting though as computers continue to improve.

harisenbon · on Dec 26, 2016

I was curious, so I asked Siri your question: what is the last name of Susan from Britain's got talent? She successfully answered Boyle.

visarga · on Dec 26, 2016

You know, the first and probably most important step into realizing natural-NLP is to make a dataset of such prompts and answers. I really like the examples.

enobrev · on Dec 26, 2016

Excellent examples. I can't imagine an AI ever picking up on conversations between my wife and me.

Me: Honey, can you grab me the thing on that thing.

Her: Oh, this thing?

Me: The other thing... Perfect, thanks!

skoocda · on Dec 26, 2016

Children's voices are a surprisingly tough problem in speech recognition, mostly because there isn't much labeled data with children's voices. ASR follows the trend seen in other deep learning fields of working best for North American adult males.

When children talk, their pitch is significantly different from any adult's (male or female) and their enunciation is usually poor. Being robust to that range requires a ton of data and a very deep neural network. It will definitely be solved earlier in the cloud: don't expect super-adaptable speech recognition to be available on your phone any time soon.

gok · on Dec 26, 2016

There's also a legal/moral question of whether it's ok to capture the speech of children to build better models for this kind of thing.

grzm · on Dec 26, 2016

Interesting question, but I'm not sure why it would be any different from monitoring the adults who bought and installed the system. It would be part of the terms of service, I would think. The adults accept this on their behalf. Would there be another interpretation?

gok · on Dec 26, 2016

In the US, COPPA makes it very difficult to legally collect information from children under 13 for commercial purposes.

grzm · on Dec 26, 2016

Interesting. Do you know of specifics about COPPA (or COPA) that would apply to this situation?

gok · on Dec 26, 2016

COPA is a piece of dead legislation. COPPA has been in effect since 2000.

The big issue recently is that in 2011 the FTC proposed much stricter rules about data collection [1], which were finalized in late 2012. Parental consent now requires identification checks, which are hard. Data retention is also a bit of a mess; guidelines now imply that the data should be deleted as soon as possible.

Morally, I do feel like there's a bit of a question here. Is it ok to have a 6 year old donate her voice to improve your speech recognition product even though she wouldn't directly see a benefit from it?

[1] http://www.natlawreview.com/article/ftc-will-propose-broader...

icebraining · on Dec 26, 2016

>Is it ok to have a 6 year old donate her voice to improve your speech recognition product even though she wouldn't directly see a benefit from it?

Why couldn't you provide some direct benefit? IIRC that was the point of Google Voice: provide a free product in exchange for getting people to help them improve voice recognition.

donw · on Dec 26, 2016

I'm going to stick my neck out as a potential Luddite here, but outside of playing music, and some general "answering questions", I don't see a use case for things like the Echo or Dot.

Being able to ask for timers, or unit conversions while cooking, is probably the biggest bang-for-the-buck that I get out of Siri.

But outside of that, there's nothing that a Dot does for me that warrants having a microphone in my house that is 24/7 connected to Amazon.

Not having some sort of voice-print analysis is also a real concern. A friend bought an Echo a little while back, and me being me, I couldn't resist the urge to ask it to order 10 large bags of kitty litter... which it cheerfully tried to do.

Maybe I'm just odd. What do other people use these things for?

xyzzy_plugh · on Dec 26, 2016

I use it to turn my lights on and off. It works great for this, I don't need to leave my chair/couch/bed and I can turn many lights on quicker than any other method I'm aware of.

I use it to play music, usually through Spotify. I use Amazon music occasionally when I'm looking for a song (usually while cooking or the like) but on the whole it's inferior.

I ask for the weather in two or three cities while I'm tying my shoes, so I can decide if I need a jacket.

I use it for timers when I'm already using my oven timer. I appreciate being able to ask how much time is left while I'm not in the kitchen and thus can't see the oven.

I use it for unit conversions (how many oz in a quart? How many tablespoons in half a cup?).

The most use I get from it is when my hands are busy and my phone is in my pocket, in another room plugged in charging, or dead (this happens less frequently since I bought a new phone recently). I also find that context matters. When I use my phone's voice assistant I'm usually trying to do something on my phone. With my Echo I'm just trying to do something. I have a lot more success with the Echo.

I bought a few Dots when they went on sale this fall, but I've only set up one. It works surprisingly well with the Echo. They rarely get confused. Sometimes the device much farther away responds instead.

You can set up a passcode for ordering, or disable it entirely -- I've never used it and probably never will. I already carry a microphone around with me all day and work in an office surrounded by literally hundreds of microphones, or a few dozen in my house (my Dropcam is always on and has a mic. My TV remote has a mic for some reason I hope to never discover. My laptop, my iPad, and various other devices have microphones too).

Look, I didn't see a use case for it either. But once I got on, it infiltrated my home and now feels like an extension of it. I also didn't see the point of smart phones, tablets or IoT devices in general, but now I have all of them.

It's clear this sort of interaction is the future.

agildehaus · on Dec 26, 2016

Under your definition, your phone is a 24/7 microphone connected to either Google or Apple, your PC a 24/7 microphone connected to Microsoft and literally every other vendor that has a service running.

The privacy concern is valid, but it's also valid for literally every other piece of electronics. At some point you have to trust that it's only recording during the short period after you say "OK Google" and if it were doing something more nefarious someone would figure that out and it'd be huge and damaging news.

These are basically first-generation products. The idea is to improve upon their problems, find where they are useful and where they are not, and step closer to having a real virtual assistant.

visarga · on Dec 26, 2016

TL;DR - automate privacy protection in order to serve the uninformed masses of people leaking sensitive data online

We're disclosing more and more private information to assistants, but especially to Facebook, Google and the respective phone company we're using. I think it would be worth studying automatic detection of sensitive information disclosure, in order to place better privacy guards.

I envision a system where the web browser or voice agent would immediately know the sensitivity level of information we are about to divulge, and route it through anonymous systems or block the leak before it happens.

A database containing our online identities, credit cards, passwords and text run through a topic classifier would make good features for sensitive disclosure protection. It would be the privacy equivalent of antivirus software. Maybe we could have a smart (privacy protecting) web browser and agent.
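To make the idea concrete, here is a rough sketch of what the cheapest layer of such a guard might look like. It is not the topic classifier described above; it swaps in plain pattern matching: a regex plus a Luhn checksum to spot card numbers, and an exact-match lookup against a user-supplied list of secrets. The names `find_sensitive`, `luhn_valid` and `known_secrets` are made up for illustration.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text, known_secrets=()):
    """List (kind, value) pairs for anything in text that looks sensitive.

    known_secrets is a hypothetical user-supplied list (passwords, account
    names) standing in for the database of identities mentioned above.
    """
    findings = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            findings.append(("card_number", match.group().strip()))
    for secret in known_secrets:
        if secret and secret in text:
            findings.append(("known_secret", secret))
    return findings
```

A browser or voice agent could call something like `find_sensitive(outgoing_text, my_secrets)` before sending anything and block or reroute the message when the list comes back non-empty. A real system would need the classifier on top of this, since most sensitive disclosures are not as regular as a card number.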

In the future, people are going to have to convince their personal assistants if they need to disclose any private info to third parties. It's not going to be so easy to collect massive hoards of private data about people. It's just the natural step for privacy, in a world where AI is already being used so much to undermine it. Time to get some AI fighting for our side of the privacy war.

chris_st · on Dec 26, 2016

My son-in-law was over for Christmas, and he didn't know that I had an Echo Dot, so to show it off I said, "Alexa, tell me a joke". It told us a lame joke. My SIL had to whip out his Android phone and say, "Hey Google, tell me a joke!" and it told us a lame, funny joke. Score one for Google.

Then he said, "Google, tell me a Christmas joke" and it told us a pretty poor Christmas joke. I then asked Alexa for a Christmas joke, and it told us a funny Christmas joke!

So, clearly, in by far the most important imaginable use case, they're pretty much tied :-)

Eridrus · on Dec 26, 2016

My Nexus 5X has asked me a few times if I'd like to unlock it with my voice, so it seems like they've got it working to some extent.

I think that beyond the security concerns, it will be really useful when you share the assistant, so that it can have the proper context for recommendations, calendar info, etc.

deegles · on Dec 26, 2016

What would a voice assistant have to do to convince you otherwise?

tootie · on Dec 26, 2016

I'm still ambivalent. I think this tech will end up as another dead end. Playing music by speech is a novelty. I don't see anything really interesting coming from this space.

wsh91 · on Dec 26, 2016

It's way more than playing music by speech. It's looking up stuff, controlling your house, checking restaurant hours, all sorts of functions that would otherwise be fulfilled by a smartphone with less input effort.

Give one of these devices a try, you might be surprised by how much you adapt to them.

te_chris · on Dec 26, 2016

This. Also, because it's a home device it can just sit there, being used when you need it and ignored until you need it again.

Cidan · on Dec 26, 2016

https://www.lhup.edu/~dsimanek/neverwrk.htm

chris_st · on Dec 26, 2016

Possibly the coolest thing about the Echo is developing your own skill. If you're comfortable with JavaScript (or, I believe, Python or Java) you can really quickly put together a skill, which in development mode only you can use. Their example templates are really clear.

So a lot of this would be easy to set up, if you're willing to do a bit of below-IFTTT type programming. And, since it uses AWS Lambda, your usage is almost certainly going to stay in the free usage zone (you get a comparative ton of compute time for free each month with Lambda).
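For a sense of how little code is involved, here is a minimal sketch of a skill backend as a bare Lambda handler in Python, without any SDK. It only assumes the JSON request and response envelope that custom skills exchange; the intent name `HelloIntent` and the spoken phrases are made up for illustration, and a real skill would also need its intents configured on the Alexa side.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that Lambda invokes with the Alexa request as a dict."""
    request = event.get("request", {})

    # Sent when the user opens the skill without asking for anything specific.
    if request.get("type") == "LaunchRequest":
        return build_response("Welcome. Ask me to say hello.", end_session=False)

    # Sent when the user's utterance matched one of the skill's intents.
    if request.get("type") == "IntentRequest":
        # "HelloIntent" is a hypothetical intent name for this sketch.
        if request.get("intent", {}).get("name") == "HelloIntent":
            return build_response("Hello from your own skill.")

    return build_response("Sorry, I did not get that.")
```

You can poke at it locally before uploading anything, for example `lambda_handler({"request": {"type": "LaunchRequest"}}, None)`.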

I'm sure Google has, or will soon have, such a development environment for their device as well.

dgacmu · on Dec 27, 2016

I think it's Actions on Google (but I've never tried it): https://developers.google.com/actions/

It looks like there's a "user friendly" version that's api.ai, and then there's an SDK, and the SDK seems to be node.js based. I'm not clear from reading it if you can do a private version just for your device. I'll have to kick the tires one of these days - thank you!

westmeal · on Dec 26, 2016

Things like the Amazon Dot and Echo make me extremely uncomfortable. If I had to create some sort of voice assistant it'd have to store voice data locally instead of being on some server somewhere.

deegles · on Dec 26, 2016

You can delete all of your voice recordings from Alexa using the companion app.

mxstbr · on Dec 26, 2016

I got a Home for Christmas, now I'm even more excited to try it!

The article doesn't really go into it, but it'd be interesting to know why the author's family now only uses Home vs the Echo previously!

dgacmu · on Dec 26, 2016

The Home did a better job of interpreting requests, mostly. We found that we had to contort ourselves more to get the Echo to do some things. "Play the Nutcracker" -- no go. Even though it was in our library. "Play Tchaikovsky's Nutcracker Suite" would work, IIRC. The Home handled both. Informational requests also worked a bit better on the Home. ("What is a..."). We weren't using any extra skills that we'd miss, so it was a very easy transition.

I wish the Home had the equivalent of the Dot. I wish the Echo had the Home's inference capability.

lowglow · on Dec 26, 2016

If you're interested in building voice-based personalization models, we're going to be tackling this problem. I'd be interested to chat with anyone with experience in this area. Hit me up. :)