What I need is natural natural language processing to become so good that it just works. I don't respect voice as an input method because I feel uncomfortable using it, since I have to pre-plan my question rather than stream my consciousness like I do with my family and peers. I want to be able to have this conversation:
"Computer, how much does a.. hmm what's it called.. uhh.. a loud keyboard, what's that called?"
"A mechanical keyboard?"
"Yes. The ones from the.. uh I forget the company name."
"Corsair? Razer?"
"Razer! How much does their newest mechanical keyboard cost?"
Heck, screw helping me complete my thought: if it just handled umms and uhhs and pauses without going "I didn't get that" I would be 100% happier.
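For what it's worth, the "just handle the umms and uhhs" part can be roughed out as a text cleanup pass that runs on the transcript before it reaches the intent parser. This is only a minimal sketch: the filler list and the `strip_disfluencies` name are my own assumptions, and a real assistant would more likely handle this inside the speech model rather than with regexes.

```python
import re

# Assumed filler list, for illustration only; a real system would learn these from data.
FILLERS = ("um", "umm", "uh", "uhh", "hmm", "er", "erm")

# Match a whole filler word plus any trailing commas/periods and whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]*\s*", re.IGNORECASE
)


def strip_disfluencies(transcript: str) -> str:
    """Remove filler words and '..' pause markers from a transcript."""
    text = _FILLER_RE.sub("", transcript)
    text = re.sub(r"\.{2,}", " ", text)  # runs of dots (pauses) become a space
    text = re.sub(r"\s+", " ", text)     # collapse repeated whitespace
    return text.strip()


print(strip_disfluencies("Yes. The ones from the.. uh I forget the company name."))
```

That covers only the fillers and pauses. Recovering the half-finished thought behind them is a much harder problem.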
>What I need is natural natural language processing to become so good that it just works.
That's supernatural language processing. You're asking the system to recognize a thought you have in your mind that you can't effectively communicate with words.
The loud keyboard - makes sense. But the company name? My first thought was IBM or Cherry.
I think it can be improved, but I'm never going to expect voice recognition to fill in the blanks for something I can't describe with my voice.
Filling in the blanks is something that humans do all the time in conversation. Just the other day I was asking my girl for the last name of that woman named Susan from Britain's Got Talent. The conversation went like this:
Me: Babe, do you remember the last name of Susan from Britain's Got Talent?
Her: Susan... kinda...
Me: I think it began with a "B"
Her: Boyle!
Maybe today's systems would understand me if I phrased the question in a context they understand, but why bother taking the time? I can just talk to a human.
It'll get interesting though as computers continue to improve.
You know, the first and probably most important step toward realizing natural NLP is to make a dataset of such prompts and answers. I really like the examples.
Children's voices are a surprisingly tough problem in speech recognition, mostly because there isn't much labeled data with children's voices. ASR follows the trend seen in other deep learning fields of working best for North American adult males.
When children talk, the pitch is significantly different from any adult's (male or female) and their enunciation is usually poor. Being robust to that range requires a ton of data and a very deep neural network. It will definitely be solved earlier in the cloud: don't expect super-adaptable speech recognition to be available on your phone any time soon.
Interesting question, but I'm not sure why it would be any different from monitoring the adults who bought and installed the system. It would be part of the terms of service, I would think. The adults accept this on their behalf. Would there be another interpretation?
COPA is a piece of dead legislation. COPPA has been in effect since 2000.
The big issue recently is that in 2011 the FTC made the rules much stricter about data collection [1]. Parental consent now requires identification checks, which are hard. Data retention is also a bit of a mess; guidelines now imply that the data should be deleted as soon as possible.
Morally, I do feel like there's a bit of a question here. Is it ok to have a 6 year old donate her voice to improve your speech recognition product even though she wouldn't directly see a benefit from it?
>Is it ok to have a 6 year old donate her voice to improve your speech recognition product even though she wouldn't directly see a benefit from it?
Why couldn't you provide some direct benefit? IIRC that was the point of Google Voice: provide a free product in exchange for getting people to help them improve voice recognition.
I'm going to stick my neck out as a potential Luddite here, but outside of playing music, and some general "answering questions", I don't see a use case for things like the Echo or Dot.
Being able to ask for timers, or unit conversions while cooking, is probably the biggest bang-for-the-buck that I get out of Siri.
But outside of that, there's nothing that a Dot does for me that warrants having a microphone in my house that is 24/7 connected to Amazon.
Not having some sort of voice-print analysis is also a real concern. A friend bought an Echo a little while back, and me being me, I couldn't resist the urge to ask it to order 10 large bags of kitty litter... which it cheerfully tried to do.
Maybe I'm just odd. What do other people use these things for?
I use it to turn my lights on and off. It works great for this: I don't need to leave my chair/couch/bed, and I can turn many lights on quicker than with any other method I'm aware of.
I use it to play music, usually through Spotify. I use Amazon music occasionally when I'm looking for a song (usually while cooking or the like) but on the whole it's inferior.
I ask for the weather in two or three cities while I'm tying my shoes, so I can decide if I need a jacket.
I use it for timers when I'm already using my oven timer. I appreciate being able to ask how much time is left while I'm not in the kitchen and thus can't see the oven.
I use it for unit conversions (how many oz in a quart? How many tablespoons in half a cup?).
I get the most use from it when my hands are busy and my phone is in my pocket, in another room plugged in and charging, or dead (this happens less frequently since I bought a new phone recently). I also find that context matters. When I use my phone's voice assistant I'm usually trying to do something on my phone. With my Echo I'm just trying to do something. I have a lot more success with the Echo.
I bought a few Dots when they went on sale this fall, but I've only set up one. It works surprisingly well with the Echo. They rarely get confused, though sometimes the device that is much farther away responds instead.
You can set up a passcode for ordering, or disable it entirely -- I've never used it and probably never will. I already carry a microphone around with me all day and work in an office surrounded by literally hundreds of microphones, or a few dozen in my house (my Dropcam is always on and has a mic. My TV remote has a mic for some reason I hope to never discover. My laptop, my iPad, and various other devices have microphones too).
Look, I didn't see a use case for it either. But once I got one, it infiltrated my home and now feels like an extension of it. I also didn't see the point of smartphones, tablets or IoT devices in general, but now I have all of them.
It's clear this sort of interaction is the future.
Under your definition, your phone is a 24/7 microphone connected to either Google or Apple, your PC a 24/7 microphone connected to Microsoft and literally every other vendor that has a service running.
The privacy concern is valid, but it's also valid for literally every other piece of electronics. At some point you have to trust that it's only recording during the short period after you say "OK Google", and that if it were doing something more nefarious someone would figure that out and it'd be huge and damaging news.
These are basically first-generation products. The idea is to improve upon their problems, find where they are useful and where they are not, and step closer to having a real virtual assistant.
TL;DR - automate privacy protection in order to serve the uninformed masses of people leaking sensitive data online
We're disclosing more and more private information to assistants, but especially to Facebook, Google and whichever phone company we're using. I think it would be worthwhile to study automatic detection of sensitive information disclosure, in order to put better privacy guards in place.
I envision a system where the web browser or voice agent would immediately know the sensitivity level of the information we are about to divulge, and route it through anonymous systems or block the leak before it happens.
A database containing our online identities, credit cards, passwords and text run through a topic classifier would make good features for sensitive disclosure protection. It would be the privacy equivalent of antivirus software. Maybe we could have a smart (privacy protecting) web browser and agent.
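To make that a bit more concrete, here is a rough sketch of what the cheapest layer of such a guard might look like: pattern checks for card numbers (validated with the Luhn checksum) and email addresses, plus a lookup against a user-supplied list of known secrets. The function names and the `known_secrets` parameter are hypothetical, the regexes are deliberately simple, and the topic classifier part is not attempted here at all.

```python
import re

# Loose patterns, for illustration only.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, which valid payment card numbers satisfy."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sensitivity(text: str, known_secrets=()) -> dict:
    """Report which kinds of sensitive data appear in `text`."""
    card = False
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            card = True
    return {
        "card": card,
        "email": bool(EMAIL_RE.search(text)),
        # Assumption: secrets are compared as plain strings to keep the sketch
        # short; a real tool should not keep passwords around in clear text.
        "secret": any(s and s in text for s in known_secrets),
    }


def should_block(text: str, known_secrets=()) -> bool:
    """Block (or reroute) the message if anything sensitive was found."""
    return any(sensitivity(text, known_secrets).values())


print(sensitivity("my card is 4111 1111 1111 1111"))
```

A browser extension or voice agent could call something like `should_block` on outgoing text before it leaves the device, then decide whether to warn, anonymize or drop it.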
In the future, people are going to have to convince their personal assistants if they need to disclose any private info to third parties. It's not going to be so easy to collect massive hordes of private data about people. It's just the natural step for privacy, in a world where AI is already being used so much to undermine it. Time to get some AI fighting for our side of the privacy war.
My son-in-law was over for Christmas, and he didn't know that I had an Echo Dot, so to show it off I said, "Alexa, tell me a joke". It told us a lame joke. My SIL had to whip out his Android phone and say, "Hey Google, tell me a joke!" and it told us a lame, funny joke. Score one for Google.
Then he said, "Google, tell me a Christmas joke" and it told us a pretty poor Christmas joke. I then asked Alexa for a Christmas joke, and it told us a funny Christmas joke!
So, clearly, in by far the most important imaginable use case, they're pretty much tied :-)
My Nexus 5X has asked me a few times if I'd like to unlock it with my voice, so it seems like they've got it working to some extent.
I think that beyond the security concerns, it will be really useful when you share the assistant, so that it can have the proper context for recommendations, calendar info, etc.
I'm still ambivalent. I think this tech will end up as another dead end. Playing music by speech is a novelty. I don't see anything really interesting coming from this space.
It's way more than playing music by speech. It's looking up stuff, controlling your house, checking restaurant hours, all sorts of functions that would otherwise be fulfilled by a smartphone with less input effort.
Give one of these devices a try, you might be surprised by how much you adapt to them.
Possibly the coolest thing about the Echo is developing your own skill. If you're comfortable with JavaScript (or, I believe, Python or Java) you can really quickly put together a skill, which in development mode only you can use. Their example templates are really clear.
So a lot of this would be easy to set up, if you're willing to do a bit of below-IFTTT type programming. And, since it uses AWS Lambda, your usage is almost certainly going to stay in the free usage zone (you get a comparative ton of compute time for free each month with Lambda).
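To give a feel for how little code is involved, here is a minimal sketch of a Lambda handler for a custom skill, written in plain Python without any SDK. It reads the request type from the incoming JSON and returns the response envelope with `outputSpeech`. The intent name `TablespoonsIntent` and the spoken phrases are made up for illustration; you would define your own intents in the skill's configuration.

```python
def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the skill request as `event`."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The skill was opened without a specific question; keep the session open.
        return build_response(
            "Ask me how many tablespoons are in a cup.", end_session=False
        )

    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "TablespoonsIntent":  # hypothetical intent name
            return build_response("There are 16 tablespoons in a cup.")

    return build_response("Sorry, I didn't get that.")
```

You can exercise it locally by calling `lambda_handler({"request": {"type": "LaunchRequest"}}, None)` before uploading anything.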
I'm sure Google has, or will soon have, such a development environment for their device as well.
It looks like there's a "user friendly" version that's api.ai, and then there's an SDK, and the SDK seems to be node.js based. I'm not clear from reading it if you can do a private version just for your device. I'll have to kick the tires one of these days - thank you!
Things like the Amazon Dot and Echo make me extremely uncomfortable. If I had to create some sort of voice assistant it'd have to store voice data locally instead of being on some server somewhere.
The Home did a better job of interpreting requests, mostly. We found that we had to contort ourselves more to get the Echo to do some things. "Play the Nutcracker" -- no go, even though it was in our library. "Play Tchaikovsky's Nutcracker Suite" would work, IIRC. The Home handled both. Informational requests ("What is a...") also worked a bit better on the Home. We weren't using any extra skills that we'd miss, so it was a very easy transition.
I wish the Home had the equivalent of the Dot. I wish the Echo had the Home's inference capability.
If you're interested in building voice based personalization models, we're going to be tackling this problem. I'd be interested to chat with anyone with experience in this area. Hit me up. :)
"Computer, how much does a.. hmm what's it called.. uhh.. a loud keyboard, what's that called?"
"A mechanical keyboard?"
"Yes. The ones from the.. uh I forget the company name."
"Corsair? Razer?"
"Razer! How much does their newest mechanical keyboard cost?"
"The newest keyboard from Razer is...<etc>"