Cloud-based voice recognition, no thanks!

Story: Mycroft's AI Could Power Ubuntu's Unity and Give Users Voice Control
Total Replies: 13
penguinist

Aug 28, 2015
6:59 PM EDT
Umm, sorry, I'm not piping my living room microphone to anyone's server for processing, let alone Google's. They like to keep everything forever and use what they keep in innovative ways.

As I understand it, this Mycroft offering offloads your voice to Google for processing, so it's not something for me or my family, as interested as I am in adding voice recognition to my RPi2-based media center.

That said, the RPi2 is now getting close to the processing horsepower needed to do voice recognition on its own. I'm especially interested in the progress being made with Carnegie Mellon's open source PocketSphinx project. It is a slimmed-down version of the full Sphinx project, designed to run on small machines. One user seems to have made it work well running directly on the Raspberry Pi 2.

Maybe it's time to spend a spare day coming up to speed on PocketSphinx.
helios

Sep 04, 2015
3:44 PM EDT
However, this might be something I would be interested in. I gave a "talk" at Texas Linux Fest this year bemoaning the horrible shape of text-to-speech software in the Linuxsphere. Of course, the TTS for the talk was done by a subscription-based website that converts text to speech. It actually does a pretty good job, although some of its voice inflection glitches are pretty funny.

I am currently the project manager for a GUI that makes MaryTTS much easier to use. Mary is an open source TTS application written in Java. We have an alpha out now, and I will write it up in an article, possibly here, in the coming week.

But I will give the Mycroft project a closer look.

The pre-article for my presentation is much better than the presentation itself. A combination of technical issues, my hearing loss and poor speaker placement made the presentation hard to orchestrate. Instead, I think I will write a feature for LXer if The Powers That Be can use it.
penguinist

Sep 04, 2015
6:20 PM EDT
Text to speech (TTS) is the easy part. I've installed espeak on my Raspberry Pi media center and it works great. espeak is even in the Raspbian repository:

From your RPi, run:

sudo apt-get install espeak
Then give it something to say, followed by something to do, like this:

espeak -ven+f3 -k5 -s150 "Playing Big Buck Bunny.  I hope you enjoy it Mr and Mrs Penguinist."
omxplayer big_buck_bunny_1080p_surround.avi
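
If you find yourself doing this a lot, the two commands above can be wrapped in a small shell function. This is only a sketch: the function name `announce_and_play` and the missing-file message are made up for illustration, and it assumes espeak and omxplayer are both installed and on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: announce a title with espeak, then play the file.
# espeak flags as above: -ven+f3 (English, female variant f3),
# -k5 (mark capital letters with a rise in pitch), -s150 (150 words per minute).
announce_and_play() {
    title=$1
    file=$2
    if [ ! -f "$file" ]; then
        espeak -ven+f3 -k5 -s150 "Sorry, I cannot find $title."
        return 1
    fi
    # The espeak command normally returns only after it has finished
    # speaking, so playback starts once the announcement is over.
    espeak -ven+f3 -k5 -s150 "Playing $title. I hope you enjoy it."
    omxplayer "$file"
}
```

Usage would be something like `announce_and_play "Big Buck Bunny" big_buck_bunny_1080p_surround.avi`.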


The hard part is going in the other direction, from speech to text, i.e. voice recognition. The PocketSphinx project looks pretty promising even for small systems like the RPi2. I'm hoping to free up some time to explore it soon.

JaseP

Sep 05, 2015
9:38 AM EDT
Quoting: The hard part is going in the other direction, from speech to text, i.e. voice recognition.


Speech to text has always been the trickier job to handle: more AI potential, and more processor-intensive.
penguinist

Sep 05, 2015
10:55 AM EDT
My dream project would be to have a "headless" RPi in the car. No display, just speech in both directions. Each word I speak would launch a corresponding script (i.e., my words are fed to the shell).

When I'm not driving, the RPi would be accessible over a Wi-Fi SSH connection for programming, configuration and data transfers; then, while driving, I could keep my eyes on the road and still have full voice control over the system.

Is this an unrealistic dream, or are we close enough now to this reality?
nmset

Sep 05, 2015
1:26 PM EDT
Your eyes would be on the road, but your mind would be in the scripts, especially if something goes wrong. I won't do that; it's too dangerous, and my scripts won't go down into the tomb with me (:-
penguinist

Sep 05, 2015
1:57 PM EDT
Yes, I can see that trying to construct a sed command line to edit a configuration file while driving might be hazardous to one's driving concentration. :)

I was thinking more along the lines of just invoking some scripts that were prepared in advance, like for example:

close GarageDoor

play TaylorSwift

volume up

tea EarlGrey hot (joke)

And on longer trips maybe something like:

read WarAndPeace

Then again, now that I think about it, listening to Taylor Swift while driving is already a distraction no matter how the music is invoked. :)
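
The "each spoken word launches a prepared script" idea could be sketched roughly like this. It is a minimal, hypothetical dispatcher, not part of Mycroft, PocketSphinx or any other existing project: it reads one recognized phrase per line on standard input, treats the first word as the name of an executable script in a directory (the `VOICE_CMD_DIR` variable and its `~/voice-commands` default are assumptions), and passes the remaining words to that script as arguments. The words are never evaluated by the shell, and anything that looks like a path or hidden file is refused, so a misheard word can only ever run one of your prepared scripts.

```shell
#!/bin/sh
# Hypothetical voice-command dispatcher (a sketch, not an existing tool).
# Directory holding the prepared scripts; the default path is an assumption.
VOICE_CMD_DIR=${VOICE_CMD_DIR:-$HOME/voice-commands}

# Run one recognized phrase, e.g. "volume up" -> $VOICE_CMD_DIR/volume up
dispatch_line() {
    set -f              # don't let recognized words expand as globs
    set -- $1           # split the phrase into words on whitespace
    set +f
    [ $# -gt 0 ] || return 0        # empty phrase: nothing to do
    cmd=$1
    shift
    case $cmd in
        */* | .*)                   # refuse paths and hidden files
            echo "ignored: $cmd" >&2
            return 1 ;;
    esac
    if [ -f "$VOICE_CMD_DIR/$cmd" ] && [ -x "$VOICE_CMD_DIR/$cmd" ]; then
        "$VOICE_CMD_DIR/$cmd" "$@"
    else
        echo "unknown command: $cmd" >&2
        return 1
    fi
}

# Read phrases line by line (e.g. from a recognizer) and dispatch each one;
# a failed or unknown command doesn't stop the loop.
dispatch_stream() {
    while IFS= read -r phrase; do
        dispatch_line "$phrase" || true
    done
}
```

After sourcing the file, you might feed it from a recognizer, for example `pocketsphinx_continuous -inmic yes 2>/dev/null | dispatch_stream`, assuming your PocketSphinx build prints each hypothesis as its own line on stdout and its log messages on stderr. Recognizers usually emit separate lowercase dictionary words, so a script named `close` would receive `garage door` rather than `GarageDoor`; name and write the scripts to match what the recognizer actually produces.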

helios

Sep 05, 2015
5:03 PM EDT
"Text to speech (TTS) is the easy part."

No, no it's not. Not text to speech that is easy to use and has voices that don't sound like a melding of Cher and a constipated robot. And just so it's said, I could not get the sound link above to play. What am I missing?

The Linux Desktop is our target. Not gadgets. And while I appreciate the Maker Mentality, and often hack stuff that shouldn't be hacked, there isn't a decent software answer for the Linuxsphere. Hopefully, I can have an article here on LXer in the coming week to show folks how TTS in Linux should be done. Should be. Those are two words that can be meaningless and profound at the same time.

But on the positive side, Penguinist has enticed me to the dark side. I will be buying a Pi myself and plinking around with it to see what mayhem I can produce. Stand back, folks... nobody needs to get hurt here.
penguinist

Sep 05, 2015
6:02 PM EDT
Quoting: I could not get the sound link above to play. What am I missing?


Without a few clues it would be hard to say. I've been running espeak on the RPi because it is connected to my home entertainment center and has the best sound quality of anything I have, but since you mentioned having some trouble with it, I just tried it with Fedora 22 on my Lenovo notebook. The session went like this:

# dnf install espeak
$ espeak "hello world"


I was greeted with a pleasant male voice. If you don't like the default voice, there are 166 files in the /usr/share/espeak-data/voices directory to explore.

I found these voices to be interesting ones to check out:

espeak -v f4 "hello world"
espeak -v whisper "hello world"
espeak -v whisperf "hello world"
espeak -v f3 "hello world"


My favorite is "f3", a female voice with a sexy British accent.
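
To audition several variants without typing each command, a small loop does the job. This sketch uses the `en+variant` form from the Big Buck Bunny example (`-ven+f3`) rather than the bare `-v f3` form. The function name `preview_variants` is made up, and the default variant list (f1 to f5, m1 to m3, whisper, croak) is an assumption about what your espeak installation ships, so pass your own names as arguments if some are missing.

```shell
#!/bin/sh
# Speak a short sample in each espeak voice variant, one after another.
# Default variant names are an assumption; override them with arguments.
preview_variants() {
    [ $# -gt 0 ] || set -- f1 f2 f3 f4 f5 m1 m2 m3 whisper croak
    for variant in "$@"; do
        echo "== $variant"          # show which variant is speaking
        espeak -v "en+$variant" "Hello world, this is the $variant voice."
    done
}
```

Run `preview_variants` for the default list, or something like `preview_variants f3 whisperf` to compare just a couple.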
gus3

Sep 06, 2015
1:14 PM EDT
"Cher and a Constipated Robot" sounds like the name of a rock band.
jdixon

Sep 06, 2015
3:25 PM EDT
> "Cher and a Constipated Robot"

You mean that wasn't the nickname for her and Sonny Bono when they were together?
cybertao

Sep 07, 2015
2:28 AM EDT
espeak -v  f3 "do you believe in life after love, ooooooh"
jdixon

Sep 07, 2015
9:23 AM EDT
OK, you all have me curious now. Time to install espeak with sbopkg and see what I get. I'll let you know the results.

OK, this is all with Slackware 14.1. Espeak requires portaudio, so that has to be installed first. Both portaudio and espeak installed with no problems, and once both were in place espeak produced recognizable speech. Nothing spectacular, but on par with the GPS units I've used. It prints lots of errors to the terminal about unknown PCM card interfaces, but apparently that's normal.
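
Those PCM messages come from the ALSA library, which writes its warnings to standard error rather than standard output, so they can be hidden without changing what espeak says. A minimal wrapper might look like this; the `say` name and the optional `SAY_LOG` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run espeak with its stderr (where ALSA's
# "Unknown PCM cards..." warnings go) discarded, or appended to the
# file named in SAY_LOG if you want to keep them for debugging.
say() {
    espeak "$@" 2>>"${SAY_LOG:-/dev/null}"
}
```

Then `say "hello world"` speaks quietly, and `SAY_LOG=/tmp/espeak.log say "hello world"` keeps the warnings around in case the sound stops working.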
cybertao

Sep 07, 2015
10:08 AM EDT
It's not bad, but it's not great either. The accessibility option in Android is better, for example.

I've always been curious about accessibility options for the disabled. They just don't seem to get enough attention. I'm using Fedora 22 at the moment, and espeak was already installed because it's used by the 'Screen Reader'. It's not very practical for someone with deteriorated eyesight, though, which probably has more to do with an environment that isn't designed to convey useful information consistently. GNU/Linux distributions aren't solely to blame, as accessibility isn't a priority for most projects. Pressing the command key, typing 'term...', opening a terminal window, typing 'ls /home' and listening to the results is possible, but incredibly frustrating if you close your eyes. The text to speech is barely understandable unless you already know what items it is reading. In comparison, asking Google to do a search and having it read the answer must be a godsend.

I'm part of the problem myself; I've never designed a website that caters to the visually impaired beyond appropriate alt tags for images (and many websites don't even do that). Blindness can unexpectedly happen to any one of us or our loved ones, for a wide range of reasons.
