hpr1599 :: Interview with Ingmar Steiner from the MaryTTS project

Ken interviews Ingmar Steiner from the MaryTTS text to speech project.

Hosted by Ken Fallon on 2014-09-18, this show is flagged as Clean and is released under a CC-BY-SA license.
Tags: text to speech, MaryTTS, ORCA. Comments: 9.
The show is available on the Internet Archive at: https://archive.org/details/hpr1599

Listen in ogg, spx, or mp3 format.

Duration: 01:25:48

Part of the series: Interviews.

HPR Correspondents bring you Interviews from interesting people and projects

In today's show Ken interviews Ingmar Steiner, the lead developer of the MaryTTS text-to-speech project. MaryTTS is an open-source, multilingual text-to-speech synthesis system written in pure Java and released under the LGPL. During the interview we get a history of the project, dive into speech synthesis, and look at how to make your own voices.

Photo of Ingmar sitting on a rock in a pine forest, eyes focused on his grey Mac laptop.

Show Transcript

Automatically generated using whisper

whisper --model tiny --language en hpr1599.wav

You can save these subtitle files to the same location as the HPR episode, and they will automatically show in players such as mpv and VLC. Some players allow you to specify the subtitle file location.


Comments


Comment #1 posted on 2014-09-19 15:01:51 by laindir

This is me laughing

Absolutely loved the part after the interview. It gives a real sense of the work they're doing and the incredible strides in quality that have been made in open source TTS tech.

Comment #2 posted on 2014-09-21 00:06:30 by Kevin O'Brien

Great show

I really enjoyed this show Ken. I appreciated learning more about how you develop an application like this. Please do have Ingmar back at some time to continue.

Comment #3 posted on 2014-09-22 20:05:55 by johanv

Dutch voice

I am certainly looking forward to a follow up show about creating a Dutch voice. :-)

Comment #4 posted on 2014-09-23 15:45:00 by davidWHITMAN

Mary TTS

Great show. Gotta admire those who have put the effort into projects like this. Go GNU!

Comment #5 posted on 2014-11-09 15:01:29 by Steve Bickle

How to for Debian

I've put together a how-to showing how to get MaryTTS installed and running on Debian. It is at https://blog.bickle.co.uk/podcasts/marytts-voice-synthesizer-how-to-for-debian/

Comment #6 posted on 2014-11-13 15:39:59 by Mike Ray

MaryTTS howto etc

Thanks for the great howto on installing MaryTTS.

I have installed it and run it on my Debian desktop, and so far I have to say that I fail to see what everybody is raving about.

Writing any kind of software speech synthesiser is a massive undertaking and I take my hat off to anybody that can do it.

But to those who gripe about eSpeak and rave about MaryTTS, I have to say: eSpeak is lean and mean and supports dozens of languages. MaryTTS, on the other hand, is bloated, and the voice I have heard is not much better than what I am used to with eSpeak.

Speaking as a blind computer user, a small footprint and fast, crisp operation are far more important than the sound of the voice. I fail to see how I could write a long text document on a modest machine and expect MaryTTS to keep up with someone who has been typing for thirty years.

And I am speaking as a blind person.

With one or two notable exceptions, possibly the use of a TTS engine by children with print disabilities other than blindness, nobody need look any further than eSpeak, IMHO.

Where something like MaryTTS _might_ win is in the creation of static WAV files for repeated use, but for on-the-fly TTS, nothing beats eSpeak.
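As a sketch of that static-file workflow, the Python below asks a MaryTTS server to render some text once and saves the result as a WAV file for repeated playback. It assumes a MaryTTS 5.x server is already running on localhost at its default port 59125 and uses the HTTP `/process` request parameters (`INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `AUDIO`, `LOCALE`); the helper names `build_process_url` and `synthesize_to_wav` are hypothetical and not part of MaryTTS.

```python
import urllib.parse
import urllib.request

# Assumption: a MaryTTS 5.x server is listening on its default local port.
MARY_SERVER = "http://localhost:59125"


def build_process_url(text, locale="en_US", server=MARY_SERVER):
    """Build a /process request URL asking MaryTTS for a WAV file (hypothetical helper)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{server}/process?{query}"


def synthesize_to_wav(text, path, locale="en_US"):
    """Render text once via the server and save the returned audio bytes to path."""
    with urllib.request.urlopen(build_process_url(text, locale)) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

With the server running, `synthesize_to_wav("Welcome to Hacker Public Radio.", "welcome.wav")` writes a file that any player can replay without synthesizing it again.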

Comment #7 posted on 2014-11-26 23:24:36 by Steve Bickle

Horses for courses

Mike,

I am not a TTS developer either, but I agree that eSpeak is a fantastic piece of code. As someone who started programming on the ZX81 and Atari 400, I can appreciate compact code.

The eSpeak voice is intelligible and, so I'm led to believe, can still be understood at high speeds.

The eSpeak voice is generally not aesthetically pleasing to those less familiar with TTS. I think Ken was looking to the MaryTTS voices to find something more appealing to the general listener.

Having had a little more time to play with MaryTTS, I can now appreciate that, although seemingly more natural, some elements of the voices are less intelligible at times. This may be the clipping you referred to on the mailing list (I don't know, because I don't really have a vocabulary to describe TTS voice quality). What I have noticed is that there are two types of MaryTTS voices and, contrary to expectations, the ones with the larger data set appear to be less intelligible.

Which if any of the Mary voices are the clearest/cleanest?

The goals of eSpeak and MaryTTS are somewhat different; the Mary project appears to be a university research project. Having had a bit of a dig around in the MaryTTS code, I've found that it includes a lot of tools for recording and creating voices. There is also a whole range of effects processing and other tools to amend the vocal output model. It's definitely not a lightweight TTS solution, but I don't think that was ever the intention.

I did notice that eSpeak can create static WAV files using the -w switch, so it probably wins there too.
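To illustrate the `-w` switch, this Python sketch builds and runs an eSpeak command that writes speech to a WAV file instead of the sound card. It assumes the `espeak` binary is on the `PATH`; `-v` (voice) and `-s` (speed in words per minute, 175 by default) are standard eSpeak options, while the function names `espeak_wav_command` and `espeak_to_wav` are hypothetical.

```python
import shutil
import subprocess


def espeak_wav_command(text, path, voice="en", speed=175):
    """Return the argv that makes eSpeak write speech to a WAV file (hypothetical helper)."""
    # -v selects the voice, -s the speed in words per minute,
    # and -w <file> writes the audio to a WAV file instead of playing it.
    return ["espeak", "-v", voice, "-s", str(speed), "-w", path, text]


def espeak_to_wav(text, path, **kwargs):
    """Run eSpeak if it is installed; return True once it has exited successfully."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_wav_command(text, path, **kwargs), check=True)
    return True
```

Calling `espeak_to_wav("Hello from eSpeak", "hello.wav")` is the equivalent of running `espeak -v en -s 175 -w hello.wav "Hello from eSpeak"` in a shell.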

Where MaryTTS or similar projects may win out over eSpeak would be to provide a more suitable voice to those who rely on speech synthesis to be able to speak. I recently heard this TED talk. https://www.ted.com/talks/rupal_patel_synthetic_voices_as_unique_as_fingerprints . The voices featured here appear to be a great improvement over MaryTTS, but I don't know what software they are using or if it is open source.

Comment #8 posted on 2014-11-26 23:29:30 by Steve Bickle

Maryspeak project now on github

Just wanted to add a quick note to the episode to say that the maryspeak project is now on GitHub, along with the documentation in markdown files, at https://github.com/scbickle/maryspeak

Comment #9 posted on 2014-11-29 00:31:12 by Mike Ray

maryspeak, great stuff

Hello Steve. Great stuff again with maryspeak. I've cloned it from GitHub, and at the moment I can't get any speech out of it, but I suspect that's a permissions issue or something. Which user does maryspeak run as? If it runs as the user that executes the maryspeak command, I would expect sound if that user belongs to the 'audio' group. I will solve it, though, because I am sure it is something I have not done.
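One way to test the 'audio' group theory is to ask which groups the current process actually belongs to. The Python sketch below does that on Linux with the standard `grp` and `os` modules; the helper name `current_user_in_group` is hypothetical and not part of maryspeak.

```python
import grp
import os


def current_user_in_group(name):
    """Return True if the running process belongs to the named group (hypothetical helper)."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    # Check both the primary group and the supplementary groups of this process.
    return gid == os.getgid() or gid in os.getgroups()
```

The check reflects the groups of the running session, so after adding a user to 'audio' (for example with `usermod -aG audio USER`) they normally need to log out and back in before `current_user_in_group("audio")` returns True.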

I will pass this stuff on to Fernando of the F123 project, because he has asked me if I can produce a MaryTTS speech-dispatcher module, and maryspeak may make it easy to hack the espeak-generic module into a marytts-generic one.

On the subject of eSpeak; I suspect some folks have problems with languages other than English. Certainly Fernando says it is hard to understand when it is speaking Porteugese (I probably spelt that wrong).

I guess this is quite possible, since I doubt Jonathan Duddington is a polyglot :)

Nice to see that the maryspeak repo also contains the MaryTTS Debian howto.

Thanks again.

Mike


Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes every weekday Monday through Friday.
This page was generated by The HPR Robot at Sat, 27 Apr 2024 06:23:36 +0000