A way to generate .mp3 files from ePUB.
I wanted to continue "reading" even after my eyes got tired or when I stepped away from the screen. For these cases, I would select a passage and choose "Speech". If you've used this before, you know it's fine for small sections, but not for big chunks of text.
So, I challenged myself to create a project that would "text-to-speech" a given book. After some attempts, I came up with a good-enough initial version using Golang. It parses an ePUB into sections and runs macOS's `say` command to generate audio files.
For now it's quite basic, but it worked fine for the few files I tested.
What does this do for longer texts? I have been using Android TTS, but it produces unpleasant audio when it comes to conveying feelings (prosody). NotebookLM, although very limiting, produces very lively audio.
Longer texts work the same, I'd say. Of course, they take longer to TTS. I've listened to 2 whole books already and it felt pretty good. I've never used NotebookLM, so I can't say how they differ.
Sometimes it gets a bit messy around punctuation, but I've never had to go back and listen again.
If I interpreted the "prosody" part correctly, then there is some intonation, but it doesn't mimic emotions, so it's very basic. For example, it will slightly differentiate "Shut up" from "Shut up!" (with an exclamation mark).
Did you build this for Christmas?
Because I did. For the kids. Not exactly the same:
1. Put in any ebook, in any language. 2. LLM translates it into German. 3. XTTS turns it into audio.
Devil’s in the details, of course.
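The three steps can be pictured as a small per-chapter pipeline. This is only a sketch of the shape: `Translator`, `Synthesizer`, `convert`, and the stub implementations are hypothetical names invented here, and the stubs stand in for the real LLM and XTTS calls, which are not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// Translator is a hypothetical interface; a real one would wrap an LLM API.
type Translator interface {
	Translate(text, targetLang string) (string, error)
}

// Synthesizer is a hypothetical interface; a real one would wrap XTTS.
type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// convert translates each chapter and then turns it into audio,
// stopping at the first error and reporting which chapter failed.
func convert(chapters []string, tr Translator, tts Synthesizer, lang string) ([][]byte, error) {
	audio := make([][]byte, 0, len(chapters))
	for i, ch := range chapters {
		translated, err := tr.Translate(ch, lang)
		if err != nil {
			return nil, fmt.Errorf("chapter %d: translate: %w", i+1, err)
		}
		clip, err := tts.Synthesize(translated)
		if err != nil {
			return nil, fmt.Errorf("chapter %d: synthesize: %w", i+1, err)
		}
		audio = append(audio, clip)
	}
	return audio, nil
}

// tagTranslator is a stub that only prefixes the text with the language tag.
type tagTranslator struct{}

func (tagTranslator) Translate(text, lang string) (string, error) {
	return "[" + lang + "] " + text, nil
}

// byteSynth is a stub that returns the upper-cased text as fake audio bytes.
type byteSynth struct{}

func (byteSynth) Synthesize(text string) ([]byte, error) {
	return []byte(strings.ToUpper(text)), nil
}

func main() {
	clips, err := convert([]string{"Hello", "World"}, tagTranslator{}, byteSynth{}, "de")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(clips), "clips")
}
```

Keeping translation and synthesis behind interfaces makes it easy to swap either step, for example to test the flow without calling any paid service.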
I’m usually very picky about translations. Many books that I’ve read in English and wanted to gift to someone in my family have turned out so rotten in their translation that their “soul” was lost. Against that background, I am very pleased with the results.
Cost: About $0.20 per book. A bit more if it’s Asimov’s New Guide to Science.
That's cool, especially for that cost.
I think if the ePUB and the OS are in the same language, it should (?) work, although I haven't tested it.
ElevenLabs does it for free (for now) and it works pretty well. Multiple voices are available and some are high quality.
I did look for some options at the time, but wasn't satisfied by them, especially because of cost. I even found a ~6 GB container that would supposedly do this. But in the end, I already had the tools to produce the audio, so I just needed to orchestrate it.
Another way of doing this is the 'Record TTS' facility of Librera Reader, a book reader for Android that is available on F-Droid (no iOS version, as far as I know). It can use the TTS service on Android to narrate any format Librera supports into audio files.