The Flight of a Wild Duck

Here is an update on my quest to create an audiobook using a voice clone.


As I wrote earlier, many people requested an audio version of my book, The Flight of a Wild Duck. I was surprised to learn that many people prefer audiobooks. Audiobooks are the only segment of the book publishing industry seeing growth.  

While I wanted to produce an audio version of my book, I did not care for the voice to be that of some hired actor. I wanted it to be my voice. However, I could not bring myself to read 320 pages. Even if I read it without stopping or correcting anything, it would take 10 hours. I am not skilled at reading out loud, so I am sure I would have to do some editing and retakes.

So, I was very excited when I started experimenting with creating a clone of my voice. I first did that on Descript, the tool I have been using to produce videos and podcasts. It is amazing. Descript converts speech to text, which you can scroll through and edit. For instance, if you delete a sentence from the transcript, the video and audio for that sentence are also deleted. I cloned my voice so I could use its “overdub” capability, which works well for correcting a word or inserting something small.

Descript and the five other AI voice clone apps I tried can speak in my voice, but they lack my phrasing, emotion, and so on. They are a poor substitute. I do not doubt that sometime soon, an AI voice clone will be able to mimic my style as well. It will probably be indistinguishable from me in five years or less.

However, I discovered a way to get close now when I tried the Ukrainian company Respeecher. They sent me an example: the audiobook Amplifying Our Humanity through AI by Reid Hoffman, the founder of LinkedIn.

Here is how the process works. An actor studies audio of the author whose voice will be cloned. Then the actor reads the book, imitating the author's style. Respeecher takes many samples of the author’s voice (this can be done using audio that has already been recorded, for instance, podcasts). Then, they perform voice-to-voice cloning: they change the sound of the actor's voice to the author's but keep all the expression, like pauses, sighs, and laughs. This process can be done in multiple languages as well. You need an actor to read in each language, and of course, the book must first be translated into each of those languages.

Unfortunately for me, the whole process is rather expensive. I estimated that producing the audiobook this way would cost more than $10,000, including hiring the actor. Sadly, I have sold only 500 copies of my book so far. If I sold 100 audiobooks, each would cost $100 to produce. So, I will wait until the cost of the cloning work comes down and the technology has advanced to the point that I do not need an actor.
