Author: Axel Fernández Curros

Guide to transforming text into AI voice audio. Continuing from the Whisper audio-to-text and text-to-text translate.py script in the previous post, we create a script that converts any given text into an audio file spoken in our own voice, using a 10-second .wav recording as input. The script has been modified to accept a large number of phrases, so long texts can be processed without affecting the audio quality.

Record your own voice

The clip should be about 8-10 seconds long and of good quality. Audacity can be used to record and export the voice. Remember to export it in .wav format.
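
Before running the cloning script, it can help to confirm that the exported clip really falls in the 8-10 second range. The following is a minimal sketch using only Python's standard `wave` module. The function names `wav_duration_seconds` and `is_suitable_clip` are hypothetical helpers, not part of the repository, and the sketch assumes the file was exported as PCM WAV (for example 16-bit PCM), since the `wave` module does not read 32-bit float WAV files.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_suitable_clip(path, min_seconds=8.0, max_seconds=10.0):
    """Check that the clip length is inside the suggested range."""
    duration = wav_duration_seconds(path)
    return min_seconds <= duration <= max_seconds
```

For example, `is_suitable_clip("audio.wav")` returns `True` when the recording is between 8 and 10 seconds long.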

Install python dependencies

  • Linux
    pip3 install gitpython
    pip3 install gdown
    
  • Windows (chocolatey)
    pip install gitpython
    pip install gdown
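
To verify that both packages were installed into the Python environment you intend to use, a quick check like the one below may help. It is only a sketch: `missing_modules` is a hypothetical helper name, and it relies on the fact that the gitpython package is imported as `git` while gdown is imported as `gdown`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # gitpython installs the "git" module; gdown installs "gdown".
    missing = missing_modules(["git", "gdown"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```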
    

Clone repository

git clone https://github.com/Axlfc/UE5-python
cd UE5-python/Content/Python

Run the python script

  • Linux
    python3 voice_cloning.py "Whatever I write here is going to be read with the same voice" audio.wav
    
  • Windows
    python voice_cloning.py "Whatever I write here is going to be read with the same voice" audio.wav
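
The introduction notes that the script accepts a large number of phrases so that long texts do not degrade the audio. One plausible way to prepare such input is to split the text at sentence boundaries and group sentences into short phrases. The sketch below illustrates that idea only; it is not taken from voice_cloning.py, and the function name `split_into_phrases` and the `max_chars` limit of 200 are assumptions chosen for illustration.

```python
import re


def split_into_phrases(text, max_chars=200):
    """Group sentences into phrases of at most roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    phrases = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The current phrase is full; start a new one.
            phrases.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        phrases.append(current)
    return phrases
```

Each resulting phrase is short enough to be synthesized on its own, which is the usual reason for chunking long text before speech synthesis.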
    
