Tuesday, October 15, 2024

Train F5-TTS Voice Model on Custom Dataset for Free Locally - Step by Step Tutorial

This video is a step-by-step tutorial on fine-tuning or fully training the F5-TTS and E2-TTS voice models locally on your own custom voice dataset.





Code:
git clone https://github.com/SWivid/F5-TTS.git && cd F5-TTS

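# Download the base checkpoint into ckpts/F5TTS_Base, then return to the repo root
mkdir -p ckpts/F5TTS_Base && cd ckpts/F5TTS_Base

wget -O model_1200000.safetensors "https://huggingface.co/SWivid/F5-TTS/resolve/main/F5TTS_Base/model_1200000.safetensors?download=true"

cd ../..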

-- In train.py, at line 75, make sure the path points to your model's directory.
-- In model/trainer.py, at line 94, make sure the path points to your model's directory.
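
Before editing those paths, it can help to confirm that the checkpoint file is where you expect it. The following is a minimal sketch: the helper name `checkpoint_exists` is hypothetical and not part of F5-TTS, and the demo uses a temporary directory as a stand-in for the real checkpoint folder.

```python
import tempfile
from pathlib import Path


def checkpoint_exists(ckpt_dir, filename):
    """Return True if ckpt_dir/filename is a non-empty regular file."""
    path = Path(ckpt_dir) / filename
    return path.is_file() and path.stat().st_size > 0


# Demo with a throwaway directory standing in for ckpts/F5TTS_Base
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_1200000.safetensors").write_bytes(b"\x00" * 16)
    found = checkpoint_exists(tmp, "model_1200000.safetensors")
    missing = checkpoint_exists(tmp, "some_other_file.pt")

print(found, missing)
```

With the real checkout, you would call it from the repository root as `checkpoint_exists("ckpts/F5TTS_Base", "model_1200000.safetensors")`.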


conda create -n ai python=3.11 -y && conda activate ai

pip install torch torchaudio
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install huggingface_hub
pip install pandas datasets
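
After installing, a quick check that the packages are importable can save a failed training run later. This is a small sketch using only the standard library; the helper name `missing_packages` is made up for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names matching the pip installs above
required = ["torch", "torchaudio", "transformers", "accelerate",
            "huggingface_hub", "pandas", "datasets"]
print("Missing:", missing_packages(required) or "none")
```

Run it inside the activated conda environment so it checks the same interpreter that training will use.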

# Python: download the Emilia dataset from Hugging Face and save it to disk
from datasets import load_dataset

dataset = load_dataset("amphion/Emilia-Dataset")

dataset.save_to_disk("/home/Ubuntu/mydataset/emilia_subset")
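
The snippet above loads the whole dataset even though the output folder is named as a subset. If you only want the first few examples, one possible approach is to iterate lazily and stop early. The sketch below shows that idea on stand-in data; `take_subset` is a hypothetical helper, and the note about streaming is an assumption about how you might feed it real data.

```python
from itertools import islice


def take_subset(examples, n):
    """Return the first n items from any iterable of examples."""
    return list(islice(examples, n))


# Stand-in data. With the datasets library, the iterable could instead come from
# load_dataset(..., streaming=True), which yields examples lazily
# (the split name to pass is an assumption you would need to check).
fake_stream = ({"id": i, "text": f"utterance {i}"} for i in range(1000))
subset = take_subset(fake_stream, 5)
print(len(subset))
```

Because the generator is consumed lazily, only the first five items are ever produced.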

# Prepare a custom dataset according to your needs.
# Download the corresponding dataset first and fill in its path in the scripts. You can also tailor your own dataset along with a Dataset class in model/dataset.py.

# Prepare the Emilia dataset
python scripts/prepare_emilia.py

# Prepare the Wenetspeech4TTS dataset
python scripts/prepare_wenetspeech4tts.py
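
If you are tailoring your own dataset rather than using Emilia or Wenetspeech4TTS, the core of it is usually a list of audio files paired with transcripts. The sketch below is purely illustrative: `VoiceClipIndex` is a hypothetical class, not the Dataset class in the F5-TTS repository, and the `file name|transcript` metadata layout is an assumption.

```python
import tempfile
from pathlib import Path


class VoiceClipIndex:
    """Hypothetical (audio path, transcript) index; not the F5-TTS Dataset class."""

    def __init__(self, metadata_file, audio_root):
        self.audio_root = Path(audio_root)
        self.rows = []
        for line in Path(metadata_file).read_text(encoding="utf-8").splitlines():
            if "|" not in line:
                continue  # skip blank or malformed lines
            name, text = line.split("|", 1)
            self.rows.append((name.strip(), text.strip()))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        name, text = self.rows[idx]
        return {"audio_path": str(self.audio_root / name), "text": text}


# Demo with a temporary metadata file (assumed layout: file name|transcript)
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "metadata.csv"
    meta.write_text("clip_0001.wav|Hello there.\n\nclip_0002.wav|Second line.\n",
                    encoding="utf-8")
    index = VoiceClipIndex(meta, Path(tmp) / "wavs")
    first = index[0]

print(len(index), first["text"])
```

A real training dataset would also load and preprocess the audio; this sketch only covers the bookkeeping of paths and text.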

Training
Once your datasets are prepared, you can start the training process.

# Set up the accelerate config, e.g. multi-GPU DDP, fp16
# The config is saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
accelerate launch train.py
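
To double-check what `accelerate config` wrote before launching, you can print the saved settings. The following sketch assumes the default cache location mentioned above and reads only simple top-level `key: value` lines, so it is not a full YAML parser; both helper names are hypothetical.

```python
from pathlib import Path


def accelerate_config_path(home=None):
    """Build the default config path under the given (or current user's) home."""
    home = Path(home) if home is not None else Path.home()
    return home / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"


def read_top_level_keys(path):
    """Parse only simple top-level 'key: value' lines; not a full YAML parser."""
    settings = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        # Skip blanks, comments, list items, and indented (nested) lines
        if not line or line[0] in " #-" or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


cfg = accelerate_config_path()
if cfg.exists():
    for key, value in read_top_level_keys(cfg).items():
        print(f"{key}: {value}")
else:
    print(f"No config at {cfg}; run `accelerate config` first.")
```

If the file exists, look for settings such as the mixed precision mode and number of processes to confirm they match your hardware.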
