This video is a step-by-step tutorial on fine-tuning, or fully training, the F5-TTS and E2-TTS voice models on your own custom voice dataset, locally.
Code:
git clone https://github.com/SWivid/F5-TTS.git && cd F5-TTS
cd ckpts
mkdir F5TTS_Base
wget "https://huggingface.co/SWivid/F5-TTS/resolve/main/F5TTS_Base/model_1200000.safetensors?download=true" -O model_1200000.safetensors
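Before editing any paths, it is worth confirming the download completed correctly. The sketch below is a quick, assumption-level sanity check that a file really is in safetensors format: per the safetensors spec, the first 8 bytes are a little-endian length of a JSON header that follows.

```python
import json
import struct
from pathlib import Path

def check_safetensors(path: str) -> bool:
    """Return True if the file starts with a valid safetensors JSON header."""
    data = Path(path).read_bytes()
    # first 8 bytes: little-endian uint64 length of the JSON header
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8 : 8 + header_len])
    return isinstance(header, dict)
```

If `check_safetensors("ckpts/F5TTS_Base/model_1200000.safetensors")` raises or returns False, the download is likely truncated or was saved as an HTML error page, and should be repeated.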
# In train.py, on line 75, make sure the path points to your model's checkpoint directory
# In models/trainer.py, on line 94, make sure the path points to your model's checkpoint directory
conda create -n ai python=3.11 -y && conda activate ai
pip install torch torchaudio
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install huggingface_hub
pip install pandas datasets
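After the installs above, a quick importability check (a minimal sketch, nothing F5-TTS-specific) confirms the environment is ready before moving on:

```python
import importlib.util

# packages installed in the steps above
required = ["torch", "torchaudio", "transformers", "accelerate",
            "huggingface_hub", "datasets", "pandas"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```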
from datasets import load_dataset

# Download the Emilia dataset and save a local copy
dataset = load_dataset("amphion/Emilia-Dataset")
dataset.save_to_disk("/home/Ubuntu/mydataset/emilia_subset")
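If you are training on your own voice instead of Emilia, you will need (audio, transcript) pairs. The sketch below assumes a simple layout — a `wavs/` folder with a matching `.txt` transcript next to each `.wav` — which is an assumption of this example, not the layout F5-TTS mandates; adapt it to whatever the Dataset class in model/dataset.py expects.

```python
from pathlib import Path

def collect_pairs(root: str) -> list[tuple[str, str]]:
    """Gather (wav_path, transcript) pairs from root/wavs/*.wav + *.txt."""
    pairs = []
    for wav in sorted(Path(root, "wavs").glob("*.wav")):
        txt = wav.with_suffix(".txt")  # assumed: transcript sits beside the audio
        if txt.is_file():
            pairs.append((str(wav), txt.read_text().strip()))
    return pairs
```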
# Prepare a custom dataset to suit your needs:
# download the corresponding dataset first and fill in its path in the scripts below;
# you can also tailor your own dataset, along with a Dataset class in model/dataset.py.
# Prepare the Emilia dataset
python scripts/prepare_emilia.py
# Prepare the Wenetspeech4TTS dataset
python scripts/prepare_wenetspeech4tts.py
Training
Once your datasets are prepared, you can start the training process.
# set up the accelerate config, e.g. multi-GPU DDP with fp16
# the config will be saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
accelerate launch train.py
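For reference, a default_config.yaml produced by `accelerate config` for a 2-GPU fp16 DDP run looks roughly like the fragment below; the exact values (number of processes, GPU ids) are illustrative and depend on the answers you give the interactive prompt.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: fp16
num_machines: 1
num_processes: 2   # one process per GPU; adjust to your hardware
gpu_ids: all
use_cpu: false
```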