Conversation

@dathudeptrai (Collaborator) commented Nov 24, 2020

This PR is an implementation of the HiFi-GAN vocoder (https://arxiv.org/abs/2010.05646). The training process follows melgan_stft. The model logic follows the original HiFi-GAN PyTorch code (https://github.com/jik876/hifi-gan).

@dathudeptrai dathudeptrai marked this pull request as draft November 24, 2020 05:12
@dathudeptrai dathudeptrai marked this pull request as ready for review November 24, 2020 07:17
@dathudeptrai dathudeptrai self-assigned this Nov 24, 2020
@dathudeptrai dathudeptrai added the "new feature" and "enhancement 🚀 New feature or request" labels Nov 24, 2020
@machineko (Contributor) previously approved these changes Nov 24, 2020

LGTM

@ZDisket (Collaborator) previously approved these changes Nov 24, 2020

looks good

@dathudeptrai (Collaborator, Author)

@machineko @ZDisket can you guys try training for around 2k steps to verify that it works :))) I don't have a GPU right now to test :))). There are so many differences between the private library and this open-source version :v.

@machineko (Contributor)

> @machineko @ZDisket can you guys try training for around 2k steps to verify that it works :))) I don't have a GPU right now to test :))). There are so many differences between the private library and this open-source version :v.

Same here, no GPU available at the moment, but I will test in 2-3 days 📦

@machineko (Contributor)

@dathudeptrai Did you want to use the v2 or v1 config for it?

@ZDisket (Collaborator) commented Nov 25, 2020

@dathudeptrai

> can you guys try training for around 2k steps to verify that it works :))) I don't have a GPU right now to test :))). There are so many differences between the private library and this open-source version :v.

I could train 4k steps and counting with the v1 config and mixed precision without problems. I even got eval samples at 5k.

@dathudeptrai (Collaborator, Author)

> @dathudeptrai Did you want to use the v2 or v1 config for it?

v2, for faster training :D
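For context: in the upstream jik876/hifi-gan configs, v1 and v2 share the same upsampling layout and differ mainly in `upsample_initial_channel` (512 in v1 vs 128 in v2), which is why v2 is the lighter, faster variant. A minimal sketch of the resulting generator channel widths (figures taken from the official configs; treat this as an illustration, not the repo's actual code):

```python
def channel_progression(upsample_initial_channel, num_upsamples=4):
    """Channel width entering each of the generator's upsampling stages.

    HiFi-GAN halves the channel count at every transposed-conv
    upsampling stage, starting from `upsample_initial_channel`.
    """
    return [upsample_initial_channel // (2 ** i) for i in range(num_upsamples + 1)]

print(channel_progression(512))  # v1 config -> [512, 256, 128, 64, 32]
print(channel_progression(128))  # v2 config -> [128, 64, 32, 16, 8]
```

The roughly 16x reduction in channel products is where most of v2's speedup over v1 comes from.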

@dathudeptrai (Collaborator, Author)

> @dathudeptrai
>
> > can you guys try training for around 2k steps to verify that it works :))) I don't have a GPU right now to test :))). There are so many differences between the private library and this open-source version :v.
>
> I could train 4k steps and counting with the v1 config and mixed precision without problems. I even got eval samples at 5k.

Is the loss OK? Can you try to continue training both G and D for around 1k steps? :D

@ZDisket (Collaborator) commented Nov 25, 2020

@dathudeptrai For some reason the loss exploded after 10k steps, and the eval samples are either noise or silence, although I think it's just because of the small dataset. I'm going to restart training and train the discriminator from 0 steps.

(attached TensorBoard screenshots: "ta", "ta2")

@dathudeptrai dathudeptrai dismissed stale reviews from ZDisket and machineko via e1ff1ec November 25, 2020 02:41
@ZDisket (Collaborator) commented Nov 25, 2020

> can you try to continue training both G and D for around 1k steps :D

Completed 2k steps of G+D starting from 0 steps. No problems.

@dathudeptrai dathudeptrai deleted the hifigan branch November 25, 2020 03:30
@lesswrongzh
@ZDisket could you share your HiFi-GAN TensorBoard curves, like this?

(attached screenshot)

@EmreOzkose
I also have this problem. My TensorBoard:

(attached screenshot from 2021-04-05)

and the predictions are all the same noisy sound. For example:

(attached audio example)

What could be the problem? I first trained the generator alone and then resumed training with the discriminator.

@ZDisket (Collaborator) commented Apr 5, 2021

@EmreOzkose The TensorFlowTTS implementation is not faithful to the original when it comes to the optimizer. The official implementation uses the AdamW optimizer with an ExponentialLR schedule, while the one in this repo uses Adam with PiecewiseConstantDecay. Also, there is no generator pretraining in the original.
https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/train.py#L63-L72
A while back I implemented some changes to make it match, and it didn't die during training, but I didn't train it to completion, so I haven't evaluated it enough to warrant a PR. Still, the .zip I'm attaching has the training script and a v2-based 44.1 kHz config with all the changes. See if it helps.
hf.zip
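To make the schedule difference concrete, here is a minimal pure-Python sketch of the two learning-rate behaviors (not code from either repo; the base lr of 2e-4 and per-epoch decay of 0.999 come from the official HiFi-GAN config, while the piecewise boundaries and values below are only illustrative):

```python
def exponential_lr(base_lr, gamma, epoch):
    """PyTorch-style ExponentialLR: lr shrinks by a factor of gamma every epoch."""
    return base_lr * gamma ** epoch

def piecewise_constant_lr(boundaries, values, step):
    """Keras-style PiecewiseConstantDecay: lr is held flat between boundaries,
    then drops abruptly once a boundary step is passed."""
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    return values[-1]

# Official HiFi-GAN: AdamW + ExponentialLR(gamma=0.999), stepped once per
# epoch -- the lr glides down smoothly.
for epoch in (0, 1, 100):
    print(epoch, exponential_lr(2e-4, 0.999, epoch))

# PiecewiseConstantDecay with illustrative boundaries/values -- the lr stays
# flat for long stretches, then halves in abrupt jumps.
for step in (0, 150_000, 300_000):
    print(step, piecewise_constant_lr([100_000, 200_000],
                                      [5e-4, 2.5e-4, 1.25e-4], step))
```

The smooth per-epoch decay (plus AdamW's decoupled weight decay and the original betas of (0.8, 0.99)) is one plausible reason the official recipe is more stable late in training than the stepwise schedule.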

@EmreOzkose
I'm checking it out, thank you @ZDisket.

@EmreOzkose
> @EmreOzkose The TensorFlowTTS implementation is not faithful to the original when it comes to the optimizer. The official implementation uses the AdamW optimizer with an ExponentialLR schedule, while the one in this repo uses Adam with PiecewiseConstantDecay. Also, there is no generator pretraining in the original.
> https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/train.py#L63-L72
> A while back I implemented some changes to make it match, and it didn't die during training, but I didn't train it to completion, so I haven't evaluated it enough to warrant a PR. Still, the .zip I'm attaching has the training script and a v2-based 44.1 kHz config with all the changes. See if it helps.
> hf.zip

I tried training with the same setup and now get distinct signals instead of noise. Thank you 😃

@ZDisket (Collaborator) commented Apr 20, 2021

@EmreOzkose Any updates? Did it do well?
