[Community] Training AutoencoderKL

Hi!

I am working on latent diffusion for audio and music. It seems to me that Diffusers 🧨 is the place to be! There is a feature I would like to request: Training AutoencoderKL (Variational Autoencoder).

What I would love to do is train AutoencoderKL on square and non-square images with one or more channels. I checked the implementation, and because it is fully convolutional, this should be perfectly possible.

A good start would be a script or notebook that shows how to train AutoencoderKL on a Hugging Face dataset. In the long run, it could even become a Trainer.


[Community] Training AutoencoderKL #894
