I'm getting the following error when I try to run the fine-tuning example:

RuntimeError: CUDA out of memory. Tried to allocate 85.00 MiB (GPU 0; 4.00 GiB total capacity; 3.04 GiB already allocated; 9.21 MiB free; 15.31 MiB cached)

Reducing the batch_size didn't help.