Skip to content

Crash if --processors > 8 #2521

@jrosener

Description

@jrosener

I am writing an application that is able to transcribe multiple audio in parallel using the same model.
For that I use one common whisper_context for multiple whisper_state used by worker threads where transcriptions processing are performed with whisper_full_with_state().
It works perfectly until 8 parallel transcriptions but crashes into whisper_full_with_state() if running more transcriptions.

Because this implementation is based on whisper_full_parallel() used by the main sample application, it is possible to reproduce the issue by running it using more than 8 --processors: ./build/bin/main --model ggml-tiny.bin --processors 9 10min_audio_french.wav
It does not matter if it is running on cpu, openvino, cuda,...: it always crashes.

Results:

$ ./build/bin/main --model ggml-tiny.bin --processors 9 10min_audio_french.wav 
whisper_init_from_file_with_params_no_state: loading model from 'ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB

system_info: n_threads = 36 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '10min_audio_french.wav' (9600000 samples, 600.0 sec), 4 threads, 9 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   13.19 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   95.89 MB
Segmentation fault (core dumped)

Questions:

  • Are 8 parallel transcriptions (so 8 running whisper_state for 1 common context) a known limitation ?
  • If yes, is there a way to increase that limitation ?
  • Or is it a bug ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions