
Obtaining an embedding vector for a larger text #2712

@s-trooper

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I would like to obtain an embedding vector for larger texts, e.g. 4K, 8K or more.

Current Behavior

I get an error:
ggml_new_object: not enough space in the context's memory pool (needed 12747504, available 12747472)

Environment and Context

When I create a text file with 197 lines of "Hello World", like:

Hello World
Hello World
...

I get the embedding vector as expected.
However, when I add just one more line, I get the error "not enough space in the context's memory pool", even though less than 15% of my RAM/VRAM is in use!

I know there are many issues related to this error, but I haven't found any solution for embeddings.
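For anyone hitting the same limit: a common workaround (not a fix for the underlying allocation error) is to split the input into chunks that fit within the context window, embed each chunk separately, and mean-pool the per-chunk vectors into one embedding. A minimal sketch; the `embed_chunk` callable is a hypothetical stand-in for whatever embedding backend you call (e.g. a wrapper around embedding.exe), and a real implementation should chunk by token count rather than by characters:

```python
from typing import Callable, List

def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into pieces no longer than max_chars.
    (A real implementation would split on token counts, not characters.)"""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def embed_long_text(text: str,
                    embed_chunk: Callable[[str], List[float]],
                    max_chars: int = 1024) -> List[float]:
    """Embed each chunk separately, then mean-pool into a single vector."""
    chunks = chunk_text(text, max_chars)
    vectors = [embed_chunk(c) for c in chunks]
    dim = len(vectors[0])
    # Average component-wise across all chunk embeddings.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Mean-pooling loses some information compared to a true long-context embedding, but it keeps every chunk within the model's `n_ctx` and avoids the memory-pool overflow entirely.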

  • Physical (or virtual) hardware you are using, e.g. for Linux:

    • CPU i7-9700K @ 3.60GHz
    • GPU RTX 3090 TI VRAM 24 GB
    • RAM 80 GB
  • Operating System, e.g. for Linux:

    • Windows 11

Failure Information (for bugs)

ggml_new_object: not enough space in the context's memory pool (needed 12747504, available 12747472)

Steps to Reproduce

  1. Create a text file named "text-of-2367-bytes.txt" containing at least 198 lines of "Hello World".
  2. .\llama-master-cb1c072-bin-win-cublas-cu11.7.1-x64\embedding.exe -ngl 80 -c 2048 -m .\models\wizard-vicuna-13b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin -f .\text-of-2367-bytes.txt
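Step 1 can be scripted. A small sketch that writes the repro file; the byte count assumes LF line endings, so it will differ from the 2367 bytes in the filename if the original file was saved with Windows CRLF endings:

```python
def write_hello_file(path: str, lines: int = 198) -> int:
    """Write `lines` repetitions of "Hello World" and return bytes written.

    197 lines reportedly succeed; 198 triggers the memory-pool error.
    newline="\n" forces LF endings even on Windows, for a stable byte count.
    """
    text = "Hello World\n" * lines
    with open(path, "w", newline="\n") as f:
        f.write(text)
    return len(text)
```

Varying the `lines` argument around the 197/198 boundary makes it easy to confirm exactly where the failure begins on a given build.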

Failure Logs

Example run of the Windows embedding command:

.\llama-master-cb1c072-bin-win-cublas-cu11.7.1-x64\embedding.exe -ngl 80 -c 2048 -m .\models\wizard-vicuna-13b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin -f .\text-of-2367-bytes.txt 
main: build = 1010 (cb1c072)
main: seed  = 1692704725
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6
llama.cpp: loading model from .\models\wizard-vicuna-13b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_head_kv  = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 15 (mostly Q4_K - Medium)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.11 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required  =  582.00 MB (+ 1600.00 MB per state)
llama_model_load_internal: allocating batch_size x (640 kB + n_ctx x 160 B) = 480 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 40 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 43/43 layers to GPU
llama_model_load_internal: total VRAM used: 9493 MB
llama_new_context_with_model: kv self size  = 1600.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
ggml_new_object: not enough space in the context's memory pool (needed 12747504, available 12747472)
