v2.3.2 update made some large models stop working due to Out Of Memory #183

@felladrin

Description

For the record, the v2.3.2 update (#179) made some large models (2.3 GB+: Gemma 3 4B, Qwen 3 4B, and Llama 3.1 Nemotron Nano 4B, all at Q4_K_S with a 4096-token context) stop working due to Out Of Memory errors.

I'm not sure if it's because llama.cpp started requiring more memory for those models.

Current workaround options:

- Reduce the context size
- Downgrade to a Q3 quant
- Downgrade @wllama/wllama to v2.3.1
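For the third workaround, the downgrade can be pinned in `package.json` so the package manager doesn't pull v2.3.2 back in on the next install (a minimal sketch; the exact dependency block depends on your project):

```json
{
  "dependencies": {
    "@wllama/wllama": "2.3.1"
  }
}
```

Note the exact version (`2.3.1` rather than `^2.3.1`), which prevents semver ranges from resolving to the affected v2.3.2 release.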
