v2.3.2 update made some large models stop working due to Out Of Memory #183

@felladrin

Description

For the record, the v2.3.2 update (#179) made some large models (2.3 GB+: Gemma 3 4B, Qwen 3 4B, and Llama 3.1 Nemotron Nano 4B, all at Q4_K_S with a 4096-token context) stop working due to Out Of Memory errors.

I'm not sure if it's because llama.cpp started requiring more memory for those models.

Current workaround options:

- Reduce the context size
- Downgrade to a Q3 quant
- Downgrade @wllama/wllama to v2.3.1
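For the third workaround, the downgrade can be pinned in `package.json` so the package manager doesn't pull v2.3.2 back in on the next install (a minimal sketch; the exact dependency block depends on your project):

```json
{
  "dependencies": {
    "@wllama/wllama": "2.3.1"
  }
}
```

Note the exact version (`2.3.1` rather than `^2.3.1`), which prevents semver ranges from resolving to the affected v2.3.2 release.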
