Commit 722e29d

feat: gpt-oss support (#487)
* feat: `gpt-oss` support
* fix: Qwen3 memory estimation

1 parent: ea0d815

28 files changed: +3129 −237 lines

.vitepress/theme/index.ts

Lines changed: 3 additions & 3 deletions

```diff
@@ -19,9 +19,9 @@ import type {EnhanceAppContext} from "vitepress";
 export default {
     extends: Theme,
     Layout: () => {
-        const text = "DeepSeek R1 is here!";
-        const link = "/blog/v3.6-deepseek-r1";
-        const hideDate = new Date("2025-06-01T00:00:00Z");
+        const text = "gpt-oss is here!";
+        const link = "/blog/v3.12-gpt-oss";
+        const hideDate = new Date("2025-11-01T00:00:00Z");
 
         return h(LayoutContainer, null, h(Theme.Layout, null, {
             "home-hero-info-before": () => h(LatestVersionHomeBadge, {
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@
 </div>
 
-[DeepSeek R1 is here!](https://node-llama-cpp.withcat.ai/blog/v3.6-deepseek-r1)
+[`gpt-oss` is here!](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss)
 
 ## Features
 * Run LLMs locally on your machine
```

docs/blog/v3.12-gpt-oss.md

Lines changed: 142 additions & 0 deletions
New file:

---
title: gpt-oss is here!
date: 2025-08-09T18:00:00Z
lastUpdated: false
author:
    name: Gilad S.
    github: giladgd
category: Release
description: Learn how to use gpt-oss to its full potential with node-llama-cpp
image:
    url: https://github.com/user-attachments/assets/df5f1f59-a2cd-4fdb-b60c-3214f4a1584b
    alt: "node-llama-cpp + gpt-oss"
    width: 3072
    height: 1536
---
[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) v3.12 is here, with full support for [`gpt-oss`](https://huggingface.co/openai/gpt-oss-20b) models!

---

## gpt-oss
[`gpt-oss`](https://huggingface.co/openai/gpt-oss-20b) comes in two flavors:
* [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) - 21B parameters with 3.6B active parameters
* [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) - 117B parameters with 5.1B active parameters

Here are a few highlights of these models:
* Due to their low number of active parameters, these models are very fast
* They are reasoning models, and you can adjust their reasoning effort
* They are very good at function calling, and are built with agentic capabilities in mind
* They were trained with native MXFP4 precision, so there's no need to quantize them further; they're already small relative to their capabilities
* They are released under the Apache 2.0 license, so you can use them in your commercial applications

## Recommended Models
Here are some recommended model URIs you can use to try out `gpt-oss` right away:

| Model | Size | URI |
|---|---|---|
| [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF) | 12.1GB | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` |
| [`gpt-oss-120b`](https://huggingface.co/giladgd/gpt-oss-120b-GGUF) | 63.4GB | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf` |

::: info TIP
[Estimate the compatibility](../cli/inspect/estimate.md) of a model with your machine before downloading it:
```shell
npx -y node-llama-cpp inspect estimate <model URI>
```
:::

### Try It Using the CLI
To quickly try out [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF), you can use the [CLI `chat` command](../cli/chat.md):

```shell
npx -y node-llama-cpp chat --ef --prompt "Hi there" hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```

## Customizing gpt-oss
You can adjust `gpt-oss`'s responses by configuring the options of [`HarmonyChatWrapper`](../api/classes/HarmonyChatWrapper.md):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    HarmonyChatWrapper
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";


const llama = await getLlama();
const model = await llama.loadModel({
    // download the model file if it's not present, and resolve its local path
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
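
To stream the response as it's being generated, you can pass an `onTextChunk` handler to `session.prompt()`. A minimal sketch, reusing the `session` from the example above (the follow-up question is just an illustration):
```typescript
// print each chunk of the response as soon as it's generated
const a2 = await session.prompt("And in New York?", {
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
```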

### Using Function Calling
`gpt-oss` models have great support for function calling.
However, these models don't support parallel function calling, so only one function will be called at a time.

```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";


const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
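
Since `gpt-oss` is a reasoning model, you may also want to inspect its chain of thought while it responds. A minimal sketch, assuming the `session` and `functions` from the example above and using segmented response streaming via the `onResponseChunk` option:
```typescript
// print thought segments separately from the regular response text
const a2 = await session.prompt("And in New York?", {
    functions,
    onResponseChunk(chunk) {
        if (chunk.type === "segment" && chunk.segmentType === "thought")
            process.stdout.write(`[thought] ${chunk.text}`);
        else if (chunk.type == null)
            process.stdout.write(chunk.text);
    }
});
```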
