---
title: gpt-oss is here!
date: 2025-08-09T18:00:00Z
lastUpdated: false
author:
  name: Gilad S.
  github: giladgd
category: Release
description: Learn how to use gpt-oss to its full potential with node-llama-cpp
image:
  url: https://github.com/user-attachments/assets/df5f1f59-a2cd-4fdb-b60c-3214f4a1584b
  alt: "node-llama-cpp + gpt-oss"
  width: 3072
  height: 1536
---
[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) v3.12 is here, with full support for [`gpt-oss`](https://huggingface.co/openai/gpt-oss-20b) models!

---

## gpt-oss
[`gpt-oss`](https://huggingface.co/openai/gpt-oss-20b) comes in two flavors:
* [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) - 21B total parameters, of which 3.6B are active
* [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) - 117B total parameters, of which 5.1B are active

Here are a few highlights of these models:
* Thanks to their low number of active parameters, these models are very fast
* They are reasoning models, and you can adjust their reasoning effort
* They are very good at function calling, and are built with agentic capabilities in mind
* They were trained with native MXFP4 precision, so there's no need to quantize them further;
  they're already small relative to their capabilities
* They are released under the Apache 2.0 license, so you can use them in your commercial applications


## Recommended Models
Here are some recommended model URIs you can use to try out `gpt-oss` right away:

| Model                                                              | Size   | URI                                                                   |
|--------------------------------------------------------------------|--------|-----------------------------------------------------------------------|
| [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF)   | 12.1GB | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf`                  |
| [`gpt-oss-120b`](https://huggingface.co/giladgd/gpt-oss-120b-GGUF) | 63.4GB | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf` |

::: info TIP
[Estimate the compatibility](../cli/inspect/estimate.md) of a model with your machine before downloading it:
```shell
npx -y node-llama-cpp inspect estimate <model URI>
```
:::
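
You can also download a model ahead of time instead of on first use. For example, the CLI's `pull` command can fetch the `gpt-oss-20b` model from the table above into a local directory:

```shell
npx -y node-llama-cpp pull --dir ./models hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```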


### Try It Using the CLI
To quickly try out [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF), you can use the [CLI `chat` command](../cli/chat.md):

```shell
npx -y node-llama-cpp chat --ef --prompt "Hi there" hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```


## Customizing gpt-oss
You can adjust `gpt-oss`'s responses by configuring the options of [`HarmonyChatWrapper`](../api/classes/HarmonyChatWrapper.md):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    HarmonyChatWrapper
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
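
Since `gpt-oss` is a reasoning model, its responses contain thought segments alongside the final answer. Here's a minimal sketch of streaming the response while marking segment boundaries, using the `onResponseChunk` option and reusing the `session` from the snippet above:

```typescript
process.stdout.write("AI: ");
const a2 = await session.prompt("What is 81 * 14?", {
    onResponseChunk(chunk) {
        // a new segment (such as a chain of thought) is starting
        if (chunk.type === "segment" && chunk.segmentStartTime != null)
            process.stdout.write(` [segment start: ${chunk.segmentType}] `);

        process.stdout.write(chunk.text);

        // the current segment has ended
        if (chunk.type === "segment" && chunk.segmentEndTime != null)
            process.stdout.write(` [segment end: ${chunk.segmentType}] `);
    }
});
process.stdout.write("\n");
```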

### Using Function Calling
`gpt-oss` models have great support for function calling.
However, they don't support parallel function calling, so only one function is called at a time.

```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
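
When the model calls `getCurrentWeather`, the value returned from the handler is passed back to the model, which then uses it to compose its final answer. The `params` schema both constrains the arguments the model can generate and provides the types of the `handler` parameters.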