feat: use the best compute layer available by default #175
Conversation
LGTM
LGTM
🎉 This PR is included in version 3.0.0-beta.13 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 3.0.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Description of change
- `inspect` command
- `GemmaChatWrapper`
- `TemplateChatWrapper` - easier method to create simple chat wrappers, see the type docs for more info
- `llama.cpp` breaking change

Fixes #160
Fixes #169
How to use `node-llama-cpp` after this change
`node-llama-cpp` will now detect the available compute layers on the system and use the best one by default. If the best one fails to load, it'll try the next best option until it manages to load the bindings.
To use this logic, just use `getLlama` without specifying the compute layer:
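A minimal sketch of that usage, assuming an ESM module where top-level await is available:

```typescript
import {getLlama} from "node-llama-cpp";

// No `gpu` option is passed, so node-llama-cpp detects the compute layers
// available on this machine and loads the best one it can.
const llama = await getLlama();
```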
To force it to load a specific compute layer, you can use the `gpu` parameter on `getLlama`:
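A sketch of forcing a specific layer; `"cuda"` is only an illustrative value here, and loading will fail if that layer isn't available on the machine:

```typescript
import {getLlama} from "node-llama-cpp";

// Force a specific compute layer instead of relying on automatic detection.
const llama = await getLlama({
    gpu: "cuda"
});
```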
To inspect what compute layers are detected in your system, you can run this command:
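The exact invocation isn't shown here, but based on the new `inspect` command listed above it would presumably look something like this:

```shell
# List the compute layers node-llama-cpp detects on this machine
npx node-llama-cpp inspect gpu
```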
Pull-Request Checklist
- Code is up-to-date with the `master` branch
- `npm run format` to apply eslint formatting
- `npm run test` passes with this change
- This pull request links relevant issues as `Fixes #0000`