
Conversation

@giladgd giladgd (Member) commented Mar 1, 2024

Description of change

  • feat: detect the available compute layers on the system and use the best one by default
  • feat: add more guardrails against loading an incompatible prebuilt binary, to prevent process crashes due to Linux distro differences
  • feat: improve logs to explain why system-related issues occur and how to fix them
  • feat: add an inspect command
  • feat: add GemmaChatWrapper
  • feat: add TemplateChatWrapper - an easier way to create simple chat wrappers; see the type docs for more info
  • fix: adapt to a breaking change in llama.cpp
  • fix: when a specific compute layer is requested, fail the build if it is not found
  • fix: return user-defined llama tokens
  • docs: update more docs to prepare for version 3.0

Fixes #160
Fixes #169

How to use node-llama-cpp after this change

node-llama-cpp will now detect the available compute layers on the system and use the best one by default.
If the best one fails to load, it'll try the next best option until it manages to load the bindings.

To use this logic, just call getLlama without specifying a compute layer:

import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
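
If you want to log which compute layer was actually picked, here's a minimal sketch; it assumes the returned Llama instance exposes a gpu property reporting the loaded compute layer (an assumption on my part, not something stated above):

import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// assumption: `llama.gpu` reports the compute layer that was actually loaded,
// e.g. "cuda", "vulkan", "metal", or false when running CPU-only
console.log("Loaded compute layer:", llama.gpu);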

To force it to load a specific compute layer, you can use the gpu parameter on getLlama:

import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan" // defaults to `"auto"`. can also be `"cuda"` or `false` (to not use the GPU at all)
});
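
Since requesting a specific compute layer fails when it cannot be loaded, you can combine the two forms to require a GPU but handle the fallback yourself. A minimal sketch using only the options shown above:

import {getLlama} from "node-llama-cpp";

let llama;
try {
    // require CUDA; this is expected to throw if the CUDA bindings cannot be loaded
    llama = await getLlama({gpu: "cuda"});
} catch (error) {
    // fall back to CPU-only bindings
    llama = await getLlama({gpu: false});
}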

To inspect which compute layers are detected on your system, run this command:

npx --no node-llama-cpp inspect gpu

If this command fails to find CUDA or Vulkan even though calling getLlama with gpu set to one of them works, please open an issue so I can investigate it.
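
For completeness, here is a minimal end-to-end sketch of the auto-detection flow feeding into model loading, based on the v3 beta API; the model path is a placeholder you should point at a local GGUF file:

import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama(); // picks the best available compute layer automatically
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder - point this at a local GGUF model file
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

console.log(await session.prompt("Hi there, how are you?"));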

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd giladgd requested a review from ido-pluto March 1, 2024 23:29
@giladgd giladgd self-assigned this Mar 1, 2024
@giladgd giladgd linked an issue Mar 2, 2024 that may be closed by this pull request
@giladgd giladgd mentioned this pull request Mar 2, 2024
@giladgd giladgd linked an issue Mar 2, 2024 that may be closed by this pull request
@ido-pluto ido-pluto (Contributor) left a comment

LGTM

@ido-pluto ido-pluto (Contributor) left a comment

LGTM

@giladgd giladgd merged commit 5a70576 into beta Mar 3, 2024
@giladgd giladgd deleted the gilad/autoGpuDetection branch March 3, 2024 21:46

github-actions bot commented Mar 3, 2024

🎉 This PR is included in version 3.0.0-beta.13 🎉

The release is available on:

Your semantic-release bot 📦🚀

@giladgd giladgd mentioned this pull request Mar 3, 2024
@giladgd giladgd added this to the v3.0.0 milestone Mar 3, 2024
@giladgd giladgd mentioned this pull request Mar 16, 2024

github-actions bot commented Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

  • EOS token is not detected properly for some models after upgrading to v3.0
  • Fail to run in docker image
2 participants