GGML_ASSERT: /Users/runner/work/node-llama-cpp/node-llama-cpp/llama/llama.cpp/llama.cpp:5052: n_tokens <= n_batch
#94
-
Hi there, I'm getting the assertion failure in the title. I can work around it by using […]
I could put together a better repro of this, but I'm wondering: has anyone else run into this, or does anyone have suggestions on what to change? Thanks for any help, I appreciate it!
Replies: 1 comment
-
There's currently an issue with prompts that are longer than the `batchSize`; it'll be fixed as part of #85. For a workaround for now, see #76.
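For context on what the assertion means: `n_tokens` is the number of tokens submitted to a single llama.cpp eval call, and `n_batch` is the configured batch size, so the assert fires whenever one call receives more tokens than the batch can hold. The general fix is to split the prompt into chunks of at most `batchSize` tokens per call. A minimal sketch of that chunking idea in plain TypeScript (this is illustrative only, not node-llama-cpp's actual API — the function name and token representation are assumptions):

```typescript
// Illustrative sketch, not node-llama-cpp's real API.
// llama.cpp asserts n_tokens <= n_batch for every eval call, so splitting
// the prompt tokens into chunks of at most `batchSize` keeps that invariant.
function chunkTokens(tokens: number[], batchSize: number): number[][] {
    if (batchSize <= 0)
        throw new Error("batchSize must be positive");

    const batches: number[][] = [];
    for (let i = 0; i < tokens.length; i += batchSize)
        batches.push(tokens.slice(i, i + batchSize));

    return batches;
}

// Example: a 10-token prompt with batchSize 4 is evaluated as 4 + 4 + 2 tokens,
// so no single call ever exceeds the batch size.
const batches = chunkTokens([...Array(10).keys()], 4);
console.log(batches.map((b) => b.length)); // [4, 4, 2]
```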