I would like to include my own system prompt file when starting Llamafile.
I have tried both -spf FNAME and --system-prompt-file FNAME; neither of them works.
How did I verify?
I simply ask the assistant its name, and it should reply "My name is Lisa".
After starting Llamafile, it always says its name is Nova.
Here is the simple command I issued:
pi@raspberrypi:~/Downloads/llamafile $ ./Llama-3.2-1B-Instruct.Q6_K.llamafile --system-prompt-file ./myPrompt.json --verbose
Here is my prompt file content:
{
  "system_prompt": {
    "prompt": "Your name is Lisa. You are a helpful, kind, and honest assistant. You provide short, concise, and useful answers. *Before answering, carefully consider the question and the available information. Ensure your response is accurate and well-reasoned.* If you are unsure of the answer or do not know the answer, say 'I do not know' or 'I am unsure.' Do not fabricate or invent information.",
    "anti_prompt": "User:",
    "assistant_name": "Lisa:"
  }
}
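As a cross-check that does not depend on the prompt file at all, the same instructions can be sent as an explicit system message through the server's OpenAI-compatible chat endpoint once Llamafile is running. This is only a sketch: I am assuming the default http://127.0.0.1:8080/v1/chat/completions address corresponding to the server URL in the log, and the "model" field is just a placeholder name:

# ask the model its name, with the Lisa instructions supplied as a system message
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct.Q6_K",
    "messages": [
      {"role": "system", "content": "Your name is Lisa. You are a helpful, kind, and honest assistant."},
      {"role": "user", "content": "What is your name?"}
    ]
  }'

If the reply introduces itself as Lisa here but the web chat still says Nova, that would suggest the --system-prompt-file flag (or my JSON format) is not being picked up, rather than the model ignoring system prompts.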
Version
llamafile 0.9.0
What operating system are you seeing the problem on?
Linux
Relevant log output
██╗ ██╗ █████╗ ███╗ ███╗ █████╗ ███████╗██╗██╗ ███████╗
██║ ██║ ██╔══██╗████╗ ████║██╔══██╗██╔════╝██║██║ ██╔════╝
██║ ██║ ███████║██╔████╔██║███████║█████╗ ██║██║ █████╗
██║ ██║ ██╔══██║██║╚██╔╝██║██╔══██║██╔══╝ ██║██║ ██╔══╝
███████╗███████╗██║ ██║██║ ╚═╝ ██║██║ ██║██║ ██║███████╗███████╗
╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
llama_model_loader: loaded meta data with 28 key-value pairs and 147 tensors from Llama-3.2-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.size_label str = 1.2B
llama_model_loader: - kv 3: general.license str = llama3.2
llama_model_loader: - kv 4: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 5: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 6: llama.block_count u32 = 16
llama_model_loader: - kv 7: llama.context_length u32 = 131072
llama_model_loader: - kv 8: llama.embedding_length u32 = 2048
llama_model_loader: - kv 9: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 14: llama.attention.key_length u32 = 64
llama_model_loader: - kv 15: llama.attention.value_length u32 = 64
llama_model_loader: - kv 16: general.file_type u32 = 18
llama_model_loader: - kv 17: llama.vocab_size u32 = 128256
llama_model_loader: - kv 18: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 19: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 20: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 21: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 22: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 23: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 24: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 26: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 27: general.quantization_version u32 = 2
llama_model_loader: - type f32: 34 tensors
llama_model_loader: - type q6_K: 113 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 16
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 1.24 B
llm_load_print_meta: model size = 967.00 MiB (6.56 BPW)
llm_load_print_meta: general.name = n/a
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: CPU buffer size = 967.00 MiB
.............................................................
INFO [ server_cli] build info | build=1500 commit="a30b324" tid="546162797472" timestamp=1739868432
INFO [ server_cli] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |" tid="546162797472" timestamp=1739868432 total_threads=4
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 544.01 MiB
llama_new_context_with_model: graph nodes = 518
llama_new_context_with_model: graph splits = 1
INFO [ initialize] initializing slots | n_slots=1 tid="546162797472" timestamp=1739868432
INFO [ initialize] new slot | n_ctx_slot=8192 slot_id=0 tid="546162797472" timestamp=1739868432
INFO [ server_cli] model loaded | tid="546162797472" timestamp=1739868432
llama server listening at http://127.0.0.1:8080
software: llamafile 0.9.0
model: Llama-3.2-1B-Instruct.Q6_K.gguf
INFO [ server_cli] HTTP server listening | hostname="127.0.0.1" port="8080" tid="546162797472" timestamp=1739868432 url_prefix=""
compute: Raspberry Pi 4 Model B Rev 1.2
server: http://127.0.0.1:8080/
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 256
llama_new_context_with_model: n_ubatch = 256
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
VERB [ start_loop] new task may arrive | tid="546162797472" timestamp=1739868432
VERB [ start_loop] callback_all_task_finished | tid="546162797472" timestamp=1739868432
INFO [ update_slots] updating system prompt | tid="546162797472" timestamp=1739868432
system prompt updated
VERB [ start_loop] wait for new task | tid="546162797472" timestamp=1739868433
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 272.00 MiB
llama_new_context_with_model: graph nodes = 518
llama_new_context_with_model: graph splits = 1
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
>>> what's your name?
Nice to meet you! My name is Nova, and I'm an AI assistant here to help answer any questions you may have. I'm a large language model, which means I've been trained on a vast amount of text data to provide accurate and helpful responses. I'm here to assist you with any topics you'd like to discuss, from science and history to entertainment and culture. How can I help you today?
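One more check I can run, if it helps triage: tracing file opens while the server starts should show whether ./myPrompt.json is read at all (a rough sketch, assuming strace is installed on the Pi):

# watch for the prompt file being opened during startup
strace -f -e trace=openat ./Llama-3.2-1B-Instruct.Q6_K.llamafile --system-prompt-file ./myPrompt.json --verbose 2>&1 | grep myPrompt

If no openat call for myPrompt.json shows up, the flag is presumably being ignored before the file is ever touched.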