from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v3-1" # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)  # stream generated tokens to stdout as they are produced
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)  # 4-bit weight-only quantization
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
print(outputs)
yields
2024-03-27 02:12:43 [INFO] Using Neural Speed...
2024-03-27 02:12:43 [INFO] cpu device is used.
2024-03-27 02:12:43 [INFO] Applying Weight Only Quantization.
2024-03-27 02:12:43 [INFO] Using LLM runtime.
cmd: ['python', PosixPath('/usr/local/lib/python3.10/dist-packages/neural_speed/convert/convert_mistral.py'), '--outfile', 'runtime_outs/ne_mistral_f32.bin', '--outtype', 'f32', 'Intel/neural-chat-7b-v3-1']
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-40dcb74a8701> in <cell line: 10>()
8 streamer = TextStreamer(tokenizer)
9
---> 10 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
11 outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
12 print(outputs)
1 frames
/usr/local/lib/python3.10/dist-packages/neural_speed/__init__.py in init(self, model_name, use_quant, use_gptq, use_awq, use_autoround, weight_dtype, alg, group_size, scale_dtype, compute_dtype, use_ggml)
129 if not os.path.exists(fp32_bin):
130 convert_model(model_name, fp32_bin, "f32")
--> 131 assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
132
133 if not use_quant:
AssertionError: Fail to convert pytorch model
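For anyone debugging this further, the conversion command printed in the log can be rerun on its own to surface the underlying error that the assertion message hides; a minimal sketch, reusing the cmd and paths from the log above (they are specific to this Colab environment and may differ on other machines):

import subprocess

# Re-run the convert step that AutoModelForCausalLM triggered internally
# (command copied from the "cmd:" line in the log above).
cmd = [
    "python",
    "/usr/local/lib/python3.10/dist-packages/neural_speed/convert/convert_mistral.py",
    "--outfile", "runtime_outs/ne_mistral_f32.bin",
    "--outtype", "f32",
    "Intel/neural-chat-7b-v3-1",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)  # non-zero when the conversion fails
print(result.stderr)      # the actual error, e.g. a missing dependency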
Hi, this issue seems to have the same root cause as #193: pip install neural_speed does not install all of the packages listed in requirements.txt. We are working on a fix; in the meantime, you can run pip install -r requirements.txt as a quick workaround. Thanks.
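If it helps, a minimal sketch of that quick fix, assuming the requirements.txt meant here is the one at the root of the neural-speed source repository (the repository URL below is an assumption, not stated in this thread):

git clone https://github.com/intel/neural-speed.git
pip install -r neural-speed/requirements.txt

Assuming missing dependencies are the only problem, re-running the snippet above should then get past the "Fail to convert pytorch model" assertion.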
This is using the example code only.