This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Loading checkpoint shards takes too long #251

Open

irjawais opened this issue May 9, 2024 · 2 comments
irjawais commented May 9, 2024

When I load the "meta-llama/Meta-Llama-3-8B-Instruct" model like this:

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # Hugging Face model_id or local model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
```

the process hangs, and the only way to recover is to restart the instance.

Is there an issue with my spec?

My instance spec: Ubuntu, 32 GB RAM.

irjawais (Author) commented May 9, 2024

```
warnings.warn(
Loading checkpoint shards:  75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████          | 3/4 [01:53<00:37, 37.72s/it]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 593, in from_pretrained
    model.init(  # pylint: disable=E1123
  File "/usr/local/lib/python3.10/dist-packages/neural_speed/__init__.py", line 182, in init
    assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
AssertionError: Fail to convert pytorch model
```
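
The assertion says the fp32 intermediate file that neural_speed writes before quantization (`fp32_bin`) was never produced, which points at the conversion step dying rather than merely being slow. A back-of-the-envelope sizing check (illustrative arithmetic, not taken from the logs) shows how tight fp32 conversion of an 8B model is on a 32 GB instance:

```python
# Back-of-the-envelope sizing (illustrative, not measured): an 8B-parameter
# model materialized in fp32 during conversion needs roughly
# 8e9 params * 4 bytes/param, before the source checkpoint and Python overhead.
n_params = 8e9
fp32_gib = n_params * 4 / 1024**3
print(f"fp32 weights alone: {fp32_gib:.1f} GiB")  # ~29.8 GiB of a 32 GB box
```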

intellinjun (Contributor) commented

@irjawais
Can you check the memory usage while the model is being converted? From your description, it seems there may be insufficient memory.
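
One way to do that check is to sample memory from a background thread while the load runs. A minimal sketch, assuming `psutil` is installed (everything here apart from the `from_pretrained` call from the original snippet is illustrative):

```python
# Minimal memory-monitoring sketch (assumes `pip install psutil`).
# Samples this process's RSS and system-wide available memory every few
# seconds in a daemon thread while the model loads and converts.
import threading

import psutil

def log_memory(stop_event: threading.Event, interval_s: float = 5.0) -> None:
    proc = psutil.Process()
    while not stop_event.is_set():
        rss_gib = proc.memory_info().rss / 1024**3
        avail_gib = psutil.virtual_memory().available / 1024**3
        print(f"process RSS: {rss_gib:.1f} GiB | system available: {avail_gib:.1f} GiB")
        stop_event.wait(interval_s)

stop = threading.Event()
threading.Thread(target=log_memory, args=(stop,), daemon=True).start()

# ... run the AutoModelForCausalLM.from_pretrained(...) call here ...

stop.set()
```

If system-available memory collapses toward zero just before the hang, the Linux OOM killer is likely terminating the conversion, which would leave `fp32_bin` missing and trigger the assertion above.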
