Generate Llama 2 from Embeddings #72

Open
liechtym opened this issue Jan 8, 2024 · 5 comments

@liechtym

liechtym commented Jan 8, 2024

Compiling and loading Llama 2 in Neuron is working great for me on an inf2.8xlarge with the new release 2.16.

However, I have a unique use case where I need to be able to input embeddings directly into Llama 2 instead of token ids. I need to be able to generate the embeddings, modify the embeddings, and then use the embeddings for generation. I was already able to generate the embeddings separately via llama_model.chkpt_model.model.embed_tokens(token_ids). However, I'm not seeing a way to plug those embeddings into the model once I've modified them.

It seems to me that LlamaForSampling.sample() (from transformers_neuronx.llama.model) probably can't do this (correct me if I'm wrong). I got TypeError: sample() got an unexpected keyword argument 'inputs_embeds' when I tried.

So I tried using the HuggingFaceGenerationModelAdapter from transformers_neuronx.generation_utils to get access to the Hugging Face generation API, as is done in this GPT-2 example. However, that failed with an error, for which I filed an issue in the transformers repo.

What is the best way to go about doing this? I really appreciate your help.
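
The embedding step described above (look up embeddings, then modify them before generation) can be sketched on CPU roughly as follows. This is only an illustration under stated assumptions: a small torch.nn.Embedding stands in for llama_model.chkpt_model.model.embed_tokens, the sizes are made up, and the "modification" (prepending two extra vectors) is a hypothetical example rather than what the original code does. It does not touch the Neuron model at all.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for llama_model.chkpt_model.model.embed_tokens.
# The sizes are illustrative, not Llama 2's real dimensions.
vocab_size, hidden_size = 32, 8
embed_tokens = nn.Embedding(vocab_size, hidden_size)

token_ids = torch.tensor([[1, 5, 9, 3]])  # shape: (batch, seq_len)

with torch.no_grad():
    embeds = embed_tokens(token_ids)  # shape: (batch, seq_len, hidden_size)

# Example modification (an assumption): prepend two extra vectors,
# e.g. features coming from another encoder, in front of the token embeddings.
extra = torch.zeros(1, 2, hidden_size)
modified = torch.cat([extra, embeds], dim=1)

print(embeds.shape, modified.shape)
```

The open question in this issue is the next step: passing a tensor like `modified` into the compiled model in place of token ids.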

@liechtym
Author

liechtym commented Jan 10, 2024

In the transformers repo, they said the HuggingFaceGenerationModelAdapter incompatibility error probably stems from the transformers-neuronx wrapper. Any help with this?

Here is the error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new

See more details on the issue page: huggingface/transformers#28396.

Of course, my general goal is simply to get this working with input embeddings, so if this is not the right route, let me know.

@shebbur-aws

Hi @liechtym, we do not have support for external embeddings. One way you could potentially get around this is by replacing the model embedding weights directly. Please let us know if that helps.
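
As a rough illustration of what "replacing the model embedding weights directly" could mean, the sketch below overwrites rows of an embedding table so that chosen token ids map to custom vectors. It is a sketch under assumptions, not a confirmed transformers-neuronx recipe: a plain torch.nn.Embedding stands in for the model's embedding layer, and `reserved_ids` is a hypothetical set of token ids whose original embeddings can be sacrificed. The thread does not confirm whether or when such a change reaches the compiled Neuron model; presumably it would have to be applied to the CPU checkpoint weights before they are loaded onto the device.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the model's embedding layer (illustrative sizes).
vocab_size, hidden_size = 32, 8
embed_tokens = nn.Embedding(vocab_size, hidden_size)

# Custom vectors we want the model to see, one per sequence position.
custom = torch.randn(3, hidden_size)

# Hypothetical choice: token ids whose rows we are willing to overwrite.
reserved_ids = torch.tensor([29, 30, 31])

# Overwrite the selected rows in place, without tracking gradients.
with torch.no_grad():
    embed_tokens.weight[reserved_ids] = custom

# Feeding the reserved ids as ordinary token ids now yields the custom vectors.
token_ids = reserved_ids.unsqueeze(0)  # shape: (1, 3)
out = embed_tokens(token_ids)          # shape: (1, 3, hidden_size)
print(torch.equal(out[0], custom))
```

A limitation of this approach is that the number of custom vectors is bounded by how many token ids can be reserved, and the overwritten tokens lose their original meaning.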

@liechtym
Author

@shebbur-aws Thanks for your reply. A workaround is totally fine for me. Would you be able to give a quick explanation or example of how to replace the embedding weights and run the forward pass on the rest of the model?

@liechtym
Author

Could I get help on this, @shebbur-aws?

@davidshtian

@liechtym @shebbur-aws Hi~ I've got the same situation here: I need to pass input embeddings as the model input instead of input ids. Do you have any resolution or workaround for this? Thanks~
