
Refactor OV weight compression call inside from_pretrained #683

Conversation

nikita-savelyevv
Collaborator

@nikita-savelyevv nikita-savelyevv commented Apr 23, 2024

What does this PR do?

Address #618 (comment)

Changes:

  • Make the save_directory parameter of the OVQuantizer.quantize() method optional
  • Move calibration dataset assembly logic from OVModelForCausalLM to OVQuantizer

In a similar fashion, I plan to move the SD dataset collection logic to OVQuantizer in a future PR.
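The refactor described above can be illustrated with a small self-contained sketch. All class and method names below are hypothetical stand-ins, not the actual optimum-intel API: the point is only that calibration-dataset assembly moves out of the model class into the quantizer, and that save_directory becomes optional.

```python
# Illustrative sketch of the refactor, with stand-in classes.
# Not the real OVModelForCausalLM / OVQuantizer implementation.

class Model:
    """Stands in for the model class: it no longer owns dataset logic."""
    def forward(self, sample):
        return sample * 2

class Quantizer:
    """Stands in for the quantizer: it now owns dataset assembly."""
    def __init__(self, model):
        self.model = model

    def _build_calibration_dataset(self, raw_data, num_samples=128):
        # Dataset assembly now lives in the quantizer, not the model.
        return raw_data[:num_samples]

    def quantize(self, raw_data, save_directory=None):
        # save_directory is optional: when None, nothing is exported.
        dataset = self._build_calibration_dataset(raw_data, num_samples=2)
        outputs = [self.model.forward(s) for s in dataset]
        if save_directory is not None:
            pass  # a real implementation would export artifacts here
        return outputs

quantizer = Quantizer(Model())
print(quantizer.quantize([1, 2, 3]))  # -> [2, 4]
```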

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review April 25, 2024 10:45
@nikita-savelyevv
Collaborator Author

@AlexKoff88 @echarlaix could you please take a look?

@nikita-savelyevv nikita-savelyevv marked this pull request as draft April 25, 2024 16:07
@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review April 25, 2024 20:09
Collaborator

@echarlaix echarlaix left a comment


Looks great, thanks @nikita-savelyevv

Comment on lines +335 to +342
    from optimum.gptq.data import get_dataset, prepare_dataset

    tokenizer = AutoTokenizer.from_pretrained(quantization_config.tokenizer)
    nsamples = quantization_config.num_samples if quantization_config.num_samples else 128
    calibration_dataset = get_dataset(
        quantization_config.dataset, tokenizer, seqlen=32, nsamples=nsamples
    )
    calibration_dataset = prepare_dataset(calibration_dataset)
Collaborator


this should be done for every OVModel, no?

Collaborator Author

@nikita-savelyevv nikita-savelyevv Apr 26, 2024


This particular part is for OVModelForCausalLM only. First, it employs the GPTQ dataset creation logic, which is applicable to LLMs only. Second, self.model is required to have a prepare_inputs method, which is specific to OVModelForCausalLM.

In theory we could extend this part to other model classes. There is some logic for the SD model class, and I plan to migrate it to OVQuantizer in a future PR. There's also the get_calibration_dataset method; maybe it should actually go there, or be extended to multiple model types. I'll need to think about it.

For other model types there is no such logic in the codebase at the moment, if I'm not mistaken, so I'm not yet sure about those. Maybe we could add it in the future.

Collaborator


Yes, I think it makes sense to make it available for other OVModels and also to extend get_calibration_dataset, but this can be done in a follow-up PR!

Collaborator


Also, we could add a warning that the dataset config argument will be ignored for models that are not instances of OVModelForCausalLM.

Comment on lines -648 to -655
    elif config.dataset is not None and isinstance(config.dataset, str):
        tokenizer = AutoTokenizer.from_pretrained(config.tokenizer)

        from optimum.gptq.data import get_dataset, prepare_dataset

        nsamples = config.num_samples if config.num_samples else 128
        dataset = get_dataset(config.dataset, tokenizer, seqlen=32, nsamples=nsamples)
        dataset = prepare_dataset(dataset)
Collaborator


_weight_only_quantization is still used here and here; we might need to update these places as well to ensure compatibility.

Collaborator Author


Yep, I did want to do that, but the difference there is that only a raw openvino.runtime.Model is available, not an instance of transformers.PreTrainedModel, and the latter is required to initialize OVQuantizer. We could extend OVQuantizer to accept an instance of openvino.runtime.Model, but that would be a rather serious API change.
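If OVQuantizer were ever extended that way, one possible shape is a type dispatch at the entry point. The sketch below uses stand-in classes for illustration only; it does not reproduce the real transformers or openvino types.

```python
# Hypothetical sketch: one quantizer entry point accepting either a
# high-level wrapped model or a raw low-level model. All classes here
# are stand-ins, not actual transformers / openvino types.

class PreTrainedModelStub:
    """Stands in for transformers.PreTrainedModel (wraps a raw model)."""
    def __init__(self, raw):
        self.raw = raw

class RawModelStub:
    """Stands in for openvino.runtime.Model."""

def make_quantizer(model):
    # Dispatch on the model type so both entry points share one code path.
    if isinstance(model, PreTrainedModelStub):
        return ("wrapped", model.raw)
    if isinstance(model, RawModelStub):
        return ("raw", model)
    raise TypeError(f"Unsupported model type: {type(model).__name__}")

raw = RawModelStub()
print(make_quantizer(PreTrainedModelStub(raw))[0])  # -> wrapped
print(make_quantizer(raw)[0])                       # -> raw
```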

@echarlaix echarlaix merged commit c235ae1 into huggingface:main Apr 29, 2024
11 checks passed

4 participants