
Refactor OV weight compression call inside from_pretrained #683

Conversation

nikita-savelyevv
Collaborator

@nikita-savelyevv nikita-savelyevv commented Apr 23, 2024

What does this PR do?

Address #618 (comment)

Changes:

  • Make the save_directory parameter of the OVQuantizer.quantize() method optional
  • Move calibration dataset assembly logic from OVModelForCausalLM to OVQuantizer

In a similar fashion, I plan to move the SD dataset collection logic to OVQuantizer in a future PR.
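The refactor described above can be illustrated with a small self-contained sketch. All class and method names below are hypothetical stand-ins, not the actual optimum-intel API: the point is only that calibration-dataset assembly moves out of the model class into the quantizer, and that save_directory becomes optional.

```python
# Illustrative sketch of the refactor, with stand-in classes.
# Not the real OVModelForCausalLM / OVQuantizer implementation.

class Model:
    """Stands in for the model class: it no longer owns dataset logic."""
    def forward(self, sample):
        return sample * 2

class Quantizer:
    """Stands in for the quantizer: it now owns dataset assembly."""
    def __init__(self, model):
        self.model = model

    def _build_calibration_dataset(self, raw_data, num_samples=128):
        # Dataset assembly now lives in the quantizer, not the model.
        return raw_data[:num_samples]

    def quantize(self, raw_data, save_directory=None):
        # save_directory is optional: when None, nothing is exported.
        dataset = self._build_calibration_dataset(raw_data, num_samples=2)
        outputs = [self.model.forward(s) for s in dataset]
        if save_directory is not None:
            pass  # a real implementation would export artifacts here
        return outputs

quantizer = Quantizer(Model())
print(quantizer.quantize([1, 2, 3]))  # -> [2, 4]
```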

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review April 25, 2024 10:45
@nikita-savelyevv
Collaborator Author

@AlexKoff88 @echarlaix could you please take a look?

@nikita-savelyevv nikita-savelyevv marked this pull request as draft April 25, 2024 16:07
@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review April 25, 2024 20:09
Collaborator

@echarlaix echarlaix left a comment


Looks great, thanks @nikita-savelyevv

Comment on lines +335 to +342
    from optimum.gptq.data import get_dataset, prepare_dataset

    tokenizer = AutoTokenizer.from_pretrained(quantization_config.tokenizer)
    nsamples = quantization_config.num_samples if quantization_config.num_samples else 128
    calibration_dataset = get_dataset(
        quantization_config.dataset, tokenizer, seqlen=32, nsamples=nsamples
    )
    calibration_dataset = prepare_dataset(calibration_dataset)
Collaborator


this should be done for every OVModel, no?

Collaborator Author

@nikita-savelyevv nikita-savelyevv Apr 26, 2024


This particular part is for OVModelForCausalLM only. First, it employs the GPTQ dataset creation logic, which is applicable to LLMs only. Second, self.model is required to have a prepare_inputs method, which is specific to OVModelForCausalLM.

In theory we could extend this part to other model classes. There is some logic for the SD model class, and I plan to migrate it to OVQuantizer in a future PR. There's also the get_calibration_dataset method; maybe it should actually go there, or be extended to multiple model types. I'll need to think about it.

For other model types there is no such logic in the codebase at the moment, if I'm not mistaken, so I'm not yet sure about those. Maybe we could add it in the future.

Collaborator


Yes, I think it makes sense to make it available for other OVModels and also to extend get_calibration_dataset, but this can be done in a follow-up PR!

Collaborator


Also, we could add a warning that the dataset config argument will be ignored for models that are not instances of OVModelForCausalLM.

Comment on lines -648 to -655
    elif config.dataset is not None and isinstance(config.dataset, str):
        tokenizer = AutoTokenizer.from_pretrained(config.tokenizer)

        from optimum.gptq.data import get_dataset, prepare_dataset

        nsamples = config.num_samples if config.num_samples else 128
        dataset = get_dataset(config.dataset, tokenizer, seqlen=32, nsamples=nsamples)
        dataset = prepare_dataset(dataset)
Collaborator


_weight_only_quantization is still used here and here; we might need to update these places as well to ensure compatibility.

Collaborator Author


Yep, I did want to do that, but the difference there is that only a raw openvino.runtime.Model is available, not an instance of transformers.PreTrainedModel, and the latter is required to initialize OVQuantizer. We could extend OVQuantizer to accept an instance of openvino.runtime.Model, but that would be a rather serious API change.
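If OVQuantizer were ever extended that way, one possible shape is a type dispatch at the entry point. The sketch below uses stand-in classes for illustration only; it does not reproduce the real transformers or openvino types.

```python
# Hypothetical sketch: one quantizer entry point accepting either a
# high-level wrapped model or a raw low-level model. All classes here
# are stand-ins, not actual transformers / openvino types.

class PreTrainedModelStub:
    """Stands in for transformers.PreTrainedModel (wraps a raw model)."""
    def __init__(self, raw):
        self.raw = raw

class RawModelStub:
    """Stands in for openvino.runtime.Model."""

def make_quantizer(model):
    # Dispatch on the model type so both entry points share one code path.
    if isinstance(model, PreTrainedModelStub):
        return ("wrapped", model.raw)
    if isinstance(model, RawModelStub):
        return ("raw", model)
    raise TypeError(f"Unsupported model type: {type(model).__name__}")

raw = RawModelStub()
print(make_quantizer(PreTrainedModelStub(raw))[0])  # -> wrapped
print(make_quantizer(raw)[0])                       # -> raw
```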

@echarlaix echarlaix merged commit c235ae1 into huggingface:main Apr 29, 2024
11 checks passed

4 participants