
Fixed load issue and update docs for weight-only quantization with intel-extension-for-transformers #666

Conversation

PenghuiCheng
Contributor

What does this PR do?

This PR fixes a load issue for weight-only quantized models and updates the documentation for weight-only quantization with intel-extension-for-transformers.
This PR depends on #658.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: Cheng, Penghui <[email protected]>
@@ -281,6 +289,69 @@ def main():
)
parser.add_argument("--dataset_name", nargs="?", default="NeelNanda/pile-10k", const="NeelNanda/pile-10k")
parser.add_argument("--calib_iters", default=100, type=int, help="calibration iters.")
parser.add_argument(
Collaborator

I would prefer to keep this example for post-training, as ITREX is currently not a required dependency. What do you think about adding this example directly to https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch and adding a link to these examples in the README? For example, I see https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation.py

@@ -126,6 +126,33 @@ mpirun -np <number_of_processes> <RUN_CMD>

Please refer to INC [documentation](https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#distributed-tuning) and [text-classification](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor/text-classification) example for more details.

## Weight-only quantization
Collaborator


Let's wait for this feature to be more stable before adding it to the documentation (it is currently not compatible with the latest optimum-intel release).

Comment on lines -78 to +81
if is_intel_extension_for_transformers_version("!=", INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION):
if is_intel_extension_for_transformers_version("<", INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION):
raise ImportError(
f"Found an incompatible version of `intel-extension-for-transformers`. Found version {_intel_extension_for_transformers_version}, "
f"but only version {INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION} is supported."
f"but only version {INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION} or higher is supported."
Collaborator


I think it makes sense to fix the ITREX version at the moment, as it will avoid any undesired impact resulting from potential breaking changes in ITREX (as was the case for ITREX v1.3.0 -> v1.4.0).
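A minimal sketch of the exact-version pin the reviewer suggests, as an alternative to the `<` comparison in the diff above. The helper name and the pinned version string are illustrative, not the actual optimum-intel code; the error message mirrors the one in the diff.

```python
# Hedged sketch: pin ITREX to one known-good release instead of accepting
# any version >= minimum. Names and versions here are illustrative.
from packaging import version

ITREX_PINNED_VERSION = "1.4.2"  # hypothetical pinned release


def check_itrex_version(installed: str) -> None:
    """Raise ImportError unless the installed ITREX version matches the pin exactly."""
    if version.parse(installed) != version.parse(ITREX_PINNED_VERSION):
        raise ImportError(
            f"Found an incompatible version of `intel-extension-for-transformers`. "
            f"Found version {installed}, but only version {ITREX_PINNED_VERSION} is supported."
        )
```

The trade-off: a strict pin shields users from breaking changes like the v1.3.0 -> v1.4.0 transition, at the cost of requiring a release of this package whenever ITREX updates.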

@@ -297,6 +297,7 @@ def quantize(
)

self._quantized_model.quantization_config = quantization_config
self._quantized_model.config.quantization_config = quantization_config
Collaborator


I think we should keep the model configs separated from the quantization config
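A minimal sketch of the separation the reviewer suggests, using stand-in objects rather than the actual optimum-intel classes: the quantization settings live on the quantized-model wrapper only, and the model's own config is left untouched.

```python
# Illustrative stand-ins, not the real optimum-intel / transformers classes.
from types import SimpleNamespace

model = SimpleNamespace(config=SimpleNamespace())  # model with its own config
quantization_config = {"bits": 4, "scheme": "sym"}  # hypothetical settings

# Attach the quantization settings to the wrapper only, not to model.config,
# keeping the model config separated from the quantization config.
quantized_model = SimpleNamespace(
    model=model,
    quantization_config=quantization_config,
)

assert not hasattr(model.config, "quantization_config")
```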

@echarlaix echarlaix deleted the branch huggingface:update-itrex April 18, 2024 08:16
@echarlaix echarlaix closed this Apr 18, 2024