Fixed load issue and update docs for weight-only quantization with intel-extension-for-transformers #666
Conversation
Signed-off-by: Cheng, Penghui <[email protected]>
```diff
@@ -281,6 +289,69 @@ def main():
     )
     parser.add_argument("--dataset_name", nargs="?", default="NeelNanda/pile-10k", const="NeelNanda/pile-10k")
     parser.add_argument("--calib_iters", default=100, type=int, help="calibration iters.")
+    parser.add_argument(
```
I would prefer to keep this example for post-training quantization only, as ITREX is currently not a required dependency. What do you think about adding this example directly to https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch and adding a link to these examples in the README? For example, I see https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation.py
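For readers skimming the truncated hunk above: the 69 added lines extend the script's CLI with weight-only quantization options. A minimal sketch of what such options could look like follows; the flag names, defaults, and help strings are illustrative assumptions, not the PR's actual arguments.

```python
import argparse

# Illustrative sketch only: flag names and defaults are assumptions,
# since the hunk above truncates the 69 added lines.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--apply_woq",
    action="store_true",
    help="Apply weight-only quantization with intel-extension-for-transformers.",
)
parser.add_argument(
    "--woq_bits",
    default=4,
    type=int,
    help="Bit width of the quantized weights.",
)
parser.add_argument(
    "--woq_group_size",
    default=128,
    type=int,
    help="Group size for group-wise weight quantization (-1 for per-channel).",
)
```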
```diff
@@ -126,6 +126,33 @@ mpirun -np <number_of_processes> <RUN_CMD>

 Please refer to INC [documentation](https://github.com/intel/neural-compressor/blob/master/docs/source/tuning_strategies.md#distributed-tuning) and [text-classification](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor/text-classification) example for more details.

+## Weight-only quantization
```
Let's wait for this feature to be more stable before adding it to the documentation (it is currently not compatible with the latest optimum-intel release).
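For context on what the proposed section would document, a weight-only quantization call through the `INCQuantizer` API would look roughly like the sketch below. The `WeightOnlyQuantConfig` import path and its `weight_dtype` argument are assumptions based on the ITREX integration under discussion, not a confirmed stable API.

```python
# Rough sketch of the flow under discussion; the ITREX config class name,
# import path, and arguments are assumptions, not a stable API.
from transformers import AutoModelForCausalLM
from optimum.intel import INCQuantizer
from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig  # assumed

model = AutoModelForCausalLM.from_pretrained("gpt2")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=WeightOnlyQuantConfig(weight_dtype="int4"),  # assumed signature
    save_directory="gpt2-woq",
)
```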
```diff
-        if is_intel_extension_for_transformers_version("!=", INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION):
+        if is_intel_extension_for_transformers_version("<", INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION):
             raise ImportError(
                 f"Found an incompatible version of `intel-extension-for-transformers`. Found version {_intel_extension_for_transformers_version}, "
-                f"but only version {INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION} is supported."
+                f"but only version {INTEL_EXTENSION_FOR_TRANSFORMERS_MINIMUM_VERSION} or higher is supported."
```
I think it makes sense to pin the ITREX version at the moment, as it will avoid any undesired impact resulting from potential breaking changes in ITREX (as was the case for ITREX v1.3.0 -> v1.4.0).
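For reference, helpers like the one in the diff above are typically implemented by parsing the installed distribution's version and dispatching on an operator string. A self-contained sketch (the helper name mirrors the diff; the implementation itself is assumed):

```python
# Minimal sketch of an operator-based version gate, assuming the helper
# compares the installed ITREX version against a reference with `packaging`.
import importlib.metadata
import operator

from packaging import version

_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

def is_intel_extension_for_transformers_version(op: str, ref: str) -> bool:
    """Return True if the installed ITREX version satisfies `<op> <ref>`."""
    try:
        installed = importlib.metadata.version("intel-extension-for-transformers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return _OPS[op](version.parse(installed), version.parse(ref))
```

Under this pattern, the original `"!="` check rejects everything but the single pinned release, which is what the reviewer suggests to guard against breakages like v1.3.0 -> v1.4.0, while the proposed `"<"` check accepts any newer release.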
```diff
@@ -297,6 +297,7 @@ def quantize(
             )

             self._quantized_model.quantization_config = quantization_config
+            self._quantized_model.config.quantization_config = quantization_config
```
I think we should keep the model configs separated from the quantization config.
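To make the distinction concrete: attributes set on `model.config` are serialized into `config.json` by `save_pretrained`, whereas plain attributes on the model object are not persisted. A sketch using a neutral attribute name (`quantization_config` itself may receive special handling in `transformers`, so the field name here is hypothetical):

```python
# Sketch of the serialization difference the comment points at: config
# attributes survive a save/load round trip, plain model attributes do not.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.my_field = {"bits": 4}          # hypothetical; lives only on this object
model.config.my_field = {"bits": 4}   # hypothetical; serialized into config.json

model.save_pretrained("gpt2-saved")
reloaded = AutoModelForCausalLM.from_pretrained("gpt2-saved")
print(getattr(reloaded, "my_field", None))         # None: not persisted
print(getattr(reloaded.config, "my_field", None))  # {'bits': 4}
```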
What does this PR do?
This PR fixes the load issue for weight-only quantized models and updates the documentation for weight-only quantization with intel-extension-for-transformers.
This PR depends on #658.
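Since the fix targets reloading, the intended round trip is presumably along the lines of the sketch below; `INCModelForCausalLM` is the optimum-intel loading entry point, but its weight-only reload behavior in this PR is an assumption here.

```python
# Hedged sketch of the reload path this PR fixes; weight-only specifics assumed.
from optimum.intel import INCModelForCausalLM

# Load a previously saved weight-only quantized model from disk.
model = INCModelForCausalLM.from_pretrained("gpt2-woq")
```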
Before submitting