
initial support for concurrency #2

Draft
wants to merge 14 commits into base: ea/stateful

Conversation

@dtrawins dtrawins commented Jan 8, 2024

enable multi concurrency in the execution?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Comment on lines 338 to 339
if self.compiled_model is None:
    super().compile()
Collaborator

Just call self.compile()
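
A minimal sketch of the suggested simplification (assuming compile() already returns early once the model is compiled, so the explicit guard adds nothing):

    # instead of guarding explicitly:
    # if self.compiled_model is None:
    #     super().compile()
    # rely on compile() being a no-op for an already compiled model:
    self.compile()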

def forward(
    self,
    input_ids: torch.LongTensor,
    infer_request: openvino.runtime.InferRequest,
Collaborator

@slyalin slyalin Jan 8, 2024

Is it truly backward compatible to add another positional argument to forward that is exposed externally and can potentially be used in some not-so-trivial examples bypassing the generate method? Would it be safer to add it to kwargs instead?


In my opinion, the safest way would be to pass it as a keyword-only argument. It is a more descriptive and clearer approach than kwargs.
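
A minimal sketch of the keyword-only variant (everything except infer_request is illustrative, not the PR's actual signature):

    def forward(
        self,
        input_ids: torch.LongTensor,
        attention_mask: Optional[torch.LongTensor] = None,
        *,  # arguments after this marker can only be passed by keyword
        infer_request: Optional[openvino.runtime.InferRequest] = None,
        **kwargs,
    ):
        # fall back to the model's default request when the caller does not
        # supply one (hypothetical fallback, not taken from the PR)
        if infer_request is None:
            infer_request = self.request
        ...

Existing positional call sites such as model.forward(input_ids, attention_mask) keep working, while the new argument has to be spelled out explicitly wherever it is used.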

Comment on lines 32 to 37
prompt1 = [" The weather is "]
x = threading.Thread(target=gen_thread, args=(prompt1,))
x.start()
prompt2 = [" Openvino is a "]
y = threading.Thread(target=gen_thread, args=(prompt2,))
y.start()
Collaborator

Use a more stressful test with beam search that would trigger a race on the beam_idx model object field update, or run multiple generates with different batch sizes.
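
A sketch of such a stress test (model name, prompts, and generation settings are illustrative):

    import threading

    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    model_id = "hf-internal-testing/tiny-random-gpt2"  # illustrative model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    model = OVModelForCausalLM.from_pretrained(model_id, export=True)

    def gen_thread(prompts, num_beams):
        inputs = tokenizer(prompts, return_tensors="pt", padding=True)
        # beam search rewrites beam_idx on every step, so running it from
        # several threads with different batch sizes stresses shared state
        model.generate(**inputs, num_beams=num_beams, max_new_tokens=20)

    threads = [
        threading.Thread(target=gen_thread, args=([" The weather is "], 4)),
        threading.Thread(target=gen_thread, args=([" Openvino is a ", " Hello "], 2)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Comparing each thread's output against a single-threaded run of the same prompts would expose the race on beam_idx discussed in the next comment.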

@@ -436,30 +446,29 @@ def forward(
inputs["beam_idx"] = self.next_beam_idx
Collaborator

@slyalin slyalin Jan 8, 2024

The most critical comment: self.next_beam_idx is a shared resource that is updated concurrently from multiple threads. This will lead to incorrect behavior when multiple generates are called with different batch sizes, with beam search mode and different prompts, or with any kind of sampling. A separate version of next_beam_idx should be created for each generate invocation.

Author

Indeed, it fails with parallel generate calls using different batch sizes. I'll try passing next_beam_idx in kwargs as well, along with infer_request.
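
A sketch of the per-invocation approach under discussion (the kwargs plumbing and names are assumptions, not the PR's final code):

    # fragment inside forward(): read the beam index from call-local kwargs
    # instead of mutating self.next_beam_idx, which is shared across threads
    next_beam_idx = kwargs.get("next_beam_idx")
    if next_beam_idx is None:
        # default to identity reordering for the current batch (np is numpy)
        next_beam_idx = np.arange(batch_size, dtype=np.int32)
    inputs["beam_idx"] = next_beam_idx

Each generate invocation would then own its next_beam_idx (alongside its infer_request), so concurrent calls with different batch sizes or beam search settings no longer overwrite each other's state.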

@eaidova eaidova force-pushed the ea/stateful branch 2 times, most recently from 50a3f22 to 70d086a on January 8, 2024 13:40
eaidova added a commit that referenced this pull request Dec 18, 2024
eaidova added a commit that referenced this pull request Dec 20, 2024
eaidova added a commit that referenced this pull request Dec 23, 2024
* Support AWQ models

* Add tests

* Add dependencies

* Fix tests

* enable awq export only if ov support it

* fix style (#2)

* disable awq and gptq install for old torch (#3)

* fix style

* disable autogptq and autoawq install for old transformers testing

* separate common quant models patching and gptq (#4)

* disable windows install (huggingface#5)

* separate common quant models patching and gptq

* disable awq windows

* skip logits check for quantized models (huggingface#6)

* fix test after rebase

* fix testing condition for 2024.6 and unpatch in case if failed

* Fix qwen2-vl tests (huggingface#1084)

* Skip private model loading test for external contributors (huggingface#1082)

* Fix reshaping unet if timestep is 0d tensor (huggingface#1083)

* Disable kv cache compression for fp vlm (huggingface#1080)

* add necessary packages in test_openvino_full

* fix code style after rebase (huggingface#7)

---------

Co-authored-by: eaidova <[email protected]>
Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>