This repository has been archived by the owner on Jun 25, 2023. It is now read-only.

Electriclizard solution3 #23

Open
electriclizard wants to merge 17 commits into base: main

Conversation

electriclizard
Contributor

Hello!
I have a somewhat hacky solution. Every model has its own text tokenizer, so tokenization runs five times, once before each model. I tried using a single RoBERTa tokenizer for all the models: it moves the data to the device (GPU) only once and the models all read the same inputs. It has some issues with the model answers and the approach still needs to be validated on the test dataset, but it runs faster in my local tests and gives us the ability to train all the models we need with one tokenizer and get some performance gain.
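For illustration only, here is a minimal sketch of the idea above, not the code from this PR: the checkpoint names are hypothetical placeholders, and it assumes every downstream model accepts XLM-RoBERTa token ids.

import torch
from typing import List
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

# One shared tokenizer instead of one tokenizer per model.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model_names = ["org/xlm-roberta-task-a", "org/xlm-roberta-task-b"]  # hypothetical checkpoints
models = [
    AutoModelForSequenceClassification.from_pretrained(name).to(device).eval()
    for name in model_names
]

def predict_all(texts: List[str]) -> List[torch.Tensor]:
    # Tokenize once and copy to the GPU once; every model reuses the same tensors.
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    ).to(device)
    with torch.inference_mode():
        return [model(**inputs).logits.softmax(dim=-1).cpu() for model in models]

The trade-off is the one noted above: a checkpoint fine-tuned with a different tokenizer will receive mismatched token ids, which would explain the odd model answers.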

Collaborator

@rsolovev left a comment

Hey @electriclizard, in our tests this version sustained up to 30 docs/sec throughput on highload_scenario (+5/+3 increase in rps compared to previous iterations), but in the middle of the next stage CUDA reported an OOM; here is a log sample:

...
INFO:     10.147.2.177:47382 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:47180 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:47388 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:47374 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:42462 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:47166 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:42490 - "POST /process HTTP/1.1" 200 OK
INFO:     10.147.2.177:42456 - "POST /process HTTP/1.1" 200 OK
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<PredictionHandler.handle() done, defined at /src/handlers/recognition.py:37> exception=OutOfMemoryError('CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 14.76 GiB total capacity; 13.00 GiB already allocated; 28.75 MiB free; 13.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF')>
Traceback (most recent call last):
  File "/src/handlers/recognition.py", line 51, in handle
    outs = model(inputs)
  File "/src/infrastructure/models.py", line 70, in __call__
    logits = self.model(**inputs).logits
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1226, in forward
    outputs = self.roberta(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 854, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 528, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 412, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 339, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 259, in forward
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 14.76 GiB total capacity; 13.00 GiB already allocated; 28.75 MiB free; 13.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<PredictionHandler.handle() done, defined at /src/handlers/recognition.py:37> exception=OutOfMemoryError('CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 14.76 GiB total capacity; 13.00 GiB already allocated; 28.75 MiB free; 13.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF')>

...

@electriclizard
Contributor Author

Hmm, that's interesting. I'll try to reproduce the error later, thank you for the log!

@electriclizard
Contributor Author

Hey @rsolovev, I've fixed the out-of-memory issue and successfully ran all k6 tests with no failures on my local GPU, so I'm waiting for the new results.
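(The fix itself isn't visible in this thread; purely as a hedged illustration, the sketch below shows one common mitigation for this kind of OOM under load: bounding how many requests may run a forward pass at once and capping the batch size. All names are hypothetical and this is not the code from the PR.)

import asyncio
import torch

class BoundedPredictor:
    # Hypothetical wrapper, not the repository's PredictionHandler.
    # Create it inside the running event loop (e.g. an application startup hook).

    def __init__(self, models, tokenizer, device="cuda", max_concurrency=1, max_batch=32):
        self.models = models
        self.tokenizer = tokenizer
        self.device = device
        self.max_batch = max_batch
        self._gate = asyncio.Semaphore(max_concurrency)  # limits in-flight forward passes

    async def predict(self, texts):
        async with self._gate:
            loop = asyncio.get_running_loop()
            # Run the blocking forward pass off the event loop.
            return await loop.run_in_executor(None, self._forward, texts)

    def _forward(self, texts):
        outputs = []
        for start in range(0, len(texts), self.max_batch):  # cap per-batch activation memory
            chunk = texts[start:start + self.max_batch]
            inputs = self.tokenizer(
                chunk, padding=True, truncation=True, max_length=512, return_tensors="pt"
            ).to(self.device)
            with torch.inference_mode():  # no autograd buffers are kept on the GPU
                outputs.append([m(**inputs).logits.cpu() for m in self.models])
        return outputs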

@darknessest
Collaborator

Hey there, @electriclizard, could you please check your email inbox, specifically for emails from the @inca.digital domain.

@electriclizard
Contributor Author

> Hey there, @electriclizard, could you please check your email inbox, specifically for emails from the @inca.digital domain.

Done!

Collaborator

@rsolovev left a comment

@electriclizard thank you, here are the results for the latest commit from this branch

@electriclizard
Contributor Author

electriclizard commented May 31, 2023

> @electriclizard thank you, here are the results for the latest commit from this branch

Strange results 🤔. In my local tests this was the most efficient solution because of the single tokenization pass for all models.
[attached screenshot: telegram-cloud-photo-size-2-5454396707308685690-y]
Could the rps depend on network speed and stability?
I understand that the rps is lower because I ran the tests on a local network, but this solution was still more efficient than the previous ones.

@rsolovev
Collaborator

> Could the rps depend on network speed and stability?

Sure it could, but we try to run tests and solutions isolated from the rest of the infrastructure to minimise these outside variables. Let me check whether that was the right commit's test run.

@rsolovev
Collaborator

@electriclizard -- here are the results for the restart -- grafana
