
Potential memory leak #156

Open · 2 of 4 tasks

novak2000 opened this issue Feb 11, 2024 · 19 comments · Fixed by #161

Comments

@novak2000

System Info

I'm running a Docker container serving the BAAI rerank-base model on a local PC with an RTX 4090, an Intel i9-13900KF, and 64 GB of RAM.
[screenshot: system information]

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

After calling the '/rerank' endpoint many times (around 400,000 requests with 5,000 texts each), RAM usage increases significantly (from 6 GB to 42+ GB). A rough client loop that generates this kind of load is sketched after the screenshots below.
Memory usage before: [screenshot]
Memory usage after: [screenshot]
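Not the original reproduction script, but a minimal Rust sketch of this kind of load, assuming TEI listens on localhost:8080 and that /rerank accepts a JSON body with a "query" string and a "texts" array; the URL, port, query text, and crate versions are assumptions:

// Assumed Cargo dependencies: reqwest = { version = "0.11", features = ["blocking", "json"] }, serde_json = "1"
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let client = reqwest::blocking::Client::new();
    // 5,000 dummy texts per request, mirroring the report above.
    let texts: Vec<String> = (0..5_000).map(|i| format!("document number {i}")).collect();
    let body = serde_json::json!({ "query": "what is a memory leak?", "texts": texts });

    for i in 0..400_000u32 {
        client
            .post("http://localhost:8080/rerank")
            .json(&body)
            .send()?
            .error_for_status()?; // fail fast on HTTP errors
        if i % 1_000 == 0 {
            // Watch the router's RSS (e.g. `ps -o rss= -p <pid>`) while this runs.
            println!("sent {i} requests");
        }
    }
    Ok(())
}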

Expected behavior

Is this behavior expected? Since I'm unfamiliar with Rust and its basic concepts, any feedback would be helpful.
Thanks!

@karan00713

karan00713 commented Feb 15, 2024

@novak2000 I'm having this issue too. I tried on my laptop on CPU with Embed4all and SentenceTransformer; both showed a huge increase in memory after each request. Kindly let me know if you found any solutions.

@OlivierDehaene
Member

It seems it's linked to an issue with Hyper: hyperium/hyper#1790
#161 solves the issue by using another memory allocator.
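For readers unfamiliar with the Rust side: swapping the global allocator of a binary is a one-line change. The allocator actually adopted in #161 isn't named in this thread, so the sketch below uses the mimalloc crate purely as an illustration (crate choice and version are assumptions):

// Assumed Cargo dependency: mimalloc = "0.1"
use mimalloc::MiMalloc;

// Every heap allocation in the binary, including ones made by dependencies
// such as hyper, now goes through mimalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let v = vec![0u8; 1024]; // allocated via mimalloc
    println!("allocated {} bytes", v.len());
}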

@A-Posthuman

I seem to still be running into this issue of steadily growing memory usage in the text-embeddings-router process. This is with TEI 1.1.0, but I also tested 1.0.0 with the same results.

Running with docker:

docker run --name tei --gpus all -e CUDA_MEMORY_FRACTION=1.0 -p 8081:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1.0 --model-id $model --tokenization-workers 4 --max-batch-tokens 131072 --max-batch-requests 1024 --pooling cls

model is: BAAI/bge-small-en-v1.5

OlivierDehaene reopened this Mar 7, 2024
@OlivierDehaene
Member

Do you have a graph of the memory increase? And if you have v1.0.0 vs v1.1.0 to compare, that would be amazing.

@A-Posthuman

I don't have a pretty graph, but here are 3 ps outputs over the past 24 hrs. The first one is from just after starting the Docker image, the 2nd is from not long after, and the 3rd is from a minute ago, where you can see the memory percentage has grown to 8.2% of the server's RAM from the first output's 3.6%.

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     3341091 12.0  3.6 54668960 583220 ?     Ssl  18:48   0:01 text-embeddings-router --model-id BAAI/bge-small-en-v1.5 --tokenization-workers 4 --max-batch-tokens 131072 --max-batch-requests 1024 --pooling cls

root     3341091  2.5  3.9 54762532 638616 ?     Ssl  18:48   0:51 text-embeddings-router --model-id BAAI/bge-small-en-v1.5 --tokenization-workers 4 --max-batch-tokens 131072 --max-batch-requests 1024 --pooling cls

root     3341091 65.2  8.2 55811112 1338148 ?    Ssl  Mar06 594:48 text-embeddings-router --model-id BAAI/bge-small-en-v1.5 --tokenization-workers 4 --max-batch-tokens 131072 --max-batch-requests 1024 --pooling cls

@hiepxanh

hiepxanh commented Mar 7, 2024

I'm embedding 1 million vectors with the default config without any issue; maybe the workers cause the leak?

@A-Posthuman

BTW, I forgot to mention: regarding 1.0.0 vs 1.1.0, I tried both, and they behaved similarly with respect to the growing memory use.

The worker/client program in my case runs on a separate server, and it sends in the range of 5 to 10 million embedding requests to the TEI server per 24 hrs.

@OlivierDehaene
Member

OK, I will keep this on my priority list, but it must be very deep in the stack and might take some time to find.

The worker/client program in my case runs on a separate server, and it sends in the range of 5 to 10 million embedding requests to the TEI server per 24 hrs.

That's great :) It's always nice to hear that the project is running in prod with some real throughput requirements.

@A-Posthuman

OK, if you need any other details, let me know. The instance is on AWS, a g5.xlarge (1 NVIDIA A10G GPU), using the AMI:

Deep Learning AMI GPU PyTorch 2.1.0 (Ubuntu 20.04) 20231103
id: ami-0ac1f653c5b6af751

The GPU is being shared: 90% of it goes to a separate vLLM text generation server, and the other 10% is used by TEI.

@novak2000
Author

Just to mention that I'm also running into the same issue again.
I'm using version 1.0.
[memory usage screenshot]

@OlivierDehaene
Member

@novak2000 can you use 1.1 and keep the memory resource limit? I'm wondering whether the container will still be killed on 1.1.

@novak2000
Author

I'm sending you docker stats before and after running a simple test with around 25k requests to the server (each request has between 100 and 1,000 texts to embed and ~1,000 texts to rerank).

models used:
reranker: BAAI/bge-reranker-base
embedding: sentence-transformers/multi-qa-MiniLM-L6-cos-v1

before: [docker stats screenshot]

after ~10k requests (they appeared to be running stably just beneath the memory limit): [docker stats screenshot]

after ~20k requests, the embedding server got killed and restarted on failure: [docker stats screenshots]

Let me know if you need more details

@novak2000
Author

I ran the tests again, and this time both services were killed.

Graph of memory consumption: [screenshot]

@OlivierDehaene
Member

OK, thanks for this info.
I'm more or less off this week, so I will keep digging when I find the time.

@djanito

djanito commented Jun 21, 2024

Any news on this? I'm running into the same issue, and it's not usable in production.

@OlivierDehaene
Member

Yes, it seems there was a leak in one of our dependencies. This is orthogonal to the allocator problem reported above.
We updated the dependency and added logic to trim OS pages in #307.

See: https://www.algolia.com/blog/engineering/when-allocators-are-hoarding-your-precious-memory/ for more info on the subject.
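The change in #307 isn't reproduced here, but the idea of trimming OS pages can be sketched: on Linux with glibc, memory that the allocator is hoarding after frees can be handed back explicitly with malloc_trim. A rough illustration via the libc crate; the call site and the idea of invoking it periodically are assumptions, not the actual TEI code:

// Assumed Cargo dependency: libc = "0.2". malloc_trim is glibc-specific.
#[cfg(target_os = "linux")]
fn trim_heap() {
    unsafe {
        // Ask glibc to release free heap pages back to the OS.
        // Returns 1 if memory was actually released, 0 otherwise.
        let released = libc::malloc_trim(0);
        eprintln!("malloc_trim released memory: {}", released == 1);
    }
}

#[cfg(not(target_os = "linux"))]
fn trim_heap() {
    // No-op on platforms without glibc's malloc_trim.
}

fn main() {
    // For example, call this from a background task every few minutes,
    // or after a large batch of requests has been processed.
    trim_heap();
}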

I will release 1.3 with this PR today. Will you be able to test it and report if the problem is indeed fixed?

@djanito

djanito commented Jun 28, 2024

I can try it today if you want, but I don't see the 1.3 release at the moment.

@OlivierDehaene
Member

It's released now.

@OlivierDehaene
Member

@djanito, were you able to try it out?
