
Can't use deployed Neuron model with error NRT:nrt_allocate_neuron_cores #9

Open · marina-pchelina opened this issue Jun 13, 2024 · 5 comments

@marina-pchelina

Hi, I'm following the sample here to try to compile a model to Neuron and deploy on SageMaker.

Following the steps in the sample exactly, I am able to deploy the model, but when I try to invoke it I get a 500 error, and my CloudWatch traceback shows the following:

[Screenshot: CloudWatch traceback ending in an NRT:nrt_allocate_neuron_cores error]

I have only seen this error before when trying to use a second model on the same instance while another one was running, but that should not be the case here.

@jluntamazon

Hi @marina-pchelina

Since NeuronCores are reserved per process, it's possible that you have an old process which is holding onto the NeuronCores but has not been properly terminated. One thing to try is to forcefully stop all running processes: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/training-troubleshooting.html#neuroncore-s-not-available-requested-1-available-0
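For anyone landing here, a minimal sketch (not taken from the linked doc; the process-name pattern is illustrative) of what forcefully stopping stale processes could look like on the instance:

import os
import signal
import subprocess

# Show the Neuron devices; recent versions of neuron-ls also list the PID of
# any process currently holding each device.
print(subprocess.run("neuron-ls", shell=True, stdout=subprocess.PIPE).stdout.decode("utf-8"))

# Kill any leftover Python worker processes that might still own the NeuronCores.
# The "python" pattern is a guess; adjust it to match your worker processes.
pids = subprocess.run("pgrep -f python", shell=True, stdout=subprocess.PIPE).stdout.decode().split()
for pid in pids:
    if int(pid) != os.getpid():  # don't kill the process running this snippet
        os.kill(int(pid), signal.SIGKILL)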

@marina-pchelina
Author

hi, thanks for getting back to me!
What I don't understand is how an old process could be holding on to the NeuronCores if a new inference instance is initialized each time I deploy. Anyway, I tried including some commands from the troubleshooting doc, plus a few others I found, at the top of my inference.py script, like so:

import subprocess

def run(cmd):
    # Run a shell command and print its output so it shows up in CloudWatch.
    print(f"Running {cmd}")
    print(subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout.decode("utf-8"))

run("apt-get install -y kmod")  # -y: the container has no interactive prompt
run("lsmod | grep neuron")
run("ps aux | grep python")
run("neuron-ls")
run("modinfo neuron")

I can see the 2 cores are there with neuron-ls:
[Screenshot: neuron-ls output showing the 2 NeuronCores]

There are no significant python processes that could be using the cores, and killing them all explicitly didn't help either.

[Screenshot: ps aux | grep python output with no processes that could be holding the cores]

However, it seems like I'm not able to use lsmod or modinfo, both of which work and produce output when run directly inside an EC2 instance (same inf2 type). I tried installing them with apt-get install kmod, but that didn't help either.

[Screenshot: lsmod and modinfo failing inside the container]

Could that possibly have something to do with the image that's used in the tutorial? It's currently this one:
ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.13.2-ubuntu20.04"

@jluntamazon

@marina-pchelina, we were able to reproduce the problem in the tutorial and are looking into a fix.

The root cause appears to be 2 separate misconfigurations:

  1. The default number of workers is set to 4. In torchserve, each worker is a separate process which must take control of a single NeuronCore. Since the ml.inf2.xlarge instance has only 2 NeuronCores, this is an invalid number of workers. You can observe the configuration at the beginning of the logs: Default workers per model: 4
  2. The default number of NeuronCores per worker appears not to be configured, which causes the first process to attempt to take ownership of all NeuronCores. The Neuron runtime allows each process to take control of as many NeuronCores as it needs, and the default behavior is that 1 process takes ownership of all NeuronCores on the instance. When using process-level workers, each process should therefore be configured with the environment variable NEURON_RT_NUM_CORES=1 so that it only takes ownership of a single NeuronCore for the model that it loads (one way to pass these settings is sketched after this list). You can see this in the logs: warnings are issued for each of the 4 model loads, but only 3 nrt_allocate_neuron_cores errors follow, showing that the NeuronCores had already been allocated to another process.
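A sketch of one way these two settings could be passed when deploying with the SageMaker Python SDK (not an official fix; role, ecr_image, and the model artifact path are assumed to come from the tutorial notebook):

from sagemaker.pytorch import PyTorchModel

# Pass the worker count and per-worker NeuronCore count as environment variables.
pytorch_model = PyTorchModel(
    model_data="s3://<bucket>/model.tar.gz",  # placeholder for the compiled model artifact
    role=role,                                # assumes the notebook's SageMaker execution role
    entry_point="inference.py",
    image_uri=ecr_image,                      # the pytorch-inference-neuronx image from the tutorial
    env={
        "SAGEMAKER_MODEL_SERVER_WORKERS": "2",  # ml.inf2.xlarge has 2 NeuronCores
        "NEURON_RT_NUM_CORES": "1",             # each worker process claims a single NeuronCore
    },
)
predictor = pytorch_model.deploy(instance_type="ml.inf2.xlarge", initial_instance_count=1)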

@marina-pchelina
Author

thanks for looking into this!
I tried to re-compile the model with --target inf2 on the off chance it might help configure the number of workers, but it still showed Default workers per model: 4 in the logs.
If it's any help, I can deploy and use models through the HuggingFace integration; the problem with that is that I want to use both cores with DataParallel, which the HuggingFace class doesn't seem to allow.
Let me know if there's anything I can do myself to work around that, otherwise, I'll wait for a fix.
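For reference, a rough sketch of the DataParallel usage in question (assuming a traced TorchScript model saved as model.pt; the file name and input shape are illustrative):

import torch
import torch_neuronx

# Load the compiled (traced) model and let DataParallel shard incoming batches
# across the NeuronCores visible to this process (2 on an inf2.xlarge).
model = torch.jit.load("model.pt")
model_parallel = torch_neuronx.DataParallel(model)

# The batch dimension of the input is split across the NeuronCores.
example = torch.rand(8, 3, 224, 224)
output = model_parallel(example)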

@marina-pchelina
Author

@jluntamazon
Looks like the errors now start even earlier: I can't install neuronx-cc and torch-neuronx when running the same notebook in the same environment (ml.c5.4xlarge with the conda_pytorch_p310 kernel).

Both commands, the original
%pip install --upgrade neuronx-cc==2.* torch-neuronx
and one not specifying package versions
%pip install --upgrade neuronx-cc torch-neuronx --no-cache-dir
result in the same error.
Any pointers for successful installation would be much appreciated!

[Screenshot: pip install error output]
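For reference, the Neuron setup documentation installs these packages from the Neuron pip repository via an extra index URL; whether that is what's missing here is only a guess, but that variant would look like:

%pip install --upgrade --extra-index-url=https://pip.repos.neuron.amazonaws.com neuronx-cc==2.* torch-neuronx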
