Can't use deployed Neuron model with error NRT:nrt_allocate_neuron_cores #9
Comments
Since NeuronCores are reserved per process, an old process that was never properly terminated may still be holding onto them. One thing to try is to forcefully stop all running processes: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/training-troubleshooting.html#neuroncore-s-not-available-requested-1-available-0
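Before force-stopping everything, it can help to see which processes actually have the Neuron devices open. The following is a minimal sketch that assumes a Linux host where the Neuron devices appear as `/dev/neuron*` files. The function name `pids_holding_device` is hypothetical and is not part of any Neuron tool. It scans `/proc/<pid>/fd` for descriptors that point at those devices. Run it as root if you also want to see processes owned by other users.

```python
import os
from pathlib import Path


def pids_holding_device(prefix="/dev/neuron", proc_root="/proc"):
    """Return sorted PIDs that have an open file descriptor whose target starts with prefix.

    Linux-only sketch: every /proc/<pid>/fd/<n> entry is a symlink to the opened file.
    The "/dev/neuron" prefix is an assumption about how Neuron devices are named.
    """
    holders = set()
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            fds = list((pid_dir / "fd").iterdir())
        except (PermissionError, FileNotFoundError, NotADirectoryError):
            continue  # process exited, or belongs to another user and we are not root
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue  # descriptor was closed between listing and reading
            if target.startswith(prefix):
                holders.add(int(pid_dir.name))
                break  # one matching descriptor is enough for this PID
    return sorted(holders)


if __name__ == "__main__":
    for pid in pids_holding_device():
        print(pid)
```

Any PID it prints is a candidate for the stale process. Check what it is (for example with `ps -p <pid>`) before stopping it.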
@marina-pchelina, we were able to reproduce the problem in the tutorial and are looking into a fix. The root cause appears to be 2 separate misconfigurations:
Thanks for looking into this!
@jluntamazon Both commands, the original
Hi, I'm following the sample here to compile a model for Neuron and deploy it on SageMaker.
Following the steps in the sample exactly, I can deploy the model, but when I try to invoke it I get a 500 error, and the CloudWatch traceback shows the following:
Previously I had seen this error only when trying to load a second model on the same instance while another one was already running, which should not be the case here.
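If two model processes do need to share one instance, each one can be limited to its own subset of NeuronCores so that they do not both try to reserve all of them. The following is a minimal sketch that assumes the Neuron Runtime's `NEURON_RT_VISIBLE_CORES` environment variable accepts either a single core index or an inclusive range such as `0-1`. The helper names `visible_core_ranges` and `env_for_process`, and the core counts used below, are hypothetical.

```python
import os


def visible_core_ranges(total_cores, num_processes):
    """Split cores 0..total_cores-1 into contiguous, non-overlapping range strings."""
    if not 1 <= num_processes <= total_cores:
        raise ValueError("need 1 <= num_processes <= total_cores")
    base, extra = divmod(total_cores, num_processes)
    ranges, start = [], 0
    for i in range(num_processes):
        size = base + (1 if i < extra else 0)  # spread leftover cores over the first processes
        end = start + size - 1
        ranges.append(str(start) if size == 1 else f"{start}-{end}")
        start = end + 1
    return ranges


def env_for_process(index, total_cores, num_processes):
    """Copy the current environment and pin process `index` to its own core range."""
    env = dict(os.environ)
    env["NEURON_RT_VISIBLE_CORES"] = visible_core_ranges(total_cores, num_processes)[index]
    return env
```

Each model server would then be launched with its own environment, for example `subprocess.Popen(cmd, env=env_for_process(0, 4, 2))` for the first of two processes on a hypothetical four-core instance.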