
Managed Deployment Class returns 500 error #39772

Closed
BillmanH opened this issue Feb 18, 2025 · 12 comments
Labels:
  • customer-reported — Issues that are reported by GitHub users external to the Azure organization.
  • issue-addressed — Workflow: The Azure SDK team believes it to be addressed and ready to close.
  • Machine Learning
  • question — The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Service Attention — Workflow: This issue is responsible by Azure service team.

Comments

@BillmanH
Contributor

BillmanH commented Feb 18, 2025

  • Package Name: azure.ai.ml
  • Package Version: 1.17.1
  • Operating System: Windows
  • Python Version: 3.12.4

Following this document.

The error is cryptic, as per conventions.
Error:

{
    "name": "HttpResponseError",
    "message": "(ServiceError) Received 500 from a service request
Code: ServiceError
Message: Received 500 from a service request
Target: POST https://inference-deployment-api.inference-deployment-api.svc/inferencedeployment/subscriptions/e1528a7a-9681-47d5-8fbe-cc0850266856/resourceGroups/mloppsexample/providers/Microsoft.MachineLearningServices/workspaces/testmlinstance/endpoints/endpt-moe-7362/deployments/openapi/v2?api-version=2021-10-01&validateOnly=False
Exception Details:	(InternalServerError) 
    Code: InternalServerError
    Message: 
Additional Information:Type: ComponentName
Info: {
    "value": "managementfrontend"
}Type: Correlation
Info: {
    "value": {
        "operation": "e8088be0c84fc407cdf44d797d0794a4",
        "request": "db5604699c11951b"
    }
}Type: Environment
Info: {
    "value": "centralus"
}Type: Location
Info: {
    "value": "centralus"
}Type: Time
Info: {
    "value": "2025-02-17T23:34:07.5387801+00:00"
}",
    "stack": "---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
File c:\Users\william.harding\repos\learn-azureml\register_API_endpoint.py:16
      1 deployment = ManagedOnlineDeployment(
      2     name="openapi",
      3     endpoint_name=endpoint_name,
   (...)
     13     instance_count=1,
     14 )
---> 16 deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\core\tracing\decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\_telemetry\activity.py:289, in monitor_with_activity.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    285     with tracer.span():
    286         with log_activity(
    287             logger.package_logger, activity_name or f.__name__, activity_type, custom_dimensions
    288         ):
--> 289             return f(*args, **kwargs)
    290 elif hasattr(logger, "package_logger"):
    291     with log_activity(logger.package_logger, activity_name or f.__name__, activity_type, custom_dimensions):

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:218, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    216     log_and_raise_error(ex)
    217 else:
--> 218     raise ex

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:213, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    211         return poller
    212     except Exception as ex:
--> 213         raise ex
    214 except Exception as ex:  # pylint: disable=W0718
    215     if isinstance(ex, (ValidationException, SchemaValidationError)):

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:196, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    192         module_logger.info("\nStarting deployment")
    194     deployment_rest = deployment._to_rest_object(location=location)  # type: ignore
--> 196     poller = self._online_deployment.begin_create_or_update(
    197         resource_group_name=self._resource_group_name,
    198         workspace_name=self._workspace_name,
    199         endpoint_name=deployment.endpoint_name,
    200         deployment_name=deployment.name,
    201         body=deployment_rest,
    202         polling=AzureMLPolling(
    203             LROConfigurations.POLL_INTERVAL,
    204             path_format_arguments=path_format_arguments,
    205             **self._init_kwargs,
    206         ),
    207         polling_interval=LROConfigurations.POLL_INTERVAL,
    208         **self._init_kwargs,
    209         cls=lambda response, deserialized, headers: OnlineDeployment._from_rest_object(deserialized),
    210     )
    211     return poller
    212 except

Python function:

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", 
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
        conda_file="env.yaml",
    ),
    instance_type="Standard_DS3_v2", 
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

Is this a valid deployment process? Is there something missing?
Can you provide an example script that would work?

@BillmanH
Contributor Author

Full process:


endpoint_name = f"endpt-moe-{random.randint(0,10000)}"

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)

endpoint = ManagedOnlineEndpoint(
    public_network_access="enabled",
    name = endpoint_name, 
    description="this is a sample endpoint",
    auth_mode="key"
)

ml_client.online_endpoints.begin_create_or_update(endpoint)

key = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", # TODO: Add to config
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
        conda_file="env.yaml",
    ),
    instance_type="Standard_DS3_v2", # Add to config
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment)
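One thing worth flagging in the script above: `begin_create_or_update` returns a long-running-operation poller, and the endpoint poller here is never awaited, so the later `get_keys` call can race endpoint provisioning. A minimal, duck-typed sketch (the helper name is my own; it works with any Azure LRO poller exposing `.result()` and `.status()`):

```python
def wait_for_lro(poller):
    """Block until an Azure long-running operation completes.

    .result() re-raises any service-side failure (e.g. HttpResponseError),
    so errors surface here instead of in a later, unrelated call.
    """
    outcome = poller.result()
    print(f"Operation status: {poller.status()}")
    return outcome
```

Wrapping the endpoint call as `endpoint = wait_for_lro(ml_client.online_endpoints.begin_create_or_update(endpoint))` before calling `get_keys` would make the ordering explicit.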


@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 18, 2025
@BillmanH
Contributor Author

I've tried all kinds of variations of this process and I can't find a way to make this deploy.

@tdemgit

tdemgit commented Feb 18, 2025

Try initializing the environment before passing it to ManagedOnlineDeployment.

from azure.ai.ml.entities import Environment

# Register the environment
env = Environment(
    name="custom-env",
    image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
    conda_file="env.yaml",
)
ml_client.environments.create_or_update(env)

# Reference the registered environment in the deployment
deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1",  # Ensure the model is registered
    code_configuration=CodeConfiguration(code="./", scoring_script="score.py"),
    environment=env.id,  # Use the registered environment's ID
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

# Deploy and wait for completion
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

@BillmanH
Contributor Author

Thanks for the input, but it didn't work.

env = ml_client.environments.get("billmanh-env", label="latest")  # <-- a currently existing env

endpoint_name = "endpt-moe-3767"
key = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

#%%
deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", # TODO: Add to config
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=env,
    instance_type="Standard_DS3_v2", # Add to config
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment)

I updated the env that I have created to have the azureml-inference-server-http library, which is required.

The yaml file for that env:

channels:
  - conda-forge
dependencies:
  - python=3.12.9
  - pip
  - pip:
      - mlflow
      - argparse
      - azure-ai-ml
      - azureml-mlflow
      - azureml-core
      - azure-identity
      - pandas
      - numpy
      - scikit-learn
      - matplotlib
      - joblib
      - pyyaml
      - uuid
      - azureml-inference-server-http
name: billmanh-env

@tdemgit

tdemgit commented Feb 18, 2025

You need to pass the environment's ID, not the object itself:

environment = env.id

@BillmanH
Contributor Author

Same error.

A note: after some work in the UI, I was able to deploy manually:

[Image: screenshot of the successful manual deployment in the studio UI]

When I use the UI to deploy the same code, I get logs that contain the Python error.

It would be super useful to have those logs in the SDK (as opposed to the 500 error). Because the error is a 500, I can't tell whether it is crashing due to an issue with the SDK or an issue with my own code.

@l0lawrence l0lawrence added Machine Learning Service Attention Workflow: This issue is responsible by Azure service team. labels Feb 18, 2025
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 18, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

@tdemgit

tdemgit commented Feb 18, 2025

Would this, added at the end of your script, give you what you need?

logs = ml_client.online_deployments.get_logs(endpoint_name=endpoint_name, name="openapi", lines=50)
print(logs)
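Building on that idea, here is a hedged sketch (the helper name and `lines=100` are my own choices, not from the SDK docs) that attempts the deployment and surfaces container logs on failure. One caveat: `get_logs` only has output once the scoring container has actually started, so a server-side 500 thrown before provisioning, like the one above, still yields nothing.

```python
def deploy_with_logs(ml_client, deployment, endpoint_name):
    """Attempt a managed online deployment; on failure, try to print the
    scoring container's logs before re-raising the original error.

    ml_client is assumed to be an azure.ai.ml.MLClient (duck-typed here).
    """
    try:
        return ml_client.online_deployments.begin_create_or_update(deployment).result()
    except Exception:
        try:
            print(ml_client.online_deployments.get_logs(
                endpoint_name=endpoint_name, name=deployment.name, lines=100))
        except Exception as log_err:
            # Logs are unavailable if the container never started
            print(f"Could not fetch logs: {log_err}")
        raise
```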

@Rileyjs

Rileyjs commented Feb 18, 2025

Some tweaks to the deployment object got it working.

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model=f'{model_deploy_name}:{model_deploy_version}',
    code_configuration=CodeConfiguration(
        code=f'{code_scoring_dir}', scoring_script=f'{code_scoring_script}'
    ),
    environment=f'{env_name}:{env_version}',
    instance_type=f'{compute_sku}',
    instance_count=1,
)
model_deploy_name: "deployed_name"
model_deploy_version: 1

env_name: "premade-env-name"
env_version: 1
# Compute for the endpoint being run
compute_sku: "Standard_DS1_v2"

# Scoring stuff
code_scoring_dir: "openapi/code-decorated"
code_scoring_script: "score.py"

@achauhan-scc
Member

Here is the error that was logged:

Microsoft.MachineLearning.ModelRegistry.Utilities.Exceptions.MrsApiException: There is no registered model in Account Subscription: xxx, ResourceGroup: xxx, Workspace: xx with id anomaly_detection_model:1

It seems the model was unavailable; as the last comment shows, providing the correct model details resolves the issue.
Here is an example that can be used for future reference:
https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/online/managed/online-endpoints-openapi.ipynb
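Since the root cause was an unregistered model reference, a small client-side guard can turn that opaque 500 into a readable error before the deployment call. A sketch (the helper name is hypothetical; `ml_client` is assumed to be an azure.ai.ml.MLClient, duck-typed here):

```python
def assert_model_exists(ml_client, model_ref):
    """Fail fast if a "name:version" model reference is not registered
    in the workspace, instead of letting the service return a 500.

    Returns the parsed (name, version) pair on success.
    """
    name, sep, version = model_ref.partition(":")
    if not sep or not version:
        raise ValueError(f"Expected 'name:version', got {model_ref!r}")
    try:
        # models.get raises if the name/version pair does not exist
        ml_client.models.get(name=name, version=version)
    except Exception as err:
        raise ValueError(
            f"Model {model_ref!r} is not registered in this workspace; "
            "register it with ml_client.models.create_or_update(...) first"
        ) from err
    return name, version
```

For example, calling `assert_model_exists(ml_client, "anomaly_detection_model:1")` right before constructing the ManagedOnlineDeployment would have reported the missing model directly.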

@achauhan-scc achauhan-scc self-assigned this Feb 19, 2025
@achauhan-scc achauhan-scc added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Feb 19, 2025

Hi @BillmanH. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@BillmanH
Contributor Author

Thanks for this! We were able to adjust and fix the issue. Cheers!
