
Managed Deployment Class returns 500 error #39772

Closed
BillmanH opened this issue Feb 18, 2025 · 12 comments
Labels:
  • customer-reported — Issues that are reported by GitHub users external to the Azure organization.
  • issue-addressed — Workflow: The Azure SDK team believes it to be addressed and ready to close.
  • Machine Learning
  • question — The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Service Attention — Workflow: This issue is responsible by Azure service team.

Comments

@BillmanH
Contributor

BillmanH commented Feb 18, 2025

  • Package Name: azure.ai.ml
  • Package Version: 1.17.1
  • Operating System: Windows
  • Python Version: 3.12.4

Following this document.

The error is cryptic, as per conventions.
Error:

{
    "name": "HttpResponseError",
    "message": "(ServiceError) Received 500 from a service request
Code: ServiceError
Message: Received 500 from a service request
Target: POST https://inference-deployment-api.inference-deployment-api.svc/inferencedeployment/subscriptions/e1528a7a-9681-47d5-8fbe-cc0850266856/resourceGroups/mloppsexample/providers/Microsoft.MachineLearningServices/workspaces/testmlinstance/endpoints/endpt-moe-7362/deployments/openapi/v2?api-version=2021-10-01&validateOnly=False
Exception Details:	(InternalServerError) 
    Code: InternalServerError
    Message: 
Additional Information:Type: ComponentName
Info: {
    "value": "managementfrontend"
}Type: Correlation
Info: {
    "value": {
        "operation": "e8088be0c84fc407cdf44d797d0794a4",
        "request": "db5604699c11951b"
    }
}Type: Environment
Info: {
    "value": "centralus"
}Type: Location
Info: {
    "value": "centralus"
}Type: Time
Info: {
    "value": "2025-02-17T23:34:07.5387801+00:00"
}",
    "stack": "---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
File c:\Users\william.harding\repos\learn-azureml\register_API_endpoint.py:16
      1 deployment = ManagedOnlineDeployment(
      2     name="openapi",
      3     endpoint_name=endpoint_name,
   (...)
     13     instance_count=1,
     14 )
---> 16 deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\core\tracing\decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\_telemetry\activity.py:289, in monitor_with_activity.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    285     with tracer.span():
    286         with log_activity(
    287             logger.package_logger, activity_name or f.__name__, activity_type, custom_dimensions
    288         ):
--> 289             return f(*args, **kwargs)
    290 elif hasattr(logger, "package_logger"):
    291     with log_activity(logger.package_logger, activity_name or f.__name__, activity_type, custom_dimensions):

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:218, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    216     log_and_raise_error(ex)
    217 else:
--> 218     raise ex

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:213, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    211         return poller
    212     except Exception as ex:
--> 213         raise ex
    214 except Exception as ex:  # pylint: disable=W0718
    215     if isinstance(ex, (ValidationException, SchemaValidationError)):

File c:\Users\william.harding\AppData\Local\miniconda3\envs\ailabs\Lib\site-packages\azure\ai\ml\operations\_online_deployment_operations.py:196, in OnlineDeploymentOperations.begin_create_or_update(self, deployment, local, vscode_debug, skip_script_validation, local_enable_gpu, **kwargs)
    192         module_logger.info("\nStarting deployment")
    194     deployment_rest = deployment._to_rest_object(location=location)  # type: ignore
--> 196     poller = self._online_deployment.begin_create_or_update(
    197         resource_group_name=self._resource_group_name,
    198         workspace_name=self._workspace_name,
    199         endpoint_name=deployment.endpoint_name,
    200         deployment_name=deployment.name,
    201         body=deployment_rest,
    202         polling=AzureMLPolling(
    203             LROConfigurations.POLL_INTERVAL,
    204             path_format_arguments=path_format_arguments,
    205             **self._init_kwargs,
    206         ),
    207         polling_interval=LROConfigurations.POLL_INTERVAL,
    208         **self._init_kwargs,
    209         cls=lambda response, deserialized, headers: OnlineDeployment._from_rest_object(deserialized),
    210     )
    211     return poller
    212 except

Python function:

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", 
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
        conda_file="env.yaml",
    ),
    instance_type="Standard_DS3_v2", 
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

Is this a valid deployment process? Is there something missing?
Can you provide an example script that would work?

@BillmanH
Contributor Author

Full process:


endpoint_name = f"endpt-moe-{random.randint(0,10000)}"

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)

endpoint = ManagedOnlineEndpoint(
    public_network_access="enabled",
    name = endpoint_name, 
    description="this is a sample endpoint",
    auth_mode="key"
)

ml_client.online_endpoints.begin_create_or_update(endpoint)

key = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", # TODO: Add to config
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
        conda_file="env.yaml",
    ),
    instance_type="Standard_DS3_v2", # Add to config
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment)
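One thing worth flagging in the script above: `begin_create_or_update` returns a long-running-operation poller, and the endpoint poller here is never awaited, so the later `get_keys` call can race endpoint provisioning. A minimal, duck-typed sketch (the helper name is my own; it works with any Azure LRO poller exposing `.result()` and `.status()`):

```python
def wait_for_lro(poller):
    """Block until an Azure long-running operation completes.

    .result() re-raises any service-side failure (e.g. HttpResponseError),
    so errors surface here instead of in a later, unrelated call.
    """
    outcome = poller.result()
    print(f"Operation status: {poller.status()}")
    return outcome
```

Wrapping the endpoint call as `endpoint = wait_for_lro(ml_client.online_endpoints.begin_create_or_update(endpoint))` before calling `get_keys` would make the ordering explicit.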


@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 18, 2025
@BillmanH
Contributor Author

I've tried all kinds of variations of this process and I can't find a way to make this deploy.

@tdemgit

tdemgit commented Feb 18, 2025

Try initializing the environment before passing it to ManagedOnlineDeployment.

from azure.ai.ml.entities import Environment

# Register the environment
env = Environment(
    name="custom-env",
    image="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cpu-inference",
    conda_file="env.yaml",
)
ml_client.environments.create_or_update(env)

# Reference the registered environment in the deployment
deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1",  # Ensure the model is registered
    code_configuration=CodeConfiguration(code="./", scoring_script="score.py"),
    environment=env.id,  # Use the registered environment's ID
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

# Deploy and wait for completion
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

@BillmanH
Contributor Author

Thanks for the input, but it didn't work.

env = ml_client.environments.get("billmanh-env", label="latest")  # <-- a currently existing env

endpoint_name = "endpt-moe-3767"
key = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

#%%
deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model="anomaly_detection_model:1", # TODO: Add to config
    code_configuration=CodeConfiguration(
        code="./", scoring_script="score.py"
    ),
    environment=env,
    instance_type="Standard_DS3_v2", # Add to config
    instance_count=1,
)

deployment = ml_client.online_deployments.begin_create_or_update(deployment)

I updated the env that I have created to have the azureml-inference-server-http library, which is required.

The yaml file for that env:

channels:
  - conda-forge
dependencies:
  - python=3.12.9
  - pip
  - pip:
      - mlflow
      - argparse
      - azure-ai-ml
      - azureml-mlflow
      - azureml-core
      - azure-identity
      - pandas
      - numpy
      - scikit-learn
      - matplotlib
      - joblib
      - pyyaml
      - uuid
      - azureml-inference-server-http
name: billmanh-env

@tdemgit

tdemgit commented Feb 18, 2025

You need to pass the environment's ID, not the object itself:

environment = env.id

@BillmanH
Contributor Author

Same error.

A note: after some work in the UI, I was able to deploy manually:

[Image: screenshot of the successful manual deployment in the studio UI]

When I use the UI to deploy the same code, I get logs that contain the Python error.

It would be super useful to have those logs in the SDK (as opposed to the 500 error). Because the error is a 500, I can't tell whether it is crashing due to an issue with the SDK or an issue with my own code.

@l0lawrence l0lawrence added Machine Learning Service Attention Workflow: This issue is responsible by Azure service team. labels Feb 18, 2025
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 18, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

@tdemgit

tdemgit commented Feb 18, 2025

Would this, added at the end of your script, give you what you need?

logs = ml_client.online_deployments.get_logs(endpoint_name=endpoint_name, name="openapi", lines=50)
print(logs)
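Building on that idea, here is a hedged sketch (the helper name and `lines=100` are my own choices, not from the SDK docs) that attempts the deployment and surfaces container logs on failure. One caveat: `get_logs` only has output once the scoring container has actually started, so a server-side 500 thrown before provisioning, like the one above, still yields nothing.

```python
def deploy_with_logs(ml_client, deployment, endpoint_name):
    """Attempt a managed online deployment; on failure, try to print the
    scoring container's logs before re-raising the original error.

    ml_client is assumed to be an azure.ai.ml.MLClient (duck-typed here).
    """
    try:
        return ml_client.online_deployments.begin_create_or_update(deployment).result()
    except Exception:
        try:
            print(ml_client.online_deployments.get_logs(
                endpoint_name=endpoint_name, name=deployment.name, lines=100))
        except Exception as log_err:
            # Logs are unavailable if the container never started
            print(f"Could not fetch logs: {log_err}")
        raise
```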

@Rileyjs

Rileyjs commented Feb 18, 2025

Some tweaks to the deployment object got it working.

deployment = ManagedOnlineDeployment(
    name="openapi",
    endpoint_name=endpoint_name,
    model=f'{model_deploy_name}:{model_deploy_version}',
    code_configuration=CodeConfiguration(
        code=f'{code_scoring_dir}', scoring_script=f'{code_scoring_script}'
    ),
    environment=f'{env_name}:{env_version}',
    instance_type=f'{compute_sku}',
    instance_count=1,
)
model_deploy_name: "deployed_name"
model_deploy_version: 1

env_name: "premade-env-name"
env_version: 1
# Compute for the endpoint being run
compute_sku: "Standard_DS1_v2"

# Scoring stuff
code_scoring_dir: "openapi/code-decorated"
code_scoring_script: "score.py"

@achauhan-scc
Member

Here is the error that was logged:

Microsoft.MachineLearning.ModelRegistry.Utilities.Exceptions.MrsApiException: There is no registered model in Account Subscription: xxx, ResourceGroup: xxx, Workspace: xx with id anomaly_detection_model:1

It seems the model was unavailable; as the last comment shows, providing the correct model details resolves the issue.
Here is an example that can be used for future reference:
https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/online/managed/online-endpoints-openapi.ipynb
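Since the root cause was an unregistered model reference, a small client-side guard can turn that opaque 500 into a readable error before the deployment call. A sketch (the helper name is hypothetical; `ml_client` is assumed to be an azure.ai.ml.MLClient, duck-typed here):

```python
def assert_model_exists(ml_client, model_ref):
    """Fail fast if a "name:version" model reference is not registered
    in the workspace, instead of letting the service return a 500.

    Returns the parsed (name, version) pair on success.
    """
    name, sep, version = model_ref.partition(":")
    if not sep or not version:
        raise ValueError(f"Expected 'name:version', got {model_ref!r}")
    try:
        # models.get raises if the name/version pair does not exist
        ml_client.models.get(name=name, version=version)
    except Exception as err:
        raise ValueError(
            f"Model {model_ref!r} is not registered in this workspace; "
            "register it with ml_client.models.create_or_update(...) first"
        ) from err
    return name, version
```

For example, calling `assert_model_exists(ml_client, "anomaly_detection_model:1")` right before constructing the ManagedOnlineDeployment would have reported the missing model directly.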

@achauhan-scc achauhan-scc self-assigned this Feb 19, 2025
@achauhan-scc achauhan-scc added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Feb 19, 2025

Hi @BillmanH. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@BillmanH
Contributor Author

Thanks for this! We were able to adjust and fix the issue. Cheers!
