[BUG] Error in PySpark tests #2211

miguelgfierro · 2025-02-24T15:15:29Z

Description

The Spark tests are failing: https://github.com/recommenders-team/recommenders/actions/runs/13447063630/job/37574622335

2025-02-21T01:39:23Z	ERROR	The first run cannot skip downloading Java DB
  2025-02-21T01:39:24Z	FATAL	Fatal error	image scan error: scan error: scan failed: failed analysis: analyze error: pipeline error: failed to analyze layer (sha256:b09e82ba08eb33253154a30bdf386082168ded70dd2b200ccf6044153de91aaa): post analysis error: post analysis error: Unable to initialize the Java DB: Java DB update failed: '--skip-java-db-update' cannot be specified on the first run
  2025-02-21T01:39:24: Call failed with error:
  
  2025-02-21T01:39:24: #### Exception encountered when generating sbom: Command '['trivy', 'image', '--no-progress', '--format', 'spdx-json', '--skip-db-update', '--skip-java-db-update', '--offline-scan', '--output', 'image-details.json', '978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf', '--timeout', '10m0s']' returned non-zero exit status 1.
  2025-02-21T01:39:24: The push refers to repository [978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf]
  2025-02-21T01:39:24: b09e82ba08eb: Preparing
  2025-02-21T01:39:24: d2596f004fd8: Preparing
  2025-02-21T01:39:24: da9cf28542bf: Preparing
  2025-02-21T01:39:24: 4053f9139ccd: Preparing
  2025-02-21T01:39:24: 1e3547805ea3: Preparing
  2025-02-21T01:39:24: eac9efed49d3: Preparing
  2025-02-21T01:39:24: 2bf0156340ce: Preparing
  2025-02-21T01:39:24: f36fd4bb7334: Preparing
  2025-02-21T01:39:24: 2bf0156340ce: Waiting
  2025-02-21T01:39:24: eac9efed49d3: Waiting
  2025-02-21T01:39:24: f36fd4bb7334: Waiting
  2025-02-21T01:39:24: b09e82ba08eb: Layer already exists
  2025-02-21T01:39:24: d2596f004fd8: Layer already exists
  2025-02-21T01:39:24: da9cf28542bf: Layer already exists
  2025-02-21T01:39:24: 1e3547805ea3: Layer already exists
  2025-02-21T01:39:24: 4053f9139ccd: Layer already exists
  2025-02-21T01:39:24: eac9efed49d3: Layer already exists
  2025-02-21T01:39:24: 2bf0156340ce: Layer already exists
  2025-02-21T01:39:24: f36fd4bb7334: Layer already exists
  2025-02-21T01:39:24: 1: digest: sha256:f9a87aa0f05832939d7e545ff7656368c5003c045346d3e19502a22e094c5c33 size: 2018
  
  
  2025-02-21T01:39:24: #### Image digest: sha256:f9a87aa0f05832939d7e545ff7656368c5003c045346d3e19502a22e094c5c33
  2025-02-21T01:39:24: #### Calling generate_sbom
  2025-02-21T01:39:24: #### Generating SBOM 
  2025-02-21T01:39:24: #### Running command: trivy image --no-progress --format spdx-json --skip-db-update --skip-java-db-update --offline-scan --output image-details.json 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf:1 --timeout 10m0s
  2025-02-21T01:39:25Z	INFO	"--format spdx" and "--format spdx-json" disable security scanning
  2025-02-21T01:40:24Z	ERROR	The first run cannot skip downloading Java DB
  2025-02-21T01:40:24Z	FATAL	Fatal error	image scan error: scan error: scan failed: failed analysis: analyze error: pipeline error: failed to analyze layer (sha256:b09e82ba08eb33253154a30bdf386082168ded70dd2b200ccf6044153de91aaa): post analysis error: post analysis error: Unable to initialize the Java DB: Java DB update failed: '--skip-java-db-update' cannot be specified on the first run
  2025-02-21T01:40:24: Call failed with error:
  
  2025-02-21T01:40:24: #### Exception encountered when generating sbom: Command '['trivy', 'image', '--no-progress', '--format', 'spdx-json', '--skip-db-update', '--skip-java-db-update', '--offline-scan', '--output', 'image-details.json', '978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf:1', '--timeout', '10m0s']' returned non-zero exit status 1.
  2025-02-21T01:40:24: #### Cleaning up local image cache
  2025-02-21T01:40:24: Deleting 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf from local machine
  2025-02-21T01:40:24: Error response from daemon: page not found
  
  
  2025-02-21T01:40:24: Logging out of Docker registry: 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io
  2025-02-21T01:40:25: Removing login credentials for https://index.docker.io/v1/
  
  
  2025-02-21T01:40:25: Logging out of Docker registry: 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io
  2025-02-21T01:40:25: Removing login credentials for https://index.docker.io/v1/
  
  
  Traceback (most recent call last):
  
    File "/home/runner/work/recommenders/recommenders/tests/ci/azureml_tests/submit_groupwise_azureml_pytest.py", line 171, in <module>
      run_tests(
    File "/home/runner/work/recommenders/recommenders/tests/ci/azureml_tests/aml_utils.py", line 142, in run_tests
  Execution Summary
  =================
  RunId: elated_owl_wth18ktmd9
  Web View: https://ml.azure.com/runs/elated_owl_wth18ktmd9?wsid=/subscriptions/***/resourcegroups/recommenders_project_resources/workspaces/azureml-test-workspace
  
  Warnings:
  AzureMLCompute job failed
  ExecutionFailed: [REDACTED]
  	exit_codes: 1
  	Appinsights Reachable: Some(true)
  
      client.jobs.stream(job.name)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 116, in wrapper_use_tracer
      return func(*args, **kwargs)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/_telemetry/activity.py", line 288, in wrapper
      return f(*args, **kwargs)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/operations/_job_operations.py", line 838, in stream
      self._stream_logs_until_completion(
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/operations/_job_ops_helper.py", line 334, in stream_logs_until_completion
      raise JobException(
  azure.ai.ml.exceptions.JobException: Exception : 
   {
      "error": {
          "code": "UserError",
          "message": "Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details. Error: 329.43s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_full\n70.65s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n65.50s call     tests/functional/examples/test_notebooks_pyspark.py::test_als_pyspark_functional\n50.93s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n31.99s call     tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]\n26.83s call     tests/smoke/examples/test_notebooks_pyspark.py::test_als_pyspark_smoke\n23.94s call     tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke\n12.09s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[1m-1000209-3883-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n7.95s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n2.29s setup    tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n0.78s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample\n0.56s teardown tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]\n0.08s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story 
(1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n0.03s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n\n(18 durations < 0.005s hidden.  Use -vv to show these durations.)\n=========================== short test summary info ============================\nFAILED tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample\nFAILED tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke\n======== 2 failed, 8 passed, 1 skipped, 5 warnings in 624.55s (0:10:24) ========\n",
          "message_parameters": {},
          "details": []
      },
      "time": "0001-01-01T00:00:00.000Z",
      "component_name": "CommonRuntime"
  } 
  Error: Process completed with exit code 1.
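The `message` field of the JobException above ends with pytest's short test summary, which names the tests that failed in this run. Below is a minimal sketch for pulling those test IDs out of such a payload. It assumes the payload stores newlines as the literal two-character sequence `\n`, as in the log above. `failed_tests` is a hypothetical helper written for illustration, not part of the repository.

```python
import re


def failed_tests(message: str) -> list[str]:
    """Return the test IDs listed as FAILED in a pytest short summary."""
    # Assumption: newlines arrive escaped as backslash + "n", as in the log above.
    text = message.replace("\\n", "\n")
    # A FAILED line is "FAILED <test id>" optionally followed by " - <reason>".
    # Parametrized IDs may contain spaces, so match lazily up to the reason.
    return re.findall(r"^FAILED (.+?)(?: - .*)?$", text, flags=re.MULTILINE)


# Excerpt of the "message" field copied from the JobException above.
message = (
    "=========================== short test summary info ============================\\n"
    "FAILED tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample\\n"
    "FAILED tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke\\n"
    "======== 2 failed, 8 passed, 1 skipped, 5 warnings in 624.55s (0:10:24) ========\\n"
)

for test_id in failed_tests(message):
    print(test_id)
```

For this excerpt the helper returns the two Criteo-related tests reported in the summary, which may help when checking whether the job failure comes from the tests or from the image scan.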

However, the tests run correctly:

============================= test session starts ==============================
  platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
  rootdir: /mnt/azureml/cr/j/7e7d45a36a84448fa692cde5f8c6215b/exe/wd
  configfile: pyproject.toml
  plugins: cov-5.0.0, hypothesis-6.108.5, mock-3.14.0, typeguard-4.3.0, anyio-4.4.0
  collected 11 items
  
  tests/data_validation/recommenders/datasets/test_movielens.py ....       [ 36%]
  tests/data_validation/recommenders/datasets/test_criteo.py ..            [ 54%]
  tests/smoke/examples/test_notebooks_pyspark.py .                         [ 63%]
  tests/functional/examples/test_notebooks_pyspark.py s                    [ 72%]
  tests/smoke/examples/test_notebooks_pyspark.py .                         [ 81%]
  tests/functional/examples/test_notebooks_pyspark.py ..                   [100%]
 ============================== slowest durations ===============================
  420.11s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_full
  77.86s call     tests/functional/examples/test_notebooks_pyspark.py::test_als_pyspark_functional
  71.30s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  46.24s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  42.53s call     tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]
  36.71s call     tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke
  33.70s call     tests/smoke/examples/test_notebooks_pyspark.py::test_als_pyspark_smoke
  11.91s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[1m-1000209-3883-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  10.10s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  3.33s setup    tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  2.10s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample
  1.01s teardown tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]
  0.08s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  0.04s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  
  (18 durations < 0.005s hidden.  Use -vv to show these durations.)
  ============ 10 passed, 1 skipped, 5 warnings in 758.54s (0:12:38) =============

In which platform does it happen?

How do we replicate the issue?

Expected behavior (i.e. solution)

Willingness to contribute

Yes, I can contribute for this issue independently.
Yes, I can contribute for this issue with guidance from Recommenders community.
No, I cannot contribute at this time.

Other Comments

Any suggestions on why this could be failing? @anargyri @SimonYansenZhao


anargyri · 2025-02-24T18:17:32Z

It looks like this is related to scanning the Docker image:

2025-02-21T01:38:22: #### Running command: trivy image --no-progress --format spdx-json --skip-db-update --skip-java-db-update --offline-scan --output image-details.json 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf --timeout 10m0s

The error resembles the one reported in aquasecurity/trivy#486.
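To check whether a given job log contains this trivy failure, one option is to filter for trivy's ERROR and FATAL lines that mention the Java DB. This is only a sketch: it assumes trivy's log lines are tab-separated as `<timestamp>`, `<level>`, `<message>`, which is how they appear in the log in the issue description, and `trivy_java_db_errors` is a hypothetical helper name.

```python
def trivy_java_db_errors(log_lines):
    """Return (level, message) pairs for trivy ERROR/FATAL lines about the Java DB."""
    hits = []
    for line in log_lines:
        # Assumption: trivy lines look like "<timestamp>\t<LEVEL>\t<message>".
        fields = line.strip().split("\t", 2)
        if len(fields) < 3:
            continue  # not a trivy-style line (e.g. AzureML wrapper output)
        level, message = fields[1], fields[2]
        if level in {"ERROR", "FATAL"} and "Java DB" in message:
            hits.append((level, message))
    return hits


# Lines copied from the log in the issue description.
log = [
    '2025-02-21T01:39:25Z\tINFO\t"--format spdx" and "--format spdx-json" disable security scanning',
    "2025-02-21T01:40:24Z\tERROR\tThe first run cannot skip downloading Java DB",
    "2025-02-21T01:40:24: Call failed with error:",
]

for level, message in trivy_java_db_errors(log):
    print(level, message)
```

A match only shows that the SBOM step hit the Java DB error. It does not by itself establish that this error is what made the job fail.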

miguelgfierro added the bug label Feb 24, 2025

miguelgfierro mentioned this issue Mar 3, 2025

[BUG] Issue in the GPU test due to exception with trivy image scan #2212

