[BUG] Error in PySpark tests #2211

miguelgfierro opened this issue Feb 24, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@miguelgfierro
Collaborator

Description

The Spark tests are breaking: https://github.com/recommenders-team/recommenders/actions/runs/13447063630/job/37574622335

2025-02-21T01:39:23Z	ERROR	The first run cannot skip downloading Java DB
  2025-02-21T01:39:24Z	FATAL	Fatal error	image scan error: scan error: scan failed: failed analysis: analyze error: pipeline error: failed to analyze layer (sha256:b09e82ba08eb33253154a30bdf386082168ded70dd2b200ccf6044153de91aaa): post analysis error: post analysis error: Unable to initialize the Java DB: Java DB update failed: '--skip-java-db-update' cannot be specified on the first run
  2025-02-21T01:39:24: Call failed with error:
  
  2025-02-21T01:39:24: #### Exception encountered when generating sbom: Command '['trivy', 'image', '--no-progress', '--format', 'spdx-json', '--skip-db-update', '--skip-java-db-update', '--offline-scan', '--output', 'image-details.json', '978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf', '--timeout', '10m0s']' returned non-zero exit status 1.
  2025-02-21T01:39:24: The push refers to repository [978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf]
  2025-02-21T01:39:24: b09e82ba08eb: Preparing
  2025-02-21T01:39:24: d2596f004fd8: Preparing
  2025-02-21T01:39:24: da9cf28542bf: Preparing
  2025-02-21T01:39:24: 4053f9139ccd: Preparing
  2025-02-21T01:39:24: 1e3547805ea3: Preparing
  2025-02-21T01:39:24: eac9efed49d3: Preparing
  2025-02-21T01:39:24: 2bf0156340ce: Preparing
  2025-02-21T01:39:24: f36fd4bb7334: Preparing
  2025-02-21T01:39:24: 2bf0156340ce: Waiting
  2025-02-21T01:39:24: eac9efed49d3: Waiting
  2025-02-21T01:39:24: f36fd4bb7334: Waiting
  2025-02-21T01:39:24: b09e82ba08eb: Layer already exists
  2025-02-21T01:39:24: d2596f004fd8: Layer already exists
  2025-02-21T01:39:24: da9cf28542bf: Layer already exists
  2025-02-21T01:39:24: 1e3547805ea3: Layer already exists
  2025-02-21T01:39:24: 4053f9139ccd: Layer already exists
  2025-02-21T01:39:24: eac9efed49d3: Layer already exists
  2025-02-21T01:39:24: 2bf0156340ce: Layer already exists
  2025-02-21T01:39:24: f36fd4bb7334: Layer already exists
  2025-02-21T01:39:24: 1: digest: sha256:f9a87aa0f05832939d7e545ff7656368c5003c045346d3e19502a22e094c5c33 size: 2018
  
  
  2025-02-21T01:39:24: #### Image digest: sha256:f9a87aa0f05832939d7e545ff7656368c5003c045346d3e19502a22e094c5c33
  2025-02-21T01:39:24: #### Calling generate_sbom
  2025-02-21T01:39:24: #### Generating SBOM 
  2025-02-21T01:39:24: #### Running command: trivy image --no-progress --format spdx-json --skip-db-update --skip-java-db-update --offline-scan --output image-details.json 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf:1 --timeout 10m0s
  2025-02-21T01:39:25Z	INFO	"--format spdx" and "--format spdx-json" disable security scanning
  2025-02-21T01:40:24Z	ERROR	The first run cannot skip downloading Java DB
  2025-02-21T01:40:24Z	FATAL	Fatal error	image scan error: scan error: scan failed: failed analysis: analyze error: pipeline error: failed to analyze layer (sha256:b09e82ba08eb33253154a30bdf386082168ded70dd2b200ccf6044153de91aaa): post analysis error: post analysis error: Unable to initialize the Java DB: Java DB update failed: '--skip-java-db-update' cannot be specified on the first run
  2025-02-21T01:40:24: Call failed with error:
  
  2025-02-21T01:40:24: #### Exception encountered when generating sbom: Command '['trivy', 'image', '--no-progress', '--format', 'spdx-json', '--skip-db-update', '--skip-java-db-update', '--offline-scan', '--output', 'image-details.json', '978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf:1', '--timeout', '10m0s']' returned non-zero exit status 1.
  2025-02-21T01:40:24: #### Cleaning up local image cache
  2025-02-21T01:40:24: Deleting 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf from local machine
  2025-02-21T01:40:24: Error response from daemon: page not found
  
  
  2025-02-21T01:40:24: Logging out of Docker registry: 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io
  2025-02-21T01:40:25: Removing login credentials for https://index.docker.io/v1/
  
  
  2025-02-21T01:40:25: Logging out of Docker registry: 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io
  2025-02-21T01:40:25: Removing login credentials for https://index.docker.io/v1/
  
  
  Traceback (most recent call last):
  
    File "/home/runner/work/recommenders/recommenders/tests/ci/azureml_tests/submit_groupwise_azureml_pytest.py", line 171, in <module>
      run_tests(
    File "/home/runner/work/recommenders/recommenders/tests/ci/azureml_tests/aml_utils.py", line 142, in run_tests
  Execution Summary
  =================
  RunId: elated_owl_wth18ktmd9
  Web View: https://ml.azure.com/runs/elated_owl_wth18ktmd9?wsid=/subscriptions/***/resourcegroups/recommenders_project_resources/workspaces/azureml-test-workspace
  
  Warnings:
  AzureMLCompute job failed
  ExecutionFailed: [REDACTED]
  	exit_codes: 1
  	Appinsights Reachable: Some(true)
  
      client.jobs.stream(job.name)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 116, in wrapper_use_tracer
      return func(*args, **kwargs)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/_telemetry/activity.py", line 288, in wrapper
      return f(*args, **kwargs)
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/operations/_job_operations.py", line 838, in stream
      self._stream_logs_until_completion(
    File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/azure/ai/ml/operations/_job_ops_helper.py", line 334, in stream_logs_until_completion
      raise JobException(
  azure.ai.ml.exceptions.JobException: Exception : 
   {
      "error": {
          "code": "UserError",
          "message": "Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details. Error: 329.43s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_full\n70.65s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n65.50s call     tests/functional/examples/test_notebooks_pyspark.py::test_als_pyspark_functional\n50.93s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n31.99s call     tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]\n26.83s call     tests/smoke/examples/test_notebooks_pyspark.py::test_als_pyspark_smoke\n23.94s call     tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke\n12.09s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[1m-1000209-3883-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n7.95s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n2.29s setup    tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]\n0.78s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample\n0.56s teardown tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]\n0.08s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story 
(1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n0.03s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]\n\n(18 durations < 0.005s hidden.  Use -vv to show these durations.)\n=========================== short test summary info ============================\nFAILED tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample\nFAILED tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke\n======== 2 failed, 8 passed, 1 skipped, 5 warnings in 624.55s (0:10:24) ========\n",
          "message_parameters": {},
          "details": []
      },
      "time": "0001-01-01T00:00:00.000Z",
      "component_name": "CommonRuntime"
  } 
  Error: Process completed with exit code 1.

However, the same tests pass in another run:

============================= test session starts ==============================
  platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
  rootdir: /mnt/azureml/cr/j/7e7d45a36a84448fa692cde5f8c6215b/exe/wd
  configfile: pyproject.toml
  plugins: cov-5.0.0, hypothesis-6.108.5, mock-3.14.0, typeguard-4.3.0, anyio-4.4.0
  collected 11 items
  
  tests/data_validation/recommenders/datasets/test_movielens.py ....       [ 36%]
  tests/data_validation/recommenders/datasets/test_criteo.py ..            [ 54%]
  tests/smoke/examples/test_notebooks_pyspark.py .                         [ 63%]
  tests/functional/examples/test_notebooks_pyspark.py s                    [ 72%]
  tests/smoke/examples/test_notebooks_pyspark.py .                         [ 81%]
  tests/functional/examples/test_notebooks_pyspark.py ..                   [100%]
 ============================== slowest durations ===============================
  420.11s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_full
  77.86s call     tests/functional/examples/test_notebooks_pyspark.py::test_als_pyspark_functional
  71.30s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  46.24s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  42.53s call     tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]
  36.71s call     tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke
  33.70s call     tests/smoke/examples/test_notebooks_pyspark.py::test_als_pyspark_smoke
  11.91s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[1m-1000209-3883-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  10.10s call     tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  3.33s setup    tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[100k-100000-1682-1-Toy Story (1995)-Animation|Children's|Comedy-1995]
  2.10s call     tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample
  1.01s teardown tests/functional/examples/test_notebooks_pyspark.py::test_benchmark_movielens_pyspark[size0-algos0-expected_values_ndcg0]
  0.08s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[20m-20000263-27278-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  0.04s teardown tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df[10m-10000054-10681-1-Toy Story (1995)-Adventure|Animation|Children|Comedy|Fantasy-1995]
  
  (18 durations < 0.005s hidden.  Use -vv to show these durations.)
  ============ 10 passed, 1 skipped, 5 warnings in 758.54s (0:12:38) =============
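To compare the two runs above, the pytest summary lines can be parsed into counts and failed test IDs. This is only a minimal sketch: `parse_pytest_summary` is a hypothetical helper, not part of the repository, and it assumes the log text has real newlines (the escaped `\n` sequences in the `JobException` message would have to be decoded first).

```python
import re

# Final pytest summary line, e.g.
# "==== 2 failed, 8 passed, 1 skipped, 5 warnings in 624.55s (0:10:24) ===="
SUMMARY_RE = re.compile(
    r"=+ (?P<counts>\d+ \w+(?:, \d+ \w+)*) in (?P<seconds>[\d.]+)s"
)
# "FAILED <test id>" lines from the short test summary section.
FAILED_RE = re.compile(r"^[ \t]*FAILED (.+?)(?: - .*)?$", re.MULTILINE)


def parse_pytest_summary(log_text):
    """Return outcome counts, duration in seconds, and failed test IDs.

    Hypothetical helper; returns None when no summary line is found.
    """
    matches = list(SUMMARY_RE.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]  # the summary is the last matching line of the session
    counts = {}
    for item in last.group("counts").split(", "):
        number, outcome = item.split(" ", 1)
        counts[outcome] = int(number)
    return {
        "counts": counts,
        "seconds": float(last.group("seconds")),
        "failed": FAILED_RE.findall(log_text),
    }


# Excerpts copied from the two logs in this issue.
failing_run = """\
=========================== short test summary info ============================
FAILED tests/data_validation/recommenders/datasets/test_criteo.py::test_criteo_load_spark_df_sample
FAILED tests/smoke/examples/test_notebooks_pyspark.py::test_mmlspark_lightgbm_criteo_smoke
======== 2 failed, 8 passed, 1 skipped, 5 warnings in 624.55s (0:10:24) ========
"""
passing_run = (
    "  ============ 10 passed, 1 skipped, 5 warnings in 758.54s (0:12:38) ============="
)

for name, text in [("failing", failing_run), ("passing", passing_run)]:
    print(name, parse_pytest_summary(text))
```

Running it on the excerpts shows that the only difference between the runs is the two Criteo-related tests, `test_criteo_load_spark_df_sample` and `test_mmlspark_lightgbm_criteo_smoke`.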

In which platform does it happen?

How do we replicate the issue?

Expected behavior (i.e. solution)

Willingness to contribute

  • Yes, I can contribute for this issue independently.
  • Yes, I can contribute for this issue with guidance from Recommenders community.
  • No, I cannot contribute at this time.

Other Comments

Any suggestions as to why this could be failing? @anargyri @SimonYansenZhao

@miguelgfierro miguelgfierro added the bug Something isn't working label Feb 24, 2025
@anargyri
Collaborator

It looks like it is related to scanning the Docker image:

2025-02-21T01:38:22: #### Running command: trivy image --no-progress --format spdx-json --skip-db-update --skip-java-db-update --offline-scan --output image-details.json 978a92daa2ad4447aae1b21196dd4a9b.azurecr.io/azureml/azureml_773345bc36a2cce558dfeaaf5d474adf --timeout 10m0s

and the error looks like aquasecurity/trivy#486.
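To check whether a given job log contains this Trivy failure, separately from any pytest failures, the log lines can be filtered for Trivy `ERROR`/`FATAL` entries that mention the Java DB. This is a rough sketch, not a fix: `find_trivy_java_db_errors` is a hypothetical helper, and the line layout it assumes (timestamp ending in `Z`, then the level, then the message) is inferred only from the log excerpt in this issue.

```python
import re

# Assumed Trivy line layout, inferred from the excerpt in this issue:
# "<timestamp>Z<whitespace>LEVEL<whitespace>message"
TRIVY_LINE_RE = re.compile(
    r"^\s*(?P<ts>\S+Z)\s+(?P<level>ERROR|FATAL)\s+(?P<msg>.*)$"
)


def find_trivy_java_db_errors(log_lines):
    """Return (timestamp, level) for Trivy ERROR/FATAL lines about the Java DB."""
    hits = []
    for line in log_lines:
        match = TRIVY_LINE_RE.match(line)
        if match and "Java DB" in match.group("msg"):
            hits.append((match.group("ts"), match.group("level")))
    return hits


# Sample lines taken from the log above; the FATAL message is abbreviated.
sample = [
    "2025-02-21T01:39:23Z\tERROR\tThe first run cannot skip downloading Java DB",
    "2025-02-21T01:39:24Z\tFATAL\tFatal error\tUnable to initialize the Java DB: "
    "Java DB update failed: '--skip-java-db-update' cannot be specified on the first run",
    "2025-02-21T01:39:24: b09e82ba08eb: Preparing",
    '2025-02-21T01:39:25Z\tINFO\t"--format spdx" and "--format spdx-json" disable security scanning',
]

for timestamp, level in find_trivy_java_db_errors(sample):
    print(timestamp, level)
```

On the sample lines, only the `ERROR` and `FATAL` entries are reported; the Docker push lines and the `INFO` line are ignored.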
