🐕 Batch: Refactoring Test workflows in models #1484
Comments
Hi @Abellegese or @DhanshreeA, can you clarify whether the test command needs to be modified (according to point 1), or whether both the test command and the playground will be modified? |
Hey @GemmaTuron, our plan is to update both pipelines for this functionality. I am creating one issue for both. |
A few more details about the features have been given here: #1488. |
We have re-evaluated our strategy for Model Testing and this is what we have finally agreed on:
Optional flags to the test command:
Once the test command is refactored, the workflows on eos_template need to be modified. In broad terms:
Responsibilities
|
Thanks @GemmaTuron |
So far, work has been completed for 1) default testing, and 2) shallow testing with all combinations for fetching a model. @Abellegese is currently implementing the optional flag |
Thanks @DhanshreeA and @Abellegese. It would be very useful if you could provide here, for one model, an example of the commands that can be run in basic and --shallow mode and of their output with the different flags, so we can see if any edits need to be made before moving on. |
Okay @GemmaTuron we will give it here. |
Hi @GemmaTuron, here are the sample commands. Basic:
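For readers following along, a hedged sketch of how the basic invocation could be scripted; the model ID and the `-v` verbose flag are taken from elsewhere in this thread, not confirmed against the final CLI:

```python
import subprocess

# Basic test of a model; "-v" enables verbose mode as mentioned later in this
# thread. Normally this would just be run directly in the shell.
subprocess.run(["ersilia", "-v", "test", "eos9gg2"], check=True)
```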
|
Shallow Dockerhub
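And a similarly hedged sketch for the shallow check against the DockerHub image; the flag spellings (`--shallow`, `--from_dockerhub`) are assumptions based on the discussion above:

```python
import subprocess

# Shallow checks, fetching the model image from DockerHub (flag names assumed).
subprocess.run(
    ["ersilia", "-v", "test", "eos9gg2", "--shallow", "--from_dockerhub"],
    check=True,
)
```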
|
On a quick look, this is quite amazing. |
Hi @Abellegese some observations from running the test command locally:
{'Identifier': 'eos9gg2', 'Slug': 'chemical-space-projections-drugbank', 'Status': 'In progress', 'Title': 'Chemical space 2D projections against DrugBank', 'Description': 'This tool performs PCA, UMAP and tSNE projections taking the DrugBank chemical space as a reference. The Ersilia Compound Embeddings as used as descriptors. Four PCA components and two UMAP and tSNE components are returned.', 'Mode': 'In-house', 'Task': 'Representation', 'Input': 'Compound', 'Input Shape': 'Single', 'Output': 'Descriptor', 'Output Type': 'Float', 'Output Shape': 'List', 'Interpretation': 'Coordinates of 2D projections, namely PCA, UMAP and tSNE.', 'Tag': ['Embedding'], 'Publication': 'https://academic.oup.com/nar/article/52/D1/D1265/7416367', 'Source Code': 'https://github.com/ersilia-os/compound-embedding', 'License': 'GPL-3.0-or-later', 'S3': 'https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9gg2.zip'}
⠙ Performing shallow checks Validation and Size Check Results
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Docker Image Size │ 1134.70 MB │
├────────────────────┼─────────────────────────────────────────┤
│ Check Single Input │ ✔ PASSED │
├────────────────────┼─────────────────────────────────────────┤
│ Check Predefined │ ✔ PASSED │
│ Example Input │ │
├────────────────────┼─────────────────────────────────────────┤
│ Check Consistency │ ✔ PASSED │
│ of Model Output │ │
└────────────────────┴─────────────────────────────────────────┘
⠦ Performing shallow checks 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
Error message:
Failed to read CSV from /tmp/tmpjnz08pks/bash_output.csv.
If this error message is not helpful, open an issue at:
- https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
- hello[at]ersilia.io
If you haven't, try to run your command in verbose mode (-v in the CLI)
- You will find the console log file in: /home/dee/eos/current.log
Run process finished successfully.
|
Hi @DhanshreeA okay right, what was the command? |
@Abellegese updated my comment. |
Okay, I think the shallow checks are working but not being displayed for some reason? Here are the logs when I run the test command in verbose mode:
What is stranger is that when I run it without verbosity, the process simply appears to exit. |
@DhanshreeA from this log it seems nothing went wrong. |
Updates: eos9gg2 with JSON file report |
Updates
Note: I tested it extensively locally, but I can never know what will happen when you try it out, and I will take responsibility for that. |
@Abellegese that's the point. In verbose mode it was fine, but without the flag set, at least for eos3b5e, the process simply exited with nothing printed on the screen. However, with the recent push, I see some changes, but for some reason, Ersilia crashed with a |
I think this specifically failed in the CSV test because I do not see a |
I am testing the following models, and I will attach logs or share results from each model in a separate comment:
|
@Abellegese The S3 issue with eos7jio is understandable: the model was refactored last week, and since our workflows are not working properly, the refactored version which has |
Model eos3b5e
Notes:
|
@Abellegese we can format the output in this table so it displays better:
We simply need a newline character after each line in the |
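A minimal sketch of that newline idea, assuming the tables in this thread are rendered with the rich library (the field being joined here is only illustrative):

```python
from rich.console import Console
from rich.table import Table

# Joining the individual lines with "\n" makes rich render each one on its own
# row of the cell instead of a single very wide line.
details = "\n".join([
    "1 predictions executed in 9.86 seconds.",
    "10 predictions executed in 9.93 seconds.",
    "100 predictions executed in 10.01 seconds.",
])

table = Table(title="Validation and Size Check Results")
table.add_column("Check")
table.add_column("Status")
table.add_row("Computational Performance Tracking Details", details)
Console().print(table)
```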
@DhanshreeA yes nice idea. |
Another thing: when running deep checks, this is what I see:
We shouldn't see the line |
I am not sure I understand this table:
Especially the |
Note on {
"ModelInformationChecks": [
{
"Check": "Model ID",
"Status": "PASSED"
},
{
"Check": "Model Slug",
"Status": "PASSED"
},
{
"Check": "Model Status",
"Status": "PASSED"
},
{
"Check": "Model Title",
"Status": "PASSED"
},
{
"Check": "Model Description",
"Status": "PASSED"
},
{
"Check": "Model Task",
"Status": "PASSED"
},
{
"Check": "Model Input",
"Status": "PASSED"
},
{
"Check": "Model Input Shape",
"Status": "PASSED"
},
{
"Check": "Model Output",
"Status": "PASSED"
},
{
"Check": "Model Output Type",
"Status": "PASSED"
},
{
"Check": "Model Output Shape",
"Status": "PASSED"
},
{
"Check": "Model Interpretation",
"Status": "PASSED"
},
{
"Check": "Model Tag",
"Status": "PASSED"
},
{
"Check": "Model Publication",
"Status": "PASSED"
},
{
"Check": "Model Source Code",
"Status": "PASSED"
},
{
"Check": "Model Contributor",
"Status": "PASSED"
},
{
"Check": "Model Dockerhub URL",
"Status": "PASSED"
},
{
"Check": "Model S3 URL",
"Status": "PASSED"
},
{
"Check": "Model Docker Architecture",
"Status": "PASSED"
}
],
"ModelFileChecks": [
{
"Check": "File: Dockerfile",
"Status": "PASSED"
},
{
"Check": "File: metadata.json",
"Status": "PASSED"
},
{
"Check": "File: model/framework/run.sh",
"Status": "PASSED"
},
{
"Check": "File: src/service.py",
"Status": "PASSED"
},
{
"Check": "File: pack.py",
"Status": "PASSED"
},
{
"Check": "File: README.md",
"Status": "PASSED"
},
{
"Check": "File: LICENSE",
"Status": "PASSED"
}
],
"ModelDirectorySizes": [
{
"Check": "Directory",
"Size": "1MB"
}
],
"DependencyCheck": [
{
"Check": "Dockerfile Check",
"Status": "PASSED"
},
{
"Check": "Check Details",
"Status": "Dockerfile dependencies are valid."
}
],
"ValidationandSizeCheckResults": [
{
"Check": "Environment Size",
"Status": "475MB"
},
{
"Check": "Check Single Input",
"Status": "PASSED"
},
{
"Check": "Check Predefined Example Input",
"Status": "PASSED"
},
{
"Check": "Check Consistency of Model Output",
"Status": "PASSED"
}
],
"ConsistencySummaryBetweenErsiliaandBashExecutionOutputs": [],
"ModelOutputContentValidationSummary": [
{
"Check": "str : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "str : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "str : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
}
],
"ComputationalPerformanceSummary": [
{
"Check": "Computational Performance Tracking",
"Status": "PASSED"
},
{
"Check": "Computational Performance Tracking Details",
"Status": "1 predictions executed in 9.86 seconds. 10 predictions executed in 9.93 seconds. 100 predictions executed in 10.01 seconds."
}
]
} This needs to be changed to something more machine-readable where it's straightforward to do |
Okay, so the next steps involve updating ersilia such that we can actually update the new metadata fields to Airtable. Mainly the This function utilizes the The updated fields include:
As per @miquelduranfrigola these fields have been created in the Airtable DB. |
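As a rough illustration of that Airtable update step (the client, base/table IDs, and field names below are all placeholders, not necessarily what Ersilia uses internally):

```python
from pyairtable import Api

# Placeholder credentials and identifiers; the real ones live in Ersilia's config.
api = Api("AIRTABLE_API_KEY")
table = api.table("appXXXXXXXXXXXXXX", "Models")

# Push newly added metadata fields for one model record (field names assumed).
table.update(
    "recXXXXXXXXXXXXXX",
    {
        "Docker Architecture": "AMD64",
        "Image Size": 1134.70,
        "Environment Size": 475,
    },
)
```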
Moreover, just for neatness, we should filter these warning logs from
|
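For the log noise, a minimal sketch of how such warnings could be filtered, assuming they pass through a standard-library logger (a loguru-based setup would need its own equivalent filter hook); the module prefix is a placeholder:

```python
import logging

class DropNoisyWarnings(logging.Filter):
    """Drop WARNING records emitted by a given (placeholder) noisy logger."""

    def __init__(self, noisy_prefix: str):
        super().__init__()
        self.noisy_prefix = noisy_prefix

    def filter(self, record: logging.LogRecord) -> bool:
        is_noisy = record.name.startswith(self.noisy_prefix)
        return not (is_noisy and record.levelno == logging.WARNING)

# Attach to whatever handlers are configured; "some.noisy.module" is hypothetical.
for handler in logging.getLogger().handlers:
    handler.addFilter(DropNoisyWarnings("some.noisy.module"))
```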
Let's update the JSON format to something like: {
"model_information_checks": {
"model_id": true,
"model_slug": true,
"model_status": true,
"model_title": true,
"model_description": true,
"model_task": true,
"model_input": true,
"model_input_shape": true,
"model_output": true,
"model_output_type": true,
"model_output_shape": true,
"model_interpretation": true,
"model_tag": true,
"model_publication": true,
"model_source_code": true,
"model_contributor": true,
"model_dockerhub_url": true,
"model_s3_url": true,
"model_docker_architecture": true
},
"model_file_checks": {
"dockerfile": true,
"metadata_json": true,
"model_framework_run_sh": true,
"src_service_py": true,
"pack_py": true,
"readme_md": true,
"license": true
},
"model_directory_sizes": {
"directory_size_mb": 1
},
"dependency_check": {
"dockerfile_check": true,
"check_details": "Dockerfile dependencies are valid."
},
"validation_and_size_check_results": {
"environment_size_mb": 475,
"check_single_input": true,
"check_predefined_example_input": true,
"check_consistency_of_model_output": true
},
"consistency_summary_between_ersilia_and_bash_execution_outputs": [],
"model_output_content_validation_summary": {
"str_csv": true,
"str_json": true,
"str_hdf5": true,
"list_csv": true,
"list_json": true,
"list_hdf5": true,
"csv_csv": true,
"csv_json": true,
"csv_hdf5": true
},
"computational_performance_summary": {
"pred_1": 9.86,
"pred_10": 9.93,
"pred_100": 10.01
}
} |
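To show why this flattened, boolean-valued layout is easier to consume, here is a small sketch of a downstream check over a report in the proposed format (the file name is hypothetical):

```python
import json

# Load a report written in the proposed format (hypothetical file name).
with open("eos9gg2_test_report.json") as fh:
    report = json.load(fh)

# Collect every boolean check that failed, across all sections.
failed = [
    f"{section}.{check}"
    for section, checks in report.items()
    if isinstance(checks, dict)
    for check, value in checks.items()
    if value is False
]
if failed:
    raise SystemExit(f"Failed checks: {failed}")

# Numeric fields are directly usable too, e.g. the slowest performance run.
print("Slowest run:", max(report["computational_performance_summary"].values()), "seconds")
```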
Noted @DhanshreeA. |
Updates
Here is a sample output for testing invalid contents. |
Summary
This issue will encompass efforts to reconcile, clean up, and enhance our test (and build) pipelines for individual models.
We currently have a test module and CLI command (ersilia test ...) that can check a given model for functionality, completeness, and correctness. In addition to this, we also have a testing playground: a test utility which checks a given model for functionality, completeness, and correctness, and is able to simulate running one or more models on a user's system.
The existing test in our model pipeline is redundant in the face of these functionalities, because it is naive in comparison: it only tests for nullity in model predictions and is not robust to how a model might serialize its outputs. Moreover, the Docker build pipelines are bloated with code that can be removed in favor of a single workflow that tests the built images. We also need to handle testing for ARM and AMD builds more smartly: currently we only test the AMD images, but recently we have seen some models build successfully for the ARM platform and then not actually work.
Furthermore, we need to revisit H5 serialization within Ersilia, and also include tests for this functionality at the level of testing models.
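For context, a minimal sketch of the kind of HDF5-level check this could involve, assuming an output file with an h5py dataset of prediction values (the dataset name used here is an assumption, not the confirmed Ersilia layout):

```python
import h5py
import numpy as np

# Hypothetical layout: a "values" dataset holding the model predictions.
with h5py.File("output.h5", "r") as fh:
    values = fh["values"][...]

# The current pipeline only checks nullity; a serialization-aware test could
# also assert that the file is readable and that not everything is null.
assert values.size > 0, "empty predictions"
assert not np.all(np.isnan(values.astype(float))), "all predictions are null"
```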
Each of the objectives below should be considered individual tasks, and should be addressed in separate PRs referencing this issue.
Objective(s)
- In the test-model.yml workflow, we should remove the current testing logic (L128-L144) in favor of only using the ersilia test command. We also want to upload the logs generated from this command, as well as its results, as artifacts with a retention period of 14 days.
- In the test-model-pr.yml workflow, we should likewise only use the ersilia test command. The same conditions apply for handling and uploading the logs and results as artifacts with a retention of 14 days.
- The upload-ersilia-pack.yml and upload-bentoml.yml workflows should only build and publish model images (both for ARM and AMD), i.e. we can remove the testing logic from these workflows. These images should be tagged dev.
- The Upload model to DockerHub workflow should utilise the Testing Playground utility from Ersilia and test the built model image (however it gets built, i.e. using Ersilia Pack or legacy approaches). This workflow should run on a matrix of ubuntu-latest and macos-latest, to ensure that we are also testing the ARM images. Based on the results of this workflow, we can tag the images latest and identify which architectures they successfully work on.
- The Post model upload workflow should run at the very end and update the necessary metadata stores (Airtable, S3 JSON) and the README. We can remove the step that creates testing issues for community members from this workflow.

Documentation
- ModelTester class used in the test CLI command: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/model-tester
- Testing Playground utility: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/testing-playground