
🐕 Batch: Refactoring Test workflows in models #1484

Open · 1 of 7 tasks
DhanshreeA opened this issue Jan 3, 2025 · 34 comments · May be fixed by #1515
DhanshreeA commented Jan 3, 2025

Summary

This issue will encompass efforts to reconcile, clean up, and enhance our test (and build) pipelines for individual models.

We currently have a test module and CLI command (ersilia test ...) that can check a given model for functionality, completeness, and correctness. In addition to this, we also have a testing playground - a test utility which checks a given model for functionality, completeness, and correctness; and is able to simulate running one or more models on a user's system.

The existing test step in our model pipeline is largely redundant given these functionalities: it is naive in comparison, since it only tests for nullity in model predictions, and it is not robust to how a model might serialize its outputs. Moreover, the Docker build pipelines are bloated with code that can be removed in favor of a single workflow that tests the built images. We also need to handle testing for ARM and AMD builds more intelligently: currently we only test the AMD images, but recently some models have built successfully for the ARM platform and then not actually worked.

Furthermore, we need to revisit H5 serialization within Ersilia, and also include tests for this functionality at the level of testing models.

Each of the objectives below should be considered individual tasks, and should be addressed in separate PRs referencing this issue.

Objective(s)

  • Consolidate the following input-output combinations in the testing scenarios covered by the ersilia test command (see the sketch after this list):
  1. Input = CSV - Output = CSV
  2. Input = CSV - Output = HDF5
  3. Input = CSV - Output = JSON
  4. Input = SMILES - Output = CSV
  5. Input = SMILES - Output = HDF5
  6. Input = SMILES - Output = JSON
  • For the test-model.yml workflow, we should remove the current testing logic (L128-L144) in favor of only using the ersilia test command. We also want to upload the logs generated by this command, as well as its results, as artifacts with a retention period of 14 days.
  • Likewise, in the test-model-pr.yml workflow, we should keep only the ersilia test command. The same conditions apply for handling and uploading the logs and results as artifacts with a retention of 14 days.
  • Refactor the upload-ersilia-pack.yml and upload-bentoml.yml workflows to only build and publish model images (both for ARM and AMD), i.e. we can remove the testing logic from these workflows. These images should be tagged dev.
  • Refactor the testing playground to work with specific model ids, as well as image tags.
  • Create a new test workflow for Docker builds that is triggered after the Upload model to DockerHub workflow. This workflow should utilise the Testing Playground utility from Ersilia and test the built model image (however it gets built, i.e. using Ersilia Pack or legacy approaches). This workflow should run on a matrix of ubuntu-latest and macos-latest, to ensure that we are also testing the ARM images. Based on the result of this workflow, we can tag the images latest and identify which architectures they successfully work on.
  • The Post model upload workflow should run at the very end and update the necessary metadata stores (Airtable, S3 JSON) and the README. We can remove the step that creates testing issues for community members from this workflow.
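As a reference for the first objective, here is a minimal sketch (in Python, shelling out to the ersilia CLI) of how the six input-output combinations could be exercised against a served model; the model id and the input file names are illustrative:

```python
import itertools
import subprocess

MODEL_ID = "eos3b5e"  # illustrative model
INPUTS = {"csv": "input.csv", "smiles": "input.smi"}  # hypothetical input files
OUTPUT_EXTS = ["csv", "h5", "json"]  # output format is inferred from the extension

# Serve the model once, then run each of the six input/output combinations.
subprocess.run(["ersilia", "serve", MODEL_ID], check=True)
for (in_kind, in_file), ext in itertools.product(INPUTS.items(), OUTPUT_EXTS):
    out_file = f"output_{in_kind}.{ext}"
    subprocess.run(["ersilia", "run", "-i", in_file, "-o", out_file], check=True)
subprocess.run(["ersilia", "close"], check=True)
```

Because the output serialization follows the extension of the `-o` file, cycling the extension is enough to cover the CSV, HDF5, and JSON cases.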

Documentation

  1. ModelTester class used in the test CLI command: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/model-tester
  2. Testing Playground utility: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/testing-playground
@DhanshreeA removed the status in Ersilia Model Hub Jan 3, 2025
@Abellegese self-assigned this Jan 3, 2025
@GemmaTuron changed the title from 🐕 Batch: Refactoring Test workfllows in models to 🐕 Batch: Refactoring Test workflows in models Jan 3, 2025
@GemmaTuron (Member) commented:

Hi @Abellegese or @DhanshreeA,

  • Can you clarify whether the test command needs to be modified (according to point 1), or whether both the test command and the playground will be modified?
  • The test command only tests the model from source, right? And the only modification we will currently make to it is to test all the different combinations of input and output, which was not happening before? Once an output is generated, whatever the format, the next step is to check that the output has the required length, is not None, etc.?
  • What are the modifications to make in the testing playground, more specifically? Maybe opening one issue with more details for each task would be helpful as those get tackled.
  • I would also add that documenting in GitBook is an important part of each task.

@Abellegese (Contributor) commented:

Hey @GemmaTuron, our plan is to update both pipelines for this functionality. I am creating one issue covering both.

@Abellegese (Contributor) commented:

A few more details about the features have been given in #1488.


GemmaTuron commented Jan 9, 2025

We have re-evaluated our strategy for Model Testing and this is what we have finally agreed on:
The Model Testing happens through workflows in two repositories: eos-template and ersilia-maintenance. All those workflows should simply use different flavours of the ersilia test command; the playground will be reserved for testing the Ersilia CLI itself.
The test command will work at three levels. Each level can be run on either macOS or Linux; for consistency we want to test models on both platforms.

  1. Basic: ersilia test [model_id] --from_dir/from_github/from_s3
    • The default test will not fetch the model through Ersilia, only download it locally unless it already exists (from_dir).
    • When using the --from_dir flag, the user needs to pass the path to the local directory. When using from_github/from_s3, the model will automatically be downloaded into the following directory: eos/tmp/[model_id]
    • This command is designed to perform high-level checks, and can be run, for example, as part of ersilia-maintenance. The high-level checks include:
      • File integrity
      • All metadata fields comply with the set rules (for metadata that is only added upon first model incorporation, namely Contributor, S3, DockerHub and Docker Architecture, the test is only performed if these fields are available in the .json or .yml files)
      • URLs (model repository, DockerHub if existing, S3 if existing) are checked
      • Dependencies pinned: all dependencies have a version specified in the .yml file or Dockerfile
      • Model size (total size of all folders, incl. checkpoints). This information will be saved as metadata
  2. Shallow: ersilia test [model_id] --shallow --from_dir/from_github/from_s3/from_dockerhub
    • First the model is downloaded and the basic tests are performed. If the model is indicated --from_dockerhub, it will be downloaded from_github into the eos/tmp directory.
    • Next the model is fetched and served through Ersilia's CLI, so that:
      • If ersilia test [model_id] --shallow --from_dockerhub, what will actually happen is: model test --from_github + model fetch --from_dockerhub
      • If ersilia test [model_id] --shallow --from_dir/from_github/from_s3, what will actually happen is: model test --from_dir/from_github/from_s3 + model fetch --from_dir [path_to_dir]/[path_to_tmp]
    • The environment size is calculated (if from_dir/from_github/from_s3) and saved as metadata
    • The container size is calculated (if from_dockerhub) and saved as metadata
    • Output correctness:
      • All formats: .json, .csv, .h5
      • Consistency between runs
      • No nulls
  3. Deep: ersilia test [model_id] --deep --from_dir/from_github/from_s3/from_dockerhub
    • All tests in basic and shallow are performed the same way as in --shallow, but in addition the computational performance for 1, 50, 100 inputs is measured and stored as metadata (to decide: maybe just one measurement, 100, needs to be stored?)

Optional flags to the test command:

  • as_json: save the output of the test command as an easily parsable .json file
  • version: image tag for the Docker image. Default: dev
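For reference, a minimal sketch of how these flags could be declared, assuming the test command is implemented with click (the function body and help texts are illustrative):

```python
import click

@click.command()
@click.argument("model_id")
@click.option("--shallow", is_flag=True, help="Run basic plus shallow checks.")
@click.option("--deep", is_flag=True, help="Run basic, shallow and performance checks.")
@click.option("--as_json", is_flag=True, help="Save the test report as a parsable .json file.")
@click.option("--version", default="dev", help="Docker image tag to test (default: dev).")
def test(model_id, shallow, deep, as_json, version):
    """Hypothetical entry point mirroring the flags described above."""
    click.echo(f"Testing {model_id} (tag={version}, shallow={shallow}, deep={deep})")
```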

Once the test command is refactored, the workflows on eos-template need to be modified. In general terms:

  • model-test-on-pr.yml will use model test --shallow --from_dir
  • model-test-source.yml will use model test --shallow --from_github
  • model-test-image.yml will use model test --deep --from_dockerhub

Because some metadata fields will be updated, Airtable will also need to be updated. We need to agree on which fields will be created to prepare the columns on the Airtable board. @miquelduranfrigola can you take care of the naming of these?

Responsibilities

  • @Abellegese will refactor the test command
  • @DhanshreeA will consolidate the workflows (please revise what we have said to make sure everything makes sense)
  • @miquelduranfrigola will oversee the general dev and take care of new metadata fields


miquelduranfrigola commented Jan 10, 2025

Thanks @GemmaTuron.
Yes, I will summarize for all of you how the new and old AirTable fields are called.
I will let you know as soon as I have made sufficient progress.


DhanshreeA commented Jan 13, 2025

So far, work has been completed for 1) default testing, and 2) shallow testing with all combinations for fetching a model. @Abellegese is currently implementing the optional as_json flag (version is already implemented).

@GemmaTuron (Member) commented:

Thanks @DhanshreeA and @Abellegese. It would be very useful if you could provide here an example, with one model, of the commands that can be run at the basic and --shallow levels, and of their output with the different flags, so we can see whether any edits need to be made before moving on.

@Abellegese (Contributor) commented:

Okay @GemmaTuron, we will post it here.

@Abellegese (Contributor) commented:

Hi @GemmaTuron, here are the sample commands.

Basic

  • ersilia test eos3b5e --from_dir/--from_s3/--from_github

[screenshot: basic_check]


Abellegese commented Jan 13, 2025

Shallow Dockerhub

ersilia test eos3b5e --shallow --from_dockerhub

Edited: there were two mistakes in table titles, which have been corrected:

  1. From Shallow Check Summary to Validation and Size Check Summary
  2. From Model Output Content Validation Summary to Consistency Summary Between Ersilia and Bash Execution Outputs

[screenshot: shallow]


Abellegese commented Jan 13, 2025

Shallow (from_dir/from_github/from_s3): Env Size is reported instead of Docker Image size in this case

Edited: there was a mistake in one table title, which has been corrected:

  1. From Shallow Check Summary to Validation and Size Check Summary

ersilia test eos3b5e --shallow --from_dir/--from_github/--from_s3

[screenshot: git]


Abellegese commented Jan 13, 2025

Deep:

ersilia test eos3b5e --deep --from_dir/--from_github/--from_s3

[screenshot: comp]

@miquelduranfrigola (Member) commented:

On a quick look, this is quite amazing.


DhanshreeA commented Jan 14, 2025

Hi @Abellegese some observations from running the test command locally:

  • For the model eos9gg2, I see the Model Input and Output Type tests failing even though those fields are present in the metadata, as you can also see in the logs. This is when running ersilia test eos9gg2 --from_s3/--from_github:
{'Identifier': 'eos9gg2', 'Slug': 'chemical-space-projections-drugbank', 'Status': 'In progress', 'Title': 'Chemical space 2D projections against DrugBank', 'Description': 'This tool performs PCA, UMAP and tSNE projections taking the DrugBank chemical space as a reference. The Ersilia Compound Embeddings as used as descriptors. Four PCA components and two UMAP and tSNE components are returned.', 'Mode': 'In-house', 'Task': 'Representation', 'Input': 'Compound', 'Input Shape': 'Single', 'Output': 'Descriptor', 'Output Type': 'Float', 'Output Shape': 'List', 'Interpretation': 'Coordinates of 2D projections, namely PCA, UMAP and tSNE.', 'Tag': ['Embedding'], 'Publication': 'https://academic.oup.com/nar/article/52/D1/D1265/7416367', 'Source Code': 'https://github.com/ersilia-os/compound-embedding', 'License': 'GPL-3.0-or-later', 'S3': 'https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9gg2.zip'}
  • Personal note for me: the DockerHub Architecture field will always fail when fetching the model from S3, because the field will be unset since the DockerHub build happens after the S3 upload. (We can change this order.)
  • The --shallow flag is not running the tests it is supposed to run, and this is happening with all the modes. I ran ersilia test eos3b5e --shallow --from_github/--from_dockerhub/--from_s3 and only see the same output as in the default behavior of the test command.
  • The --shallow flag works with the model eos9gg2 in the DockerHub configuration, i.e. ersilia test eos9gg2 --shallow --from_dockerhub, but exits with the following error:
Performing shallow checks                Validation and Size Check Results                
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check              ┃                                  Status ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Docker Image Size  │                              1134.70 MB │
├────────────────────┼─────────────────────────────────────────┤
│ Check Single Input │                                ✔ PASSED │
├────────────────────┼─────────────────────────────────────────┤
│ Check Predefined   │                                ✔ PASSED │
│ Example Input      │                                         │
├────────────────────┼─────────────────────────────────────────┤
│ Check Consistency  │                                ✔ PASSED │
│ of Model Output    │                                         │
└────────────────────┴─────────────────────────────────────────┘
⠦ Performing shallow checks 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

Failed to read CSV from /tmp/tmpjnz08pks/bash_output.csv.
If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/dee/eos/current.log
Run process finished successfully.
  • For the GitHub, S3, or Dir options, the shallow flag seems to be doing nothing for the model eos9gg2. I see the following output on the terminal:
Performing shallow checks
⠙ Performing shallow checks Run process finished successfully.

@Abellegese (Contributor) commented:

Hi @DhanshreeA okay right, what was the command?

@DhanshreeA (Member, Author) commented:

> Hi @DhanshreeA okay right, what was the command?

@Abellegese updated my comment.

@DhanshreeA (Member, Author) commented:

Okay, I think the shallow checks are working but not being displayed for some reason? Here are the logs when I run the test command in verbose mode:

ersilia -v test eos3b5e --shallow --from_dockerhub:

eos3b5e_test.log

What's more weird is that when I run it without verbosity, the process simply appears to exit.

@Abellegese (Contributor) commented:

@DhanshreeA from this log it seems nothing went wrong.


Abellegese commented Jan 14, 2025

Updates: eos9gg2 with JSON file report

[screenshot]

eos9gg2-test.json

@Abellegese (Contributor) commented:

Updates

  • I pushed what I believe is the final refactoring, which has everything specified in the task.
  • I kept the as_json flag as a bool; the report is saved in the pwd, named `eosxxxx-test.json`.
  • I tested eos7jio from docker/from git/from s3, but the S3 version does not have `run.sh` in the `model/framework` folder. This folder does not contain anything except a README file. We can't run --from_s3 in this case.
  • The command can update either of the metadata files in --deep --from_dir/--from_github/--from_s3 mode, and it can save Env Size and CP (for three runs).
  • For --deep --from_dockerhub it saves the Image Size and CP in the metadata.

Note: I tested it locally extensively, but you never know what will happen when you try it out, and I will take responsibility for that.


DhanshreeA commented Jan 15, 2025

> @DhanshreeA from this log it seems nothing went wrong.

@Abellegese that's the point. In verbose mode it was fine, but without the flag set, at least for eos3b5e, the process simply exited with nothing printed to the screen.

However, with the recent push, I see some changes, but for some reason, Ersilia crashed with a Subprocess execution failed error. Attaching logs:

           Model Information Checks            
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Check                          ┃     Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Model ID                       │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Slug                     │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Status                   │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Title                    │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Description              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Task                     │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Input                    │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Input Shape              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Output                   │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Output Type              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Output Shape             │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Interpretation           │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Tag                      │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Publication              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Source Code              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Contributor              │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model Dockerhub URL            │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ Model S3 URL                   │   ✔ PASSED │
└────────────────────────────────┴────────────┘
               Model File Checks               
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Check                          ┃     Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ File: Dockerfile               │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: metadata.json            │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: model/framework/run.sh   │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: src/service.py           │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: pack.py                  │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: README.md                │   ✔ PASSED │
├────────────────────────────────┼────────────┤
│ File: LICENSE                  │   ✔ PASSED │
└────────────────────────────────┴────────────┘
             Model Directory Sizes             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Check                          ┃       Size ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Directory                      │        1MB │
└────────────────────────────────┴────────────┘
                                   Dependency Check                                    
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                          ┃                                             Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Dockerfile Check               │                                           ✔ PASSED │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Details                  │                 Dockerfile dependencies are valid. │
└────────────────────────────────┴────────────────────────────────────────────────────┘
* Basic checks completed!
Performing shallow checks...
⠦ Performing shallow checks No predefined examples found for the model. Generating random examples.
⠇ Performing shallow checks                            Validation and Size Check Results                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                          ┃                                             Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Image Size                     │                                          353.22 MB │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Single Input             │                                           ✔ PASSED │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Predefined Example Input │                                           ✔ PASSED │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Consistency of Model     │                                           ✔ PASSED │
│ Output                         │                                                    │
└────────────────────────────────┴────────────────────────────────────────────────────┘
⠙ Performing shallow checks 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

Subprocess execution failed.
If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/dee/eos/current.log
Run process completed.

I think this specifically failed in the CSV test, because I do not see a file.csv having been created; however, I do see file.h5 and file.json.


DhanshreeA commented Jan 15, 2025

I am testing the following models, and I will attach logs or share results from each model in a separate comment:

  • eos3b5e
  • eos4e40
  • eos7d58
  • eos9gg2
  • eos3cf4
  • eos7w6n
  • eos4wt0
  • eos2gw4
  • eos7jio
  • eos5axz
  • eos4u6p

@DhanshreeA (Member, Author) commented:

> I tested eos7jio from docker/from git/from s3, but the S3 version does not have `run.sh` in the `model/framework` folder. [...]

@Abellegese The S3 issue with eos7jio is understandable: the model was refactored last week, and since our workflows are not working properly, the refactored version, which has run.sh, could not get uploaded to S3.


DhanshreeA commented Jan 15, 2025

Model eos3b5e

  • ersilia test eos3b5e --from_github
  • ersilia test eos3b5e --from_s3
  • ersilia test eos3b5e --from_dir
  • ersilia test eos3b5e --from_dockerhub
  • ersilia test eos3b5e --from_github --as_json
  • ersilia test eos3b5e --from_s3 --as_json
  • ersilia test eos3b5e --from_dir --as_json
  • ersilia test eos3b5e --from_dockerhub --as_json
  • ersilia test eos3b5e --from_dockerhub --version
  • ersilia test eos3b5e --from_dockerhub --version dev --as_json
  • ersilia test eos3b5e --shallow --from_github
  • ersilia test eos3b5e --shallow --from_s3
  • ersilia test eos3b5e --shallow --from_dir
  • ersilia test eos3b5e --shallow --from_dockerhub
  • ersilia test eos3b5e --shallow --from_dockerhub --version
  • ersilia test eos3b5e --shallow --from_github --as_json
  • ersilia test eos3b5e --shallow --from_s3 --as_json
  • ersilia test eos3b5e --shallow --from_dir --as_json
  • ersilia test eos3b5e --shallow --from_dockerhub --as_json
  • ersilia test eos3b5e --shallow --from_dockerhub --version
  • ersilia test eos3b5e --deep --from_github
  • ersilia test eos3b5e --deep --from_s3
  • ersilia test eos3b5e --deep --from_dir
  • ersilia test eos3b5e --deep --from_dockerhub
  • ersilia test eos3b5e --deep --from_dockerhub --version

Notes:

  • The as_json flag seems to have no effect, as I do not see any JSON file being serialized.
  • Again, without the -v flag, the shallow checks don't display on the terminal, and I see the following logs:
* Basic checks completed!
Performing shallow checks...
⠸ Performing shallow checks Run process completed.
  • I think I figured out the issue with shallow exiting without printing anything: when it runs into an error, it exits silently instead of printing anything, and the error is only surfaced in verbose mode.
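If that is the case, a small wrapper along these lines (names hypothetical) would surface the failure on the terminal regardless of verbosity:

```python
import sys
import traceback

def run_checks(checks):
    # Hypothetical wrapper: print the traceback to stderr and exit non-zero
    # instead of swallowing the exception when verbose mode is off.
    try:
        for check in checks:
            check()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)
```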

@DhanshreeA (Member, Author) commented:

@Abellegese we can format the output in this table so it displays better:

                           Computational Performance Summary                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                          ┃                                             Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Computational Performance      │                                           ✔ PASSED │
│ Tracking                       │                                                    │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Computational Performance      │         1 predictions executed in 9.88 seconds. 10 │
│ Tracking Details               │          predictions executed in 9.92 seconds. 100 │
│                                │             predictions executed in 10.01 seconds. │
└────────────────────────────────┴────────────────────────────────────────────────────┘

We simply need a newline character after each sentence in the Computational Performance Tracking Details, for example:
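A sketch of the idea (the helper name is hypothetical):

```python
def format_performance_details(details: str) -> str:
    # Break the sentence-per-measurement string onto separate lines so the
    # table cell renders one measurement per line.
    return details.replace(". ", ".\n")

print(format_performance_details(
    "1 predictions executed in 9.88 seconds. "
    "10 predictions executed in 9.92 seconds. "
    "100 predictions executed in 10.01 seconds."
))
```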

@Abellegese (Contributor) commented:

@DhanshreeA yes nice idea.

@DhanshreeA (Member, Author) commented:

Another thing, when running deep checks, this is what I see:

* Basic checks completed!
Performing deep checks...
⠇ Performing shallow checks

We shouldn't see the line Performing deep checks until the checks have actually started.

@DhanshreeA (Member, Author) commented:

I am not sure I understand this table:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                          ┃                                             Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Environment Size               │                                              475MB │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Single Input             │                                           ✔ PASSED │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Predefined Example Input │                                           ✔ PASSED │
├────────────────────────────────┼────────────────────────────────────────────────────┤
│ Check Consistency of Model     │                                           ✔ PASSED │
│ Output                         │                                                    │
└────────────────────────────────┴────────────────────────────────────────────────────┘

Especially the Check Single Input and Check Predefined Example Input fields; these come from the shallow checks.

@DhanshreeA (Member, Author) commented:

Note on --as_json:

{
    "ModelInformationChecks": [
        {
            "Check": "Model ID",
            "Status": "PASSED"
        },
        {
            "Check": "Model Slug",
            "Status": "PASSED"
        },
        {
            "Check": "Model Status",
            "Status": "PASSED"
        },
        {
            "Check": "Model Title",
            "Status": "PASSED"
        },
        {
            "Check": "Model Description",
            "Status": "PASSED"
        },
        {
            "Check": "Model Task",
            "Status": "PASSED"
        },
        {
            "Check": "Model Input",
            "Status": "PASSED"
        },
        {
            "Check": "Model Input Shape",
            "Status": "PASSED"
        },
        {
            "Check": "Model Output",
            "Status": "PASSED"
        },
        {
            "Check": "Model Output Type",
            "Status": "PASSED"
        },
        {
            "Check": "Model Output Shape",
            "Status": "PASSED"
        },
        {
            "Check": "Model Interpretation",
            "Status": "PASSED"
        },
        {
            "Check": "Model Tag",
            "Status": "PASSED"
        },
        {
            "Check": "Model Publication",
            "Status": "PASSED"
        },
        {
            "Check": "Model Source Code",
            "Status": "PASSED"
        },
        {
            "Check": "Model Contributor",
            "Status": "PASSED"
        },
        {
            "Check": "Model Dockerhub URL",
            "Status": "PASSED"
        },
        {
            "Check": "Model S3 URL",
            "Status": "PASSED"
        },
        {
            "Check": "Model Docker Architecture",
            "Status": "PASSED"
        }
    ],
    "ModelFileChecks": [
        {
            "Check": "File: Dockerfile",
            "Status": "PASSED"
        },
        {
            "Check": "File: metadata.json",
            "Status": "PASSED"
        },
        {
            "Check": "File: model/framework/run.sh",
            "Status": "PASSED"
        },
        {
            "Check": "File: src/service.py",
            "Status": "PASSED"
        },
        {
            "Check": "File: pack.py",
            "Status": "PASSED"
        },
        {
            "Check": "File: README.md",
            "Status": "PASSED"
        },
        {
            "Check": "File: LICENSE",
            "Status": "PASSED"
        }
    ],
    "ModelDirectorySizes": [
        {
            "Check": "Directory",
            "Size": "1MB"
        }
    ],
    "DependencyCheck": [
        {
            "Check": "Dockerfile Check",
            "Status": "PASSED"
        },
        {
            "Check": "Check Details",
            "Status": "Dockerfile dependencies are valid."
        }
    ],
    "ValidationandSizeCheckResults": [
        {
            "Check": "Environment Size",
            "Status": "475MB"
        },
        {
            "Check": "Check Single Input",
            "Status": "PASSED"
        },
        {
            "Check": "Check Predefined Example Input",
            "Status": "PASSED"
        },
        {
            "Check": "Check Consistency of Model Output",
            "Status": "PASSED"
        }
    ],
    "ConsistencySummaryBetweenErsiliaandBashExecutionOutputs": [],
    "ModelOutputContentValidationSummary": [
        {
            "Check": "str : CSV",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "str : JSON",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "str : HDF5",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "list : CSV",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "list : JSON",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "list : HDF5",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "csv : CSV",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "csv : JSON",
            "Detail": "Valid Content",
            "Status": "PASSED"
        },
        {
            "Check": "csv : HDF5",
            "Detail": "Valid Content",
            "Status": "PASSED"
        }
    ],
    "ComputationalPerformanceSummary": [
        {
            "Check": "Computational Performance Tracking",
            "Status": "PASSED"
        },
        {
            "Check": "Computational Performance Tracking Details",
            "Status": "1 predictions executed in 9.86 seconds. 10 predictions executed in 9.93 seconds. 100 predictions executed in 10.01 seconds."
        }
    ]
}

This needs to be changed to something more machine readable, where it's straightforward to do json[key] and get the value. Also, the file is not being saved in the PWD; it's being saved in PWD/.. (i.e. the directory above it). This needs to change.
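On the save-location point, the fix is presumably as simple as resolving the report path against the current working directory (a sketch, names hypothetical):

```python
import json
from pathlib import Path

def save_report(model_id: str, report: dict) -> Path:
    # Write eosxxxx-test.json into the directory the command was run from,
    # not its parent.
    path = Path.cwd() / f"{model_id}-test.json"
    path.write_text(json.dumps(report, indent=4))
    return path
```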

@DhanshreeA (Member, Author) commented:

Okay, so the next steps involve updating ersilia so that we can actually push the new metadata fields to Airtable. Mainly, the airtableops.py script takes care of that: the update_metadata_to_airtable function reads the metadata file from the repo and then uses it to update the corresponding fields for a model in Airtable.

This function utilizes the RepoMetadataFile class, which in turn uses the BaseInformation class to read these fields. This class is mainly what we want to update with the new fields. @Abellegese should open a PR to make this change, and also add a unit test for it. The test fixture should have an ideal metadata.json file and, similarly, a metadata.yaml file with these new fields, and RepoMetadataFile should be able to read from these files correctly.

The updated fields include:

  1. Docker Pack Method
  2. Environment Size
  3. Image Size
  4. Computational Performance 1
  5. Computational Performance 10
  6. Computational Performance 100

As per @miquelduranfrigola these fields have been created in the Airtable DB.
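A minimal pytest sketch for the fixture side of that test; the field names come from the list above, while the fixture path and any wiring into RepoMetadataFile are assumptions:

```python
import json
from pathlib import Path

NEW_FIELDS = [
    "Docker Pack Method",
    "Environment Size",
    "Image Size",
    "Computational Performance 1",
    "Computational Performance 10",
    "Computational Performance 100",
]

def test_metadata_fixture_has_new_fields():
    # Hypothetical fixture path; the real test would feed this file to
    # RepoMetadataFile / BaseInformation and assert the fields round-trip.
    metadata = json.loads(Path("tests/fixtures/metadata.json").read_text())
    for field in NEW_FIELDS:
        assert field in metadata, f"missing new metadata field: {field}"
```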

@DhanshreeA (Member, Author) commented:

Moreover, just for neatness, we should filter these warning logs from fuzzywuzzy:

/Users/mduranfrigola/miniconda3/envs/ersilia/lib/python3.11/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
 warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
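One way to do that (besides installing python-Levenshtein) is to register a warnings filter before fuzzywuzzy is imported:

```python
import warnings

# The warning is emitted at import time, so the filter must come first.
warnings.filterwarnings(
    "ignore",
    message="Using slow pure-python SequenceMatcher",
    category=UserWarning,
)
from fuzzywuzzy import fuzz  # noqa: E402
```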

@DhanshreeA (Member, Author) commented:

Let's update the JSON format to something like:

{
    "model_information_checks": {
        "model_id": true,
        "model_slug": true,
        "model_status": true,
        "model_title": true,
        "model_description": true,
        "model_task": true,
        "model_input": true,
        "model_input_shape": true,
        "model_output": true,
        "model_output_type": true,
        "model_output_shape": true,
        "model_interpretation": true,
        "model_tag": true,
        "model_publication": true,
        "model_source_code": true,
        "model_contributor": true,
        "model_dockerhub_url": true,
        "model_s3_url": true,
        "model_docker_architecture": true
    },
    "model_file_checks": {
        "dockerfile": true,
        "metadata_json": true,
        "model_framework_run_sh": true,
        "src_service_py": true,
        "pack_py": true,
        "readme_md": true,
        "license": true
    },
    "model_directory_sizes": {
        "directory_size_mb": 1
    },
    "dependency_check": {
        "dockerfile_check": true,
        "check_details": "Dockerfile dependencies are valid."
    },
    "validation_and_size_check_results": {
        "environment_size_mb": 475,
        "check_single_input": true,
        "check_predefined_example_input": true,
        "check_consistency_of_model_output": true
    },
    "consistency_summary_between_ersilia_and_bash_execution_outputs": [],
    "model_output_content_validation_summary": {
        "str_csv": true,
        "str_json": true,
        "str_hdf5": true,
        "list_csv": true,
        "list_json": true,
        "list_hdf5": true,
        "csv_csv": true,
        "csv_json": true,
        "csv_hdf5": true
    },
    "computational_performance_summary": {
            "pred_1": 9.86,
            "pred_10": 9.93,
            "pred_100": 10.01
        }
}
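With a flat structure like this, consumers can index straight into the report, e.g. (assuming a report saved as eos3b5e-test.json in the working directory):

```python
import json

with open("eos3b5e-test.json") as f:
    report = json.load(f)

# Direct json[key] access, as requested above:
assert report["model_file_checks"]["dockerfile"] is True
env_size_mb = report["validation_and_size_check_results"]["environment_size_mb"]
pred_100_s = report["computational_performance_summary"]["pred_100"]
print(env_size_mb, pred_100_s)
```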

@Abellegese (Contributor) commented:

Noted @DhanshreeA.

@Abellegese (Contributor) commented:

Updates

  1. I refactored the eos3b5e model locally to make it return invalid values for several data-structure outputs, such as Single, List, Flexible List, Matrix, and Serializable Object.
  2. I added detailed traceback-based error tracking, to be able to display errors without verbose mode.
  3. I updated the way the JSON report is created: it can now save the runtime error, and the code moved to the finally clause of the try-except block, so the result JSON file is created even in the failure case, with the detailed runtime error in it (roughly the pattern sketched below).
  4. The file-content checks for combinations of input and output were implemented in the check_model_output_content function, but now the other checks in the model tester that execute the run command also validate their output content.
  5. Performed code cleanup.
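For point 3, the shape of that pattern is roughly (a sketch, names hypothetical):

```python
import json
import traceback

def run_test(model_id: str, checks: dict) -> None:
    report = {"model_id": model_id, "runtime_error": None}
    try:
        for name, check in checks.items():
            report[name] = check()
    except Exception:
        report["runtime_error"] = traceback.format_exc()
        raise
    finally:
        # The report file is written even when a check raises, so the
        # runtime error ends up in the JSON as well.
        with open(f"{model_id}-test.json", "w") as f:
            json.dump(report, f, indent=4)
```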

Here is sample output from testing invalid contents:

[screenshots of the invalid-content test output]

@Abellegese linked a pull request Jan 16, 2025 that will close this issue