🐕 Batch: Refactoring Test workflows in models #1484
Comments
Hi @Abellegese or @DhanshreeA, can you clarify whether the test command needs to be modified (according to point 1), or whether both the test command and the playground will be modified? |
Hey @GemmaTuron, our plan is to update both pipelines for this functionality. I am creating one issue for both. |
A few more details about the features have been given here: #1488. |
We have re-evaluated our strategy for Model Testing and this is what we have finally agreed on:
Optional flags to the test command:
Once the test command is refactored, the workflows on eos_template need to be modified. In broad terms:
Responsibilities
|
Thanks @GemmaTuron |
So far, work has been completed for 1) default testing, and 2) shallow testing with all combinations for fetching a model. @Abellegese is currently implementing the optional flag |
Thanks @DhanshreeA and @Abellegese. It would be very useful if you could provide here, for one model, an example of the commands that can be run in basic and --shallow mode and of their output with the different flags, so we can see if any edits need to be made before moving on. |
Okay @GemmaTuron we will give it here. |
Hi @GemmaTuron, here are the sample commands. Basic:
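For readers following along, a hedged sketch of how the basic invocation could be scripted; the model ID and the `-v` verbose flag are taken from elsewhere in this thread, not confirmed against the final CLI:

```python
import subprocess

# Basic test of a model; "-v" enables verbose mode as mentioned later in this
# thread. Normally this would just be run directly in the shell.
subprocess.run(["ersilia", "-v", "test", "eos9gg2"], check=True)
```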
|
Shallow Dockerhub
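And a similarly hedged sketch for the shallow check against the DockerHub image; the flag spellings (`--shallow`, `--from_dockerhub`) are assumptions based on the discussion above:

```python
import subprocess

# Shallow checks, fetching the model image from DockerHub (flag names assumed).
subprocess.run(
    ["ersilia", "-v", "test", "eos9gg2", "--shallow", "--from_dockerhub"],
    check=True,
)
```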
|
On a quick look, this is quite amazing. |
Hi @Abellegese some observations from running the test command locally:
{'Identifier': 'eos9gg2', 'Slug': 'chemical-space-projections-drugbank', 'Status': 'In progress', 'Title': 'Chemical space 2D projections against DrugBank', 'Description': 'This tool performs PCA, UMAP and tSNE projections taking the DrugBank chemical space as a reference. The Ersilia Compound Embeddings as used as descriptors. Four PCA components and two UMAP and tSNE components are returned.', 'Mode': 'In-house', 'Task': 'Representation', 'Input': 'Compound', 'Input Shape': 'Single', 'Output': 'Descriptor', 'Output Type': 'Float', 'Output Shape': 'List', 'Interpretation': 'Coordinates of 2D projections, namely PCA, UMAP and tSNE.', 'Tag': ['Embedding'], 'Publication': 'https://academic.oup.com/nar/article/52/D1/D1265/7416367', 'Source Code': 'https://github.com/ersilia-os/compound-embedding', 'License': 'GPL-3.0-or-later', 'S3': 'https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9gg2.zip'}
⠙ Performing shallow checks Validation and Size Check Results
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Docker Image Size │ 1134.70 MB │
├────────────────────┼─────────────────────────────────────────┤
│ Check Single Input │ ✔ PASSED │
├────────────────────┼─────────────────────────────────────────┤
│ Check Predefined │ ✔ PASSED │
│ Example Input │ │
├────────────────────┼─────────────────────────────────────────┤
│ Check Consistency │ ✔ PASSED │
│ of Model Output │ │
└────────────────────┴─────────────────────────────────────────┘
⠦ Performing shallow checks 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
Error message:
Failed to read CSV from /tmp/tmpjnz08pks/bash_output.csv.
If this error message is not helpful, open an issue at:
- https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
- hello[at]ersilia.io
If you haven't, try to run your command in verbose mode (-v in the CLI)
- You will find the console log file in: /home/dee/eos/current.log
Run process finished successfully.
|
Hi @DhanshreeA okay right, what was the command? |
@Abellegese updated my comment. |
Okay, I think the shallow checks are working but not being displayed for some reason? Here are the logs when I run the test command in verbose mode:
What is stranger is that when I run it without verbosity, the process simply appears to exit. |
@DhanshreeA from this log it seems nothing went wrong. |
Updates: eos9gg2 with JSON file report |
Updates
Note: I tested it extensively locally, but I can never know what will happen when you try it out, and I will take responsibility for that. |
@Abellegese that's the point. In verbose mode it was fine, but without the flag set, at least for eos3b5e, the process simply exited with nothing printed on the screen. However, with the recent push, I see some changes, but for some reason, Ersilia crashed with a |
I think this specifically failed in the CSV test because I do not see a |
I am testing the following models, and I will attach logs or share results from each model in a separate comment:
|
@Abellegese The S3 issue with eos7jio is understandable: the model was refactored last week, and since our workflows are not working properly, the refactored version which has |
Model eos3b5e
Notes:
|
@Abellegese we can format the output in this table so it displays better:
We simply need a newline character after each line in the |
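A minimal sketch of that newline idea, assuming the tables in this thread are rendered with the rich library (the field being joined here is only illustrative):

```python
from rich.console import Console
from rich.table import Table

# Joining the individual lines with "\n" makes rich render each one on its own
# row of the cell instead of a single very wide line.
details = "\n".join([
    "1 predictions executed in 9.86 seconds.",
    "10 predictions executed in 9.93 seconds.",
    "100 predictions executed in 10.01 seconds.",
])

table = Table(title="Validation and Size Check Results")
table.add_column("Check")
table.add_column("Status")
table.add_row("Computational Performance Tracking Details", details)
Console().print(table)
```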
@DhanshreeA yes nice idea. |
Another thing: when running deep checks, this is what I see:
We shouldn't see the line |
I am not sure I understand this table:
Especially the |
Note on {
"ModelInformationChecks": [
{
"Check": "Model ID",
"Status": "PASSED"
},
{
"Check": "Model Slug",
"Status": "PASSED"
},
{
"Check": "Model Status",
"Status": "PASSED"
},
{
"Check": "Model Title",
"Status": "PASSED"
},
{
"Check": "Model Description",
"Status": "PASSED"
},
{
"Check": "Model Task",
"Status": "PASSED"
},
{
"Check": "Model Input",
"Status": "PASSED"
},
{
"Check": "Model Input Shape",
"Status": "PASSED"
},
{
"Check": "Model Output",
"Status": "PASSED"
},
{
"Check": "Model Output Type",
"Status": "PASSED"
},
{
"Check": "Model Output Shape",
"Status": "PASSED"
},
{
"Check": "Model Interpretation",
"Status": "PASSED"
},
{
"Check": "Model Tag",
"Status": "PASSED"
},
{
"Check": "Model Publication",
"Status": "PASSED"
},
{
"Check": "Model Source Code",
"Status": "PASSED"
},
{
"Check": "Model Contributor",
"Status": "PASSED"
},
{
"Check": "Model Dockerhub URL",
"Status": "PASSED"
},
{
"Check": "Model S3 URL",
"Status": "PASSED"
},
{
"Check": "Model Docker Architecture",
"Status": "PASSED"
}
],
"ModelFileChecks": [
{
"Check": "File: Dockerfile",
"Status": "PASSED"
},
{
"Check": "File: metadata.json",
"Status": "PASSED"
},
{
"Check": "File: model/framework/run.sh",
"Status": "PASSED"
},
{
"Check": "File: src/service.py",
"Status": "PASSED"
},
{
"Check": "File: pack.py",
"Status": "PASSED"
},
{
"Check": "File: README.md",
"Status": "PASSED"
},
{
"Check": "File: LICENSE",
"Status": "PASSED"
}
],
"ModelDirectorySizes": [
{
"Check": "Directory",
"Size": "1MB"
}
],
"DependencyCheck": [
{
"Check": "Dockerfile Check",
"Status": "PASSED"
},
{
"Check": "Check Details",
"Status": "Dockerfile dependencies are valid."
}
],
"ValidationandSizeCheckResults": [
{
"Check": "Environment Size",
"Status": "475MB"
},
{
"Check": "Check Single Input",
"Status": "PASSED"
},
{
"Check": "Check Predefined Example Input",
"Status": "PASSED"
},
{
"Check": "Check Consistency of Model Output",
"Status": "PASSED"
}
],
"ConsistencySummaryBetweenErsiliaandBashExecutionOutputs": [],
"ModelOutputContentValidationSummary": [
{
"Check": "str : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "str : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "str : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "list : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : CSV",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : JSON",
"Detail": "Valid Content",
"Status": "PASSED"
},
{
"Check": "csv : HDF5",
"Detail": "Valid Content",
"Status": "PASSED"
}
],
"ComputationalPerformanceSummary": [
{
"Check": "Computational Performance Tracking",
"Status": "PASSED"
},
{
"Check": "Computational Performance Tracking Details",
"Status": "1 predictions executed in 9.86 seconds. 10 predictions executed in 9.93 seconds. 100 predictions executed in 10.01 seconds."
}
]
} This needs to be changed to something more machine-readable where it's straightforward to do |
Okay, so the next steps involve updating ersilia such that we can actually update the new metadata fields to Airtable. Mainly the This function utilizes the The updated fields include:
As per @miquelduranfrigola these fields have been created in the Airtable DB. |
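As a rough illustration of that Airtable update step (the client, base/table IDs, and field names below are all placeholders, not necessarily what Ersilia uses internally):

```python
from pyairtable import Api

# Placeholder credentials and identifiers; the real ones live in Ersilia's config.
api = Api("AIRTABLE_API_KEY")
table = api.table("appXXXXXXXXXXXXXX", "Models")

# Push newly added metadata fields for one model record (field names assumed).
table.update(
    "recXXXXXXXXXXXXXX",
    {
        "Docker Architecture": "AMD64",
        "Image Size": 1134.70,
        "Environment Size": 475,
    },
)
```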
Moreover, just for neatness, we should filter these warning logs from
|
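For the log noise, a minimal sketch of how such warnings could be filtered, assuming they pass through a standard-library logger (a loguru-based setup would need its own equivalent filter hook); the module prefix is a placeholder:

```python
import logging

class DropNoisyWarnings(logging.Filter):
    """Drop WARNING records emitted by a given (placeholder) noisy logger."""

    def __init__(self, noisy_prefix: str):
        super().__init__()
        self.noisy_prefix = noisy_prefix

    def filter(self, record: logging.LogRecord) -> bool:
        is_noisy = record.name.startswith(self.noisy_prefix)
        return not (is_noisy and record.levelno == logging.WARNING)

# Attach to whatever handlers are configured; "some.noisy.module" is hypothetical.
for handler in logging.getLogger().handlers:
    handler.addFilter(DropNoisyWarnings("some.noisy.module"))
```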
Let's update the JSON format to something like: {
"model_information_checks": {
"model_id": true,
"model_slug": true,
"model_status": true,
"model_title": true,
"model_description": true,
"model_task": true,
"model_input": true,
"model_input_shape": true,
"model_output": true,
"model_output_type": true,
"model_output_shape": true,
"model_interpretation": true,
"model_tag": true,
"model_publication": true,
"model_source_code": true,
"model_contributor": true,
"model_dockerhub_url": true,
"model_s3_url": true,
"model_docker_architecture": true
},
"model_file_checks": {
"dockerfile": true,
"metadata_json": true,
"model_framework_run_sh": true,
"src_service_py": true,
"pack_py": true,
"readme_md": true,
"license": true
},
"model_directory_sizes": {
"directory_size_mb": 1
},
"dependency_check": {
"dockerfile_check": true,
"check_details": "Dockerfile dependencies are valid."
},
"validation_and_size_check_results": {
"environment_size_mb": 475,
"check_single_input": true,
"check_predefined_example_input": true,
"check_consistency_of_model_output": true
},
"consistency_summary_between_ersilia_and_bash_execution_outputs": [],
"model_output_content_validation_summary": {
"str_csv": true,
"str_json": true,
"str_hdf5": true,
"list_csv": true,
"list_json": true,
"list_hdf5": true,
"csv_csv": true,
"csv_json": true,
"csv_hdf5": true
},
"computational_performance_summary": {
"pred_1": 9.86,
"pred_10": 9.93,
"pred_100": 10.01
}
} |
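To show why this flattened, boolean-valued layout is easier to consume, here is a small sketch of a downstream check over a report in the proposed format (the file name is hypothetical):

```python
import json

# Load a report written in the proposed format (hypothetical file name).
with open("eos9gg2_test_report.json") as fh:
    report = json.load(fh)

# Collect every boolean check that failed, across all sections.
failed = [
    f"{section}.{check}"
    for section, checks in report.items()
    if isinstance(checks, dict)
    for check, value in checks.items()
    if value is False
]
if failed:
    raise SystemExit(f"Failed checks: {failed}")

# Numeric fields are directly usable too, e.g. the slowest performance run.
print("Slowest run:", max(report["computational_performance_summary"].values()), "seconds")
```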
Noted @DhanshreeA. |
Updates
Here is a sample output for testing invalid contents. |
Summary
This issue will encompass efforts to reconcile, clean up, and enhance our test (and build) pipelines for individual models.
We currently have a test module and CLI command (ersilia test ...) that can check a given model for functionality, completeness, and correctness. In addition to this, we also have a testing playground: a test utility which checks a given model for functionality, completeness, and correctness, and is able to simulate running one or more models on a user's system.
The existing test in our model pipeline is redundant in the face of these functionalities, because it is naive in comparison: it only tests for nullity in model predictions and is not robust to how a model might serialize its outputs. Moreover, the Docker build pipelines are bloated with code that can be removed in favor of a single workflow that tests the built images. We also need to handle testing for ARM and AMD builds more smartly: currently we only test the AMD images, but recently we have seen some models build successfully for the ARM platform and then not actually work.
Furthermore, we need to revisit H5 serialization within Ersilia, and also include tests for this functionality at the level of testing models.
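For context, a minimal sketch of the kind of HDF5-level check this could involve, assuming an output file with an h5py dataset of prediction values (the dataset name used here is an assumption, not the confirmed Ersilia layout):

```python
import h5py
import numpy as np

# Hypothetical layout: a "values" dataset holding the model predictions.
with h5py.File("output.h5", "r") as fh:
    values = fh["values"][...]

# The current pipeline only checks nullity; a serialization-aware test could
# also assert that the file is readable and that not everything is null.
assert values.size > 0, "empty predictions"
assert not np.all(np.isnan(values.astype(float))), "all predictions are null"
```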
Each of the objectives below should be considered individual tasks, and should be addressed in separate PRs referencing this issue.
Objective(s)
- In the test-model.yml workflow, we should remove the current testing logic (L128-L144) in favor of only using the ersilia test command. We also want to upload the logs generated from this command, as well as its results, as artifacts with a retention period of 14 days.
- In the test-model-pr.yml workflow, we should likewise only use the ersilia test command. The same conditions apply for handling and uploading the logs and results as artifacts with a retention of 14 days.
- The upload-ersilia-pack.yml and upload-bentoml.yml workflows should only build and publish model images (both for ARM and AMD), i.e. we can remove the testing logic from these workflows. These images should be tagged dev.
- The Upload model to DockerHub workflow should utilise the Testing Playground utility from Ersilia and test the built model image (however it gets built, i.e. using Ersilia Pack or legacy approaches). This workflow should run on a matrix of ubuntu-latest and macos-latest, to ensure that we are also testing the ARM images. Based on the results of this workflow, we can tag the images latest and identify which architectures they successfully work on.
- The Post model upload workflow should run at the very end and update the necessary metadata stores (Airtable, S3 JSON) and the README. We can remove the step that creates testing issues for community members from this workflow.

Documentation
- ModelTester class used in the test CLI command: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/model-tester
- Testing Playground utility: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/testing-playground