Sanitize input on Python benchmarks #118

Open
mcopik opened this issue Feb 20, 2023 · 18 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@mcopik
Collaborator

mcopik commented Feb 20, 2023

All Python benchmarks require JSON as input. However, we never verify that the provided input matches expectations. When users invoke benchmarks incorrectly, they should receive a helpful notification. Right now, they usually receive a Python exception where a variable loaded from JSON ends up being None.

All benchmarks should verify that the input values are provided in the JSON and are of the correct type, and return a clear error message upon failure.

@nbcl

nbcl commented Feb 23, 2023

Hello @mcopik!

I wanted to check in regarding the design of the sanitization. I believe I have already handled the main API cases (i.e., empty or undecodable JSON data), and I am looking to continue adding sanitization for benchmark data keys in a granular manner.

As an example, I used 010.sleep to sketch an approach for sanitizing absent keys and incorrect value types.

sleep_time = event.get("sleep")  # dict.get takes a positional default, not a `default=` keyword
if sleep_time is None:
    return 'Error: No key "sleep" found on input data.'
elif not isinstance(sleep_time, (int, float)):
    return f'Error: Key "sleep" expected int or float, not {type(sleep_time)}.'

I wanted to share my design here before continuing to add sanitization checks to the remaining benchmark variables, just to make sure I am on the right path, so please feel free to let me know if I need to redesign this or introduce any changes!

Here is my design draft: nbcl/serverless-benchmarks/1

@mcopik
Collaborator Author

mcopik commented Feb 24, 2023

@nbcl Thank you so much for your interest in the project! Much appreciated.

I think the overall approach is correct; the only issue is that functions should return a dictionary. The dictionary is later included by the invoker to form an overall response (look at the final return value of each benchmark).

While thinking about your proposal, I realized that we do not have any way of signaling errors. I propose that the function should now return the following dictionary:

{
  "status": "success" OR "failure",
  "result": ...,
  "measurements": ...
}

When returning with an error, you should pass a string explaining the problem as the "result" field. In addition to your PR, we will have to modify the handling of invocations to detect the failure. The final return should now include "status": "success" in the output dictionary :)
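
For illustration, here is a minimal sketch (not taken from the repository; the handler below mimics a 010.sleep-style benchmark) of how a benchmark could validate its input and return the proposed dictionary:

import time

def handler(event):
    # Validate the input up front instead of letting a missing key surface
    # later as an unrelated Python exception on a None value.
    sleep_time = event.get('sleep')
    if sleep_time is None:
        return {
            'status': 'failure',
            'result': 'Error: no key "sleep" found in the input data.',
            'measurements': {}
        }
    if not isinstance(sleep_time, (int, float)):
        return {
            'status': 'failure',
            'result': f'Error: key "sleep" expected int or float, got {type(sleep_time).__name__}.',
            'measurements': {}
        }

    begin = time.time()
    time.sleep(sleep_time)
    end = time.time()

    return {
        'status': 'success',
        'result': f'slept for {sleep_time} seconds',
        'measurements': {'sleep_time': end - begin}
    }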

@mcopik
Collaborator Author

mcopik commented Apr 27, 2023

@nbcl Do you plan to submit a PR for this, or can someone else work on this feature?

@niranjank2022

niranjank2022 commented Jun 23, 2023

Hi @mcopik! It seems there has been no activity on this issue for a very long time. I would like to work on it. Could you please guide me through it? Thank you!

@niranjank2022

niranjank2022 commented Jun 23, 2023

(quoting the proposed output format from mcopik's Feb 24 comment above)

And what should "measurements" contain?

@mcopik
Collaborator Author

mcopik commented Jun 25, 2023

@niranjank2022 Absolutely, feel free to work on this! All contributions are appreciated.

In the new output, we should have a boolean status, a result, which should be a string explaining the error (if one occurs), and measurements, which should be the current output we return, i.e., the output of the benchmark & function:

{
  "status": "success" OR "failure",
  "result": "string with error"
  "measurements": {
    # current JSON output
  }
}

@niranjank2022

Hi @mcopik! In some of the files, like function.py in 210.thumbnailer, the return value is:

return {
    'result': {'bucket': output_bucket, 'key': key_name},
    'measurement': {
        'download_time': download_time,
        'download_size': len(img),
        'upload_time': upload_time,
        'upload_size': resized_size,
        'compute_time': process_time
    }
}

Do I have to relocate the values in 'result' to 'measurement'?

@mcopik
Collaborator Author

mcopik commented Jun 28, 2023

@niranjank2022 Correct!
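
A sketch of what that relocation could look like for the snippet quoted above (illustrative only; the error string and whether the field ends up named 'measurement' or 'measurements' should follow the convention discussed earlier):

return {
    'status': 'success',
    'result': '',  # reserved for an error message on failure
    'measurement': {
        'bucket': output_bucket,
        'key': key_name,
        'download_time': download_time,
        'download_size': len(img),
        'upload_time': upload_time,
        'upload_size': resized_size,
        'compute_time': process_time
    }
}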

@niranjank2022

niranjank2022 commented Jul 7, 2023

Hello @mcopik
In this file,

import datetime
import igraph

def handler(event):

    size = event.get('size')

    graph_generating_begin = datetime.datetime.now()
    graph = igraph.Graph.Barabasi(size, 10)
    graph_generating_end = datetime.datetime.now()

    process_begin = datetime.datetime.now()
    result = graph.pagerank()
    process_end = datetime.datetime.now()

    graph_generating_time = (graph_generating_end - graph_generating_begin) / datetime.timedelta(microseconds=1)
    process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)

    return {
            'status': 'success',
            'result': result[0],
            'measurement': {
                'graph_generating_time': graph_generating_time,
                'compute_time': process_time
            }
    }

The key result holds the value result[0]. Since result is now meant to describe the outcome of the invocation (e.g., an error message), what should the new key name for result[0] be?

@mcopik
Collaborator Author

mcopik commented Jul 7, 2023

@niranjank2022 Benchmarks should not be touched at all - you should only have to modify handler functions in benchmark/wrappers as they define the final output passed from our executor to users.

EDIT: wrong, see the comment below.

@niranjank2022

Then, I need to work on only the handler.py files inside wrapper/aws and wrapper/azure, right?

@mcopik
Collaborator Author

mcopik commented Jul 15, 2023

@niranjank2022 I'm sorry for the confusion - I misread your question and my previous answer was wrong.

Handlers defined for each platform and language execute the benchmark, make measurements, and return the data back to the user. You can see it here - right now, we only extract the result field.

Now, the first step would be to modify the benchmarks themselves such that they return the result, the status, and a potential message explaining the error - this needs to incorporate the input verification of benchmarks. Here I was wrong last time - we do need to change benchmarks for that :) You only need to verify one instance of function code per benchmark.

Then, you need to modify handler.py for Python and handler.js for Node.js, each version for each platform. This is to ensure that this data is not lost but is instead passed to the user in the final output, like in here.

Finally, on the client side, you would need to modify the current results parsing, like here - we store the entire output and later access subfields. Instead, we should properly retrieve the new information to easily verify that the function executed correctly :)

I apologise for the confusion - please let me know if you have further questions.
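
A very rough sketch of the wrapper-side change for step two (hypothetical code, not the actual benchmarks/wrappers sources; the import path and field names are placeholders):

# Hypothetical wrapper handler: forward the benchmark's status and error
# message instead of extracting only the result field.
from function import function  # placeholder import of the benchmark module

def handler(event):
    ret = function.handler(event)
    return {
        # previously only ret['result'] would be extracted here
        'status': ret.get('status', 'failure'),
        'result': ret.get('result', ''),
        'measurements': ret.get('measurements', {})
    }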

@niranjank2022

@mcopik
Values like 'bucket' and 'object' are accessed in multiple files. Do we have to check whether they exist and whether their type is correct in every file? Or should we leave them and check only the others?

@mcopik
Collaborator Author

mcopik commented Jul 24, 2023

@niranjank2022 In every benchmark that reads bucket information from the JSON input, we should check that these values exist.

You can do it manually or (better) validate with JSON schema :)

https://python-jsonschema.readthedocs.io/en/stable/
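
A sketch of the jsonschema approach (the schema below is illustrative, not the project's actual input format):

from jsonschema import validate, ValidationError

# Illustrative schema for a benchmark that expects bucket information.
SCHEMA = {
    'type': 'object',
    'properties': {
        'bucket': {
            'type': 'object',
            'properties': {
                'bucket': {'type': 'string'},
                'input': {'type': 'string'},
                'output': {'type': 'string'}
            },
            'required': ['bucket', 'input', 'output']
        }
    },
    'required': ['bucket']
}

def handler(event):
    try:
        validate(instance=event, schema=SCHEMA)
    except ValidationError as err:
        return {
            'status': 'failure',
            'result': f'Invalid input: {err.message}',
            'measurements': {}
        }
    # ... continue with the actual benchmark logic ...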

@niranjank2022

Hello @mcopik! I have submitted my PR for this issue! Kindly review it and tell me your comments so I can improve it.

@mcopik
Collaborator Author

mcopik commented Aug 12, 2023

@niranjank2022 Thank you for your work! I will try to review the PR soon :)

@octonawish-akcodes
Collaborator

Is this issue available? @mcopik

@mcopik
Copy link
Collaborator Author

mcopik commented Mar 22, 2024

@octonawish-akcodes There is a pending PR with unresolved review comments #173 - you can work on that and see if you can improve it :) The main remaining issues were the unaddressed review comments and the need to adjust our triggers to properly catch failure/success (I left a comment on that).
