[Fixing Test-Cases] Functions with image inputs but checking lists #120

Open
ian-coccimiglio opened this issue Sep 9, 2024 · 0 comments · Fixed by #121

Comments

@ian-coccimiglio
Contributor

If a test asks for images, I think the test case should actually provide images/arrays, rather than requiring the LLM to convert the input into images first.

Here's the sum_images test. The problem is that the task asks for images, but the second half of the test passes nested Python lists to the function instead of arrays.

# "sum_images.ipynb"
def check(candidate):
    import numpy as np

    image1 = np.random.random((5,6))
    image2 = np.random.random((5,6))
    sum_image = image1 + image2
    
    assert np.allclose(candidate(image1, image2), sum_image)

    image1 =    [[1,2,3], [4,5,6]]
    image2 =    [[5,6,7], [0,1,2]]
    sum_image = [[6,8,10],[4,6,8]]
    
    assert np.allclose(candidate(image1, image2), sum_image)
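
One possible way to adjust this test, sketched under the assumption that "image" means a NumPy array: wrap the list literals in `np.asarray` so the candidate receives arrays in both cases. The `sum_images` candidate below is hypothetical and only stands in for a typical generated solution; it is not taken from the benchmark or from the linked fix.

```python
import numpy as np

def check(candidate):
    # First case: random float images, already NumPy arrays
    image1 = np.random.random((5, 6))
    image2 = np.random.random((5, 6))
    assert np.allclose(candidate(image1, image2), image1 + image2)

    # Second case: same values as the original test, but passed as arrays
    image1 = np.asarray([[1, 2, 3], [4, 5, 6]])
    image2 = np.asarray([[5, 6, 7], [0, 1, 2]])
    sum_image = np.asarray([[6, 8, 10], [4, 6, 8]])
    assert np.allclose(candidate(image1, image2), sum_image)

# Hypothetical candidate that treats its inputs as images
def sum_images(image1, image2):
    return image1 + image2

check(sum_images)  # passes with array inputs

# With the original list inputs, `+` concatenates the outer lists
# instead of adding pixel values, so the result has the wrong shape.
list_result = sum_images([[1, 2, 3], [4, 5, 6]], [[5, 6, 7], [0, 1, 2]])
print(len(list_result))  # 4 rows instead of 2
```

The last lines show why a reasonable image-oriented solution can fail the list-based check without being wrong about images.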

This also applies to the mask_image test:

# "mask_image.ipynb"
def check(candidate):
    import numpy as np
    
    image = [
        [2,2,2,2,2],
        [2,2,3,2,2],
        [2,3,3,3,2],
        [2,2,3,2,2],
        [2,2,2,2,2],
    ]
    mask = [
        [0,0,0,0,0],
        [0,0,1,0,0],
        [0,1,1,1,0],
        [0,0,1,0,0],
        [0,0,0,0,0],
    ]
    reference = [
        [0,0,0,0,0],
        [0,0,3,0,0],
        [0,3,3,3,0],
        [0,0,3,0,0],
        [0,0,0,0,0],
    ]
    masked_image = candidate(image,mask)
    assert np.array_equal(masked_image, reference)
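
A sketch of the same test with array inputs, again assuming NumPy arrays are the intended image type. The `mask_image` candidate is hypothetical and only illustrates how a plausible solution behaves with each input type.

```python
import numpy as np

def check(candidate):
    # Same values as the original test, converted to arrays up front
    image = np.asarray([
        [2, 2, 2, 2, 2],
        [2, 2, 3, 2, 2],
        [2, 3, 3, 3, 2],
        [2, 2, 3, 2, 2],
        [2, 2, 2, 2, 2],
    ])
    mask = np.asarray([
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ])
    reference = np.asarray([
        [0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0],
        [0, 3, 3, 3, 0],
        [0, 0, 3, 0, 0],
        [0, 0, 0, 0, 0],
    ])
    assert np.array_equal(candidate(image, mask), reference)

# Hypothetical candidate: element-wise multiplication by a 0/1 mask
def mask_image(image, mask):
    return image * mask

check(mask_image)  # passes with array inputs

# The same candidate raises a TypeError when handed nested lists,
# because a list cannot be multiplied by another list.
try:
    mask_image([[2, 3]], [[0, 1]])
    failed_on_lists = False
except TypeError:
    failed_on_lists = True
print(failed_on_lists)  # True
```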

As well as the mean_squared_error test:

# "mean_squared_error.ipynb"
def check(candidate):
    image1 = [
        [0,0,0,0,0],
        [0,1,0,0,0],
        [0,0,0,0,0],
        [0,0,0,2,0],
        [0,0,0,0,0],
    ]
    image2 = [
        [0,0,0,0,0],
        [0,1,0,0,0],
        [0,0,0,0,0],
        [0,0,0,2,0],
        [0,0,0,0,0],
    ]

    mse = candidate(image1,image2)
    print(mse)
    assert mse == 0

    image3 = [
        [0,0,0,0,0],
        [0,0,0,0,0],
        [0,0,0,0,0],
        [0,0,0,0,0],
        [0,0,0,0,0],
    ]

    mse = candidate(image1,image3)
    print(mse)
    assert mse == 5 / 25
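
A sketch of an array-based version, assuming NumPy arrays as the image type. The arrays are built with `np.zeros` and hold the same values as the lists above (1 at row 1, column 1 and 2 at row 3, column 3). Using `np.isclose` instead of exact equality is an extra suggestion, not something this issue asks for, and the `mean_squared_error` candidate is hypothetical.

```python
import numpy as np

def check(candidate):
    # image1 and image2 are identical, so the error should be zero
    image1 = np.zeros((5, 5))
    image1[1, 1] = 1
    image1[3, 3] = 2
    image2 = image1.copy()
    assert np.isclose(candidate(image1, image2), 0)

    # image3 is all zeros: squared differences are 1 and 4, so MSE = 5 / 25
    image3 = np.zeros((5, 5))
    assert np.isclose(candidate(image1, image3), 5 / 25)

# Hypothetical candidate that relies on array arithmetic
def mean_squared_error(image1, image2):
    return np.mean((image1 - image2) ** 2)

check(mean_squared_error)  # passes with array inputs

# The same candidate raises a TypeError on nested lists,
# because subtraction is not defined between two lists.
try:
    mean_squared_error([[0, 1]], [[0, 0]])
    failed_on_lists = False
except TypeError:
    failed_on_lists = True
print(failed_on_lists)  # True
```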
    

This is just a sample; there may be more test cases with the same problem. Fixing these issues brings these benchmarks (currently below 50%) much more in line with our expectations.

Self-assigning this one.

@ian-coccimiglio ian-coccimiglio changed the title Test-cases asking for images but check lists [Fixing Test-Cases] Functions with image inputs but checking lists Sep 9, 2024