
WIP: Regression tests #63

Closed
wants to merge 35 commits into from

Conversation

j-maas
Collaborator

@j-maas j-maas commented Dec 4, 2017

Compares the output of the parser to a known state of the Academica website on the Wayback Machine (https://web.archive.org/web/20171204101926/http://www.studierendenwerk-aachen.de/speiseplaene/academica-w.html#).

This is a sanity check that can be used with any parser for the Aachen Academica website.

This was referenced Dec 4, 2017
@j-maas
Collaborator Author

j-maas commented Dec 4, 2017

The build fails because this PR does not touch the parser, which is broken on the new website.

@mswart
Owner

mswart commented Jan 10, 2018

I am still unsure how best to test parsers. But additionally testing against old output might be a good safeguard for refactorings.

The PR should be rebased on the current master, and we should check whether pytest is executed by Travis (I assume it currently is not).

@j-maas
Collaborator Author

j-maas commented Jan 10, 2018

Currently, those tests are not run during the Travis build. The main benefit of this test is letting the developer quickly check that they haven't broken the parser while refactoring. It is actually worthless when the parser breaks because the website changed.
In that respect, I think it's not too bad if we do not run this test on Travis. But a similar test could be discussed in #40.

A quick question: Would it have been a good idea to rebase this PR? Because the commits were already published, a rebase would change all their hashes, which could potentially lead to trouble... That's why I simply merged master into this branch.

@mswart
Owner

mswart commented Jan 11, 2018

I am not aware of any relevant issues with rebasing within feature branches.

Yes, but Travis is designed as a safety net for cases where a developer didn't test everything locally or an issue only shows up during development.
During larger adjustments to the core infrastructure, some individual parsers might break, so the regression test might be really helpful. If a failure is a false negative, the test can simply be adjusted.
Otherwise I fear that the pytest suite will simply be forgotten after some time.

@j-maas
Collaborator Author

j-maas commented Jan 11, 2018

Yes, I only wanted to say that it is most valuable during refactoring. But I totally agree that it is worth running on Travis. I will try to make that happen.

Maybe we can find a way to generate this test automatically for each parser, but I think that's something that would fit better into #40.

@@ -65,6 +66,17 @@ def add_meals_from_table(canteen, table, day):
if category and name:
canteen.addMeal(day, category, name, notes, prices=price_tag)

# TODO: Move determinism upstream (PyOpenMensa)
Collaborator Author

Deterministic XML output should be guaranteed in PyOpenMensa, since it cannot be handled here.

Owner

Yes, deterministic output is better ensured by PyOpenMensa.

@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

I've worked on making the regression test more generic. The workflow would be as follows:

  1. You generate the first set of snapshots for your parser.
  2. You check that the snapshot-result.xml is as you expect it to be, given the snapshot-website.html.
  3. You can now run the regression test and it will check that the parser still outputs the same XML. Even when the live website changes, the test relies on the website's snapshot, so you do not have to re-check that the parser's result is actually correct. Happy refactoring!
  4. If the website breaks the parser (i.e. it changes structurally, not just in content), you fix your parser, update the snapshots and go back to step 2.

For now, adding parsers to the list (parsers_to_test in regression_test.py) only works for parsers that are structured like the Aachen one. It relies on having one main parser (parser = Parser(...)) with the canteens defined as subparsers (parser.define(...)); see the sketch below.
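For reference, a minimal sketch of the structure this assumes (the import location, the canteen name, and the exact shape of parsers_to_test are assumptions, not copied from the Aachen module):

# parsers/aachen.py (sketch)
from utils import Parser

def parse_url(url, today=False):
    ...  # fetch the menu page and return the XML feed

parser = Parser('aachen', handler=parse_url)   # one main parser
parser.define('academica')                     # canteens as subparsers

# parser_tests/regression_test.py (sketch)
parsers_to_test = [('aachen', 'academica')]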

@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

I seem to have messed up the Git submodule...

Y0hy0h added 2 commits January 19, 2018 20:07
The regression tests that were in the package are now factored out of it.
@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

The package python3-requests-mock is available in Debian jessie. I have tried to use native pytest, but I could not make it work...

@j-maas
Collaborator Author

j-maas commented Feb 6, 2018

I've made the snapshot updater detect HTTP requests automatically. Unfortunately, this requires using from urllib import request together with request.urlopen(...). With this simple change, the updater and the tests are able to mock HTTP requests reliably.

This change is required because of the following issue: https://stackoverflow.com/questions/48614058/mock-urlopen-in-unkown-subpackage/
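For context, a minimal sketch of why the import style matters for the mocking (module layout and URL are made up):

# If a parser did `from urllib.request import urlopen`, the function would be
# bound into the parser's own namespace at import time, and patching
# 'urllib.request.urlopen' afterwards would not affect that bound name.
# Importing the module and looking the function up at call time keeps the patch effective:
from urllib import request

def fetch(url):
    # attribute lookup happens at call time, so a patched urllib.request.urlopen is used
    return request.urlopen(url).read()

# In the snapshot updater / regression test:
from unittest import mock

with mock.patch('urllib.request.urlopen') as fake_urlopen:
    fake_urlopen.return_value.read.return_value = b'<html>snapshot</html>'
    assert fetch('http://www.example.com/menu.html') == b'<html>snapshot</html>'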

@mswart
Owner

mswart commented Feb 7, 2018

During the last week I kept thinking about whether to store website and feed snapshots within this repository.
Sadly, I did not come up with an answer. Again and again I weighed the advantages and disadvantages.
Storing them simplifies development and testing dramatically.
But the snapshots are relatively large. They might change often?
My gut feeling says they are data and do not belong in a code repository.

I am still hoping for a reasonable alternative:

  • the data might be stored within an additional repository
  • LFS (Large File Storage)
  • store them somewhere else entirely
  • in the beginning you retrieved the website from the Wayback Machine

What is your standpoint?

@j-maas
Collaborator Author

j-maas commented Feb 7, 2018

The snapshots for the (two) websites and the result take up 90.2 kB. So for 10 parsers that would come to roughly 1 MB. I feel that's not a lot and that they're well worth it.

The only real reason I can come up with to update the snapshots is that the website changed and broke the parser. Or that there is some case that was not present in the old snapshot but occurs on the current website. Mostly, though, they should remain the same and be used while refactoring the parser's code.

They are not data in that sense; they are integral parts of a regression test. Therefore they belong with the code, since the developer needs to have them handy while coding.
That said, there might be a way to exclude them from the final package. Do you know how?

@j-maas j-maas changed the title from "Regression test for Aachen" to "Regression tests" on Feb 9, 2018
@mswart
Owner

mswart commented Feb 13, 2018

Ok, I agree; the snapshots shouldn't be an issue for now.

Moving the tests directory to the top level should exclude it from the installation.
I would be fine with tests/aachen as the directory: we primarily test parsers, so the additional parsers directory is not necessary.
Otherwise the installation setup would need an explicit exception/removal step.
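Such an explicit step could look roughly like this with setuptools (a sketch, not the project's actual setup.py):

from setuptools import setup, find_packages

setup(
    name='openmensa-parsers',  # illustrative metadata
    version='0',
    # leave the test package (and its snapshot files) out of the installed distribution
    packages=find_packages(exclude=['parser_tests', 'parser_tests.*']),
)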

Do you want to rebase the changes yourself or should I squash them during merge?

@j-maas
Collaborator Author

j-maas commented Feb 13, 2018

I fixed the location of the tests.

I would be fine with tests/aachen as directory

I'm unsure whether you meant the tests being located in parsers/tests/...; that is fixed now.
However, in case you meant the additional academica folder in (now) parser_tests/aachen/academica: I would argue for keeping it. It is the same mapping as used on the command line, parse.py aachen academica, to invoke a parser.

As to whether to squash or rebase: I mentioned in #68 that I prefer keeping the individual commits, but you decide. I think it's best to stick to the procedure used so far, so squashing is fine.

@j-maas
Collaborator Author

j-maas commented Mar 2, 2018

What's the status on this PR? I would like it to be merged, if everyone agrees. ;)

@j-maas j-maas mentioned this pull request Mar 3, 2018
Collaborator

@klemens klemens left a comment

Overall I like the idea of having regression tests. It even makes mass refactorings possible without manually checking every single parser.

However, there are some problems that I noticed.

get_snapshot_website_path, \
parse_mocked, parsers_to_test

base_directory = os.path.dirname(os.path.realpath(__file__))
Collaborator

This is not used.

Collaborator Author

Normally PyCharm flags unused variables, but it somehow missed this one...

def main():
    if len(sys.argv) < 2:
        usage_hint = "Usage: `update_snapshots.py <parser> <canteen>`"
        print("Missing arguments.", usage_hint)
Collaborator

You should exit with a non-zero status code in this case and print the usage message to stderr. Also, the usage message should mention --all:

print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]), file=sys.stderr)
sys.exit(1)

    elif len(sys.argv) == 3:
        parser, canteen = sys.argv[1:3]
        generate_snapshot(parser, canteen)

Collaborator

I think there is an else branch missing here that reports an error.
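Putting both suggestions together, the argument handling could look roughly like this (the --all branch and its helper name are assumptions based on this thread):

import sys

def main():
    if len(sys.argv) == 2 and sys.argv[1] == '--all':
        generate_all_snapshots()  # hypothetical helper covering the --all case
    elif len(sys.argv) == 3:
        parser, canteen = sys.argv[1:3]
        generate_snapshot(parser, canteen)
    else:
        # unknown arguments: print usage to stderr and exit with a non-zero status
        print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]), file=sys.stderr)
        sys.exit(1)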

@@ -30,7 +30,8 @@ def parse_legend(document):

def parse_all_days(canteen, document):
days = ('Montag', 'Dienstag', 'Mittwoch', 'Donnerstag', 'Freitag',
'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste', 'FreitagNaechste')
'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste',
'FreitagNaechste')
Collaborator

This change seems unrelated.

Collaborator Author

Yes, this was probably automatic formatting. I'll just leave it in.

.gitignore Outdated
@@ -1,3 +1,9 @@
__pycache__
build

# Python Virtual Environment
venv/
Collaborator

Hm, didn't we discuss in the other pull request that it makes more sense to include such things locally in .git/info/exclude?

Collaborator Author

Yes, sorry! Will fix!

with open(get_snapshot_result_path(parser, canteen), encoding='utf-8') as result_file:
expected_result = result_file.read()
assert result == expected_result, ("Actual result:\n"
+ result
Collaborator

Why is the result shown? I think the diff is enough.

Collaborator Author

I sometimes wanted to inspect what was being built on Travis. The diff is too polluted to read properly, and I needed the clean result to diff it myself in a nicer view. I hope it's clear what I mean. 😅

Collaborator

Hm, but the failed assert usually happens while developing, where you can just run the parser directly and see the output? (This is a minor point so I am also fine with keeping it if you want)

Collaborator Author

No, thinking about it, you're right. Maybe I needed it during development of the regression tests, but now it should run the same on Travis and locally. Will remove it. 👍


    snapshot_website_path = get_snapshot_website_path(parser, canteen)
    with open(snapshot_website_path, 'w', encoding='utf-8') as file:
        json.dump(intercepted_requests, file, indent=4)
Collaborator

Why are these stored as JSON? This makes reading Git diffs impossible when updating them, because the whole website ends up in one file. It would be better to store them as plain HTML. The filename could then e.g. just be a hash of the URL (and maybe other parameters) to support multiple URLs.
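A sketch of what that could look like (the directory layout and helper name are made up):

import hashlib
import os

def snapshot_html_path(snapshot_dir, url):
    # one plain-HTML file per requested URL, named by a hash of the URL
    name = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return os.path.join(snapshot_dir, name + '.html')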

However, the bigger problem I see is copyright/author's rights. Can we simply include complete website snapshots in the repo?

Collaborator Author

@j-maas j-maas Mar 4, 2018

Because a parser can request multiple websites, I found it easier to store them in one JSON file keyed by their URL. It was simply the easiest way for me to implement it. We could generate separate HTML files.

I am not sure about the copyright. A website is meant to be transmitted and copied, right? 😐 And the point of these snapshots is just to cache a request so that it can be reused later. You're making a good point. I don't see too much trouble here, although I'm not well educated in this area.

Collaborator

A website is meant to be transmitted and copied, right?

The problem is that by including the website in the repo, we are redistributing it.

Collaborator Author

I see the problem now. Do you know how to solve this issue? I'm not sure how to proceed.


    snapshot_result_path = get_snapshot_result_path(parser, canteen)
    with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
        result = parse_mocked(parser, canteen)
Collaborator

What if the parser fails? Then the source file has already been updated, but the result file has not.

Collaborator Author

@j-maas j-maas Mar 4, 2018

I hadn't thought about that in great detail, but I think for now we should be fine:

It will hopefully fail by raising an exception, right? If you are generating snapshots for a single parser, the exception itself should be enough of a signal. Maybe we can add a try/except mechanism when generating all parsers' snapshots and inform the user which ones failed.

And even if only the website is updated and not the result: that means something is wrong with the parser. Once that's fixed, you can rerun the snapshot update.

I would suggest we keep an eye on this and fix it as soon as it becomes a problem.

Collaborator

The exception is the problem: it aborts the function, so result_file.write(result) is never called even though json.dump(…) already was. However, the fix is really simple, just move the result = parse_mocked(parser, canteen) call above the writing of the source (JSON) file.
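Roughly, using the names from the snippets above (assuming parse_mocked does not itself need the freshly written snapshot file):

# run the parser first; if it raises, neither snapshot file is touched
result = parse_mocked(parser, canteen)

snapshot_website_path = get_snapshot_website_path(parser, canteen)
with open(snapshot_website_path, 'w', encoding='utf-8') as file:
    json.dump(intercepted_requests, file, indent=4)

snapshot_result_path = get_snapshot_result_path(parser, canteen)
with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
    result_file.write(result)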

Collaborator Author

That's really elegant. 👍

return parsers[parser].parse('', canteen, 'full.xml')


def get_canteen_url(parser, canteen):
Collaborator

This function is defined here but only used in update_snapshots.py.

Collaborator Author

Will fix!

        return MockResponse()

    with mock.patch('urllib.request.urlopen', mock_response):
        return parsers[parser].parse('', canteen, 'full.xml')
Collaborator

To be honest, I don't like the whole mocking approach. It just feels very brittle and will also fail in more elaborate cases, such as when the website requires submitting a form with parameters to display the menu.

I think it would be better to adapt the parsers to a simplified test framework instead. Then every parser could use it however it wants. I haven't come up with a complete design, but I have some ideas if you are interested.

Collaborator Author

Yes, definitely! The whole point of this is just to be able to reproduce the answers that a server has given. It is far from perfect, and I'm very interested in what you have come up with!

Collaborator

A simplified version of the leipzig parser looks like this:

def parse_url(url, today):
    canteen = LazyBuilder()
    day = datetime.date.today()
    parse_day(canteen, '{}&date={}'.format(url, day.strftime('%Y-%m-%d')))
    return canteen.toXMLFeed()

def parse_day(canteen, url):
    …

This doesn't work well with your mocking approach because day depends on the actual day this code is executed. You would have to mock the complete environment to work around this, which doesn't seem feasible. So I instead suggest that parsers that want to implement regression tests implement two functions (the examples still refer to the leipzig parser).

The first function returns a list of RegressionState objects, which are serialized and stored by the testing framework. The state is completely custom to the parser and is e.g. stored as a JSON string:

def generate_state(url):
    day = datetime.date.today().strftime('%Y-%m-%d')
    content = urlopen('{}&date={}'.format(url, day)).read()

    canteen = LazyBuilder()
    feed = parse_day(canteen, content).toXMLFeed()

    return [
        RegressionState(feed, state = content)
    ]

The second function is called by the test framework for every stored RegressionState and returns the feed, which the test framework then compares.

def test_state(state):
    canteen = LazyBuilder()
    return parse_day(canteen, state).toXMLFeed()

Both functions are registered with the Parser and called through it:

parser = Parser('leipzig',
    handler = parse_url,
    regression_test = RegressionTest(
        generate = generate_state,
        test = test_state
    ))

Collaborator Author

That's great! I also like this way of registering parsers to be tested. I'll see when I can implement this.

Collaborator Author

@j-maas j-maas Mar 9, 2018

I've tried this a bit, but I'm having a lot of trouble with some details. Maybe I can come back to this later, but for now it seems like it's quite a lot of effort.

If you'd like to try it, please do. 😉

Collaborator Author

@j-maas j-maas left a comment

Thanks for your review! It's great having your feedback and improvements! :)

@j-maas
Collaborator Author

j-maas commented Mar 4, 2018

Oops, why did I leave a review instead of a comment? 🤔 Well, never mind...

@j-maas j-maas changed the title from "Regression tests" to "WIP: Regression tests" on Mar 5, 2018
@j-maas
Collaborator Author

j-maas commented May 21, 2018

I've thought about this PR a bit more, and I think it isn't as useful as I first thought. Basically, in practice I expect there to be little value in storing snapshots of the websites, because you need regression tests only for refactoring. As soon as the parser breaks and needs to be fixed, its behavior will by definition change, and you will have to re-evaluate by hand that it works correctly, making previously stored snapshots invalid.

Once that is done, though, snapshots might help with refactoring. However, it isn't necessary to store the website snapshot in the repo, because it will only be needed during the refactoring. In addition to the complexity of mocking the HTTP calls generically, this makes it worth considering dropping the website snapshot entirely and only having a utility that tests whether the parser's output changed, always fetching the current, real website anew.
(That being said, it would still be helpful to make sure that the parser's output doesn't change due to changes in the website. So maybe it is still useful to snapshot the website.)

So I think we might be able to simplify this PR. Additionally, not publishing the website snapshots fixes the legal issue.

@j-maas j-maas mentioned this pull request Apr 6, 2019
@j-maas
Collaborator Author

j-maas commented Apr 6, 2019

Closed in favor of #96.

@j-maas j-maas closed this Apr 6, 2019