WIP: Regression tests #63
Conversation
The build fails because this PR does not touch the parser that is broken on the new website.
I am still unsure how to test parsers in general, but additionally testing against old output might be a good safeguard for refactorings. The PR should be rebased onto current master, and we should check whether pytest is executed by Travis (I assume it currently is not).
Currently, those tests are not run during the Travis build. The main benefit of this test is for the developer to quickly check that they haven't broken the parser when refactoring. It is actually worthless when the parser breaks because the website changed. A quick question: would it have been a good idea to rebase this PR? Because the commits were already published, a rebase would change all their hashes, which could potentially lead to trouble. That's why I simply merged master into this branch.
I am not aware of any relevant issues with rebasing within feature branches. Yes, but Travis is designed as a safety net in case a developer didn't test everything locally or an issue occurs during development.
Yes, I only wanted to say that it is most valuable during refactoring. But I totally agree that it is worth running it on Travis; I will try to make that happen. Maybe we can find a way to generate this test automatically for each parser, but I think that would fit better into #40.
Force-pushed from 3a229fb to 96dbcf6.
parsers/aachen/parser.py
Outdated
@@ -65,6 +66,17 @@ def add_meals_from_table(canteen, table, day):
         if category and name:
             canteen.addMeal(day, category, name, notes, prices=price_tag)

+    # TODO: Move determinism upstream (PyOpenMensa)
Deterministic XML output should be guaranteed in PyOpenMensa, since it cannot be handled here.
Yes, deterministic output is better ensured by PyOpenMensa.
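For illustration only (this is not actual PyOpenMensa code): deterministic output means that the same meal data always serializes to byte-identical XML, which the snapshot comparison depends on. A typical way to achieve this is sorting unordered collections before writing them out:

def render_notes(notes):
    # Hypothetical helper: sorting the unordered note set makes repeated
    # parser runs emit identical XML, which snapshot diffs rely on.
    return ''.join('<note>{}</note>'.format(note) for note in sorted(notes))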
Force-pushed from 96dbcf6 to fd8547e.
I've worked on making the regression test more generic. The workflow would be as follows: (…)

For now, adding parsers to the list (…).
I seem to have messed up the Git submodule...
The regression tests that were in the package are now factored out of it.
Force-pushed from 948ae7e to 0172252.
The package (…)
I've made the snapshot updater detect HTTP requests automatically. Unfortunately, this requires the usage of (…). This change is required because of the following issue: https://stackoverflow.com/questions/48614058/mock-urlopen-in-unkown-subpackage/
During the last week I kept thinking about whether to store website and feed snapshots within this repository. I am still hoping for a reasonable alternative: (…)

What is your standpoint?
The snapshots for the (two) websites and the result use 90.2 kB, so for 10 parsers that would come to about 1 MB. I feel that's not a lot and that they're totally worth it. The only real reason to update the snapshots I can come up with is that the website changed and broke the parser, or that some case occurs on the current website that was not present in the old snapshot. But mostly they should remain the same and be used while refactoring the parser's code. They are not mere data; they are integral parts of a regression test. Therefore they belong with the code, since the developer needs to have them handy while coding.
Ok, I agree the snapshots shouldn't be an issue for now. Moving the (…)

Do you want to rebase the changes yourself, or should I squash them during the merge?
I fixed the location of the tests.
I'm unsure whether you meant the tests being in (…). As to whether to squash or rebase: I explained in #68 that I prefer keeping the individual commits, but you decide. I think it's best to stick to what has been the procedure until now, so squashing is fine.
What's the status on this PR? I would like it to be merged, if everyone agrees. ;)
Overall I like the idea of having regression tests. This even makes mass refactorings possible without checking every single parser.
However, there are some problems that I noticed.
parser_tests/update_snapshots.py
Outdated
    get_snapshot_website_path, \
    parse_mocked, parsers_to_test

base_directory = os.path.dirname(os.path.realpath(__file__))
This is not used.
Normally PyCharm flags unused variables, but it somehow missed this one...
parser_tests/update_snapshots.py
Outdated
def main():
    if len(sys.argv) < 2:
        usage_hint = "Usage: `update_snapshots.py <parser> <canteen>`"
        print("Missing arguments.", usage_hint)
You should exit with a non-zero status code in this case and print the usage message to stderr. Also, the usage message should mention the --all flag:
print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]), file=sys.stderr)
sys.exit(1)
parser_tests/update_snapshots.py
Outdated
elif len(sys.argv) == 3:
    parser, canteen = sys.argv[1:3]
    generate_snapshot(parser, canteen)
I think an else branch with an error message is missing.
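Putting the suggestions together, the argument handling could look roughly like this (a sketch; generate_snapshot is from the PR, while generate_all_snapshots is an assumed helper for the --all case, sketched further below):

import sys

def main():
    if len(sys.argv) == 2 and sys.argv[1] == '--all':
        generate_all_snapshots()  # assumed helper that updates every parser
    elif len(sys.argv) == 3:
        parser, canteen = sys.argv[1:3]
        generate_snapshot(parser, canteen)
    else:
        # Any other argument count is an error: usage to stderr, non-zero exit.
        print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]),
              file=sys.stderr)
        sys.exit(1)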
@@ -30,7 +30,8 @@ def parse_legend(document):

 def parse_all_days(canteen, document):
     days = ('Montag', 'Dienstag', 'Mittwoch', 'Donnerstag', 'Freitag',
-            'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste', 'FreitagNaechste')
+            'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste',
+            'FreitagNaechste')
This change seems unrelated.
Yes, this was probably automatic formatting. I'll just leave it in.
.gitignore
Outdated
@@ -1,3 +1,9 @@
 __pycache__
 build

+# Python Virtual Environment
+venv/
Hm, didn't we discuss in the other pull request that it makes more sense to include such things locally in .git/info/exclude?
Yes, sorry! Will fix!
parser_tests/regression_test.py
Outdated
with open(get_snapshot_result_path(parser, canteen), encoding='utf-8') as result_file:
    expected_result = result_file.read()
assert result == expected_result, ("Actual result:\n"
                                   + result)
Why is the result shown? I think the diff is enough.
I sometimes wanted to inspect what was being built on Travis. The diff is too polluted to read properly, and I needed the clean result to diff it myself in a nicer view. I hope it's clear what I mean. 😅
Hm, but the failed assert usually happens while developing, where you can just run the parser directly and see the output? (This is a minor point, so I am also fine with keeping it if you want.)
Actually, thinking about it, you're right. Maybe I needed it during development of the regression tests, but now it should run the same on Travis and locally. Will remove it. 👍
parser_tests/update_snapshots.py
Outdated
snapshot_website_path = get_snapshot_website_path(parser, canteen)
with open(snapshot_website_path, 'w', encoding='utf-8') as file:
    json.dump(intercepted_requests, file, indent=4)
Why are these stored as JSON? This makes reading git diffs impossible when updating them, because the whole website is in one file. It would be better to store them as plain HTML. The filename could then e.g. just be a hash of the URL (and maybe other parameters) to support multiple URLs.
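A sketch of that naming scheme (snapshot_path_for_url is a hypothetical helper, not part of the PR):

import hashlib
import os

def snapshot_path_for_url(base_directory, url):
    # One plain-HTML file per request keeps git diffs readable; hashing
    # the URL supports multiple URLs per parser.
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()[:16]
    return os.path.join(base_directory, digest + '.html')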
However, the bigger problem I see is copyright/author's rights. Can we simply include complete website snapshots in the repo?
Because a parser can call multiple websites, I found it easier to store them in one JSON file under their URL as a key; it was simply the easiest way to implement it. We could generate separate HTML files.
I am not sure about the copyright. A website is meant to be transmitted and copied, right? 😐 And the point of these snapshots is just to cache a request so that it can be reused later. You're making a good point, but I'm not seeing too much trouble here, although I'm not well educated in this area.
A website is meant to be transmitted and copied, right?

The problem is that by including the website in the repo, we are redistributing it.
I see the problem now. Do you know how to solve this issue? I'm not sure how to proceed.
parser_tests/update_snapshots.py
Outdated
snapshot_result_path = get_snapshot_result_path(parser, canteen)
with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
    result = parse_mocked(parser, canteen)
What if the parser fails? Then the source file has already been updated, but the result file has not.
I hadn't thought about that in great detail, but I think for now we should be fine: the parser will hopefully fail by raising an exception, right? If you generate a snapshot for a specific parser, that should be enough. Maybe we can add a try mechanism when generating all parsers' snapshots and inform the user about which ones failed (see the sketch below).

And even if only the website is updated and not the result, that means something is wrong with the parser. Once that's fixed, you can rerun the snapshot update. I would suggest we observe this and fix it as soon as it actually becomes a problem.
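That try mechanism could look roughly like this (a sketch; it assumes parsers_to_test yields (parser, canteen) pairs, which may differ from the actual structure in the PR):

import sys

def generate_all_snapshots():
    # Keep going when a single parser fails and report all failures at the end.
    failed = []
    for parser, canteen in parsers_to_test:
        try:
            generate_snapshot(parser, canteen)
        except Exception as error:
            failed.append((parser, canteen, error))
    for parser, canteen, error in failed:
        print('Snapshot update failed for {}/{}: {}'.format(parser, canteen, error),
              file=sys.stderr)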
The exception is the problem, as it aborts the function: result_file.write(result) is never called, but json.dump(…) has already run. However, the fix is really simple: just put the result = parse_mocked(parser, canteen) call above the writing of the source (JSON) file.
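Concretely, the reordering might look like this (a sketch reusing the names from the excerpts above, assuming parse_mocked replays the freshly intercepted responses rather than re-reading the JSON file from disk):

# Parse first: if this raises, neither snapshot file has been touched yet.
result = parse_mocked(parser, canteen)

snapshot_website_path = get_snapshot_website_path(parser, canteen)
with open(snapshot_website_path, 'w', encoding='utf-8') as file:
    json.dump(intercepted_requests, file, indent=4)

snapshot_result_path = get_snapshot_result_path(parser, canteen)
with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
    result_file.write(result)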
That's really elegant. 👍
parser_tests/regression_test.py
Outdated
    return parsers[parser].parse('', canteen, 'full.xml')


def get_canteen_url(parser, canteen):
This function is defined here but only used in update_snapshots.py.
Will fix!
        return MockResponse()

    with mock.patch('urllib.request.urlopen', mock_response):
        return parsers[parser].parse('', canteen, 'full.xml')
To be honest, I don't like the whole mocking thing. It just feels very brittle and will also fail in more elaborate cases, e.g. if the website requires submitting a form with parameters to display the menu.
I think it would be better to change the parsers to adapt to a simplified test framework; then every parser could use it however it wants. I haven't come up with a complete design, but I have some ideas if you are interested.
Yes, definitely! The whole point of this is just to be able to reproduce the answers that a server has given. It is far from perfect and I'm very interested in what you have come up with!
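For context, the replay works roughly like this (a simplified sketch, not the exact PR code; it assumes the single JSON snapshot file maps each URL to its recorded response body):

import io
import json
from unittest import mock

def parse_mocked(parser, canteen):
    with open(get_snapshot_website_path(parser, canteen), encoding='utf-8') as file:
        snapshots = json.load(file)  # URL -> recorded response body

    def mock_response(url, *args, **kwargs):
        # Serve the recorded body instead of making a real HTTP request.
        return io.BytesIO(snapshots[url].encode('utf-8'))

    with mock.patch('urllib.request.urlopen', mock_response):
        return parsers[parser].parse('', canteen, 'full.xml')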
A simplified version of the leipzig parser looks like this:

def parse_url(url, today):
    canteen = LazyBuilder()
    day = datetime.date.today()
    parse_day(canteen, '{}&date={}'.format(url, day.strftime('%Y-%m-%d')))
    return canteen.toXMLFeed()

def parse_day(canteen, url):
    …
This doesn't work well with your mocking approach because day depends on the actual day this code is executed. You would have to mock the complete environment to work around this, which doesn't seem feasible. So I instead suggest that parsers that want to implement regression tests implement two functions (the examples still refer to the leipzig parser).
The first function returns a list of RegressionState objects, which are serialized and stored by the testing framework. The state is completely custom to the parser and is e.g. stored as a JSON string:

def generate_state(url):
    day = datetime.date.today().strftime('%Y-%m-%d')
    content = urlopen('{}&date={}'.format(url, day)).read()
    canteen = LazyBuilder()
    feed = parse_day(canteen, content).toXMLFeed()
    return [
        RegressionState(feed, state=content)
    ]
The second function is called by the test framework for every stored RegressionState and returns the feed, which is compared by the test framework:

def test_state(state):
    canteen = LazyBuilder()
    return parse_day(canteen, state).toXMLFeed()
Both functions are registered with the Parser and called through it:

parser = Parser('leipzig',
                handler=parse_url,
                regression_test=RegressionTest(
                    generate=generate_state,
                    test=test_state
                ))
That's great! I also like this way of registering parsers to be tested. I'll see when I can implement this.
I've tried this a bit, but I'm having a lot of trouble with some details. Maybe I can come back to it later, but for now it seems like quite a lot of effort.
If you'd like to try it, please do. 😉
Thanks for your review! It's great having your feedback and improvements! :)

Oops, why did I leave a review instead of a comment? 🤔 Well, never mind...
I've thought about this PR a bit more, and I think it isn't as useful as I first thought. In practice I expect there to be little value in storing snapshots of the websites, because you need regression tests only for refactoring. As soon as the parser breaks and needs to be fixed, by definition its behavior will change, and you will have to re-evaluate by hand that it works correctly, making previously stored snapshots invalid. Once that is done, snapshots might help with refactoring again; however, it isn't necessary to store the website snapshot in the repo, because it is only needed during the refactoring itself. Together with the complexity of mocking the HTTP calls generically, this makes it worth considering dropping the website snapshot entirely and only providing a utility that tests whether the parser's output changed, always fetching the current, real website anew. So I think we might be able to simplify this PR. Additionally, not publishing the website snapshots resolves the legal issue.
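Such a utility could be as small as this (a hypothetical sketch; it runs the parser against the live website and compares the output to a feed the developer saved before refactoring):

def check_refactoring(parser, canteen, saved_feed_path):
    # Run the parser against the real, current website.
    result = parsers[parser].parse('', canteen, 'full.xml')
    with open(saved_feed_path, encoding='utf-8') as feed_file:
        expected = feed_file.read()
    assert result == expected, 'Parser output changed during refactoring'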
Closed in favor of #96. |
Compares the output of the parser to a known state of the Academica website on the Wayback Machine (https://web.archive.org/web/20171204101926/http://www.studierendenwerk-aachen.de/speiseplaene/academica-w.html#).
This is a sanity check that can be used with any parser for the Aachen Academica website.