
WIP: Regression tests #63

Closed
wants to merge 35 commits into from

Conversation

j-maas
Collaborator

@j-maas j-maas commented Dec 4, 2017

Compares the output of the parser to a known state of the Academica website on the Wayback Machine (https://web.archive.org/web/20171204101926/http://www.studierendenwerk-aachen.de/speiseplaene/academica-w.html#).

This is a sanity check that can be used with any parser for the Aachen Academica website.

This was referenced Dec 4, 2017
@j-maas
Collaborator Author

j-maas commented Dec 4, 2017

The build fails because this PR does not touch the parser, which is broken on the new website.

@mswart
Owner

mswart commented Jan 10, 2018

I am still unsure how best to test parsers. But additionally testing against old output might be a good safeguard for refactorings.

The PR should be rebased on the current master, and we should check whether pytest is executed by Travis (I assume it currently is not).

@j-maas
Collaborator Author

j-maas commented Jan 10, 2018

Currently, those tests are not run during the Travis build. The main benefit of this test is letting the developer quickly check that they haven't broken the parser while refactoring. It is actually worthless when the parser breaks because the website changed.
In that respect, I think it's not too bad if we do not run this test on Travis. But a similar test could be discussed in #40.

A quick question: Would it have been a good idea to rebase this PR? Because the commits were already published, a rebase would change all their hashes, which could potentially lead to trouble... That's why I simply merged master into this branch.

@mswart
Owner

mswart commented Jan 11, 2018

I am not aware of any relevant issues with rebasing within feature branches.

Yes, but Travis is designed as a safety net for cases where a developer didn't test everything locally or an issue only shows up during development.
During larger adjustments to the core infrastructure, some individual parsers might break, so the regression test might be really helpful. If a failure is a false negative, the test can simply be adjusted.
Otherwise I fear that the pytest suite will simply be forgotten after some time.

@j-maas
Collaborator Author

j-maas commented Jan 11, 2018

Yes, I only wanted to say that it is most valuable during refactoring. But I totally agree that it is worth running on Travis. I will try to make that happen.

Maybe we can find a way to generate this test automatically for each parser, but I think that's something that would fit better into #40.

@@ -65,6 +66,17 @@ def add_meals_from_table(canteen, table, day):
if category and name:
canteen.addMeal(day, category, name, notes, prices=price_tag)

# TODO: Move determinism upstream (PyOpenMensa)
Collaborator Author

Deterministic XML output should be guaranteed in PyOpenMensa, since it cannot be handled here.

Owner

Yes, deterministic output is better ensured by PyOpenMensa.

@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

I've worked on making the regression test more generic. The workflow would be as follows:

  1. You generate the first set of snapshots for your parser.
  2. You check that the snapshot-result.xml is as you expect it to be, given the snapshot-website.html.
  3. You can now run the regression test and it will check that the parser still outputs the same XML. Even when the live website changes, the test relies on the website's snapshot, so you do not have to re-check that the parser's result is actually correct. Happy refactoring!
  4. If the website breaks the parser (i.e. it changes structurally, not just in content), you fix your parser, update the snapshots and go back to step 2.

For now, adding parsers to the list (parsers_to_test in regression_test.py) only works for parsers that are structured like the Aachen one. It relies on having one main parser (parser = Parser(...)) with the canteens defined as subparsers (parser.define(...)); see the sketch below.
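For reference, a minimal sketch of the structure this assumes (the import location, the canteen name, and the exact shape of parsers_to_test are assumptions, not copied from the Aachen module):

# parsers/aachen.py (sketch)
from utils import Parser

def parse_url(url, today=False):
    ...  # fetch the menu page and return the XML feed

parser = Parser('aachen', handler=parse_url)   # one main parser
parser.define('academica')                     # canteens as subparsers

# parser_tests/regression_test.py (sketch)
parsers_to_test = [('aachen', 'academica')]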

@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

I seem to have messed up the Git submodule...

Y0hy0h added 2 commits January 19, 2018 20:07
The regression tests that were in the package are now factored out of it.
@j-maas
Collaborator Author

j-maas commented Jan 19, 2018

The package python3-requests-mock is available in Debian jessie. I have tried to use native pytest, but I could not make it work...

@j-maas
Collaborator Author

j-maas commented Feb 6, 2018

I've made the snapshot updater detect HTTP requests automatically. Unfortunately, this requires using from urllib import request together with request.urlopen(...). With this simple change, the updater and the tests are able to mock HTTP requests reliably.

This change is required because of the following issue: https://stackoverflow.com/questions/48614058/mock-urlopen-in-unkown-subpackage/
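For context, a minimal sketch of why the import style matters for the mocking (module layout and URL are made up):

# If a parser did `from urllib.request import urlopen`, the function would be
# bound into the parser's own namespace at import time, and patching
# 'urllib.request.urlopen' afterwards would not affect that bound name.
# Importing the module and looking the function up at call time keeps the patch effective:
from urllib import request

def fetch(url):
    # attribute lookup happens at call time, so a patched urllib.request.urlopen is used
    return request.urlopen(url).read()

# In the snapshot updater / regression test:
from unittest import mock

with mock.patch('urllib.request.urlopen') as fake_urlopen:
    fake_urlopen.return_value.read.return_value = b'<html>snapshot</html>'
    assert fetch('http://www.example.com/menu.html') == b'<html>snapshot</html>'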

@mswart
Owner

mswart commented Feb 7, 2018

During the last week I kept thinking about whether to store website and feed snapshots within this repository.
Sadly, I did not come up with an answer. Again and again I weighed the advantages and disadvantages.
Storing them simplifies development and testing dramatically.
But the snapshots are relatively large. They might change often?
My gut feeling says they are data and do not belong in a code repository.

I am still hoping for a reasonable alternative:

  • the data might be stored within an additional repository
  • LFS (Large File Storage)
  • store them somewhere else entirely
  • in the beginning you retrieved the website from the Wayback Machine

What is your standpoint?

@j-maas
Collaborator Author

j-maas commented Feb 7, 2018

The snapshots for the (two) websites and the result take up 90.2 kB. So for 10 parsers that would come to roughly 1 MB. I feel that's not a lot and that they're well worth it.

The only real reason I can come up with to update the snapshots is that the website changed and broke the parser. Or that there is some case that was not present in the old snapshot but occurs on the current website. Mostly, though, they should remain the same and be used while refactoring the parser's code.

They are not data in that sense; they are integral parts of a regression test. Therefore they belong with the code, since the developer needs to have them handy while coding.
That said, there might be a way to exclude them from the final package. Do you know how?

@j-maas j-maas changed the title from "Regression test for Aachen" to "Regression tests" on Feb 9, 2018
@mswart
Owner

mswart commented Feb 13, 2018

Ok, I agree; the snapshots shouldn't be an issue for now.

Moving the tests directory to the top level should exclude it from the installation.
I would be fine with tests/aachen as the directory: we primarily test parsers, so the additional parsers directory is not necessary.
Otherwise the installation setup would need an explicit exception/removal step.
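Such an explicit step could look roughly like this with setuptools (a sketch, not the project's actual setup.py):

from setuptools import setup, find_packages

setup(
    name='openmensa-parsers',  # illustrative metadata
    version='0',
    # leave the test package (and its snapshot files) out of the installed distribution
    packages=find_packages(exclude=['parser_tests', 'parser_tests.*']),
)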

Do you want to rebase the changes yourself or should I squash them during merge?

@j-maas
Collaborator Author

j-maas commented Feb 13, 2018

I fixed the location of the tests.

I would be fine with tests/aachen as directory

I'm unsure whether you meant the tests being located in parsers/tests/...; that is fixed now.
However, in case you meant the additional academica folder in (now) parser_tests/aachen/academica: I would argue for keeping it. It is the same mapping as used on the command line, parse.py aachen academica, to invoke a parser.

As to whether to squash or rebase: I mentioned in #68 that I prefer keeping the individual commits, but you decide. I think it's best to stick to the procedure used so far, so squashing is fine.

@j-maas
Collaborator Author

j-maas commented Mar 2, 2018

What's the status on this PR? I would like it to be merged, if everyone agrees. ;)

@j-maas j-maas mentioned this pull request Mar 3, 2018
Collaborator

@klemens klemens left a comment

Overall I like the idea of having regression tests. It even makes mass refactorings possible without manually checking every single parser.

However, there are some problems that I noticed.

get_snapshot_website_path, \
parse_mocked, parsers_to_test

base_directory = os.path.dirname(os.path.realpath(__file__))
Collaborator

This is not used.

Collaborator Author

Normally PyCharm flags unused variables, but it somehow missed this one...

def main():
    if len(sys.argv) < 2:
        usage_hint = "Usage: `update_snapshots.py <parser> <canteen>`"
        print("Missing arguments.", usage_hint)
Collaborator

You should exit with a non-zero status code in this case and print the usage message to stderr. Also, the usage message should mention --all:

print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]), file=sys.stderr)
sys.exit(1)

    elif len(sys.argv) == 3:
        parser, canteen = sys.argv[1:3]
        generate_snapshot(parser, canteen)

Collaborator

I think there is an else branch missing here that reports an error.
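Putting both suggestions together, the argument handling could look roughly like this (the --all branch and its helper name are assumptions based on this thread):

import sys

def main():
    if len(sys.argv) == 2 and sys.argv[1] == '--all':
        generate_all_snapshots()  # hypothetical helper covering the --all case
    elif len(sys.argv) == 3:
        parser, canteen = sys.argv[1:3]
        generate_snapshot(parser, canteen)
    else:
        # unknown arguments: print usage to stderr and exit with a non-zero status
        print("Usage: {} --all | <parser> <canteen>".format(sys.argv[0]), file=sys.stderr)
        sys.exit(1)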

@@ -30,7 +30,8 @@ def parse_legend(document):

def parse_all_days(canteen, document):
days = ('Montag', 'Dienstag', 'Mittwoch', 'Donnerstag', 'Freitag',
'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste', 'FreitagNaechste')
'MontagNaechste', 'DienstagNaechste', 'MittwochNaechste', 'DonnerstagNaechste',
'FreitagNaechste')
Collaborator

This change seems unrelated.

Collaborator Author

Yes, this was probably automatic formatting. I'll just leave it in.

.gitignore Outdated
@@ -1,3 +1,9 @@
__pycache__
build

# Python Virtual Environment
venv/
Collaborator

Hm, didn't we discuss in the other pull request that it makes more sense to include such things locally in .git/info/exclude?

Collaborator Author

Yes, sorry! Will fix!

with open(get_snapshot_result_path(parser, canteen), encoding='utf-8') as result_file:
expected_result = result_file.read()
assert result == expected_result, ("Actual result:\n"
+ result
Collaborator

Why is the result shown? I think the diff is enough.

Collaborator Author

I sometimes wanted to inspect what was being built on Travis. The diff is too polluted to read properly, and I needed the clean result to diff it myself in a nicer view. I hope it's clear what I mean. 😅

Collaborator

Hm, but the failed assert usually happens while developing, where you can just run the parser directly and see the output? (This is a minor point so I am also fine with keeping it if you want)

Collaborator Author

No, thinking about it, you're right. Maybe I needed it during development of the regression tests, but now it should run the same on Travis and locally. Will remove it. 👍


    snapshot_website_path = get_snapshot_website_path(parser, canteen)
    with open(snapshot_website_path, 'w', encoding='utf-8') as file:
        json.dump(intercepted_requests, file, indent=4)
Collaborator

Why are these stored as JSON? This makes reading Git diffs impossible when updating them, because the whole website ends up in one file. It would be better to store them as plain HTML. The filename could then e.g. just be a hash of the URL (and maybe other parameters) to support multiple URLs.
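A sketch of what that could look like (the directory layout and helper name are made up):

import hashlib
import os

def snapshot_html_path(snapshot_dir, url):
    # one plain-HTML file per requested URL, named by a hash of the URL
    name = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return os.path.join(snapshot_dir, name + '.html')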

However, the bigger problem I see is copyright/author's rights. Can we simply include complete website snapshots in the repo?

Collaborator Author

@j-maas j-maas Mar 4, 2018

Because a parser can request multiple websites, I found it easier to store them in one JSON file keyed by their URL. It was simply the easiest way for me to implement it. We could generate separate HTML files.

I am not sure about the copyright. A website is meant to be transmitted and copied, right? 😐 And the point of these snapshots is just to cache a request so that it can be reused later. You're making a good point. I don't see too much trouble here, although I'm not well educated in this area.

Collaborator

A website is meant to be transmitted and copied, right?

The problem is that by including the website in the repo, we are redistributing it.

Collaborator Author

I see the problem now. Do you know how to solve this issue? I'm not sure how to proceed.


    snapshot_result_path = get_snapshot_result_path(parser, canteen)
    with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
        result = parse_mocked(parser, canteen)
Collaborator

What if the parser fails? Then the source file has already been updated, but the result file has not.

Collaborator Author

@j-maas j-maas Mar 4, 2018

I hadn't thought about that in great detail, but I think for now we should be fine:

It will hopefully fail by raising an exception, right? If you are generating snapshots for a single parser, the exception itself should be enough of a signal. Maybe we can add a try/except mechanism when generating all parsers' snapshots and inform the user which ones failed.

And even if only the website is updated and not the result: that means something is wrong with the parser. Once that's fixed, you can rerun the snapshot update.

I would suggest we keep an eye on this and fix it as soon as it becomes a problem.

Collaborator

The exception is the problem: it aborts the function, so result_file.write(result) is never called even though json.dump(…) already was. However, the fix is really simple, just move the result = parse_mocked(parser, canteen) call above the writing of the source (JSON) file.
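Roughly, using the names from the snippets above (assuming parse_mocked does not itself need the freshly written snapshot file):

# run the parser first; if it raises, neither snapshot file is touched
result = parse_mocked(parser, canteen)

snapshot_website_path = get_snapshot_website_path(parser, canteen)
with open(snapshot_website_path, 'w', encoding='utf-8') as file:
    json.dump(intercepted_requests, file, indent=4)

snapshot_result_path = get_snapshot_result_path(parser, canteen)
with open(snapshot_result_path, 'w', encoding='utf-8') as result_file:
    result_file.write(result)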

Collaborator Author

That's really elegant. 👍

return parsers[parser].parse('', canteen, 'full.xml')


def get_canteen_url(parser, canteen):
Collaborator

This function is defined here but only used in update_snapshots.py.

Collaborator Author

Will fix!

        return MockResponse()

    with mock.patch('urllib.request.urlopen', mock_response):
        return parsers[parser].parse('', canteen, 'full.xml')
Collaborator

To be honest, I don't like the whole mocking approach. It just feels very brittle and will also fail in more elaborate cases, such as when the website requires submitting a form with parameters to display the menu.

I think it would be better to adapt the parsers to a simplified test framework instead. Then every parser could use it however it wants. I haven't come up with a complete design, but I have some ideas if you are interested.

Collaborator Author

Yes, definitely! The whole point of this is just to be able to reproduce the answers that a server has given. It is far from perfect, and I'm very interested in what you have come up with!

Collaborator

A simplified version of the leipzig parser looks like this:

def parse_url(url, today):
    canteen = LazyBuilder()
    day = datetime.date.today()
    parse_day(canteen, '{}&date={}'.format(url, day.strftime('%Y-%m-%d')))
    return canteen.toXMLFeed()

def parse_day(canteen, url):
    …

This doesn't work well with your mocking approach because day depends on the actual day this code is executed. You would have to mock the complete environment to work around this, which doesn't seem feasible. So I instead suggest that parsers that want to implement regression tests implement two functions (the examples still refer to the leipzig parser).

The first function returns a list of RegressionState objects, which are serialized and stored by the testing framework. The state is completely custom to the parser and is e.g. stored as a JSON string:

def generate_state(url):
    day = datetime.date.today().strftime('%Y-%m-%d')
    content = urlopen('{}&date={}'.format(url, day)).read()

    canteen = LazyBuilder()
    feed = parse_day(canteen, content).toXMLFeed()

    return [
        RegressionState(feed, state = content)
    ]

The second function is called by the test framework for every stored RegressionState and returns the feed, which the test framework then compares.

def test_state(state):
    canteen = LazyBuilder()
    return parse_day(canteen, state).toXMLFeed()

Both functions are registered with the Parser and called through it:

parser = Parser('leipzig',
    handler = parse_url,
    regression_test = RegressionTest(
        generate = generate_state,
        test = test_state
    ))

Collaborator Author

That's great! I also like this way of registering parsers to be tested. I'll see when I can implement this.

Collaborator Author

@j-maas j-maas Mar 9, 2018

I've tried this a bit, but I'm having a lot of trouble with some details. Maybe I can come back to this later, but for now it seems like it's quite a lot of effort.

If you'd like to try it, please do. 😉

Collaborator Author

@j-maas j-maas left a comment

Thanks for your review! It's great having your feedback and improvements! :)

@j-maas
Collaborator Author

j-maas commented Mar 4, 2018

Oops, why did I leave a review instead of a comment? 🤔 Well, never mind...

@j-maas j-maas changed the title from "Regression tests" to "WIP: Regression tests" on Mar 5, 2018
@j-maas
Collaborator Author

j-maas commented May 21, 2018

I've thought about this PR a bit more, and I think it isn't as useful as I first thought. Basically, in practice I expect there to be little value in storing snapshots of the websites, because you need regression tests only for refactoring. As soon as the parser breaks and needs to be fixed, its behavior will by definition change, and you will have to re-evaluate by hand that it works correctly, making previously stored snapshots invalid.

Once that is done, though, snapshots might help with refactoring. However, it isn't necessary to store the website snapshot in the repo, because it will only be needed during the refactoring. In addition to the complexity of mocking the HTTP calls generically, this makes it worth considering dropping the website snapshot entirely and only having a utility that tests whether the parser's output changed, always fetching the current, real website anew.
(That being said, it would still be helpful to make sure that the parser's output doesn't change due to changes in the website. So maybe it is still useful to snapshot the website.)

So I think we might be able to simplify this PR. Additionally, not publishing the website snapshots fixes the legal issue.

@j-maas j-maas mentioned this pull request Apr 6, 2019
@j-maas
Collaborator Author

j-maas commented Apr 6, 2019

Closed in favor of #96.

@j-maas j-maas closed this Apr 6, 2019