-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added a first run of RU locale, added some POS in the definitions output #1271
Conversation
That's awesome! I would be more comfortable with several PRs to target specific changes. When you modify existent code, it will need tests :). Also, I am not sure about other changes, mostly because I do not see a before/after visual, it may help a lot here. Let's open new PRs and discuss about it there. 👍 |
now RU locale changes
Hi again, yes, I kind of expected to have to to split the problems :D I removed everything linked to POS, DE and the subsection "flattening"'. I'm entirely clueless about the testing part (I can read through some docs if needed?). Otherwise , if I manage to get the list of templates (find-templates gets killed by my machine or throws memory errors) and start working through them, I can add more to the template handler. Also, I was wondering if sometimes using the expanded templates via '&templates=expand' isn't easier? for example I can't find a way to get the pronunciation in russian without it. PS: for the visuals, you mean visualising the difference output? Something like this? Before Du latin salus, salūtem (« santé, salut »).
Du latin salus, salūtem (« santé, salut »).
|
Testing is easy when you've done it once ;) Let's say you want to test a word, pick that word on the Wiktionary, and from the "edit-mode" tab, copy the Wikicode. Then, in the folder import pytest
from wikidict.render import parse_word
from wikidict.utils import process_templates
@pytest.mark.parametrize(
"word, pronunciations, gender, etymology, definitions",
[
# Each and every word must contain all those details, it can be empty list or string when the information is not available
(
"<word>", # this must be the same name (case-sensitivity is important) as "tests/data/ru/<word>.wiki"
["pronunciation1", ...],
"gender",
["etymology"],
["definition1", ...],
),
],
)
def test_parse_word(
word, pronunciations, gender, etymology, definitions, page
):
"""Test the sections finder and definitions getter."""
code = page(word, "ru")
details = parse_word(word, code, "ru", force=True)
assert pronunciations == details.pronunciations
assert gender == details.gender
assert definitions == details.definitions
assert etymology == details.etymology This is a complete test for a given word. You can add as many words as files you will create in Then, maybe not now, but when you will work on adding templates, you will have to create lighter tests to only cover the template transformation. Here is the skeleton: @pytest.mark.parametrize(
"wikicode, expected",
[
# ("minimal wikicode template", "expected parsed result")
("{{1|Descendant}}", "Descendant"),
("{{1er}}", "1<sup>er</sup>"),
("{{1er|mai}}", "1<sup>er</sup> mai"),
],
)
def test_process_templates(wikicode, expected):
"""Test templates handling."""
assert process_templates("foo", wikicode, "ru") == expected |
Could you open an issue about the
Yes, exactly :)
@lasconic do you have a opinion? |
If I understand correctly, this is not an option, since it would work only with |
Regarding POS, let's discuss it somewhere else. Maybe here : #1149 |
tried to correct a unit test error?
The problem with find-templates was simply my linux machine running out of memory. I split the json file and worked through it by pieces... not sure its worth opening a ticket for that? Turns out because of this issue there are 18000 templates to work through though! Also, I'm kind of struggling with the details of getting my tests working, but I think(pray) this last correction will work! Once I get the first test working I can add some more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix quality code issues, just run ./check.sh
locally, and push changes.
Co-authored-by: Mickaël Schoentgen <[email protected]>
Ok, I finally installed pytest & dependencies... makes iterations much quicker! I guess before releasing the locale we would still need to solve the etymology/pronunciation templates issue, and ideally add a few more tests and template handlers? Anything else? |
|
added some test cases (esp. homonyms)
Sourcery Code Quality Report❌ Merging this PR will decrease code quality in the affected files by 0.73%.
Here are some functions in these files that still need a tune-up:
Legend and ExplanationThe emojis denote the absolute quality of the code:
The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request. Please see our documentation here for details on how these metrics are calculated. We are actively working on this report - lots more documentation and extra metrics to come! Help us improve this quality report! |
@all-contributors please add @victornove for code |
Thanks a lot @victornove 🍾 |
I've put up a pull request to add @victornove! 🎉 |
added RU locale, only few templates though as had memory issues with find-templates
modified some DE (flattened definitions with ordered lists) - can remove if needed?
modified definitions to include POS (checked for EN, FR, DE, RU) - had to modify DE to pull POS from different section into definition. - can also remove if needed?