
added a first run of RU locale, added some POS in the definitions output #1271

Merged
merged 24 commits into from
May 11, 2022

Conversation

victornove
Contributor

- added RU locale, though only a few templates, as I had memory issues with find-templates
- modified some DE (flattened definitions with ordered lists) - can remove if needed?
- modified definitions to include POS (checked for EN, FR, DE, RU) - had to modify DE to pull POS from a different section into the definition - can also remove if needed?

@BoboTiG
Owner

BoboTiG commented Mar 31, 2022

That's awesome!

I would be more comfortable with several PRs targeting specific changes. When you modify existing code, it will need tests :).
Let's tackle only the new locale here.

Also, I am not sure about the other changes, mostly because I do not see a before/after visual; it would help a lot here. Let's open new PRs and discuss them there.

👍

@victornove
Contributor Author

Hi again,

yes, I kind of expected to have to split the problems :D

I removed everything linked to POS, DE, and the subsection "flattening".

I'm entirely clueless about the testing part (I can read through some docs if needed?). Otherwise, if I manage to get the list of templates (find-templates gets killed by my machine or throws memory errors) and start working through them, I can add more to the template handler.

Also, I was wondering if using the expanded templates via `&templates=expand` isn't sometimes easier? For example, I can't find a way to get the pronunciation in Russian without it.

PS: for the visuals, you mean visualising the output difference? Something like this?

Before:

```
salut \sa.ly\ m.

Du latin salus, salūtem (« santé, salut »).

  1. Fait d’être sauvé de la mort, d’un danger, d’échapper à une situation désagréable.
    ...
  2. (Informel) (Familier) Bonjour, bonsoir, à quelqu’un que l’on tutoie.
```

After:

```
salut \sa.ly\ m.

Du latin salus, salūtem (« santé, salut »).

  1. nom: Fait d’être sauvé de la mort, d’un danger, d’échapper à une situation désagréable.
    ...
  2. interjection: (Informel) (Familier) Bonjour, bonsoir, à quelqu’un que l’on tutoie.
```

@BoboTiG
Owner

BoboTiG commented Mar 31, 2022

> I'm entirely clueless about the testing part (I can read through some docs if needed?). Otherwise, if I manage to get the list of templates (find-templates gets killed by my machine or throws memory errors) and start working through them, I can add more to the template handler.

Testing is easy when you've done it once ;)

Let's say you want to test a word: pick that word on the Wiktionary, and from the "edit-mode" tab, copy the Wikicode. Then, in the folder tests/data/ru/, create a file named <word>.wiki and paste the Wikicode inside it.
Now that you have the Wiktionary word file, let's add the real test. You could copy any other test file; you seem to work in France, so maybe you are fluent in French, and it will be easier to copy-paste tests/test_fr.py as tests/test_ru.py. Anyway, here is the minimal skeleton for such a test file:

```python
import pytest

from wikidict.render import parse_word
from wikidict.utils import process_templates


@pytest.mark.parametrize(
    "word, pronunciations, gender, etymology, definitions",
    [
        # Each and every word must contain all those details; a detail can be
        # an empty list or string when the information is not available
        (
            "<word>",  # must match (case-sensitively) "tests/data/ru/<word>.wiki"
            ["pronunciation1", ...],
            "gender",
            ["etymology"],
            ["definition1", ...],
        ),
    ],
)
def test_parse_word(word, pronunciations, gender, etymology, definitions, page):
    """Test the sections finder and definitions getter."""
    code = page(word, "ru")
    details = parse_word(word, code, "ru", force=True)
    assert pronunciations == details.pronunciations
    assert gender == details.gender
    assert definitions == details.definitions
    assert etymology == details.etymology
```

This is a complete test for a given word. You can add as many words as files you create in tests/data/ru/*.wiki. Those tests are important to validate the parsing of Russian words. You should add words that cover all parsed sections; if you find one word covering all sections, that's great too!

Then, maybe not now, but when you work on adding templates, you will have to create lighter tests that only cover the template transformation. Here is the skeleton:

```python
@pytest.mark.parametrize(
    "wikicode, expected",
    [
        # ("minimal wikicode template", "expected parsed result")
        ("{{1|Descendant}}", "Descendant"),
        ("{{1er}}", "1<sup>er</sup>"),
        ("{{1er|mai}}", "1<sup>er</sup>&nbsp;mai"),
    ],
)
def test_process_templates(wikicode, expected):
    """Test templates handling."""
    assert process_templates("foo", wikicode, "ru") == expected
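To give a feel for what a template transformation does under the hood, here is a standalone, hypothetical sketch; the `TEMPLATES` table and `render_template` helper below are invented for illustration and are not wikidict's actual API (the real handlers live under wikidict/lang/ and have a different shape):

```python
# Hypothetical, simplified template renderer -- NOT wikidict's actual machinery.
TEMPLATES = {
    # {{1er}} -> "1<sup>er</sup>", {{1er|mai}} -> "1<sup>er</sup>&nbsp;mai"
    "1er": lambda args: "1<sup>er</sup>" + (f"&nbsp;{args[0]}" if args else ""),
}


def render_template(wikicode: str) -> str:
    """Render a single {{template|arg|...}} string (illustrative only)."""
    inner = wikicode.strip().removeprefix("{{").removesuffix("}}")
    name, *args = inner.split("|")
    handler = TEMPLATES.get(name)
    if handler:
        return handler(args)
    # Fallback: keep the last argument, as many simple templates do
    return args[-1] if args else name
```

With such a table, `render_template("{{1er|mai}}")` yields `1<sup>er</sup>&nbsp;mai` and unknown templates like `{{1|Descendant}}` fall back to their last argument, which mirrors the test cases in the skeleton above.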

@BoboTiG
Owner

BoboTiG commented Mar 31, 2022

Could you open an issue about the find-templates error (with the exact command and the error) so that we can investigate?

> PS: for the visuals, you mean visualising the output difference? Something like this?

Yes, exactly :)

> Also, I was wondering if using the expanded templates via `&templates=expand` isn't sometimes easier? For example, I can't find a way to get the pronunciation in Russian without it.

@lasconic do you have an opinion?

@lasconic
Collaborator

> Also, I was wondering if using the expanded templates via `&templates=expand` isn't sometimes easier? For example, I can't find a way to get the pronunciation in Russian without it.
>
> @lasconic do you have an opinion?

If I understand correctly, this is not an option, since it would work only with get-word and not with the whole dump, where templates are not "expanded". Right?
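For reference, "expanding templates" here means asking the MediaWiki API to render templates server-side via its `expandtemplates` action. A minimal sketch of building such a request URL (the `{{placeholder|…}}` template name is invented for the example, and wikidict's actual fetching code may use different parameters):

```python
from urllib.parse import urlencode


def expandtemplates_url(domain: str, wikicode: str) -> str:
    """Build a MediaWiki `expandtemplates` API URL (sketch only).

    Note: this only helps for single-word lookups (get-word); the
    offline dump ships raw, unexpanded wikicode.
    """
    params = {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",
        "text": wikicode,
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


url = expandtemplates_url("ru.wiktionary.org", "{{placeholder|пример}}")
```

Calling this URL per word would mean one HTTP request per entry, which is exactly why it cannot replace local template handling for the full dump.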

@lasconic
Collaborator

Regarding POS, let's discuss it somewhere else. Maybe here : #1149

@victornove
Contributor Author

The problem with find-templates was simply my Linux machine running out of memory. I split the JSON file and worked through it piece by piece... not sure it's worth opening a ticket for that? Because of this, it turns out there are 18,000 templates to work through, though!

Also, I'm kind of struggling with the details of getting my tests working, but I think (pray) this last correction will work! Once I get the first test working, I can add some more.

Owner

@BoboTiG BoboTiG left a comment


To fix quality code issues, just run ./check.sh locally, and push changes.

@victornove
Contributor Author

Ok, I finally installed pytest & its dependencies... it makes iterations much quicker!
(Is there a way to run unit tests selectively? Like only for the Russian locale/changed files?)

I guess before releasing the locale we would still need to solve the etymology/pronunciation templates issue, and ideally add a few more tests and template handlers? Anything else?

@BoboTiG
Owner

BoboTiG commented Apr 12, 2022

> (Is there a way to run unit tests selectively? Like only for the Russian locale/changed files?)

```shell
python -m pytest tests/test_ru.py
```

@sourcery-ai
Contributor

sourcery-ai bot commented May 11, 2022

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 0.73%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 30.13 😞 | 30.89 😞 | 0.76 👎 |
| Method Length | 94.40 🙂 | 96.15 🙂 | 1.75 👎 |
| Working memory | 11.26 😞 | 11.36 😞 | 0.10 👎 |
| Quality | 44.85% 😞 | 44.12% 😞 | -0.73% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 612 | 622 | 10 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| wikidict/render.py | 40.40% 😞 | 39.71% 😞 | -0.69% 👎 |
| wikidict/lang/__init__.py | 81.64% ⭐ | 81.25% ⭐ | -0.39% 👎 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| wikidict/render.py | find_etymology | 49 ⛔ | 339 ⛔ | 16 ⛔ | 13.15% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| wikidict/render.py | parse_word | 57 ⛔ | 324 ⛔ | 15 😞 | 13.40% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| wikidict/render.py | find_section_definitions | 70 ⛔ | 248 ⛔ | 14 😞 | 16.42% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| wikidict/render.py | find_all_sections | 16 🙂 | 169 😞 | 12 😞 | 42.52% 😞 | Try splitting into smaller methods. Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

@BoboTiG BoboTiG merged commit da2b5d1 into BoboTiG:master May 11, 2022
@BoboTiG
Owner

BoboTiG commented May 11, 2022

@all-contributors please add @victornove for code

@BoboTiG
Owner

BoboTiG commented May 11, 2022

Thanks a lot @victornove 🍾

@allcontributors
Contributor

@BoboTiG

I've put up a pull request to add @victornove! 🎉

@BoboTiG BoboTiG mentioned this pull request May 11, 2022