[RU] Etymology is stored in a module... How do we deal with it ? #1275

Open
lasconic opened this issue Apr 2, 2022 · 3 comments

Comments

@lasconic
Collaborator

lasconic commented Apr 2, 2022

  • Wiktionary page:

Etymology is Этимология in Russian.

Wikicode:

Происходит от {{этимология:лоскут|да}}

This calls the Etymology template, which calls the module Etymology:лоскут, which stores the etymology for the word... Crazy...
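A minimal Python sketch of spotting these calls in a page's wikicode, assuming every call has the flat `{{этимология:KEY|...}}` form shown above (no nested templates inside the arguments). The function name `find_etymology_calls` is hypothetical and not part of any existing tool.

```python
import re
from typing import List, Tuple

# Matches flat calls such as {{этимология:лоскут|да}}.
# Assumption: the key after "этимология:" contains no braces or pipes.
ETYMOLOGY_CALL = re.compile(r"\{\{\s*этимология:([^|{}]+?)\s*((?:\|[^{}]*)?)\}\}")


def find_etymology_calls(wikicode: str) -> List[Tuple[str, List[str]]]:
    """Return (key, positional arguments) for every {{этимология:...}} call."""
    calls = []
    for match in ETYMOLOGY_CALL.finditer(wikicode):
        key = match.group(1)
        raw_args = match.group(2)
        # Drop the leading "|" and split the remaining arguments.
        args = raw_args[1:].split("|") if raw_args else []
        calls.append((key, args))
    return calls


print(find_etymology_calls("Происходит от {{этимология:лоскут|да}}"))
# → [('лоскут', ['да'])]
```

The key returned here is what a lookup (remote or local) would need; the template body itself still has to come from somewhere else.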


@victornove implemented it in his PR by making a GET request to Wiktionary. That's probably OK for a single word in get-word, but can we do better when we process the dump? Are the modules stored in the Wiktionary dump?
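If the remote route is kept for get-word, one option is the standard MediaWiki Action API (`action=expandtemplates`), which returns the expanded wikitext as JSON rather than a full HTML page. This is a sketch under that assumption, not necessarily what the PR does; the helper names and the `User-Agent` string are hypothetical. `lru_cache` means a template repeated within one run costs a single request.

```python
import json
from functools import lru_cache
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://ru.wiktionary.org/w/api.php"


def build_expand_url(wikicode: str, title: str = "") -> str:
    """Build an Action API URL asking the server to expand `wikicode`."""
    params = {
        "action": "expandtemplates",
        "text": wikicode,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    if title:
        # Context page, so page-dependent magic words resolve as on that page.
        params["title"] = title
    return f"{API_URL}?{urlencode(params)}"


def parse_expand_response(payload: str) -> str:
    """Extract the expanded wikitext from the JSON answer."""
    return json.loads(payload)["expandtemplates"]["wikitext"]


@lru_cache(maxsize=1024)
def expand_remote(wikicode: str, title: str = "") -> str:
    """Network call (hypothetical helper); cached per (wikicode, title) pair."""
    request = Request(
        build_expand_url(wikicode, title),
        headers={"User-Agent": "dict-sketch/0.1"},  # hypothetical UA string
    )
    with urlopen(request, timeout=30) as response:
        return parse_expand_response(response.read().decode("utf-8"))


# Example (needs network access):
# expand_remote("{{этимология:лоскут|да}}", "лоскут")
```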

@victornove
Contributor

Hi,
This is also a problem for pronunciation:
{{transcription-ru|произноше́нье}} calls a ru-pron module, which generates the pronunciation of the word. I'm not sure I could get that content any other way.

@victornove
Contributor

I had a look at the dump and found over 15,000 etymology templates with "find-templates". The complete Wiktionary list has 17,000 of them: https://ru.wiktionary.org/wiki/Категория:Шаблоны_этимологии.
I can't find the expanded text in the dump used here, and I'm not sure all the namespaces are included. It must be somewhere in one of the dumps: the wikitextprocessor project seems to find template definitions and execute them with Lua (https://github.com/tatuylonen/wikitextprocessor), but I haven't figured out how yet.
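A minimal sketch of pulling the etymology pages out of a dump with the standard library's streaming `xml.etree.ElementTree.iterparse`. It assumes the `{{этимология:KEY}}` calls resolve to pages titled `Шаблон:этимология:KEY` (the Template namespace on ru.wiktionary) and that those pages are present in the dump file being read; `collect_pages` is a hypothetical name and the sample XML is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Optional


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def collect_pages(dump: BinaryIO, title_prefix: str) -> Dict[str, str]:
    """Stream a MediaWiki XML dump; keep {title: text} for titles starting with title_prefix."""
    pages: Dict[str, str] = {}
    title: Optional[str] = None
    text: Optional[str] = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None and title.startswith(title_prefix):
                pages[title] = text or ""
            title, text = None, None
            elem.clear()  # release the finished page's subtree; real dumps are large
    return pages


# Made-up two-page dump for illustration.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Шаблон:этимология:лоскут</title><ns>10</ns>
    <revision><text>(etymology text)</text></revision></page>
  <page><title>лоскут</title><ns>0</ns>
    <revision><text>Происходит от {{этимология:лоскут|да}}</text></revision></page>
</mediawiki>""".encode("utf-8")

found = collect_pages(io.BytesIO(SAMPLE), "Шаблон:этимология:")
print(found)  # → {'Шаблон:этимология:лоскут': '(etymology text)'}
```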

So far, the only way I have found to get the expanded template without querying the full HTML page is to call the special template expansion page (Служебная:Развёртка_шаблонов, i.e. Special:ExpandTemplates):

https://ru.wiktionary.org/wiki/Служебная:Развёртка_шаблонов?wpInput={{transcriptions-ru|страни́ца|страни́цы|Ru-страница.ogg}}
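When that URL is built in code, the Cyrillic page title and the braces and pipes in `wpInput` need percent-encoding. A small sketch using only `urllib.parse`; the helper name is hypothetical, and the server answers with an HTML page, so extracting the result from it is a separate step not shown here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://ru.wiktionary.org/wiki/"
# Russian title of Special:ExpandTemplates, as in the URL above.
EXPAND_PAGE = "Служебная:Развёртка_шаблонов"


def expand_templates_url(wikicode: str) -> str:
    """Build the special-page URL with the title and wpInput percent-encoded."""
    return BASE_URL + quote(EXPAND_PAGE) + "?" + urlencode({"wpInput": wikicode})


url = expand_templates_url("{{transcriptions-ru|страни́ца|страни́цы|Ru-страница.ogg}}")
print(url)
```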

@lasconic
Collaborator Author

Good to know the etymology modules are in the dump. We could extract them into a separate JSON file during the "parse" step and keep a second list to query during the "render" step.
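A minimal sketch of that split, assuming the pages were already collected as `{title: wikicode}` and are keyed by the part after an assumed `Шаблон:этимология:` title prefix. The file name and function names are hypothetical; a `None` from the lookup is where a remote fallback could kick in.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Assumed title prefix of the etymology pages in the dump.
PREFIX = "Шаблон:этимология:"


def save_etymologies(pages: Dict[str, str], path: Path) -> None:
    """'parse' step: keep {KEY: wikicode} for every etymology page, in its own JSON file."""
    table = {
        title[len(PREFIX):]: text
        for title, text in pages.items()
        if title.startswith(PREFIX)
    }
    path.write_text(json.dumps(table, ensure_ascii=False, indent=2), encoding="utf-8")


def load_etymologies(path: Path) -> Dict[str, str]:
    """'render' step: load the second list once, before rendering any word."""
    return json.loads(path.read_text(encoding="utf-8"))


def lookup_etymology(key: str, etymologies: Dict[str, str]) -> Optional[str]:
    """Stored wikicode for {{этимология:KEY}}, or None when it is missing."""
    return etymologies.get(key)


# Usage with made-up data in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "etymology.json"
    save_etymologies({"Шаблон:этимология:лоскут": "(etymology text)", "лоскут": "..."}, path)
    etymologies = load_etymologies(path)

print(lookup_etymology("лоскут", etymologies))  # → (etymology text)
print(lookup_etymology("absent", etymologies))  # → None
```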
