Feature request: bilingual dictionaries #973

chopinesque · 2021-06-27T07:04:17Z

Would it be possible to create subdictionaries based on EN wiktionary for other languages?
For example, German-English (here is a German word: https://en.wiktionary.org/wiki/Nacht)

lasconic · 2021-06-28T12:10:59Z

Changing this line could work: https://github.com/BoboTiG/ebook-reader-dict/blob/master/wikidict/lang/en/__init__.py#L15
But I'm not sure why you would want to do so. For an EN/DE Kobo dictionary, you might want to check http://download.wikdict.com/dictionaries/kobo/
If that's not what you are looking for, please explain in more detail.

chopinesque · 2021-06-28T13:38:57Z

The German was just an example. The idea is to produce bilingual dictionaries based on the EN one for example (and not from the Translations section of the English words). For example, when it comes to Ancient Greek, there is much larger coverage in main entries rather than entries in the Translations section.

Would changing the line you mention suffice? I read the section on adding a new locale, but I am a little confused about how exactly to run it on a local Wiktionary dump.

lasconic · 2021-06-28T20:12:42Z

I replaced the line in question by

head_sections = ("==German==", "german")

And ran (sorry, my German is very, very limited)

python -m wikidict en --gen-dict=Nacht,Kartoffel,schwarz --output=Nacht

And I got the attached file in the Nacht directory. You can try it on your Kobo and see whether the 3 words can be found and look good.
dicthtml-en.zip
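
To illustrate why swapping the heading in head_sections is enough to retarget the extraction, here is a minimal sketch of selecting one language's section from a page's wikitext. It is not the project's actual code: the function name `extract_language_section` and the sample wikitext are made up for illustration, and the sketch assumes language sections are level-2 headings such as `==German==`.

```python
def extract_language_section(wikitext: str, heading: str) -> str:
    """Return the body of the level-2 section whose heading line equals `heading`.

    Hypothetical helper for illustration only; not part of wikidict.
    """
    body = []
    inside = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        # A level-2 heading starts with exactly two '=' and ends with '=='.
        is_level2 = (
            stripped.startswith("==")
            and not stripped.startswith("===")
            and stripped.endswith("==")
        )
        if is_level2:
            # Entering a new language section: keep it only if it is the wanted one.
            inside = stripped == heading
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()


# Made-up sample page with two language sections.
SAMPLE = """==English==
===Noun===
# not relevant here

==German==
===Noun===
# [[night]]
"""

if __name__ == "__main__":
    print(extract_language_section(SAMPLE, "==German=="))
```

With the heading set to "==German==", only the German part of the page is kept, which is roughly the effect the one-line change is meant to have.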

chopinesque · 2021-06-29T05:34:32Z

Thank you! Sadly, I use tsv or Stardict (no Kobo).

lasconic · 2021-06-29T07:35:16Z

Which language would you be most interested in?

chopinesque · 2021-06-29T08:12:59Z

Greek and Ancient Greek.

I can see the part-of-speech templates have an "el" (el-adj, el-verb...) or "grc" prefix for Greek and Ancient Greek respectively. Does the script figure out the templates by itself, or does one need to add/fine-tune them?

lasconic · 2021-06-29T08:26:14Z

I believe parts of speech are not extracted at all right now. @BoboTiG can confirm. We just use them to choose which definitions we keep.

chopinesque · 2021-06-29T08:30:00Z

Yes, that is what I meant, these templates are needed to decide which part should be extracted and which not :)

lasconic · 2021-06-29T08:39:12Z

It seems to work without fine-tuning then.
I changed the line to:

head_sections = ("==Ancient Greek==", "ancientgreek")

and ran

python -m wikidict en --get-word="Γραῖα"

I got the following, compare with https://en.wiktionary.org/wiki/%CE%93%CF%81%CE%B1%E1%BF%96%CE%B1

Γραῖα   

A name meaning "grey", from Proto-Indo-European *ǵerh₂- (“to grow old”).


  1. Graea, Boeotia; Greece

BoboTiG · 2021-06-29T08:40:34Z

Indeed, we are only using the parts that matter to the language: the project was not designed for cross-language stuff.

You could play with it and see how it works. Make a copy of the langs/en folder to langs/en_grc or something like that, and tune the template handling and section names.

chopinesque · 2021-06-29T08:51:45Z

Well, cross-language could be another possibility then, but thank you so much for all the work so far -:)
Having checked the relevant page, I am a bit at a loss as to how to run the script on a Wiktionary dump.

The Γραῖα example appears to keep the Etymology; I guess this is not normally included.

lasconic · 2021-06-29T08:58:12Z

Etymology is always included in the other languages.

To run it on a dump, check out the code, install the requirements, change the line for the language and run

python -m wikidict en

After some time, you will get a directory with a .df file. You can convert it to Stardict with pyglossary:

pyglossary --no-progress-bar --no-color data/en/dict-en.df dict-data.ifo
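
The two steps above can also be chained from a small script. The following is only a sketch: the helper name `build_commands` is hypothetical, the command lines are copied from this thread, and the data/<locale>/dict-<locale>.df path pattern is an assumption generalised from the single `en` example given here. The script only prints the commands; pass each list to `subprocess.run(cmd, check=True)` to actually execute them once wikidict and pyglossary are installed.

```python
from __future__ import annotations

import shlex


def build_commands(locale: str = "en", stardict_name: str = "dict-data.ifo") -> list[list[str]]:
    """Return the two command lines from this thread as argument lists.

    Hypothetical helper; the .df path pattern is assumed from the `en` example.
    """
    return [
        # Step 1: download the dump into data/<locale> and build the dictionary files.
        ["python", "-m", "wikidict", locale],
        # Step 2: convert the generated .df file to Stardict with pyglossary.
        [
            "pyglossary",
            "--no-progress-bar",
            "--no-color",
            f"data/{locale}/dict-{locale}.df",
            stardict_name,
        ],
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        # Print rather than run, so the sketch works without the tools installed.
        print(shlex.join(cmd))
```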

chopinesque · 2021-06-29T08:59:41Z

But how is the path of the dump defined?
(Yes, I found this project via pyglossary -:) )

lasconic · 2021-06-29T09:00:45Z

The dump will be downloaded in data/en

chopinesque · 2021-06-29T09:01:22Z

So the script downloads the dump automatically?

lasconic · 2021-06-29T09:04:05Z

Yes

lasconic · 2021-06-29T09:05:34Z

I just ran the first steps, and there are only 16,431 entries in Ancient Greek.

lasconic · 2021-06-29T09:08:00Z

grc.zip

chopinesque · 2021-06-29T09:22:39Z

Wow, looks quite good after a quick look. Many thanks.
Some issues:

Sense 3.1 has a quotations drop-down, which is converted to <i>Q</i> <b>Od.</b>
https://en.wiktionary.org/wiki/%CE%BB%CE%B1%CE%BC%CE%B2%CE%AC%CE%BD%CF%89

lasconic · 2021-06-29T09:27:24Z

The quotation block is supposed to be removed entirely.

chopinesque · 2021-06-29T09:36:29Z

I guess then there is some difference in syntax so that the current regex for that block does not match it.
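
For anyone who wants to experiment, here is a minimal sketch of one way to drop quotation lines before templates are rendered. It is not the project's current regex, which is not shown in this thread. It assumes quotations are written as `#*` lines (with `#*:` continuation lines) under a definition, as is the usual convention on English Wiktionary; the helper name `drop_quotation_lines` and the sample wikitext, including the `{{Q|...}}` arguments, are made up for illustration.

```python
def drop_quotation_lines(wikitext: str) -> str:
    """Remove quotation lines, i.e. lines starting with '#*' (including '#*:').

    Hypothetical helper for illustration only; not part of wikidict.
    """
    kept = [
        line
        for line in wikitext.splitlines()
        # '#*' marks a quotation; plain '#' definition lines are kept.
        if not line.lstrip().startswith("#*")
    ]
    return "\n".join(kept)


# Made-up sample: two definitions, the first followed by a quotation.
SAMPLE = "\n".join(
    [
        "# to [[take]]",
        "#* {{Q|grc|Hom.|Od.|1|1}}",
        "#*: continuation of the quotation",
        "# to [[receive]]",
    ]
)

if __name__ == "__main__":
    print(drop_quotation_lines(SAMPLE))
```

Filtering on the line prefix rather than on a specific template name would avoid depending on how each quotation template is spelled, at the cost of also dropping any other content placed on `#*` lines.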

BoboTiG changed the title ~~[EN] Include other languages?~~ Feature request: bilingual dictionaries Mar 26, 2023

BoboTiG added the feature request label Feb 10, 2024

BoboTiG added 🏅Sponsor and removed 🏅Sponsor labels Jul 23, 2024

polar-sh bot added the Fund label Jul 23, 2024

BoboTiG mentioned this issue Sep 29, 2024

Bilingual: Sanskrit - English #2157

Closed

BoboTiG pinned this issue Oct 25, 2024

BoboTiG closed this as not planned Oct 25, 2024

Repository owner locked as resolved and limited conversation to collaborators Nov 19, 2024
