Feature request: bilingual dictionaries #973
Comments
Changing this line could work: https://github.com/BoboTiG/ebook-reader-dict/blob/master/wikidict/lang/en/__init__.py#L15 |
German was just an example. The idea is to produce bilingual dictionaries based on the EN one, for example (and not from the Translations section of the English words). When it comes to Ancient Greek, for instance, coverage is much larger in the main entries than in the Translations sections. Would changing the line you mention suffice? I read the add new local section, but I am a little confused about how exactly to run it on a local Wiktionary dump. |
I replaced the line in question with
And ran (sorry, my German is very, very limited)
And I got the attached file in the Nacht directory. You can try it on your Kobo and see if the 3 words can be found and look good. |
Thank you! Sadly, I use tsv or Stardict (no Kobo). |
Which language would you be most interested in? |
Greek and Ancient Greek. I can see the part-of-speech templates have an "el" (el-adj, el-verb...) or "grc" prefix for Greek and Ancient Greek respectively. Does the script figure out the templates by itself, or does one need to add/fine-tune them? |
I believe parts of speech are not extracted at all right now. @BoboTiG can confirm. We just use them to choose which definitions we keep or not. |
Yes, that is what I meant, these templates are needed to decide which part should be extracted and which not :) |
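To make the prefix idea above concrete, here is a minimal sketch of grouping template names by their language-code prefix. It is not the project's code: the helper names `template_language` and `templates_for` and the sample list are hypothetical, and the sketch only assumes that templates follow the `<lang>-<part>` naming seen in `el-adj` and `el-verb`.

```python
from typing import List, Optional


def template_language(name: str) -> Optional[str]:
    """Return the prefix of a template name such as 'el-adj', or None if there is no prefix."""
    prefix, sep, rest = name.partition("-")
    if sep and prefix and rest:
        return prefix
    return None


def templates_for(lang_code: str, names: List[str]) -> List[str]:
    """Keep only the template names whose prefix equals lang_code (hypothetical helper)."""
    return [n for n in names if template_language(n) == lang_code]


# Illustrative sample only; not taken from a real dump.
names = ["el-adj", "el-verb", "grc-noun", "en-noun", "head"]
greek = templates_for("el", names)
ancient_greek = templates_for("grc", names)
print(greek, ancient_greek)
```

Whether anything like this is needed depends on how the script selects sections, which the maintainers would have to confirm.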
It seems to work without fine-tuning then.
and ran
I got the following, compare with https://en.wiktionary.org/wiki/%CE%93%CF%81%CE%B1%E1%BF%96%CE%B1
|
Indeed, we are only using the parts that matter to the language: the project was not designed for cross-language stuff. You could play with it and see how it works. Make a copy of the |
Well, cross-language could be another possibility then, but thank you so much for all the work so far -:) The Γραῖα example appears to maintain the Etymology, I guess this is not included normally. |
Etymology is always included in the other languages. To run it on a dump, check out the code, install the requirements, change the line for the language and run
After some time, you will get a directory with a .df file. You can convert it to Stardict with pyglossary:
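A rough sketch of such a conversion, assuming pyglossary is installed and provides its `pyglossary` command-line entry point. The helper names and the file paths are hypothetical, and the `--write-format=Stardict` flag is worth checking against `pyglossary --help` for your version:

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(df_file: Path, out_dir: Path) -> List[str]:
    """Build a pyglossary command line turning a .df file into a StarDict .ifo (hypothetical helper)."""
    target = out_dir / (df_file.stem + ".ifo")
    # Assumption: pyglossary takes <input> <output> and accepts --write-format.
    return ["pyglossary", str(df_file), str(target), "--write-format=Stardict"]


def convert(df_file: Path, out_dir: Path) -> None:
    """Run the conversion, failing early if pyglossary or the input file is missing."""
    if shutil.which("pyglossary") is None:
        raise RuntimeError("pyglossary is not on PATH")
    if not df_file.is_file():
        raise FileNotFoundError(df_file)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(df_file, out_dir), check=True)


# Example call (paths are placeholders, adjust to where your .df file lives):
# convert(Path("data/en/dict.df"), Path("stardict"))
```

The output should be a set of StarDict files (`.ifo`, `.idx`, `.dict`) in the target directory.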
|
But how is the path of the dump defined? |
The dump will be downloaded in data/en |
So the script downloads the dump automatically? |
Yes |
I just ran the first steps, and there are only 16,431 in Ancient Greek. |
Wow, looks quite good after a quick look. Many thanks. 3.1 is a quotations drop down which is converted to |
The quotation block is supposed to be entirely removed. |
I guess then there is some difference in syntax so that the current regex for that block does not match it. |
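To illustrate how that can happen, here is a small sketch in which a pattern written for one quotation syntax misses another. Both patterns and the sample wikitext are hypothetical and are not the project's actual regex; the sketch only assumes that quotation lines start with `#*`.

```python
import re

# Hypothetical narrow pattern: only matches quotation lines using a "quote-..." template.
NARROW = re.compile(r"^#\*\s*\{\{quote-[^\n]*\n?", re.MULTILINE)
# Hypothetical broader pattern: matches any line that starts with "#*".
BROAD = re.compile(r"^#\*[^\n]*\n?", re.MULTILINE)

# Illustrative sample, not copied from a real entry.
sample = (
    "# first definition\n"
    "#* {{quote-book|en|title=Example}}\n"
    "# second definition\n"
    "#* {{Q|grc|Hom.|Od.|1|1}}\n"
)

narrow_result = NARROW.sub("", sample)  # the {{Q|...}} line is left behind
broad_result = BROAD.sub("", sample)    # both quotation lines are removed

print(narrow_result)
print(broad_result)
```

If the Ancient Greek entries use a different quotation template than the one the current pattern expects, the leftover block in the output would look like the first result.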
Would it be possible to create subdictionaries based on EN wiktionary for other languages?
For example, German-English (here is a German word: https://en.wiktionary.org/wiki/Nacht)