Best input format to create new dictionary #356
Comments
"Best" is subjective and depends on your needs and taste, but you can try Dictfile: Note that if you want to add images, you will have to embed them in your text file (as base64), which may not be convenient.
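As a rough illustration of the base64 embedding mentioned above, here is a minimal Python sketch that turns raw image bytes into an HTML `<img>` tag with a data URI. The function name `image_to_data_uri_tag` is hypothetical, and whether a given converter or the Kobo reader renders data URIs written this way is an assumption that should be tested on a device.

```python
import base64


def image_to_data_uri_tag(image_bytes, mime_type="image/png"):
    """Return an <img> tag whose source is the image encoded as base64.

    Assumption: the dictionary renderer accepts data URIs in <img> tags.
    """
    # b64encode returns bytes; decode to str so it can be placed in the HTML.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{encoded}"/>'
```

For a real image, the bytes could come from `open(path, "rb").read()`. Base64 makes the data about a third larger, which adds up over tens of thousands of entries.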
The Dictfile format from the documentation URL you posted looks easy enough, and it supports bold, italic, bullets, and even matching an entry against other words. I have all of this, but currently in RTF, so I need to figure out how to parse it from RTF. I have some ideas about where to start. As far as I can see, PyGlossary can convert from "Kobo E-Reader Dictfile (.df)" to "Kobo E-Reader Dictionary (.zip)", which I can put on my Kobo reader.
There are several tools and websites that can convert RTF to HTML.
Hello @ilius. Thank you very much for the tips. I now have all the RTF files, with each dictionary entry in its own file. I think I had better write an RTF parser that handles just the bold, italic, and bullets; that should be enough. Converting to HTML adds far too many extra tags. I need to find an example of how, for instance, the default English dictionary on Kobo is built. I can see that it can also show multiple entries for a single word on the same screen when you look a word up. For example: word1 Is there a Kobo df sample of an actual dictionary somewhere that I could look at? The examples from this page no longer work: https://pgaskin.net/dictutil/examples/webster1913-convert.html Maybe someone has them and could share?
Thank you,
You can try removing extra tags with PyGlossary in command line, by passing
Here are two examples from the website: You can also check this repo (they convert
After days of coding, I have finally managed to create a Kobo df file from all the input dictionary entries. An example of one dictionary entry in the DF file:
I also used aliases (&) and HTML code so that I can make specific words italic. Now for the final step: I need to test this on my Kobo. I converted from "Kobo E-Reader Dictfile (.df)" to "Kobo E-Reader Dictionary (.zip)", and the output file looks fine. I will let you know after I do some more tests. A few more questions on the df format. Do you think this is valid? Two entries with the same name but different meanings, one with the number 1 and one with the number 2 in its name. How does Kobo process this?
Or this. Optional character inside parentheses. For example.
Thank you!
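One possible way to handle a headword with an optional character in parentheses is to expand it into every spelling it stands for, then use one spelling as the headword and the others as aliases. The sketch below is only an illustration: the function name `expand_optional` and the sample word `colo(u)r` are hypothetical, and the thread does not say how the issue was eventually fixed.

```python
import re

# Matches one parenthesised, non-nested group such as "(u)".
OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_optional(headword):
    """Return all spellings of a headword with optional parts in parentheses.

    The spelling that includes the optional part comes first.
    """
    match = OPTIONAL.search(headword)
    if match is None:
        return [headword]
    before = headword[:match.start()]
    after = headword[match.end():]
    results = []
    # Try the spelling with the optional part, then without it.
    for middle in (match.group(1), ""):
        for rest in expand_optional(after):
            results.append(before + middle + rest)
    return results
```

For instance, `expand_optional("colo(u)r")` gives the two spellings `colour` and `color`. The first could serve as the headword and the second as an alias.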
From PyGlossary's point of view, you can even use the same headword (without adding 1 or 2) multiple times. But I don't know how Kobo will process it. If Kobo can render HTML lists, that would be my preferred form (having all definitions in the same entry).
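To illustrate the "all definitions in the same entry" idea, here is a minimal sketch that groups definitions by headword and wraps repeated ones in an HTML ordered list. The function name `merge_entries` and the `(headword, definition_html)` input shape are assumptions made for this example, and whether Kobo renders `<ol>` lists as expected still needs to be checked on a device.

```python
def merge_entries(entries):
    """Merge (headword, definition_html) pairs that share a headword.

    A headword with one definition keeps it unchanged; a headword with
    several definitions gets them as items of one ordered list.
    """
    grouped = {}
    for headword, definition in entries:
        # Dicts preserve insertion order, so first-seen order is kept.
        grouped.setdefault(headword, []).append(definition)

    merged = {}
    for headword, definitions in grouped.items():
        if len(definitions) == 1:
            merged[headword] = definitions[0]
        else:
            items = "".join(f"<li>{d}</li>" for d in definitions)
            merged[headword] = f"<ol>{items}</ol>"
    return merged
```

If the source data uses numbered names such as word1 and word2, the trailing number would have to be stripped before grouping; that step is left out here.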
I will also ask the author of dictutil. I could merge them somehow; I just need a suggestion on the proper way to do it :). Thank you.
If you want to target users of other e-book readers as well, you may also want to test KOReader with the StarDict format. Some dictionary apps have a "prefix search" feature (showing "test (1)" when you look up test) and some don't. I was just having an interesting discussion here:
Hello @ilius. Since I only have a Kobo reader (my first e-book reader ever) and I love it so much, I want to make this dictionary only for it. The df format is perfect. I have now used dictgen to convert it to a Kobo zip, as it gave me more warnings about my df structure that I needed to fix, and it is working fine on the Kobo reader. I am in touch with the author to figure out the df file structure that would work best on Kobo. I fixed the words with (x) inside the name, but I am still looking for the best way to manage word1, word2, word3, ... headwords. I think we can close this ticket and conclude that, in my opinion, the best format for making custom dictionaries for the Kobo reader is Kobo df. It is easy to produce because it is text based and has a well-defined structure. You can convert it to a Kobo zip (to put in the custom-dic path) with PyGlossary or dictgen.
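Because the df format is plain text, writing it out from a script takes little code. The sketch below assumes that an entry starts with an `@ ` line for the headword, that each alias goes on its own `& ` line (aliases with `&` are mentioned earlier in the thread), and that the definition HTML follows. The `@ ` prefix and the exact layout are assumptions to verify against the dictutil documentation, and the function names are hypothetical.

```python
def format_dictfile_entry(headword, definition_html, variants=()):
    """Format one entry as dictfile text (assumed layout, see above)."""
    lines = [f"@ {headword}"]
    # One alias per line, each prefixed with "& ".
    lines.extend(f"& {variant}" for variant in variants)
    lines.append(definition_html)
    return "\n".join(lines) + "\n"


def build_dictfile(entries):
    """Join (headword, definition_html, variants) tuples into one text.

    Entries are separated by a blank line for readability.
    """
    return "\n".join(format_dictfile_entry(*entry) for entry in entries)
```

The resulting text can be saved with a `.df` extension and then converted with PyGlossary or dictgen, as described in the thread.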
Hello,
I want to create a new dictionary for my Kobo reader. I can get around 70,000 dictionary entries in plain text or in RTF format, either with each entry as a separate file or joined together. I would prefer RTF, as it includes formatting (bold, italic, bullets, etc.).
If needed, I could convert the RTF to another format and preserve the formatting, provided that format supports it.
I kindly ask for advice on which input format would be best, so that I can join all the dictionary entries, with formatting, into an input file that PyGlossary can process to create a custom dictionary for Kobo e-book readers.
Thank you very much for the help and for a great library.
THJ