Need help with understanding JWPUB format #1
Hi! I had this idea (scraping jwpub files) some days ago and was searching for anything about these JW Library files. Apparently, these files have some direct linking with jw.org, but even so, I didn't find anything about how this linking works. By the way, I was thinking that the bytecode should be an id from the Words table too, but I also don't think it's a direct id; maybe there are some instructions that JW made for it. Are you a Jehovah's Witness?
I even sent a couple of emails requesting documentation, but got a response saying that they are unable to answer my question from that email address. So my idea was to call the number from https://www.jw.org/en/jehovahs-witnesses/contact/united-states/, but recently I haven't had much time, so I didn't do that. And yes, I am.
I've put a lot of time into understanding this format, but still have no results worth showing. It's sad that most of the new publications are PDF/JWPUB only; PDF just doesn't scale well, and JWPUB is ugh. I still have an idea: scraping
Haha, I don't think they would simply give us their code, sadly. Anyway, scraping. After all, all we can do is trial and error. We have at least a hint: the files work like an EPUB, with XML files inside, the difference being that they're heavily modified, and for some reason there's binary code that doesn't match a list of words. I'll try doing something with my Python knowledge; I don't know whether I'll help with anything, but at least I'll try for fun. I really like being able to use JW Library on PC, and it's sad that Watchtower hasn't ported it to any Linux distro. I don't think that will happen for a long time; maybe some day they'll release a version for a popular Linux distro. A final question... I saw that your project uses the app API from JW, but is this allowed? Isn't it a violation of the app's terms of use?
Hello! So, I ran some experiments with the JWPUB file to learn how it works, and I think I got some interesting things working! First of all, content is directly tied to the page and doesn't accept anything new (maybe because content has a fixed size in bytes and I inserted more than that? I don't know). Furthermore, the Words table doesn't work the way we thought: I changed a word in this table and the only thing that changed was how I find it in the book. Now I need to search for "subjecters" instead of "subject", but the word in the documents stays the same. So, in the end, I got a "How to Remain in God's Love" book with the subjects section titled "Edited Subjects" and a blank "Letter from the Governing Body". EDIT: I read the documentation you gave; maybe begin and end are the initial and final bytes to be converted, but then the question is: converted into what, if it's not an index into the Words table?
Wow! That's great! I lost so much time with the Have you seen things below the sentence You also need to have there:
Which is quite short, so my guess was that it uses Or another scenario the
Okayyy, I think I have a problem with the customized JWPUB and I don't know what exactly was causing it. I saw what was in the jwpub conversion doc before, and yeah, it could be that, but... there's something strange and I don't know what exactly happened. I changed a lot of things in the original DB because I thought my code was packing it into a new jwpub file, but it wasn't; by the time I fixed that, I had changed a lot in the DB, and I think I ended up with a corrupted publication (or one modified so much that it doesn't load anything). Keep in mind that I changed just one content column. This is really strange; I didn't see that yesterday, and it's still strange how it's behaving. After all, there's a lot going on behind the jwpub specification: there's even a schema specification for the publication view, and, along with the Words table, there are some strange tables that look like a precompiled search. I keep thinking about what someone on a reading-program forum replied to a request to add support for JW files: they said these files make requests to the JW API. I don't trust everything I read, but this really stuck in my mind, even though it doesn't make any sense: why would a file of 100 MB or more need anything from JW? And if it does, how are the pioneer books distributed?
I'll check the network thing tonight. I'll download a publication and just watch the traffic in Burp Suite; that should clarify whether requests are sent there or not.
So, first of all, I had some problems with Android Studio, then it was just late and I forgot to reply.
Yeah... but if we fail, that's the only option.
Not really; it can be converted on the fly and then just kept in some HTML format.
That's sad :( I wish there were a decent Watchtower Library app made with GTK ;p
Since I don't re-upload the content, it is legal, but I'm not a lawyer (https://www.jw.org/en/terms-of-use/), and even if it's against the terms... sigh. I'm not switching back to Android, so I'll continue to develop this app. (Sorry for the late reply, I missed this comment.)
No no! It's okay, brother! I don't have much time to dig into this these days either; after all, I'm only 15 and have homework to do for school. ^^ But I'll keep following the project, and if I find something new, I'll post a new reply on this issue. And if you can't get anything new out of the JWPUB conversion, you still have an easier task to do, like the video player :) (I really like how easily the PC JW Library app can be "hacked" to add a new video, lol)
Ignore what this weirdo said. It's here: https://github.com/Miaosi001/JW-Library-macOS/blob/main/JWLibrary/SubViews/PubbView.swift
@MrCyjaneK did you figure out how to read Document.Content? |
I'm not working on this app anymore. I don't want to spend time on an open-source alternative to something that clearly uses DRM when it shouldn't. (Can somebody give me one single reason why it's worth encrypting such content when it is freely available?) And then the elephant in the room: WHY isn't the app open source in the first place? Until somebody answers those questions, I'm not going to work on this project. wol.jw.org is enough for me.
Security? If anyone could get a publication and easily edit it, the risk of spreading misleading information would be very high.
@darioragusa As they can with .epub, .mobi, and .pdf. Also, there is a widely used tool for that on the internet: you can sign things with PGP, which would allow third-party apps to be developed and would carry less risk (currently we can edit the publications anyway; DRM is defectivebydesign.org).
@MrCyjaneK I know you can edit the other formats without problems, but most of us use the JW Library app. I download a jwpub knowing that it comes from jw.org or the app, and I trust the content. It's not a random txt file sent by a random guy and opened with Word or Adobe Reader, which may or may not contain the correct information. An example: if I send my grandma an EPUB, she may not be able to open it, but if I send a jwpub, she taps the file, a trusted app she always uses shows up, and for her it's all OK: a normal article with the reliable content that is supposed to be there. A jwpub can still be edited, but it's not something anyone with basic knowledge of Word can do: fewer editors, fewer edited files. Perhaps I'm totally wrong, but those are my two cents.
The thing is, the current method allows editing, whereas signing would make undetected edits impossible while still allowing modders like us to easily read the content.
I don't know much about signing files, but I guess the app would have a key, and using this key with (something, idk) they would get a value. Is it like checking the hash? If a bit changes, the value is different?
It's like checking whether the content was modified: the content can be signed to verify that it was created by somebody, and after modding it the signature will not match. It's like encrypting, except you can see the content but can't modify it without invalidating the signature.
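A quick way to see the "if a bit changes, the value is different" property is to hash two byte strings that differ by a single bit. The sketch below uses Python's `hashlib` (the same SHA-512 that `sha512sum` computes); the sample paragraph is made up. Note that a bare hash only detects changes: whoever edits the file can simply recompute it, which is why a signature is needed on top.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a hex string (what `sha512sum` prints)."""
    return hashlib.sha512(data).hexdigest()


original = b"<p>Some paragraph of a publication.</p>"
# Flip the lowest bit of the first byte to simulate a one-bit edit.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

if __name__ == "__main__":
    print(sha512_hex(original) == sha512_hex(original))  # True: same bytes, same digest
    print(sha512_hex(original) == sha512_hex(tampered))  # False: one flipped bit changes the digest
```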
OK, but then wouldn't they have to store the signatures for every version of every article in every publication in every language?
PGP signatures don't add much extra size to a publication, so I don't consider this a problem (you could also sign a sha512sum of the publication and get a similar result). Plus, you can sign them as they are served for download.
If the signature is stored with the publication, what stops me from changing it?
You can change it; you can even sign it with your own key, but it will be invalid, because it won't verify against the publisher's public key.
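To make the last two replies concrete, here is a toy hash-then-sign sketch using textbook RSA with tiny primes. It is deliberately insecure and is not PGP; it only shows the idea that a signature made with the publisher's private key verifies against the publisher's public key, while an edited file, or a file re-signed with someone else's key, does not. All function names are hypothetical.

```python
import hashlib


# Textbook RSA with tiny primes: INSECURE, for illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)  # (public key, private key)


def digest_int(data: bytes, n: int) -> int:
    # Hash first, then reduce into the key's range (real schemes pad instead).
    return int.from_bytes(hashlib.sha512(data).digest(), "big") % n


def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest_int(data, n), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)


if __name__ == "__main__":
    publisher_pub, publisher_priv = make_keypair(104729, 7919)
    _, attacker_priv = make_keypair(7901, 7907)

    article = b"<p>Original paragraph.</p>"
    sig = sign(article, publisher_priv)
    print(verify(article, sig, publisher_pub))  # True

    edited = b"<p>Edited paragraph.</p>"
    # The old signature no longer matches the edited bytes.
    print(verify(edited, sig, publisher_pub))
    # Re-signing with a different private key doesn't help either.
    print(verify(edited, sign(edited, attacker_priv), publisher_pub))
```

Because verification always uses the publisher's public key (which the app would ship with), replacing the stored signature doesn't help an editor unless they also hold the publisher's private key. A real deployment would use OpenPGP or Ed25519 rather than this toy.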
If you need more specific help, I can help you. The Reviw community and I have created all these JWPUBs: livrasand.github.io. Send me an email and we can talk about how to help you.
Hi everyone, what's up?
Sorry for the late reply, I was on vacation. What is "design interface (UI)"? I'm not a dev, just a normal guy.
It's User Interface; they were probably referring to the Graphical User Interface, the window in which the application shows text and images.
Good news, guys! I discovered how to extract all the styles from publications. I wasted some time trying to figure it out in the past, but now I understand how they set up the styles. Basically, I did this:
Another thing I discovered (someone probably already knows this) is that backup files hold some metadata for publication markups, like the color index, the paragraph index, and the token start and end indices. If I'm not mistaken, there are 3 tables that hold markup data: one with the markup start, end and paragraph, one with the location/publication, and another with the colorIndex. I still have to sort out path matching, since I'm back on Windows and the code compiled for Linux won't work there at all. But I'll figure out how to make it more legible. For now, I just ported the code from Rust to JavaScript with React, so it still looks like a mess. When it's done, I'll push a commit to my repo.
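A minimal sketch of joining those three tables with Python's `sqlite3`, assuming the table and column names commonly seen in JW Library backup `userData.db` files (`UserMark.ColorIndex`, `BlockRange.Identifier/StartToken/EndToken`, `Location.KeySymbol/DocumentId`). These names are an assumption, not something stated in this thread, so check them with `.schema` against your own backup; the rows inserted below are made-up sample data.

```python
import sqlite3

# Hypothetical minimal schema modelled on the three tables described above;
# real backups have more columns.
SCHEMA = """
CREATE TABLE Location (LocationId INTEGER PRIMARY KEY, KeySymbol TEXT, DocumentId INTEGER);
CREATE TABLE UserMark (UserMarkId INTEGER PRIMARY KEY, ColorIndex INTEGER, LocationId INTEGER);
CREATE TABLE BlockRange (BlockRangeId INTEGER PRIMARY KEY, Identifier INTEGER,
                         StartToken INTEGER, EndToken INTEGER, UserMarkId INTEGER);
"""

# One row per highlighted token range: publication, paragraph, tokens, color.
QUERY = """
SELECT l.KeySymbol, l.DocumentId, b.Identifier AS Paragraph,
       b.StartToken, b.EndToken, u.ColorIndex
FROM BlockRange b
JOIN UserMark u ON u.UserMarkId = b.UserMarkId
JOIN Location l ON l.LocationId = u.LocationId
ORDER BY l.DocumentId, b.Identifier, b.StartToken
"""


def list_highlights(conn: sqlite3.Connection) -> list:
    """Return every highlight joined across BlockRange, UserMark and Location."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Location VALUES (1, 'xyz', 1234)")
    conn.execute("INSERT INTO UserMark VALUES (10, 2, 1)")
    conn.execute("INSERT INTO BlockRange VALUES (100, 5, 0, 7, 10)")
    for row in list_highlights(conn):
        print(row)  # ('xyz', 1234, 5, 0, 7, 2)
```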
I couldn't find it. I found it once, but not the second time, neither on Windows nor on Android, just on EDIT: Nah, never mind, I just found it in the msixbundle from the Windows Store version.
Hey @orangethewell |
Sadly I can't, since that could be copyright infringement, but it isn't that hard to get: download the Windows edition, unzip the file, rename the msixbundle to a .zip extension, and unzip it. Repeat the same step on one of the packages inside the msixbundle, preferably the one with the x64 suffix; unzip it, and the MEPS unit is available in the Data folder.
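The unzip-within-unzip steps above can be scripted with Python's `zipfile` module, since both the .msixbundle and the .msix packages inside it are ZIP archives (so no renaming is needed). This is a sketch: `extract_data_folder`, the `x64` name match, and the `Data/` prefix are assumptions based on the description above, and it is meant for a bundle you downloaded yourself.

```python
import io
import zipfile


def extract_data_folder(bundle_path, dest_dir, arch_hint="x64", prefix="Data/"):
    """Pick the inner .msix (preferring names containing arch_hint) from an
    .msixbundle and extract every entry under prefix into dest_dir.
    Returns the list of extracted entry names."""
    with zipfile.ZipFile(bundle_path) as bundle:
        packages = [n for n in bundle.namelist() if n.lower().endswith(".msix")]
        if not packages:
            raise FileNotFoundError("no .msix package found inside the bundle")
        preferred = [n for n in packages if arch_hint in n]
        chosen = (preferred or packages)[0]
        # The inner package is a ZIP too; read it into memory and open it again.
        inner_bytes = bundle.read(chosen)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as package:
        members = [n for n in package.namelist() if n.startswith(prefix)]
        package.extractall(dest_dir, members=members)
    return members
```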
Perfect, thank you very much. I had searched in vain for a long time for the mapping between LanguageID and symbol in the installation directory (library) on the Mac and couldn't find anything. Now I see that I should have looked directly in the installation package.
Hi, thanks for your reply; that's indeed what I was talking about. I don't understand everything in the different posts, but I need help, please: I want to help a small congregation in Guadeloupe. The elders are a bit old, there are not many young people, and there are not many ministerial servants. I would mainly like to learn two things: how to retrieve the content of publications (mainly the Bible and mwb) in JSON format from JW.org, and how to create a jwpub file. If someone can help me, that would be really nice, brothers 🙏 (Pr 15:22)
I don't see a reason to get the content and make a jwpub file for a publication that already has one, but even if that is your objective, you would have to parse the HTML from JW.org into the JSON structure you need and then convert it back into HTML to bundle into the jwpub database.
We would have to understand your objective; as @orangethewell mentioned, downloading the Bible and editing it doesn't make much sense. You mentioned helping the old elders, but how would that help them? Isn't it just a matter of downloading it from the official jw.org website? What is the main idea? That way we can help you.
Thanks @gokusander and @orangethewell. These are two different projects. We'll start with the main one: making the work of the Life and Ministry meeting overseer easier by retrieving the week's schedule, so that he can assign participants more easily. I've already created a private Chrome extension that parses HTML, but for scalability reasons it's not very practical, so I'm looking for a way to retrieve JSON directly from JW.org rather than having to generate it myself.
I believe you can download jwpub files directly from jw.org.
If it's just the schedule, https://github.com/AntonyCorbett/OnlyT has a way to retrieve the times for each participant.
Thanks, but that's not what I need.
So what do you actually need?
First, what I want is to retrieve the content of the mwb from the JW.org site as JSON. For example, for the daily text we can do it with wol.jw.org/wol/dt/r30/lp-e/2025/1/30. I suppose there must be something like that for the mwb... Do you have any suggestions, please, brothers?
Hi @in-Load, I have such a function in Kingdom Hall Attendant; you can take it and use it as you please. https://github.com/livrasand/Kingdom-Hall-Attendant/blob/main/app.py#L33420

    import datetime

    import requests
    from bs4 import BeautifulSoup


    def extract_data_from_WOL(year, week):
        url = f"https://wol.jw.org/es/wol/meetings/r4/lp-s/{year}/{week}"
        try:
            # Give up after 10 seconds
            response = requests.get(url, timeout=10)
            # Raise an HTTPError for 4xx/5xx responses
            response.raise_for_status()
        except requests.exceptions.Timeout:
            print("The request timed out.")
            return None, {}  # Return None and an empty dict on timeout
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            return None, {}  # Return None and an empty dict on any other request error
        else:
            soup = BeautifulSoup(response.content, 'html.parser')
            h1 = soup.find('h1')
            week_info = h1.text.strip() if h1 else None
            data = {}
            current_h2 = None
            # Group each <h3> (meeting part) under the preceding <h2> (section)
            for element in soup.find_all(['h2', 'h3']):
                if element.name == 'h2':
                    current_h2 = element.text.strip()
                    data[current_h2] = []
                elif element.name == 'h3' and current_h2:
                    data[current_h2].append(element.text.strip())
            return week_info, data


    def get_previous_and_next_urls(year, week):
        # Treat `week` as an ISO week number and start from that week's Monday,
        # so the arithmetic matches the isocalendar() values read back below
        monday = datetime.date.fromisocalendar(year, week, 1)
        # Compute the previous week
        previous_year, previous_week, _ = (monday - datetime.timedelta(weeks=1)).isocalendar()
        url_previous = f"/nuevo-vida-ministerio?year={previous_year}&week={previous_week}"
        # Compute the next week
        next_year, next_week, _ = (monday + datetime.timedelta(weeks=1)).isocalendar()
        url_next = f"/nuevo-vida-ministerio?year={next_year}&week={next_week}"
        return url_previous, url_next
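The thread doesn't point to a JSON endpoint for the meeting workbook, so one workable approach is to keep parsing the HTML and serialize the result yourself. This sketch mirrors the h1/h2/h3 grouping of `extract_data_from_WOL`, but uses the standard library's `html.parser` instead of BeautifulSoup and returns a JSON string. `MeetingOutline` and `meeting_html_to_json` are hypothetical names, and the sketch assumes wol.jw.org keeps marking sections with `<h2>` and parts with `<h3>`.

```python
import json
from html.parser import HTMLParser


class MeetingOutline(HTMLParser):
    """Collect <h1>/<h2>/<h3> text, grouping each h3 under the preceding h2."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.sections = {}
        self._current_h2 = None
        self._heading = None  # heading tag we are currently inside, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._heading, self._chunks = tag, []

    def handle_data(self, data):
        if self._heading:
            self._chunks.append(data)  # also captures text of nested tags like <a>

    def handle_endtag(self, tag):
        if tag != self._heading:
            return
        text = " ".join("".join(self._chunks).split())  # collapse whitespace
        if tag == "h1" and self.title is None:
            self.title = text
        elif tag == "h2":
            self._current_h2 = text
            self.sections[text] = []
        elif tag == "h3" and self._current_h2 is not None:
            self.sections[self._current_h2].append(text)
        self._heading = None


def meeting_html_to_json(html: str) -> str:
    parser = MeetingOutline()
    parser.feed(html)
    parser.close()
    return json.dumps({"week": parser.title, "sections": parser.sections},
                      ensure_ascii=False, indent=2)
```

You could pass `response.text` from the `requests.get` call above to `meeting_html_to_json`.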
Great 👍, thank you very much @livrasand 🙏. I just ran a test and saw how to modify your code to retrieve it the right way. On the other hand, it forces me to parse it... 🤔 Do you know whether it's possible to retrieve the content directly as JSON?
I am creating a personal study .jwpub. I would like to add search to my .jwpub file; could someone help? I have already created the entire .db with images, videos, notes and references. All that is missing is word search.
Hello, my brother @gokusander 🤗 Unfortunately I won't be able to help you, but I would like you to teach me, please; I would also like to do that! Would you prefer that we discuss it in private?
It could be right here, though I'm not an expert on the subject. For what purpose do you plan to create it? A JWPUB is a .db database whose content has to be edited as HTML and decrypted to be read by the app. I used Reviw for that (by @livrasand).
Well, just like you: I want to be able to create a file for my own notes or my own assembly summaries, for example.
Word search uses some bitwise magic to get words and associated data; take a look at livrasand/Reviw#106.
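For readers unfamiliar with the idea of a "precompiled search", here is a generic inverted index: each word maps to the locations where it occurs, so a query becomes a set intersection instead of a full-text scan. This only illustrates the general technique; it is not the bit-packed encoding JW Library uses in its search tables (see livrasand/Reviw#106 for that), and the `(document_id, paragraph_id)` keys are an assumption.

```python
import re
from collections import defaultdict


def build_index(paragraphs: dict) -> dict:
    """paragraphs: {(document_id, paragraph_id): text}.
    Map each lower-cased word to a sorted list of locations containing it."""
    index = defaultdict(set)
    for key, text in paragraphs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(key)
    return {word: sorted(keys) for word, keys in index.items()}


def search(index: dict, query: str) -> list:
    """Return locations containing every word of the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```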
Did you manage to create this search system? I really don't understand development. (I'm Brazilian too.)
Hi, maybe this could help.
No, still no success, and I'm a bit short on time. Actually, the last thing I was working on in my Linux version was the text-highlighting system, which is somewhat complex because JW Library tokenizes the text. I believe this tokenization works similarly for the search system, but I'm not sure. So far, all that @livrasand and I have found are queries against the Search table and other related tables, where the app does some bit-level arithmetic. I hope to study it a bit more soon, but lately I've really been short on time 😔
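If highlights are stored as token ranges (as the StartToken/EndToken columns suggest), turning one back into text means tokenizing the paragraph the same way the app does. The sketch below assumes a token is a run of word characters and that both indices are 0-based and inclusive; JW Library's real tokenizer and index conventions may differ, so treat `highlight_span` as a hypothetical starting point.

```python
import re


def highlight_span(paragraph: str, start_token: int, end_token: int) -> str:
    """Return the text covered by tokens start_token..end_token (inclusive).
    ASSUMPTION: a token is a run of word characters; the app's tokenizer may differ."""
    tokens = list(re.finditer(r"\w+", paragraph))
    if not (0 <= start_token <= end_token < len(tokens)):
        raise IndexError("token range outside paragraph")
    # Slice from the first character of the start token to the last of the end token,
    # so punctuation and spaces between them are kept.
    return paragraph[tokens[start_token].start():tokens[end_token].end()]


if __name__ == "__main__":
    text = "Jehovah is a God of love, justice and power."
    print(highlight_span(text, 3, 5))  # God of love
```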
Ah yes, no problem, bro. You're already building something more complex; I'm just creating my personal jwpub for study and shepherding. This search would only be for finding something I noted in my jwpub. livrasand kind of abandoned the project, but luckily I learned to make one from scratch, more or less; only this feature is missing. But if you need anything I can help with, just ping me. I'm not a programmer, I work in healthcare, so what I know is the bare basics hahaha. Thanks anyway, hugs.
I really don't know anything about programming. What I learned came from the manual he created a while back, and I've been doing my own stuff based on that. I'll try to read what he sent me about the search. I don't know how to do reverse engineering; my job is in the health field, hahaha. The jwpub is just for personal study and shepherding calls.
I have no idea how to get words out of `Content` in the .db file located in the jwpub archive. That's what I know, so any help is needed.
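As a starting point for the original question, here is a sketch that pulls the raw `Document.Content` blobs out of a .jwpub. It assumes the layout commonly described by community tools: the .jwpub is a ZIP whose `contents` entry is itself a ZIP holding a single SQLite `.db`. The blobs it returns are not readable text (the thread above treats them as encrypted), so this only gets you to the bytes that still need decoding; `read_document_blobs` is a hypothetical name.

```python
import io
import os
import sqlite3
import tempfile
import zipfile


def read_document_blobs(jwpub_path: str) -> dict:
    """Return {DocumentId: raw Content bytes} from the .db inside a .jwpub.
    ASSUMPTION: the .jwpub is a ZIP whose 'contents' entry is another ZIP
    containing one .db file with a Document(DocumentId, Content) table."""
    with zipfile.ZipFile(jwpub_path) as outer:
        inner_bytes = outer.read("contents")
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        db_name = next((n for n in inner.namelist() if n.endswith(".db")), None)
        if db_name is None:
            raise FileNotFoundError("no .db file inside 'contents'")
        db_bytes = inner.read(db_name)
    # sqlite3 needs a real file, so write the database to a temporary path.
    fd, db_path = tempfile.mkstemp(suffix=".db")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(db_bytes)
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("SELECT DocumentId, Content FROM Document").fetchall()
        finally:
            conn.close()
    finally:
        os.remove(db_path)
    # NULL content becomes b"" so callers always get bytes.
    return {doc_id: bytes(content) if content is not None else b"" for doc_id, content in rows}
```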