
Suggestion: using Zotero translators to extract PDF and BibTeX entries from different websites #809

Closed
mpedramfar opened this issue Oct 30, 2020 · 4 comments


@mpedramfar

Zotero is a widely used, free and open-source reference manager. One of its most useful features is the Zotero Connector, a browser extension that can save papers from almost any online source. It supports so many sources through Zotero translators: for each online source (e.g. ArXiv) there is a JavaScript file containing the code that extracts the PDF file and metadata from a link. There currently seem to be about 500 Zotero translators, and an active community writes new translators and maintains the existing ones.

Right now, it seems that org-ref needs extra code and dedicated functions for every online source. For example, to get a paper from ArXiv, I use the function arxiv-get-pdf-add-bibtex-entry from org-ref-arxiv.el, which works only with ArXiv and no other source.

I was wondering whether org-ref could use Zotero translators directly. Maybe the user could have a folder containing different translators, as Zotero does. This way org-ref would be able to extract data from any website that Zotero can, without any hassle. All the user would have to do is download the translator for a website and add it to that folder (or just pull the latest commit from the Zotero translators repo, if a new translator has been added there).

@mpedramfar
Author

P.S. There is a translation server that can run Zotero translators without the Zotero client. If org-ref could run this server and send queries to it, it would be able to use the full power of the translators.

@mpedramfar
Author

So I looked into the translation server and wrote a few functions.
The function zot-translate-add-bibtex-entry below is similar to arxiv-add-bibtex-entry, except that it queries the translation server API instead of the ArXiv API.

There are two issues.

  1. Right now, the translation server doesn't give us a way to download the PDF files.
    There is a pull request, changing only a few lines, that makes the translation server include the URL(s) of the PDF files in its response. I don't know why it hasn't been merged yet, but hopefully it will be soon. Once it is, adding a small function to download the PDF files would be easy. In the meantime, one can use the repo containing those changes to get the URLs. I suspect the merged version will not be identical to that repo, since the original authors might have a different API specification in mind, depending on where they want to take the project.

  2. The user needs to run the translation server manually.
    As described in the repo, the server can be started with two commands using Docker, or three using npm. Running Docker from Emacs is probably easier, but the image on Docker Hub was last updated a year ago. Even if they merge the pull request and solve the PDF issue, it might take a while before they update the Docker image.
    Running it with npm would probably need a bit more work if we want to be sure that it works cross-platform (e.g. on Linux, Windows and macOS).
    Another option is to clone the repository and build the Docker image locally, which might be the best approach. It also seems that running the server with Docker would keep the translator list up to date, while the npm setup doesn't update it.
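For issue 2, starting the server from Emacs with Docker could look roughly like the following. This is a minimal sketch under assumptions: Docker is installed and on exec-path, the image is named zotero/translation-server (as in the translation-server README; a locally built image would use whatever tag you gave it), and the server listens on port 1969, the port the functions in this comment query. The function names are hypothetical.

```elisp
;;; Hypothetical sketch: start the Zotero translation server via Docker.

(defvar zot-translate-docker-image "zotero/translation-server"
  "Docker image for the translation server (assumed name; change it
to the tag of a locally built image if needed).")

(defun zot-translate--docker-args ()
  "Return the `docker' arguments that run the server on port 1969."
  (list "run" "--rm"
        ;; Map host port 1969 to the container's port 1969.
        "-p" "1969:1969"
        "--name" "translation-server"
        zot-translate-docker-image))

(defun zot-translate-server-running-p ()
  "Return t if something accepts TCP connections on 127.0.0.1:1969."
  (condition-case nil
      (progn
        ;; Open a throwaway connection and close it right away.
        (delete-process
         (open-network-stream "zot-translate-check" nil "127.0.0.1" 1969))
        t)
    ;; Connection refused (or any other failure) means "not running".
    (error nil)))

(defun zot-translate-start-server ()
  "Start the translation server in Docker unless it is already running."
  (interactive)
  (if (zot-translate-server-running-p)
      (message "Translation server already running on port 1969")
    ;; Run `docker run' as an asynchronous subprocess; its output
    ;; (server logs, errors) goes to a dedicated buffer.
    (apply #'start-process "zot-translation-server"
           "*zot-translation-server*" "docker"
           (zot-translate--docker-args))))
```

After M-x zot-translate-start-server, the container may take a few seconds before it accepts requests, so zot-translate-server-running-p is also useful as a check before querying it.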

;; Load url.el first so the `url-request-*' variables below are bound
;; dynamically (they would be lexical in a lexical-binding file otherwise).
(require 'url)

(defun zot-translate-get-json (url)
  "Get citation data of URL in Zotero JSON format, using Zotero translation server."
  (let*
      ((url-request-method "POST")
       (url-request-extra-headers '(("Content-Type" . "text/plain")))
       ;; url.el expects unibyte request data, so encode the URL as UTF-8.
       (url-request-data (encode-coding-string url 'utf-8))
       (response-buffer (url-retrieve-synchronously "http://127.0.0.1:1969/web"))
       (output (with-current-buffer response-buffer
                 ;; The HTTP headers end at the first blank line.
                 (goto-char (point-min))
                 (search-forward "\n\n")
                 ;; The body is raw bytes; decode it so non-ASCII
                 ;; characters (e.g. in author names) survive.
                 (decode-coding-string
                  (buffer-substring-no-properties (point) (point-max))
                  'utf-8))))
    (kill-buffer response-buffer)
    (if (equal output "URL not provided")
        (user-error "URL not provided")
      output)))


(defun zot-translate-get-bibtex-from-json (json)
  "Convert Zotero JSON format to bibtex, using Zotero translation server."
  (let*
      ((url-request-method "POST")
       (url-request-extra-headers '(("Content-Type" . "application/json")))
       ;; Encode the (possibly multibyte) JSON string back into UTF-8 bytes.
       (url-request-data (encode-coding-string json 'utf-8))
       (response-buffer
        (url-retrieve-synchronously "http://127.0.0.1:1969/export?format=bibtex"))
       (output (with-current-buffer response-buffer
                 (goto-char (point-min))
                 (search-forward "\n\n")
                 (decode-coding-string
                  (buffer-substring-no-properties (point) (point-max))
                  'utf-8))))
    (kill-buffer response-buffer)
    (if (equal output "Bad Request")
        (user-error "Bad Request")
      output)))



(defun zot-translate-get-bibtex (url)
  "Get bibtex data for URL using Zotero translation server."
  (zot-translate-get-bibtex-from-json (zot-translate-get-json url)))



(defun zot-translate-add-bibtex-entry (url bibfile)
  "Add bibtex entry for URL to BIBFILE, obtained using Zotero translation server."
  (interactive
   (list (read-string
          "url: "
          (ignore-errors (current-kill 0 t)))
         ;;  now get the bibfile to add it to
         (completing-read
          "Bibfile: "
          (append (f-entries "." (lambda (f) (f-ext? f "bib")))
                  org-ref-default-bibliography))))
  (save-window-excursion
    (find-file bibfile)
    (goto-char (point-max))
    (when (not (looking-at "^")) (insert "\n"))
    (insert (zot-translate-get-bibtex url))
    (org-ref-clean-bibtex-entry)
    (goto-char (point-max))
    (when (not (looking-at "^")) (insert "\n"))
    (save-buffer)))
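For issue 1, once a PDF URL is available (from the server response after the pull request lands, or from elsewhere), the "small function to download the pdf files" might be as simple as the sketch below. It is hypothetical: it takes the URL directly because the response format for PDF links is not settled, and it only checks the PDF magic bytes to catch cases where a site returns an HTML page (e.g. a login page) instead of the file.

```elisp
;;; Hypothetical sketch: download a PDF once its URL is known.

(require 'url)

(defun zot-translate--pdf-file-p (file)
  "Return non-nil if FILE starts with the PDF magic bytes \"%PDF\"."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; Read only the first four bytes of FILE, without decoding.
    (insert-file-contents-literally file nil 0 4)
    (string= (buffer-string) "%PDF")))

(defun zot-translate-download-pdf (pdf-url file)
  "Download PDF-URL to FILE and return FILE.
Signal a `user-error' if the downloaded content is not a PDF."
  ;; Third argument t: overwrite FILE if it already exists.
  (url-copy-file pdf-url file t)
  (unless (zot-translate--pdf-file-p file)
    ;; Remove the non-PDF download so it is not mistaken for the paper.
    (delete-file file)
    (user-error "%s did not return a PDF" pdf-url))
  file)
```

A caller would typically name FILE after the BibTeX key of the entry that zot-translate-add-bibtex-entry just inserted, in whatever PDF directory the user has configured.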

@jkitchin
Owner

jkitchin commented Nov 4, 2020

Maybe the best approach here is to create, in a pull request, an org-ref-zotero.el file that can be loaded optionally. I don't think I want to add a Docker dependency to org-ref, but as an optional setup it could be OK.

@mpedramfar
Author

That sounds good. I'm a bit busy right now, but in a few weeks I'll make a pull request with the org-ref-zotero.el file and some explanation in README.md on installing the dependencies and running the Docker container on Emacs startup.
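As a rough idea of what the "optionally loaded, started on Emacs startup" setup might look like in a user's init file, here is a hypothetical sketch. It assumes the proposed file provides a feature named org-ref-zotero and a command named zot-translate-start-server; neither exists in org-ref, and both checks make the snippet a no-op when they are missing.

```elisp
;;; Hypothetical init-file sketch for an optional org-ref-zotero.el.

(defun my/zot-translate-setup ()
  "Load org-ref-zotero if installed, then start the translation server."
  ;; NOERROR = t: `require' returns nil instead of signalling an error
  ;; when the optional file is not on `load-path'.
  (when (require 'org-ref-zotero nil t)
    ;; Only call the (assumed) start command if the file defines it.
    (when (fboundp 'zot-translate-start-server)
      (zot-translate-start-server))))

;; Run once, after Emacs has finished starting up.
(add-hook 'emacs-startup-hook #'my/zot-translate-setup)
```

Keeping the Docker launch behind a user-installed hook like this is one way to respect the wish not to make Docker a hard dependency of org-ref.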
