Limit the response size for ORCID lookup
Looking up the works for an ORCID can trigger a Varnish error from the
OpenAlex API when the response is too big. To avoid this, we request only
the DOI, which limits the size of the response. This means we can continue
paging with `per_page=200`.

Fixes #79
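
For illustration only, here is a minimal standard-library sketch of the kind of request this change produces. It is not the pyalex implementation. The helper names `works_url` and `dois_in_page` and the author id `A5000000000` are hypothetical, and the query parameters (`filter`, `select`, `per-page`, `cursor`) are assumed to follow the public OpenAlex REST API.

```python
from urllib.parse import urlencode

# Base endpoint of the OpenAlex works API (assumed here for illustration).
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def works_url(author_id: str, cursor: str = "*", per_page: int = 200) -> str:
    """Build a works query that asks for only the DOI of each work.

    Hypothetical helper: restricting the fields with `select=doi` keeps
    each page small even when `per-page` stays at 200.
    """
    params = {
        "filter": f"author.id:{author_id}",
        "select": "doi",
        "per-page": per_page,
        "cursor": cursor,
    }
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


def dois_in_page(page: list[dict]) -> list[str]:
    """Return the DOIs in one page of results, skipping works without one."""
    return [pub["doi"] for pub in page if pub.get("doi")]


if __name__ == "__main__":
    # A5000000000 is a made-up author id used only to show the URL shape.
    print(works_url("A5000000000"))
    print(dois_in_page([{"doi": "https://doi.org/10.1000/example"}, {"doi": None}]))
```

Selecting a single field trades away the rest of the work metadata, which is acceptable here because the function in this commit only needs DOIs.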
edsu committed Jul 11, 2024
1 parent fd25a8b commit 1899346
Showing 2 changed files with 9 additions and 1 deletion.
4 changes: 3 additions & 1 deletion rialto_airflow/harvest/openalex.py
@@ -58,7 +58,9 @@ def dois_from_orcid(orcid: str, limit=None):

     # get all the works for the openalex author id
     work_count = 0
-    for page in Works().filter(author={"id": author_id}).paginate(per_page=200):
+    for page in (
+        Works().filter(author={"id": author_id}).select(["doi"]).paginate(per_page=200)
+    ):
         for pub in page:
             if pub.get("doi"):
                 work_count += 1
6 changes: 6 additions & 0 deletions test/harvest/test_openalex.py
@@ -98,3 +98,9 @@ def test_pyalex_urlencoding():
         )
         == 2
     ), "we handle url URL encoding DOIs until pyalex does"
+
+
+def test_pyalex_varnish_bug():
+    # it seems like this author has a few records that are so big they blow out
+    # OpenAlex's Varnish index. See https://groups.google.com/u/1/g/openalex-community/c/hl09WRF3Naw
+    assert len(list(openalex.dois_from_orcid("0000-0003-3859-2905"))) > 270
