questions about caching with path #442

mkoohafkan · 2024-02-15T17:58:27Z

I'm a little confused about if/how caching works when path is specified. Modifying the example in the documentation for req_cache():

library(httr2)

# create cache directory
td <- file.path(tempdir(), "cache")
dir.create(td)

url <- paste0(
  "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/",
  "master/inst/extdata/penguins.csv"
)

# Here I set debug = TRUE so you can see what's happening
req <- request(url) |> req_cache(td, debug = TRUE)

# First request downloads the data
tf <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf)

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw\\file53f823441cef.csv"

# Second request retrieves it from the cache
tf2 <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf2)
## Found url in cache "d5d1ddd7f99f55dbc920c63f942804c0"
## Cached value is fresh; retrieving response from cache

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw/foo/d5d1ddd7f99f55dbc920c63f942804c0.body"

file.exists(tf2)
## [1] FALSE


# wait a while, cache is now stale
tf3 <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf3)
## Found url in cache "d5d1ddd7f99f55dbc920c63f942804c0"
## Cached value is stale; checking for updates
## Cached value still ok; retrieving body from cache

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw\\file53f815c51993.csv"

file.exists(tf3)
## [1] TRUE

When the cached value is fresh, the response returns a different path, which makes sense since there is no guarantee that the path specified in the original call still exists when the request is made a second time. My questions are:

1. Is the file downloaded once and written to both the path and cache? (Appears to be the case).
2. Is there a way to tell the cache to use the same file extension as specified in path? I can see this possibly being an issue for some functions that expect a certain file extension.
3. If the cache is stale, it appears to re-download the file. If it decides the cached value is still ok, it claims to retrieve the body from cache but actually provides path. It seems to do this for every subsequent request, i.e., the cache is not "refreshed" and it continues to think the cache is stale. Is this a bug?

hadley · 2024-02-20T18:30:21Z

Hmmmm, I'm not sure how well I thought through the case of using path with cache files, because the behaviour certainly doesn't look correct to me. I think the behaviour you see in the first request is correct: we save the file to the cache and copy it to the requested path. But we also need to do that for the second request, so that the specified path is actually used (that would fix the problem with the extension too). I'm not sure what's going on with the stale cache; that definitely sounds like a bug.

I don't have time to look into this in more detail right now, but I'll definitely take a look and fix when I'm next working on httr2 so thanks for filing this issue!
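
For httr2 versions that still show this behaviour, the effect described above (the body ends up at the requested path whether or not the cache was hit) could be approximated on the caller's side. This is only a minimal sketch: copy_body_to() is a hypothetical helper, not part of httr2, and it assumes resp$body holds a single file path, as in the output shown earlier in this thread. A plain list stands in for the response so the example runs without a network connection.

```r
# Hypothetical helper (not part of httr2): make sure the response body
# is available at the path the caller asked for, even when the response
# was served from the cache and resp$body points into the cache directory.
copy_body_to <- function(resp, path) {
  # Assumption: resp$body is a length-1, character-like file path
  body_path <- as.character(resp$body)[[1]]

  same_file <- normalizePath(body_path, mustWork = FALSE) ==
    normalizePath(path, mustWork = FALSE)
  if (!same_file) {
    ok <- file.copy(body_path, path, overwrite = TRUE)
    if (!ok) {
      stop("Could not copy '", body_path, "' to '", path, "'")
    }
  }

  # Return the path the caller should read from
  invisible(path)
}

# Stand-in for a cached response: a list whose body is a ".body" file
cache_file <- tempfile(fileext = ".body")
writeLines(c("species,island", "Adelie,Torgersen"), cache_file)
fake_resp <- list(body = cache_file)

# The caller wants a ".csv" file, as in the original example
out <- tempfile(fileext = ".csv")
copy_body_to(fake_resp, out)

file.exists(out)
tools::file_ext(out)
```

Returning the path rather than modifying resp$body keeps the sketch independent of how httr2 represents path bodies internally.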

mkoohafkan · 2024-03-04T07:52:26Z

Regarding question 2, one approach might be to pass path to cache_pre_fetch(), and replace the call at httr2/R/req-cache.R, line 181 in 824f142,

cache_get(req)

with something like

    cached_req <- cache_get(req)
    if (!is.null(path)) {
      file.copy(cached_req$body, path, overwrite = TRUE)
      cached_req$body <- path
    }
    cached_req

But I haven't thought through the full consequences of that.

hadley · 2024-09-03T17:16:43Z

Simpler reprex:

library(httr2)
library(testthat, warn.conflicts = FALSE)

req <- request(example_url()) |>
  req_url_path("/cache/2") |> 
  req_cache(tempfile(), debug = TRUE)

path1 <- tempfile()
resp1 <- req |> req_perform(path = path1)
#> Pruning cache
#> Saving response to cache "38c683fd8d6c408b437f509bc0f0ca9b"
expect_equal(resp1$body[[1]], path1)

path2 <- tempfile() 
resp2 <- req |> req_perform(path = path2)
#> Found url in cache "38c683fd8d6c408b437f509bc0f0ca9b"
#> Cached value is fresh; using response from cache
expect_equal(resp2$body[[1]], path2)
#> Error: resp2$body[[1]] not equal to `path2`.
#> 1/1 mismatches
#> x[1]: "/tmp/RtmpgJztVU/file79df1addc1e6/38c683fd8d6c408b437f509bc0f0ca9b.body"
#> y[1]: "/tmp/RtmpgJztVU/file79df2a6fb4d6"

Sys.sleep(2) # wait for cache to expire
path3 <- tempfile() 
resp3 <- req |> req_perform(path = path3)
#> Found url in cache "38c683fd8d6c408b437f509bc0f0ca9b"
#> Cached value is stale; checking for updates
#> Saving response to cache "38c683fd8d6c408b437f509bc0f0ca9b"
expect_equal(resp3$body[[1]], path3)

path4 <- tempfile() 
resp4 <- req |> req_perform(path = path4)
#> Found url in cache "38c683fd8d6c408b437f509bc0f0ca9b"
#> Cached value is fresh; using response from cache
expect_equal(resp4$body[[1]], path4)
#> Error: resp4$body[[1]] not equal to `path4`.
#> 1/1 mismatches
#> x[1]: "/tmp/RtmpgJztVU/file79df1addc1e6/38c683fd8d6c408b437f509bc0f0ca9b.body"
#> y[1]: "/tmp/RtmpgJztVU/file79df529ab65d"

Created on 2024-09-03 with reprex v2.1.0

hadley added the bug label (an unexpected problem or unintended behavior) Feb 20, 2024

hadley added a commit that referenced this issue Sep 3, 2024

Also use cache_body() in cache_prefetch()

af5fd61

Fixes #442

hadley mentioned this issue Sep 3, 2024

Also use cache_body() in cache_prefetch() #531

Merged

hadley closed this as completed in #531 Sep 3, 2024

hadley closed this as completed in 579bc3f Sep 3, 2024
