Issue 127: add offset to ts models #129

Merged 1 commit, Nov 18, 2024

Conversation

SamuelBrand1
Collaborator

@SamuelBrand1 SamuelBrand1 commented Nov 15, 2024

This PR closes #127.

@damonbayer identified that the NA forecast output was occurring in the numerator forecasts when the data hit zero, because of the log transform.

I've added an offset following the same pattern used elsewhere in this repo (1 / max observed, which symmetrises the data in the log domain).

@dylanhmorris
Contributor

I would've thought offset of 1 for raw counts; we're using 1 / max_visits as the offset for proportions

@damonbayer
Collaborator

@dylanhmorris Agree that it's not consistent with what we do in the scoring, but an offset of 1 could be relatively big since the case counts can be so low.

@dylanhmorris
Contributor

> @dylanhmorris Agree that it's not consistent with what we do in the scoring, but an offset of 1 could be relatively big since the case counts can be so low.

Just subtract 1 when you convert the forecast back to natural scale? (i.e. we explicitly forecast the quantity log(1+x) and then compute x)

@damonbayer
Collaborator

damonbayer commented Nov 15, 2024

@dylanhmorris Yes. Let's actually "undo" the offset here after fitting the model, before saving the results. (The output of generate is already on the natural scale). If we do that, I don't particularly care what the offset is.

Somewhere in here:

forecast_samples <- fit |>
  generate(h = forecast_horizon, times = n_samples) |>
  as_tibble() |>
  mutate("{output_col}" := .sim, .draw = as.integer(.rep)) |> # nolint
  select(date, .draw, !!output_sym)

@dylanhmorris
Contributor

@SamuelBrand1 can you implement?

@SamuelBrand1
Collaborator Author

> I would've thought offset of 1 for raw counts; we're using 1 / max_visits as the offset for proportions

The same underlying idea applies, I think: in the log domain the maximum value is log(max_visits), and the minimum is -log(max_visits) if we have a zero in the natural domain. I assumed this was the idea: it symmetrises the log-domain data in the presence of zeros, which seems a good idea irrespective of whether it's counts or proportions?
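
A minimal base R sketch of that symmetry argument, using made-up counts (the `counts` vector and the variable names are illustrative assumptions, not taken from this repo):

```r
# Illustrative counts only; not data from this repo.
counts <- c(0, 3, 12, 40, 7)

# Offset of 1 / max observed, as described in this PR.
offset <- 1 / max(counts)
log_counts <- log(counts + offset)

# The zero maps to log(1 / max) = -log(max), and the largest count maps to
# roughly log(max), so the transformed data span about [-log(max), log(max)].
range(log_counts)
log(max(counts))

# For comparison, an offset of 1 maps the zero to 0, so the range is
# [0, log(max + 1)] and is not centred on zero.
range(log(counts + 1))
```

The upper end is only approximately log(max), since the offset is also added to the largest count, but the difference is small whenever the maximum is well above 1.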

@SamuelBrand1
Collaborator Author

> @dylanhmorris Agree that it's not consistent with what we do in the scoring, but an offset of 1 could be relatively big since the case counts can be so low.
>
> Just subtract 1 when you convert the forecast back to natural scale? (i.e. we explicitly forecast the quantity log(1+x) and then compute x)

fable does the back-transform automatically for log(x + offset) transforms (see https://fable.tidyverts.org/articles/transformations.html). For more complex transforms, though, you have to wrap the transform with its back-transform using fabletools::new_transformation.
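
To make the round trip concrete, here is a base R sketch of what inverting a log(x + offset) transform amounts to. It does not call fable, and the values of `x` and `offset` are assumptions chosen for illustration:

```r
# Illustrative values only; `x` and `offset` are not taken from this repo.
x <- c(0, 2, 5, 1, 0)
offset <- 1

# Forward transform: the scale a model would be fitted on.
z <- log(x + offset)

# Back-transform: invert the log, then remove the offset.
x_back <- exp(z) - offset

# Check that the round trip recovers the original values, zeros included.
all.equal(x_back, x)
```

With fable, the inverse is derived from the transformation written in the model formula, so this step should not need to be written by hand for a simple log(x + offset).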

@SamuelBrand1
Collaborator Author

> @SamuelBrand1 can you implement?

As above, I think the back-transform is done automatically... Do you want this for the pyrenew output?

Collaborator

@damonbayer damonbayer left a comment

@SamuelBrand1 fable is very impressive! Thanks!

@SamuelBrand1 SamuelBrand1 merged commit 47be1d6 into main Nov 18, 2024
3 checks passed
@SamuelBrand1 SamuelBrand1 deleted the fix-zero-issue branch November 18, 2024 13:51
Development

Successfully merging this pull request may close these issues.

Investigate NA forecasts
3 participants