-
-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider interpolation HDI calculations #2168
Comments
If I understood correctly the computation of the quantile function by xarray involves an interpolation before returning the values. Instead the az.hdi function doesn't. It just returns the raw values, which generally looks very noisy when plotted. In arviz we solve thi problem using the az.plot_hdi, which has a smooth argument, being true by default. We can use az.plot_hdi inside bambi, but that will requiere to change the logic of the plot_cap function. One alternative will be to implement a hdi function that behaves similar to that from xarray, that is to return interpolated values. |
I think trying to do something similar to xarray is a good idea. I remember one example where I wanted to plot the mean and the HDI, using EDIT I found the example I had in mind Left: Using The code is here https://github.com/bambinos/unfinished-ideas/blob/main/splines.ipynb |
Adding some of these interpolation methods to hdi sounds like a good idea. As far as I understand, it is quite different from the current smoothing in plot_hdi, at least conceptually. The interpolation in numpy quantile works along the dimension that is reduced, so that if Note, I think using quantile for hdi would be very misleading, so I propose changing the name of the issue to "Consider interpolation HDI calculations" so that we add something similar to this |
@OriolAbril I think I understand why using quantiles would be misleading for HDIs, [P2.5%, P97.5%] doesn't necessarily give a 95% HDI, but I don't understand the name suggestion. |
I don't see the current issue title (consider using np.quantile for hdi) as viable, and instead see two underlying issues that need addressing. The first is being able to use any function to generate intervals or bands. This is a know issue and we are working on it, but imo it requires refactoring the plots module. I have started some experiments at https://xrtist.readthedocs.io/en/latest/ for example. The second is stabilising our current hdi approach in a manner similar to what np.quantile does. I hadn't really realized the instability that comes with returning existing samples can actually be fixed with these interpolation methods. I don't see much value in focusing this issue on the first thing, but I think it would be very helpful to focus on this second one |
np.quantile
for HDI calculations
@tomicapretto #2207 (comment) might help clarify a bit the different interpolation options |
Should the quantiles be used directly though or have something like:
|
I wouldn't recommend doing this. With MCMC you are getting samples from the whole posterior distribution (or posterior predictive too with the extra forward sampling step). Both quantile computation in numpy and hdi computation in arviz are non-parametric because we never now which is the actual distribution of these posterior samples. Computing the percentiles from the student-t distribution assumes the posterior follows a student-t distribution which will not be the case in the vast majority of cases (even if the likelihood were a student-t distribution). Therefore, I wouldn't recommend that approach, especially not combined with MCMC, it could be useful if working with optimizer (so you only get the MAP estimates for the student-t parameters, not draws from the whole distribution) or if working with conjugate distributions). |
Cool, all reasonable points, thank you. (I am not an experienced Bayesian Statistics user.) So where do you want changes to be done? Should I fork |
We first need to identify which algorithm to implement for the interpolation. HDI and quantiles (with which we get the ETI, equal tail intervals) are very different concepts, so we can't just apply the same interpolation that is done in the quantile computation. I haven't yet been able to search anything on this yet, either searching for papers on this or checking implementations of HDI in other libraries (for example in R or Julia) would be a good starting point. Is that something you'd be interested in? Once we know what changes to make then yeah, fork and make the changes, then submit a PR for us to review. That would be very appreciated |
Hmm...easystats/bayestestR#39 seems to touch upon this exactly but as |
Thanks, that is great! Printing a warning every time the function is executed seems a bit excessive, especially given it is quite arbitrary and only valid for 95% CI (if at all). We'd also need to compute ESS within HDI in order to get the number of independent draws. I think as a start the best would be to document all this properly in the notes section of the HDI docstring (which currently has no notes on the algorithm used at all). There we can document the algorithm used, cite Khruske's book and add some advise on samples above. Regarding # of samples advise, in our case it would be best to advise using the MCSE of HDI to decide if there are enough samples. I'll need to finish #1974 so it can be used, but I think it is what makes more sense conceptually. Do you want to send a PR to update the docs on this? We have some guidance at https://python.arviz.org/en/stable/contributing/index.html, and don't hesitate to ask any questions you may have during the process. |
Tell us about it
In Bambi we have a function called
plot_cap
that is used to obtain visualizations of the fitted curve. We overlay a credible interval so users can visualize the uncertainty around the mean estimate. Internally, we're usingaz.hdi()
to obtain the bounds. Today, I was implementing some improvements and found the plots look quite noisy. See the following examplesThe following is Bambi specific code, it's not that important for what I want to show
Get the bands using
az.hdi()
Get the bands using
.quantile()
inDataArray
, which callsnp.quantile
under the hood (if I understood correctly)Thoughts on implementation
I'm not aware of the historical details that led to the current implementation of
az.hdi()
. But I think it's worth considering other alternatives since the current behavior returns very noisy results. I have other examples where it looks even worse, for example hereTagging @aloctavodia because we talked about this via chat
The text was updated successfully, but these errors were encountered: