On Conformalizing Bayes #10
Unanswered
PaulScemama asked this question in Q&A
Replies: 0 comments
Hi Anastasios, 2 days in a row ha.
I'm interested in the effect of having an underlying Bayesian model when considering conformal prediction.
Part I
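In your "Gentle Introduction", Section 2.4 illustrates what a Bayesian would do without conformal prediction in the continuous case. That is, they would create sets following
$$S(x) = \{y : \hat{f}(y|x) > t\}, \quad \text{where } t \text{ is chosen so that } \int_{y \in S(x)} \hat{f}(y|x)\,dy = 1-\alpha,$$
where $\hat{f}(y|x)$ is the posterior predictive distribution.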
Question 1: In the discrete setting, would this amount to
?
Edit: I now believe that, for the discrete setting, this would amount to taking the negative of the probability mass that the posterior predictive distribution assigns to the correct class: $-\hat{f}(y|x)$.
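For concreteness, here is a minimal sketch of one discrete analogue of the Bayesian set from Section 2.4: keep the most probable classes until their posterior predictive mass reaches $1-\alpha$. This is only one possible reading of the discrete case. The function name `discrete_credible_set` and the example probabilities are hypothetical, and `probs` is assumed to be a vector of posterior predictive masses $\hat{f}(y|x)$ over the classes for a single input $x$.

```python
import numpy as np

def discrete_credible_set(probs, alpha):
    """Return the most probable classes whose total mass is >= 1 - alpha.

    Assumes `probs` holds the posterior predictive masses f_hat(y | x)
    for one input x (hypothetical helper, not from the paper).
    """
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs)             # classes from most to least probable
    cumulative = np.cumsum(probs[order])   # running mass of the top classes
    # First position at which the running mass reaches 1 - alpha.
    k = int(np.searchsorted(cumulative, 1.0 - alpha)) + 1
    k = min(k, probs.size)                 # guard against rounding in the sum
    return sorted(order[:k].tolist())

if __name__ == "__main__":
    print(discrete_credible_set([0.5, 0.3, 0.15, 0.05], alpha=0.1))
```

Unlike the conformal version, this set is only as trustworthy as the model: its $1-\alpha$ mass is measured under $\hat{f}$, not under the true data distribution.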
Part II
To "conformalize" the procedure, you outline a conformal score,
$$s(x, y) = -\hat{f}(y|x),$$
which is just the negative of the probability density (mass) assigned to the correct label $y$.
I just want to make sure my understanding is correct:
Question 2: Can we use any conformal score involving the posterior predictive distribution (denoted $\hat{f}(y|x)$), or is it only $-\hat{f}(y|x)$ that leads to optimality?
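To make the score concrete, here is a minimal split-conformal sketch for the discrete case using the negative posterior predictive mass of the true label as the score. The function names and the tiny calibration set are hypothetical. `cal_probs` is assumed to be an `(n, K)` array of $\hat{f}(y|x_i)$ on held-out calibration data, with integer labels in `cal_labels`.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Calibrate the threshold q_hat for the score s(x, y) = -f_hat(y | x)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Score of each calibration point: negative mass on its true label.
    scores = np.sort(-cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, no finite threshold works and every class is kept.
    return scores[k - 1] if k <= n else np.inf

def prediction_set(probs, qhat):
    """All classes whose score -f_hat(y | x) is at most q_hat."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(-probs <= qhat).tolist()

if __name__ == "__main__":
    cal_probs = [[0.7, 0.2, 0.1],
                 [0.3, 0.5, 0.2],
                 [0.05, 0.05, 0.9],
                 [0.4, 0.35, 0.25]]
    cal_labels = [0, 1, 2, 0]
    qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.3)
    print(qhat, prediction_set([0.45, 0.42, 0.13], qhat))
```

Any other score built from $\hat{f}$ could be dropped into `conformal_quantile` and would still give the coverage guarantee; whether it keeps the optimality property is exactly what Question 2 asks.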
Part III
At the very end of section 2.4 you mention that
(a) when certain technical assumptions are satisfied, the conformal sets used on top of a Bayesian model have the best Bayes risk among all prediction sets with $1-\alpha$ coverage.
Question/Clarification: Does this mean the conformal sets will have the best Bayes risk among all prediction sets with $1-\alpha$ coverage, holding a particular conformal score fixed?
Edit: After rereading, I think it is optimal over all other conformal scores.
You then mention
(b) To be more precise, under the assumptions in [11], $C(X_{\text{test}})$ has the smallest average size of any conformal procedure with $1-\alpha$ coverage, where the average is taken over the data and the parameters.
Question: Does this mean that, under the assumptions outlined in [11], if we have a Bayesian model and a non-Bayesian model and apply conformal prediction on top of each, the sets built on the Bayesian model will be smaller than the sets built on the deterministic model?
Edit: After rereading, I think the sets with the underlying Bayesian model will be smaller (on average) only if the Bayesian model approximates the true posterior predictive density well.
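The intuition in this edit can be checked on a toy problem. The sketch below is a simulation under made-up assumptions, not a statement about the result in [11]: class probabilities are drawn from a Dirichlet distribution, one "model" reports the true conditional probabilities, and a second reports a heavily noised version. Both are conformalized with the score $-\hat{f}(y|x)$ and their average set sizes are compared. All names, sizes, and mixing weights are hypothetical.

```python
import numpy as np

def avg_conformal_set_size(cal_probs, cal_labels, test_probs, alpha):
    """Average size of split-conformal sets built with the score -f_hat(y | x)."""
    n = len(cal_labels)
    scores = np.sort(-cal_probs[np.arange(n), cal_labels])
    k = int(np.ceil((n + 1) * (1 - alpha)))
    qhat = scores[k - 1] if k <= n else np.inf
    return float((-test_probs <= qhat).sum(axis=1).mean())

def run_comparison(seed=0, n_cal=2000, n_test=2000, n_classes=10, alpha=0.1):
    rng = np.random.default_rng(seed)
    n = n_cal + n_test
    # Toy "true" conditional distributions, one row per input (assumption).
    true_probs = rng.dirichlet(0.2 * np.ones(n_classes), size=n)
    # Sample one label per row from its true distribution via the inverse CDF.
    cdf = true_probs.cumsum(axis=1)
    labels = (cdf < rng.random(n)[:, None]).sum(axis=1)
    labels = np.minimum(labels, n_classes - 1)  # guard against rounding in the CDF
    # A poorly fitting model: mostly noise, with a little signal mixed in.
    noise = rng.dirichlet(np.ones(n_classes), size=n)
    poor_probs = 0.2 * true_probs + 0.8 * noise
    cal, test = slice(0, n_cal), slice(n_cal, n)
    size_good = avg_conformal_set_size(true_probs[cal], labels[cal], true_probs[test], alpha)
    size_poor = avg_conformal_set_size(poor_probs[cal], labels[cal], poor_probs[test], alpha)
    return size_good, size_poor

if __name__ == "__main__":
    size_good, size_poor = run_comparison()
    print(f"average set size, well-matched model: {size_good:.2f}")
    print(f"average set size, noisy model:        {size_poor:.2f}")
```

Both procedures target the same $1-\alpha$ marginal coverage. In this toy setup the difference shows up only in how large the sets need to be to reach it.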
Part IV
Bonus: if you aren't familiar with the assumptions in [11], that is totally fine; please ignore this. But if you are familiar, would you give me a brief outline of what they are?
Thanks a ton; this is really interesting stuff!