On Conformalizing Bayes #10

PaulScemama · 2023-06-08T14:12:48Z


Hi Anastasios, 2 days in a row ha.

I'm interested in the effect of having an underlying Bayesian model when considering conformal prediction.

Part I

Section 2.4 of your "Gentle Introduction" illustrates what a Bayesian would do without conformal prediction in the continuous case. That is, they would create sets of the form

$$ S(x) = \{ y : \hat{f}(y|x) > t \}, \text{ where } t \text{ is chosen so that } \int_{y \in S(x)} \hat{f}(y|x) \, dy = 1-\alpha, $$

where $\hat{f}(y|x)$ is the posterior predictive distribution.
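To make the continuous-case construction concrete, here is a minimal sketch that approximates $S(x)$ on an evenly spaced grid. It is only an illustration under assumptions: the function names `density_level_set` and `std_normal` are made up, and a standard normal stands in for the posterior predictive $\hat{f}(y|x)$ at a single fixed $x$.

```python
import math

def density_level_set(density, grid, alpha):
    """Approximate S(x) = {y : f_hat(y|x) > t} on an evenly spaced grid.

    Grid points are added in order of decreasing density until their
    Riemann-sum mass reaches 1 - alpha; t is the density of the last
    point added.
    """
    dy = grid[1] - grid[0]
    values = [density(y) for y in grid]
    # Visit grid points from highest to lowest density.
    order = sorted(range(len(grid)), key=lambda i: values[i], reverse=True)
    mass, kept = 0.0, []
    for i in order:
        kept.append(i)
        mass += values[i] * dy
        if mass >= 1 - alpha:
            break
    t = values[kept[-1]]
    return t, sorted(grid[i] for i in kept)

def std_normal(y):
    # Hypothetical stand-in for the posterior predictive density.
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)

grid = [-5 + 0.01 * i for i in range(1001)]
t, ys = density_level_set(std_normal, grid, alpha=0.1)
print(t, min(ys), max(ys))
```

For a unimodal density like this one, the kept points form a single interval; for a multimodal density the same procedure would return a union of intervals.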

Question 1: In the discrete setting, would this amount to

1. sorting the probabilities from greatest to smallest, and
2. starting from the class with the greatest probability mass, including classes until you've reached $1-\alpha$ cumulative probability mass (much like the APS score)?
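The procedure described in Question 1 can be sketched as follows. This is only my reading of the discrete analogue, not something taken from the paper; `highest_mass_set` is a hypothetical name, and `probs` is assumed to be the posterior predictive mass over the classes for one input $x$.

```python
def highest_mass_set(probs, alpha):
    """Return class indices, most probable first, whose cumulative
    posterior predictive mass is at least 1 - alpha."""
    # Sort class indices by probability, largest first.
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += probs[k]
        # Stop once the accumulated mass reaches 1 - alpha
        # (small tolerance guards against floating-point round-off).
        if total >= 1.0 - alpha - 1e-12:
            break
    return chosen

probs = [0.05, 0.6, 0.25, 0.1]
print(highest_mass_set(probs, alpha=0.1))
```

Unlike the continuous case, the accumulated mass generally overshoots $1-\alpha$ rather than hitting it exactly, since classes are added one at a time.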

Edit: I now believe that, for the discrete setting, this would amount to taking the negative of the probability mass assigned to the correct class of the posterior predictive density: $-\hat{f}(y|x)$.

Part II

To "conformalize" the procedure, you outline a conformal score,

$$ s(x,y) = - \hat{f}(y|x) $$

which is just the negative of the probability density (mass) assigned to the correct label $y$.
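As a sanity check on my understanding, here is a minimal sketch of split conformal prediction with this score in the discrete case. The names `conformal_quantile` and `prediction_set` and the calibration numbers are hypothetical; I am assuming the usual recipe of taking the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score as the threshold $\hat{q}$ and keeping every label with $s(x,y) \le \hat{q}$.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(scores)
    # Rank of the corrected quantile: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: every label must be included.
        return math.inf
    return sorted(scores)[rank - 1]

def prediction_set(probs, qhat):
    """Labels y whose score s(x, y) = -f_hat(y|x) is at most qhat."""
    return [y for y, p in enumerate(probs) if -p <= qhat]

# Hypothetical calibration data: predictive probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]
cal_labels = [0, 1, 1, 1]

# Score each calibration point by the negative mass on its true label.
cal_scores = [-p[y] for p, y in zip(cal_probs, cal_labels)]
qhat = conformal_quantile(cal_scores, alpha=0.5)

print(qhat, prediction_set([0.7, 0.3], qhat))
```

Equivalently, the set keeps every label whose predictive mass is at least $-\hat{q}$, so the threshold $t$ from Part I is replaced by one calibrated on held-out data.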

I just want to make sure my understanding is correct:

Question 2: Can we use any conformal score involving the posterior predictive distribution (denoted $\hat{f}(y|x)$), or is it only $-\hat{f}(y|x)$ that leads to optimality?

Part III

At the very end of section 2.4 you mention that

(a) when certain technical assumptions are satisfied, the conformal sets used on top of a Bayesian model have the best Bayes risk among all prediction sets with $1-\alpha$ coverage.

Question/Clarification: does this mean the conformal sets will have the best Bayes risk among all prediction sets with $1-\alpha$ coverage, holding a particular conformal score constant?

Edit: After rereading, I think it is optimal over all other conformal scores.

You then mention

(b) To be more precise, under the assumptions in [11], $C(X_{\text{test}})$ has the smallest average size of any conformal procedure with $1-\alpha$ coverage, where the average is taken over the data and the parameters.

Question: does this mean that, under the assumptions outlined in [11], if we have a Bayesian model and a non-Bayesian (deterministic) model and we apply conformal prediction on top of each, the sets built on the Bayesian model will be smaller than the sets built on the deterministic model?

Edit: After rereading, I think the sets with the underlying Bayesian model will be smaller (on average) only if the Bayesian model approximates the true posterior predictive density well.

Part IV

Bonus: if you aren't familiar with the assumptions in [11], that is totally fine; just ignore this. But if you are familiar, would you give me a brief outline of what they are?

Thanks a ton; this is really interesting stuff!
