Add CLI support for unconditional sampling and assessing data likelihood of any dataset #178

Schaechtle · 2024-08-21T16:44:27Z

What does this issue request?

Add CLI support for two modes of evaluating posterior models:

1. Unconditional sampling of synthetic data

The goal here is to draw samples from the posterior model to create synthetic datasets that can be compared to the training data (observations) and to held-out data.

2. Compute held-out likelihood scores

The goal here is to compute likelihood scores for both training data and held-out data.

Why do we want this?

This allows us to compare special cases of GenDB against relevant baselines, like previous CrossCat implementations. Note that some relevant baselines may generate synthetic data but not allow assessing held-out likelihood. This work is particularly relevant for the evaluations a future publication would need, but it is not urgent right now.

ThomasColthurst · 2024-08-22T16:13:34Z

For the held-out likelihoods, do you need them per held-out item or only per test set?

(Or to ask another way: if there are 1000 items in the held out test set, do you want a single log likelihood for all of them, or do you want 1000 individual log likelihoods?)

Schaechtle · 2024-09-04T13:32:22Z

> For the held-out likelihoods, do you need them per held-out item or only per test set?
>
> (Or to ask another way: if there are 1000 items in the held out test set, do you want a single log likelihood for all of them, or do you want 1000 individual log likelihoods?)

I would return a table or list of the latter because we can aggregate easily -- but we also may want to dig into which rows generate which log-likelihood values in case we encounter bugs (e.g. NaNs for certain held-out rows).
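To illustrate why per-row values are enough, here is a minimal Python sketch of the aggregation and debugging step described above. It assumes the per-row log likelihoods have already been read into a plain list of floats; the function name `summarize_heldout` and the example values are hypothetical and not part of any existing tool or its output format.

```python
import math


def summarize_heldout(row_logps):
    """Aggregate per-row log likelihoods and flag rows with non-finite values.

    `row_logps` is assumed to be a list of floats, one per held-out row.
    """
    # Rows whose log likelihood is NaN or +/-inf are reported separately
    # so they can be inspected for bugs.
    bad_rows = [i for i, lp in enumerate(row_logps) if not math.isfinite(lp)]
    finite = [lp for lp in row_logps if math.isfinite(lp)]

    # Sum only the finite values; fsum limits floating-point error.
    total = math.fsum(finite)
    mean = total / len(finite) if finite else float("nan")
    return {
        "n_rows": len(row_logps),
        "total": total,
        "mean": mean,
        "bad_rows": bad_rows,
    }


# Hypothetical values: the third row produced a NaN.
example = [-1.5, -2.0, float("nan"), -0.5]
summary = summarize_heldout(example)
print(summary)
```

Returning the row indices alongside the aggregate keeps both use cases in one pass: the total or mean for comparing against baselines, and the list of problematic rows for debugging.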

ThomasColthurst self-assigned this Aug 22, 2024

ThomasColthurst mentioned this issue Aug 22, 2024

Add --heldout option to hirm to calculate the log likelihood of held out data #181

Merged

ThomasColthurst mentioned this issue Sep 3, 2024

Add --holdout to pclean #185

Merged

This was referenced Sep 5, 2024

Add --samples to hirm binary #187

Merged

Add --sample to pclean #198

Merged
