Closed lower bound questions are misleading in UI vs API, and possibly unnecessary?? #1685

jkraybill · 2024-12-10T10:05:58Z


Current behavior

Consider post 28845, "How many Metaculus users that ranked in the top 16 in the Q3 2024 Quarterly Cup will remain in the top 16 at the end of Q4 2024 Quarterly Cup?". The answer UI makes it look like you can enter guesses from 1 to 16. So does the API, where scaling.range_min=1 and scaling.range_max=16.

I'm not certain what happens in the database when you try to place significant weight on "1" in the example above, but the API doesn't allow it: cdf[0] must equal 0. The UI implies that you can put weight on that 0-index value (or close to it); as you can see above, many users have non-zero values for what I assume must be cdf[1].

In my uninformed opinion, cdf[0] should be allowed to be non-zero for both open and closed lower bound questions, and I further contend that there is no difference between the two in the Metaculus paradigm. cdf[0] should never be required to be zero, and should always be treated as P(X <= min). Requiring cdf[max] to be 1 makes sense, but the constraints around cdf[0] make no sense to me. In the example above, does anyone get points if the question resolves to 1? Clearly a number of people think so, but the API implies no.
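To make the disagreement concrete, here is a minimal Python sketch of the constraints as they are described in this thread. validate_cdf, N_POINTS and MIN_STEP are hypothetical names, and the rules themselves (201 values, cdf[0] == 0 for a closed lower bound, cdf[200] == 1 for a closed upper bound, a minimum step of 0.01 / 200 as quoted later in the thread) are assumptions drawn from this discussion, not from Metaculus's own validation code.

```python
# Hypothetical helper illustrating the CDF constraints discussed in this thread.
# Assumptions (not confirmed against Metaculus's validation code):
#   * a submitted CDF has 201 values, cdf[0] .. cdf[200]
#   * a closed lower bound requires cdf[0] == 0
#   * a closed upper bound requires cdf[200] == 1
#   * every step must rise by at least 0.01 / 200
# Any extra rules for open bounds are not modelled here.

N_POINTS = 201
MIN_STEP = 0.01 / 200
TOL = 1e-9  # tolerance for floating-point comparisons


def validate_cdf(cdf, closed_lower, closed_upper):
    """Return a list of human-readable violations; an empty list means none found."""
    if len(cdf) != N_POINTS:
        return [f"expected {N_POINTS} values, got {len(cdf)}"]
    problems = []
    if any(v < -TOL or v > 1 + TOL for v in cdf):
        problems.append("all values must lie in [0, 1]")
    if closed_lower and abs(cdf[0]) > TOL:
        problems.append("closed lower bound: cdf[0] must be 0")
    if closed_upper and abs(cdf[-1] - 1.0) > TOL:
        problems.append("closed upper bound: cdf[200] must be 1")
    for i in range(1, N_POINTS):
        if cdf[i] - cdf[i - 1] < MIN_STEP - TOL:
            problems.append(f"cdf[{i}] rises by less than the minimum step")
    return problems


# A CDF that already holds 0.05 at the lower bound: rejected only if that bound is closed.
shifted = [0.05 + 0.95 * i / 200 for i in range(N_POINTS)]
print(validate_cdf(shifted, closed_lower=True, closed_upper=True))
print(validate_cdf(shifted, closed_lower=False, closed_upper=True))
```

Under these assumptions the only thing separating the two cases is the cdf[0] check, which is exactly the constraint being questioned here.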

Expected/desired behavior

Someone needs to rework the numeric question framework; it has many problems that are especially palpable for quantizable questions (N(A) < 200).

Answered by SylvainChevalier

Dec 12, 2024

Yes, that sounds correct.


SylvainChevalier · 2024-12-10T14:39:35Z · Maintainer

@jkraybill you're confusing pdfs and cdfs. Your screenshot shows the pdf, which indeed can be >0 at the lower bound.

Without using Dirac deltas (which we don't support), it's impossible for the cdf at a closed lower bound to be anything but 0. This is actually symmetrical with the cdf being 1 at closed upper bounds.
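In symbols, for a closed range [a, b] with CDF F and density f, the argument reads as follows. The second part adds a point mass of weight p at a; g is an arbitrary density on [a, b] introduced only for illustration.

```latex
% Density f on [a, b], no point masses:
F(a) = P(X \le a) = \int_a^a f(t)\,dt = 0,
\qquad
F(b) = P(X \le b) = \int_a^b f(t)\,dt = 1.

% With a point mass of weight p at a (a Dirac delta), for 0 < p \le 1:
f(x) = p\,\delta(x - a) + (1 - p)\,g(x)
\;\Rightarrow\;
F(a) = P(X = a) = p > 0.
```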

Here is a screenshot of the same question, using the cdf view:

As you can see the cdf starts at 0 and ends at 1.

To be extra clear: currently the first two cdf values for the Community Prediction are 0 and 0.0008179. This means that its pdf at the lower bound is (0.0008179 - 0) * 200 = 0.1636, which is indeed what the pdf graph on the question page reports, and what the Community Prediction will be scored on.
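The arithmetic above amounts to differencing consecutive CDF values and multiplying by the 200 bins. A minimal sketch, where pdf_from_cdf is a hypothetical helper; judging from the 0.1636 figure, the factor is 200 rather than 200 / (max - min), so the pdf appears to be expressed per unit of the normalized [0, 1] range (an inference, not documented behavior).

```python
N_BINS = 200


def pdf_from_cdf(cdf):
    """Per-bin pdf implied by a CDF: (cdf[i+1] - cdf[i]) * 200 for each bin i."""
    return [(cdf[i + 1] - cdf[i]) * N_BINS for i in range(len(cdf) - 1)]


# First two Community Prediction CDF values quoted above.
first_bin = pdf_from_cdf([0.0, 0.0008179])[0]
print(round(first_bin, 4))  # 0.1636
```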


jkraybill · 2024-12-10T22:31:47Z · Author

@SylvainChevalier thanks so much for this explanation; it helped, but I'm still unclear on a couple of things, if you don't mind a few more questions.

When I submit a CDF, is cdf[0] P(X <= min) or P(X < min)? In the example above, you said that if X = 1, you would be scored on cdf[1] - cdf[0], which to me implies that cdf[0] is P(X < min). But if that is the case, wouldn't cdf[200] (the last value) be P(X < max), making 16 an unscoreable outcome in the example above?

I feel like I'm missing something pretty fundamental here; if this is well documented somewhere, please let me know. Basically, I'm unclear how both 1 and 16 can be valid, scoreable outcomes in the example above, and more generally, in discrete-answer questions with closed bounds on both ends, how both P(min) and P(max) can be estimated. Thanks for any help you can offer in understanding this. Happy to take it to Discord if that's more appropriate.


jkraybill · 2024-12-11T08:48:37Z · Author

@SylvainChevalier OK after reading the code and docs more, I think I understand:

For all CDFs (closed or not), cdf[i] represents P(X < x_i), where x_i = min + ((max - min) / 200) * i. The only exception is cdf[200], which represents P(X <= max). Is that right? If so, sorry I wasted your time with the earlier question!
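The index-to-value mapping described above can be sketched as follows, assuming linear (not logarithmic) scaling. cdf_locations is a hypothetical helper; range_min and range_max play the role of scaling.range_min and scaling.range_max from the API.

```python
def cdf_locations(range_min, range_max, n_bins=200):
    """x_i = min + ((max - min) / n_bins) * i for i = 0..n_bins (linear scaling only)."""
    step = (range_max - range_min) / n_bins
    return [range_min + step * i for i in range(n_bins + 1)]


# Post 28845: scaling.range_min = 1, scaling.range_max = 16.
xs = cdf_locations(1, 16)
# Under the reading above, cdf[i] = P(X < xs[i]) for i < 200, and cdf[200] = P(X <= 16).
print(len(xs), xs[0], xs[-1])  # 201 1.0 16.0
```

Each step is (16 - 1) / 200 = 0.075, so the 201 CDF points sit at 1, 1.075, 1.15, ..., 16.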


SylvainChevalier · 2024-12-11T14:52:56Z · Maintainer

I don't mind questions! I agree this is somewhat confusing, because we're using a continuous question to represent a discrete event.

If X were truly a continuous variable that cannot be < 1, then we would have P(X < 1) = P(X <= 1) = P(X = 1) = 0. This is why we use probability density functions, and pdf(x) != P(X = x).

But in our case, it is actually very possible for the outcome to turn out to be 1 (or 16)! There are two ways to think about it:

You can think the question is correct, but that your prediction should be a set of Dirac deltas on the possible values. This would let you set cdf[0] > 0. We don't allow that because it doesn't work with our scores.
You can think the question is wrong, and that it should be a new "Discrete" question type that lets you define Probability Mass Functions and Cumulative Distribution Functions over a discrete list of ordered values. This new question type has been suggested here.
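For comparison, a minimal sketch of what such a Discrete type could compute: a PMF over an ordered list of values and the CDF P(X <= v) derived from it. The representation and the discrete_cdf name are hypothetical; nothing here reflects an existing Metaculus API.

```python
from itertools import accumulate


def discrete_cdf(pmf):
    """Turn [(value, probability), ...] (values in increasing order)
    into [(value, P(X <= value)), ...]."""
    values = [v for v, _ in pmf]
    probs = [p for _, p in pmf]
    if values != sorted(values):
        raise ValueError("values must be in increasing order")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return list(zip(values, accumulate(probs)))


# Unlike the continuous representation, the CDF at the lowest value is
# P(X = 1) = 0.1 rather than 0, and the CDF at the highest value is 1.
cdf = discrete_cdf([(1, 0.1), (2, 0.2), (3, 0.7)])
print(cdf[0])  # (1, 0.1)
```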

In the meantime, we can mostly ignore the issue in practice, since what matters for scoring is the pdf you put on the resolution value. That can be > 0 even if your cdf there is 0. The main downside is confusion as soon as one thinks too hard about it. Sorry about that!
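Looking up the pdf on the resolution value can be sketched like this. pdf_at is a hypothetical helper; it assumes linear scaling, 200 equal bins, and that a value on a bin edge belongs to the bin on its right, with the upper bound assigned to the last bin. That convention matches the cdf[i] = P(X < x_i) reading and the examples in this thread, but it is not taken from the Metaculus scoring code.

```python
import math

N_BINS = 200


def pdf_at(cdf, range_min, range_max, value):
    """Pdf (per unit of the normalized range) on the bin containing `value`."""
    if not range_min <= value <= range_max:
        raise ValueError("value lies outside the question's closed range")
    # Position of the value on a 0..200 scale.
    position = (value - range_min) / (range_max - range_min) * N_BINS
    # Edge values go to the bin on their right; the upper bound goes to bin 199.
    bin_index = min(math.floor(position), N_BINS - 1)
    return (cdf[bin_index + 1] - cdf[bin_index]) * N_BINS


# Half the mass in the first bin, the rest spread evenly over the other 199 bins.
cdf = [0.0] + [0.5 + 0.5 * i / 199 for i in range(200)]
print(pdf_at(cdf, 1, 16, 1))   # pdf used for scoring if the question resolves to 1
print(pdf_at(cdf, 1, 16, 16))  # pdf used for scoring if it resolves to 16
```

Here cdf[0] is 0 as a closed lower bound requires, yet a resolution at 1 still lands on a pdf of 100.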


jkraybill · 2024-12-11T21:49:04Z · Author

Awesome, thanks. So to help me clarify my thinking, I've got another question. Imagine we have a discrete question that has only three possible outcomes [0, 1, 2], each of which has equal probability, but it's been launched as a continuous question with closed bounds.

If I want to maximise my score (and therefore pdf) at each of those resolution values, I believe I would set cdf[1], cdf[101], and cdf[200] to 0.3333, 0.6666, and 1 respectively. cdf[0] would have to be 0, and every other cdf value would be set to the minimum allowable value of cdf[x] = cdf[x-1] + (0.01 / 200).

Is that the correct way to maximise my score for this question via the submitted cdf, or am I still missing something??


SylvainChevalier · 2024-12-12T13:21:24Z · Maintainer

Yes, that sounds correct.
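The construction described in the question above can be sketched in Python. spike_cdf is a hypothetical helper, MIN_STEP = 0.01 / 200 is the minimum increment quoted in the question, and the outcomes 0, 1 and 2 on a [0, 2] question are assumed to fall in bins 0, 100 and 199 (the upper bound belonging to the last bin). One deliberate change: instead of hard-coding 0.3333 / 0.6666 / 1, the mass left after the mandatory minimum steps is split evenly, so the three spikes come out exactly equal.

```python
N_BINS = 200
MIN_STEP = 0.01 / 200  # minimum increment quoted in the question (assumed rule)


def spike_cdf(spike_bins, n_bins=N_BINS, min_step=MIN_STEP):
    """Build an (n_bins + 1)-point CDF with equal mass on the given bins and
    the minimum allowed step everywhere else."""
    spike_bins = set(spike_bins)
    filler = (n_bins - len(spike_bins)) * min_step
    spike = (1.0 - filler) / len(spike_bins)
    cdf = [0.0]  # closed lower bound: cdf[0] must be 0
    for b in range(n_bins):
        cdf.append(cdf[-1] + (spike if b in spike_bins else min_step))
    cdf[-1] = 1.0  # remove floating-point drift at the closed upper bound
    return cdf


# Outcomes 0, 1 and 2 fall in bins 0, 100 and 199 respectively.
cdf = spike_cdf([0, 100, 199])
print(cdf[1], cdf[101], cdf[200])
```

With 197 filler bins at the minimum step, each spike gets (1 - 197 * 0.00005) / 3 = 0.33005 of the mass, i.e. a pdf of about 66.01 on each possible outcome.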

jkraybill · Dec 12, 2024 · Author (reply)

Thank you for taking the time to answer my questions, it was really helpful!
