Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is it possible to predict recall in pure sql? #67

Open
LazerJesus opened this issue Apr 30, 2024 · 3 comments
Open

is it possible to predict recall in pure sql? #67

LazerJesus opened this issue Apr 30, 2024 · 3 comments

Comments

@LazerJesus
Copy link

the title says it all.
is it possible to predict recall in pure sql?

my usecase is that i'd like to maintain a column recallProbabilityNow and update that every few minutes.
of course, i can just load all the rows and do the compute in python/js, but it would be cool (and fast, i guess) to run this inside the db.

@fasiha
Copy link
Owner

fasiha commented Apr 30, 2024

@LazerJesus which version are we talking about? (I think the answer is going to be "not really" but depending on which version you have different options 😅)

@LazerJesus
Copy link
Author

i am on v2

@fasiha
Copy link
Owner

fasiha commented Aug 28, 2024

First, sorry for the delay! Life got busy (but happy busy 😁).

Second, to do this in SQL we have to use an approximation. We pretend that $E[p^\delta] \approx (E[p])^\delta$.

In words, we know that the expected value of a random variable $p$ (probability of recall) raised to a power is not equal to the expected value itself raised to the power. (There's a brief discussion of this at https://fasiha.github.io/ebisu/#why-we-work-with-random-variables.)

However, in practice, maybe the difference isn't that bad. Here's how to compute this approximation in both Python and SQL: I create a simple SQLite db that contains a timestamp (seconds in the Unix epoch) and the Ebisu model as JSON and then, given the current time, calculate the approximate recall probability entirely in SQLite:

import sqlite3
import time
import json
from math import exp
import ebisu  # type: ignore

HOURS_PER_SECOND = 1 / 3600
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute("""
CREATE TABLE IF NOT EXISTS data (
    quiz TEXT,
    timestamp FLOAT, -- seconds in Unix epoch
    ebisu TEXT
)
""")

quizName = 'foo'
model = ebisu.defaultModel(24, 2.5)  # 24 hours halflife, α = β = 2.5
t = time.time()  # seconds in Unix epoch

cursor.execute("INSERT INTO data (quiz, timestamp, ebisu) VALUES (?, ?, ?)",
               (quizName, t, json.dumps(model)))

elapsedHours = 10
newT = t + elapsedHours / HOURS_PER_SECOND  # some time in the future, Unix seconds

exact = ebisu.predictRecall(model, elapsedHours,
                            exact=True)  # full Bayesian expectation
approx = exp(-((newT - t) * HOURS_PER_SECOND) / model[-1])

# replace the first ? with `strftime('%s', 'now')` if you want SQL to compute the current
# time (seconds in Unix epoch) instead of passing it in
query = """
SELECT exp(-((? - timestamp) / (60 * 60)) / json_extract(ebisu, '$[2]')) AS approx
FROM data
WHERE quiz=?
"""
cursor.execute(query, (newT, quizName))
fromSql = cursor.fetchone()[0]

relativeError = abs(fromSql - approx) / approx
print(f'{approx=:0.6f}, {fromSql=:0.6f} ({relativeError=}; {exact=:0.2f})')

Output:

approx=0.659241, fromSql=0.659241 (relativeError=0.0; exact=0.73)

There's no error between Python and SQL calculating the exp, but you see there's quite a bit of difference between the approximation, 66%, and the exact prediction 73%. Probably in practice this difference isn't that big a deal, since Ebisu v2 is bad at the exact probability (though hopefully the relative ranking is acceptable).

Does this help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants