Rethinking our runtime error design #8008
frankmcsherry started this conversation in Technical musings
Another thing to determine and commit to is whether errors reflect only "deterministic errors associated with the input data and computation", or are extended to include "transient and non-deterministic errors that reflect the operating environment". I personally prefer the former, in that we want to distinguish between Materialize's transient difficulty in producing the right answer and the right answer itself. But I'm happy to hear from folks who think we might want the latter.
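To make the distinction concrete, here is a minimal Rust sketch. The types `DataError`, `TransientError`, and `Outcome`, and the function `divide_all`, are hypothetical and are not Materialize's actual types. Deterministic errors are part of the answer, row by row; a transient error means no answer is available yet, rather than an answer that contains an error.

```rust
/// Errors that are a deterministic function of the input data and the
/// computation: re-running on the same input reproduces them.
#[derive(Debug, Clone, PartialEq)]
pub enum DataError {
    DivisionByZero,
    Overflow,
}

/// Errors that reflect the operating environment and may vanish on retry.
#[derive(Debug, Clone, PartialEq)]
pub enum TransientError {
    SourceUnavailable,
}

/// Only `DataError`s belong to "the right answer". A `TransientError` means
/// the answer cannot currently be produced, not that the answer is an error.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Answer(Vec<Result<i64, DataError>>),
    Unavailable(TransientError),
}

/// Divides each `(numerator, denominator)` pair. `source_up` is a
/// hypothetical stand-in for the health of the operating environment.
pub fn divide_all(pairs: &[(i64, i64)], source_up: bool) -> Outcome {
    if !source_up {
        // Environmental trouble: report "no answer yet", not a data error.
        return Outcome::Unavailable(TransientError::SourceUnavailable);
    }
    let answer: Vec<Result<i64, DataError>> = pairs
        .iter()
        .map(|&(n, d)| {
            if d == 0 {
                Err(DataError::DivisionByZero)
            } else {
                // With a nonzero divisor, `checked_div` only fails for i64::MIN / -1.
                n.checked_div(d).ok_or(DataError::Overflow)
            }
        })
        .collect();
    Outcome::Answer(answer)
}
```

Under this split, the same input always yields the same `Answer`, while `Unavailable` says nothing about what the answer will eventually be.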
Materialize's current run-time error design (e.g. "you divided by zero") forms parallel streams of errors that flow alongside the error-free results. This has good properties, but also some limiting ones: in particular, it is difficult to recover from errors, as their context is lost.
It seems reasonable to reconsider our error design and think through whether there are other patterns that might be more expressive, potentially less complicated, and ideally less ambiguous.
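As a rough illustration of the current shape, the sketch below splits the output of a row-at-a-time map into an ok stream and an error stream. The names `map_divide`, `Row`, and `RowError` are hypothetical, and plain `Vec`s stand in for the dataflow collections. The failing row is dropped, which is the loss of context described above.

```rust
/// A row is a vector of integers in this sketch.
pub type Row = Vec<i64>;

/// A hypothetical error type that carries only a message, not the row that
/// produced it.
#[derive(Debug, Clone, PartialEq)]
pub struct RowError(pub String);

/// Appends `row[a] / row[b]` to each row. Rows that evaluate successfully go
/// to the first (ok) output; failures go to the second (error) output.
pub fn map_divide(input: &[Row], a: usize, b: usize) -> (Vec<Row>, Vec<RowError>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for row in input {
        let (n, d) = (row[a], row[b]);
        match n.checked_div(d) {
            Some(q) => {
                let mut out = row.clone();
                out.push(q);
                oks.push(out);
            }
            None => {
                let msg = if d == 0 { "division by zero" } else { "numeric overflow" };
                // The failing row is dropped here: only the message survives.
                errs.push(RowError(msg.to_string()));
            }
        }
    }
    (oks, errs)
}
```

Once an error is in `errs`, nothing downstream can tell which row produced it or fall back on that row's other columns.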
First, let's talk through some desiderata. I don't know that we have any hard constraints, other than not crashing and being correct.
For example, I added that last bit, "that rely on the value of the expression", because our current error strategy is more casual than this. If an error is produced but the erroring value is then discarded, we will still report the error. Ideally some analysis would reveal that we didn't actually depend on that value, but perhaps that was hard because the decision is only made at run time.
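The difference between reporting every error produced and reporting only errors whose values are relied upon can be sketched with a toy expression language. `Expr`, `eval`, `eager`, and `demanded` are hypothetical names, not Materialize's actual expression types. `eager` evaluates every mapped column before projecting, so an error in a discarded column still fails the row; `demanded` evaluates only the columns the projection keeps.

```rust
/// A toy scalar expression language (hypothetical, for illustration only).
#[derive(Debug, Clone)]
pub enum Expr {
    Column(usize),
    Literal(i64),
    Div(Box<Expr>, Box<Expr>),
}

/// Evaluates `e` against `row`, propagating the first error encountered.
pub fn eval(e: &Expr, row: &[i64]) -> Result<i64, String> {
    match e {
        Expr::Column(i) => Ok(row[*i]),
        Expr::Literal(v) => Ok(*v),
        Expr::Div(a, b) => {
            let n = eval(a, row)?;
            let d = eval(b, row)?;
            if d == 0 {
                Err("division by zero".to_string())
            } else {
                n.checked_div(d).ok_or_else(|| "numeric overflow".to_string())
            }
        }
    }
}

/// Eager strategy: evaluate every mapped expression, then project. An error
/// in a column that the projection discards still fails the whole row.
pub fn eager(row: &[i64], exprs: &[Expr], keep: &[usize]) -> Result<Vec<i64>, String> {
    let values = exprs
        .iter()
        .map(|e| eval(e, row))
        .collect::<Result<Vec<i64>, String>>()?;
    Ok(keep.iter().map(|&i| values[i]).collect())
}

/// Demand-driven strategy: evaluate only the expressions the projection keeps,
/// so errors in unused columns are never produced.
pub fn demanded(row: &[i64], exprs: &[Expr], keep: &[usize]) -> Result<Vec<i64>, String> {
    keep.iter().map(|&i| eval(&exprs[i], row)).collect()
}
```

With `exprs = [#0, #0 / 0]`, the row `[7]`, and a projection that keeps only column 0, `eager` reports a division-by-zero error while `demanded` returns `[7]`. A real optimizer would do this with a static analysis of which columns are used; the hard case is when that use is only decided per row at run time.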
We also have some other free-form errors, like `AvroParseError` or `SubqueryGeneratedTooManyResults`, that may not obviously correspond to expressions.
Here are some design questions that don't have clear solutions:
Currently errors replace entire rows, which is part of what makes recovering from them hard. At the same time, it is very easy to see whether a result is an error or a valid row (as the distinction is in the type, rather than in the data). Expression operators like `IFERROR` are left hanging, because we cannot "undo" errors that affected only one expression.
Separating errors out makes operations like `join` much easier, as we can join only the valid data. On the other hand, it makes operations like `reduce` much harder, as we cannot produce two arrangements as output (this blocks pushing potentially erroring computation like `HAVING 1/AVG(x) > 3` into the operator). The answer to this question might be different for arrangements where errors occur in the keys, where we perhaps always want to partition the results away.
… `join`, `reduce`, and `topk`?
Unless the error is in the key, or in the aggregate expressions for the `reduce`, we can still operate on the row that contains an error, and make "discovering" the error someone else's job later on. In essence, we delay the "evaluation" of the error.
So clearly an alternate proposal is "extend `Datum` to contain an `Error` variant", and then update our logic in most places to deal with that variant, most often by propagating the error, and in some cases by addressing the fact that `Error` is meant to be a special value that:
1. should not just join with other errors;
2. somehow taints the results of aggregations; and
3. may have other properties we don't realize yet.
To be honest, independent of whether we like this or not, I'd really like to go through the process of determining the intended results of operators on inputs that contain errors.
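A minimal sketch of the "extend `Datum` with an `Error` variant" alternative follows. The `Datum` here is a hypothetical three-variant stand-in, not Materialize's real `Datum`, and the rules it encodes (errors take precedence over NULL, `if_error` replaces a single erroring datum, and any error taints a `sum`) are one possible set of answers to the open questions above, not decided semantics.

```rust
/// A hypothetical, heavily simplified `Datum` with an `Error` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum {
    Null,
    Int(i64),
    Error(String),
}

/// Division propagates an error from either argument (errors win over NULL
/// in this sketch), propagates NULL as SQL does, and creates a new error
/// for a zero divisor or overflow.
pub fn div(a: &Datum, b: &Datum) -> Datum {
    match (a, b) {
        (Datum::Error(e), _) | (_, Datum::Error(e)) => Datum::Error(e.clone()),
        (Datum::Null, _) | (_, Datum::Null) => Datum::Null,
        (Datum::Int(_), Datum::Int(0)) => Datum::Error("division by zero".into()),
        (Datum::Int(x), Datum::Int(y)) => x
            .checked_div(*y)
            .map(Datum::Int)
            .unwrap_or_else(|| Datum::Error("numeric overflow".into())),
    }
}

/// Because the error lives in a single datum, an `IFERROR`-style operator
/// can replace it without discarding the rest of the row.
pub fn if_error(value: Datum, fallback: Datum) -> Datum {
    match value {
        Datum::Error(_) => fallback,
        other => other,
    }
}

/// One possible aggregation rule: any error among the inputs taints the
/// result. NULLs are skipped, and an all-NULL (or empty) input yields NULL.
pub fn sum(inputs: &[Datum]) -> Datum {
    let mut total: i64 = 0;
    let mut saw_value = false;
    for d in inputs {
        match d {
            // Reports the first error encountered in input order.
            Datum::Error(e) => return Datum::Error(e.clone()),
            Datum::Null => {}
            Datum::Int(x) => {
                saw_value = true;
                total = match total.checked_add(*x) {
                    Some(t) => t,
                    None => return Datum::Error("numeric overflow".into()),
                };
            }
        }
    }
    if saw_value { Datum::Int(total) } else { Datum::Null }
}
```

Note that `sum` reports whichever error it meets first, so with several distinct errors the reported one depends on input order. Keeping results deterministic would require a fixed choice, for example the least error under some ordering.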
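For `join`, one option discussed in this thread is to partition rows whose join key is an error into a separate error output, while errors in non-key columns ride along for a later operator to discover. The sketch below assumes exactly that rule, using a hypothetical `Value` type and `join_on_key` function, with plain `Vec`s and a `HashMap` rather than arrangements.

```rust
use std::collections::HashMap;

/// A hypothetical per-datum value with an error variant (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Error(String),
}

/// Joins `left` and `right` on their first column. Rows whose key is an
/// error are partitioned into the error output instead of being matched,
/// so an error never joins with another error. Errors in non-key columns
/// are carried through unchanged.
pub fn join_on_key(
    left: &[(Value, Value)],
    right: &[(Value, Value)],
) -> (Vec<(Value, Value, Value)>, Vec<String>) {
    let mut errors = Vec::new();
    // Index the right input by its valid integer keys.
    let mut index: HashMap<i64, Vec<&Value>> = HashMap::new();
    for (k, v) in right {
        match k {
            Value::Int(k) => index.entry(*k).or_default().push(v),
            Value::Error(e) => errors.push(e.clone()),
        }
    }
    let mut output = Vec::new();
    for (k, v) in left {
        match k {
            Value::Int(key) => {
                if let Some(matches) = index.get(key) {
                    for w in matches {
                        output.push((k.clone(), v.clone(), (*w).clone()));
                    }
                }
            }
            Value::Error(e) => errors.push(e.clone()),
        }
    }
    (output, errors)
}
```

The matching itself only ever sees valid keys, which is what keeps `join` simple; the erroring value in a non-key column is still present in the output row and becomes someone else's job to discover.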