-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create payload->blob type system #1120
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
This reverts commit 422c2a8. Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass. Still have more to review but tired.
// This is the blob length IN SYMBOLS, not in bytes | ||
blobLength uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider renaming to blobLengthSymbols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although is symbol standard here? I feel like our whitepaper is the only place I've ever seen it. Should we just use FE for field element everywhere? BlobLengthFE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally used blobLength
to try to keep consistency with the struct in the commitment, but I agree, it's not ideal that I have to put in comments everywhere that it's in symbols
I don't have a preference between BlobLengthFE
or BlobLengthSymbols
. Usage of Symbols
is pretty pervasive in our codebase, but also I don't think anyone would be confused by the FE
terminology. Do you have a preference?
coeffPolynomial *coeffPoly | ||
// blobLength must be a power of 2, and should match the blobLength claimed in the BlobCommitment | ||
// This is the blob length IN SYMBOLS, not in bytes | ||
blobLength uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear how coeffPolynomial and blobLength are related. Does len(coeffPolynomial) == blobLength
? Guessing we allow last zeros to be truncated. We should explain that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also can't we just enforce that blobLength = nextPowerOf2(len(coeffPolynomial)) and then we don't need blobLength here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a corner case here:
- Imagine a user disperses a very small blob, only 64 bytes, and the last 40 bytes are trailing 0s
- When a different user fetches the blob from a relay, we've established that the relay could remove trailing 0s
- If we were to say that
blobLength = nextPowerOf2(len(coeffPolynomial))
, then the user fetching and reconstructing this blob would determine that the blob length is 1 symbol, when it's actually 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a description of this corner case as a comment, if you confirm my logic here
func BlobFromBytes(bytes []byte, blobLength uint32) (*Blob, error) { | ||
poly, err := coeffPolyFromBytes(bytes) | ||
if err != nil { | ||
return nil, fmt.Errorf("polynomial from bytes: %w", err) | ||
} | ||
|
||
return BlobFromPolynomial(poly, blobLength) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above, do we not want to enforce some relationship between len(bytes) and blobLength?
func (b *Blob) GetBytes() []byte { | ||
return b.coeffPolynomial.getBytes() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have a preference for Bytes()
without the get
prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or toBytes()
to be consistent with ToPayload
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally tend to understand a nuanced difference between To
and Get
to
(or in other contexts,compute
) seems to indicate that the data is changing form, or being processed in some wayget
implies that the method is simply fetching something that already exists, byte for byte
So, I chose ToPayload
because the method is transforming the data, to produce a payload. And it's GetBytes
, because we are returning the literal bytes, as they already exist.
But maybe I imagine this nuance, in which case I'm ok changing the current names. What would your preferred naming be for these two methods? Bytes()
and ToPayload()
?
// The payloadStartingForm indicates how payloads are constructed by the dispersing client. Based on the starting form | ||
// of the payload, we can determine what operations must be done to the blob in order to reconstruct the original payload | ||
func (b *Blob) ToPayload(payloadStartingForm PolynomialForm) (*Payload, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer if get rid of the starting
terminology. It introduces time which I personally find confusing. For me its really just "how does a rollup interpret its payloads as polynomial, ALWAYS", aka its more of an invariant or a way of thinking, than a "starting point".
Wdyt of just payloadForm
or payloadPolyForm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think payloadForm
is good. I will change it
@@ -22,7 +22,7 @@ func (v DefaultBlobCodec) EncodeBlob(rawData []byte) ([]byte, error) { | |||
codecBlobHeader := make([]byte, 32) | |||
// first byte is always 0 to ensure the codecBlobHeader is a valid bn254 element | |||
// encode version byte | |||
codecBlobHeader[1] = byte(DefaultBlobEncoding) | |||
codecBlobHeader[1] = byte(PayloadEncodingVersion0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this function EncodedBlob
need to be changed? it returns []byte, shouldnt it return a Blob in your type system? Or whatever else this is (prob just an encodedPayload?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_blob_codec
will not be used in v2, all encoding is contained in the new structs
case PayloadEncodingVersion0: | ||
return DefaultBlobCodec{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a bit weird now. Why is payload version 0 mapping to a defaultBlobCodec? Should we use BlobCodecVersion0 instead? Or do we want separate versioning for the payloadEncoding and BlobCodec? Or are these things the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the constant to what I felt made sense for the new code, and but the name doesn't work so well in the old blob_codec
code. If I don't rename the constant, then the name doesn't work so well in the new code
Or are these things the same?
Yes, they are essentially the same thing.
func testBlobConversionForForm(t *testing.T, payloadBytes []byte, form PolynomialForm) { | ||
payload := NewPayload(payloadBytes) | ||
|
||
blob, err := payload.ToBlob(form) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The natural interpretation for this reads wrong :(
It reads "payload is getting to blob in this form" instead of "payload, interpreted in this form, gets converted to a blob".
Maybe there's a different way to structure the method such that the interpretation makes more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like perhaps defining on Blob a method .fromPayloadInForm()
? Then we could use like
new(Blob).fromPayloadInForm(payload, form)
. It's def more ugly but at least it reads more easily I feel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of this idea: when constructing a payload, declare in the constructor what form it is. The sequence would then read:
payload := NewPayload(payloadBytes, form)
blob, err := payload.ToBlob()
I personally really dislike like the pattern of creating a new empty object, just to call a method that creates another instance of the object
api/clients/codecs/blob_test.go
Outdated
for i := 0; i < iterations; i++ { | ||
originalData := testRandom.Bytes(testRandom.Intn(1024) + 1) | ||
testBlobConversionForForm(t, originalData, PolynomialFormEval) | ||
testBlobConversionForForm(t, originalData, PolynomialFormCoeff) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fuzzing would be better here as it does a smarter search than this random iid generation of test examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we should be testing the edge cases like empty bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with fuzz testing, but when do we actually run our fuzz tests? Do they run in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a stab at fuzz testing in c70fad8d LMK what you think
// blobLength is the length of the blob IN SYMBOLS, as claimed by the blob commitment. This is needed to make sure | ||
// that the claimed length in the encoded payload header is valid relative to the total blob length | ||
func encodedPayloadFromElements(fieldElements []fr.Element, blobLength uint32) (*encodedPayload, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment makes me wonder whether we shouldn't instead define this method on the blob itself? Why do we need this method here? Seems like its at the wrong place given that it requires a blobLength?
type coeffPoly struct { | ||
fieldElements []fr.Element | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates an existing type (currently in encoding/rs/frame_coeffs.go
in master, named something different if you aren't up to date with the latest
master`).
type FrameCoeffs []fr.Element
Also, there is a bunch of serialization logic in that file as well for going back and forth between serialized and deserialized form.
Is the plan to circle back and replace these old types in a follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see []fr.Element
used directly as a type in many places in this PR. Is there a reason why you don't use the fancy type you define here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you put this into a struct instead of doing type coefPoly []fr.Element
?
func (p *coeffPoly) getBytes() []byte { | ||
return rs.FieldElementsToBytes(p.fieldElements) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal preference for methods like this where we convert to/from []byte
is to use a serialize
/deserialize
naming schema. Open to discussion if you don't agree.
type evalPoly struct { | ||
fieldElements []fr.Element | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Particular reason why you use a struct instead of type fieldElements []fr.Element
?
powers := make([]T, powersToGenerate) | ||
for i := T(0); i < powersToGenerate; i++ { | ||
powers[i] = T(math.Pow(2, float64(i))) | ||
} | ||
|
||
return powers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can get more efficiently get powers of two by doing the following:
powers := make([]T, powersToGenerate) | |
for i := T(0); i < powersToGenerate; i++ { | |
powers[i] = T(math.Pow(2, float64(i))) | |
} | |
return powers | |
powers := make([]T, powersToGenerate) | |
for i := T(0); i < powersToGenerate; i++ { | |
powers[i] = 1 << i | |
} | |
return powers |
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>
Why are these changes needed?
Checks