Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create payload->blob type system #1120

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open

Conversation

litt3
Copy link
Contributor

@litt3 litt3 commented Jan 16, 2025

Why are these changes needed?

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@litt3 litt3 requested review from samlaf and bxue-l2 January 16, 2025 20:44
@litt3 litt3 self-assigned this Jan 16, 2025
@litt3 litt3 requested a review from anupsv February 12, 2025 16:46
@litt3 litt3 requested a review from cody-littley February 12, 2025 18:35
@litt3 litt3 marked this pull request as ready for review February 12, 2025 18:35
Copy link
Contributor

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Still have more to review but tired.

Comment on lines +13 to +14
// This is the blob length IN SYMBOLS, not in bytes
blobLength uint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider renaming to blobLengthSymbols

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although is symbol standard here? I feel like our whitepaper is the only place I've ever seen it. Should we just use FE for field element everywhere? BlobLengthFE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally used blobLength to try to keep consistency with the struct in the commitment, but I agree, it's not ideal that I have to put in comments everywhere that it's in symbols

I don't have a preference between BlobLengthFE or BlobLengthSymbols. Usage of Symbols is pretty pervasive in our codebase, but also I don't think anyone would be confused by the FE terminology. Do you have a preference?

Comment on lines +11 to +14
coeffPolynomial *coeffPoly
// blobLength must be a power of 2, and should match the blobLength claimed in the BlobCommitment
// This is the blob length IN SYMBOLS, not in bytes
blobLength uint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear how coeffPolynomial and blobLength are related. Does len(coeffPolynomial) == blobLength? Guessing we allow last zeros to be truncated. We should explain that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can't we just enforce that blobLength = nextPowerOf2(len(coeffPolynomial)) and then we don't need blobLength here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a corner case here:

  • Imagine a user disperses a very small blob, only 64 bytes, and the last 40 bytes are trailing 0s
  • When a different user fetches the blob from a relay, we've established that the relay could remove trailing 0s
  • If we were to say that blobLength = nextPowerOf2(len(coeffPolynomial)), then the user fetching and reconstructing this blob would determine that the blob length is 1 symbol, when it's actually 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a description of this corner case as a comment, if you confirm my logic here

Comment on lines +20 to +27
func BlobFromBytes(bytes []byte, blobLength uint32) (*Blob, error) {
poly, err := coeffPolyFromBytes(bytes)
if err != nil {
return nil, fmt.Errorf("polynomial from bytes: %w", err)
}

return BlobFromPolynomial(poly, blobLength)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above, do we not want to enforce some relationship between len(bytes) and blobLength?

Comment on lines +40 to +42
func (b *Blob) GetBytes() []byte {
return b.coeffPolynomial.getBytes()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a preference for Bytes() without the get prefix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or toBytes() to be consistent with ToPayload?

Copy link
Contributor Author

@litt3 litt3 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally tend to understand a nuanced difference between To and Get

  • to (or in other contexts, compute) seems to indicate that the data is changing form, or being processed in some way
  • get implies that the method is simply fetching something that already exists, byte for byte

So, I chose ToPayload because the method is transforming the data, to produce a payload. And it's GetBytes, because we are returning the literal bytes, as they already exist.

But maybe I imagine this nuance, in which case I'm ok changing the current names. What would your preferred naming be for these two methods? Bytes() and ToPayload()?

Comment on lines +46 to +48
// The payloadStartingForm indicates how payloads are constructed by the dispersing client. Based on the starting form
// of the payload, we can determine what operations must be done to the blob in order to reconstruct the original payload
func (b *Blob) ToPayload(payloadStartingForm PolynomialForm) (*Payload, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer if get rid of the starting terminology. It introduces time which I personally find confusing. For me its really just "how does a rollup interpret its payloads as polynomial, ALWAYS", aka its more of an invariant or a way of thinking, than a "starting point".

Wdyt of just payloadForm or payloadPolyForm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think payloadForm is good. I will change it

@@ -22,7 +22,7 @@ func (v DefaultBlobCodec) EncodeBlob(rawData []byte) ([]byte, error) {
codecBlobHeader := make([]byte, 32)
// first byte is always 0 to ensure the codecBlobHeader is a valid bn254 element
// encode version byte
codecBlobHeader[1] = byte(DefaultBlobEncoding)
codecBlobHeader[1] = byte(PayloadEncodingVersion0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this function EncodedBlob need to be changed? it returns []byte, shouldnt it return a Blob in your type system? Or whatever else this is (prob just an encodedPayload?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_blob_codec will not be used in v2, all encoding is contained in the new structs

Comment on lines +22 to 23
case PayloadEncodingVersion0:
return DefaultBlobCodec{}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit weird now. Why is payload version 0 mapping to a defaultBlobCodec? Should we use BlobCodecVersion0 instead? Or do we want separate versioning for the payloadEncoding and BlobCodec? Or are these things the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the constant to what I felt made sense for the new code, and but the name doesn't work so well in the old blob_codec code. If I don't rename the constant, then the name doesn't work so well in the new code

Or are these things the same?

Yes, they are essentially the same thing.

func testBlobConversionForForm(t *testing.T, payloadBytes []byte, form PolynomialForm) {
payload := NewPayload(payloadBytes)

blob, err := payload.ToBlob(form)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The natural interpretation for this reads wrong :(
It reads "payload is getting to blob in this form" instead of "payload, interpreted in this form, gets converted to a blob".
Maybe there's a different way to structure the method such that the interpretation makes more sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like perhaps defining on Blob a method .fromPayloadInForm()? Then we could use like
new(Blob).fromPayloadInForm(payload, form). It's def more ugly but at least it reads more easily I feel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this idea: when constructing a payload, declare in the constructor what form it is. The sequence would then read:

payload := NewPayload(payloadBytes, form)
blob, err := payload.ToBlob()

I personally really dislike like the pattern of creating a new empty object, just to call a method that creates another instance of the object

Comment on lines 16 to 21
for i := 0; i < iterations; i++ {
originalData := testRandom.Bytes(testRandom.Intn(1024) + 1)
testBlobConversionForForm(t, originalData, PolynomialFormEval)
testBlobConversionForForm(t, originalData, PolynomialFormCoeff)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzzing would be better here as it does a smarter search than this random iid generation of test examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we should be testing the edge cases like empty bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with fuzz testing, but when do we actually run our fuzz tests? Do they run in CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a stab at fuzz testing in c70fad8d LMK what you think

Comment on lines +90 to +92
// blobLength is the length of the blob IN SYMBOLS, as claimed by the blob commitment. This is needed to make sure
// that the claimed length in the encoded payload header is valid relative to the total blob length
func encodedPayloadFromElements(fieldElements []fr.Element, blobLength uint32) (*encodedPayload, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment makes me wonder whether we shouldn't instead define this method on the blob itself? Why do we need this method here? Seems like its at the wrong place given that it requires a blobLength?

Comment on lines +14 to +16
type coeffPoly struct {
fieldElements []fr.Element
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates an existing type (currently in encoding/rs/frame_coeffs.go in master, named something different if you aren't up to date with the latest master`).

type FrameCoeffs []fr.Element

Also, there is a bunch of serialization logic in that file as well for going back and forth between serialized and deserialized form.

Is the plan to circle back and replace these old types in a follow up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see []fr.Element used directly as a type in many places in this PR. Is there a reason why you don't use the fancy type you define here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you put this into a struct instead of doing type coefPoly []fr.Element?

Comment on lines +53 to +55
func (p *coeffPoly) getBytes() []byte {
return rs.FieldElementsToBytes(p.fieldElements)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference for methods like this where we convert to/from []byte is to use a serialize/deserialize naming schema. Open to discussion if you don't agree.

Comment on lines +14 to +16
type evalPoly struct {
fieldElements []fr.Element
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Particular reason why you use a struct instead of type fieldElements []fr.Element?

Comment on lines +12 to +17
powers := make([]T, powersToGenerate)
for i := T(0); i < powersToGenerate; i++ {
powers[i] = T(math.Pow(2, float64(i)))
}

return powers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can get more efficiently get powers of two by doing the following:

Suggested change
powers := make([]T, powersToGenerate)
for i := T(0); i < powersToGenerate; i++ {
powers[i] = T(math.Pow(2, float64(i)))
}
return powers
powers := make([]T, powersToGenerate)
for i := T(0); i < powersToGenerate; i++ {
powers[i] = 1 << i
}
return powers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants