
Create payload->blob type system #1120

Open · wants to merge 38 commits into base: master

Changes from 25 commits

Commits (38 total)
9b222e0
Create type system
litt3 Jan 16, 2025
bae8843
Rename encoding
litt3 Jan 16, 2025
ebf9979
Merge branch 'master' into payload-blob-type-system
litt3 Jan 16, 2025
d66ddf2
Fix padding logic
litt3 Jan 16, 2025
f6021ea
Pad upon poly construction
litt3 Jan 16, 2025
422c2a8
Change name to ProtoBlob
litt3 Jan 16, 2025
a3d184b
Update comments
litt3 Jan 16, 2025
856f5fc
Revert "Change name to ProtoBlob"
litt3 Jan 17, 2025
1eeec85
Rename PayloadHeader to EncodedPayloadHeader
litt3 Jan 21, 2025
a30a4ad
Merge branch 'master' into payload-blob-type-system
litt3 Feb 7, 2025
5eab8dd
Rename to PayloadEncodingVersion
litt3 Feb 7, 2025
132e8de
Do a bunch of cleanup
litt3 Feb 7, 2025
da63536
Fix PayloadVersion naming
litt3 Feb 7, 2025
3016cac
Merge branch 'master' into payload-blob-type-system
litt3 Feb 7, 2025
9f98462
Use new conversion method
litt3 Feb 7, 2025
58b76c1
Add better descriptions to codec utils
litt3 Feb 10, 2025
916573f
Do test work
litt3 Feb 10, 2025
7eed618
Add length checks
litt3 Feb 11, 2025
c3a6291
Merge branch 'master' into payload-blob-type-system
litt3 Feb 11, 2025
b604c82
Write test for power of 2 util
litt3 Feb 11, 2025
2ef8e99
Write more tests
litt3 Feb 12, 2025
13d16c7
Finish utils tests
litt3 Feb 12, 2025
22e8134
Do FFT work
litt3 Feb 12, 2025
23b49c4
Clean up encoded payload
litt3 Feb 12, 2025
2add571
Merge branch 'master' into payload-blob-type-system
litt3 Feb 12, 2025
e5363e4
Make encoding version uint8 instead of byte
litt3 Feb 13, 2025
7801044
Add explanation for 0x00 padding
litt3 Feb 13, 2025
c70fad8
Use fuzz testing
litt3 Feb 13, 2025
617faa7
Simplify class structure
litt3 Feb 18, 2025
e22238b
Improve docs and naming
litt3 Feb 18, 2025
db45645
Improve test var name
litt3 Feb 18, 2025
c70db76
Use smarter power of 2 calc
litt3 Feb 18, 2025
5a0350f
Merge branch 'master' into payload-blob-type-system
litt3 Feb 18, 2025
3c239cc
Collapse fuzz tests
litt3 Feb 18, 2025
15805a9
Fix borked alignment
litt3 Feb 18, 2025
c028c00
Rename blobLength to blobLengthSymbols
litt3 Feb 18, 2025
693d513
Fix test
litt3 Feb 18, 2025
f1a4e85
Use serialze/deserialize nomenclature
litt3 Feb 18, 2025
79 changes: 79 additions & 0 deletions api/clients/codecs/blob.go
@@ -0,0 +1,79 @@
package codecs

import (
"fmt"
)

// Blob is data that is dispersed on eigenDA.
//
// A Blob is represented under the hood by a coeff polynomial
type Blob struct {
coeffPolynomial *coeffPoly
// blobLength must be a power of 2, and should match the blobLength claimed in the BlobCommitment
// This is the blob length IN SYMBOLS, not in bytes
blobLength uint32

Contributor:

unclear how coeffPolynomial and blobLength are related. Does len(coeffPolynomial) == blobLength? Guessing we allow last zeros to be truncated. We should explain that

Contributor:

also can't we just enforce that blobLength = nextPowerOf2(len(coeffPolynomial)) and then we don't need blobLength here?

Contributor (author):

There is a corner case here:

  • Imagine a user disperses a very small blob, only 64 bytes, and the last 40 bytes are trailing 0s
  • When a different user fetches the blob from a relay, we've established that the relay could remove trailing 0s
  • If we were to say that blobLength = nextPowerOf2(len(coeffPolynomial)), then the user fetching and reconstructing this blob would determine that the blob length is 1 symbol, when it's actually 2
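The ambiguity in this corner case can be sketched with a toy helper (`nextPowerOf2` here is a hypothetical stand-in for the utility discussed in this PR, not the actual implementation):

```go
package main

import "fmt"

// nextPowerOf2 returns the smallest power of 2 >= n (illustrative helper).
func nextPowerOf2(n uint32) uint32 {
	p := uint32(1)
	for p < n {
		p *= 2
	}
	return p
}

func main() {
	// A blob dispersed as 2 symbols (64 bytes), where the 2nd symbol is all zeros.
	dispersedSymbols := uint32(2)

	// A relay that truncates trailing zero symbols returns only 1 symbol.
	truncatedSymbols := uint32(1)

	// Inferring length as nextPowerOf2(len(coeffPolynomial)) now yields 1, not 2,
	// so the reconstructed blob length disagrees with the committed one.
	fmt.Println(nextPowerOf2(dispersedSymbols)) // 2
	fmt.Println(nextPowerOf2(truncatedSymbols)) // 1
}
```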

Contributor (author):

I will add a description of this corner case as a comment, if you confirm my logic here

Contributor (@samlaf, Feb 18, 2025):

Ah. Wouldn't it be easier to just force the relay to keep those 0 bytes around? I'm always in favor of simplifying logic, especially when the "corner cases" are things that will tbh never happen. Let me finish reviewing the PR to get the full picture of this implementation though.

Contributor (author @litt3, Feb 18, 2025):

I don't think there is any good way that we can force relays to keep these bytes around: it's a consequence of how our commitment scheme works. Let's assume we were to implement a policy that "forces" relays to NOT truncate trailing 0 bytes. Here is a story which shows the problem with this policy:

  1. A dispersing client crafts a payload, which yields a blob with some trailing 0s
  2. This dispersing client is malicious, and decides to truncate the trailing 0s before sending the blob to the disperser, just to cause trouble
  3. The disperser receives the blob, everything goes smoothly, and the blob gets attested to by DA nodes
  4. When a retrieving client later fetches the blob from a relay, the commitments verify correctly
  5. When the retrieving client tries to decode the blob into a payload, though, it appears that this relay has maliciously truncated some zeros
  6. The retrieving client therefore decides to try with another relay. But whaddayaknow, this relay also appears to have truncated some zeros
  7. Etc., etc.. A malicious dispersing client is able to cause a situation where all commitments verify correctly, but it's not possible to retrieve a decodable blob from anywhere. Retrieving clients will spend lots of time and resources attempting to fetch the blob from every imaginable source, until finally timing out.

Aside from this resource-consuming attack vector, a policy which attempts to "force" relays not to truncate zeros also complicates retrieval logic. As currently implemented, if the bytes sent by a relay are successfully verified against the commitments, there is never a need to try fetching from a different relay: the received bytes are either decodable or not. If we add a way that a relay might respond "maliciously" without causing a failed commitment verification, then a failed payload decode might require another fetch attempt, contingent on the mode of failure.
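The retrieval policy described here can be sketched roughly as follows (all names are hypothetical illustrations, not the PR's actual client code):

```go
package main

import (
	"errors"
	"fmt"
)

// blobResponse is a hypothetical stand-in for a relay's reply after the
// client has run commitment verification and attempted a payload decode.
type blobResponse struct {
	commitmentOK bool
	decodable    bool
}

// fetchAndDecode sketches the policy: once a relay's bytes verify against the
// commitment, decodability is a property of the blob itself, so a decode
// failure is terminal rather than a reason to retry another relay.
func fetchAndDecode(relays []blobResponse) (string, error) {
	for _, resp := range relays {
		if !resp.commitmentOK {
			// Bad bytes from this relay; another relay may serve valid ones.
			continue
		}
		if !resp.decodable {
			// Commitment verified, so every honest relay serves these same
			// bytes; retrying elsewhere cannot help.
			return "", errors.New("blob verified but undecodable; do not retry")
		}
		return "payload", nil
	}
	return "", errors.New("no relay returned verifiable bytes")
}

func main() {
	_, err := fetchAndDecode([]blobResponse{{commitmentOK: true, decodable: false}})
	fmt.Println(err)
}
```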

}

// BlobFromBytes initializes a Blob from bytes
//
// blobLength is the length of the blob IN SYMBOLS
func BlobFromBytes(bytes []byte, blobLength uint32) (*Blob, error) {
poly, err := coeffPolyFromBytes(bytes)
if err != nil {
return nil, fmt.Errorf("polynomial from bytes: %w", err)
}

return BlobFromPolynomial(poly, blobLength)
}

// BlobFromPolynomial initializes a blob from a polynomial
//
// blobLength is the length of the blob IN SYMBOLS
func BlobFromPolynomial(coeffPolynomial *coeffPoly, blobLength uint32) (*Blob, error) {
return &Blob{
coeffPolynomial: coeffPolynomial,
blobLength: blobLength,
}, nil
}

// GetBytes gets the raw bytes of the Blob
func (b *Blob) GetBytes() []byte {
return b.coeffPolynomial.getBytes()
}
Contributor:

have a preference for Bytes() without the get prefix

Contributor:

or toBytes() to be consistent with ToPayload?

Contributor (author):

@samlaf How do you feel about using the serialize / deserialize nomenclature? #1120 (comment)

Contributor:

Oh yes that’s even better!

Contributor:

Just realized now that golang has standard interfaces for serialization: MarshalJSON() and MarshalBinary(). Might be worth it to add a comment as to why we define our own function that isn't marshalBinary (perhaps we should just implement MarshalBinary() instead? so that people could encode our structs to gob if they need to for whatever reason).

Also the fact that we need that extra length param to unmarshal is another consequence of this weird "edge case that'll never happen". If we changed that requirement then this code would also be much simpler and more standard Go. Again, we can discuss (perhaps with your brother). I might be missing something.
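For reference, implementing the standard encoding.BinaryMarshaler / encoding.BinaryUnmarshaler pair on a simplified stand-in type could look like the sketch below. The `blob` struct and its length-prefixed wire layout are illustrative assumptions, not the PR's actual format; the real Blob wraps a coefficient polynomial, and the point is only that the symbol length must travel with the bytes since it cannot be inferred once trailing zeros may be truncated.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blob is a simplified stand-in for the PR's Blob type.
type blob struct {
	data              []byte
	blobLengthSymbols uint32
}

// MarshalBinary implements encoding.BinaryMarshaler by prepending the
// symbol length as a big-endian uint32.
func (b *blob) MarshalBinary() ([]byte, error) {
	out := make([]byte, 4+len(b.data))
	binary.BigEndian.PutUint32(out[:4], b.blobLengthSymbols)
	copy(out[4:], b.data)
	return out, nil
}

// UnmarshalBinary implements encoding.BinaryUnmarshaler.
func (b *blob) UnmarshalBinary(in []byte) error {
	if len(in) < 4 {
		return fmt.Errorf("input too short: %d bytes", len(in))
	}
	b.blobLengthSymbols = binary.BigEndian.Uint32(in[:4])
	b.data = append([]byte(nil), in[4:]...)
	return nil
}

func main() {
	original := &blob{data: []byte{1, 2, 3}, blobLengthSymbols: 2}
	raw, _ := original.MarshalBinary()

	var decoded blob
	_ = decoded.UnmarshalBinary(raw)
	fmt.Println(decoded.blobLengthSymbols, decoded.data)
}
```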

Contributor (author @litt3, Feb 18, 2025):

I don't think it's worth implementing an interface just because it exists, without a concrete use case. I suppose it would be fine to add a comment saying why we don't implement the interface, but it doesn't seem necessary: throughout our whole codebase, there are no implementations of BinaryMarshaler: I think the implicit explanation for why it's not implemented is that we don't need it. @cody-littley any strong opinions here?

Contributor:

I don't feel strongly one way or another about the golang interfaces, but I'm generally in favor of avoiding code complexity that doesn't have a justification on the short-to-medium term roadmap.

Perhaps also relevant, the string MarshalBinary doesn't appear anywhere in the eigenDA repo right now, so this isn't a pattern we are currently using.


// ToPayload converts the Blob into a Payload
//
// The payloadStartingForm indicates how payloads are constructed by the dispersing client. Based on the starting form
// of the payload, we can determine what operations must be done to the blob in order to reconstruct the original payload
func (b *Blob) ToPayload(payloadStartingForm PolynomialForm) (*Payload, error) {
var encodedPayload *encodedPayload
var err error
switch payloadStartingForm {
case PolynomialFormCoeff:
// the payload started off in coefficient form, so no conversion needs to be done
encodedPayload, err = b.coeffPolynomial.toEncodedPayload(b.blobLength)
if err != nil {
return nil, fmt.Errorf("coeff poly to encoded payload: %w", err)
}
case PolynomialFormEval:
// the payload started off in evaluation form, so we first need to convert the blob's coeff poly into an eval poly
evalPoly, err := b.coeffPolynomial.toEvalPoly(b.blobLength)
if err != nil {
return nil, fmt.Errorf("coeff poly to eval poly: %w", err)
}

encodedPayload, err = evalPoly.toEncodedPayload(b.blobLength)
if err != nil {
return nil, fmt.Errorf("eval poly to encoded payload: %w", err)
}
default:
return nil, fmt.Errorf("invalid polynomial form")
}

payload, err := encodedPayload.decode()
if err != nil {
return nil, fmt.Errorf("decode payload: %w", err)
}

return payload, nil
}
14 changes: 7 additions & 7 deletions api/clients/codecs/blob_codec.go
@@ -4,22 +4,22 @@ import (
"fmt"
)

type BlobEncodingVersion byte
type PayloadEncodingVersion byte

const (
// This minimal blob encoding contains a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// PayloadEncodingVersion0 entails a 32 byte header = [0x00, version byte, big-endian uint32 len of payload, 0x00, 0x00,...]
Contributor:

can you add comment as to why we add the first 0x00 byte (so that it gets parsed as a bn254 FE). Took me a second to remember this

// followed by the encoded data [0x00, 31 bytes of data, 0x00, 31 bytes of data,...]
DefaultBlobEncoding BlobEncodingVersion = 0x0
PayloadEncodingVersion0 PayloadEncodingVersion = 0x0
)

type BlobCodec interface {
DecodeBlob(encodedData []byte) ([]byte, error)
EncodeBlob(rawData []byte) ([]byte, error)
}

func BlobEncodingVersionToCodec(version BlobEncodingVersion) (BlobCodec, error) {
func BlobEncodingVersionToCodec(version PayloadEncodingVersion) (BlobCodec, error) {
switch version {
case DefaultBlobEncoding:
case PayloadEncodingVersion0:
return DefaultBlobCodec{}, nil
Comment on lines +24 to 25
Contributor:

this looks a bit weird now. Why is payload version 0 mapping to a defaultBlobCodec? Should we use BlobCodecVersion0 instead? Or do we want separate versioning for the payloadEncoding and BlobCodec? Or are these things the same?

Contributor (author @litt3, Feb 13, 2025):

I renamed the constant to what I felt made sense for the new code, but the name doesn't work so well in the old blob_codec code. If I don't rename the constant, then the name doesn't work so well in the new code

Or are these things the same?

Yes, they are essentially the same thing.

Contributor:

I see! Think I'm missing the venn diagram of which code is used for v1, which is used for v2, and which is used for both. is it not possible to completely separate v1 and v2 stuff?

default:
return nil, fmt.Errorf("unsupported blob encoding version: %x", version)
@@ -33,7 +33,7 @@ func GenericDecodeBlob(data []byte) ([]byte, error) {
// version byte is stored in [1], because [0] is always 0 to ensure the codecBlobHeader is a valid bn254 element
// see https://github.com/Layr-Labs/eigenda/blob/master/api/clients/codecs/default_blob_codec.go#L21
// TODO: we should prob be working over a struct with methods such as GetBlobEncodingVersion() to prevent index errors
version := BlobEncodingVersion(data[1])
version := PayloadEncodingVersion(data[1])
codec, err := BlobEncodingVersionToCodec(version)
if err != nil {
return nil, err
@@ -49,7 +49,7 @@

// CreateCodec creates a new BlobCodec based on the defined polynomial form of payloads, and the desired
// BlobEncodingVersion
func CreateCodec(payloadPolynomialForm PolynomialForm, version BlobEncodingVersion) (BlobCodec, error) {
func CreateCodec(payloadPolynomialForm PolynomialForm, version PayloadEncodingVersion) (BlobCodec, error) {
lowLevelCodec, err := BlobEncodingVersionToCodec(version)
if err != nil {
return nil, fmt.Errorf("create low level codec: %w", err)
39 changes: 39 additions & 0 deletions api/clients/codecs/blob_test.go
@@ -0,0 +1,39 @@
package codecs

import (
"testing"

"github.com/Layr-Labs/eigenda/common/testutils/random"
"github.com/stretchr/testify/require"
)

// TestBlobConversion checks that internal blob conversion methods produce consistent results
func TestBlobConversion(t *testing.T) {
testRandom := random.NewTestRandom(t)

iterations := 1000

for i := 0; i < iterations; i++ {
originalData := testRandom.Bytes(testRandom.Intn(1024) + 1)
testBlobConversionForForm(t, originalData, PolynomialFormEval)
testBlobConversionForForm(t, originalData, PolynomialFormCoeff)
}
}

func testBlobConversionForForm(t *testing.T, payloadBytes []byte, form PolynomialForm) {
payload := NewPayload(payloadBytes)

blob, err := payload.ToBlob(form)
require.NoError(t, err)

blobBytes := blob.GetBytes()
blobFromBytes, err := BlobFromBytes(blobBytes, blob.blobLength)
require.NoError(t, err)

decodedPayload, err := blobFromBytes.ToPayload(form)
require.NoError(t, err)

decodedPayloadBytes := decodedPayload.GetBytes()

require.Equal(t, payloadBytes, decodedPayloadBytes)
}
63 changes: 63 additions & 0 deletions api/clients/codecs/coeff_poly.go
@@ -0,0 +1,63 @@
package codecs

import (
"fmt"

"github.com/Layr-Labs/eigenda/encoding/fft"
"github.com/Layr-Labs/eigenda/encoding/rs"
"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

// coeffPoly is a polynomial in coefficient form.
//
// The underlying bytes represent 32 byte field elements, and each field element represents a coefficient
type coeffPoly struct {
fieldElements []fr.Element
}

// coeffPolyFromBytes creates a new polynomial from bytes. This function performs the necessary checks to guarantee that the
// bytes are well-formed, and returns a new object if they are
func coeffPolyFromBytes(bytes []byte) (*coeffPoly, error) {
fieldElements, err := rs.ToFrArray(bytes)
if err != nil {
return nil, fmt.Errorf("deserialize field elements: %w", err)
}

return &coeffPoly{fieldElements: fieldElements}, nil
}

// coeffPolyFromElements creates a new coeffPoly from field elements.
func coeffPolyFromElements(elements []fr.Element) *coeffPoly {
return &coeffPoly{fieldElements: elements}
}

// toEvalPoly converts a coeffPoly to an evalPoly, using the FFT operation
//
// blobLength (in SYMBOLS) is required, to be able to choose the correct parameters when performing FFT
func (p *coeffPoly) toEvalPoly(blobLength uint32) (*evalPoly, error) {
// TODO (litt3): this could conceivably be optimized, so that multiple objects share an instance of FFTSettings,
// which has enough roots of unity for general use. If the following construction of FFTSettings ever proves
// to present a computational burden, consider making this change.
fftSettings := fft.FFTSettingsFromBlobLength(blobLength)

// the FFT method pads to the next power of 2, so we don't need to do that manually
fftedElements, err := fftSettings.FFT(p.fieldElements, false)
if err != nil {
return nil, fmt.Errorf("perform FFT: %w", err)
}

return evalPolyFromElements(fftedElements), nil
}

// GetBytes returns the bytes that underlie the polynomial
func (p *coeffPoly) getBytes() []byte {
return rs.FieldElementsToBytes(p.fieldElements)
}

// toEncodedPayload converts a coeffPoly into an encoded payload
//
// blobLength is required, to be able to perform length checks on the encoded payload during construction.
// blobLength is in symbols, NOT bytes
func (p *coeffPoly) toEncodedPayload(blobLength uint32) (*encodedPayload, error) {
return encodedPayloadFromElements(p.fieldElements, blobLength)
}
52 changes: 52 additions & 0 deletions api/clients/codecs/conversion_test.go
@@ -0,0 +1,52 @@
package codecs

import (
"bytes"
"testing"

"github.com/Layr-Labs/eigenda/common/testutils/random"
"github.com/stretchr/testify/require"
)

// TestConversionConsistency checks that data can be encoded and decoded repeatedly, always getting back the original data
// TODO: we should probably be using fuzzing instead of this kind of ad-hoc random search testing
func TestConversionConsistency(t *testing.T) {
testRandom := random.NewTestRandom(t)

iterations := 100

for i := 0; i < iterations; i++ {
originalData := testRandom.Bytes(testRandom.Intn(1024) + 1) // ensure it's not length 0

payload := NewPayload(originalData)

blob1, err := payload.ToBlob(PolynomialFormEval)
require.NoError(t, err)

blob2, err := payload.ToBlob(PolynomialFormCoeff)
require.NoError(t, err)

decodedPayload1, err := blob1.ToPayload(PolynomialFormEval)
require.NoError(t, err)

decodedPayload2, err := blob2.ToPayload(PolynomialFormCoeff)
require.NoError(t, err)

// Compare the original data with the decoded data
if !bytes.Equal(originalData, decodedPayload1.GetBytes()) {
t.Fatalf(
"Iteration %d: original and data decoded from blob1 do not match\nOriginal: %v\nDecoded: %v",
i,
originalData,
decodedPayload1.GetBytes())
}

if !bytes.Equal(originalData, decodedPayload2.GetBytes()) {
t.Fatalf(
"Iteration %d: original and data decoded from blob2 do not match\nOriginal: %v\nDecoded: %v",
i,
originalData,
decodedPayload2.GetBytes())
}
}
}
2 changes: 1 addition & 1 deletion api/clients/codecs/default_blob_codec.go
@@ -22,7 +22,7 @@ func (v DefaultBlobCodec) EncodeBlob(rawData []byte) ([]byte, error) {
codecBlobHeader := make([]byte, 32)
// first byte is always 0 to ensure the codecBlobHeader is a valid bn254 element
// encode version byte
codecBlobHeader[1] = byte(DefaultBlobEncoding)
codecBlobHeader[1] = byte(PayloadEncodingVersion0)
Contributor:

doesn't this function EncodeBlob need to be changed? it returns []byte, shouldn't it return a Blob in your type system? Or whatever else this is (prob just an encodedPayload?)

Contributor (author):

default_blob_codec will not be used in v2, all encoding is contained in the new structs

Contributor:

I see... shouldn't we separate v1 and v2 codecs then? Why are they all in the same package? Or am I missing something?


// encode length as uint32
binary.BigEndian.PutUint32(codecBlobHeader[2:6], uint32(len(rawData))) // uint32 should be more than enough to store the length (approx 4gb)