Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential for resource exhaustion vulnerability in case of a byzantine Execution Node #6899

Open
AlexHentschel opened this issue Jan 15, 2025 · 2 comments
Labels
Epic Execution Cadence Execution Team Preserve Stale Bot repellent Protocol Team: Issues assigned to the Protocol Pillar. S-BFT

Comments

@AlexHentschel
Copy link
Member

AlexHentschel commented Jan 15, 2025

Problem description

Please take a look at the DefaultSerializer in module/executiondatasync/execution_data/serializer.go

It defines its own encoding without the cbor flag cbor.ExtraDecErrorUnknownField (see networking layer's decoder configuration for comparison). My understanding is that the DefaultSerializer not strict in that it would just ignore data in the decoding step that is for an unknown struct field.
In PR #6879, we have added the following warning to the DefaultSerializer:

// DefaultSerializer is the default implementation for an Execution Data serializer.
// It is configured to use cbor encoding with LZ4 compression.
// 
// CAUTION: this encoding should only be used for encoding/decoding data within a node.
// If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
// to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
// encoding, which would not be noticed by the recipient because the garbage data is dropped 
// at the decoding step - yet, it consumes the recipient's networking bandwidth. 
var DefaultSerializer Serializer

Though, the problem remains that the usage of the DefaultSerializer in our code base is potentially unsafe and may expose a variety of nodes roles to undetectable resource exhaustion attacks.

Details on areas that may be vulnerable to resource exhaustion attacks

I haven't scanned through the code entirely and properly analyzed every usage, but based on a preliminary investigation, I think we may have major vulnerabilities for a resource exhaustion attacks in our code base 😱 ... At least, there is no argument provided in the code, why the implementation is not vulnerable, which would be the required level of rigour. Here is my understanding:

  • The DefaultSerializer is used in a variety of places Screenshot 2025-01-15 at 3 05 41 PM
  • My understanding is that the DefaultSerializer is used ExecutionDataStore, which in turn is used for chopping the execution data into blobs and sharing that data via bitswap with other nodes (?)
  • So a byzantine execution node could include a bunch of garbage data in the serialized data structure, which would be dropped by the recipient during the decoding and hence not noticed. Though, a byzantine data provider could easily consume the recipients entire networking bandwidth and thereby render it unresponsive.

Nodes that could be vulnerable:

  • verification nodes (receiving execution data)
  • access nodes (receiving execution data)
  • observer nodes (receiving execution data)

Nodes that could potentially mount the attack:

  • byzantine Execution Node
  • honest Verification, Access, Observer and Archive Nodes might unknowingly amplify an attack (if they nodes are authorized to provide data blobs, which is desired to reduce load from Execution Nodes where this data originates from), by sharing bloated data blobs

Definition of Done

This is an epic. The issue requires further investigation and scoping.

  1. investigate all usages of DefaultSerializer (outside of tests)
  2. fix usages that are vulnerable to resource exhaustion attacks
  3. For every usages of DefaultSerializer that are not vulnerable, a conclusive argument must be provided in the code explaining why the implementation is not vulnerable. A strong argument of byzantine-fault-tolerance is the required level of rigour for a high-assurance BFT software system. Absence of known vulnerabilities if insufficient.

I want to emphasize that I don't like that the word "default" is in the name of the DefaultSerializer struct. This is because components used by "default" should be BFT, which is not the case for the DefaultSerializer (principle of "Safety By Default").

@AlexHentschel AlexHentschel added Epic Execution Cadence Execution Team Preserve Stale Bot repellent Protocol Team: Issues assigned to the Protocol Pillar. S-BFT labels Jan 15, 2025
@fxamacker
Copy link
Member

fxamacker commented Jan 19, 2025

It defines its own encoding without the cbor flag cbor.ExtraDecErrorUnknownField

Thanks Alex! I'm not familiar with DefaultSerializer, its requirements, etc.

However, here is general info for people looking into using CBOR and security considerations.

As mentioned last year in my comment to @AlexHentschel on Feb 23, 2024:

The CBOR codec we use has configurable settings for security, deterministic encoding, etc.

The README of https://github.com/fxamacker/cbor has more info about security considerations, cbor settings, and trade-offs (speed vs security, etc.)

The README also includes a Quick Start Guide, links to CBOR RFCs, etc.

@fxamacker
Copy link
Member

fxamacker commented Jan 27, 2025

I defer to others on DefaultSerializer and the networking layer's decoder.

As a separate example, in onflow/cadence the CCF codec uses cbor and it is designed to not rely on cbor.ExtraDecErrorUnknownField.

CCF decoder doesn't need it to ignore unknown fields, so the cbor.ExtraDecErrorUnknownField flag is intentionally not used in CCF codec's default cbor decoder settings:

// defaultCBORDecMode
//
// See https://github.com/fxamacker/cbor:
// "For best performance, reuse EncMode and DecMode after creating them."
//
// Security Considerations in Section 10 of RFC 8949 states:
//
// "Hostile input may be constructed to overrun buffers, to overflow or underflow integer arithmetic,
// or to cause other decoding disruption. CBOR data items might have lengths or sizes that are
// intentionally extremely large or too short. Resource exhaustion attacks might attempt to lure a
// decoder into allocating very big data items (strings, arrays, maps, or even arbitrary precision numbers)
// or exhaust the stack depth by setting up deeply nested items. Decoders need to have appropriate resource
// management to mitigate these attacks."
var defaultCBORDecMode = func() cbor.DecMode {
	decMode, err := cbor.DecOptions{
		IndefLength:      cbor.IndefLengthForbidden,
		IntDec:           cbor.IntDecConvertNone,
		MaxArrayElements: 20_000_000, // 20 MB is current grpc size limit so this is more than enough
		MaxMapPairs:      20_000_000, // 20 MB is current grpc size limit so this is more than enough
		MaxNestedLevels:  math.MaxInt16,
	}.DecMode()
	if err != nil {
		panic(err)
	}
	return decMode
}()

// Decoder decodes CCF-encoded representations of Cadence values.
// Since CBOR security considerations apply to CCF, the CBOR
// codec used by CCF Decoder uses limits (e.g. MaxArrayElements,
// MaxMapPairs, MaxNestedLevels) specified by CBORDecMode.

CCF codec doesn't rely on cbor.ExtraDecErrorUnknownField flag because:

  • CCF Specification (onflow/ccf) defines known fields (in RFC 8610 CDDL notation).
  • CCF Codec (onflow/cadence) treats unknown fields as error.

Security considerations noted by CCF specs and CCF codec are from RFC 8949 and docs at fxamacker/cbor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Execution Cadence Execution Team Preserve Stale Bot repellent Protocol Team: Issues assigned to the Protocol Pillar. S-BFT
Projects
None yet
Development

No branches or pull requests

2 participants