feat: note on DA, DAS, erasure codes #135 (Merged)

Commits (7):

- 9bb06e0 feat: note on DA, DAS, erasure codes (ultraviolet10)
- 36c7eaf Merge branch 'main' into ac/da (ultraviolet10)
- 8e1f766 add resource links (ultraviolet10)
- 16c01db Merge branch 'ac/da' of https://github.com/ultraviolet10/protocol-stu… (ultraviolet10)
- 75a6cc0 Merge branch 'main' into ac/da (ultraviolet10)
- fa8cf5a fix: detail (ultraviolet10)
- e32a62d DA edits (taxmeifyoucan)

# Data Availability

A core responsibility of any layer-1 blockchain is to guarantee _data availability_.

Data availability refers to the availability of block data, i.e. headers and bodies containing the transaction tree. During synchronization, nodes download each block from their peers, verify it, and add it to the local database. If a received block contains incorrect transactions or violates any block validity rule, it is rejected as invalid. Validating blocks during sync is necessary for a node to operate trustlessly and to build the current state, which requires both historical and latest data.

Ethereum solves the data availability problem by requiring full nodes to download each block (and reject it if part of the data is unavailable). Each node can simply request a specific block from its peers over the p2p network and verify it locally.

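A minimal sketch of this download-and-verify loop, assuming hypothetical `peer.fetch_block` and `is_valid` helpers (placeholders for illustration, not actual client code):

```python
# Hypothetical sketch of a full node enforcing data availability during sync.
# `peer.fetch_block` and `is_valid` are illustrative placeholders.

def sync_block(peer, block_hash, db):
    block = peer.fetch_block(block_hash)   # request header + body over p2p
    if block is None or block.body is None:
        return False                       # data unavailable: reject the block
    if not is_valid(block, db):            # re-execute transactions and check
        return False                       # every block validity rule locally
    db.store(block)                        # extend the local chain and state
    return True
```
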
Data availability becomes a bottleneck for scalable systems like rollups, which aim to process large amounts of data. Optimistic rollups require transaction data to be available for their fraud-proof mechanism to identify and prove an incorrect state transition. Even validity-proof-based systems require data availability to ensure liveness (e.g., to prove asset ownership for a rollup's escape hatch or forced transaction inclusion mechanism).

Thus, the data availability problem for rollups (and other off-chain scaling protocols) is proving to the base layer (L1) that the data needed to recreate the L2 state is available, without requiring L1 nodes to download blocks and store copies of the data.

## The Ethereum Approach (pre-4844)

Rollups on Ethereum are operated by a _sequencer_: a full node that accepts transactions from users on an L2, processes those transactions (using the rollup's virtual machine), and produces L2 blocks that update the state of contracts and accounts on the rollup.

To enable Ethereum to monitor and enforce a rollup's state transitions, a special "consensus" contract stores a state root: a cryptographic commitment to the rollup's canonical state (similar to a block header on the L1 chain), generated after processing all transactions in a new batch.

At intervals, the sequencer arranges transactions into a new batch and publishes it to the L1 by calling the L1 contract that stores rollup batches, passing the compressed data as _calldata_ to the batch submission function. This effectively stores L2 data on the L1 to ensure its availability.

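A hedged sketch of that submission path using web3.py; the RPC endpoint, contract address, ABI, and the `appendSequencerBatch` function name are illustrative assumptions, not any particular rollup's interface:

```python
# Illustrative sketch of a sequencer posting a compressed batch as calldata.
# Endpoint, address, ABI, and function name are assumptions, not a real API.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://l1-rpc.example"))         # placeholder RPC
inbox = w3.eth.contract(address=INBOX_ADDRESS, abi=INBOX_ABI)  # rollup's L1 inbox

compressed = compress(l2_transactions)   # hypothetical batch compression step

tx = inbox.functions.appendSequencerBatch(compressed).build_transaction({
    "from": SEQUENCER_ADDRESS,
    "nonce": w3.eth.get_transaction_count(SEQUENCER_ADDRESS),
})
# Signing and sending omitted; the point is that the batch bytes travel as
# calldata, so every L1 node downloads and retains them with the transaction.
```
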
Calldata imposed a significant cost overhead on L2s, which is why _blobs_ were later introduced as an alternative. Compression techniques, such as removing redundancy, using bitmaps, exploiting common function signatures, employing scientific notation for large numbers, and utilizing contract storage for repetitive data, reduced calldata size and led to substantial gas savings.

## Data Availability Sampling

The purpose of sampling is to give a node a probabilistic guarantee that data is available. Light nodes do require a connection to other (honest) nodes in order to sample. The same is true for full nodes in all blockchains: they don't conduct data availability sampling, but they require a connection to at least one other node that will provide them with historical data over the p2p network.

In DAS, each node only ends up downloading a small portion of the data, but the sampling is done _client-side_, and within each blob rather than between blobs. Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, it privately selects N random indices in the blob (e.g., N = 20) and attempts to download the data at just those positions.

The goal of this task is to verify that at least half of the data in each blob is _available_.

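A minimal sketch of the client-side procedure (the `fetch_chunk` network call is a hypothetical placeholder). It also shows why a small N already gives a strong guarantee: if less than half of a blob is available, each random sample succeeds with probability below 1/2, so all N samples succeed with probability below (1/2)^N.

```python
import random

def sample_blob(blob_id, num_chunks, fetch_chunk, n_samples=20):
    """Client-side DAS sketch: accept a blob only if `n_samples` randomly
    chosen chunks can all be retrieved. `fetch_chunk(blob_id, index)` is a
    hypothetical p2p request returning chunk bytes, or None if unavailable."""
    indices = random.sample(range(num_chunks), n_samples)  # private random picks
    return all(fetch_chunk(blob_id, i) is not None for i in indices)

# Chance of being fooled when under half of the blob is actually available:
print(0.5 ** 20)   # ~9.5e-07, i.e. less than one in a million per blob
```
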
![data-availability-sampling](../img/scaling/das.png)

### Erasure Coding

Erasure coding allows us to encode blobs in such a way that if at least half of the data in a blob is published, anyone in the network can reconstruct and re-publish the rest of the data; once that re-published data propagates, the clients that initially rejected the blob will converge to accept it.

At a high level, an erasure-correcting code works like this: a vector of `k` information chunks gets encoded into a _longer_ vector of `n` coded chunks. The rate `R = k/n` of the code measures the redundancy introduced (a lower rate means more redundancy). Subsequently, from certain subsets of the coded chunks, we can decode the original information chunks.

If the code is [_maximum distance separable_](https://www.johndcook.com/blog/2020/03/07/mds-codes/) (MDS), then the original information chunks can be recovered from any subset of `k` coded chunks, a useful efficiency and robustness guarantee. The concept rests on the simple principle that if you know 2 points of a line, you can uniquely determine the entire line. In other words: once you know the evaluation of a polynomial of degree less than `t` at `t` distinct locations, you can obtain its evaluation at any other location (by first recovering the polynomial, and then evaluating it).

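A toy sketch of such an MDS code (Reed–Solomon style) over a prime field: the `k` information chunks are a polynomial's evaluations at positions `0..k-1`, the extra coded chunks are its evaluations further along, and Lagrange interpolation recovers everything from any `k` of the `n` chunks. Production systems use much larger fields and FFT-based encoding; this only makes the "two points determine a line" principle concrete.

```python
P = 2**61 - 1  # a prime field for illustration; real systems use larger fields

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through the given (xi, yi) pairs, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def extend(chunks, n):
    """Systematic encoding: coded chunk i is the polynomial's value at x = i,
    so positions 0..k-1 hold the original data and k..n-1 the redundancy."""
    pts = list(enumerate(chunks))
    return chunks + [lagrange_eval(pts, x) for x in range(len(chunks), n)]

# Any k of the n coded chunks recover everything (here k=2, n=4: a "line"):
data = [5, 9]                              # k = 2 information chunks
coded = extend(data, 4)                    # n = 4 coded chunks
subset = [(1, coded[1]), (3, coded[3])]    # keep just two arbitrary positions
assert [lagrange_eval(subset, x) for x in range(2)] == data
```
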
We require blocks to commit to the Merkle root of this "extended" data, and have light clients probabilistically check that the majority of the extended data is available, after which one of three things is true:

- The entire extended data is available, the erasure code is constructed correctly, and the block is valid.
- The entire extended data is available, the erasure code is constructed correctly, but the block is invalid.
- The entire extended data is available, but the erasure code is constructed incorrectly.

In case (1), the block is valid and the light client can accept it. In case (2), it is expected that some other node will quickly construct and relay a fraud proof. In case (3), it is also expected that some other node will quickly construct and relay a specialized kind of fraud proof that shows that the erasure code is constructed incorrectly. If a light client receives no fraud proofs for some time, it will take that as evidence that the block is in fact valid.

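The Merkle commitment is what makes each sample trustworthy: a peer answering a sample must supply the chunk together with a Merkle branch to the committed root, so it cannot respond with bogus data. A minimal sketch of that check (the hash layout is an illustrative choice, assuming a power-of-two chunk count, not any particular client's format):

```python
import hashlib

def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(chunks):
    """Root over the extended data; assumes len(chunks) is a power of two."""
    layer = [h(c) for c in chunks]
    while len(layer) > 1:
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def verify_branch(root, chunk, index, branch):
    """Check that `chunk` sits at `index` under `root`, given the sibling
    hashes from leaf to root -- what a sampling client runs per response."""
    node = h(chunk)
    for sibling in branch:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root
```
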
An interesting implementation to study can be found [here](https://github.com/ethereum/research/tree/master/erasure_code/ec65536).

Other resources:

- https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding#what-is-the-data-availability-problem
- https://hackmd.io/@alexbeckett/a-brief-data-availability-and-retrievability-faq
- https://arxiv.org/abs/1809.09044
- https://ethereum.org/en/developers/docs/data-availability/

Review comment: Please add a section for resources. There are many great articles on DA and solutions you explained here. Ethorg does a great job on DA https://ethereum.org/en/developers/docs/data-availability/

Reply: Added a few interesting ones, lmk what you think!