
Enable a sequencer to re-sequence batches #894

Merged: 21 commits into 0xPolygonHermez:zkevm on Sep 20, 2024

Conversation

@cffls commented Aug 1, 2024

Feature PR for #667

Overall workflow:

  • Stop sequencer

  • Unwind to the target batch, using the command in https://github.com/0xPolygonHermez/cdk-erigon/blob/zkevm/cmd/integration/commands/stage_stages_zkevm.go, e.g.:

    CDK_ERIGON_SEQUENCER=1 go run ./cmd/integration state_stages_zkevm --config=/path/to/config.yaml --unwind-batch-no=8 --chain dynamic-kurtosis --datadir /path/to/chain/data

  • Backup data stream files in case the files are corrupted during re-sequencing

  • Restart sequencer

  • The sequencer will detect rollback and start re-sequencing automatically

  • On re-sequencing, L2 blocks will either be the same as before or be split into multiple blocks, but they can never be merged. Some properties of the rollback behavior (a minimal sketch of these checks follows this list):

    • If a block is split into multiple blocks after re-sequencing, the split blocks will have the same timestamp as the original block.
    • A new L2 block won’t include transactions that belonged to different blocks that got rolled back.
    • A new sequenced batch won’t include blocks that belonged to different batches that got rolled back.
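
To make the block-level rollback properties concrete, here is a minimal Go sketch of the kind of invariant check they imply. The `Block` type and `checkResequencedBlock` function are hypothetical and purely illustrative; they are not the actual types or code used in cdk-erigon.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a simplified, hypothetical stand-in for an L2 block; the real
// structures in cdk-erigon differ.
type Block struct {
	BatchNo   uint64
	Timestamp uint64
	TxHashes  []string
}

// checkResequencedBlock verifies the two block-level properties listed
// above: a block produced by splitting keeps the original block's
// timestamp, and it only contains transactions from that single
// original block.
func checkResequencedBlock(newBlock, original Block) error {
	if newBlock.Timestamp != original.Timestamp {
		return errors.New("split block must keep the original block's timestamp")
	}
	originalTxs := make(map[string]struct{}, len(original.TxHashes))
	for _, h := range original.TxHashes {
		originalTxs[h] = struct{}{}
	}
	for _, h := range newBlock.TxHashes {
		if _, ok := originalTxs[h]; !ok {
			return fmt.Errorf("tx %s did not belong to the original block", h)
		}
	}
	return nil
}

func main() {
	original := Block{BatchNo: 8, Timestamp: 1700000000, TxHashes: []string{"0xaa", "0xbb"}}
	split := Block{BatchNo: 8, Timestamp: 1700000000, TxHashes: []string{"0xaa"}}
	fmt.Println(checkResequencedBlock(split, original)) // prints <nil>
}
```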

Tested the following rollback scenarios in kurtosis (a sketch of the strict-mode behavior follows the list).
The original chain (before re-sequencing) contains both empty blocks and non-empty blocks with uniswap transactions.

  1. Condition: ResequenceStrict=true. No L1InfoUpdate on L1. No counter limit change.
    Result: Re-sequenced L2 blocks have exactly the same block hashes as previously created.
  2. Condition: ResequenceStrict=true. No L1InfoUpdate on L1. Lowered counter limit.
    Result: An error was raised because a previous batch had to be split into two batches.
  3. Condition: ResequenceStrict=false. No L1InfoUpdate on L1. Lowered counter limit.
    Result: Re-sequencing completed with more batches than before.
  4. Condition: ResequenceStrict=true. Multiple L1InfoUpdates on L1. No counter limit change.
    Result: The same number of blocks and batches were re-sequenced. L2 block hashes were the same.
  5. Condition: ResequenceStrict=false. Multiple L1InfoUpdates on L1. No counter limit change.
    Result: The same number of blocks and batches were re-sequenced. L2 block hashes were different because different L1InfoUpdate indexes were used.

@Stefan-Ethernal (Collaborator) left a comment

Leaving some early feedback.

Review threads on cmd/utils/flags.go, zk/datastream/client/stream_client.go, zk/tx/tx.go, and zk/stages/stage_sequence_execute.go (all resolved).
@cffls marked this pull request as ready for review on August 2, 2024 at 15:29
@hexoscott (Collaborator) commented Aug 5, 2024

Hey @cffls a few thoughts on the changes here:

  • the sequencer is the master source of the datastream on the network, so connecting to itself using a stream client to get the batches to re-sequence might not be a good approach.
  • the sequencer code already has quite a few branches in logic around limbo and L1 recovery, so a third re-sequencing branch might get confusing.

To my knowledge the process for marking a batch as bad would be along the lines of:

  • restart sequencer with a flag to mark an explicit stop batch height
  • allow it to run to this height where it will pause
  • let the sequence sender etc. catch up and get the network in a verified state
  • stop the sequencer
  • make the TX to the L1 to mark a batch as bad
  • wait for the block including this TX to be finalised on the L1
  • start the sequencer in L1 recovery mode...

This last step, in my head, would start the code we already have for L1 recovery, but some new L1 tracking code would be scanning the L1 for this bad-batch event and storing it in the database. Once we know the batch is bad, we can immediately roll back the datastream to the last good batch, along with the database, using the normal unwind feature. From there the normal L1 recovery code would start, but as part of our "bad batch" checks during recovery we'd have one more check for this scenario; when we encounter it we simply mark the batch as bad, as we do already, and move on with normal recovery. This way blocks and batches would be an exact replica of the L1 data, apart from skipping the bad batch. Once L1 recovery has completed to the previous tip, you can restart the sequencer as normal and the network continues.

I think this approach works out simpler and we don't need to do anything fancy around splitting blocks or handling timestamps, we just recreate the L1 exactly as it was before. This has the advantage of also handling things like parent hashes automatically without fuss.
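
A rough, purely illustrative Go outline of the flow described above (not code from this PR; the names are invented): detect the bad batch from the L1 event, unwind to the last good batch, then replay from L1 while skipping the bad one.

```go
package main

import "fmt"

// recoveryCtx and recoverFromBadBatch are hypothetical; they only trace the
// steps described in the comment above.
type recoveryCtx struct {
	badBatch      uint64 // batch marked as bad via the L1 transaction
	lastGoodBatch uint64 // batch to unwind the datastream and db back to
}

func recoverFromBadBatch(ctx recoveryCtx, previousTip uint64) {
	// Roll back the datastream and database using the normal unwind feature.
	fmt.Printf("unwinding datastream and db to batch %d\n", ctx.lastGoodBatch)

	// Normal L1 recovery then replays batches exactly as they were sequenced,
	// with one extra check that skips the batch marked bad.
	for batch := ctx.lastGoodBatch + 1; batch <= previousTip; batch++ {
		if batch == ctx.badBatch {
			fmt.Printf("batch %d marked bad, skipping\n", batch)
			continue
		}
		fmt.Printf("replaying batch %d from L1 data\n", batch)
	}
}

func main() {
	recoverFromBadBatch(recoveryCtx{badBatch: 9, lastGoodBatch: 8}, 11)
}
```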

What do you think?

@hexoscott (Collaborator) commented

I think maybe something has been missed here with regard to the banana fork requirement that removing the batch is something that happens on the L1 to affect the chain's history.

@hexoscott (Collaborator) left a comment

Hi @cffls - after the discussion yesterday and looking over the changes in a new light, it's looking good and making a lot more sense approach-wise to me now.

There are a couple of things that would help here I think:

  1. Making this process really explicit by using a flag (sketched after this comment); if it isn't set, the sequencer won't perform the checks to see if it is behind and will just behave as normal.
  2. Rather than using a datastream client pointing back to itself, we have the data_stream_server.go file that should let you work with the stream without needing the client, and it will work directly with the files on disk. That approach seems cleaner to me.

If we can make those changes I think we'll be in a safe place to test this and get it merged.
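
To illustrate the two suggestions, here is a minimal hypothetical sketch of the flag gating: the re-sequencing check only runs when the flag is set, and the "am I behind?" decision compares the local database against the datastream files on disk (the role of the datastream server) rather than dialling the sequencer with a stream client. All names and fields below are invented for illustration; they are not the flag or functions added in the PR.

```go
package main

import "fmt"

// sequencerConfig is a hypothetical stand-in for the sequencer configuration.
type sequencerConfig struct {
	Resequence bool // would be wired up from a CLI flag, e.g. in cmd/utils/flags.go
}

// shouldResequence returns true only when re-sequencing is explicitly
// enabled and the datastream files on disk are ahead of the unwound
// database, i.e. there are previously sequenced batches to replay.
func shouldResequence(cfg sequencerConfig, highestDBBatch, highestStreamBatch uint64) bool {
	if !cfg.Resequence {
		return false // flag unset: behave exactly like a normal sequencer
	}
	return highestStreamBatch > highestDBBatch
}

func main() {
	cfg := sequencerConfig{Resequence: true}
	fmt.Println(shouldResequence(cfg, 8, 12)) // true: the stream is ahead after an unwind to batch 8
}
```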

@cffls (Author) commented Aug 6, 2024

> There are a couple of things that would help here I think:
> 1. Making this process really explicit by using a flag; if it isn't set, the sequencer won't perform the checks to see if it is behind and will just behave as normal.
> 2. Rather than using a datastream client pointing back to itself, we have the data_stream_server.go file that should let you work with the stream without needing the client, and it will work directly with the files on disk. That approach seems cleaner to me.

Thanks @hexoscott. I've addressed your feedback accordingly.

@hexoscott (Collaborator) commented

Great 😄 could you give me a nudge once the latest zkevm branch stuff is merged in so I don't miss it 🙏

@cffls (Author) commented Aug 7, 2024

Thanks @hexoscott. I've merged from the zkevm branch and resolved the conflicts.

@hexoscott (Collaborator) left a comment

Nice updates 😄 just one little thing around the use of the flag in the sequencing stage

Review thread on zk/stages/stage_sequence_execute.go (resolved).

sonarcloud bot commented Sep 3, 2024

@hexoscott (Collaborator) left a comment

Code looks good, just a few comments from the last time I had a peek at it.

Review threads on cmd/integration/commands/stage_stages_zkevm.go, zk/stages/stage_sequence_execute.go, and zk/stages/stage_sequence_execute_utils.go (all resolved).
@cffls merged commit 292a2ec into 0xPolygonHermez:zkevm on Sep 20, 2024
13 of 16 checks passed