Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define batch kernel inputs/outputs #919

Open
Fumuran opened this issue Oct 14, 2024 · 8 comments
Open

Define batch kernel inputs/outputs #919

Fumuran opened this issue Oct 14, 2024 · 8 comments
Assignees
Labels
kernels Related to transaction, batch, or block kernels
Milestone

Comments

@Fumuran
Copy link
Contributor

Fumuran commented Oct 14, 2024

What should be done?

Define the kernel inputs and outputs for the batch of transactions.

How should it be done?

To be clarified

When is this task done?

To be clarified

Additional context

No response

@bobbinth
Copy link
Contributor

The purpose of the batch kernel is to generate a single proof that we've verified some number of proven transactions. This would involve recursively verifying individual transaction proofs, which in turn, would require access to public inputs/outputs of the transaction kernel. As a reminder, these look like:

Inputs:  [BLOCK_HASH, account_id, INITIAL_ACCOUNT_HASH, INPUT_NOTES_COMMITMENT]
outputs: [OUTPUT_NOTES_COMMITMENT, FINAL_ACCOUNT_HASH, tx_expiration_block_num]

Where:

  • INPUT_NOTES_COMMITMENT is a sequential hash of (nullifier, empty_word_or_note_hash) tuples with empty_word_or_note_hash set to hash(note_id, note_metadata) for notes which are not authenticated by the transaction kernel.
  • OUTPUT_NOTES_COMMITMENT is a sequential hash of (note_id, note_metadata) tuples.

Batch kernel inputs

Probably the simplest approach is to set batch kernel inputs to a commitment of transactions included in the batch. This could be computed as a sequential hash of transaction_ids (a transaction ID is a commitment to initial/final account states and input/output note commitments). The kernel then would "unhash" these IDs non-deterministically into the underlying transaction inputs/outputs and verify the proofs against them.

Batch kernel outputs

Public outputs of the batch kernel need to include the following:

  • Commitment to all account modifications performed in this batch.
  • Commitment to all input notes consumed in this batch.
  • Commitment to all output notes produced by this batch.
  • batch expiration block number.

Account updates commitment

For each modified account, we need to capture the following information at the minimum:

  • Account ID.
  • Initial state before any transactions where applied to the account.
  • Final state after all the transactions were applied to the account.
  • List of transactions which affected the account.

However, it may be beneficial for forward-lookin purposes, keep track of which transaction resulted in which state. In this case, we could compute the commitment as follows:

hash(account_id || initial_state || tx_id_0 || state_after_tx_0 || tx_id_1 || state_after_tx_1 ...)

If we were to lay this out in terms of words, it would look something like this:

  • words 0, 1: [account_id, 0, 0, 0, INITIAL_ACCOUNT_HASH]
  • words 2, 3: [TX_0_ID, ACCOUNT_HASH_AFTER_TX_0]
  • words 4, 5: [TX_1_ID, ACCOUNT_HASH_AFTER_TX_1]

This arrangement also allows us to always hash double words which will be pretty efficient.

Input notes commitment

The simplest way to track input note commitments is to basically use the same scheme as used in the transaction kernel (i.e., hash of (nullifier, empty_word_or_note_hash) tuples). However, here too it may be beneficial to track ID of the transactions which consumed the notes. We could do this by laying out each not info as follows:

[TX_ID, NULLIIFER, NOTE_ID, NOTE_METADATA]

For authenticated notes NOTE_ID and NOTE_METADATA would be zeros.

Output notes comment

Commitment to the output notes must be a root of the batch note SMT - so, there isn't much flexibility here. However, if we wanted to track which transaction produce which output note, we'd also need a separate commitment to that. This could be a sequential hash of note data where each note is defined as:

[TX_ID, NOTE_ID]

Open questions

  • Is there a better way to track associations between transactions and account updates, input notes, and output notes?

@bobbinth
Copy link
Contributor

A potentially simpler option is to rely on the transaction ID commitments to link transactions to accounts, input notes, and output notes. Then, batch kernel inputs/outputs would look roughly like so:

Inputs:  [TRANSACTIONS_COMMITMENT]
Outputs: [INPUT_NOTES_COMMITMENT, OUTPUT_NOTES_SMT_ROOT, batch_expiration_block_num]

Where:

  • TRANSACTIONS_COMMITMENT is just a sequential hash of (transaction_id, account_id) tuples.
  • INPUT_NOTES_COMMITMENT is a sequential hash of (nullifier, empty_word_or_note_hash) tuples.

Conceptually, inputs into the batch kernel would be a set of ordered ProvenTransaction objects, and the output would be something like a ProvenBatch object. This object would contain a set of VerifiedTransaction objects which could look something like this:

pub struct VerifiedTransaction {
    id: TransactionId,
    account_update: TxAccountUpdate,
    input_notes: InputNotes<InputNoteCommitment>,
    output_notes: OutputNotes,
}

A couple of general notes:

  • One of the reasons why we care about keeping track of transaction level info is that in the future we may need to resolve conflicts between batches (e.g., when two batches contain duplicate or conflicting transactions).
  • INPUT_NOTES_COMMITMENT is a separate output because the set of notes committed by this hash could be different from the set of notes committed to via the TRANSACTIONS_COMMITMENT. The reason for this is that some output notes may be consumed inside the batch (via unauthenticated input notes) and such notes will be eliminated from the list of input notes of the batch.

@bobbinth bobbinth added this to the v0.7 milestone Nov 7, 2024
@bobbinth bobbinth modified the milestones: v0.7, v0.8 Jan 9, 2025
@PhilippGackstatter
Copy link
Contributor

PhilippGackstatter commented Jan 22, 2025

Inputs: [TRANSACTIONS_COMMITMENT]
Outputs: [INPUT_NOTES_COMMITMENT, OUTPUT_NOTES_SMT_ROOT, batch_expiration_block_num]

Is this missing ACCOUNT_UPDATES_COMMITMENT as an output or was that only in the first option you mentioned? If so, I don't quite understand how the batch output commits to the account updates.

Edit: Ah, this is covered implicitly through the inital and final account hash that goes into each transaction ID within the transactions commitment. That would cover the init/final account state before/after the tx, as well as a commitment to the input and output notes. It's only missing a direct commitment to the account ID, but the ID is committed to by the account hash, so it may be fine?

@PhilippGackstatter
Copy link
Contributor

Note Erasure

I want to make sure I understood the need for INPUT_NOTES_COMMITMENT as an output of the batch. So based on this message on Discord and the following quick example: Let's assume there are the following two transactions in the same batch:

TX A: Consumes unauthenticated note X.
TX B: Consumes authenticated note Z, produces note X.

The TRANSACTIONS_COMMITMENT commits to A and B, and therefore to input notes X (transiently committed to by TX ID of A) and Z (transiently committed to by TX ID of B).
However, the INPUT_NOTES_COMMITMENT would only commit to Z, because X was produced and consumed within the same batch and so it isn't an authenticated input note to the batch. The block kernel doesn't care about this note, because from the perspective of the block, i.e. "outside the batch", it's as if the note never existed in the sense that it was never part of any block header's note_root.
So in particular, the set of created nullifiers in the batch would not include note X, otherwise a major point of unauthenticated notes would be gone (i.e. they don't bloat the state of the chain). Is that correct?

And whether we want to have the same "note erasure" mechanism for the block kernel is a question left for when we implement that one.

@PhilippGackstatter
Copy link
Contributor

You suggested to have ProvenBatch contain a Vec<VerifiedTransaction> where VerifiedTransaction is defined as:

pub struct VerifiedTransaction {
    id: TransactionId,
    account_update: TxAccountUpdate,
    input_notes: InputNotes<InputNoteCommitment>,
    output_notes: OutputNotes,
}

In the ProposedBatch from #1095, various fields are aggregated versions of the underlying individual transactions.

  • The updated_accounts contains the aggregated account updates, i.e. a single account update potentially representing the state transition from $A \rightarrow C$ if transactions $A$, $B$ and $C$ were applied to it.
  • The input notes commitment of the proposed batch is not necessarily equivalent to the collected result of all verified transactions (due to unauthenticated notes being created and consumed within the same batch).
  • The output_notes in the verified transaction would differ from the batch output note SMT since the notes which were created and consumed in the same batch would be removed from the overall SMT but are present in the verified transaction's output_notes that created it.

So having VerifiedTransaction in the proposed batch seems like duplicate but also stale data. Do we still need to retain it alongside the above information because we need the concrete transaction data rather than just commitments?

If so, I guess ProposedBatch would have to look like in #1095 plus a field of Vec<VerifiedTransaction>.

@bobbinth
Copy link
Contributor

Inputs: [TRANSACTIONS_COMMITMENT]
Outputs: [INPUT_NOTES_COMMITMENT, OUTPUT_NOTES_SMT_ROOT, batch_expiration_block_num]

Is this missing ACCOUNT_UPDATES_COMMITMENT as an output or was that only in the first option you mentioned? If so, I don't quite understand how the batch output commits to the account updates.

Edit: Ah, this is covered implicitly through the inital and final account hash that goes into each transaction ID within the transactions commitment. That would cover the init/final account state before/after the tx, as well as a commitment to the input and output notes. It's only missing a direct commitment to the account ID, but the ID is committed to by the account hash, so it may be fine?

Yes, this was my thinking - though, there may also be reasons to still include ACCOUNT_UPDATES_COMMITMENT as a separate output. This really depends on what do we think would simplify batch aggregation in the block kernel. In general, one of the tasks of the batch kernel is to make subsequent work for block kernel easier.

I want to make sure I understood the need for INPUT_NOTES_COMMITMENT as an output of the batch. So based on this message on Discord and the following quick example: Let's assume there are the following two transactions in the same batch:

TX A: Consumes unauthenticated note X. TX B: Consumes authenticated note Z, produces note X.

The TRANSACTIONS_COMMITMENT commits to A and B, and therefore to input notes X (transiently committed to by TX ID of A) and Z (transiently committed to by TX ID of B). However, the INPUT_NOTES_COMMITMENT would only commit to Z, because X was produced and consumed within the same batch and so it isn't an authenticated input note to the batch. The block kernel doesn't care about this note, because from the perspective of the block, i.e. "outside the batch", it's as if the note never existed in the sense that it was never part of any block header's note_root. So in particular, the set of created nullifiers in the batch would not include note X, otherwise a major point of unauthenticated notes would be gone (i.e. they don't bloat the state of the chain). Is that correct?

Yes, correct!

And whether we want to have the same "note erasure" mechanism for the block kernel is a question left for when we implement that one.

Ideally, we want to have this at the block level as well - but this may have some additional complexities. So, we can evaluate this later.

You suggested to have ProvenBatch contain a Vec<VerifiedTransaction> where VerifiedTransaction is defined as:

pub struct VerifiedTransaction {
    id: TransactionId,
    account_update: TxAccountUpdate,
    input_notes: InputNotes<InputNoteCommitment>,
    output_notes: OutputNotes,
}

In the ProposedBatch from #1095, various fields are aggregated versions of the underlying individual transactions.

  • The updated_accounts contains the aggregated account updates, i.e. a single account update potentially representing the state transition from A → C if transactions A, B and C were applied to it.
  • The input notes commitment of the proposed batch is not necessarily equivalent to the collected result of all verified transactions (due to unauthenticated notes being created and consumed within the same batch).
  • The output_notes in the verified transaction would differ from the batch output note SMT since the notes which were created and consumed in the same batch would be removed from the overall SMT but are present in the verified transaction's output_notes that created it.

So having VerifiedTransaction in the proposed batch seems like duplicate but also stale data. Do we still need to retain it alongside the above information because we need the concrete transaction data rather than just commitments?

If so, I guess ProposedBatch would have to look like in #1095 plus a field of Vec<VerifiedTransaction>.

I'm thinking this is something we may need in the longer term. Basically, right now, a block does not contain enough info to figure out what happened in a given transaction. This is not something that is critical for building blocks, but it would be nice to have this info to show to the user. For example, currently, a transaction in the block explorer looks something like this:

Image

But it would be great to show which nullifiers were consumed in the transaction and which output notes were created. We will also probably care about this info anyway in the future when batches may contain duplicate or conflicting transactions and we'd need to manage this conflict resolution in the block kernel (this is not something that'll happen while we are running in a centralized setting - so, not an immediate problem).

@PhilippGackstatter
Copy link
Contributor

But it would be great to show which nullifiers were consumed in the transaction and which output notes were created.

Thanks for explaining the reasoning of VerifiedTransaction. Agreed, that would be nice to have!

Note and Block Authentication

To "authenticate" an unauthenticated note in the current transaction batch implementation in the node, we basically only check if a NoteInclusionProof exists, but we don't verify it, so I'm wondering how we should do that.

Authenticating a note should work similar as in TransactionInputs and the transaction kernel, right? TransactionInputs takes a ChainMmr and the BlockHeader against which the transaction is executed. It verifies that the block header and chain mmr are consistent and then validates the note inclusion proofs:

// check the authentication paths of the input notes.
for note in input_notes.iter() {
if let InputNote::Authenticated { note, proof } = note {
let note_block_num = proof.location().block_num();
let block_header = if note_block_num == block_num {
&block_header
} else {
block_chain
.get_block(note_block_num)
.ok_or(TransactionInputError::InputNoteBlockNotInChainMmr(note.id()))?
};
validate_is_in_block(note, proof, block_header)?;
}
}

I'm wondering how this should work in the batch kernel without a public BLOCK_HASH input. Don't we need some kind of commitment to the block or at least to a BlockHeader::chain_root? Otherwise, what are we verifying the NoteInclusionProofs against?
Relatedly, wouldn't we also need that to verify block inclusion proofs for all blocks that ProvenTransactions in the batch have been executed against?

I'm wondering if we should add a BLOCK_HASH or CHAIN_MMR_HASH input which is then "unhashed" using advice provider inputs. We then use that similar to the tx kernel to authenticate unauthenticated notes against that block reference. We also use it to authenticate block hashes from contained transactions.
One requirement of this block hash would be that its corresponding block height must be at least as high as the highest of all reference blocks used by transactions in the batch.

@bobbinth
Copy link
Contributor

I'm wondering if we should add a BLOCK_HASH or CHAIN_MMR_HASH input which is then "unhashed" using advice provider inputs. We then use that similar to the tx kernel to authenticate unauthenticated notes against that block reference. We also use it to authenticate block hashes from contained transactions.
One requirement of this block hash would be that its corresponding block height must be at least as high as the highest of all reference blocks used by transactions in the batch.

Good point! Yes, let's add BLOCK_HASH as a public input, and as you said, we'd use to to authenticate un-authenticated notes and also to make sure reference blocks of transactions are in the chain MMR.

In the future, we'll also probably need this info for fee purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernels Related to transaction, batch, or block kernels
Projects
Status: No status
Development

No branches or pull requests

3 participants