Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

onion messages #9039

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Conversation

accumulator
Copy link
Member

@accumulator accumulator commented May 6, 2024

This PR is ready for review.

  • added CLI commands
    • get_blinded_path_via to construct a blinded path
      • only with direct peers as introduction point currently
        • longer routes should be trivial to add
      • optionally add dummy hops to obfuscate true blinded path length
    • send_onion_message to send a text onionmsg_tlv.message payload to a node_id or blinded path
      • receiving electrum node will log these on info level
  • route finding over onion message capable nodes using channels as edges
    • tested against c-lightning
    • not tested against electrum peers, as electrum does not publish to the channel graph
    • known issue: valid paths with offline hops are a problem
  • direct connect to destination peer/introduction point if no path can be found over channel graph
  • OnionMessageManager class implementing queues for rate limiting
    • request-reply queue for matching replies to requests,
    • retries in case no reply within x seconds etc. (not needed for text onionmsg_tlv.message payloads, which are fire-and-forget, but eventually bolt12 will need this for e.g. 'invoice_request')
      • known issue: not trying alternate routes yet
    • forwarding queue, for limiting amount and pacing of onion message forwards
  • serialization and deserialization of subtype declarations in wire definitions

Testing

  • Testing done with sending onion messages across electrum instances Alice <-> Bob <-> Carol <-> Dave using node_ids and blinded paths as destination.
    • no route finding, see below
    • typical test:
      1. alice get_blinded_path_via $(bob nodeid)
      2. carol send_onion_message <blinded path> hello
  • Testing done with sending onion messages and invoice_requests to c-lightning instances Alice <-> cln01 <-> cln02 <-> cln03 <-> Bob
    • with routing finding (so offer with introduction point not a direct peer)
    • reply matching with request (using bolt12 branch to issue invoice_request from offer)

known issues

  • route finding over channel graph is unreliable, as nodes can be offline and no alternative routes are tried and no direct peer connection is attempted once a valid but unusable path is found.
    • it should be possible to determine from gossip if an edge is usable
  • direct peer connect to dest/introduction point is more reliable, but zero privacy if not using TOR.
  • retries in case of request-reply is not using alternative routes yet

@accumulator
Copy link
Member Author

this branch is now onion_messages only, all bolt12 stuff has been moved to the bolt12 branch, which builds on top of this branch/PR

@accumulator accumulator force-pushed the onion_messages branch 2 times, most recently from 4ce8580 to 9c4bec1 Compare June 6, 2024 12:08
@accumulator accumulator force-pushed the onion_messages branch 5 times, most recently from d7d81e9 to 20b8bd9 Compare July 3, 2024 16:40
@accumulator accumulator changed the title onion messages wip onion messages Jul 3, 2024
@accumulator accumulator marked this pull request as ready for review July 3, 2024 16:58
@accumulator accumulator force-pushed the onion_messages branch 2 times, most recently from 652ced0 to c38ba15 Compare July 6, 2024 09:19
electrum/lnmsg.py Outdated Show resolved Hide resolved
electrum/lnpeer.py Outdated Show resolved Hide resolved
@accumulator accumulator marked this pull request as ready for review November 13, 2024 14:59

def test_create_blinded_path(self):
pubkey = bfh('02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619') # alice
session_key = bfh('3030303030303030303030303030303030303030303030303030303030303030')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why session_key is '303030..' and not '030303...' here? is this a typo?

tests/test_onion_message.py Outdated Show resolved Hide resolved
@ecdsa
Copy link
Member

ecdsa commented Nov 17, 2024

The current version of get_shared_secrets_along_route seems unnecessarily complicated. We should be very careful about introducing complexity, as it may become difficult to get rid of it down the road.

Firstly, there is redundancy between session_key and the first ephemeral key override passed in payment_path_pubkeys_plus. I think this is ugly. We should not use a list of polymorphic objects just in order to a key that sometimes overrides an existing parameter (session_key). If we have to create a path that contains more than one override, we can just call that function twice (see below), instead of making it unnecessarily complicated.

Secondly, passing a boolean that affects the type of the returned result is also not very beautiful. I think this method should always return blinded_node_ids, even when we don't need them; there is no point trying to save the cost of computing blinded public keys here.

I suggest to rewrite get_shared_secrets_along_route as follows (this is a diff against master, not against this PR):

 def get_shared_secrets_along_route(payment_path_pubkeys: Sequence[bytes],
-                                   session_key: bytes) -> Sequence[bytes]:
+                                   session_key: bytes) -> Tuple[Sequence[bytes], Sequence[bytes]]:
     num_hops = len(payment_path_pubkeys)
     hop_shared_secrets = num_hops * [b'']
+    hop_blinded_node_ids = num_hops * [b'']
     ephemeral_key = session_key
     # compute shared key for each hop
     for i in range(0, num_hops):
         hop_shared_secrets[i] = get_ecdh(ephemeral_key, payment_path_pubkeys[i])
+        hop_blinded_node_ids[i] = get_blinded_node_id(payment_path_pubkeys[i], hop_shared_secrets[i])
         ephemeral_pubkey = ecc.ECPrivkey(ephemeral_key).get_public_key_bytes()
         blinding_factor = sha256(ephemeral_pubkey + hop_shared_secrets[i])
         blinding_factor_int = int.from_bytes(blinding_factor, byteorder="big")
         ephemeral_key_int = int.from_bytes(ephemeral_key, byteorder="big")
         ephemeral_key_int = ephemeral_key_int * blinding_factor_int % ecc.CURVE_ORDER
         ephemeral_key = ephemeral_key_int.to_bytes(32, byteorder="big")
-    return hop_shared_secrets
+    return hop_shared_secrets, hop_blinded_node_ids

This is a lot simpler than the current PR, and I believe this is sufficient. If we need to create a blinded path with an override (for the moment this is only needed in the unit tests AFAICT), then it is easy to do that by calling the above method twice, and concatenate the results:

        hop_shared_secrets1, blinded_node_ids1 = get_shared_secrets_along_route([ALICE_PUBKEY], BLINDING_SECRET)                                                                              
        hop_shared_secrets2, blinded_node_ids2 = get_shared_secrets_along_route([BOB_PUBKEY, CAROL_PUBKEY, DAVE_PUBKEY], BLINDING_OVERRIDE_SECRET)                                            
        hop_shared_secrets = hop_shared_secrets1 + hop_shared_secrets2                                                                                                                        
        blinded_node_ids = blinded_node_ids1 + blinded_node_ids2

@@ -211,6 +259,29 @@ def new_onion_packet(
hmac=next_hmac)


def encrypt_encrypted_data_tlv(*, shared_secret, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is unfortunate. It suggests that we are encrypting a second time something that was already encrypted. It is enough to call this method encrypt_data_tlv (and to call the next one decrypt_data_tlv). And, yes, I am aware that the name encrypted_data_tlv is part of the spec. That's not a reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this looks confusing. I would suggest encrypt_onionmsg_data_tlv then, to indicate it is an onion message record.

@@ -1462,7 +1461,7 @@ def send_node_announcement(self, alias:str):
addrlen=len(addresses),
addresses=addresses)
h = sha256d(raw_msg[64+2:])
signature = ecc.ECPrivkey(self.privkey).ecdsa_sign(h, sigencode=ecdsa_sig64_from_r_and_s)
signature = ECPrivkey(self.privkey).ecdsa_sign(h, sigencode=ecdsa_sig64_from_r_and_s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of cosmetic change can be committed to master independently of this pull request. It is not useful to include it in the pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done to be consistent with ECPubkey, which was already imported and used without ecc.

our_onion_private_key=self.privkey,
associated_data=payment_hash,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, no reason to overload a PR with parameter reordering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, associated_data was moved to a kwarg, so it makes sense here.

if peer_addr := lnwallet.channel_db.get_last_good_address(node_id):
peer = await lnwallet.add_peer(str(peer_addr))
await peer.initialized
return [PathEdge(short_channel_id=None, start_node=None, end_node=node_id)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? Bolt-04 suggests that onion messages should use already existing connections, not create new ones:

Onion messages allow peers to use existing connections

I would rather not have methods that create extra peer connections as side-effect, unless there is a clear use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the number of onion message supporting nodes is very limited. If the user has only channels with nodes that do not support onion messages, doing a bolt12 invoice_request will always fail.

In the future, when most of the network is onion message capable, this will not be an issue anymore, so it can be seen as a transitional feature.

Note1: we only open a peer connection if the client wants to perform a invoice_request and no route can be found over the existing channel peers, so there's a clear use-case.
Note2: C-Lightning currently opens new peer connections as well, if needed.

@ecdsa
Copy link
Member

ecdsa commented Nov 17, 2024

Another thing that needs to be improved is the unit tests provided in this PR. We should refrain from copy-pasting long hexadecimal strings in our code. Since there is a json file provided by the RFC, why not use that file directly? Here is a branch where I started to do that:
https://github.com/spesmilo/electrum/tree/onion_messages_unittest

@ecdsa
Copy link
Member

ecdsa commented Nov 17, 2024

Also note that some of the provided unit tests are not very useful (test_decode_onion_message_packet seems useless).
OTOH, some important functions are not tested. It would be nice to have a unit tests that covers send_onion_message_to

@ecdsa
Copy link
Member

ecdsa commented Nov 17, 2024

Renaming reqrpy to requestreply is an improvement. We could even use reply, as there is no reason to use two words here.

@ecdsa
Copy link
Member

ecdsa commented Nov 17, 2024

A general note on formatting: if we use multiple lines for the parameters passed to a function, we should avoid indenting them at the level of the opening parenthesis, but use an extra line instead. Example:

-                blinded_path = OnionWireSerializer._read_complex_field(fd=blinded_path_fd,
-                                                                       field_type='blinded_path',
-                                                                       count=1)
+                blinded_path = OnionWireSerializer._read_complex_field(
+                    fd=blinded_path_fd,
+                    field_type='blinded_path',
+                    count=1)

Why? because changing the method name will trigger reindentation of all the lines in the first case, not in the second case. This results in larger diffs. See for example the commit where submit_reqrpy is renamed to submit_requestreply

@accumulator
Copy link
Member Author

The current version of get_shared_secrets_along_route seems unnecessarily complicated. We should be very careful about introducing complexity, as it may become difficult to get rid of it down the road.

Yes I don't like it either.

I suggest to rewrite get_shared_secrets_along_route as follows (this is a diff against master, not against this PR):

 def get_shared_secrets_along_route(payment_path_pubkeys: Sequence[bytes],
-                                   session_key: bytes) -> Sequence[bytes]:
+                                   session_key: bytes) -> Tuple[Sequence[bytes], Sequence[bytes]]:
     num_hops = len(payment_path_pubkeys)
     hop_shared_secrets = num_hops * [b'']
+    hop_blinded_node_ids = num_hops * [b'']
     ephemeral_key = session_key
     # compute shared key for each hop
     for i in range(0, num_hops):
         hop_shared_secrets[i] = get_ecdh(ephemeral_key, payment_path_pubkeys[i])
+        hop_blinded_node_ids[i] = get_blinded_node_id(payment_path_pubkeys[i], hop_shared_secrets[i])
         ephemeral_pubkey = ecc.ECPrivkey(ephemeral_key).get_public_key_bytes()
         blinding_factor = sha256(ephemeral_pubkey + hop_shared_secrets[i])
         blinding_factor_int = int.from_bytes(blinding_factor, byteorder="big")
         ephemeral_key_int = int.from_bytes(ephemeral_key, byteorder="big")
         ephemeral_key_int = ephemeral_key_int * blinding_factor_int % ecc.CURVE_ORDER
         ephemeral_key = ephemeral_key_int.to_bytes(32, byteorder="big")
-    return hop_shared_secrets
+    return hop_shared_secrets, hop_blinded_node_ids

This is a lot simpler than the current PR, and I believe this is sufficient. If we need to create a blinded path with an override (for the moment this is only needed in the unit tests AFAICT), then it is easy to do that by calling the above method twice, and concatenate the results:

        hop_shared_secrets1, blinded_node_ids1 = get_shared_secrets_along_route([ALICE_PUBKEY], BLINDING_SECRET)                                                                              
        hop_shared_secrets2, blinded_node_ids2 = get_shared_secrets_along_route([BOB_PUBKEY, CAROL_PUBKEY, DAVE_PUBKEY], BLINDING_OVERRIDE_SECRET)                                            
        hop_shared_secrets = hop_shared_secrets1 + hop_shared_secrets2                                                                                                                        
        blinded_node_ids = blinded_node_ids1 + blinded_node_ids2

Yes this is much better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants