Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New opcode: base64decode #2961

Closed
algoanne opened this issue Sep 27, 2021 · 11 comments · Fixed by #3220 or #3288
Closed

New opcode: base64decode #2961

algoanne opened this issue Sep 27, 2021 · 11 comments · Fixed by #3220 or #3288
Assignees

Comments

@algoanne
Copy link
Contributor

algoanne commented Sep 27, 2021

Create a new opcode, base64decode, that decodes base 64 inputs back into binary.

note: how do we handle different base 64 alphabets? probably need to include a flag for which alphabet.

@brianolson
Copy link
Contributor

brianolson commented Sep 27, 2021

Unnecessary. All our SDKs and APIs already do the work to have a binary-clean data path.

@jannotti
Copy link
Contributor

jannotti commented Sep 27, 2021 via email

@jannotti
Copy link
Contributor

I do agree with Brian, however, that we should make it clear that this opcode has a very specific use case. DOn't go around b64 encoding your data that you send to a smart contract just because this exists.

@algoanne algoanne changed the title New opcode: b64decode New opcode: base64decode Sep 27, 2021
@tzaffi tzaffi self-assigned this Nov 9, 2021
@tzaffi
Copy link
Contributor

tzaffi commented Nov 9, 2021

OpCodes Notes

@tzaffi
Copy link
Contributor

tzaffi commented Nov 12, 2021

@jannotti @algoanne
Can you expound on the "different alphabets" capability mentioned above? In go I do see several types of encodings:

image

Should this PR aim to support all these with a user supplied flag?

Thanks!

@tzaffi
Copy link
Contributor

tzaffi commented Nov 12, 2021

Circling around my own question base64 has 2 alphabets in the RFC:

  • Standard base64
  • const encodeStd = “ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/“
  • URL Encoding / Filename Safe base64url
  • const encodeURL = “ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_”

These are already present in go

@jannotti
Copy link
Contributor

In the standard we are most interested in supporting, the alphabet used is called websafe-base64-encoded, which I imagine is that second one. I think it would be nice to support both though. You could look at the the verify_ecdsa to see how we normally deal with an opcodes that takes a distinguishing immediate argument to differentiate different versions. The the code those immediate are are often called "fields" (because the first and most common use is to fetch a band field is a transaction).

@jannotti
Copy link
Contributor

jannotti commented Nov 13, 2021

There is some argument for not needing a field. Since those alphabets don't literally disagree on the meaning of any character, I suppose we could define our base64decode to work on either type, without the code needing to specify. (We would need to specify in order to support encode. But I don't think we should support encode)

@tzaffi
Copy link
Contributor

tzaffi commented Nov 15, 2021

That makes sense. If we "autodetect" the encoding, we'd need to handle the case of ambiguity (basically when there is one of +/ AND one of -_). In the case of having an extra argument we would just assert the encoding is as expected. In terms of implementation, the with argument approach is more straightorward as I ought to be able to pass through go's API as well as its error handling.

@jannotti
Copy link
Contributor

In order to avoid writing any decoding ourselves, and to make it easy to spec, if we go with the "TEAL decodes either alphabet" decision, I would recommend that the spec actually just says we try decoding with each alphabet in turn, until one passes or both fail. (And the implementation does likewise, which should give the behavior we'd want in your case of ambiguity - panic)

@tzaffi
Copy link
Contributor

tzaffi commented Nov 15, 2021

In terms of implementation, I think your last approach is the cleanest. However, in terms of costs, we get a bit of extra non-determinism. I believe base64url is much more common in the field (but I have no stats) so we would start by attempting base64urlDecode and if failing, base64stdDecode. In the latter case the cost would be almost 2x the cost of the former. These could be trivial costs in the end of the day. But my hunch is that it's better to aim for uniform costs. Of course, there is a way to "have the cake and eat" at the same time and not incur extra costs. But this would require me to re-invent the wheel and implement the decoding from scratch.

@tzaffi tzaffi linked a pull request Nov 22, 2021 that will close this issue
22 tasks
@tzaffi tzaffi linked a pull request Dec 8, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants