Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support UTF-8 encode/decode for CircuitString #2020

Merged
merged 10 commits into from
Feb 25, 2025

Conversation

chiro-hiro
Copy link
Contributor

@chiro-hiro chiro-hiro commented Feb 10, 2025

We're working on zkDatabase project and found an issue with unicode characters, I think this patch could help.

const a = CircuitString.fromString('Xin chào O1js!');

console.log(a.toString(), a.length().toString());

Before this patch:

Xin cho O1js! 13

After this patch:

Xin chào O1js! 15

@chiro-hiro chiro-hiro requested review from a team as code owners February 10, 2025 08:51
@mitschabaude
Copy link
Contributor

mitschabaude commented Feb 10, 2025

lgtm!

I just want to mention that this is breaking for circuits that use constant CircuitStrings with non-ASCII characters.
Current handling of those strings could be viewed as a bug, though, in which case you could argue we don't have to preserve these (buggy) circuits

Co-authored-by: Gregor Mitscha-Baude <[email protected]>
@chiro-hiro
Copy link
Contributor Author

Actually, our teams also treats all unicode strings as invalid strings and work around with Bytes, which is also problematic. I agree that this code shouldn't be rolled out without the notice for developers.

@45930
Copy link
Contributor

45930 commented Feb 10, 2025

In the past when we made updates to stable APIs, we created new versions targeting the next major o1js version and deprecated the existing version (but didn't modiify it). I wonder if we could do the same thing here and make a new class called RawCircuitStringUTF8 with the new behavior.

I'm not sure how many are currently using circuitString, but breaking VKs is really annoying.

@chiro-hiro
Copy link
Contributor Author

In the past when we made updates to stable APIs, we created new versions targeting the next major o1js version and deprecated the existing version (but didn't modiify it). I wonder if we could do the same thing here and make a new class called RawCircuitStringUTF8 with the new behavior.

I'm not sure how many are currently using circuitString, but breaking VKs is really annoying.

@45930 What do you think about adding new encoding parameter in .fromString()?

e.g:

type CircuitStringEncoding = 'utf-8' | 'ascii';

fromString(str: string, encoding: CircuitStringEncoding = 'ascii');

toString(str: string, encoding: CircuitStringEncoding = 'ascii');

This change will maintain the backward compatibility.

@45930
Copy link
Contributor

45930 commented Feb 12, 2025

@chiro-hiro I like that as an API, but your current changes modify the toValue and fromValue code in the raw circuit string, so both branches would use your new code, so it would still be a breaking change.

If you can update the fromString and toString methods instead, and leave the value conversion methods unchanged, I think that would work and not be breaking.

@chiro-hiro
Copy link
Contributor Author

chiro-hiro commented Feb 13, 2025

@45930 I've updated my code to support both ASCII and UTF-8 following your suggestion (minimizing the modifications in RawCircuitString). I may need to add several tests to CircuitString to make sure it working properly.

Add .setEncoding() method to help developer set default encoding to prevent misconfiguration.

import { CircuitString } from 'o1js';

const a = CircuitString.fromString('Xin chào O1js!'); // Xin cho O1js! 14

console.log(a.toString(), a.length().toString());

CircuitString.setEncoding('utf-8');

const b = CircuitString.fromString('Xin chào O1js!'); // Xin chào O1js! 15

console.log(b.toString(), b.length().toString());

@chiro-hiro
Copy link
Contributor Author

@45930 Can you check this PR? is it work for u?.

Copy link
Contributor

@45930 45930 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me!

Can you add a changelog entry and fix the styling issues (https://github.com/o1-labs/o1js/blob/main/README-dev.md#style-guide)? Then I can merge.

@chiro-hiro
Copy link
Contributor Author

I updated the changelog following your suggestion.

$ npx oxlint --config=./.oxlintrc.json src/lib/provable/string.ts
Found 0 warnings and 0 errors.
Finished in 4ms on 1 file with 90 rules using 16 threads.
$ npx prettier --config ./.prettierrc.cjs  -c src/lib/provable/string.ts
Checking formatting...
All matched files use Prettier code style!

Checked the lint and format as well, probably a false positive. Github action shown errors and warnings in 250 files.

@boray
Copy link
Member

boray commented Feb 23, 2025

I updated the changelog following your suggestion.

$ npx oxlint --config=./.oxlintrc.json src/lib/provable/string.ts
Found 0 warnings and 0 errors.
Finished in 4ms on 1 file with 90 rules using 16 threads.
$ npx prettier --config ./.prettierrc.cjs  -c src/lib/provable/string.ts
Checking formatting...
All matched files use Prettier code style!

Checked the lint and format as well, probably a false positive. Github action shown errors and warnings in 250 files.

Runningnpm run format src/lib/provable/string.ts should fix the failing check.

…into feature/uft8_in_circuit_string

# Conflicts:
#	src/lib/mina/transaction-validation.ts
#	src/lib/mina/transaction.test.ts
#	src/lib/mina/transaction.ts
@chiro-hiro
Copy link
Contributor Author

Hi @boray, I updated the given code. Didn't aware that the prettier rules were changed.

@boray
Copy link
Member

boray commented Feb 24, 2025

You can follow README-dev.md#bindings-check-in-ci to fix the failing bindings check.

@chiro-hiro
Copy link
Contributor Author

@boray My branch is out date and it was generate a huge patch, just sync with main. Can you trigger the action so I can get the binding patch?.

@chiro-hiro
Copy link
Contributor Author

@boray I can't reproduce the issue in Github action.

Node.js v22.12.0
Ubuntu on ARM64

@45930 45930 merged commit 82d1bf1 into o1-labs:main Feb 25, 2025
30 of 32 checks passed
@45930
Copy link
Contributor

45930 commented Feb 25, 2025

Thanks @chiro-hiro, the change looks good to me! Since there are no bindings diffs, I think it's ok to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants