Add varint to MDS #574
base: main
some testing changes
for x in range(-700, 700, 7):
    y = mds_encode('varint', x)
    z = mds_decode('varint', y)
    print(x, y, z)
Do we need the print statements both here and below?
Also, can we test ints, and their expected encoded lengths, in the range we would actually encode through tokenization? For example, the vocab_size of many models customers train is between 50k and 150k. Testing ints in that range would be useful.
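A rough sketch of what such a test could look like, with asserts in place of prints. It assumes a LEB128-style layout (7 payload bits per byte, high bit set on every byte except the last) and zigzag mapping for signed values; the PR's actual byte layout may differ, and the `encode_*`/`decode_*` helpers below are hypothetical stand-ins for `mds_encode`/`mds_decode`.

```python
def encode_varuint(value: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError('varuint cannot encode negative values')
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, continuation bit clear
            return bytes(out)


def decode_varuint(data: bytes) -> int:
    """Decode bytes produced by encode_varuint."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value
        shift += 7
    raise ValueError('truncated varuint')


def encode_varint(value: int) -> bytes:
    """Encode a signed int via zigzag mapping (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    zigzag = value * 2 if value >= 0 else -value * 2 - 1
    return encode_varuint(zigzag)


def decode_varint(data: bytes) -> int:
    """Decode bytes produced by encode_varint."""
    zigzag = decode_varuint(data)
    return zigzag >> 1 if zigzag % 2 == 0 else -(zigzag >> 1) - 1


# Round trip over the same range as the existing test, without printing.
for x in range(-700, 700, 7):
    assert decode_varint(encode_varint(x)) == x

# Token ids in a typical vocab_size range: under this layout, values from
# 2**14 up to 2**21 - 1 take exactly 3 bytes.
for token_id in range(50_000, 150_001, 1_000):
    encoded = encode_varuint(token_id)
    assert decode_varuint(encoded) == token_id
    assert len(encoded) == 3
```

Under these assumptions every token id in the 50k to 150k range costs 3 bytes, compared with 4 for a fixed-width 32-bit int, which is the saving the length check would guard.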
your spurious print statement life debt is paid
Add `varint`, `varuint` encodings to MDS.