-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ztoc v1.0 #465
base: main
Are you sure you want to change the base?
Add ztoc v1.0 #465
Conversation
Signed-off-by: Kern Walster <[email protected]>
@@ -0,0 +1,209 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has the functions that used to be called flatbufToZtoc
and ztocToFlatbuffer
. The only change is all the methods are suffixed with 09
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above change I mentioned early, we also only need a single ztoc_marshaller.go
that has:
ztocToFlatbufZtoc
flatbufZtocToZtoc
flatbufZtoc09ToZtoc
(3 instead of 4 is because we really don't need a ztocToFlatbufZtoc09
, since we don't want to produce old flatbuf anymore; we need flatbufZtoc09ToZtoc
because we need to be able to unmarsha old data to a Ztoc
that without buildertool).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do want ztocToFlatbuffer09
in the short term because we need to give some time for snapshotters to update before we produce ztoc 1.0 by default and in the long term because we're working with content addressable storage so it's important that roundtripping bytes -> ztoc -> bytes produces exactly the same bytes as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like the BuildToolIdentifier
will exist in the Ztoc
go struct forever?
Unless we have a Ztoc09
struct and Ztoc
struct in our go code, and the Ztoc
will be moved to the standalone Ztoc
library in the future while Ztoc09
stays in SOCI. Because otherwise it doesn't make sense in the standalone Ztoc
lib the Ztoc
struct has a deprecated BuildToolIdentifier
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. To get around this, I think we can just hard code the value or mark the field deprecated. Hardcoding should mean we continue to produce the same indices forever. Deprecating means everything gets new digests after the change (but at least the old ones still work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we remove the field in flatbuf, but keep the field in our Ztoc
struct (and comment that this field is deprecated; or hardcode this value)?
I think to really "have a clean v1.0 that doesn't have deprecated fields", we need to remove the field in both flatbuf (to work around we make a 09 flatbuf copy now) AND Ztoc
struct (e.g., this field also shouldn't appear in Ztoc
struct, either as deprecated or hardcoded, in the standalone ztoc lib.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we have 1 Ztoc
struct without the field and we hardcode the value that goes into the flatbuffer in ztocToFlatbuffer09
. Thinking a little deeper, we have to hardcode it to what we currently have (which is ”AWS SOCI CLI v0.1”
) to make sure that re-serializing produces the same digests.
@@ -0,0 +1,190 @@ | |||
package ztoc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has the modified flatbufToZtoc
and ztocToFlatbuffer
. The only change is that I removed all of the references to build_tool_identifier
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look at the change shortly, but I have a general question.
Maybe I'm wrong, but I remember the reason to use flatbuffer
is for its good forward/backward compatibility. But so far, I didn't see how that benefit us? (e.g. here to deprecreate a field, we need to make a copy). This seems we're handling forward/backward by ourselves through our code instead of any protocols.
Can we utilize the forward/backward functionality brought by these protocols, regardless protobuf or flatbuf?
For example, I remember protobuf fields are numbered, so you can delete the deprecated field directly but you can not reuse the number even that field is removed/deprecated. Similarly, flatbuf seems have the deprecated
keyword for similar purpose.
If adding deprecated
to this flatbuf line can resolve the compatibility issue and enable us to remove the field in SOCI code, I think it's worthwhile to do so, even we need to keep the single line in the fbs file (tag the field as deprecated)
Yes. This is not the normal way to handle this even with flatbuffers. The reason for doing this is specifically to get a clean v1.0 that can exist outside of SOCI. If we want ztocs to be useful in general, I think it hurts our case to have a v1.0 with deprecated fields. |
But still, if we want to completely remove this field in 1.0, we can still remove this The only different is that: Now: "keep 2 fbs files: one with the field, and one without the fileld" v.s. "make the field 1.0: "only keep 1 fbs file w/o field" (remove the other file) v.s. "remove the |
@@ -0,0 +1,50 @@ | |||
namespace v09; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change this to ztoc09
?
I feel it's hard to figure out that ztoc
and v09
are the same kind of folders (except version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will change.
table Ztoc { | ||
version : string; // The version of the Ztoc in format <major_version>.<minor_version>, e.g. 1.0 | ||
build_tool_identifier : string; | ||
compressed_archive_size : long; | ||
uncompressed_archive_size : long; | ||
toc : TOC; | ||
compression_info : CompressionInfo; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to just add a new table Ztoc09
(within the same zotc.fbs
instead of a completely new ztoc09.fbs
) that without the builder_tool_identifier
field? This can reduce a lot of duplicated code. And we're not changing anything else (e.g. table CompressionInfo
/TOC
/etc)
So the workflow will be:
for new data:
Ztoc struct
(defined by SOCI) <-> Ztoc struct
(generated by flatbuf from Ztoc flatbuf table
) <-> Ztoc flatbuf bytes
for old data (only unmarshal because we don't want to produce new data in old definition, i.e., Ztoc09
):
Ztoc09 flatbuf bytes
-> Ztoc09 struct
(generated by flatbuf from Ztoc09 flatbuf table
) -> Ztoc struct
(defined by SOCI / just need to remove the builder tool field in Ztoc09 struct
).
This also make it clean (only a new Ztoc09
due to removal of build_tool_identifier
) and clear(no changes on other tables like CompressionInfo
/etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want these in the same flatbuffer definition because the goal is to keep v0.9 inside soci but move v1.0 to another repository eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ztoc09 could be defined in a new file that includes this file.
@@ -0,0 +1,209 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above change I mentioned early, we also only need a single ztoc_marshaller.go
that has:
ztocToFlatbufZtoc
flatbufZtocToZtoc
flatbufZtoc09ToZtoc
(3 instead of 4 is because we really don't need a ztocToFlatbufZtoc09
, since we don't want to produce old flatbuf anymore; we need flatbufZtoc09ToZtoc
because we need to be able to unmarsha old data to a Ztoc
that without buildertool).
compression_info : CompressionInfo; | ||
} | ||
|
||
root_type Ztoc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common contents of ztoc and ztoc09 could go in a third file that the two main files include, reducing duplication. Then ztoc and the third file could be moved to an external repo later and just the smaller ztoc09 kept here. Example: https://gist.github.com/sparr/eeb7e407220d8e8a9451312c1dd08616
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if ztoc09.fbs
existed here and ztoc_types.fbs
existed in another git repository? Would we have to make it a submodule to get the import path right?
also we need to have necessary unit (or integration) tests to test those expected conversions. |
if ztoc.Version == "0.9" { | ||
flatbuf, err = ztocToFlatbuffer09(ztoc) | ||
} else { | ||
flatbuf, err = ztocToFlatbuffer(ztoc) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use switch here and return err in default
case? I think that makes it more clear and won't have wrong value by accident (if happens, we'll have error).
This needs a lot of work to be mergeable, but I wanted to discuss the direction before putting more effort into it. What do you all think?
Issue #, if available:
N/A
Description of changes:
This change copies our existing ztoc flatbuffer definition to ztoc09.fbs and removes
build_tool_identifier
from ztoc.fbs.The problem I'm trying to solve is that
build_tool_identifier
is a versioned string (e.g.AWS SOCI CLI v0.1
) which means that different versions of the tool produce different outputs for the same inputs. Removing the field is a breaking change so the thought was to make that breaking change for ztoc 1.0, but keep the original definition in SOCI so that existing v0.9 ztocs still work. The only way to do that is to duplicate the ztoc serialization logic.Alternatively, we could keep the field and remove the version (which still produces different output from different tools, even if they use the same library) or we could mark the field deprecated and just ignore it.
If we want ztoc to eventually live outside of SOCI, I think it makes sense to try to have a clean v1.0 that doesn't have deprecated fields. If we accept this change, then when we eventually split off ztoc, the v0.9 definitions would stay in SOCI and only the v1.0 logic would be split. That way we get backwards compatibility in SOCI and a clean ztoc package. That might artificially restrict the ztoc API because we would need a way for an external package (in this case SOCI) to serialize and deserialize a ztoc.
Testing performed:
N/A
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.