Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added version comparison and tests #147

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

cjrolo
Copy link
Collaborator

@cjrolo cjrolo commented Jan 9, 2025

Closes #146

This PR addresses issue #146 in the following way:

  1. Adds Version information in the CompressorHeader (Extracted from Cargo.toml)
  2. On Decoding checks the version of the file against the version of the compressor.
    2.1 Doesn't allow decompression of higher version files by lower version compressor (Should it?)
  3. Add tests for the compression/decompression
  4. Minor version bump to start tagging the files where the header was introduced.

@cjrolo cjrolo requested review from rukai and worryg0d January 9, 2025 12:28
Copy link
Collaborator

@worryg0d worryg0d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I left some nits regarding some code duplications and little optimization.

2.1 Doesn't allow decompression of higher version files by lower version compressor (Should it?)

I think so. The frame format may be changed in the future, so parsing them with an outdated compressor version may lead to undefined behavior.

atsc/src/header.rs Outdated Show resolved Hide resolved
atsc/src/header.rs Outdated Show resolved Hide resolved
cjrolo and others added 2 commits January 10, 2025 14:34
Added review suggestion.
@cjrolo
Copy link
Collaborator Author

cjrolo commented Jan 21, 2025

@rukai Can you give a look at this PR please. Thanks!

@rukai
Copy link
Contributor

rukai commented Jan 22, 2025

I have two concerns:

  1. I am worried about coupling the file version to the crate version.
  2. Storing the version in bincode prevents us from actually reading older versions due to how the bincode protocol works.

1. Crate versioning vs file versioning

The needs for versioning the ATSC crate is:

Semantic versioning is important here since it enables interoperability across the crate ecosystem while allowing non-breaking changes to be easily pulled in via cargo update.

The needs for file format versioning is:

  • Allow making breaking changes to the file format

Semantic versioning is overkill here.
A simple incrementing version is all thats needed since all changes to the file format are breaking and ATSC will be required to automatically upgrade older versions to its native version.

Concerns with using the same version

  • What if we want to change the crate API without changing the file format?
  • What if we want to change the file format without changing the crate API?
  • During development we want to be able to bump the the file format version as file format changes occurs (possibly multiple times before a crate release occurs) so that developers can avoid breaking their test files. However the crate version is usually bumped only just before release of a new version.
  • What if we move the serialization/deserialization logic into its own crate? Then we wont be able to access the ATSC version via CARGO_PKG_VERSION like we do currently.

Proposed Solution

So instead I propose that we store the version of the file as a u32.
We would then keep a CURRENT_VERSION constant in ATSC that we increment any time a breaking change is made to the file format.
It would be easy enough to remember to do this since we would need to write a transform for each upgrade anyway. And these transforms should of course have corresponding tests written for them.

2. store the version out of bincode

Unlike protocols like json or cbor, bincode does not provide any mechanism for performing upgrades on old file formats.
I also believe that having ATSC perform upgrades on old file formats is an important requirement.

So, if we are to continue using bincode then we need to store the version out of band so that we can deserialize to the old bincode format and then upgrade the deserialized values to the new format.
This approach will also enable us to move off bincode in the future if we ever had the need.

I propose we do this by storing the u32 of the version as the first 4 bytes of the file and then begin decoding the bincode after reading those 4 bytes.

@rukai
Copy link
Contributor

rukai commented Jan 22, 2025

I took a look at #146 and can see how you ended up with this solution. The focus of the issue was on compatibility with the logic of the compressor rather than changes to the file format itself.

I still think that the approach I wrote up is worth considering since it enables changes to the file format as well.
If the logic of the compressor changes such that old files no longer decompress in a reasonable way then the file version should be bumped. (I wouldnt consider slight fluctuations in output values within margin of error to be worth a version bump)

But if the ability to change the file format is not important then we can disregard my previous comment.

Edit: Thinking about it, the addition of this version field demonstrates that we really do need versioning for file format changes, since its a breaking change to the existing file format.

@cjrolo
Copy link
Collaborator Author

cjrolo commented Jan 22, 2025

Amazing feedback, thanks!

I think it makes all the sense to allow header inspection before the bincode. I also think that the version as u32 is a great plan. Then then compressors don't need to tie to the compressor version, but to the file version, makes much more sense.

Going to rewrite this following this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add versioning to compressed files.
3 participants