Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Codec trait #5

Merged
merged 4 commits into from
Mar 14, 2024
Merged

feat: add Codec trait #5

merged 4 commits into from
Mar 14, 2024

Conversation

vmx
Copy link
Member

@vmx vmx commented Mar 8, 2024

With the Codec trait it's now possible to have a unified API to encode and decode data across different codecs.

Th accompanying Links trait allows to extract all the links (CIDs) from some encoded IPLD data.

For examples of its implementation see:

@vmx vmx requested review from Stebalien and rvagg March 8, 2024 23:10
@Stebalien
Copy link
Contributor

So, one annoying thing about this as it doesn't support "views". I.e., types I can only "encode" into IPLD, but can't "decode" IPLD into (e.g., types that borrow their values).

But I guess I can probably live with that? One option would be to add separate encode/decode traits, but I'm not entirely sure how to combine them.

}

/// Trait for returning the links of a serialized IPLD data.
pub trait Links {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not Links<T>?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder if we want a relationship with Codec<T>? Are there any codecs where (a) we support links but (b) don't actually support enumerating them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a separate trait, so it doesn't have to be <T>. The idea is that you can extract links out of the encoded data. If it would be <T>, then it could/would be part of the Codec trait.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. You don't care about T.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.

  1. It's kind of odd that Links doesn't have the code. But I don't have a great solution.
  2. It would be kind of nice to have Codec<T>: Links. That way if I take a codec as C: Codec<T> I also, get links.

src/codec.rs Outdated Show resolved Hide resolved
src/codec.rs Outdated
/// Decode a slice into the desired type.
fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
/// Encode a type into bytes.
fn encode(obj: &T) -> Result<Vec<u8>, Self::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... can we define encode_to_vec (with a default implementation) and encode(obj, writer), or something like that? Having to allocate is annoying.

src/codec.rs Outdated Show resolved Hide resolved
@Stebalien

This comment was marked as off-topic.

@Stebalien
Copy link
Contributor

So... I'm not 100% sure this is a good idea but... GATs may be the answer here:

use std::marker::PhantomData;

// stubs
struct Cid;
trait Serialize {}
trait Deserialize {}

trait Codec {
    const CODE: u64;
    type Error;
    // encoder/decoder can be implemented independently.
    type Encoder<T>;
    type Decoder<T>;

    fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
    where
        Self::Encoder<T>: Encode<T, Error = Self::Error>,
    {
        Self::Encoder::<T>::encode(value)
    }
    fn decode<T>(bytes: &[u8]) -> Result<T, Self::Error>
    where
        Self::Decoder<T>: Decode<T, Error = Self::Error>,
    {
        Self::Decoder::<T>::decode(bytes)
    }
    // links can go on the codec itself!
    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error>;
}

trait Encode<T> {
    type Error;
    fn encode(value: &T) -> Result<Vec<u8>, Self::Error>;
}

trait Decode<T> {
    type Error;
    fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
}

struct CborCoder<T>(PhantomData<T>);

impl<T> Encode<T> for CborCoder<T>
where
    T: Serialize,
{
    type Error = ();
    fn encode(_: &T) -> Result<Vec<u8>, Self::Error> {
        todo!("implement")
    }
}

impl<T> Decode<T> for CborCoder<T>
where
    T: Deserialize,
{
    type Error = ();
    fn decode(_: &[u8]) -> Result<T, Self::Error> {
        todo!("implement")
    }
}

struct DagCbor;

impl Codec for DagCbor {
    const CODE: u64 = 0x71;
    type Error = ();
    type Encoder<T> = CborCoder<T>;
    type Decoder<T> = CborCoder<T>;

    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error> {
        todo!()
    }
}

@Stebalien
Copy link
Contributor

Ah, actually, there's no reason Encoder and Decoder need to be separate associated types. We can have a single Coder (not a great name).

@Stebalien
Copy link
Contributor

Yeah, actually, this works pretty great because we can create a single helper struct for serde-based encoding:

use std::marker::PhantomData;

struct Cid;
trait Serialize {}
trait Deserialize {}

trait Codec {
    const CODE: u64;
    type Error;
    type Encoding<T>;

    fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
    where
        Self::Encoding<T>: Encode<T, Error = Self::Error>,
    {
        Self::Encoding::<T>::encode(value)
    }
    fn decode<T>(bytes: &[u8]) -> Result<T, Self::Error>
    where
        Self::Encoding<T>: Decode<T, Error = Self::Error>,
    {
        Self::Encoding::<T>::decode(bytes)
    }
    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error>;
}

trait Encode<T> {
    type Error;
    fn encode(value: &T) -> Result<Vec<u8>, Self::Error>;
}

trait Decode<T> {
    type Error;
    fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
}

struct SerdeEncoding<T>(PhantomData<T>);

impl<T> Encode<T> for SerdeEncoding<T>
where
    T: Serialize,
{
    type Error = ();
    fn encode(_: &T) -> Result<Vec<u8>, Self::Error> {
        todo!("implement")
    }
}

impl<T> Decode<T> for SerdeEncoding<T>
where
    T: Deserialize,
{
    type Error = ();
    fn decode(_: &[u8]) -> Result<T, Self::Error> {
        todo!("implement")
    }
}

struct DagCbor;

impl Codec for DagCbor {
    const CODE: u64 = 0x71;
    type Error = ();
    type Encoding<T> = SerdeEncoding<T>;

    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error> {
        todo!()
    }
}

@Stebalien
Copy link
Contributor

The downside is that this slightly complicates generic use of codecs. I.e., with your current Codec<T> I can write:

fn do_thing<C, T>(value: T)
where
    C: Codec<T>,
{
    ...
}

With the GAT approach, I need to write:

fn do_thing<C, T>(value: T)
where
    C: Codec,
    C::Encoding<T>: Encoding<T>,
{
    ...
}

Assuming I have some Encoding trait (which does seem reasonable to have).

trait Encoding<T>: Encode<T> + Decode<T> {}
impl<C, T> Encoding<T> for C where C: Encode<T> + Decode<T> {}

@vmx
Copy link
Member Author

vmx commented Mar 11, 2024

Thanks @Stebalien for looking into GATs. For me that's exactly what I wanted to try out, but I didn't find how to do it.

Nonetheless, to me it seems like doing the Links as a supertrait is the easier and more straightforward solution to what we want to achieve.

I've pushed to branches to serde_ipld_dagcbor in case you want to see the differences.

  • Your GAT based version is at https://github.com/ipld/serde_ipld_dagcbor/compare/codec-trait2. Please note that it doesn't even compile due to:
    error[E0271]: type mismatch resolving `<<C as Codec>::Encoding<T> as Encode<T>>::Error == <C as Codec>::Error`
      --> tests/codec.rs:49:19
       |
    49 |         C::encode(&value).map_err(Into::into).unwrap()
       |         --------- ^^^^^^ expected `ipld_core::codec::Codec::Error`, found `ipld_core::codec::Encode::Error`
       |         |
       |         required by a bound introduced by this call
       |
       = note: expected associated type `<C as ipld_core::codec::Codec>::Error`
                  found associated type `<<C as ipld_core::codec::Codec>::Encoding<T> as ipld_core::codec::Encode<T>>::Error`
       = note: an associated type was expected, but a different one was found
    note: required by a bound in `ipld_core::codec::Codec::encode`
      --> /home/vmx/.cargo/git/checkouts/rust-ipld-core-6f704dfbd2a35d9b/7b079d0/src/codec.rs:17:38
       |
    15 |     fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
       |        ------ required by a bound in this associated function
    16 |     where
    17 |         Self::Encoding<T>: Encode<T, Error = Self::Error>,
       |                                      ^^^^^^^^^^^^^^^^^^^ required by this bound in `Codec::encode`
    
    I couldn't even be bothered to try to fix it as the other solutions seemed so much nicer.
  • At https://github.com/ipld/serde_ipld_dagcbor/compare/codec-trait3 is the "Links as supertrait" version. You can see in the test at https://github.com/ipld/serde_ipld_dagcbor/blob/1f520c91ae5505044c355d2b0f44e319ec365fd3/tests/codec.rs#L41-L55 that a generic function can be implement trivially without much trait bounds:
    fn encode_generic<C, T>(value: T) -> Vec<u8>
    where
       C: Codec<T>,
       C::Error: std::fmt::Debug,
    {
       C::encode(&value).unwrap()
    }

So I would go with the "Links as supertrait", wdyt @Stebalien?

@Stebalien
Copy link
Contributor

Yeah, Links as a supertrait seems reasonable. I'm still concerned that there isn't a great way to specify a codec for encoding only (e.g., for "view" types), but maybe that's not critical?

@Stebalien
Copy link
Contributor

But I have no idea what's going on with that error, it seems like a rust bug.

@vmx
Copy link
Member Author

vmx commented Mar 12, 2024

Ok, I've pushed new commits which address:

  • Links is a supertrait. Its error type is called LinksError so it doesn't clash with the Error of the Codec. That makes function signatures simpler for Codec implementations
  • Links are now returned as an iterator
  • encode and decode take a writer/reader and there are default implementations for convenience

I've implemented that new version of the Codec trait for serde_ipld_dagcbor, serde_ipld_dagjson and ipld-dagpb to make sure it actually works as expected (and it does).

Base automatically changed from fix-macro-alloc to main March 14, 2024 18:27
vmx added 4 commits March 14, 2024 19:31
With the `Codec` trait it's now possible to have a unified API to encode and
decode data across different codecs.

Th accompanying `Links` trait allows to extract all the links (CIDs) from some
encoded IPLD data.
This way a `Codec` must always implement the `Links` trait as well.
Return in iterator instead of a vector.
Instead of passing in slices and returning vectors, pass in readers
and writers instead. Add default implementation for a slice/vector
based API.
@vmx
Copy link
Member Author

vmx commented Mar 14, 2024

The last push was just a proper rebase (no idea why the automatic didn't work as expected). I'll now squash merge it.

@vmx vmx merged commit f7127c7 into main Mar 14, 2024
@vmx vmx deleted the codec-trait branch March 14, 2024 18:33
/// easily exchanged or combined.
pub trait Codec<T>: Links {
/// The multicodec code of the IPLD codec.
const CODE: u64;
Copy link
Member

@rvagg rvagg Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to want NAME eventually too, it's pretty helpful. This trait is almost the same as the interfaces we landed on in JS, although we ended up splitting encode and decode operations into two parts and combining them into a single BlockCodec: https://github.com/multiformats/js-multiformats/blob/master/src/codecs/interface.ts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want the name. People should really use the constants as identifiers and not strings that might change.

Splitting encode and decode: I'd keep it simple for now, if the need of splitting ever occurs (which I currently doubt), we can do it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants