feat: add Codec trait #5

vmx · 2024-03-08T23:10:38Z

With the Codec trait it's now possible to have a unified API to encode and decode data across different codecs.

Th accompanying Links trait allows to extract all the links (CIDs) from some encoded IPLD data.

For examples of its implementation see:

Stebalien · 2024-03-08T23:26:55Z

So, one annoying thing about this as it doesn't support "views". I.e., types I can only "encode" into IPLD, but can't "decode" IPLD into (e.g., types that borrow their values).

But I guess I can probably live with that? One option would be to add separate encode/decode traits, but I'm not entirely sure how to combine them.

Stebalien · 2024-03-08T23:27:27Z

src/codec.rs

+}
+
+/// Trait for returning the links of a serialized IPLD data.
+pub trait Links {


not Links<T>?

Also, I wonder if we want a relationship with Codec<T>? Are there any codecs where (a) we support links but (b) don't actually support enumerating them?

It's a separate trait, so it doesn't have to be <T>. The idea is that you can extract links out of the encoded data. If it would be <T>, then it could/would be part of the Codec trait.

Ah, makes sense. You don't care about T.

Hm.

It's kind of odd that Links doesn't have the code. But I don't have a great solution.

It would be kind of nice to have Codec<T>: Links. That way if I take a codec as C: Codec<T> I also, get links.

src/codec.rs

Stebalien · 2024-03-08T23:31:10Z

src/codec.rs

+    /// Decode a slice into the desired type.
+    fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
+    /// Encode a type into bytes.
+    fn encode(obj: &T) -> Result<Vec<u8>, Self::Error>;


Hm... can we define encode_to_vec (with a default implementation) and encode(obj, writer), or something like that? Having to allocate is annoying.

src/codec.rs

Stebalien · 2024-03-09T00:52:18Z

So... I'm not 100% sure this is a good idea but... GATs may be the answer here:

use std::marker::PhantomData;

// stubs
struct Cid;
trait Serialize {}
trait Deserialize {}

trait Codec {
    const CODE: u64;
    type Error;
    // encoder/decoder can be implemented independently.
    type Encoder<T>;
    type Decoder<T>;

    fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
    where
        Self::Encoder<T>: Encode<T, Error = Self::Error>,
    {
        Self::Encoder::<T>::encode(value)
    }
    fn decode<T>(bytes: &[u8]) -> Result<T, Self::Error>
    where
        Self::Decoder<T>: Decode<T, Error = Self::Error>,
    {
        Self::Decoder::<T>::decode(bytes)
    }
    // links can go on the codec itself!
    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error>;
}

trait Encode<T> {
    type Error;
    fn encode(value: &T) -> Result<Vec<u8>, Self::Error>;
}

trait Decode<T> {
    type Error;
    fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
}

struct CborCoder<T>(PhantomData<T>);

impl<T> Encode<T> for CborCoder<T>
where
    T: Serialize,
{
    type Error = ();
    fn encode(_: &T) -> Result<Vec<u8>, Self::Error> {
        todo!("implement")
    }
}

impl<T> Decode<T> for CborCoder<T>
where
    T: Deserialize,
{
    type Error = ();
    fn decode(_: &[u8]) -> Result<T, Self::Error> {
        todo!("implement")
    }
}

struct DagCbor;

impl Codec for DagCbor {
    const CODE: u64 = 0x71;
    type Error = ();
    type Encoder<T> = CborCoder<T>;
    type Decoder<T> = CborCoder<T>;

    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error> {
        todo!()
    }
}

Stebalien · 2024-03-09T00:54:38Z

Ah, actually, there's no reason Encoder and Decoder need to be separate associated types. We can have a single Coder (not a great name).

Stebalien · 2024-03-09T00:58:05Z

Yeah, actually, this works pretty great because we can create a single helper struct for serde-based encoding:

use std::marker::PhantomData;

struct Cid;
trait Serialize {}
trait Deserialize {}

trait Codec {
    const CODE: u64;
    type Error;
    type Encoding<T>;

    fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
    where
        Self::Encoding<T>: Encode<T, Error = Self::Error>,
    {
        Self::Encoding::<T>::encode(value)
    }
    fn decode<T>(bytes: &[u8]) -> Result<T, Self::Error>
    where
        Self::Encoding<T>: Decode<T, Error = Self::Error>,
    {
        Self::Encoding::<T>::decode(bytes)
    }
    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error>;
}

trait Encode<T> {
    type Error;
    fn encode(value: &T) -> Result<Vec<u8>, Self::Error>;
}

trait Decode<T> {
    type Error;
    fn decode(bytes: &[u8]) -> Result<T, Self::Error>;
}

struct SerdeEncoding<T>(PhantomData<T>);

impl<T> Encode<T> for SerdeEncoding<T>
where
    T: Serialize,
{
    type Error = ();
    fn encode(_: &T) -> Result<Vec<u8>, Self::Error> {
        todo!("implement")
    }
}

impl<T> Decode<T> for SerdeEncoding<T>
where
    T: Deserialize,
{
    type Error = ();
    fn decode(_: &[u8]) -> Result<T, Self::Error> {
        todo!("implement")
    }
}

struct DagCbor;

impl Codec for DagCbor {
    const CODE: u64 = 0x71;
    type Error = ();
    type Encoding<T> = SerdeEncoding<T>;

    fn links(bytes: &[u8]) -> Result<Vec<Cid>, Self::Error> {
        todo!()
    }
}

Stebalien · 2024-03-09T01:12:52Z

The downside is that this slightly complicates generic use of codecs. I.e., with your current Codec<T> I can write:

fn do_thing<C, T>(value: T)
where
    C: Codec<T>,
{
    ...
}

With the GAT approach, I need to write:

fn do_thing<C, T>(value: T)
where
    C: Codec,
    C::Encoding<T>: Encoding<T>,
{
    ...
}

Assuming I have some Encoding trait (which does seem reasonable to have).

trait Encoding<T>: Encode<T> + Decode<T> {}
impl<C, T> Encoding<T> for C where C: Encode<T> + Decode<T> {}

vmx · 2024-03-11T19:46:22Z

Thanks @Stebalien for looking into GATs. For me that's exactly what I wanted to try out, but I didn't find how to do it.

Nonetheless, to me it seems like doing the Links as a supertrait is the easier and more straightforward solution to what we want to achieve.

I've pushed to branches to serde_ipld_dagcbor in case you want to see the differences.

Your GAT based version is at https://github.com/ipld/serde_ipld_dagcbor/compare/codec-trait2. Please note that it doesn't even compile due to:

error[E0271]: type mismatch resolving `<<C as Codec>::Encoding<T> as Encode<T>>::Error == <C as Codec>::Error`
  --> tests/codec.rs:49:19
   |
49 |         C::encode(&value).map_err(Into::into).unwrap()
   |         --------- ^^^^^^ expected `ipld_core::codec::Codec::Error`, found `ipld_core::codec::Encode::Error`
   |         |
   |         required by a bound introduced by this call
   |
   = note: expected associated type `<C as ipld_core::codec::Codec>::Error`
              found associated type `<<C as ipld_core::codec::Codec>::Encoding<T> as ipld_core::codec::Encode<T>>::Error`
   = note: an associated type was expected, but a different one was found
note: required by a bound in `ipld_core::codec::Codec::encode`
  --> /home/vmx/.cargo/git/checkouts/rust-ipld-core-6f704dfbd2a35d9b/7b079d0/src/codec.rs:17:38
   |
15 |     fn encode<T>(value: &T) -> Result<Vec<u8>, Self::Error>
   |        ------ required by a bound in this associated function
16 |     where
17 |         Self::Encoding<T>: Encode<T, Error = Self::Error>,
   |                                      ^^^^^^^^^^^^^^^^^^^ required by this bound in `Codec::encode`

I couldn't even be bothered to try to fix it as the other solutions seemed so much nicer.

At https://github.com/ipld/serde_ipld_dagcbor/compare/codec-trait3 is the "Links as supertrait" version. You can see in the test at https://github.com/ipld/serde_ipld_dagcbor/blob/1f520c91ae5505044c355d2b0f44e319ec365fd3/tests/codec.rs#L41-L55 that a generic function can be implement trivially without much trait bounds:
```
fn encode_generic<C, T>(value: T) -> Vec<u8>
where
   C: Codec<T>,
   C::Error: std::fmt::Debug,
{
   C::encode(&value).unwrap()
}
```

So I would go with the "Links as supertrait", wdyt @Stebalien?

Stebalien · 2024-03-12T02:16:42Z

Yeah, Links as a supertrait seems reasonable. I'm still concerned that there isn't a great way to specify a codec for encoding only (e.g., for "view" types), but maybe that's not critical?

Stebalien · 2024-03-12T02:20:56Z

But I have no idea what's going on with that error, it seems like a rust bug.

vmx · 2024-03-12T19:50:37Z

Ok, I've pushed new commits which address:

Links is a supertrait. Its error type is called LinksError so it doesn't clash with the Error of the Codec. That makes function signatures simpler for Codec implementations
Links are now returned as an iterator
encode and decode take a writer/reader and there are default implementations for convenience

I've implemented that new version of the Codec trait for serde_ipld_dagcbor, serde_ipld_dagjson and ipld-dagpb to make sure it actually works as expected (and it does).

With the `Codec` trait it's now possible to have a unified API to encode and decode data across different codecs. Th accompanying `Links` trait allows to extract all the links (CIDs) from some encoded IPLD data.

This way a `Codec` must always implement the `Links` trait as well.

Return in iterator instead of a vector.

Instead of passing in slices and returning vectors, pass in readers and writers instead. Add default implementation for a slice/vector based API.

vmx · 2024-03-14T18:32:57Z

The last push was just a proper rebase (no idea why the automatic didn't work as expected). I'll now squash merge it.

rvagg · 2024-03-15T01:31:41Z

src/codec.rs

+/// easily exchanged or combined.
+pub trait Codec<T>: Links {
+    /// The multicodec code of the IPLD codec.
+    const CODE: u64;


I think we're going to want NAME eventually too, it's pretty helpful. This trait is almost the same as the interfaces we landed on in JS, although we ended up splitting encode and decode operations into two parts and combining them into a single BlockCodec: https://github.com/multiformats/js-multiformats/blob/master/src/codecs/interface.ts

I don't want the name. People should really use the constants as identifiers and not strings that might change.

Splitting encode and decode: I'd keep it simple for now, if the need of splitting ever occurs (which I currently doubt), we can do it then.

vmx requested review from Stebalien and rvagg March 8, 2024 23:10

Stebalien reviewed Mar 8, 2024

View reviewed changes

This comment was marked as off-topic.

Sign in to view

Stebalien approved these changes Mar 14, 2024

View reviewed changes

Base automatically changed from fix-macro-alloc to main March 14, 2024 18:27

vmx added 4 commits March 14, 2024 19:31

feat: add Codec trait

17c3645

With the `Codec` trait it's now possible to have a unified API to encode and decode data across different codecs. Th accompanying `Links` trait allows to extract all the links (CIDs) from some encoded IPLD data.

feat: make Links a super trait of Codec

043b103

This way a `Codec` must always implement the `Links` trait as well.

feat: make links() return an iterator

9a72f87

Return in iterator instead of a vector.

feat: make the Codec trait take a reader/writer

f8bc925

Instead of passing in slices and returning vectors, pass in readers and writers instead. Add default implementation for a slice/vector based API.

vmx force-pushed the codec-trait branch from 726969a to f8bc925 Compare March 14, 2024 18:31

vmx merged commit f7127c7 into main Mar 14, 2024

vmx deleted the codec-trait branch March 14, 2024 18:33

rvagg reviewed Mar 15, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Codec trait #5

feat: add Codec trait #5

vmx commented Mar 8, 2024

Stebalien commented Mar 8, 2024

Stebalien Mar 8, 2024

Stebalien Mar 8, 2024

vmx Mar 8, 2024

Stebalien Mar 9, 2024

Stebalien Mar 9, 2024

Stebalien Mar 8, 2024

This comment was marked as off-topic.

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

vmx commented Mar 11, 2024

Stebalien commented Mar 12, 2024

Stebalien commented Mar 12, 2024

vmx commented Mar 12, 2024

vmx commented Mar 14, 2024

rvagg Mar 15, 2024 •

edited

Loading

vmx Mar 15, 2024

feat: add Codec trait #5

feat: add Codec trait #5

Conversation

vmx commented Mar 8, 2024

Stebalien commented Mar 8, 2024

Stebalien Mar 8, 2024

Choose a reason for hiding this comment

Stebalien Mar 8, 2024

Choose a reason for hiding this comment

vmx Mar 8, 2024

Choose a reason for hiding this comment

Stebalien Mar 9, 2024

Choose a reason for hiding this comment

Stebalien Mar 9, 2024

Choose a reason for hiding this comment

Stebalien Mar 8, 2024

Choose a reason for hiding this comment

This comment was marked as off-topic.

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

Stebalien commented Mar 9, 2024

vmx commented Mar 11, 2024

Stebalien commented Mar 12, 2024

Stebalien commented Mar 12, 2024

vmx commented Mar 12, 2024

vmx commented Mar 14, 2024

rvagg Mar 15, 2024 • edited Loading

Choose a reason for hiding this comment

vmx Mar 15, 2024

Choose a reason for hiding this comment

rvagg Mar 15, 2024 •

edited

Loading