Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gsoc): subscribe #4727

Open
wants to merge 58 commits into
base: master
Choose a base branch
from
Open

feat(gsoc): subscribe #4727

wants to merge 58 commits into from

Conversation

nugaon
Copy link
Member

@nugaon nugaon commented Jul 17, 2024

This PR is partial implementation of the GSOC SWIP: ethersphere/SWIPs#41
It addresses only to have GSOC subscription feature which neglects GSOC retrieval related functionalities.
This can substitute the PSS feature with lower resource requirements.

Introducing /gsoc/subscribe/{address} endpoint which opens a websocket channel for reading incoming payloads of the Single Owner Chunk under the defined address.

Modifications were required in the core packages so that different payloads of a SOC can be synced.
Identity Address of a chunk is used as a key to distinguish different payloads of a SOC which is the hash of the SOC address and its wrapped address in case of SOC and it remains the CAC address in the case of CAC.

Closes #4333

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):

@nugaon nugaon marked this pull request as ready for review July 18, 2024 15:24
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this functionality too work,

  • pullsync needs to process each instance of the soc on the same address.

    • if a soc has the same postage sstamp as an earlier one, the node it is pushsynced to might not even save it and therefore will not offer it via pullsync
    • with different postage stamp but same soc address the soc should be saved and offered via pullsync and the downstrream party should request and save it

    -pushsync should never skip pushing the same soc addresss with different payload

    these assumptions need to be doublechecked to be sure the functionality makes sense.
    it shooould also be properly going through a SWIP

pkg/gsoc/gsoc.go Outdated Show resolved Hide resolved
pkg/gsoc/gsoc.go Outdated Show resolved Hide resolved
pkg/gsoc/gsoc.go Outdated Show resolved Hide resolved
pkg/pullsync/pullsync.go Outdated Show resolved Hide resolved
pkg/pushsync/pushsync.go Outdated Show resolved Hide resolved
pkg/pushsync/pushsync_test.go Outdated Show resolved Hide resolved
pkg/api/gsoc_test.go Outdated Show resolved Hide resolved
pkg/api/gsoc_test.go Outdated Show resolved Hide resolved
pkg/storer/internal/reserve/reserve.go Show resolved Hide resolved
pkg/storer/internal/reserve/reserve.go Show resolved Hide resolved
pkg/storer/internal/reserve/reserve.go Show resolved Hide resolved
@@ -21,7 +21,9 @@ var (

// Stamper can issue stamps from the given address of chunk.
type Stamper interface {
Stamp(swarm.Address) (*Stamp, error)
// addr is the request address of the chunk and idAddr is the identity address of the chunk.
Stamp(addr, idAddr swarm.Address) (*Stamp, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a new Stamp function, maybe called StampSOC that's specific to this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can have the original function definition untouched

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

later it may change to have only one param but with idAddress.
I do not see the reason to complicate this further there should be only one stamp funtion

pkg/pullsync/pullsync.go Outdated Show resolved Hide resolved
pkg/storer/internal/reserve/reserve.go Show resolved Hide resolved
@@ -124,6 +124,8 @@ func (s *Service) socUploadHandler(w http.ResponseWriter, r *http.Request) {
jsonhttp.NotFound(w, "batch with id not found")
case errors.Is(err, errInvalidPostageBatch):
jsonhttp.BadRequest(w, "invalid batch id")
case errors.Is(err, errUnsupportedDevNodeOperation):
jsonhttp.NotImplemented(w, "operation is not supported in dev mode")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this soc upload handler. If you allow for deferred uploads, then you also need to make the identity adress changes in the uploadstore. If you allow for pinning, again the same problem. The pinstore works by chunk adress being unique so lots of changes are needed there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the gsoc subscription PR, it does not care about the retrievals including the pinning.

can you elaborate where should I change things in the uploadstore please?

@istae
Copy link
Member

istae commented Nov 4, 2024

The cachestore is also problematic because it expect that the chunk adress is unique so if you cache one soc then try to fetch the same soc from the network expecting a different payloads, the cache will return the first soc over and over again.

@istae
Copy link
Member

istae commented Nov 4, 2024

What is the plan with retrieving these chunks through the regular Api and not the gsoc handler? As things stands, if you try to retrieve a soc chunk, it's mostly random which version or payload you will get. If the only mechanism for accessing these chunks is through the gsoc handler, then do we actually need to store them in the reserve?

@nugaon
Copy link
Member Author

nugaon commented Nov 4, 2024

This PR is one part of the SWIP ethersphere/SWIPs#41 and only solves the GSOC subscription feature.
In other words, all retrieval related things are neglected in the changes.

Ultimately (in the next GSOC PR), all payloads must be stored and bee should be able to retrieve them.

pkg/soc/utils.go Show resolved Hide resolved
@istae
Copy link
Member

istae commented Nov 4, 2024

Summarizing my comments, for me to approve the PR:

  1. the reserve changes must be reverted (and handled in a future PR that fixes the chunkstore to be able to store multiple payloads of a SOC).
  2. the SOC upload API must only be usable with direct and non-pinned uploads unless the pinstore and uploadstore is fixed to correctly store multiples payloads of a SOC.
  3. my other comments addressed that are not related to the above

@nugaon
Copy link
Member Author

nugaon commented Nov 5, 2024

  1. the reserve changes must be reverted (and handled in a future PR that fixes the chunkstore to be able to store multiple payloads of a SOC).

different payloads of a SOC need to be synced and it is likely that one uploads different payloads with one postage batch so this change is needed.
I will see what I can do with the chunkstore problem in this short time but it is better in this way than deciding what to store in the reserve based on batch ID.

2. the SOC upload API must only be usable with direct and non-pinned uploads unless the pinstore and uploadstore is fixed to correctly store multiples payloads of a SOC.

pinning GSOC is futile, it is used for information signaling. The way how you can request different payloads of a SOC will be handled in a different PR.
It is still unclear for me where and what I should tackle at uploadstore.

3. my other comments addressed that are not related to the above

which ones? : )

@nugaon nugaon requested a review from janos November 5, 2024 12:59
@nugaon
Copy link
Member Author

nugaon commented Nov 7, 2024

  1. the reserve changes must be reverted (and handled in a future PR that fixes the chunkstore to be able to store multiple payloads of a SOC).

chunkstore overwrites the previous GSOC payload since this PR does not address GSOC store.
pullSync will pull the newer payload of SOC. It calls ReserveHas(pkg/pullsync/pullsync.go:295) on offered items where StampHash is part of the ID of BatchRadiusItem -> all different SOC payloads will be stamped differently.

2. the SOC upload API must only be usable with direct and non-pinned uploads unless the pinstore and uploadstore is fixed to correctly store multiples payloads of a SOC.

As mentioned, this PR does not handle multiple GSOC store. that will be introduced in a different PR which will introduce postage stamping breaking change because that is required for multiple GSOC store in order to avoid from infinite store (detailed in the SWIP).
deferred upload is not supported in this phase because multiple GSOC upload in the same time will overwrite for the latest version within one pullsync time and the information signal will be lost.

I hope I addressed all other concerns.

@nugaon
Copy link
Member Author

nugaon commented Nov 11, 2024

@istae please do not force push to the PR and do not commit directly (create another PR based on this).

@istae istae mentioned this pull request Nov 11, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subscribe to specific SOC address
6 participants