[WIP] Introduce an atomic bulk append RPC function #264
base: master
Conversation
Okay, I agree atomic writes are something we should support. Answers:
A problem: if you called publishAll with multiple message contents and the second message needs to refer to the first, you'd need to know the hash of the first message, because it goes in the content of the second message, but you can't know it until it's signed. So you can't do gatherings/create and invite attendees in one go. (A better solution would be if gatherings supported creating the invite and editing within the same initial message.)
An option, though, would be to create the messages one at a time without publishing them, and then pass them all to `add` in one go.
Thanks @dominictarr :) I see your point regarding one of the use cases I gave, where you have to link to a message within the bulk append. The use case I'm working towards doesn't require this (since the messages I'm posting in bulk have an 'id' field and a 'part' field that describe how they relate to each other). Do you think I should try to support that use case (where you create the messages one at a time without publishing, then pass them in one go, to allow linking to a message that hasn't been published yet)? I think it'd be quite tricky / dangerous... Or would you be happy to merge this pull request in its current form (but with a lot more tests)?
I think an atomic publish would be great. Referencing them would be tricky, but I think that's a trade-off: you can publish lots of atomic messages, but they probably shouldn't refer to each other by their sigil hash (because you can't know what it will be up-front). I'm always down for more tests, but I don't dislike the direction of this change.
I'm in favor of bulk append, but I do think it's a database feature, so support should be implemented in the database layer, not the server/application layer. It will be potentially more useful and also easier to test that way.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? |
Hey @Happy0 is this something you're still working on? I'm trying to reduce the number of open PRs and would love to merge if we can. |
This pull request introduces a `publishAll` RPC function which accepts an array of messages to be appended to the log atomically: either all messages are written successfully, or none are. Under the hood, `flume`'s `append` function already uses LevelDB's `batch` function, which writes the messages atomically. This pull request just exposes that functionality further up the stack.

In levelup: https://github.com/Level/levelup#batch
In flume: https://github.com/flumedb/flumelog-level/blob/37edf4a94ceaad88b3bb3bf436517c185449b059/index.js#L25
I still have a lot of tests to write, but I thought I'd open this pull request early for discussion in case anyone has any comments / recommendations about my overall approach (or any reason we shouldn't have this feature.)
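As a rough sketch of the all-or-nothing semantics this exposes, here is a toy in-memory log standing in for flume/LevelDB (in the real stack the guarantee comes from levelup's `batch`, not from application code like this):

```javascript
// Minimal sketch of all-or-nothing append semantics, using an
// in-memory array in place of flume/LevelDB.
class Log {
  constructor () { this.entries = [] }

  // Validate every message first; only mutate the log if all pass.
  appendAll (msgs, validate) {
    const batch = []
    for (const msg of msgs) {
      if (!validate(msg)) {
        // One bad message aborts the whole batch: nothing is written.
        return { ok: false, written: 0 }
      }
      batch.push(msg)
    }
    this.entries.push(...batch) // single "commit" step
    return { ok: true, written: batch.length }
  }
}

const log = new Log()
const valid = m => typeof m.type === 'string'

log.appendAll([{ type: 'post' }, { type: 'post' }], valid) // both written
log.appendAll([{ type: 'post' }, { bad: true }], valid)    // neither written
console.log(log.entries.length) // → 2
```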
Motivation
Sometimes it is desirable to append many messages at once atomically. For example, if an application occasionally persists data to the log that exceeds the individual message size, it might split the message into parts (as an alternative to using the blob system). In such cases, it can be tricky to recover from failure if only some of the writes succeed (due to a power failure or the application being closed non-gracefully). This is something I'm doing in the `scuttlebutt-akka-persistence` plugin that I've been building as part of my work for Openlaw, and I'd like to be able to publish the parts to the log atomically (as recovering from having published only a subset of the parts would be tricky).

An additional use case is being able to publish multiple related messages at the same time atomically. For example (off the top of my head): 'create gathering' and 'add invitees' messages.
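The "split into parts" pattern described above could be sketched as follows. The message type, field names, and the size limit are illustrative placeholders, not the plugin's actual schema:

```javascript
// Illustrative sketch of splitting an oversized payload into parts
// tied together by shared 'id' and ordered 'part' fields.
const MAX = 8 * 1024 // placeholder for the per-message size limit

function split (payload, id) {
  const parts = []
  for (let i = 0; i * MAX < payload.length; i++) {
    parts.push({
      type: 'persistence-part', // hypothetical message type
      id,                       // shared id across all parts
      part: i,                  // ordering within the group
      of: Math.ceil(payload.length / MAX),
      data: payload.slice(i * MAX, (i + 1) * MAX)
    })
  }
  return parts
}

// All parts would then go through the atomic bulk append, so a crash
// can never leave only some of them on the log.
const parts = split('x'.repeat(20000), 'event-42')
console.log(parts.length) // → 3
```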
Questions

- This change calls `timestamp()` to generate a monotonic timestamp for each message in the batch, because the current indexes (such as `indexes/feed`) assume an increasing timestamp. Philosophically, these messages are really written at the same time. Is this approach okay, or do we want to change the indexing?
- What is the difference between `sbot.add` and `sbot.publish`? Do we need an equivalent for `publishAll`?
- What is `db.post` used for?

ssb-validate
The required changes in ssb-validate can be found here: ssbc/ssb-validate#17
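Conceptually, validating a batch means checking each message against the feed's running state (the sequence number and previous-hash chain), carrying that state forward through the batch. A toy sketch of the idea, not ssb-validate's actual API:

```javascript
// Toy batch validation: each message must extend the previous one.
// Field names mirror ssb's (sequence, previous), but real validation
// also checks signatures, encoding, size, and derives ids by hashing.
function validateBatch (state, msgs) {
  let { sequence, id } = state
  for (const msg of msgs) {
    if (msg.sequence !== sequence + 1) return false
    if (msg.previous !== id) return false
    sequence = msg.sequence
    id = msg.id // in reality this would be the hash of msg itself
  }
  return true
}

const state = { sequence: 1, id: '%aaa.sha256' }
const batch = [
  { sequence: 2, previous: '%aaa.sha256', id: '%bbb.sha256' },
  { sequence: 3, previous: '%bbb.sha256', id: '%ccc.sha256' }
]
console.log(validateBatch(state, batch)) // → true
```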
TODO