git-annex makes it difficult to use multiple annexes, which makes the fork-and-pull-request model of contribution awkward: a contributor would need to find their own annex hosting and temporarily add it (globally!) to the repo's git-annex metadata, and then whoever accepts the PR would need to remember to run `git annex copy --from contributor; git annex copy --to amazon` on everything.
So, we're going to cut a corner: we will grant contributors access to our S3 bucket directly. They'll still have to open a PR, but the PR's data will already have been written to the S3 annex.
This thread will document how to go about doing this, and should become a part of #1.
Permissions
We don't want to grant full access, so at a minimum the process needs to give each contributor a restricted access token.
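As a starting point for the restricted token, here is a minimal sketch of an IAM policy that could be attached to a dedicated per-contributor IAM user (for example with `aws iam put-user-policy`). Assumptions: the bucket name `data2--spine-generic--neuropoly` is only a placeholder borrowed from the bucket-creation example in this thread; the helper name `contributor_policy` is hypothetical; and the exact set of S3 actions git-annex needs has not been verified here, so it may need extending.

```python
import json

# Placeholder bucket name (assumption); substitute the real annex bucket.
BUCKET = "data2--spine-generic--neuropoly"


def contributor_policy(bucket: str = BUCKET) -> dict:
    """Hypothetical helper: build an IAM policy that lets a contributor list,
    download and upload objects in one bucket, but not delete objects or
    change bucket settings."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: see what is already in the annex.
                "Sid": "ListAnnexBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions: fetch and upload annexed content.
                # s3:DeleteObject is deliberately not granted.
                "Sid": "ReadWriteAnnexObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be saved to a file and attached to a user.
    print(json.dumps(contributor_policy(), indent=2))
```

Note that without a write-once mechanism on the bucket, `s3:PutObject` still lets a contributor overwrite an existing object under the same key, so withholding `s3:DeleteObject` alone does not make the data immutable.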
Moreover, ideally our data would be write-once, since we want to archive it for SCIENCE. Luckily, it looks like S3 now fully supports this through S3 Object Lock. I am unsure how to integrate it with git-annex, though: to use it, you MUST turn it on at bucket creation time (e.g. `aws s3api create-bucket --bucket data2--spine-generic--neuropoly --object-lock-enabled-for-bucket --create-bucket-configuration LocationConstraint=ca-central-1`), and I'm pretty sure git-annex insists on creating the bucket itself. So there may be some problems there.
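If we pre-create the bucket ourselves, the same step can be scripted with boto3. This is a sketch under assumptions: the helper names are hypothetical, the 10-year `GOVERNANCE` default retention is an illustrative choice rather than a decision made in this thread, and whether git-annex will then accept the pre-existing bucket is still the open question above. The argument-building helpers are kept separate from the boto3 calls so they can be checked without AWS credentials.

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Arguments for s3.create_bucket with Object Lock enabled at creation."""
    kwargs = {"Bucket": bucket, "ObjectLockEnabledForBucket": True}
    # us-east-1 rejects an explicit LocationConstraint; every other region needs one.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def default_retention_kwargs(bucket: str, mode: str = "GOVERNANCE", years: int = 10) -> dict:
    """Arguments for s3.put_object_lock_configuration setting a default retention."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unsupported Object Lock mode: {mode}")
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
        },
    }


def create_write_once_bucket(bucket: str, region: str, mode: str = "GOVERNANCE", years: int = 10) -> None:
    """Create the bucket and apply a default retention rule (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    s3.put_object_lock_configuration(**default_retention_kwargs(bucket, mode, years))
```

On the choice of mode: in `GOVERNANCE` mode, principals granted `s3:BypassGovernanceRetention` can still remove locked object versions, which leaves room for a privileged cleanup account; `COMPLIANCE` mode prevents anyone from deleting them until the retention period expires.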
The catch is that files that need further editing during review will leave stale copies all over the bucket. It won't be much, but it'll be some. We can counteract that by writing scripts that compare what's in the repo with what's in the bucket to find orphaned files, and then using an account with extra permissions to clean out those files.
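The comparison script could look roughly like the sketch below. Assumptions: objects sit in the bucket under their git-annex key names (optionally behind a prefix); git-annex keeps a small `annex-uuid` marker object in the bucket that must never be flagged; and the function names are hypothetical. `git annex find --include=*` lists annexed files whether or not their content is present locally.

```python
import subprocess

# Assumed marker object kept by git-annex in the bucket; never report it as an orphan.
DEFAULT_IGNORE = frozenset({"annex-uuid"})


def repo_keys(repo_dir: str = ".") -> set:
    """Keys of every annexed file in the current checkout, present locally or not."""
    out = subprocess.run(
        # No shell is involved, so git-annex itself expands the \n escape.
        ["git", "annex", "find", "--include=*", "--format=${key}\\n"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return {line for line in out.splitlines() if line}


def bucket_keys(bucket: str, prefix: str = "") -> set:
    """Object names in the bucket with the optional prefix stripped (needs AWS credentials)."""
    import boto3  # imported lazily so find_orphans works without boto3 installed

    s3 = boto3.client("s3")
    names = set()
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            names.add(obj["Key"][len(prefix):])
    return names


def find_orphans(repo: set, bucket_objects: set, ignore=DEFAULT_IGNORE) -> list:
    """Bucket objects that no annexed file in the repo refers to, sorted."""
    return sorted(set(bucket_objects) - set(repo) - set(ignore))
```

Usage would be `find_orphans(repo_keys(), bucket_keys("data2--spine-generic--neuropoly"))`. Only keys reachable from the current checkout count as "used", so content referenced by open PR branches or older commits would be flagged too; review the list before the privileged account deletes anything. git-annex's own `git annex unused --from=amazon`, followed by `git annex dropunused --from=amazon`, may already cover much of this.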