Write a script to get delta of datasets and index them #1990
Comments
Are you rsyncing the original |
Thanks for this info. answers:
Since we are now no longer upgrading metacat "in place" before moving to k8s, we're rsyncing the original
Yes - all we need is the list of pids. I've updated the description above to make this more clear. Thanks |
Additional potentially-useful info, from k8s-cluster-config/MetacatQuickRef.md:
API call to list objects in reverse order of modification:
https://test.arcticdata.io/metacat/d1/mn/v2/object
API call to see if an object exists on the k8s instance yet:
https://arctic-dev.test.dataone.org/metacat/d1/mn/v2/object/<pid>
To index a list of objects:
$ TOKEN=$( kubectl get secret MYRELEASE-indexer-token \
    -o jsonpath="{.data.DataONEauthToken}" | base64 -d )
## Assuming pidsToReindex.txt contains a list of the identifiers to be indexed...
$ for pid in $(cat pidsToReindex.txt); do \
    curl -X PUT -H "Authorization: Bearer $TOKEN" \
      "https://MYHOST/MYCONTEXT/d1/mn/v2/index?pid=$pid"; \
  done
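A slightly more defensive variant of the loop above, as a sketch only. It reads the pid file line by line, lets curl percent-encode each pid (DOIs and other identifiers may contain characters that are not safe in a query string), and prints the HTTP status next to each pid so failures are easy to spot. The function name `reindex_pids` and the `CURL` override variable are made up for illustration; the endpoint and `$TOKEN` are the ones from the snippet above.

```sh
# Sketch: re-index every pid listed in a file, one pid per line.
# Assumes $TOKEN already holds the indexer token (see the kubectl command above).
# $1 = base URL up to and including /d1/mn/v2, $2 = file containing pids.
reindex_pids() {
  base_url=$1
  pid_file=$2
  while IFS= read -r pid || [ -n "$pid" ]; do
    # Skip blank lines.
    if [ -z "$pid" ]; then
      continue
    fi
    # -G appends the url-encoded data to the URL as a query string;
    # -X PUT keeps the request method the index endpoint expects.
    status=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' \
      -G -X PUT \
      -H "Authorization: Bearer $TOKEN" \
      --data-urlencode "pid=$pid" \
      "$base_url/index")
    printf '%s %s\n' "$status" "$pid"
  done < "$pid_file"
}
```

It would be called as `reindex_pids "https://MYHOST/MYCONTEXT/d1/mn/v2" pidsToReindex.txt`; piping the output through `grep -v '^200 '` lists the pids whose request did not return 200.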
Matthew:
My understanding is that the script only finds newly added objects.
However, an object also needs to be indexed if it is not newly added
but its system metadata has changed. So looking up the modification
date directly in the system metadata table may be the fastest way.
Jing
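An alternative to querying the table directly, sketched here under assumptions: the DataONE listObjects call (the `.../d1/mn/v2/object` endpoint quoted in this thread) accepts a `fromDate` parameter which, per the DataONE API documentation as best understood here, is compared against the system metadata modification date, so it should return changed objects as well as new ones. That behavior is worth verifying against the Metacat version in use. The helper names `list_url` and `extract_pids` are hypothetical, and the grep/sed extraction is a shortcut rather than real XML parsing (it does not decode entities such as `&amp;`).

```sh
# Sketch: build a listObjects URL limited to objects modified at or after a date.
# $1 = base URL up to and including /d1/mn/v2
# $2 = fromDate in ISO 8601 UTC form, e.g. 2024-10-01T00:00:00Z
# $3 = start offset (default 0), $4 = page size (default 1000)
list_url() {
  printf '%s/object?fromDate=%s&start=%s&count=%s\n' \
    "$1" "$2" "${3:-0}" "${4:-1000}"
}

# Sketch: pull the <identifier> values out of an objectList XML response on stdin.
# Uses grep -o, which GNU and BSD grep both provide.
extract_pids() {
  grep -o '<identifier>[^<]*</identifier>' |
    sed -e 's/<identifier>//' -e 's/<\/identifier>//'
}
```

For example, `curl -s -H "Authorization: Bearer $TOKEN" "$(list_url https://MYHOST/MYCONTEXT/d1/mn/v2 2024-10-01T00:00:00Z)" | extract_pids > pidsToReindex.txt` would write one pid per line, ready for the indexing loop.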
…On Tue, Oct 22, 2024 at 9:42 AM Matthew B ***@***.***> wrote:
Additional potentially-useful info, from k8s-cluster-config/MetacatQuickRef.md
<https://github.nceas.ucsb.edu/NCEAS/k8s-cluster-config/blob/04aa852d0c27b22ad233ccddb21363dd737ecdfd/MetacatQuickRef.md?plain=1#L140>:
API call to list objects in reverse order of modification:
https://test.arcticdata.io/metacat/d1/mn/v2/object
API call to see if an object exists on the k8s instance yet:
https://arctic-dev.test.dataone.org/metacat/d1/mn/v2/object/<pid>
To index a list of objects:
$ TOKEN=$( kubectl get secret MYRELEASE-indexer-token \
    -o jsonpath="{.data.DataONEauthToken}" | base64 -d )
## Assuming pidsToReindex.txt contains a list of the identifiers to be indexed...
$ for pid in $(cat pidsToReindex.txt); do \
curl -X PUT -H "Authorization: Bearer $TOKEN" \
"https://MYHOST/MYCONTEXT/d1/mn/v2/index?pid=$pid"; \
done
See #1984
For moving deployments from legacy to k8s, we rsync the data to cephfs, then rsync again just before release, in order to get the datasets that were modified after the previous rsync.
To minimize downtime, we want to index this "delta" of datasets (instead of re-indexing everything), so we need a script that:
1. gets the pid of each dataset modified since a given date/time, and then
2. indexes each pid.
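Since the object listing is returned in pages, a script of this kind would probably need to walk through all of them before indexing. A minimal sketch of the paging arithmetic, assuming the objectList response carries a `total` attribute on its root element (as DataONE objectList responses normally do); the helper names `objectlist_total` and `page_starts` are hypothetical.

```sh
# Sketch: read the total="N" attribute from an objectList response on stdin.
objectlist_total() {
  grep -o 'total="[0-9]*"' | head -n 1 | tr -cd '0-9\n'
}

# Sketch: print the start offsets needed to fetch TOTAL results COUNT at a time.
# $1 = total number of results, $2 = page size.
# The body runs in a subshell so its variables do not leak into the caller.
page_starts() (
  total=$1
  count=$2
  start=0
  while [ "$start" -lt "$total" ]; do
    echo "$start"
    start=$((start + count))
  done
)
```

Each offset printed by `page_starts` would be passed as the `start` parameter of one listing request, and the pids collected from every page appended to the file that is then indexed.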