Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-part backup #109

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Support multi-part backup #109

wants to merge 21 commits into from

Conversation

ushitora-anqou
Copy link
Contributor

No description provided.

@ushitora-anqou ushitora-anqou force-pushed the split-diff branch 2 times, most recently from cca83a9 to 0af1089 Compare February 5, 2025 06:18
@ushitora-anqou ushitora-anqou changed the title Devide diff data into small parts Divide diff data into small parts Feb 10, 2025
@ushitora-anqou ushitora-anqou force-pushed the split-diff branch 7 times, most recently from ff16d2d to 055f549 Compare February 14, 2025 08:02
@ushitora-anqou ushitora-anqou force-pushed the split-diff branch 4 times, most recently from 65a1818 to fd22a0f Compare February 19, 2025 02:01
@ushitora-anqou ushitora-anqou marked this pull request as ready for review February 20, 2025 02:19
@ushitora-anqou ushitora-anqou changed the title Divide diff data into small parts Support multi-part backup Feb 20, 2025
TransferPartSize *resource.Quantity `json:"transferPartSize,omitempty"`

// largestCompletedExportPartNum indicates the largest part number
// that has been successfully exported (i.e., `rbd export-diff`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// that has been successfully exported (i.e., `rbd export-diff`).
// that has been successfully exported by `rbd export-diff`.

LargestCompletedUploadPartNum *int `json:"largestCompletedUploadPartNum,omitempty"`

// largestCompletedImportPartNum indicates the largest part number
// that has been successfully imported (i.e., `rbd import-diff`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// that has been successfully imported (i.e., `rbd import-diff`).
// that has been successfully imported by `rbd import-diff`.

labelComponentExportJob,
MantleExportJobPrefix,
func(job *batchv1.Job, partNum int) error {
if backup.Status.LargestCompletedExportPartNum == nil || *backup.Status.LargestCompletedExportPartNum < partNum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an assertion to verify *backup.Status.LargestCompletedExportPartNum +1 == partNum at the beginning of this block. By this assertion, we can detect the existence of multiple running export jobs for one MantleBackup. It implies serious bugs.

return nil
}

if err := r.patchStatusTransferPartSizeIfEmpty(ctx, targetBackup); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := r.patchStatusTransferPartSizeIfEmpty(ctx, targetBackup); err != nil {
if err := r.addStatusTransferPartSizeIfEmpty(ctx, targetBackup); err != nil {

add is more suitable for in this case because add explains what this function directly. On the other hand, patch doesn't tell what kind patch will be applied.

return pv.Spec.CSI.VolumeAttributes["pool"], pv.Spec.CSI.VolumeAttributes["imageName"], nil
}

func (r *MantleBackupReconciler) getNumberOfParts(backup *mantlev1.MantleBackup) (int64, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about returining int value rathert than int64 value? The return value seems to be casted to int in all cases.

if backup.Status.LargestCompletedExportPartNum != nil {
partNum = *backup.Status.LargestCompletedExportPartNum
}
return partNum+1 >= int(limit), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't expect partNum+1 > int(limit). So how about adding an assertion to meet partNum + 1 == int(limit)

labelComponentUploadJob,
MantleUploadJobPrefix,
func(job *batchv1.Job, partNum int) error {
if backup.Status.LargestCompletedUploadPartNum == nil || *backup.Status.LargestCompletedUploadPartNum < partNum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add an assertion similar to the above comment about handleCompletedExportJobs.

if backup.Status.LargestCompletedExportPartNum != nil {
partNum = *backup.Status.LargestCompletedExportPartNum
}
return partNum+1 >= int(limit), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please return true iff partNum + 1 == int(limit) because partNum + 1 > int(limit) implies serious bugs.

labelComponentUploadJob,
MantleUploadJobPrefix,
func(job *batchv1.Job, partNum int) error {
if backup.Status.LargestCompletedUploadPartNum == nil || *backup.Status.LargestCompletedUploadPartNum < partNum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code might result in regarding all upload jobs has been completed by mistake.

Assuming the following cases.

  1. There is a RBD snapshot S
  2. Mantle divide this into four parts, S0,...,S3.
  3. All export jobs have been finished.
  4. Mantle creates four upload jobs UJ_S0,...,UJ_S3.
  5. Only UJ_S3 has been finished.
  6. Mantle calls handleCompletedUploadJobs(). Then it regards all upload jobs has been finished.

Probably what we need to do is just counting the number of completed jobs and whether the number is equal to the number of created jobs. In other words, status.largestCompletedUploadPartNum is not necessary. WDYT?

return nil
}

if err := r.createOrUpdateUploadJobs(ctx, targetBackup); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createOrUpdateUploadJobs might create multiple jobs. So we get the maximum number of new jobs we can create. We would get this value possibly from canNewUploadJobBeCreted().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants