
feat(volumes): Resize (shrink) EBS volumes #2466

Merged · 25 commits merged into master from feat-disk-resize on Feb 9, 2025
Conversation

adityahase (Member)

EBS volumes cannot be shrunk, only extended. One way to work around this restriction is to copy the data to a new (smaller) volume and throw away the old (larger) one.

This tool automates that procedure.

The maximum possible IO performance on EBS (gp3):

  1. 1000 MB/s throughput
  2. Large (256k) IO size (as long as they're sequential)

fio can read/write at this throughput, with --ioengine=libaio, --bs=256k and --iodepth=8. But cp can barely manage ~250 MB/s.
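For reference, a benchmark along those lines can be driven from Python; only --ioengine=libaio, --bs=256k and --iodepth=8 come from the note above, while the device path, job name and runtime are placeholder assumptions.

```python
import subprocess

def benchmark_sequential_read(device: str = "/dev/nvme1n1", runtime_seconds: int = 30) -> str:
    """Run a sequential-read fio job sized to approach the gp3 throughput ceiling."""
    command = [
        "fio",
        "--name=seq-read",
        f"--filename={device}",
        "--rw=read",              # sequential reads
        "--direct=1",             # bypass the page cache
        "--ioengine=libaio",
        "--bs=256k",              # large IOs, which EBS merges up to 256 KB
        "--iodepth=8",
        f"--runtime={runtime_seconds}",
        "--time_based",
        "--group_reporting",
    ]
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(benchmark_sequential_read())
```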

I'll need to find faster ways. I'm considering:

  1. Parallelizing dd (with skip and count) and copying the partition as-is (see the sketch after this list)
  2. Converting the volumes into a RAID 1 array and letting RAID copy the data online
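A rough sketch of idea 1 under stated assumptions: device paths, chunk size, and worker count are placeholders, and this is not the implementation that was merged. Each dd process copies its own range using skip/seek/count so several copies run concurrently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 256 * 1024   # 256k blocks, matching the size EBS merges sequential IOs to
CHUNK_BLOCKS = 4 * 1024   # blocks handled by one dd process (~1 GiB per chunk)

def copy_chunk(source: str, target: str, chunk_index: int) -> None:
    offset = chunk_index * CHUNK_BLOCKS
    subprocess.run(
        [
            "dd",
            f"if={source}",
            f"of={target}",
            f"bs={BLOCK_SIZE}",
            f"skip={offset}",     # read offset on the source, in blocks
            f"seek={offset}",     # write offset on the target, in blocks
            f"count={CHUNK_BLOCKS}",
            "iflag=direct",       # bypass the page cache on both sides
            "oflag=direct",
            "conv=notrunc",
        ],
        check=True,
    )

def parallel_copy(source: str, target: str, total_blocks: int, workers: int = 8) -> None:
    chunks = range((total_blocks + CHUNK_BLOCKS - 1) // CHUNK_BLOCKS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda i: copy_chunk(source, target, i), chunks))
```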

Based on Virtual Machine Migration and Agent Update
New volume should be ~85% full after copying files
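As a worked example of that sizing rule (the whole-GiB rounding is an assumption, not something stated in the PR):

```python
import math

GIB = 1024 ** 3

def new_volume_size_gib(used_bytes: int, target_fill: float = 0.85) -> int:
    """Smallest whole-GiB volume on which the copied data fills ~85% of capacity."""
    return math.ceil(used_bytes / target_fill / GIB)

# e.g. 300 GiB of used data -> ceil(300 / 0.85) = 353 GiB for the new volume
assert new_volume_size_gib(300 * GIB) == 353
```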
On single-volume machines we assumed operations on volumes[0]. This is no longer true on machines with data volumes.
Since we'll be copying data after taking everything down, we want to copy it as fast as possible. gp3 allows a max throughput of 1000 MB/s.

EBS tries to merge sequential reads/writes into 256KB blocks, so the best-case IOPS scenario is (1000 MB/s) / (256KB) ≈ 4000 IOPS.

We'll allow a 20% buffer for inefficiencies.

Reference: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-io-characteristics.html

TODO: Adjust these numbers based on actual results.
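A worked version of that math (treating the 20% buffer as extra provisioned headroom is my reading of the note; adjust alongside the TODO above):

```python
MAX_THROUGHPUT_MB_S = 1000    # gp3 throughput ceiling
MERGED_IO_SIZE_KB = 256       # EBS merges sequential IOs up to this size

best_case_iops = round(MAX_THROUGHPUT_MB_S * 1000 / MERGED_IO_SIZE_KB)  # ~3906, i.e. ~4000
iops_with_buffer = round(best_case_iops * 1.2)                          # ~4688 with the 20% buffer

print(best_case_iops, iops_with_buffer)
```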
Remove filesystem and volume child tables.

Track the before and after status of the filesystem and volume in specific fields.

This assumes that each volume will have only one filesystem.
Collect UUID of new volume from output of mkfs
Unmount both volumes
Replace UUID of old filesystem with new
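A minimal sketch of the UUID steps, assuming ext4 (so the mkfs output contains a "Filesystem UUID:" line, as mke2fs prints) and that mount entries live in /etc/fstab; paths and device names are placeholders.

```python
import re
import subprocess

def make_filesystem(device: str) -> str:
    """Create the filesystem on the new volume and return its UUID from mkfs output."""
    output = subprocess.run(
        ["mkfs.ext4", device], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"Filesystem UUID:\s*([0-9a-f-]+)", output)
    if match is None:
        raise RuntimeError("mkfs output did not contain a filesystem UUID")
    return match.group(1)

def replace_uuid(old_uuid: str, new_uuid: str, fstab: str = "/etc/fstab") -> None:
    """Point existing mount entries at the new filesystem."""
    with open(fstab) as f:
        contents = f.read()
    with open(fstab, "w") as f:
        f.write(contents.replace(old_uuid, new_uuid))
```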
Reduce performance of the new volume (this should work since we haven't made any modifications after creating the volume)

Delete old volume (We already have a snapshot of this volume)
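A sketch of these wind-down steps with boto3; the region, volume IDs and baseline gp3 numbers (3000 IOPS, 125 MB/s) are assumptions about the intent, not values taken from the PR.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def reduce_performance(volume_id: str) -> None:
    # Drop the temporary copy-time performance back to the gp3 baseline.
    ec2.modify_volume(VolumeId=volume_id, Iops=3000, Throughput=125)

def delete_old_volume(volume_id: str) -> None:
    # Safe to delete because a snapshot of this volume already exists.
    ec2.delete_volume(VolumeId=volume_id)
```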
adityahase merged commit 49967ed into master on Feb 9, 2025. 5 checks passed.
adityahase deleted the feat-disk-resize branch on February 9, 2025 at 07:53.

codecov bot commented Feb 9, 2025

Codecov Report

Attention: Patch coverage is 1.87166% with 367 lines in your changes missing coverage. Please review.

Project coverage is 36.91%. Comparing base (7438acb) to head (21b4ed1).
Report is 26 commits behind head on master.

Files with missing lines | Patch % | Lines
...doctype/virtual_disk_resize/virtual_disk_resize.py | 0.00% | 330 Missing ⚠️
...s/press/doctype/virtual_machine/virtual_machine.py | 15.62% | 27 Missing ⚠️
press/press/doctype/server/server.py | 25.00% | 6 Missing ⚠️
...g_recommendation/aws_rightsizing_recommendation.py | 0.00% | 4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (1.87%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2466      +/-   ##
==========================================
- Coverage   37.32%   36.91%   -0.41%     
==========================================
  Files         402      403       +1     
  Lines       31665    32029     +364     
==========================================
+ Hits        11818    11823       +5     
- Misses      19847    20206     +359     

