Add a small files compactor for Minio #577

Open
MrCreosote opened this issue Apr 10, 2022 · 1 comment

@MrCreosote
Member

MrCreosote commented Apr 10, 2022

The workspace can currently store all object data in Minio, which stores each file individually, even when the file is very small (this is not the case for GridFS).

Add a single-process, single-threaded, standalone file compactor that periodically:

  • Scans for files under some size (say 50KB)
  • When it finds enough small files to fill a compacted file up to some maximum size (say 1MB), or simply enough files (say 100), it will:
      • Create a checkpoint in a dedicated Mongo collection that records the files to be compacted, their order, their sizes, and the target filename
      • Compact the files into a single file in Minio
      • Update the checkpoint state with the new file information
      • Update the records in the workspace S3 collection that point to the old, non-compacted files so they point to the new compacted file, and add their offsets
      • Update the checkpoint state
      • Delete the old, non-compacted files
      • Delete the checkpoint
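A minimal sketch of the steps above, assuming plain Python dicts stand in for the Minio bucket, the workspace S3 records collection, and the checkpoint collection. The function name `compact`, the record fields `file` and `offset`, the state names, and storing an MD5 as the "new file information" are all hypothetical. In a real compactor each step would be a separate Minio or MongoDB write, which is why the checkpoint state is advanced between them.

```python
import hashlib


def compact(minio, records, checkpoints, cp_id, batch, target):
    """Compact the small files in `batch` into one object named `target`.

    Hypothetical stand-ins:
      minio:       object name -> bytes          (the Minio bucket)
      records:     object ID -> record dict      (workspace S3 collection)
      checkpoints: checkpoint ID -> dict         (checkpoint collection)
      batch:       [(object_id, file_name), ...] in compaction order; this
                   sketch assumes each small file has exactly one record.
    """
    # 1. Checkpoint: the files, their order, their sizes, and the target name.
    cp = {
        "files": [(oid, name, len(minio[name])) for oid, name in batch],
        "target": target,
        "state": "writing",
    }
    checkpoints[cp_id] = cp
    # 2. Concatenate the files, in checkpoint order, into a single object.
    data = b"".join(minio[name] for _, name, _ in cp["files"])
    minio[target] = data
    # 3. Record information about the new file (hypothetical fields).
    cp.update(state="written", target_size=len(data),
              target_md5=hashlib.md5(data).hexdigest())
    # 4. Repoint each record at the compacted file and add its offset.
    offset = 0
    for oid, _, size in cp["files"]:
        records[oid].update(file=target, offset=offset)
        offset += size
    # 5. Advance the checkpoint state again.
    cp["state"] = "records_updated"
    # 6. Delete the old, non-compacted files, then 7. the checkpoint.
    for _, name, _ in cp["files"]:
        del minio[name]
    del checkpoints[cp_id]
    return data
```

The ordering is the important part: records are repointed only after the compacted file is fully written, and the old files are deleted only after every record has moved, so in this sketch every object stays readable if the process stops between steps.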

If a compaction produces a file smaller than the 1MB maximum, that compacted file should be the first file in the next compaction.
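One way the batch selection and this carryover rule could fit together, as a sketch: `next_batch`, its arguments, and the constant names are hypothetical, and the thresholds are the "say" values from the list above. An undersized compacted file from the previous run is passed in as `carryover` and goes first; the batch closes when the next small file would push it past the maximum size, or when it reaches the file limit.

```python
SMALL_FILE_LIMIT = 50 * 1024      # "say 50KB"
MAX_COMPACTED_SIZE = 1024 * 1024  # "say 1MB"
MAX_FILES = 100                   # "say 100"


def next_batch(scanned, carryover=None):
    """Pick the next batch to compact, or return None if it isn't time yet.

    scanned:   [(file_name, size), ...] found by the scan, in scan order
    carryover: (file_name, size) of an undersized compacted file from the
               previous run; it is placed first in the batch
    """
    batch, total = [], 0
    if carryover is not None:
        batch, total = [carryover], carryover[1]
    for name, size in scanned:
        if size >= SMALL_FILE_LIMIT:
            continue  # not a small file; leave it uncompacted
        if total + size > MAX_COMPACTED_SIZE:
            break  # the batch is full by size
        batch.append((name, size))
        total += size
        if len(batch) >= MAX_FILES:
            break  # the batch is full by file count
    else:
        # The scan ran out before either limit was reached: wait for more files.
        return None
    # A batch holding only the carryover would be a no-op compaction.
    return batch if len(batch) > 1 else None
```

If the carryover is already too full to take the next small file, this returns `None`; the caller would then presumably stop carrying that file over.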

The workspace will need to be updated to take the file offsets into account before the compactor is ever run.
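Taking offsets into account means reading an object out of a compacted file with a ranged GET. A sketch, assuming hypothetical record fields `file`, `offset`, and `size`, with `read_range` standing in for the real client call (for example, boto3's `get_object` accepts a `Range` argument such as `"bytes=5-6"`). Records without an offset are read whole, so compacted and uncompacted objects can coexist while the compactor works through the backlog.

```python
def range_header(offset, length):
    """HTTP Range header value covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("need offset >= 0 and length > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def read_object(record, read_range):
    """Return one object's bytes using its (hypothetical) S3 record fields.

    read_range(file_name, range_or_None) is a hypothetical callable that
    wraps the actual Minio/S3 GET; None means "read the whole file".
    """
    if "offset" not in record:
        return read_range(record["file"], None)  # not yet compacted
    return read_range(record["file"], range_header(record["offset"], record["size"]))


def memory_reader(store):
    """In-memory stand-in for a ranged GET, for trying out read_object."""
    def read_range(name, byte_range):
        data = store[name]
        if byte_range is None:
            return data
        start, end = byte_range[len("bytes="):].split("-")
        return data[int(start):int(end) + 1]
    return read_range
```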

This makes file deletion more complicated, since multiple objects will depend on the same file, but deletion isn't supported yet anyway.

If the compactor starts and finds a checkpoint:

  • If the compacted file has not been completely written, delete the file and the checkpoint.
  • Otherwise, resume the compaction from the point recorded in the checkpoint state.
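A sketch of that startup check, assuming dicts stand in for Minio and the two Mongo collections, and that checkpoints use hypothetical states `writing`, `written`, and `records_updated`. A checkpoint still in `writing` is rolled back; that is safe even if the file happens to be complete, because no record points at it yet. Later states are resumed, and each resumed step is idempotent, so a second crash during recovery does no harm.

```python
def recover(minio, records, checkpoints):
    """Roll back or finish interrupted compactions found at startup.

    Hypothetical stand-ins:
      minio:       object name -> bytes
      records:     object ID -> record dict (workspace S3 collection)
      checkpoints: checkpoint ID -> {"files": [(object_id, name, size), ...],
                                     "target": name, "state": str}
    Returns {checkpoint ID: action taken}.
    """
    actions = {}
    for cp_id, cp in list(checkpoints.items()):  # copy: we delete while looping
        if cp["state"] == "writing":
            # The compacted file may be partial and nothing points at it yet,
            # so delete it (if it exists) along with the checkpoint.
            minio.pop(cp["target"], None)
            del checkpoints[cp_id]
            actions[cp_id] = "rolled back"
            continue
        if cp["state"] == "written":
            # Offsets are recomputed from the checkpoint, so redoing
            # partially applied record updates is harmless.
            offset = 0
            for oid, _, size in cp["files"]:
                records[oid].update(file=cp["target"], offset=offset)
                offset += size
            cp["state"] = "records_updated"
        # All records now point at the compacted file: finish the cleanup.
        for _, name, _ in cp["files"]:
            minio.pop(name, None)
        del checkpoints[cp_id]
        actions[cp_id] = "resumed"
    return actions
```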

Open question: how do we want to monitor the compactor?

@MrCreosote
Member Author

One gotcha: would this prevent Minio-side JSON parsing?
