Incrementally deallocate temp file during reading using fallocate #227

Open
Mortal opened this issue Aug 4, 2017 · 0 comments

Mortal commented Aug 4, 2017

On Linux, fallocate can "punch a hole" in a file, that is, deallocate part of a large file while leaving the rest intact. Could TPIE use this mechanism to deallocate temp files incrementally as they are read (when the user does not need to read them again, e.g. in sorting) and thus save temp space?

The man page for the fallocate(1) user command says this:

-p, --punch-hole

Deallocates space (i.e., creates a hole) in the byte range starting at offset and continuing for length bytes. Within the specified range, partial filesystem blocks are zeroed, and whole filesystem blocks are removed from the file. After a successful call, subsequent reads from this range will return zeroes. This option may not be specified at the same time as the --zero-range option. Also, when using this option, --keep-size is implied.

Supported for XFS (since Linux 2.6.38), ext4 (since Linux 3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).
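
To make the proposal concrete, here is a minimal sketch of reading a file in chunks and punching out each chunk once it has been consumed. It is not TPIE code: the names `punch_hole` and `consume` are hypothetical, it assumes 64-bit Linux, and it calls the C library's fallocate through ctypes because Python's os module only wraps posix_fallocate. Deallocation is treated as best-effort, so on a filesystem without hole punching the read still works and simply frees nothing.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Look up fallocate(2) in the already-loaded C library (None on non-Linux).
_libc = ctypes.CDLL(None, use_errno=True)
_fallocate = getattr(_libc, "fallocate", None)
if _fallocate is not None:
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    _fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_int64, ctypes.c_int64]
    _fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Try to deallocate [offset, offset + length); return True on success.

    Failure (e.g. EOPNOTSUPP) is reported as False rather than raised,
    because freeing space here is only an optimization.
    """
    if _fallocate is None or length <= 0:
        return False
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return _fallocate(fd, mode, offset, length) == 0


def consume(path, chunk_size=1 << 20):
    """Yield successive chunks of `path`, deallocating each after use.

    The hole is punched only when the caller asks for the next chunk,
    i.e. after it has finished with the previous one.
    """
    fd = os.open(path, os.O_RDWR)  # punching requires write access
    try:
        offset = 0
        while True:
            data = os.pread(fd, chunk_size, offset)
            if not data:
                break
            yield data
            punch_hole(fd, offset, len(data))
            offset += len(data)
    finally:
        os.close(fd)
```

A caller would iterate with `for chunk in consume(path): ...`. Choosing `chunk_size` as a multiple of the filesystem block size matters: per the man page text above, only whole blocks are removed, while partial blocks are merely zeroed. The file keeps its apparent size throughout, so it still has to be unlinked at the end.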

It sounds like the Thrill framework does something similar, deallocating input storage while it is being consumed (cs/1608.05634v1, page 7):

In Thrill we took pipelining of data processing one step further by enabling consumption of source DIA storage while pushing data to the next operation. DIA operations transform huge data sets, but a naive implementation would read all items from one DIA, push them all into the pipeline for processing, and then deallocate the data storage. Assuming the next operation also stores all items, this requires twice the amount of storage. However, with consume enabled, the preceding DIA operation’s storage is deallocated while processing the items, hence the storage for all items is needed only once, plus a small overlapping buffer.
