PagedReader performance #19
Do you have some performance numbers? I think in Velociraptor we use 8k blocks (https://github.com/Velocidex/velociraptor/blob/0cffe0e9c9ef55db2866fe711511e17c1f0a624a/vql/windows/filesystems/ntfs_windows.go#L225), which may still be too small. The main purpose of PagedReader is to cache the data so the generated parsers can be used efficiently. The parsers all use ReaderAt(), which does a seek and read of very small 1-8 byte ranges, and the paged reader stops those tiny reads from going to disk. NTFS has a block size of 0x400, which is where the 1024 comes from. Each MFT entry has to be read into memory, unfixed, and then parsed out into records. Reading more at a time might help, although it depends on the access pattern, and it is a memory tradeoff. Typically I would expect the blocks to be in the file cache anyway, so 8kb reads should only bear the syscall overhead, but depending on the OS that overhead might be a lot.
Using this example (https://gist.github.com/cugu/21ee9d1472ca0e2e74e78a3004754a97) I get the following result:
Anyway, the example uses the 10MB test.ntfs.dd, which is not really realistic, but real-world examples point in the same direction.
So it looks like the optimal size is 16kb?
Only for the 10MB test image. Larger images and files perform even better at larger sizes. I'm currently using 1MB.
I did some testing with extraction of files from NTFS and found that PagedReader can be a bottleneck if configured incorrectly. A page size of 1024 slows down copies of large files. I'm now using

parser.NewPagedReader(r, 1024*1024, 100*1024*1024)

which yielded quite good results. I've not thoroughly tested the implications on memory, though. Is there any reason for 1024 being used basically everywhere (https://grep.app/search?q=NewPagedReader)?
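One detail worth keeping in mind when picking a larger page size: the comment above notes that NTFS MFT records are 0x400 bytes, so a page size that is a multiple of 0x400 keeps each (1024-aligned) record inside a single page. A tiny hypothetical helper sketching that constraint (the function name and chosen sizes are illustrative, not part of the library):

```go
package main

import "fmt"

// ntfsRecordSize is the 0x400-byte MFT record size discussed above.
const ntfsRecordSize = 0x400

// roundToRecordMultiple is a hypothetical helper that rounds a requested page
// size up to the nearest multiple of the NTFS record size, so a 1024-aligned
// MFT record never straddles a page boundary.
func roundToRecordMultiple(pageSize int64) int64 {
	if pageSize < ntfsRecordSize {
		return ntfsRecordSize
	}
	return ((pageSize + ntfsRecordSize - 1) / ntfsRecordSize) * ntfsRecordSize
}

func main() {
	for _, size := range []int64{512, 8 * 1024, 100000, 1024 * 1024} {
		fmt.Println(size, "->", roundToRecordMultiple(size))
	}
}
```

Both sizes discussed in this thread (16kb and 1MB) are already exact multiples of 0x400, so they satisfy this constraint as-is.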