PagedReader performance #19

Open
cugu opened this issue May 15, 2020 · 5 comments

@cugu
Contributor

cugu commented May 15, 2020

I did some testing with extracting files from NTFS. I found that PagedReader can be a bottleneck if configured incorrectly: a page size of 1024 slows down copies of large files.

I'm now using parser.NewPagedReader(r, 1024*1024, 100*1024*1024), which yielded quite good results. I have not thoroughly tested the memory implications, though.
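For reference, a minimal sketch of how that call fits into a small program. The import path, the image file name, and the error return of NewPagedReader are assumptions on my part, so check the parser package for the exact signature:

```go
package main

import (
	"fmt"
	"os"

	// Assumed import path for the go-ntfs parser package.
	"www.velocidex.com/golang/go-ntfs/parser"
)

func main() {
	// Any raw NTFS image works here; the name is just an example.
	r, err := os.Open("image.ntfs.dd")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// 1MB pages and a much larger cache instead of the usual 1024-byte pages.
	// Assumes NewPagedReader(reader, pagesize, cache_size) returns (reader, error).
	pagedReader, err := parser.NewPagedReader(r, 1024*1024, 100*1024*1024)
	if err != nil {
		panic(err)
	}

	fmt.Printf("paged reader ready: %T\n", pagedReader)
}
```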

Is there any reason for 1024 being used basically everywhere (https://grep.app/search?q=NewPagedReader)?

@scudette
Contributor

Do you have some performance numbers? I think in Velociraptor we use 8k blocks (https://github.com/Velocidex/velociraptor/blob/0cffe0e9c9ef55db2866fe711511e17c1f0a624a/vql/windows/filesystems/ntfs_windows.go#L225), which may still be too small.

The main purpose of PagedReader is to cache the underlying data so the generated parsers can be used efficiently. The parsers all go through ReadAt(), which does a seek and a very small read of 1-8 bytes, and the paged reader stops those tiny reads from going to disk. NTFS has a block size of 0x400, so that is where the 1024 comes from. Each MFT entry has to be read into memory, unfixed, and then parsed out into records.

Reading more per page might help, although it depends on the access pattern and it is a memory tradeoff. Typically I would expect the blocks to be in the OS file cache anyway, so, say, 8kb reads should only bear the syscall overhead, but depending on the OS that can still be a lot.
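To illustrate why the paging helps, here is a simplified sketch of the page-caching idea behind an io.ReaderAt wrapper (this is the concept only, not go-ntfs's actual implementation): whole pages are read once and kept in memory, so the parsers' 1-8 byte ReadAt calls are served from RAM instead of each hitting the disk.

```go
package pagedcache

import "io"

// pagedReader caches fixed-size pages of the backing reader so that many
// small ReadAt calls translate into few large reads of the underlying file.
// A real implementation would bound the cache, e.g. with an LRU.
type pagedReader struct {
	backing  io.ReaderAt
	pageSize int64
	cache    map[int64][]byte // page number -> page contents
}

func newPagedReader(backing io.ReaderAt, pageSize int64) *pagedReader {
	return &pagedReader{
		backing:  backing,
		pageSize: pageSize,
		cache:    map[int64][]byte{},
	}
}

func (r *pagedReader) ReadAt(buf []byte, off int64) (int, error) {
	total := 0
	for total < len(buf) {
		pageNo := (off + int64(total)) / r.pageSize
		pageOff := (off + int64(total)) % r.pageSize

		page, ok := r.cache[pageNo]
		if !ok {
			// Cache miss: read one whole page from the backing reader.
			page = make([]byte, r.pageSize)
			n, err := r.backing.ReadAt(page, pageNo*r.pageSize)
			if err != nil && err != io.EOF {
				return total, err
			}
			page = page[:n]
			r.cache[pageNo] = page
		}

		if pageOff >= int64(len(page)) {
			return total, io.EOF
		}
		total += copy(buf[total:], page[pageOff:])
	}
	return total, nil
}
```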

@cugu
Contributor Author

cugu commented May 15, 2020

Using this example: https://gist.github.com/cugu/21ee9d1472ca0e2e74e78a3004754a97

I get the following results:

| page size | cache size | buffer size | time (ns) | comment |
|-----------|------------|-------------|-----------|---------|
| 16384 | 1024*102400 | 4096 | 1090497 | |
| 16384 | 10000 | 4096 | 1097598 | |
| 16384 | 1024*102400 | 1024*1024 | 1337468 | |
| 16384 | 10000 | 1024*1024 | 1368212 | |
| 1024*1024 | 1024*102400 | 4096 | 1606172 | |
| 4096 | 10000 | 4096 | 1664303 | |
| 1024*1024 | 10000 | 1024*1024 | 1675364 | |
| 4096 | 1024*102400 | 4096 | 1694694 | |
| 1024*1024 | 10000 | 4096 | 1718635 | |
| 1024*1024 | 1024*102400 | 1024*1024 | 1750435 | |
| 4096 | 1024*102400 | 1024*1024 | 1920238 | |
| 4096 | 10000 | 1024*1024 | 1966033 | |
| 1024 | 1024*102400 | 4096 | 3928892 | |
| 1024 | 10000 | 1024*1024 | 4107346 | cat.go default |
| 1024 | 1024*102400 | 1024*1024 | 4171014 | |
| 1024 | 10000 | 4096 | 4725464 | |
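The gist isn't reproduced above, but a rough sketch of the same kind of measurement is to stream the whole image through readers built with different page and cache sizes and time it. The file name, the timing loop, and the assumptions that the paged reader implements io.ReaderAt and that NewPagedReader returns an error are mine, not the gist's:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"

	// Assumed import path for the go-ntfs parser package.
	"www.velocidex.com/golang/go-ntfs/parser"
)

// timeRead reads size bytes out of any io.ReaderAt in chunks of bufSize
// bytes and reports how long it took.
func timeRead(r io.ReaderAt, size int64, bufSize int) (time.Duration, error) {
	buf := make([]byte, bufSize)
	start := time.Now()
	var off int64
	for off < size {
		n, err := r.ReadAt(buf, off)
		off += int64(n)
		if err == io.EOF {
			break
		}
		if err != nil {
			return time.Since(start), err
		}
	}
	return time.Since(start), nil
}

func main() {
	f, err := os.Open("test.ntfs.dd") // example image name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	st, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Default-style reader: 1024-byte pages, small cache.
	small, err := parser.NewPagedReader(f, 1024, 10000)
	if err != nil {
		panic(err)
	}

	// Larger pages and cache, as suggested above.
	large, err := parser.NewPagedReader(f, 1024*1024, 100*1024*1024)
	if err != nil {
		panic(err)
	}

	cases := []struct {
		name string
		r    io.ReaderAt
	}{
		{"1024-byte pages", small},
		{"1MB pages", large},
	}
	for _, tc := range cases {
		d, err := timeRead(tc.r, st.Size(), 4096)
		fmt.Printf("%-18s %v (err=%v)\n", tc.name, d, err)
	}
}
```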

@cugu
Contributor Author

cugu commented May 15, 2020

Anyway, the example uses the 10MB test.ntfs.dd, which is not really realistic. But real-world examples point in the same direction.

@scudette
Contributor

So it looks like the optimal size is 16kb?

@cugu
Contributor Author

cugu commented May 16, 2020

Just for the 10MB test image. Larger images and files perform even better at larger page sizes. I'm currently using 1MB.

cugu changed the title from "PagedReader performace" to "PagedReader performance" on May 16, 2020