PagedReader performance #19

Open
cugu opened this issue May 15, 2020 · 5 comments

@cugu
Contributor

cugu commented May 15, 2020

I did some testing with extracting files from NTFS. I found that PagedReader can be a bottleneck if configured incorrectly: a page size of 1024 slows down copies of large files.

I'm now using parser.NewPagedReader(r, 1024*1024, 100*1024*1024), which yielded quite good results. I have not thoroughly tested the memory implications, though.
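For reference, a minimal sketch of how that call fits into a small program. The import path, the image file name, and the error return of NewPagedReader are assumptions on my part, so check the parser package for the exact signature:

```go
package main

import (
	"fmt"
	"os"

	// Assumed import path for the go-ntfs parser package.
	"www.velocidex.com/golang/go-ntfs/parser"
)

func main() {
	// Any raw NTFS image works here; the name is just an example.
	r, err := os.Open("image.ntfs.dd")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// 1MB pages and a much larger cache instead of the usual 1024-byte pages.
	// Assumes NewPagedReader(reader, pagesize, cache_size) returns (reader, error).
	pagedReader, err := parser.NewPagedReader(r, 1024*1024, 100*1024*1024)
	if err != nil {
		panic(err)
	}

	fmt.Printf("paged reader ready: %T\n", pagedReader)
}
```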

Is there any reason for 1024 being used basically everywhere (https://grep.app/search?q=NewPagedReader)?

@scudette
Contributor

Do you have some performance numbers? I think in Velociraptor we use 8k blocks (https://github.com/Velocidex/velociraptor/blob/0cffe0e9c9ef55db2866fe711511e17c1f0a624a/vql/windows/filesystems/ntfs_windows.go#L225), which may still be too small.

The main purpose of PagedReader is to cache the underlying data so the generated parsers can be used efficiently. The parsers all go through ReadAt(), which does a seek and a very small read of 1-8 bytes, and the paged reader stops those tiny reads from going to disk. NTFS has a block size of 0x400, so that is where the 1024 comes from. Each MFT entry has to be read into memory, unfixed, and then parsed out into records.

Reading more per page might help, although it depends on the access pattern and it is a memory tradeoff. Typically I would expect the blocks to be in the OS file cache anyway, so, say, 8kb reads should only bear the syscall overhead, but depending on the OS that can still be a lot.
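To illustrate why the paging helps, here is a simplified sketch of the page-caching idea behind an io.ReaderAt wrapper (this is the concept only, not go-ntfs's actual implementation): whole pages are read once and kept in memory, so the parsers' 1-8 byte ReadAt calls are served from RAM instead of each hitting the disk.

```go
package pagedcache

import "io"

// pagedReader caches fixed-size pages of the backing reader so that many
// small ReadAt calls translate into few large reads of the underlying file.
// A real implementation would bound the cache, e.g. with an LRU.
type pagedReader struct {
	backing  io.ReaderAt
	pageSize int64
	cache    map[int64][]byte // page number -> page contents
}

func newPagedReader(backing io.ReaderAt, pageSize int64) *pagedReader {
	return &pagedReader{
		backing:  backing,
		pageSize: pageSize,
		cache:    map[int64][]byte{},
	}
}

func (r *pagedReader) ReadAt(buf []byte, off int64) (int, error) {
	total := 0
	for total < len(buf) {
		pageNo := (off + int64(total)) / r.pageSize
		pageOff := (off + int64(total)) % r.pageSize

		page, ok := r.cache[pageNo]
		if !ok {
			// Cache miss: read one whole page from the backing reader.
			page = make([]byte, r.pageSize)
			n, err := r.backing.ReadAt(page, pageNo*r.pageSize)
			if err != nil && err != io.EOF {
				return total, err
			}
			page = page[:n]
			r.cache[pageNo] = page
		}

		if pageOff >= int64(len(page)) {
			return total, io.EOF
		}
		total += copy(buf[total:], page[pageOff:])
	}
	return total, nil
}
```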

@cugu
Contributor Author

cugu commented May 15, 2020

Using this example: https://gist.github.com/cugu/21ee9d1472ca0e2e74e78a3004754a97

I get the following results:

| page size | cache size | buffer size | time (ns) | comment |
|-----------|------------|-------------|-----------|---------|
| 16384 | 1024*102400 | 4096 | 1090497 | |
| 16384 | 10000 | 4096 | 1097598 | |
| 16384 | 1024*102400 | 1024*1024 | 1337468 | |
| 16384 | 10000 | 1024*1024 | 1368212 | |
| 1024*1024 | 1024*102400 | 4096 | 1606172 | |
| 4096 | 10000 | 4096 | 1664303 | |
| 1024*1024 | 10000 | 1024*1024 | 1675364 | |
| 4096 | 1024*102400 | 4096 | 1694694 | |
| 1024*1024 | 10000 | 4096 | 1718635 | |
| 1024*1024 | 1024*102400 | 1024*1024 | 1750435 | |
| 4096 | 1024*102400 | 1024*1024 | 1920238 | |
| 4096 | 10000 | 1024*1024 | 1966033 | |
| 1024 | 1024*102400 | 4096 | 3928892 | |
| 1024 | 10000 | 1024*1024 | 4107346 | cat.go default |
| 1024 | 1024*102400 | 1024*1024 | 4171014 | |
| 1024 | 10000 | 4096 | 4725464 | |
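The gist isn't reproduced above, but a rough sketch of the same kind of measurement is to stream the whole image through readers built with different page and cache sizes and time it. The file name, the timing loop, and the assumptions that the paged reader implements io.ReaderAt and that NewPagedReader returns an error are mine, not the gist's:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"

	// Assumed import path for the go-ntfs parser package.
	"www.velocidex.com/golang/go-ntfs/parser"
)

// timeRead reads size bytes out of any io.ReaderAt in chunks of bufSize
// bytes and reports how long it took.
func timeRead(r io.ReaderAt, size int64, bufSize int) (time.Duration, error) {
	buf := make([]byte, bufSize)
	start := time.Now()
	var off int64
	for off < size {
		n, err := r.ReadAt(buf, off)
		off += int64(n)
		if err == io.EOF {
			break
		}
		if err != nil {
			return time.Since(start), err
		}
	}
	return time.Since(start), nil
}

func main() {
	f, err := os.Open("test.ntfs.dd") // example image name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	st, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Default-style reader: 1024-byte pages, small cache.
	small, err := parser.NewPagedReader(f, 1024, 10000)
	if err != nil {
		panic(err)
	}

	// Larger pages and cache, as suggested above.
	large, err := parser.NewPagedReader(f, 1024*1024, 100*1024*1024)
	if err != nil {
		panic(err)
	}

	cases := []struct {
		name string
		r    io.ReaderAt
	}{
		{"1024-byte pages", small},
		{"1MB pages", large},
	}
	for _, tc := range cases {
		d, err := timeRead(tc.r, st.Size(), 4096)
		fmt.Printf("%-18s %v (err=%v)\n", tc.name, d, err)
	}
}
```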

@cugu
Contributor Author

cugu commented May 15, 2020

Anyway, the example uses the 10MB test.ntfs.dd, which is not really realistic. But real-world examples point in the same direction.

@scudette
Contributor

So it looks like the optimal size is 16kb?

@cugu
Contributor Author

cugu commented May 16, 2020

Just for the 10MB test image. Larger images and files perform even better at larger page sizes. I'm currently using 1MB.

cugu changed the title from "PagedReader performace" to "PagedReader performance" on May 16, 2020