The running time of merge sort is determined by the number of runs produced during run generation and by the merge fanout d. If the number of runs is in the range (d^(k-1), d^k], then the merge phase requires k-1 scans of the data (the final merge of at most d runs streams directly to the consumer), and sorting requires k scans in total. Usually d is roughly 250 and k is less than 4.
In particular, sorting d+1=251 runs has the same number of scans as sorting d²=62500 runs!
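To make the arithmetic concrete, here is a tiny sketch (not TPIE code; it assumes run generation counts as one scan and that the final merge streams to the consumer rather than being written back):

```cpp
#include <cstdint>
#include <iostream>

// Total scans to sort `runs` runs with merge fanout d: one scan for run
// generation plus one scan per intermediate merge pass; the final merge of
// at most d runs is assumed to stream straight to the consumer.
std::uint64_t sort_scans(std::uint64_t runs, std::uint64_t d) {
    std::uint64_t scans = 1;        // run generation
    while (runs > d) {              // each intermediate merge pass scans the data once
        runs = (runs + d - 1) / d;  // a fanout-d pass divides the run count by d
        ++scans;
    }
    return scans;
}

int main() {
    std::cout << sort_scans(251, 250) << '\n';   // 2
    std::cout << sort_scans(62500, 250) << '\n'; // 2 -- same as 251 runs
    std::cout << sort_scans(62501, 250) << '\n'; // 3 -- one more run, one more scan
}
```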
Thus, in some cases we would like to squeeze a little extra out of run generation so that it produces fewer runs, whereas at other times (N/M = 60000, say) the number of scans is unlikely to change even if we can decrease the total number of runs quite a bit.
We might also be interested in tuning the block size so that the fanout is just large enough to decrease k by one. Currently d is capped at 250 in TPIE for more or less unspecified performance reasons. Surely saving an entire scan outweighs a slight decrease in per-block I/O performance.
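As a back-of-the-envelope helper (hypothetical, not part of TPIE), the fanout that is "just large enough" can be computed directly: with p merge passes we need d^p >= runs, so the smallest fanout that saves a pass is ceil(runs^(1/(p-1))):

```cpp
#include <cmath>
#include <cstdint>
#include <iostream>

// Number of merge passes (including the final one) with fanout d.
std::uint64_t merge_passes(std::uint64_t runs, std::uint64_t d) {
    std::uint64_t p = 0;
    while (runs > 1) { runs = (runs + d - 1) / d; ++p; }
    return p;
}

// Smallest fanout that sorts `runs` runs in one fewer merge pass than fanout d does.
std::uint64_t fanout_to_save_a_pass(std::uint64_t runs, std::uint64_t d) {
    std::uint64_t passes = merge_passes(runs, d);
    if (passes <= 1) return d;  // already at most one merge pass; nothing to save
    std::uint64_t target = passes - 1;
    return static_cast<std::uint64_t>(
        std::ceil(std::pow(static_cast<double>(runs), 1.0 / static_cast<double>(target))));
}

int main() {
    // 62,501 runs take 3 merge passes at d = 250; a fanout of 251 gets back to 2.
    std::cout << fanout_to_save_a_pass(62501, 250) << '\n'; // 251
}
```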
Are any of these ideas useful (increasing d, increasing run length)? Can they be implemented without impacting performance for the case where these efforts are doomed?
For more research on increasing the run length past the memory size, see Michael A. Bender, Samuel McCauley, Andrew McGregor, Shikha Singh and Hoa T. Vu: "Run generation revisited: what comes up, may or may not come down" (presented by Shikha at MASSIVE '15).
I think the main issue with putting d > 250 is that we run out of fds on Linux if we do several parallel merges. However, since this is only an issue for pipe nodes, we could just treat fds like a resource and allocate them.
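Something like the following is what I have in mind by treating fds as a resource; the class and its names are made up for illustration, not existing TPIE API:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical fd budget, analogous in spirit to a memory manager: parallel
// merges reserve descriptors before opening their streams and give them back
// afterwards, so we never exceed the process limit.
class fd_budget {
public:
    explicit fd_budget(std::size_t limit) : m_available(limit) {}

    // Block until `count` descriptors can be reserved.
    void acquire(std::size_t count) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [&] { return m_available >= count; });
        m_available -= count;
    }

    // Return `count` descriptors to the pool and wake up waiting merges.
    void release(std::size_t count) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_available += count;
        }
        m_cond.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::size_t m_available;
};
```

A d-way merge would then acquire roughly d + 1 descriptors (its input streams plus the output stream) up front and release them when the merge finishes.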
I've read most of the paper now, and using up replacement selection seems interesting, since the runs generated have expected length 2M on random input. However, it requires a buffer from which we can fetch the smallest element greater than or equal to the last item fetched. I can't think of a better way than using a red-black search tree, but its per-node pointer overhead eats into the memory that would otherwise hold items, so for small items the expected run length would be a lot smaller than the runs we're currently generating in TPIE. I'll try to see if I can come up with something better.
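Here is a minimal sketch of what I mean, using std::multiset (a red-black tree) as the buffer; item_count stands in for however many items fit in memory, and none of this is existing TPIE code:

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Replacement selection with a red-black tree buffer. On random input the
// expected run length is about 2 * item_count, but the tree's per-node
// overhead means item_count is much smaller than what the current run
// formation can hold when the items themselves are small.
template <typename T>
std::vector<std::vector<T>> replacement_selection(const std::vector<T>& input,
                                                  std::size_t item_count) {
    std::multiset<T> buffer;
    std::vector<std::vector<T>> runs;
    std::size_t next = 0;

    while (next < input.size() && buffer.size() < item_count)
        buffer.insert(input[next++]);

    while (!buffer.empty()) {
        std::vector<T> run;
        auto it = buffer.begin();          // each run starts at the overall minimum
        while (it != buffer.end()) {
            T out = *it;
            buffer.erase(it);
            run.push_back(out);
            if (next < input.size()) buffer.insert(input[next++]);
            it = buffer.lower_bound(out);  // smallest buffered element >= last output
        }
        runs.push_back(std::move(run));    // smaller leftovers begin the next run
    }
    return runs;
}
```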
The paper also proposes up-down replacement selection, but those algorithms rely on an extra-large buffer or extra visibility into the input.