Optimize cross reference object offset validation by avoiding nested loop #935
@@ -49,7 +49,7 @@ internal CrossReferenceTable(CrossReferenceType type, IReadOnlyDictionary<Indire
             Trailer = trailer ?? throw new ArgumentNullException(nameof(trailer));
             CrossReferenceOffsets = crossReferenceOffsets ?? throw new ArgumentNullException(nameof(crossReferenceOffsets));
 
-            var result = new Dictionary<IndirectReference, long>();
+            var result = new Dictionary<IndirectReference, long>(capacity: objectOffsets.Count);
Review comment: Fly-by optimization since we're just copying the dictionary.
             foreach (var objectOffset in objectOffsets)
             {
                 result[objectOffset.Key] = objectOffset.Value;
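The pre-sizing idea in this hunk can be sketched in isolation. This is a minimal illustration, not PdfPig code: `DictionaryCopySketch` and `CopyWithCapacity` are hypothetical names, and plain `string` keys stand in for `IndirectReference`.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of PdfPig.
public static class DictionaryCopySketch
{
    // Copies a read-only dictionary into a mutable one. Passing the source
    // count as the initial capacity lets the copy proceed without the
    // dictionary having to grow and rehash along the way.
    public static Dictionary<string, long> CopyWithCapacity(
        IReadOnlyDictionary<string, long> source)
    {
        var result = new Dictionary<string, long>(capacity: source.Count);

        foreach (var pair in source)
        {
            result[pair.Key] = pair.Value;
        }

        return result;
    }
}
```

The benefit is likely to be small for short tables, which fits the "fly-by" description in the comment above.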
@@ -24,11 +24,12 @@ public static bool ValidateCrossReferenceOffsets(IInputBytes bytes, CrossReferen
                 return true;
             }
 
-            var builderOffsets = new Dictionary<IndirectReference, long>();
-
             var bruteForceOffsets = BruteForceSearcher.GetObjectLocations(bytes);
             if (bruteForceOffsets.Count > 0)
             {
+                // Pre-allocate capacity for at least the bruteForceOffsets, since we'll be adding all of them
+                var builderOffsets = new Dictionary<IndirectReference, long>(bruteForceOffsets.Count);
+
                 // find all object streams
                 foreach (var entry in crossReferenceTable.ObjectOffsets)
                 {
@@ -39,11 +40,11 @@ public static bool ValidateCrossReferenceOffsets(IInputBytes bytes, CrossReferen
                         // TODO: more validation of streams.
                         builderOffsets[entry.Key] = entry.Value;
                     }
+                }
 
-                    foreach (var item in bruteForceOffsets)
Review comment: Moving this loop outside the foreach loop that starts on line 33 was the real win. As far as I can tell, the two loops are independent, and we still maintain the order: bruteForceOffsets are added after the object offsets and can therefore override them in builderOffsets. I don't see a reason why this needs to be an inner loop.
-                    {
-                        builderOffsets[item.Key] = item.Value;
-                    }
+                foreach (var item in bruteForceOffsets)
+                {
+                    builderOffsets[item.Key] = item.Value;
                 }
 
                 actualOffsets = builderOffsets;
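To make the reasoning in the comment above concrete, here is a hedged, self-contained sketch of the two loop shapes. The names `LoopHoistSketch`, `Nested` and `Hoisted` are hypothetical, and `string` keys stand in for `IndirectReference`. The nested form does work proportional to (cross-reference entries × brute-force entries), while the hoisted form is proportional to their sum. One assumption worth stating: the two forms produce the same result only when the cross-reference table has at least one entry, because with an empty table the nested loop never copies the brute-force offsets at all.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only; not part of PdfPig.
public static class LoopHoistSketch
{
    // Old shape: the brute-force copy is repeated once per table entry.
    public static Dictionary<string, long> Nested(
        IReadOnlyDictionary<string, long> objectOffsets,
        IReadOnlyDictionary<string, long> bruteForceOffsets)
    {
        var merged = new Dictionary<string, long>();

        foreach (var entry in objectOffsets)
        {
            if (entry.Value < 0)
            {
                merged[entry.Key] = entry.Value;
            }

            foreach (var item in bruteForceOffsets)
            {
                merged[item.Key] = item.Value;
            }
        }

        return merged;
    }

    // New shape: each collection is walked exactly once. The brute-force
    // entries are still written last, so they still win on shared keys.
    public static Dictionary<string, long> Hoisted(
        IReadOnlyDictionary<string, long> objectOffsets,
        IReadOnlyDictionary<string, long> bruteForceOffsets)
    {
        var merged = new Dictionary<string, long>(bruteForceOffsets.Count);

        foreach (var entry in objectOffsets)
        {
            if (entry.Value < 0)
            {
                merged[entry.Key] = entry.Value;
            }
        }

        foreach (var item in bruteForceOffsets)
        {
            merged[item.Key] = item.Value;
        }

        return merged;
    }
}
```

In the nested form the last pass of the inner loop overwrites any shared keys, so brute-force values win there as well; that is why hoisting preserves the override order.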
Review comment: Seeing that dictionary insertion was the bottleneck initially made me think that Equals boxing might be the issue, so I added IEquatable. However, this didn't make a noticeable difference in my particular case. Given how frequently this type is used as a dictionary key, I think it is worth keeping (I notice that other PdfPig types do have this).
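For readers unfamiliar with the boxing concern mentioned here, the following is a minimal sketch of a struct dictionary key that implements `IEquatable<T>`. `ObjectKey` is a hypothetical stand-in, not PdfPig's `IndirectReference`, and its fields are assumptions chosen for illustration. When a struct key implements `IEquatable<T>`, the default equality comparer used by `Dictionary<TKey, TValue>` calls the strongly typed `Equals`; without it, comparisons fall back to `Equals(object)`, which boxes the struct.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type for illustration only; not PdfPig's IndirectReference.
public readonly struct ObjectKey : IEquatable<ObjectKey>
{
    public long ObjectNumber { get; }

    public int Generation { get; }

    public ObjectKey(long objectNumber, int generation)
    {
        ObjectNumber = objectNumber;
        Generation = generation;
    }

    // Strongly typed comparison: no boxing when called through
    // EqualityComparer<ObjectKey>.Default.
    public bool Equals(ObjectKey other)
    {
        return ObjectNumber == other.ObjectNumber && Generation == other.Generation;
    }

    // Kept consistent with the typed overload for callers that only have an object.
    public override bool Equals(object obj)
    {
        return obj is ObjectKey other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (ObjectNumber.GetHashCode() * 397) ^ Generation;
        }
    }
}
```

As the comment notes, the measured effect can be negligible in a given workload; the sketch only shows the shape of the change.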