Optimize cross reference object offset validation by avoiding nested loop #935

Merged
16 changes: 8 additions & 8 deletions src/UglyToad.PdfPig.Core/IndirectReference.cs
@@ -6,7 +6,7 @@
 /// <summary>
 /// Used to uniquely identify and refer to objects in the PDF file.
 /// </summary>
-public readonly struct IndirectReference
+public readonly struct IndirectReference : IEquatable<IndirectReference>
Contributor Author

Profiling showed dictionary insertion as the bottleneck, which initially made me suspect boxing in Equals, so I added IEquatable<IndirectReference>. It didn't make a noticeable difference in my particular case, but given how often this type is used as a dictionary key I think it is worth keeping (I notice that other PdfPig types already implement it).
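
To illustrate the boxing concern, here is a minimal sketch, not PdfPig code: `BoxedKey` and `TypedKey` are hypothetical stand-ins for the struct before and after this change, and the static counters exist only to show which `Equals` overload the dictionary's default comparer ends up calling.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical "before" shape: only Equals(object) is available,
// so every key comparison has to box the struct argument.
public readonly struct BoxedKey
{
    public static int ObjectEqualsCalls;

    public BoxedKey(long objectNumber, int generation)
    {
        ObjectNumber = objectNumber;
        Generation = generation;
    }

    public long ObjectNumber { get; }
    public int Generation { get; }

    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;
        return obj is BoxedKey other
            && other.ObjectNumber == ObjectNumber
            && other.Generation == Generation;
    }

    public override int GetHashCode()
    {
        unchecked { return (ObjectNumber.GetHashCode() * 397) ^ Generation; }
    }
}

// Hypothetical "after" shape: EqualityComparer<T>.Default picks up
// IEquatable<T>.Equals, so no boxing is needed for key comparisons.
public readonly struct TypedKey : IEquatable<TypedKey>
{
    public static int TypedEqualsCalls;
    public static int ObjectEqualsCalls;

    public TypedKey(long objectNumber, int generation)
    {
        ObjectNumber = objectNumber;
        Generation = generation;
    }

    public long ObjectNumber { get; }
    public int Generation { get; }

    public bool Equals(TypedKey other)
    {
        TypedEqualsCalls++;
        return other.ObjectNumber == ObjectNumber && other.Generation == Generation;
    }

    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;
        return obj is TypedKey other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked { return (ObjectNumber.GetHashCode() * 397) ^ Generation; }
    }
}

public static class EquatableSketch
{
    // Looks up an equal key in a one-entry dictionary so that the
    // comparer has to call Equals at least once.
    public static bool LookupBoxed()
    {
        var map = new Dictionary<BoxedKey, long> { [new BoxedKey(1, 0)] = 10 };
        return map.ContainsKey(new BoxedKey(1, 0));
    }

    public static bool LookupTyped()
    {
        var map = new Dictionary<TypedKey, long> { [new TypedKey(1, 0)] = 10 };
        return map.ContainsKey(new TypedKey(1, 0));
    }
}
```

Whether avoiding the box is measurable depends on the workload; as noted above, it was not the dominant cost here.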

 {
 /// <summary>
 /// A positive integer object number.
@@ -31,15 +31,15 @@ public IndirectReference(long objectNumber, int generation)
 }

 /// <inheritdoc />
-public override bool Equals(object obj)
+public bool Equals(IndirectReference other)
 {
-    if (obj is IndirectReference reference)
-    {
-        return reference.ObjectNumber == ObjectNumber
-            && reference.Generation == Generation;
-    }
+    return other.ObjectNumber == ObjectNumber && other.Generation == Generation;
+}

-    return false;
+/// <inheritdoc />
+public override bool Equals(object obj)
+{
+    return obj is IndirectReference other && Equals(other);
 }

 /// <inheritdoc />
2 changes: 1 addition & 1 deletion src/UglyToad.PdfPig/CrossReference/CrossReferenceTable.cs
@@ -49,7 +49,7 @@ internal CrossReferenceTable(CrossReferenceType type, IReadOnlyDictionary<Indire
 Trailer = trailer ?? throw new ArgumentNullException(nameof(trailer));
 CrossReferenceOffsets = crossReferenceOffsets ?? throw new ArgumentNullException(nameof(crossReferenceOffsets));

-var result = new Dictionary<IndirectReference, long>();
+var result = new Dictionary<IndirectReference, long>(capacity: objectOffsets.Count);
Contributor Author

Fly-by optimization since we're just copying the dictionary.
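
As a rough sketch of the idea (the helper name `CopyPresized` is hypothetical, not something in PdfPig): passing the known element count to the constructor lets the dictionary size its buckets once instead of growing repeatedly during the copy.

```csharp
using System.Collections.Generic;

public static class CapacitySketch
{
    // Copies a read-only dictionary into a new Dictionary whose initial
    // capacity already matches the number of entries to be inserted.
    public static Dictionary<TKey, TValue> CopyPresized<TKey, TValue>(
        IReadOnlyDictionary<TKey, TValue> source)
    {
        var result = new Dictionary<TKey, TValue>(capacity: source.Count);

        foreach (var pair in source)
        {
            result[pair.Key] = pair.Value;
        }

        return result;
    }
}
```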

 foreach (var objectOffset in objectOffsets)
 {
     result[objectOffset.Key] = objectOffset.Value;
@@ -24,11 +24,12 @@ public static bool ValidateCrossReferenceOffsets(IInputBytes bytes, CrossReferen
         return true;
     }

-    var builderOffsets = new Dictionary<IndirectReference, long>();
-
     var bruteForceOffsets = BruteForceSearcher.GetObjectLocations(bytes);
     if (bruteForceOffsets.Count > 0)
     {
+        // Pre-allocate capacity for at least the bruteForceOffsets, since we'll be adding all of them
+        var builderOffsets = new Dictionary<IndirectReference, long>(bruteForceOffsets.Count);
+
         // find all object streams
         foreach (var entry in crossReferenceTable.ObjectOffsets)
         {
@@ -39,11 +40,11 @@ public static bool ValidateCrossReferenceOffsets(IInputBytes bytes, CrossReferen
                 // TODO: more validation of streams.
                 builderOffsets[entry.Key] = entry.Value;
             }
+        }

-            foreach (var item in bruteForceOffsets)
Contributor Author

Moving this loop outside the foreach loop that starts on line 33 was the real win. As far as I can tell, the two loops are independent, and we still preserve the order: bruteForceOffsets are added after the object offsets and can therefore override them in builderOffsets.

I don't see a reason why this needs to be an inner loop.
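
A small sketch of the difference, using plain `int` keys and hypothetical helper names rather than the real PdfPig types: the nested shape re-copies every brute-force entry once per table entry (roughly n × m writes), while the hoisted shape copies them once (n + m writes). The final contents match as long as the table has at least one entry. With an empty table the nested shape never runs the inner copy, so the hoisted shape would add brute-force entries that the nested one leaves out.

```csharp
using System;
using System.Collections.Generic;

public static class LoopHoistingSketch
{
    // Hypothetical stand-in for the old shape: the brute-force copy
    // sits inside the loop over the table entries.
    public static (Dictionary<int, long> Result, long Writes) MergeNested(
        IReadOnlyDictionary<int, long> tableOffsets,
        IReadOnlyDictionary<int, long> bruteForceOffsets)
    {
        var result = new Dictionary<int, long>();
        long writes = 0;

        foreach (var entry in tableOffsets)
        {
            result[entry.Key] = entry.Value;
            writes++;

            foreach (var item in bruteForceOffsets)
            {
                result[item.Key] = item.Value;
                writes++;
            }
        }

        return (result, writes);
    }

    // Hypothetical stand-in for the new shape: the brute-force copy
    // runs once, after all table entries, so it still wins on conflicts.
    public static (Dictionary<int, long> Result, long Writes) MergeHoisted(
        IReadOnlyDictionary<int, long> tableOffsets,
        IReadOnlyDictionary<int, long> bruteForceOffsets)
    {
        var result = new Dictionary<int, long>(bruteForceOffsets.Count);
        long writes = 0;

        foreach (var entry in tableOffsets)
        {
            result[entry.Key] = entry.Value;
            writes++;
        }

        foreach (var item in bruteForceOffsets)
        {
            result[item.Key] = item.Value;
            writes++;
        }

        return (result, writes);
    }
}
```

For a file with thousands of objects in both collections, the gap between the two write counts is what made the nested version slow.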

-            {
-                builderOffsets[item.Key] = item.Value;
-            }
+        foreach (var item in bruteForceOffsets)
+        {
+            builderOffsets[item.Key] = item.Value;
         }

         actualOffsets = builderOffsets;