Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid quadratic runtime when merging text nodes #136

Merged
merged 2 commits into from
Jan 20, 2025

Conversation

adamreichold
Copy link
Collaborator

@adamreichold adamreichold commented Jan 19, 2025

With this change, the newly added text stress test/benchmark finishes in about a minute instead of I-did-not-wait-that-long. However, it does also regress the real-world XML

 name              master ns/iter  branch ns/iter  diff ns/iter  diff %  speedup 
 huge_roxmltree    4,952,961       5,098,758            145,797   2.94%   x 0.97 
 large_roxmltree   2,316,864       2,477,716            160,852   6.94%   x 0.94 
 medium_roxmltree  577,389         573,742               -3,647  -0.63%   x 1.01 
 tiny_roxmltree    4,022           4,180                    158   3.93%   x 0.96 

which I am still looking into. I guess the common case of really just one borrowed text got slower.

The real-world XML files are basically flat

 name              master ns/iter  branch ns/iter  diff ns/iter  diff %  speedup 
 huge_roxmltree    4,940,395       4,848,394            -92,001  -1.86%   x 1.02 
 large_roxmltree   2,305,231       2,287,267            -17,964  -0.78%   x 1.01 
 medium_roxmltree  574,506         571,131               -3,375  -0.59%   x 1.01 
 tiny_roxmltree    3,998           4,002                      4   0.10%   x 1.00 

after I compressed the state space and made use of the fact that StringStorage::new_owned always creates a new allocation so that I can keep the original buffer around for if another text node need to be merged while still already leaving a fully formed text in the tree. (Note that these files never exercise the merging code path at all.)

@adamreichold adamreichold force-pushed the after-text-quadratic branch 5 times, most recently from 98f921f to e5b96f2 Compare January 19, 2025 15:21
@RazrFalcon
Copy link
Owner

The code looks fine, but we have to reduce bench files size. It's way too big. Test files should ideally be less than 500KB.
And we should exclude benches from the Cargo package. No need to publish then on crates.io

@adamreichold
Copy link
Collaborator Author

The code looks fine, but we have to reduce bench files size. It's way too big. Test files should ideally be less than 500KB.

Adjusted the generator parameters and the three new files are well below 300 kB now.

And we should exclude benches from the Cargo package. No need to publish then on crates.io

It is already exclude because it is a separate package, you can check by running cargo package -v which prints

warning: ignoring benchmark `xml` as `benches/xml.rs` is not included in the published package

among other things. (I did not exclude the tests as distribution often do like to be able to run tests for packaging.)

@RazrFalcon
Copy link
Owner

Good.

@adamreichold adamreichold merged commit 3b09447 into master Jan 20, 2025
4 checks passed
@adamreichold adamreichold deleted the after-text-quadratic branch January 20, 2025 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants