Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure Path _str takes UTF-8 String fast paths #72

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

saagarjha
Copy link

Some clients (cough cough Xcode) like to create Paths out of bridged NSStrings, which cannot be efficiently iterated. As most of our operations require doing this frequently, we might as well convert on construction to avoid the -characterAtIndex: slow path.

Copy link
Collaborator

@jakepetroules jakepetroules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea - I think this is likely the right choice, but we should try to get some evidence to confirm that this will improve overall performance.

Could you try running the performance tests and report back how the numbers look before and after this change?

@saagarjha
Copy link
Author

Which tests would you recommend? I looked at PathPerfTests but those all pass native Swift Strings into the constructor, so I wouldn't expect any major changes there. Would you like me to add a new test?

@aciidgh
Copy link
Collaborator

aciidgh commented Feb 3, 2025

Adding a new perf test for this will be great!

@mirza-garibovic
Copy link
Contributor

Client strings typically arrive via IPC and get constructed by SWBuild so I'm not sure if we really suffer from NSString bridging; did you profile a real build and see time spent there? By the way, there's some progress in Path towards migrating to System.FilePath as the underlying representation, I'm curious what performance impact that would have.

@saagarjha
Copy link
Author

Yes, for the Apple people here please take a look at FB11698739 (Path._isNormalized should be called less often, made more efficient, or cached)

@jmah
Copy link

jmah commented Feb 4, 2025

like to create Paths out of bridged NSStrings, which cannot be efficiently iterated.

They can be efficiently iterated but require more context than the index to "utf8 char at index" function provides.
Expanding this up-front will likely cost a significant amount of memory that the NSString subclass was introduced
to avoid.

This absolutely could be improved but with a deeper analysis of how to do this resiliently without causing a large
memory or time regression

@saagarjha
Copy link
Author

I've added a performance test that operates on NSPathStore2 (at least on macOS). This change makes it run about 9x faster. I'm not entirely sure what the point of measure is given it discards its timing information but I just copied whatever the other tests were doing (but running my own performance analysis).

@mirza-garibovic
Copy link
Contributor

Just noting that this problem is why we use ByteString in a bunch of places.

WRT the PR, the cost to makeContiguousUTF8 can be higher than the win depending on downstream usage, so it isn't straight forward to estimate what the impact is in a real build. The PR performance test can show it as a win or loss depending on how much work it does after the init, and it isn't representative of a real build.

I had a chat with @Catfish-Man about it and I came away thinking that this is a real problem but putting makeContiguousUTF8 here or in other central spots isn't quite the right thing, instead we want to find and eradicate NSStrings at the source. It would actually be interesting to sprinkle assert(str.isContiguousUTF8) around and watch where it asserts.

Why do we have NSStrings or non-contiguous/non-UTF8 strings? I think one reason could be that we use Foundation/NS APIs to read or decode data, and we should replace those with native APIs. @Catfish-Man also mentioned that random innocent looking String initializers also will give you non-contiguous/non-UTF8 strings, e.g. String(contentsOf:) does, but String(contentsOf:encoding:) does not IFF the encoding is ASCII or UTF8.

@saagarjha what do you think about that approach? Instead of converting, assert and find them at the source, then replace with native contiguous UTF8-producing APIs?

@mirza-garibovic
Copy link
Contributor

Btw I also did some work in fnmatch.swift a few months ago because straight-forward usage of Swift strings results in a bunch of intermediate allocations and unexpected NS/CF calls under the hood. Probably that suffered from the same problem of getting NSStrings as inputs.

@saagarjha
Copy link
Author

I’d be fine with that as long as it does get tracked down and fixed. I believe many of these are coming from Xcode itself, which I can’t actually make changes in, so someone else would have to make the fixes there.

@mirza-garibovic
Copy link
Contributor

@saagarjha the good thing is that SWBBuildService normally runs out of process from Xcode, and Xcode sends requests over IPC, so we don't actually take any NSString objects from Xcode and it's within our control how Strings are constructed and represented. I bet we have our own code paths (maybe something like extension PropertyListItem: Decodable?) that might be constructing NSString-based Strings.

@jakepetroules
Copy link
Collaborator

Property list and JSON decoding is one area where I definitely expect we're getting NSStrings back.

@mirza-garibovic
Copy link
Contributor

@saagarjha what do you think about that? If you have time and want to pursue this, for example you could change your PR to have Path.init(String) assert(s.isContiguousUTF8), run all the tests, break on that assertion and trace the String instance to where it's constructed, then either find an alternative way to produce that String directly as contiguous UTF8 or run makeContiguousUTF8 right after it's produced so that no-one downstream ever sees the NSString.

@saagarjha
Copy link
Author

Sure, that sounds reasonable. I'll take a look.

@saagarjha
Copy link
Author

These are the ones I could find. I chose to keep the assert because it should help prevent new clients from accidentally reintroducing this issue.

@mirza-garibovic
Copy link
Contributor

Awesome! Cleaner than I expected.

@mirza-garibovic
Copy link
Contributor

@swift-ci test

@saagarjha
Copy link
Author

Hmm, looks like I must have missed something. Let me take a look…

@saagarjha
Copy link
Author

Hmm, I can't reproduce the test failure. Is there a way to find out which test is failing?

@mirza-garibovic
Copy link
Contributor

@shahmishal @owenv, can we get result bundles or crash logs out of swift ci? This console log can't identify which test is hitting the assertion https://ci.swift.org/job/pr-swift-build-macos/110/

@neonichu
Copy link
Collaborator

Don't think we have a way to get crash logs from swift-ci, that's probably the only way to see which test triggered the assertion.

@neonichu
Copy link
Collaborator

BTW, my random guess would be that ValidationTests.selfBuild() might be the thing that triggers the assertion since CI uses an older Xcode which may have slightly different access patterns?

@mirza-garibovic
Copy link
Contributor

I can't reproduce either. @saagarjha you can land without the assert.

Path relies on fast iteration of its backing _str for most of its
operations. To avoid the performance hit of repeatedly calling
-characterAtIndex: when constructed with an unfortunate string, update
clients to avoid doing this.
@saagarjha
Copy link
Author

Ok, I've removed the assert.

@mirza-garibovic
Copy link
Contributor

@swift-ci test

@@ -737,7 +737,11 @@ class LocalFS: FSProxy, @unchecked Sendable {

// FIXME: This enumerator has unclear error handling. Does nextObject() return nil on error? That's wrong.
return try enumerator.compactMap { next in
let nextPath = path.join(next as? String)
// FileManager.DirectoryEnumerator produces Strings that are backed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it any different if we use the URL API instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saagarjha I'm good to land as is and track this as a future issue for someone to explore, but want to run by you to see if you agree or want to look at this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to you. I did some basic tests and it seems like this API might actually do the conversions for us but I need to investigate it further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, if it's up to me :) then I'm not in a rush to merge, let me know what you find here

@jakepetroules
Copy link
Collaborator

@swift-ci test

@aciidgh
Copy link
Collaborator

aciidgh commented Feb 13, 2025

@mirza-garibovic is this ready to land? Spoke with @mirza-garibovic offline, seems like we're still iterating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants