Multiple TMPDIRs results in watchman exiting and restarting, multi-second pauses #5817
Comments
Does it get faster after running …? If you want to do more profiling, you could try doing that after clearing your OS's file system caches. I don't know how to do that, but someone on the internet probably does.
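For what it's worth, and not from the thread: on macOS the file system caches can usually be flushed with the built-in `purge` tool (it needs admin rights), which would let a profiling run start from a cold cache.

```sh
# Flush the macOS disk/file-system caches so the next run starts cold.
# Requires administrator privileges.
sudo purge
```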
Thanks for the quick response. This issue has occurred often enough that I'm certain there's some real effect, and because I only see it with jj, my suspicions point here. However, I cannot reproduce it on demand yet. I suspect that tracking the times it occurs would be the most useful thing at this point, before doing more profiling. I don't have time for this today, but next week I should. I have questions around system state, and also whether watchman is a factor, whether it occurs in every repo, whether I have something misconfigured, etc.
I realized I could check the watchman logs. I have watchman configured in my user configuration, yet the watchman state only applies to two repositories, even when I run commands in more than that:

```json
{
"watched": [
{
"triggers": [],
"path": "$HOME/$PROJECT_1_DIR"
},
{
"triggers": [],
"path": "$HOME/$PROJECT_2_DIR"
}
],
"version": "2025.01.06.00"
}
```

From the logs, it looks like launchd is sending a SIGTERM:

```
2025-02-27T15:01:53,960: [io 5528110328 $HOME/$PROJECT_1_DIR] crawl complete
Terminating due to signal 15 Terminated generated by pid=1 uid=0 (0)
0 watchman 0x0000000102fd60ac _ZL13crash_handleriP9__siginfoPv + 664
1 libsystem_platform.dylib 0x000000018a9e7a24 _sigtramp + 56
2 libsystem_pthread.dylib 0x000000018a9bb49c _pthread_join + 608
3 libc++.1.0.dylib 0x0000000106a89ad4 _ZNSt3__16thread4joinEv + 32
4 watchman 0x0000000102ff27cc _Z16w_start_listenerv + 1196
5 watchman 0x0000000102ffc4c4 _ZL11run_serviceON8watchman11ProcessLock6HandleEi + 628
6 watchman 0x0000000102ff8a08 _ZL25run_service_in_foregroundv + 244
7 watchman 0x0000000102ff81f0 main + 5988
8 dyld 0x000000018a63d0e0 start + 2360
2025-02-27T15:05:01,580: [sanitychecks] done with sanityCheckThread
2025-02-27T15:05:04,567: [listener] 2 roots were still live at exit
2025-02-27T15:05:04,567: [listener] Exiting from service with res=true
2025-02-27T15:08:12,635: [listener] Watchman 2025.01.06.00 <no build info set> starting up on db-macminim1-macos.local by command /sbin/launchd
2025-02-27T15:08:12,635: [listener] path $HOME/$PROJECT_1_DIR is on filesystem type apfs
2025-02-27T15:08:12,635: [listener] failed to use watcher eden: apfs is not a FUSE file system.
2025-02-27T15:08:12,635: [listener] failed to use watcher kqueue+fsevents: Not using the kqueue+fsevents watcher as the "prefer_split_fsevents_watcher" config isn't set.
2025-02-27T15:08:12,635: [listener] root $HOME/$PROJECT_1_DIR using watcher mechanism fsevents (auto was requested)
2025-02-27T15:08:12,636: [listener] path $HOME/$PROJECT_2_DIR is on filesystem type apfs
2025-02-27T15:08:12,636: [listener] failed to use watcher eden: apfs is not a FUSE file system.
2025-02-27T15:08:12,636: [listener] failed to use watcher kqueue+fsevents: Not using the kqueue+fsevents watcher as the "prefer_split_fsevents_watcher" config isn't set.
2025-02-27T15:08:12,636: [listener] root $HOME/$PROJECT_2_DIR using watcher mechanism fsevents (auto was requested)
2025-02-27T15:08:12,636: [listener] file limit is 2560 kern.maxfilesperproc=61440
2025-02-27T15:08:12,636: [listener] raised file limit to 61440
2025-02-27T15:08:12,636: [sanitychecks] starting sanityCheckThread
2025-02-27T15:08:13,793: [io 4906305592 $HOME/$PROJECT_2_DIR] PERF: {"user_time": 0.10391400000000001, "start_time": 1740668892.636652, "elapsed_time": 1.157162, "version": "2025.01.06.00", "system_time": 0.60784899999999997, "meta": {"root": {"recrawl_count": 0, "case_sensitive": false, "watcher": "fsevents", "path": "$HOME/$PROJECT_2_DIR"}}, "pid": 66517, "description": "full-crawl"}
2025-02-27T15:08:13,793: [client=2:stm=105553176075904:pid=66506] PERF: {"user_time": 0.103648, "start_time": 1740668892.6416321, "elapsed_time": 1.152185, "version": "2025.01.06.00", "system_time": 0.60607900000000003, "meta": {"client": {"pid": 66506}, "args": ["watch-project", "$HOME/$PROJECT_2_DIR"], "root": {"recrawl_count": 0, "case_sensitive": false, "watcher": "fsevents", "path": "$HOME/$PROJECT_2_DIR"}}, "pid": 66517, "description": "dispatch_command:watch-project"}
2025-02-27T15:08:13,793: [io 4906305592 $HOME/$PROJECT_2_DIR] crawl complete
2025-02-27T15:08:17,927: [io 4905281912 $HOME/$PROJECT_1_DIR] PERF: {"user_time": 0.34954600000000002, "start_time": 1740668892.6362569, "elapsed_time": 5.2908799999999996, "version": "2025.01.06.00", "system_time": 1.9613119999999999, "meta": {"root": {"recrawl_count": 0, "case_sensitive": false, "watcher": "fsevents", "path": "$HOME/$PROJECT_1_DIR"}}, "pid": 66517, "description": "full-crawl"}
2025-02-27T15:08:17,927: [io 4905281912 $HOME/$PROJECT_1_DIR] crawl complete
```

This happened to be quick to do; I'll investigate further, later.
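As an aside, not from the thread: the same information can be pulled from the running server without hunting for the state file, and the exact log and state locations depend on how watchman was launched.

```sh
# List the roots the currently running watchman server is watching
# (this should match the "watched" array in the state file above).
watchman watch-list

# Show which unix socket the client resolves to; the state file and log
# normally live in the same per-user state directory as the socket.
watchman get-sockname

# Confirm which server build answered.
watchman version
```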
Aha, I can reproduce! Sort of… Here's an example.
I think the pause is caused by restarting watchman? So the issue is that watchman is exiting, with an error that suggests it's receiving a SIGTERM. With this repro, I can confirm that after running …
OK, I've worked out what is going on. I have two copies of watchman running, because watchman as configured through nix puts the state file in a temporary directory, and the base for temporary directories differs between a default shell and a nix shell. But they can't run at the same time, so when I run the command from different types of shell (i.e. in different tmux panes), the new watchman sends a SIGTERM to the old. I skimmed the jujutsu code to see how watchman is called. It looks like … (The …)

```
> bat /tmp/db-state/state
───────┬────────────────────────────────────────────────────────────────────────
│ File: /tmp/db-state/state
───────┼────────────────────────────────────────────────────────────────────────
1 │ {
2 │ "watched": [
3 │ {
4 │ "triggers": [],
5 │ "path": "$HOME/$PROJECT_1_DIR"
6 │ },
7 │ {
8 │ "triggers": [],
9 │ "path": "$HOME/$PROJECT_2_DIR"
10 │ }
11 │ ],
12 │ "version": "2025.01.06.00"
13 │ }
───────┴────────────────────────────────────────────────────────────────────────
> bat $TMPDIR/db-state/state
───────┬────────────────────────────────────────────────────────────────────────
│ File: $TMPDIR/db-state/state
───────┼────────────────────────────────────────────────────────────────────────
1 │ {
2 │ "watched": [
3 │ {
4 │ "triggers": [],
5 │ "path": "$HOME/$PROJECT_3_DIR"
6 │ },
7 │ {
8 │ "triggers": [],
9 │ "path": "$HOME/$PROJECT_4_DIR"
10 │ },
11 │ {
12 │ "triggers": [],
13 │ "path": "$HOME/$PROJECT_5_DIR"
14 │ },
15 │ {
16 │ "triggers": [],
17 │ "path": "$HOME/$PROJECT_1_DIR"
18 │ },
19 │ {
20 │ "triggers": [],
21 │ "path": "$HOME/$PROJECT_6_DIR"
22 │ }
23 │ ],
24 │ "version": "2025.01.06.00"
25 │ }
───────┴────────────────────────────────────────────────────────────────────────
```

I noticed because I checked … I count 4 issues.
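A sketch of how the two competing instances can be observed directly (my illustration, not the reporter's transcript; it assumes, per the comment above, that the nix-wrapped watchman derives its state directory from `$TMPDIR`):

```sh
## Session 1: a regular login shell
echo "$TMPDIR"           # per-user prefix, e.g. /var/folders/.../T/ on macOS
watchman get-sockname    # socket (and hence server instance) resolved under that prefix
watchman watch-list      # roots watched by this instance

## Session 2: inside the nix shell, where TMPDIR differs in this setup
echo "$TMPDIR"
watchman get-sockname    # a different socket; spawning this server SIGTERMs the other
watchman watch-list
```

Each session reports its own socket path and its own set of watched roots, matching the two state files shown above.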
Thanks for debugging it! We'd appreciate PRs for your suggestions too. Just leaving it open until someone else has time is of course another option.
I'm willing to open a PR, but I'm not rushing it. At least, for now, anybody else who hits the same issue might come across this report.
- Jujutsu connects to watchman at a path based on the temporary directory prefix.
- Only one instance of watchman can exist on a system (at least as configured by jj).
- Where there are multiple temporary directory prefixes on a system, running in a shell with prefix B results in a SIGTERM to the watchman launched from a shell with prefix A, and so on.
- The new watchman instance takes a few seconds to launch and crawl the directories, and this delays the start of execution of the jujutsu command.
See #5817 (comment) below for what can be done; I think there are two changes to jujutsu that might be warranted. A possible user-side workaround is also sketched below.
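As an editorial aside, not something proposed in the thread: until jj or the packaging changes, one user-side workaround consistent with the diagnosis above is to make the nix shell use the same temporary-directory prefix as the login shell, so both resolve the same watchman state. Whether that is acceptable for a given nix setup is a judgment call.

```sh
# Run from whatever sets up the nix shell environment (e.g. a shellHook):
# reset TMPDIR to the per-user temporary directory macOS gives login shells,
# so the nix-wrapped watchman resolves the same state directory.
export TMPDIR="$(getconf DARWIN_USER_TEMP_DIR)"

# Sanity check: this should now report the same socket as a regular shell,
# and jj should see a single, healthy watchman instance.
watchman get-sockname
jj debug watchman status
```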
`jj debug watchman status`
Original
Description
Running jj in a colocated git repo, having not run jj in that directory for a few hours, sometimes results in a long pause before any output is displayed.
I think I'll be able to pull duration distribution statistics from atuin, if useful? For now, I used samply to take some profiles (prefixing the jj commands with `samply record --save-only -o ~/Downloads/profile.json --`). Here's a profile where it works OK: https://share.firefox.dev/41hViSV
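For context, and as my own illustration rather than the reporter's exact command line, the prefix above wraps whichever jj invocation is being timed, e.g. a hypothetical `jj log`:

```sh
# Record a profile of one jj invocation and save it without opening the viewer.
samply record --save-only -o ~/Downloads/profile.json -- jj log
```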
The following show the undesirable behaviour:
In the first, the issue is clearly `watchman`: I have `core.fsmonitor` set to `"watchman"`. This repo has 2087 commits (according to the shortlog) and 15 branches, with no tags; surely not enough for the git import to be causing this issue. I only see this on macOS [edit: because my other systems have consistent temporary directory prefixes]. First noticed in 0.25, not present [I think, though I have not confirmed this] in 0.23. I only use jj in colocated repos.
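For reference, a sketch of how the setting mentioned above can be enabled from the command line (assuming a jj version with `jj config set`; the equivalent TOML is `core.fsmonitor = "watchman"` in the user config):

```sh
# Enable the watchman filesystem monitor in the user-level jj configuration.
jj config set --user core.fsmonitor watchman

# Check that jj can talk to watchman for the current repository.
jj debug watchman status
```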
It's not clear to me whether background snapshotting should be relevant here. If that is the solution, then the docs could do with an update!
Steps to Reproduce the Problem
Expected Behavior
jj displays the log output (for example) near-instantly (as it does when the bug doesn't occur).
Actual Behavior
After a pause of up to 20s, the output is displayed on screen.
Specifications
I've installed jujutsu through nix, and the store paths of this and its dependencies are shown in the profile JSON.