-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up fuse unmount workflow #18241
Conversation
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks. |
Automated checks report:
All checks passed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java code looks good to me. Could you ping @ssz1997 to confirm these changes won't break the features on k8s side?
@@ -105,11 +105,10 @@ protected void startCommon(AlluxioConfiguration conf, | |||
try (FileSystem fs = createBaseFileSystem(fsContext, fuseOptions)) { | |||
AlluxioJniFuseFileSystem fuseFileSystem = createFuseFileSystem(fsContext, fs, fuseOptions); | |||
setupFuseFileSystem(fuseFileSystem); | |||
launchFuse(fuseFileSystem, fsContext, fuseOptions, true); // This will block until umount | |||
// This will block on serving until fuse gets destroyed | |||
launchFuse(fuseFileSystem, fsContext, fuseOptions, true); | |||
} catch (Throwable t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope of this pr, but we should not swallow fatal error (such as jvm oom error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is actually catching throwable at starting fuse with fuse_main_real, jvm oom throws from some other threads, but will get the exit(-1) back
@@ -176,16 +179,23 @@ private void umountInternal() { | |||
} | |||
} else { | |||
try { | |||
exitCode = new ProcessBuilder("fusermount", "-u", "-z", mountPath).start().waitFor(); | |||
Pair<Integer, String> output = Environment.runCommandAndCapture( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm not a big fan to use Pair, it might be better to declare a class for the result explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this entire func is not used any more, these are some codes for debugging purposes. let me remove these changes.
} | ||
} | ||
} | ||
return Pair.of(exitCode, bout.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to add a class for the result of commandAndCapture, I have seen quite a few places using a pair for this one. your call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No break change on k8s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for improving the logics to gracefully shutdown fuse!
Deprecating this as we shouldn't call the umount/fusermount ourselves, it should come from | ||
user and stop the fuse_main_real from serving and then we exit from main thread | ||
*/ | ||
@Deprecated | ||
private void umountInternal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any place that we are still using this method or it's fully deprecated from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fully deprecated, shall i just get rid of it, no cmd of umount/fusermount should be run by us in java IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work! I added some comments on the java part. PTAL and hope that helps.
I don't know much about the C part of fuse so I'd leave that part of review to our experts :P
@@ -264,8 +262,12 @@ public static AlluxioJniFuseFileSystem launchFuse(AlluxioJniFuseFileSystem fuseF | |||
|
|||
String mountPoint = conf.getString(PropertyKey.FUSE_MOUNT_POINT); | |||
final boolean debugEnabled = conf.getBoolean(PropertyKey.FUSE_DEBUG_ENABLED); | |||
LOG.info("Installing signal handler for SIGTERM and SIGINT to handle " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wanna mention HUP as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, good catch : )
* any existing signal handlers installed up front. Since jvm always install | ||
* signal handlers therefore libfuse is never able to do so. | ||
* But disabling JVM's signal handlers by flag -Xrs, fuse acts differently | ||
* on receiving these signals on different platform(MacOs can receive/Linux can't) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
* on receiving these signals on different platform(MacOs can receive/Linux can't) | |
* on receiving these signals on different platform(MacOS can receive/Linux can't) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -41,12 +51,12 @@ public void handle(Signal signal) { | |||
LOG.info("Receive signal name {}, number {}, system exiting", | |||
signal.getName(), signal.getNumber()); | |||
int number = signal.getNumber(); | |||
if (number == 15) { | |||
// SIGTERM - 15 SIGINT - 2 SIGHUP - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SIGTERM - 15 SIGINT - 2 SIGHUP - 1 | |
// SIGTERM - 15 | |
// SIGINT - 2 | |
// SIGHUP - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should catch SIGHUP and then ignore it here (a.k.a empty handler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i merely follow what fuse_set_signal_handlers does, fuse acts to sighup and stop serving as well, any reason to ignore SIGHUP?
@@ -124,4 +132,45 @@ public static String getJniLibraryExtension() { | |||
} | |||
return isMac() ? ".jnilib" : ".so"; | |||
} | |||
|
|||
|
|||
public static Pair<Integer, String> runCommandAndCapture(String... command) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a ShellCommand
class long ago to do the same thing and redirect out/err, maybe use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see i will use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I can't without adding dependency on alluxio-core-common, i will just remove this as im anyway deprecating the function
err "Failed to umount fuse mount point ${mount_point}" | ||
err "Failed to umount fuse mount point ${mount_point}, try sending SIGTERM..." | ||
kill "${fuse_pid}" | ||
wait_for_fuse_process_killed "${fuse_pid}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo the return code from wait_for_fuse_process_killed
before exiting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If kill -TERM
still does not kill the process, we have to use kill -KILL
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah if this does not work here, user will use alluxio-fuse -f unmount which automatically kill with 9
i'm modifying the logic of a graceful alluxio-fuse unmount shutdown, in any case user can always kill with 9, but the graceful shutdown shouldn't send 15 in the first place bcos the jvm signal handler thingy overwriting fuse set signal logics, that's what this PR aims to resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
hey @huanghua78 i will merge this in ok? |
2fec0ec
to
b597c61
Compare
@huanghua78 any comments about this PR? |
Sorry, I will take another look today. |
dora/integration/fuse/src/main/java/alluxio/fuse/FuseSignalHandler.java
Outdated
Show resolved
Hide resolved
de00e7a
to
8d29fae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
hey @elega i made some changes to FuseHdfsIntegrationTest/AbstractFuseHdfsIntegrationTest to generate mount point to level of test methods instead of entire class. |
alluxio-bot, merge this please. |
merge failed: |
2. removing umountInternal as we shouldn't call umount/fusermount from within AlluxioFuse program 3. modify alluxio-fuse umount, it shouldn't send SIGTERM it should instead using umount or fusermount to specifically instruct libfuse to stop serving and exit, 4. FuseSignalHandler should just clean up alluxio resources and stop under the condition that libfuse won't act to umount/fusermount
This reverts commit 819468b.
…every UT 2) use 'umount' cmd to unmount
ef131a5
to
856c1a1
Compare
alluxio-bot, merge this please. |
### What changes are proposed in this pull request? clean up unmount of fuse ### Why are the changes needed? alluxio-fuse may cause AlluxioFuse continues to hang, and also the current flow to unmount is scattered and all over the place. ### Does this PR introduce any user facing changes? no pr-link: Alluxio#18241 change-id: cid-0246bd831466396697efe1977547d365ad63ba3d
What changes are proposed in this pull request?
clean up unmount of fuse
Why are the changes needed?
alluxio-fuse may cause AlluxioFuse continues to hang, and also the current flow to unmount is scattered and all over the place.
Does this PR introduce any user facing changes?
no