Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up fuse unmount workflow #18241

Merged
merged 17 commits into from
Nov 14, 2023
Merged

Conversation

lucyge2022
Copy link
Contributor

What changes are proposed in this pull request?

clean up unmount of fuse

Why are the changes needed?

alluxio-fuse may cause AlluxioFuse continues to hang, and also the current flow to unmount is scattered and all over the place.

Does this PR introduce any user facing changes?

no

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word of title ("Fuse") is not an imperative verb. Please use one of the valid words
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@lucyge2022 lucyge2022 changed the title Fuse unmount cleanup Clean up fuse unmount workflow Oct 5, 2023
@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

Copy link
Contributor

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java code looks good to me. Could you ping @ssz1997 to confirm these changes won't break the features on k8s side?

@@ -105,11 +105,10 @@ protected void startCommon(AlluxioConfiguration conf,
try (FileSystem fs = createBaseFileSystem(fsContext, fuseOptions)) {
AlluxioJniFuseFileSystem fuseFileSystem = createFuseFileSystem(fsContext, fs, fuseOptions);
setupFuseFileSystem(fuseFileSystem);
launchFuse(fuseFileSystem, fsContext, fuseOptions, true); // This will block until umount
// This will block on serving until fuse gets destroyed
launchFuse(fuseFileSystem, fsContext, fuseOptions, true);
} catch (Throwable t) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope of this pr, but we should not swallow fatal error (such as jvm oom error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is actually catching throwable at starting fuse with fuse_main_real, jvm oom throws from some other threads, but will get the exit(-1) back

@@ -176,16 +179,23 @@ private void umountInternal() {
}
} else {
try {
exitCode = new ProcessBuilder("fusermount", "-u", "-z", mountPath).start().waitFor();
Pair<Integer, String> output = Environment.runCommandAndCapture(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not a big fan to use Pair, it might be better to declare a class for the result explicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this entire func is not used any more, these are some codes for debugging purposes. let me remove these changes.

}
}
}
return Pair.of(exitCode, bout.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to add a class for the result of commandAndCapture, I have seen quite a few places using a pair for this one. your call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do

Copy link
Contributor

@ssz1997 ssz1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No break change on k8s

Copy link
Contributor

@LuQQiu LuQQiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for improving the logics to gracefully shutdown fuse!

Deprecating this as we shouldn't call the umount/fusermount ourselves, it should come from
user and stop the fuse_main_real from serving and then we exit from main thread
*/
@Deprecated
private void umountInternal() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any place that we are still using this method or it's fully deprecated from this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully deprecated, shall i just get rid of it, no cmd of umount/fusermount should be run by us in java IMO

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! I added some comments on the java part. PTAL and hope that helps.

I don't know much about the C part of fuse so I'd leave that part of review to our experts :P

@@ -264,8 +262,12 @@ public static AlluxioJniFuseFileSystem launchFuse(AlluxioJniFuseFileSystem fuseF

String mountPoint = conf.getString(PropertyKey.FUSE_MOUNT_POINT);
final boolean debugEnabled = conf.getBoolean(PropertyKey.FUSE_DEBUG_ENABLED);
LOG.info("Installing signal handler for SIGTERM and SIGINT to handle "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wanna mention HUP as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, good catch : )

* any existing signal handlers installed up front. Since jvm always install
* signal handlers therefore libfuse is never able to do so.
* But disabling JVM's signal handlers by flag -Xrs, fuse acts differently
* on receiving these signals on different platform(MacOs can receive/Linux can't)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
* on receiving these signals on different platform(MacOs can receive/Linux can't)
* on receiving these signals on different platform(MacOS can receive/Linux can't)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -41,12 +51,12 @@ public void handle(Signal signal) {
LOG.info("Receive signal name {}, number {}, system exiting",
signal.getName(), signal.getNumber());
int number = signal.getNumber();
if (number == 15) {
// SIGTERM - 15 SIGINT - 2 SIGHUP - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SIGTERM - 15 SIGINT - 2 SIGHUP - 1
// SIGTERM - 15
// SIGINT - 2
// SIGHUP - 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should catch SIGHUP and then ignore it here (a.k.a empty handler)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i merely follow what fuse_set_signal_handlers does, fuse acts to sighup and stop serving as well, any reason to ignore SIGHUP?

@@ -124,4 +132,45 @@ public static String getJniLibraryExtension() {
}
return isMac() ? ".jnilib" : ".so";
}


public static Pair<Integer, String> runCommandAndCapture(String... command)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a ShellCommand class long ago to do the same thing and redirect out/err, maybe use that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see i will use that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I can't without adding dependency on alluxio-core-common, i will just remove this as im anyway deprecating the function

err "Failed to umount fuse mount point ${mount_point}"
err "Failed to umount fuse mount point ${mount_point}, try sending SIGTERM..."
kill "${fuse_pid}"
wait_for_fuse_process_killed "${fuse_pid}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echo the return code from wait_for_fuse_process_killed before exiting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If kill -TERM still does not kill the process, we have to use kill -KILL here.

Copy link
Contributor Author

@lucyge2022 lucyge2022 Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah if this does not work here, user will use alluxio-fuse -f unmount which automatically kill with 9
i'm modifying the logic of a graceful alluxio-fuse unmount shutdown, in any case user can always kill with 9, but the graceful shutdown shouldn't send 15 in the first place bcos the jvm signal handler thingy overwriting fuse set signal logics, that's what this PR aims to resolve

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@lucyge2022
Copy link
Contributor Author

hey @huanghua78 i will merge this in ok?

@Xenorith
Copy link
Contributor

@huanghua78 any comments about this PR?

@huanghua78
Copy link
Contributor

@huanghua78 any comments about this PR?

Sorry, I will take another look today.

@lucyge2022 lucyge2022 force-pushed the fuse-unmount-cleanup branch from de00e7a to 8d29fae Compare November 9, 2023 19:52
Copy link
Contributor

@huanghua78 huanghua78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lucyge2022
Copy link
Contributor Author

hey @elega i made some changes to FuseHdfsIntegrationTest/AbstractFuseHdfsIntegrationTest to generate mount point to level of test methods instead of entire class.

@lucyge2022 lucyge2022 added the type-ease-of-use This issue is about of how to improve the ease of use of Alluxio product label Nov 13, 2023
@lucyge2022
Copy link
Contributor Author

alluxio-bot, merge this please.

@alluxio-bot
Copy link
Contributor

merge failed:
Merge refused because check evaluation fail Check PR-EMAIL in PR 18241 for ref ef131a5 has state failure which is not success because Email not linked to a valid Github account

2. removing umountInternal as we shouldn't call umount/fusermount from within
AlluxioFuse program
3. modify alluxio-fuse umount, it shouldn't send SIGTERM it should instead using
umount or fusermount to specifically instruct libfuse to stop serving and exit,
4. FuseSignalHandler should just clean up alluxio resources and stop under the
condition that libfuse won't act to umount/fusermount
@lucyge2022
Copy link
Contributor Author

alluxio-bot, merge this please.

@alluxio-bot alluxio-bot merged commit feb0116 into Alluxio:main Nov 14, 2023
12 checks passed
ssz1997 pushed a commit to ssz1997/alluxio that referenced this pull request Dec 15, 2023
### What changes are proposed in this pull request?

clean up unmount of fuse

### Why are the changes needed?

alluxio-fuse  may cause AlluxioFuse continues to hang, and also the current flow to unmount is scattered and all over the place.

### Does this PR introduce any user facing changes?

no

			pr-link: Alluxio#18241
			change-id: cid-0246bd831466396697efe1977547d365ad63ba3d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-ease-of-use This issue is about of how to improve the ease of use of Alluxio product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants