Replace pid with flock for runtime config loading #5435

ncopa · 2025-01-14T12:40:37Z

Use a lock file and flock(2) to ensure that only a single instance of k0s is running. This is more reliable than storing the PID in the runtime config.

This also solves false positives with k0s runtime config leftovers.

Fixes: #5399

twz123 · 2025-01-14T13:15:54Z

Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

ncopa · 2025-01-14T14:55:11Z

> Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

How do you mean?

github-actions · 2025-01-15T14:51:16Z

This pull request has merge conflicts that need to be resolved.

ncopa · 2025-01-16T14:35:53Z

> Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

That appears to be a fairly intrusive change. This is a cheap, non-intrusive way to fix the specific problem at hand, and it can easily be backported.

twz123 · 2025-01-16T16:43:31Z

internal/pkg/flock/flock_windows.go

+		return false, err
+	}
+
+	handle := windows.Handle(file.Fd())


Might be safer to go through the SyscallConn interface ...

conn, err := file.SyscallConn()
if err != nil {
	return false, err
}
err = conn.Control(func(fd uintptr) {
	handle := windows.Handle(fd)
	...
})
...

No, there is no reason to go through the SyscallConn interface. Let's keep it simple.

Are you sure that this doesn't interfere with Go's GC and finalization stuff in any case? Moreover, I don't think we need to turn the file descriptor into blocking mode.

twz123 · 2025-01-16T16:58:19Z

pkg/config/runtime_test.go

+	// create a temporary file for runtime config
+	tmpfile, err := os.CreateTemp("", "runtime-config")
+	require.NoError(t, err)
+	defer os.Remove(tmpfile.Name())
+


This is probably a remnant from a merge conflict resolution.

Suggested change

-	// create a temporary file for runtime config
-	tmpfile, err := os.CreateTemp("", "runtime-config")
-	require.NoError(t, err)
-	defer os.Remove(tmpfile.Name())

No. We now call dir.Init() early, to ensure that the directory exists and has the correct permissions before trying to create the lock file. If we don't create a subdirectory (e.g. /tmp/runtime-config) that the test process owns, it will try to change the permissions on the parent directory (e.g. /tmp), which will result in a permission error.

So we need to create a directory we have permission to chmod on for this test.

github-actions · 2025-01-16T17:20:22Z

This pull request has merge conflicts that need to be resolved.

Use lock file and flock(2) to ensure there is only a single instance of k0s running. This is more reliable than storing the pid in the runtime config.

This solves false positives with k0s runtime config leftovers.

Fixes: k0sproject#5399

Signed-off-by: Natanael Copa <[email protected]>

twz123 · 2025-01-19T12:17:53Z

pkg/config/runtime.go

@@ -128,7 +136,7 @@ func NewRuntimeConfig(k0sVars *CfgVars) (*RuntimeConfig, error) {
 		Spec: &RuntimeConfigSpec{
 			NodeConfig: nodeConfig,
 			K0sVars:    k0sVars,
-			Pid:        os.Getpid(),


We might need to keep the PID until the next k0s minor release for backwards compatibility. Old versions will still try to use it to check for other instances.

twz123 · 2025-01-19T12:25:16Z

pkg/config/runtime.go

+		return fmt.Errorf("failed to close the runtime config file: %w", err)
+	}
+
+	if err := os.Remove(r.lockFile.Name()); err != nil {


As we are relying on the lock file name to delete it, we should ensure that we're using an absolute path to open it. Might be worth an extra filepath.Abs along with an explanatory comment just before calling tryLock.

twz123 · 2025-01-19T13:19:08Z

pkg/config/flock_unix.go

+}
+
+// locked checks if the lock is currently held by another process.
+func locked(path string) bool {


This could be implemented in terms of tryLock, for both UNIX and Windows:

func locked(path string) (bool, error) {
	f, err := tryLock(path)
	if err != nil {
		if errors.Is(err, ErrK0sAlreadyRunning) {
			return true, nil
		}
		return false, err
	}
	return false, f.Close()
}

twz123 · 2025-01-19T13:28:21Z

pkg/config/runtime_test.go

 `)
 	require.NoError(t, os.WriteFile(rtConfigPath, content, 0644))

 	// try to load runtime config and check if it returns an error
 	spec, err := LoadRuntimeConfig(rtConfigPath)
 	assert.Nil(t, spec)
 	assert.ErrorIs(t, err, ErrK0sNotRunning)
+	t.Cleanup(func() { require.NoError(t, spec.Cleanup()) })


Wouldn't a call to spec.Cleanup only be necessary after a successful call to NewRuntimeConfig?

Moreover, since LoadRuntimeConfig returned an error, there shouldn't be the need to call Cleanup, especially as spec is nil. So if the goal is to test that Cleanup is a no-op when called on a nil receiver, I'd rather do this in a separate test case.

twz123 · 2025-01-19T13:32:53Z

pkg/config/runtime_test.go

@@ -86,4 +90,5 @@ func TestNewRuntimeConfig(t *testing.T) {
 	_, err = NewRuntimeConfig(k0sVars)
 	assert.Error(t, err)
 	assert.ErrorIs(t, err, ErrK0sAlreadyRunning)
+	t.Cleanup(func() { require.NoError(t, spec.Cleanup()) })


I'd move this to line 82, so that it's clear that it's cleaning up the NewRuntimeConfig call. Also, in test cleanup functions, there's no need to panic with require (IIRC it is even problematic), so better use assert instead of require.

twz123 · 2025-01-19T13:49:01Z

pkg/config/runtime_test.go

+	// create a temporary file for runtime config
+	tmpfile, err := os.CreateTemp("", "runtime-config")
+	require.NoError(t, err)
+	defer os.Remove(tmpfile.Name())
+


Hmm, are you talking about NewRuntimeConfig? This test is about LoadRuntimeConfig, which doesn't call dir.Init and passes just fine without the above block. Moreover, tmpfile is not used at all in the rest of the test.

ncopa requested review from a team as code owners January 14, 2025 12:40

ncopa requested review from kke and twz123 January 14, 2025 12:40

ncopa mentioned this pull request Jan 14, 2025: fix: prevent false positives in already running k0s instance detection #5400 (Closed)

ncopa force-pushed the lock-config branch from c992621 to 3ddb1ea January 14, 2025 12:51

ncopa force-pushed the lock-config branch from 3ddb1ea to 9f1369f January 14, 2025 13:30

github-actions bot added the merge-conflict label Jan 15, 2025

ncopa force-pushed the lock-config branch from 9f1369f to 7fdfc0a January 16, 2025 10:12

github-actions bot removed the merge-conflict label Jan 16, 2025

ncopa force-pushed the lock-config branch 2 times, most recently from b58069c to 92be067 January 16, 2025 13:56

twz123 reviewed Jan 16, 2025

github-actions bot added the merge-conflict label Jan 16, 2025

ncopa force-pushed the lock-config branch from 92be067 to 3fd5565 January 17, 2025 16:08

github-actions bot removed the merge-conflict label Jan 17, 2025

ncopa force-pushed the lock-config branch 3 times, most recently from 98d92f2 to 58b09ae January 17, 2025 17:07

ncopa force-pushed the lock-config branch from 58b09ae to 4b0f2f1 January 17, 2025 17:09

twz123 reviewed Jan 19, 2025
