Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pid with flock for runtime config loading #5435

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ncopa
Copy link
Collaborator

@ncopa ncopa commented Jan 14, 2025

Use lock file and flock(2) to ensure there is only a single instance of k0s running. This is more reliable than storing the pid in the runtime config.

This also solves false positives with k0s runtime config leftovers.

Fixes: #5399

Description

Fixes #5399

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@twz123
Copy link
Member

twz123 commented Jan 14, 2025

Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

@ncopa
Copy link
Collaborator Author

ncopa commented Jan 14, 2025

Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

how do you mean?

Copy link
Contributor

This pull request has merge conflicts that need to be resolved.

@ncopa ncopa force-pushed the lock-config branch 2 times, most recently from b58069c to 92be067 Compare January 16, 2025 13:56
@ncopa
Copy link
Collaborator Author

ncopa commented Jan 16, 2025

Shouldn't we leave it as is and eventually remove the whole runtime config file instead?

That appears to be a fairly intrusive change. This is a cheap and non-intrusive way to fix the specific problem at hand that easily can be backported.

return false, err
}

handle := windows.Handle(file.Fd())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be safer to go through the SyscallConn interface ...

conn, err := file.SyscallConn()
if err != nil {
	return false, err
}
err = conn.Control(func(fd uintptr) {
	handle := windows.Handle(fd)
	...
})
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no reason to go through the SyscallConn interface. Lets keep it simple.

Copy link
Member

@twz123 twz123 Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that this doesn't interfere with Go's GC and finalization stuff in any case? Moreover, I don't think we need to turn the file descriptor into blocking mode.

internal/pkg/flock/flock_windows.go Outdated Show resolved Hide resolved
pkg/config/runtime.go Show resolved Hide resolved
pkg/config/runtime.go Outdated Show resolved Hide resolved
pkg/config/runtime.go Outdated Show resolved Hide resolved
Comment on lines +31 to +35
// create a temporary file for runtime config
tmpfile, err := os.CreateTemp("", "runtime-config")
require.NoError(t, err)
defer os.Remove(tmpfile.Name())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a remnant from a merge conflict resolution.

Suggested change
// create a temporary file for runtime config
tmpfile, err := os.CreateTemp("", "runtime-config")
require.NoError(t, err)
defer os.Remove(tmpfile.Name())

Copy link
Collaborator Author

@ncopa ncopa Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We now call dir.Init() early, to ensure that the directory exists and have the correct permissions before trying to create the lock file. If we don't create a subdirectory (eg /tmp/runtime-config) that the test process owns, it will try to change the permissions on the parent directory (eg /tmp) which will result in a permission error.

So we need to create a directory we have permission to chmod on for this test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you talking about NewRuntimeConfig? This test is about LoadRuntimeConfig, which doesn't call dir.Init and passes just fine without the above block. Moreover, tmpfile is not used at all in the rest of the test.

pkg/config/runtime_test.go Outdated Show resolved Hide resolved
pkg/config/runtime_test.go Outdated Show resolved Hide resolved
internal/pkg/flock/flock_unix.go Outdated Show resolved Hide resolved
Copy link
Contributor

This pull request has merge conflicts that need to be resolved.

@ncopa ncopa force-pushed the lock-config branch 3 times, most recently from 98d92f2 to 58b09ae Compare January 17, 2025 17:07
Use lock file and flock(2) to ensure there is only a single instance of
k0s running. This is more reliable than storing the pid in the runtime
config.

This solves false positives with k0s runtime config leftovers.

Fixes: k0sproject#5399
Signed-off-by: Natanael Copa <[email protected]>
@@ -128,7 +136,7 @@ func NewRuntimeConfig(k0sVars *CfgVars) (*RuntimeConfig, error) {
Spec: &RuntimeConfigSpec{
NodeConfig: nodeConfig,
K0sVars: k0sVars,
Pid: os.Getpid(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to keep the PID until the next k0s minor release for backwards compatibility. Old versions will still try to use it to check for other instances.

return fmt.Errorf("failed to close the runtime config file: %w", err)
}

if err := os.Remove(r.lockFile.Name()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are relying on the lock file name to delete it, we should ensure that we're using an absolute path to open it. Might be worth an extra filepath.Abs along with an explanatory comment just before calling tryLock.

}

// locked checks if the lock is currently held by another process.
func locked(path string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be implemented in terms of tryLock, for both UNIX and Windows:

func locked(path string) (bool, error) {
	f, err := tryLock(path)
	if err != nil {
		if errors.Is(err, ErrK0sAlreadyRunning) {
			return true, nil
		}
		return false, err
	}
	return false, f.Close()
}

`)
require.NoError(t, os.WriteFile(rtConfigPath, content, 0644))

// try to load runtime config and check if it returns an error
spec, err := LoadRuntimeConfig(rtConfigPath)
assert.Nil(t, spec)
assert.ErrorIs(t, err, ErrK0sNotRunning)
t.Cleanup(func() { require.NoError(t, spec.Cleanup()) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a call to spec.Cleanup only be necessary after a successful call to NewRuntimeConfig?

Moreover, since LoadRuntimeConfig returned an error, there shouldn't be the need to call Cleanup, especially as spec is nil. So if the goal is to test that Cleanup is a no-op when called on a nil receiver, I'd rather do this in a separate test case.

@@ -86,4 +90,5 @@ func TestNewRuntimeConfig(t *testing.T) {
_, err = NewRuntimeConfig(k0sVars)
assert.Error(t, err)
assert.ErrorIs(t, err, ErrK0sAlreadyRunning)
t.Cleanup(func() { require.NoError(t, spec.Cleanup()) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this to line 82, so that it's clear that it's cleaning up the NewRuntimeConfig call. Also, in test cleanup functions, there's no need to panic with require (IIRC it is even problematic), so better use assert instead of require.

Comment on lines +31 to +35
// create a temporary file for runtime config
tmpfile, err := os.CreateTemp("", "runtime-config")
require.NoError(t, err)
defer os.Remove(tmpfile.Name())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you talking about NewRuntimeConfig? This test is about LoadRuntimeConfig, which doesn't call dir.Init and passes just fine without the above block. Moreover, tmpfile is not used at all in the rest of the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

k0s error "an instance of k0s is already running" but it is not running
2 participants