Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncssh API change breaks parsing of config #54

Open
martijnwgnr opened this issue Dec 17, 2024 · 14 comments
Open

asyncssh API change breaks parsing of config #54

martijnwgnr opened this issue Dec 17, 2024 · 14 comments

Comments

@martijnwgnr
Copy link

martijnwgnr commented Dec 17, 2024

The v2.19.0 release of asyncssh has added support for hostname canonicalization. With this change, two new arguments were added to SSHClientConfig.load: canonical and final. We should also pass these parameters in sshfs, because right now the local_user and user are passed in their places, which breaks parsing the config in config.py:

sshfs/sshfs/config.py

Lines 23 to 31 in 34b52b2

return SSHClientConfig.load(
last_config,
config_files,
reload,
local_user,
user,
host,
port,
)

https://github.com/ronf/asyncssh/blob/95756fa4eb54826adb9156b87443d945fd1ed4f8/asyncssh/config.py#L428-L430

@shcheklein
Copy link
Collaborator

@ronf can we by chance make these new arguments into kwargs for example of find some other way to keep it backward compatible? I can bump the minimum required version, but if it's an easy change on your end would prefer to keep it less strict (it's important in some environments, e.g. it takes time for new packages to become available).

@ronf
Copy link

ronf commented Dec 18, 2024

SSHClientConfig is not a public API in AsyncSSH. It's used internally by AsyncSSH outside of the config module but it's not one of the symbols exposed in asyncssh/__init__.py. Importing from anything not present in the top-level asyncssh namespace could lead to the kind of issue you're seeing here.

If you want to preload some configuration options, the preferred approach is to use "options" rather than "config". If you do use config, the only supported mechanism is to pass in a list of path names of config files to load (which is handled as part of setting "options").

If I were to make these keyword args with default values, the new functionality of "match canonical" and "match final" would be broken as a result of not setting these values correctly. Calls to this "load" function really need to be coming from asyncssh/connection.py, which has enough information to set those new arguments properly.

Do you know why sshfs felt the need for this wrapper? How is it used?

@shcheklein
Copy link
Collaborator

thanks, @ronf

the preferred approach is to use "options" rather than "config".

can you point me to the class / code please?

Do you know why sshfs felt the need for this wrapper? How is it used?

I think it's an utility function that was needed for DVC (to preload some config indeed)

https://github.com/iterative/dvc-ssh/blob/main/dvc_ssh/__init__.py#L44

@ronf
Copy link

ronf commented Dec 19, 2024

From what I can see, that _prepare_credentials method is never called. It seems to be trying to pull some fields from its own keyword args but then pull other args from a user config file, but translating those into its own names for things rather than letting AsyncSSH read the user config directly. Unfortunately, using AsyncSSH's config parser outside of its options framework isn't supported, as the AsyncSSH options and config classes have a lot of tight interactions.

Generally speaking, the expectation is that you'll pass AsyncSSH a config="filename" argument on a top-level call like asyncssh.connect() if you want to read an OpenSSH config file, or it will default to reading from ~/.ssh/config if you don't specify something else or pass in None to disable that. Loading options in advance is possible but not recommended in the case of OpenSSH config files, as the values they set can change depending on things like the host, port, username, etc. in a request. So, the config file needs to be parsed each time to return the proper results.

I think it would be possible to take the existing keyword args in the DVC code and convert those to AsyncSSH "options" arguments (see the SSHClientConnectionOptions class), and allow one of those arguments for the user config to be passed in as a config argument (or just let AsyncSSH use the default) if you wanted to allow some values to come from an OpenSSH config file when they weren't already specified as options directly. You'd then pass the resulting SSHClientConnectionOptions instance as an options argument to a call like asynsssh.connect().

In OpenSSH 2.15.0 and later, there's a class method called construct() on SSHClientConnectionOptions which is the recommended way to construct the options object. This makes sure to do all the evaluation in an executor thread, to avoid blocking the asyncio event loop when evaluating some of the options.

You could also convert the keyword args to a new dictionary of AsyncSSH keyword args and just pass those in as keyword args to asyncssh.connect() directly, without constructing an SSHClientConnectionOptions object yourself. That's by far the simplest and cleanest of the approaches, I think. In some cases, some of the keywords might need to be renamed to match the AsyncSSH naming if you want to maintain the existing names DVC is using, but that should be straightforward.

@99-NinetyNine
Copy link

  1. "asyncssh<=2.18.0",
    was one solution as in pull: ssh remote fails, error in asyncssh 2.19.0, missing arguments iterative/dvc#10656
    this approach ensures backward compatibility.

  2. But for more patches or other reasons, if up to date async ssh is to be used, then I suggest changing the parse method directly. As I have done.

  3. Further, This code in https://github.com/fsspec/sshfs/blob/main/setup.cfg was updated 2 years ago. So I think eventually, any new pip install fails. But only previous installations keeps working since they had downloaded the earlier versions of async ssh.
    image

Conclusion:

  • So, checking version of async ssh in the code and calling method accordingly if we sepcify versions (<= or >=) manner.
  • Or, update config parse.py to a exact asyncssh version. (== in setup). I prefer this approach.

@99-NinetyNine
Copy link

I have provided the PR. Haha, test case is to be updated. The string is returned for config.get("ProxyCommand") but is compared against a list by using split(). After this I think it should work, :)

@99-NinetyNine
Copy link

image
that failed test is cause I think.. I changed test files :) lol. Anyways, it is clear now.
split() is causing issue.

@ronf
Copy link

ronf commented Jan 9, 2025

I strongly recommend against pinning a specific version of AsyncSSH here. That would mean that anyone using this package would not be able to pick up any security vulnerability fixes, or other bug fixes.

@shcheklein
Copy link
Collaborator

@ronf thanks for the detailed response.

I also don't see an immediate need (see some questions / caveats below) to DVC to try to build an effective config itself. The logic seems to go back many years ago to this ticket:

iterative/dvc#1613

and this PR:

iterative/dvc#1965

It was using back then paramiko to parse the config and do the connection. I wonder if paramiko didn't have a way to override values in connect / init - e.g. I want to pass my own (from DVC config) port number, or additional key file, etc. Thus we were parsing user ssh config and creating our own dict.

Now, we might not need all of this.


A few caveats that I can immediately see:

  1. We show user, hostname, port when we ask for a password / passphrase in prepare credentials. That won't be as precise at the very least (we can't show a proper hostname w/o parsing the config I guess). Can be annoying when you don't see proper, effective user / hostname when you asked for a password / passphrase.

  2. username falls back to the getpass.getuser() value in:

        login_info["username"] = (
            config.get("user")
            or config.get("username")
            or user_ssh_config.get("User")
            or getpass.getuser()
        )

will assyncssh do the same in connect?

  1. login_info["proxy_command"] = user_ssh_config.get("ProxyCommand") - just not sure why we have a separate line at all for this. I assume it should be also parsed automatically by connect? will it?

  2. Key files:

    if config.get("keyfile"):
           raw_keys.append(config.get("keyfile"))
    elif user_ssh_config.get("IdentityFile"):
           raw_keys.extend(user_ssh_config.get("IdentityFile"))

Can we pass an additional key file to asyncssh? Not override completely, but append to the ones in the SSH config?

@ronf if you have quick suggestions / thoughts on these items ^^ I would appreciate any help.

@ronf
Copy link

ronf commented Jan 22, 2025

Starting with the easiest, the username is gotten from one of the following (in order of precedence):

  • A 'username' keyword arg in the connect() call
  • A 'username' keyword arg in an SSHClientConnectionOptions
  • A 'User' config setting in an SSH config file
  • The local user, queried by getpass.getuser()

Really, there's a value local_username is always gotten from getpass.getuser() first and that is available as one of the fields you can match on in a config file, but this value will also become the 'User' value when parsing the config if it is not explicitly set. However, if a username is set directly in connect() or via options, any User value in config files is ignored.

As for ProxyCommand, it shouldn't need to be special-cased. It can come from keyword args, options, or config, with a similar precedence to the above, but with the default being unset. I'm guessing the "login_info" you show here is related to trying to convert from SSH config values to Paramiko values you mentioned above.

As for showing user/host/port, there are really two different sets of values here. There's the requested user, host, and port and there's the rewritten version of those after parsing config files, if you decide to support those. If your application passes values to connect() via keyword arguments or options, that will take precedence over any config files, which may or may not be what you want.

As for keys, you can have multiple IdentityFile lines in a config file that can be conditionally matched, building up a set of keys to load for all the config lines which match. If you specify keys via keyword arguments or options, though, you have to specify the complete list (doing any conditional logic yourself to build that list).

Note that you can load multiple config files. However, there isn't an easy way right now to state that you want to load the system default config if it exists and then some config file of your own (or vice-versa in the ordering). You'd have to check yourself to see if the system config file existed and then specify the complete list of files to load, being careful to pick the order, as most config settings will lock on the first config rule which matches, ignoring later lines that match. IdentityFile is an exception in that regard.

@shcheklein
Copy link
Collaborator

thanks @ronf !

If your application passes values to connect() via keyword arguments or options, that will take precedence over any config files, which may or may not be what you want.

in what case then the rewritten version of those after parsing config files kicks in? I thought that after we passed a host into connect() it would still normalizes it and potentially does some other magic via config. Is it the case?

If you specify keys via keyword arguments or options, though, you have to specify the complete list (doing any conditional logic yourself to build that list).

it means we can't replicate the existing logic if we don't want to use parse_config and rely on asyncssh, right?

Note that you can load multiple config files.

is it for example to override / add certain values? e.g. by creating a temporary file with an IdentityFile in it?

@ronf
Copy link

ronf commented Jan 23, 2025

in what case then the rewritten version of those after parsing config files kicks in? I thought that after we passed a host into connect() it would still normalizes it and potentially does some other magic via config. Is it the case?

If you want to see the final values of host, port, and username after all the configuration has been applied (keyword arguments, options, and the result of parsing user-specific or default config options), you can use conn.get_extra_info() with arguments of 'host', 'port', and 'username' respectively to pull out this information after the connect() call returns.

If you have your own SSHClient class with a callback for password_auth_requested() to prompt the user for a password, that call will pass in the username it is trying to authenticate as, and you can potentially get the host & port by calling conn.get_extra_info using the connection object passed to you in the connection_made() callback of your SSHClient. You could store host & port as members of your SSHClient in connection_made() and later use them in password_auth_requested().

Note that passing in a port number or username as explicit arguments in a connect() call will always use these values over anything set in a config file. The only thing that allows config to rewrite it is the host value you pass in, if there are Hostname rules in config that match. User and port can be set in the config as well, but only if you don't pass values for them in arguments to connect().

is it for example to override / add certain values? e.g. by creating a temporary file with an IdentityFile in it?

Yes - this should be possible. For things other than IdentityFile, you'd probably want to list your temporary file first and '~/.ssh/config' after that if you want your config to override the user's default config. For IdentityFile, it will append values set in either config file. This also applied to 'CertificateFile' and 'SendEnv', but that's not the case for other config settings. For those, the first thing that matches will be used and any later matching rules will be ignored.

Using your own config file like this should be a way to leverage the existing AsyncSSH logic and not have to replicate that yourself. However, if you just need to override a value and never want to use the version of that set in the user's config file, you can just pass it as a keyword argument to connect().

@shcheklein
Copy link
Collaborator

Note that passing in a port number or username as explicit arguments in a connect() call will always use these values over anything set in a config file. The only thing that allows config to rewrite it is the host value you pass in, if there are Hostname rules in config that match. User and port can be set in the config as well, but only if you don't pass values for them in arguments to connect().

yes, it seems to be fine and expected. If someone explicitly sets port in DVC config I think we should be using that exact value. And it is expected that hostname can be changed.

If you have your own SSHClient class with a callback for password_auth_requested() to prompt

it seems we are not using callbacks atm. We just ask explicitly passphrase / password before connect() and pass them directly.

I'll look into callbacks and if it's easy to migrate to it.

Yes - this should be possible.

yep, but sounds very complicated to me ...

if you just need to override a value and never want to use the version of that set in the user's config file,

the current logic is setup in a way that we are using all the available key files (?) ... so, it's better to keep it that way. Though I don't see an easy way to do this.

@ronf
Copy link

ronf commented Jan 24, 2025

it seems we are not using callbacks atm. We just ask explicitly passphrase / password before connect() and pass them directly.

If you don't mind displaying the hostname before any config file changes are applied to it, prompting for password first should work fine. It means you would be prompting even when a password might not be required, if some target hosts are using something else such as public key auth. If you provide callbacks via an SSHClient subclass, you can avoid prompting when authentication with public key succeeds, and you'd have access to both the original host passed in as well as the rewritten host after the default config is applied, though I actually think displaying just the original host might be better in this case. Often, the rewritten host ends up being an IP address or an internal hostname that the user might not recognize.

Other than host, port, username, and client keys, is there anything you'd want to be able to override from the defaults? You should be able to specify host, port, and username directly on the connect() call and get the behavior you want here, without involving config files or manual creation of SSHClientConnectionOptions objects.

As for client keys, you could write your own temporary config file where you add keys you'd like to use and also pick up keys explicitly specified in the user's default config file by adding an "include config" directive at the end of your temporary config to load the user's ~/.ssh/config. However, if either your config or the user's config specify keys, only those keys will be loaded. There's no easy way right now to get all the default key files in ~/.ssh (id_rsa, id_ecdsa, id_ed25519, etc.) and also add other keys to that. The assumption is that if you are specifying keys you don't want to also use the default keys, especially given that there's a VERY small number of attempts allowed before auth fails, and each key counts against that.

Nothing would stop you from keeping your own list of default key filenames and loading them explicitly, of course, but AsyncSSH will only do this for you when no client keys are specified in the keyword args, options, or config files, and you'd still have to deal with the limited number of auth attempts servers allow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants