Skip to content

Commit

Permalink
Default to checking Azure CLI authentication (#292)
Browse files Browse the repository at this point in the history
* Default to checking Azure CLI authentication

* add note
  • Loading branch information
kylebarron authored Feb 28, 2025
1 parent aaa8d6a commit e91336a
Show file tree
Hide file tree
Showing 6 changed files with 56 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docs/authentication.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ Note that many authentication variants are already supported natively.
- Workload identity OAuth2, using a `client_id`, `tenant_id`, and `federated_token_file` passed in by the user
- OAuth2, using a `client_id`, `client_secret`, and `tenant_id` passed in by the user
- A SAS key passed in by the user.
- Azure CLI
- Azure CLI. (If you want to ensure the IMDS authentication is used below, pass [`use_azure_cli=False`][obstore.store.AzureConfigInput.use_azure_cli] to `AzureStore`.)
- IMDS Managed Identity Provider.

(A transcription of [this underlying code](https://github.com/apache/arrow-rs/blob/a00f9f43a0530b9255e4f9940e43121deedb0cc7/object_store/src/azure/builder.rs#L942-L1019)).
Expand Down
15 changes: 15 additions & 0 deletions docs/dev/overridden-defaults.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Overridden Defaults

In general, we wish to follow the upstream `object_store` as closely as possible, which should reduce the maintenance overhead here.

However, there are occasionally places where we want to diverge from the upstream decision making, and we document those here.

## Azure CLI

We always check for Azure CLI authentication as a fallback.

If we stuck with the upstream `object_store` default, you would need to pass `use_azure_cli=True` to check for Azure CLI credentials.

The Azure CLI is the [second-to-last Azure authentication method checked](https://github.com/apache/arrow-rs/blob/9c92a50b6d190ca9d0c74c3ccc69e348393d9246/object_store/src/azure/builder.rs#L1015-L1016) checked. So this only changes the default behavior for people relying on instance authentication. For those people, they can still pass `use_azure_cli=False`.

See upstream discussion [here](https://github.com/apache/arrow-rs/issues/7204).
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ nav:
- Developer Docs:
- Contributing: dev/DEVELOP.md
- Functional API: dev/functional-api.md
- Overridden Defaults: dev/overridden-defaults.md
- dev/pickle.md
- CHANGELOG.md

Expand Down
20 changes: 16 additions & 4 deletions obstore/python/obstore/store/_azure.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,10 @@ class AzureConfigInput(TypedDict, total=False):
azure_tenant_id: str
"""Tenant id used in oauth flows"""
azure_use_azure_cli: bool
"""Use azure cli for acquiring access token"""
"""Use azure cli for acquiring access token.
Defaults to `True`.
"""
azure_use_fabric_endpoint: bool
"""Use object store with url scheme account.dfs.fabric.microsoft.com"""
bearer_token: str
Expand Down Expand Up @@ -186,7 +189,10 @@ class AzureConfigInput(TypedDict, total=False):
token: str
"""Bearer token"""
use_azure_cli: bool
"""Use azure cli for acquiring access token"""
"""Use azure cli for acquiring access token.
Defaults to `True`.
"""
use_emulator: bool
"""Use object store with azurite storage emulator"""
use_fabric_endpoint: bool
Expand Down Expand Up @@ -262,7 +268,10 @@ class AzureConfigInput(TypedDict, total=False):
AZURE_TENANT_ID: str
"""Tenant id used in oauth flows"""
AZURE_USE_AZURE_CLI: bool
"""Use azure cli for acquiring access token"""
"""Use azure cli for acquiring access token.
Defaults to `True`.
"""
AZURE_USE_FABRIC_ENDPOINT: bool
"""Use object store with url scheme account.dfs.fabric.microsoft.com"""
BEARER_TOKEN: str
Expand Down Expand Up @@ -310,7 +319,10 @@ class AzureConfigInput(TypedDict, total=False):
TOKEN: str
"""Bearer token"""
USE_AZURE_CLI: bool
"""Use azure cli for acquiring access token"""
"""Use azure cli for acquiring access token.
Defaults to `True`.
"""
USE_EMULATOR: bool
"""Use object store with azurite storage emulator"""
USE_FABRIC_ENDPOINT: bool
Expand Down
9 changes: 9 additions & 0 deletions pyo3-object_store/src/azure/store.rs
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ impl PyAzureStore {
kwargs: Option<PyAzureConfig>,
) -> PyObjectStoreResult<Self> {
let mut builder = MicrosoftAzureBuilder::from_env();
builder = PyAzureConfig::OVERRIDDEN_DEFAULTS().apply_config(builder);
let mut config = config.unwrap_or_default();
if let Some(container) = container.clone() {
// Note: we apply the bucket to the config, not directly to the builder, so they stay
Expand Down Expand Up @@ -255,6 +256,14 @@ impl PyAzureConfig {
Self(HashMap::new())
}

/// Default values that we opt into that differ from the upstream object_store defaults
#[allow(non_snake_case)]
fn OVERRIDDEN_DEFAULTS() -> Self {
let mut map = HashMap::with_capacity(1);
map.insert(AzureConfigKey::UseAzureCli.into(), true.into());
Self(map)
}

fn apply_config(self, mut builder: MicrosoftAzureBuilder) -> MicrosoftAzureBuilder {
for (key, value) in self.0.into_iter() {
builder = builder.with_config(key.0, value.0);
Expand Down
16 changes: 14 additions & 2 deletions pyo3-object_store/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,9 @@ impl AsRef<str> for PyConfigValue {
impl<'py> FromPyObject<'py> for PyConfigValue {
fn extract_bound(ob: &Bound<'py, PyAny>) -> PyResult<Self> {
if let Ok(val) = ob.extract::<bool>() {
Ok(Self(val.to_string()))
Ok(val.into())
} else if let Ok(duration) = ob.extract::<Duration>() {
Ok(Self(format_duration(duration).to_string()))
Ok(duration.into())
} else {
Ok(Self(ob.extract()?))
}
Expand All @@ -42,3 +42,15 @@ impl From<PyConfigValue> for String {
value.0
}
}

impl From<bool> for PyConfigValue {
fn from(value: bool) -> Self {
Self(value.to_string())
}
}

impl From<Duration> for PyConfigValue {
fn from(value: Duration) -> Self {
Self(format_duration(value).to_string())
}
}

0 comments on commit e91336a

Please sign in to comment.