From 40725800218641780c9134fe20495c171f31809e Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Wed, 13 Nov 2024 22:19:11 +0200 Subject: [PATCH 01/15] Add device selection explainer Signed-off-by: Zoltan Kis --- explainer-device-selection.md | 110 ++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 explainer-device-selection.md diff --git a/explainer-device-selection.md b/explainer-device-selection.md new file mode 100644 index 00000000..4c5a64a9 --- /dev/null +++ b/explainer-device-selection.md @@ -0,0 +1,110 @@ +# Device Selection Explainer + +## Introduction + +This explainer summarizes the discussion and background on [Web NN device selection](https://webmachinelearning.github.io/webnn/#programming-model-device-selection). + +The goal is to help making design decisions on whether and how to handle compute device selection for a Web NN [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). + +A context represents the global state of Web NN model graph execution, including the compute devices (e.g. CPU, GPU, NPU) the [Web NN graph](https://webmachinelearning.github.io/webnn/#mlgraph) is executed on. + +When creating a context, an application may want to provide hints to the implementation on what device(s) are preferred for execution. + +Implementations, browsers and underlying OS may want to control the allocation of compute devices for various use cases and system conditions. + +The question is in what use cases who and how much should control the execution context. + +Currently this is captured by [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions), such as [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) and [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference). + +## History + +Previous discussion covered the following main topics (see References): +- who control of context: script vs. 
user agent (OS); +- CPU vs GPU device selection, including handling multiple GPUs; +- how to handle NPU execution, quantization/dequantization. + +In [[Simplify MLContext creation #322]](https://github.com/webmachinelearning/webnn/pull/322), the proposal was to always use an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object to initialize a context and remove the `"gpu"` [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). +Also, remove the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), since it was used for the GPU option which became explicit. +Explicit GPU selection also provides clarity when there are multiple GPU devices, as implementations don't need to rely on hints such as power preference to deduce which GPU device to select. +A counter-argument was that it becomes more complex to use an implementation selected default GPU, and there was value in that simplicity. + +In [[API simplification: context types, context options #302]](https://github.com/webmachinelearning/webnn/issues/302), the [proposal](https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1960407195) was to make delegating device selection to the implementation should be the default behaviour and remove [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype). +However, keep the hints/options mechanism, with an improved mapping to use cases. +For instance, device selection is not about mandating where to execute, but e.g. tell what to avoid if possible (e.g. don't use the GPU). + +In [[WebNN should support NPU and QDQ operations #623]](https://github.com/webmachinelearning/webnn/issues/623), an explicit request to support NPU device selection was discussed, along with quantization use cases. 
Several [options](https://github.com/webmachinelearning/webnn/issues/623#issuecomment-2063954107) were proposed, and the simplest one was chosen, i.e. extending the [device type enum](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) with the `"npu"` value and update the relevant algorithms, as added in [PR #696](https://github.com/webmachinelearning/webnn/pull/696). However, alternative policies for error handling and fallback scenarios remained open questions. Later the need for explicit device selection support was challenged in [[MLContextOptions.deviceType seems unnecessary outside of conformance testing #749]](https://github.com/webmachinelearning/webnn/issues/749), with the main arguments also summarized in a W3C TPAC group meeting [presentation](https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0006/MLDeviceType.pdf). The main points were the following: - The [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) option is hard to standardize because of the heterogeneity of the compute units across various platforms, and even across their versions, for instance `"npu"` might not be available as a standalone option, only as a combined form of `"npu"` and `"cpu"`. - As for error management vs. fallback policies: fallback is preferable to failing, and implementations/the underlying platforms should determine the fallback type based on runtime information. - Implementation / browser / OS have a better grasp of the system/compute/runtime/apps state than websites, therefore control should be relinquished to them. For instance, if rendering performance degrades, the implementation/underlying platform can possibly fix it the best way, not the web app. ## Key use cases and requirements Design decisions should take the following into account: 1. Allow the underlying platform to ultimately choose the compute device. 2. 
Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance, low latency, or stable sustained performance etc. + +3. Allow an easy way to create a context with a GPU device, i.e. without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice). + +4. Allow selection from available GPU devices, for instance by allowing specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) obtained from available devices using the [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms. + +5. Allow selection from available various AI accelerators, including NPUs or a combination of accelerators. This may happen using a (to be specified) algorithmic mapping from context options. Or, allow web apps to hint a preferred fallback order for the given context, for instance `["npu", "cpu"]`, meaning that implementations should try executing the graph on NPU as much as possible and try to avoid GPU. Basically `"cpu"` could even be omitted, as it could be the default fallback device, therefore specifying `"npu"` alone would mean the same. However, this can become complex with all possible device variations, so we must specify and standardize the supported fallback orders. + +6. Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context, so that web apps could select the best device which would work with the intended model. + +7. As a corollary to 6, allow creating a context using also options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). + +## Considered alternatives + +1. 
Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for mapping these names to various real adapters (with their given characteristics). However, this would be more limited than being able to specify device-specific limits to context creation. 2. Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). ## Scenarios, examples, design discussion Examples for user scenarios: ```js // simple context creation with implementation defaults context = await navigator.ml.createContext(); // create a context that will likely map to NPU context = await navigator.ml.createContext({powerPreference: 'low-power'}); // create a context that will likely map to GPU context = await navigator.ml.createContext({powerPreference: 'high-performance'}); // enumerate devices and limits (as allowed by policy/implementation) // and select one of them to create a context const deviceLimitsMap = await navigator.ml.opSupportLimitsPerDevice(); // analyze the map and select an op support limit set // ... const context = await navigator.ml.createContext({ limits: deviceLimitsMap['npu1'] }); // as an alternative, hint a preferred fallback order ["npu", "cpu"] // i.e. try executing the graph on NPU and avoid GPU as much as possible // but do what fits best with the rest of the context options const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] }); ``` ## Open questions [WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between adapter and device? Should we expose a similar adapter API for NPUs? 
Or could NPUs be represented as [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter) (basically a few text attributes)? + +How should we extend the context options? +What exactly is best to pass as context options? Op support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others? + +Update the security and privacy section. Would the proposals here increase the fingerprinting vector? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. From 6f73ebb38a4aa4670805cdc7e88eeb6223b387fe Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Thu, 14 Nov 2024 11:50:09 +0200 Subject: [PATCH 02/15] Add more considerations to the device selection explainer Signed-off-by: Zoltan Kis --- explainer-device-selection.md | 37 +++++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/explainer-device-selection.md b/explainer-device-selection.md index 4c5a64a9..4508f113 100644 --- a/explainer-device-selection.md +++ b/explainer-device-selection.md @@ -74,7 +74,7 @@ Examples for user scenarios: // simple context creation with implementation defaults context = await navigator.ml.createContext(); -// create a context that will likely map to NPU +// create a context that will likely map to NPU, or NPU+CPU context = await navigator.ml.createContext({powerPreference: 'low-power'}); @@ -107,4 +107,37 @@ Should we expose a similar adapter API for NPUs? Or could NPUs be represented as How should we extend the context options? What exactly is best to pass as context options? Op support limits? 
Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others? -Update the security and privacy section. Would the proposals here increase the fingerprinting vector? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. +Update the security and privacy section. Would the proposals here increase the fingerprinting surface? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. + +## Background thoughts + +### Representing NPUs + +Earlier there have been ideas to represent NPUs in a similar way as WebGPU [adapters](https://gpuweb.github.io/gpuweb/#gpuadapter), basically exposing basic string information, features, limits, and whether they can be used as a fallback device. + +However, this would likely be premature standardization, as NPUs are very heterogeneous in their implementations, for instance memory and processing unit architecture can be significantly different. Also, they can be either standalone devices (e.g. TPUs), or integrated as SoC modules, together with CPUs, and even GPUs. + +There is a fundamental difference vs. programming GPUs. From programming point of view, NPUs are very specific and need specialized drivers, which integrate into libraries and frameworks. 
Therefore they don't need explicitly exposed abstractions like in [WebGPU](https://gpuweb.github.io/gpuweb/), but they might have specific quantization requirements and limitations. + +The main use cases for NPUs is to offload more general purpose computing devices (CPU and even GPU) from machine learning compute loads. Power efficient performance is the main characteristic. + +Therefore, use cases that include NPUs could be euphemistically represented by the `"low-power"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), which could mean the following (depending on the underlying platform): +- pure NPU execution, +- NPU preferred, fallback to CPU, +- combined [multiple] NPU and CPU execution controlled by the underlying platform. + +### Selecting from multiple [types] of NPUs + +The proposal above uses [Web GPU](https://gpuweb.github.io/gpuweb) mechanisms to select a GPU device for a context. This covers support for multiple GPUs, even with different type and capabilities. + +We lack such mechanisms to select NPUs. Earlier there have been ideas to use a similar approach as Web GPU. + +However, enumerating and managing adapters are not very webby designs. For instance, to avoid complexity and to minimize fingerprinting surfaces, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsources selecting the target device to the user agent, so that the web app can achieve the use case without being exposed with platform specific details. + +In Web NN case, we cannot use such mechanisms, because the API is used by frameworks, not by web pages. + +Currently the handling of multiple NPUs (e.g. single model on multiple NPUs, or multiple models on multiple NPUs) is delegated to the implementation and underlying platform. + +### Hybrid execution scenarios using NPU, CPU and GPU + +Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. 
NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in Web NN. From de5e4f47c54b653a0180f5bf56661cbcc5469e2f Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Thu, 14 Nov 2024 22:28:01 +0200 Subject: [PATCH 03/15] Add one more alternative, fixes and clarifications to the device selection explainer Signed-off-by: Zoltan Kis --- explainer-device-selection.md | 50 +++++++++++++++++------------------ 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/explainer-device-selection.md b/explainer-device-selection.md index 4508f113..074117ba 100644 --- a/explainer-device-selection.md +++ b/explainer-device-selection.md @@ -4,7 +4,7 @@ This explainer summarizes the discussion and background on [Web NN device selection](https://webmachinelearning.github.io/webnn/#programming-model-device-selection). -The goal is to help making design decisions on whether and how to handle compute device selection for a Web NN [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). +The goal is to help making design decisions on how to handle compute device selection for a Web NN [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). A context represents the global state of Web NN model graph execution, including the compute devices (e.g. CPU, GPU, NPU) the [Web NN graph](https://webmachinelearning.github.io/webnn/#mlgraph) is executed on. @@ -18,26 +18,24 @@ Currently this is captured by [context options](https://webmachinelearning.githu ## History -Previous discussion covered the following main topics (see References): -- who control of context: script vs. user agent (OS); +Previous discussion covered the following main topics: +- who controls the execution context: script vs. user agent (OS); - CPU vs GPU device selection, including handling multiple GPUs; -- how to handle NPU execution, quantization/dequantization. +- how to handle NPU devices, quantization/dequantization. 
In [[Simplify MLContext creation #322]](https://github.com/webmachinelearning/webnn/pull/322), the proposal was to always use an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object to initialize a context and remove the `"gpu"` [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). -Also, remove the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), since it was used for the GPU option which became explicit. -Explicit GPU selection also provides clarity when there are multiple GPU devices, as implementations don't need to rely on hints such as power preference to deduce which GPU device to select. -A counter-argument was that it becomes more complex to use an implementation selected default GPU, and there was value in that simplicity. +Also, remove the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), since it was used for the GPU option, which now becomes explicit. +Explicit GPU selection also provides clarity when there are multiple GPU devices, as implementations need to use [WebGPU](https://gpuweb.github.io/gpuweb/) in order to select a [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter), from where they can request a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object. +A counter-argument was that it becomes more complex to use an implementation selected default GPU, as there is no simple way any more to tell implementations to use any GPU device for creating an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). This concern could eventually be alleviated by keeping the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference). 
-In [[API simplification: context types, context options #302]](https://github.com/webmachinelearning/webnn/issues/302), the [proposal](https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1960407195) was to make delegating device selection to the implementation should be the default behaviour and remove [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype). +In [[API simplification: context types, context options #302]](https://github.com/webmachinelearning/webnn/issues/302), the [proposal](https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1960407195) was that the default behaviour should be to delegate device selection to the implementation, and remove [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype). However, keep the hints/options mechanism, with an improved mapping to use cases. For instance, device selection is not about mandating where to execute, but e.g. tell what to avoid if possible (e.g. don't use the GPU). In [[WebNN should support NPU and QDQ operations #623]](https://github.com/webmachinelearning/webnn/issues/623), an explicit request to support NPU device selection was discussed, along with quantization use cases. Several [options](https://github.com/webmachinelearning/webnn/issues/623#issuecomment-2063954107) were proposed, and the simplest one was chosen, i.e. extending the [device type enum](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) with the `"npu"` value and update the relevant algorithms, as added in [PR #696](https://github.com/webmachinelearning/webnn/pull/696). However, alternative policies for error handling and fallback scenarios remained open questions. 
-Later the need for explicit device selection support was challenged in [[MLContextOptions.deviceType seems unnecessary outside of conformance testing #749]](https://github.com/webmachinelearning/webnn/issues/749), with the main arguments also summarized in a W3C TPAC group meeting [presentation](https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0006/MLDeviceType.pdf). The main points were the following: - The [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) option is hard to standardize because of the heterogeneity of the compute units across various platforms, and even across their versions, for instance `"npu"` might not be available as a standalone option, only as a combined form of `"npu"` and `"cpu"`. - As for error management vs. fallback policies: fallback is preferable to failing, and implementations/the underlying platforms should determine the fallback type based on runtime information. - Implementation / browser / OS have a better grasp of the system/compute/runtime/apps state than websites, therefore control should be relinquished to them. For instance, if rendering performance degrades, the implementation/underlying platform can possibly fix it the best way, not the web app. @@ -48,15 +46,15 @@ Design decisions should take the following into account: 1. Allow the underlying platform to ultimately choose the compute device. -2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance, low latency, or stable sustained performance etc. 
+2. Allow scripts to express hints/options when creating contexts, such as a preference for low power consumption, high performance, low latency, or stable sustained performance. 3. Allow an easy way to create a context with a GPU device, i.e. without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice). -4. Allow selection from available GPU devices, for instance by allowing specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) obtained from available devices using the [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms. +4. Allow selection from available GPU devices, for instance by allowing an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) to be specified, obtained from available [GPUAdapters](https://gpuweb.github.io/gpuweb/#gpuadapter) using the [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms via [GPURequestAdapterOptions](https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions), such as feature level or power preference. 5. Allow selection from available various AI accelerators, including NPUs or a combination of accelerators. This may happen using a (to be specified) algorithmic mapping from context options. Or, allow web apps to hint a preferred fallback order for the given context, for instance `["npu", "cpu"]`, meaning that implementations should try executing the graph on NPU as much as possible and try to avoid GPU. Basically `"cpu"` could even be omitted, as it could be the default fallback device, therefore specifying `"npu"` alone would mean the same. However, this can become complex with all possible device variations, so we must specify and standardize the supported fallback orders. -6. 
Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context, so that web apps could select the best device that would work with the intended model. This needs more developer input and examples. 7. As a corollary to 6, allow creating a context using also options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). ## Considered alternatives 1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for mapping these names to various real adapters (with their given characteristics). However, this would be more limited than being able to specify device-specific limits to context creation. -2. Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). +2. Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as an explicit qualifier, but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. +3. Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). That is, allow listing op support limits outside of a context, which would return all available devices with their op support limits. Then the web app could choose one of them to initialize a context with. 
## Scenarios, examples, design discussion @@ -100,7 +100,7 @@ const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] }); ## Open questions -[WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between adapter and device? +[WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between GPU adapter and WebNN device? Should we expose a similar adapter API for NPUs? Or could NPUs be represented as [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter) (basically a few text attributes)? @@ -117,27 +117,27 @@ Earlier there have been ideas to represent NPUs in a similar way as WebGPU [adap However, this would likely be premature standardization, as NPUs are very heterogeneous in their implementations, for instance memory and processing unit architecture can be significantly different. Also, they can be either standalone devices (e.g. TPUs), or integrated as SoC modules, together with CPUs, and even GPUs. -There is a fundamental difference vs. programming GPUs. From programming point of view, NPUs are very specific and need specialized drivers, which integrate into libraries and frameworks. Therefore they don't need explicitly exposed abstractions like in [WebGPU](https://gpuweb.github.io/gpuweb/), but they might have specific quantization requirements and limitations. +There is a fundamental difference between programming NPUs vs. programming GPUs. From programming point of view, NPUs are very specific and need specialized drivers, which integrate into libraries and frameworks. Therefore they don't need explicitly exposed abstractions like in [WebGPU](https://gpuweb.github.io/gpuweb/), but they might have specific quantization requirements and limitations. 
-The main use cases for NPUs is to offload more general purpose computing devices (CPU and even GPU) from machine learning compute loads. +The main use case for NPUs is currently to offload machine learning compute loads from more general purpose computing devices (CPU and even GPU). Power efficient performance is the main characteristic. -Therefore, use cases that include NPUs could be euphemistically represented by the `"low-power"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), which could mean the following (depending on the underlying platform): +Therefore, use cases that include NPUs could be loosely represented by the `"low-power"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), which could mean the following mappings (controlled by the underlying platform): - pure NPU execution, - NPU preferred, fallback to CPU, -- combined [multiple] NPU and CPU execution controlled by the underlying platform. +- combined [multiple] NPU and CPU execution. ### Selecting from multiple [types] of NPUs The proposal above uses [Web GPU](https://gpuweb.github.io/gpuweb) mechanisms to select a GPU device for a context. This covers support for multiple GPUs, even with different type and capabilities. -We lack such mechanisms to select NPUs. Earlier there have been ideas to use a similar approach as Web GPU. +We don't have such mechanisms to select NPUs. Earlier there have been ideas to use a similar, if not the same, approach as Web GPU. -However, enumerating and managing adapters are not very webby designs. For instance, to avoid complexity and to minimize fingerprinting surfaces, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsources selecting the target device to the user agent, so that the web app can achieve the use case without being exposed with platform specific details. 
+However, enumerating and managing adapters are not very web-friendly designs. For instance, to avoid complexity and to minimize fingerprinting surfaces, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsourced selecting the target device to the user agent, so that the web app can achieve the use case without being exposed to platform-specific details. -In Web NN case, we cannot use such mechanisms, because the API is used by frameworks, not by web pages. +In the Web NN case, we cannot use such mechanisms, because the API is used by frameworks, not by web pages. -Currently the handling of multiple NPUs (e.g. single model on multiple NPUs, or multiple models on multiple NPUs) is delegated to the implementation and underlying platform. +As such, currently the handling of multiple NPUs (e.g. single model on multiple NPUs, or multiple models on multiple NPUs) is delegated to the implementations and underlying platforms. ### Hybrid execution scenarios using NPU, CPU and GPU -Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in Web NN. +Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in Web NN. They are best selected and controlled by the implementations. However, we should distill the main use cases behind hybrid execution and define a hinting/mapping mechanism, such as the power preference mentioned earlier. 
From 88a858d6afdf2f5ff549e6e1f49dee52b443d460 Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Thu, 14 Nov 2024 22:30:01 +0200 Subject: [PATCH 04/15] Rename the device selection explainer to align with others Signed-off-by: Zoltan Kis --- explainer-device-selection.md => device-selection-explainer.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename explainer-device-selection.md => device-selection-explainer.md (100%) diff --git a/explainer-device-selection.md b/device-selection-explainer.md similarity index 100% rename from explainer-device-selection.md rename to device-selection-explainer.md From f45266fb1223988894b0ccad7701c41aa753c5f1 Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Wed, 18 Dec 2024 15:51:01 +0200 Subject: [PATCH 05/15] Add more considerations and a section for minimum viable solution Signed-off-by: Zoltan Kis --- device-selection-explainer.md | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 074117ba..fe3fb1aa 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -18,7 +18,7 @@ Currently this is captured by [context options](https://webmachinelearning.githu ## History -Previous discussion covered the following main topics: +Previous discussion covered the following main topics: - who controls the execution context: script vs. user agent (OS); - CPU vs GPU device selection, including handling multiple GPUs; - how to handle NPU devices, quantization/dequantization. @@ -28,9 +28,11 @@ Also, remove the `"high-performance"` [power preference](https://webmachinelearn Explicit GPU selection also provides clarity when there are multiple GPU devices, as implementations need to use [WebGPU](https://gpuweb.github.io/gpuweb/) in order to select a [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter), from where they can request a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object. 
A counter-argument was that it becomes more complex to use an implementation selected default GPU, as there is no simple way any more to tell implementations to use any GPU device for creating an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). This concern could eventually be alleviated by keeping the `"high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference). +In [[Need to understand how WebNN supports implementation that involves multiple devices and timelines #350]](https://github.com/webmachinelearning/webnn/issues/350) it was pointed out that [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) supports only a single device, while there are frameworks that support working with a single graph over multiple devices (e.g. CoreML). The proposal was to create a _default_ context that has no explicitly associated device (it could also be named a _generic_ context), where the implementation may choose the underlying device(s). + In [[API simplification: context types, context options #302]](https://github.com/webmachinelearning/webnn/issues/302), the [proposal](https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1960407195) was that the default behaviour should be to delegate device selection to the implementation, and remove [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype). However, keep the hints/options mechanism, with an improved mapping to use cases. -For instance, device selection is not about mandating where to execute, but e.g. tell what to avoid if possible (e.g. don't use the GPU). +For instance, device selection is not about mandating where to execute, but rather about indicating what to avoid if possible (e.g. don't use the GPU). 
In this case, the [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions), such as [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) and [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) could be used for mapping user hints into device selection logic by implementations. The list of options could be extended based on future needs. Note that the current hints don't guarantee the selection of a particular device type (such as GPU) or a given combination of devices (such as CPU+NPU). For instance using the `"high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) may not guarantee GPU execution, depending on the underlying platform. In [[WebNN should support NPU and QDQ operations #623]](https://github.com/webmachinelearning/webnn/issues/623), an explicit request to support NPU device selection was discussed, along with quantization use cases. Several [options](https://github.com/webmachinelearning/webnn/issues/623#issuecomment-2063954107) were proposed, and the simplest one was chosen, i.e. extending the [device type enum](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) with the `"npu"` value and update the relevant algorithms, as added in [PR #696](https://github.com/webmachinelearning/webnn/pull/696). However, alternative policies for error handling and fallback scenarios remained open questions. @@ -46,7 +48,7 @@ Design decisions should take the following into account: 1. Allow the underlying platform ultimately choose the compute device. -2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance, low latency, stable sustained performance etc. +2. 
Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance (throughput), low latency, stable sustained performance etc. 3. Allow an easy way to create a context with a GPU device, i.e. without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice). @@ -62,7 +64,7 @@ Design decisions should take the following into account: 1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for a mapping these names to various real adaptors (with their given characteristics). However, this would be more limited than being able to specify device specific limits to context creation. -2. Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as explicit qualifier, but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. +2. Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype), but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. 3. Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). That is, allow listing op support limits outside of a context, which would return all available devices with their op support limits. Then the web app could choose one of them to initialize a context with. @@ -141,3 +143,14 @@ As such, currently the handling of multiple NPUs (e.g. 
single model on multiple ### Hybrid execution scenarios using NPU, CPU and GPU Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in Web NN. They are best selected and controlled by the implementations. However, we should distillate the main use cases behind hybrid execution and define a hinting/mapping mechanism, such as the power preference mentioned earlier. + +As an example for handling hybrid execution as well as the underlying challenges, take a look at [OpenVINO device selection](https://blog.openvino.ai/blog-posts/automatic-device-selection-and-configuration). + +## Minimum Viable Solution + +Based on the discussion above, the best starting point would be a simple solution that can be extended and refined later. Namely, +- Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). +- Update [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) so that it becomes device agnostic, or _default_/_generic_ context. Allow supporting multiple devices with one context. +- Add notes to implementations on how to map [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) to devices. +- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, should we also include `"low-latency"` as a performance option, or should we rename `"default"` to `"auto"` (alluding to an underlying process, rather than a default setting). +- Document the valid use cases for requesting a certain device type or combination of devices, and within what error conditions. 
Currently, after these changes there remains explicit support for GPU-only context when an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) is created from a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) in [createContext()](https://webmachinelearning.github.io/webnn/#api-ml-createcontext). \ No newline at end of file From acf8b514b82f65972eaa144bbab2f6e84029b7f7 Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Thu, 16 Jan 2025 15:14:36 +0200 Subject: [PATCH 06/15] Improve the Minimum Viable Solution section, move the Considered Alternatives section Signed-off-by: Zoltan Kis --- device-selection-explainer.md | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index fe3fb1aa..5729cbd0 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -60,13 +60,6 @@ Design decisions should take the following into account: 7. As a corollary to 6, allow creating a context using also options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). -## Considered alternatives - -1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for a mapping these names to various real adaptors (with their given characteristics). However, this would be more limited than being able to specify device specific limits to context creation. - -2. Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype), but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. - -3. 
Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). That is, allow listing op support limits outside of a context, which would return all available devices with their op support limits. Then the web app could choose one of them to initialize a context with. ## Scenarios, examples, design discussion @@ -111,6 +104,7 @@ What exactly is best to pass as context options? Op support limits? Supported fe Update the security and privacy section. Would the proposals here increase the fingerprinting surface? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. + ## Background thoughts ### Representing NPUs @@ -146,11 +140,23 @@ Many platforms support various hybrid execution scenarios involving NPU, CPU, an As an example for handling hybrid execution as well as the underlying challenges, take a look at [OpenVINO device selection](https://blog.openvino.ai/blog-posts/automatic-device-selection-and-configuration). -## Minimum Viable Solution +## Considered alternatives + +1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for mapping these names to various real adapters (with their given characteristics). However, this would be more limited than being able to specify device-specific limits to context creation. (This is the current approach.) + +2. 
Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype), but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. (This is the proposed first approach.) + +3. Follow this [proposal](https://github.com/webmachinelearning/webnn/issues/749#issuecomment-2429821928), also tracked in [[MLOpSupportLimits should be opt-in #759]](https://github.com/webmachinelearning/webnn/issues/759). That is, allow listing op support limits outside of a context, which would return all available devices with their op support limits. Then the web app could choose one of them to initialize a context with. (This is a suggested longer term discussion topic.) -Based on the discussion above, the best starting point would be a simple solution that can be extended and refined later. Namely, + +## Proposed Minimum Viable Solution + +Based on the discussion above, the best starting point would be a simple solution that can be extended and refined later. A first contribution could include the following changes: - Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). - Update [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) so that it becomes device agnostic, or _default_/_generic_ context. Allow supporting multiple devices with one context. -- Add notes to implementations on how to map [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) to devices. -- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. 
For instance, should we also include `"low-latency"` as a performance option, or should we rename `"default"` to `"auto"` (alluding to an underlying process, rather than a default setting). -- Document the valid use cases for requesting a certain device type or combination of devices, and within what error conditions. Currently, after these changes there remains explicit support for GPU-only context when an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) is created from a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) in [createContext()](https://webmachinelearning.github.io/webnn/#api-ml-createcontext). \ No newline at end of file +- Add algorithmic steps or notes to implementations on how to map [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) to devices. + +Also, the following topics could be discussed now and decided later: +- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, discuss whether we should also include a `"low-latency"` performance option. Also, discuss whether to rename `"default"` to `"auto"` (alluding to an underlying process, rather than a default setting). +- Document the valid use cases for requesting a certain device type or combination of devices, and under what error conditions. Currently, after these changes there remains explicit support for GPU-only context when an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) is created from a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) in [createContext()](https://webmachinelearning.github.io/webnn/#api-ml-createcontext). +- Discuss option #3 from [Considered alternatives](#considered-alternatives). 
\ No newline at end of file From c327c967fc652c21f8ba5ba0c2409bc9fccab267 Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Fri, 17 Jan 2025 17:41:10 +0200 Subject: [PATCH 07/15] Add feedback from Dwayne about NPU being faster than GPU in certain devices Signed-off-by: Zoltan Kis --- device-selection-explainer.md | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 5729cbd0..92092a5a 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -23,10 +23,10 @@ Previous discussion covered the following main topics: - CPU vs GPU device selection, including handling multiple GPUs; - how to handle NPU devices, quantization/dequantization. -In [[Simplify MLContext creation #322]](https://github.com/webmachinelearning/webnn/pull/322), the proposal was to always use an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object to initialize a context and remove the `"gpu"` [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). -Also, remove the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), since it was used for the GPU option, which now becomes explicit. +In [[Simplify MLContext creation #322]](https://github.com/webmachinelearning/webnn/pull/322), the proposal was to always use an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object to initialize a context and remove the `"gpu"` [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). Also, remove the `"high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), since it was used for the GPU option, which now becomes explicit. 
+ Explicit GPU selection also provides clarity when there are multiple GPU devices, as implementations need to use [WebGPU](https://gpuweb.github.io/gpuweb/) in order to select a [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter), from where they can request a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) object. -A counter-argument was that it becomes more complex to use an implementation selected default GPU, as there is no simple way any more to tell implementations to use any GPU device for creating an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). This concern could eventually be alleviated by keeping the `'high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference). +A counter-argument was that it becomes more complex to use an implementation selected default GPU, as there is no simple way any more to tell implementations to use any GPU device for creating an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). This concern could eventually be alleviated by keeping the `"high-performance"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), but on some devices the NPU might be faster than the GPU. In [[Need to understand how WebNN supports implementation that involves multiple devices and timelines #350]](https://github.com/webmachinelearning/webnn/issues/350) it was pointed out that [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) supports only a single device, while there are frameworks that support working with a single graph over multiple devices (e.g. CoreML). The proposal was to create a _default_ context that has no explicitly associated device (it could also be named a _generic_ context), where the implementation may choose the underlying device(s). 
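The _default_/_generic_ context idea from issue #350 can be sketched as below. The stub for `navigator.ml` and its `kind` field are hypothetical, used only to make the snippet self-contained; a real `MLContext` exposes no such attribute, and the actual device mapping stays internal to the implementation.

```js
// Sketch of a device-agnostic ("default"/"generic") context: no device is
// named at creation time, so the implementation is free to split the graph
// across CPU, GPU, and/or NPU. The stub stands in for WebNN outside a browser.
const ml = globalThis.navigator?.ml ?? {
  async createContext(options = {}) {
    // Hypothetical: a context created without options is device-agnostic;
    // one created with hints still leaves the final choice to the platform.
    return { kind: Object.keys(options).length === 0 ? 'generic' : 'hinted' };
  },
};

async function demo() {
  const defaultCtx = await ml.createContext(); // generic, no associated device
  const hintedCtx = await ml.createContext({ powerPreference: 'low-power' });
  return [defaultCtx.kind, hintedCtx.kind];
}
```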
@@ -95,42 +95,40 @@ const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] }); ## Open questions -[WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between GPU adapter and WebNN device? +- [WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between GPU adapter and WebNN device? -Should we expose a similar adapter API for NPUs? Or could NPUs be represented as [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter) (basically a few text attributes)? +- Should we expose a similar adapter API for NPUs? Or could NPUs be represented as [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter) (basically a few text attributes)? -How should we extend the context options? +- How should we extend the context options? What exactly is best to pass as context options? Op support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others? -Update the security and privacy section. Would the proposals here increase the fingerprinting surface? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. +- Concerning security and privacy, would the proposals here increase the fingerprinting surface? If yes, what mitigations can be made? The current understanding is that any extra information exposed to web apps in these proposals could be obtained by other methods as well. However, security hardening and relevant mitigations are recommended. 
For instance, implementations could choose the level of information (e.g. op support limits) exposed to a given origin. ## Background thoughts ### Representing NPUs -Earlier there have been ideas to represent NPUs in a similar way as WebGPU [adapters](https://gpuweb.github.io/gpuweb/#gpuadapter), basically exposing basic string information, features, limits, and whether they can be used as a fallback device. +There have been ideas to represent NPUs in a similar way as WebGPU [adapters](https://gpuweb.github.io/gpuweb/#gpuadapter), essentially exposing basic string information, features, limits, and whether they can be used as a fallback device. However, this would likely be premature standardization, as NPUs are very heterogeneous in their implementations, for instance memory and processing unit architecture can be significantly different. Also, they can be either standalone devices (e.g. TPUs), or integrated as SoC modules, together with CPUs, and even GPUs. -There is a fundamental difference between programming NPUs vs. programming GPUs. From programming point of view, NPUs are very specific and need specialized drivers, which integrate into libraries and frameworks. Therefore they don't need explicitly exposed abstractions like in [WebGPU](https://gpuweb.github.io/gpuweb/), but they might have specific quantization requirements and limitations. +There is a fundamental difference between programming NPUs vs. programming GPUs. From a programming point of view, NPUs are very specific and need specialized drivers, which integrate into AI libraries and frameworks. Therefore they don't need explicitly exposed abstractions like in [WebGPU](https://gpuweb.github.io/gpuweb/), but they might have specific quantization requirements and limitations. -The main use case for NPUs currently is mainly to offload more general purpose computing devices (CPU and even GPU) from machine learning compute loads. Power efficient performance is the main characteristic. 
+Currently the main use case for NPUs is to offload the more general-purpose computing devices (CPU and GPU) from machine learning compute loads. Power efficient performance is the main characteristic. Therefore, use cases that include NPUs could be loosely represented by the `"low-power"` [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), which could mean the following mappings (controlled by the underlying platform): - pure NPU execution, - NPU preferred, fallback to CPU, -- combined [multiple] NPU and CPU execution. +- combined [multiple] NPU and CPU or GPU execution. ### Selecting from multiple [types] of NPUs The proposal above uses [Web GPU](https://gpuweb.github.io/gpuweb) mechanisms to select a GPU device for a context. This covers support for multiple GPUs, even with different types and capabilities. -We don't have such mechanisms to select NPUs. Earlier there have been ideas to use a similar, if not the same approach as Web GPU. -However, enumerating and managing adapters are not very web'ish designs. For instance, to avoid complexity and to minimize fingerprinting surfaces, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsourced selecting the target device to the user agent, so that the web app can achieve the use case without being exposed with platform specific details. +We don't have such mechanisms to select NPUs. Also, enumerating and managing adapters are not very web-like designs. For instance, in order to avoid this complexity and also to minimize fingerprinting surface, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsourced selecting the target device to the user agent, so that the web app can achieve the use case without being exposed to platform-specific details. -In the Web NN case, we cannot use such mechanisms, because the API is used by frameworks, not by web pages. 
+In the Web NN case, we cannot use such selection mechanisms delegated to the user agent, because the API is used by frameworks, not by web pages. As such, currently the handling of multiple NPUs (e.g. single model on multiple NPUs, or multiple models on multiple NPUs) is delegated to the implementations and underlying platforms. From 4b7688040cda9ffc736376c146e4ec3c8d1117cf Mon Sep 17 00:00:00 2001 From: Dwayne Robinson Date: Fri, 17 Jan 2025 16:29:14 -0800 Subject: [PATCH 08/15] "Web NN" -> "WebNN" consistent with spec --- device-selection-explainer.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 92092a5a..2fac1668 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -2,11 +2,11 @@ ## Introduction -This explainer summarizes the discussion and background on [Web NN device selection](https://webmachinelearning.github.io/webnn/#programming-model-device-selection). +This explainer summarizes the discussion and background on [WebNN device selection](https://webmachinelearning.github.io/webnn/#programming-model-device-selection). -The goal is to help making design decisions on how to handle compute device selection for a Web NN [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). +The goal is to help make design decisions on how to handle compute device selection for a WebNN [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext). -A context represents the global state of Web NN model graph execution, including the compute devices (e.g. CPU, GPU, NPU) the [Web NN graph](https://webmachinelearning.github.io/webnn/#mlgraph) is executed on. +A context represents the global state of WebNN model graph execution, including the compute devices (e.g. CPU, GPU, NPU) the [WebNN graph](https://webmachinelearning.github.io/webnn/#mlgraph) is executed on. 
When creating a context, an application may want to provide hints to the implementation on what device(s) are preferred for execution. @@ -128,13 +128,13 @@ The proposal above uses [Web GPU](https://gpuweb.github.io/gpuweb) mechanisms to We don't have such mechanisms to select NPUs. Also, enumerating and managing adapters are not very web-like designs. For instance, in order to avoid this complexity and also to minimize fingerprinting surface, the [Presentation API](https://www.w3.org/TR/presentation-api/) outsourced selecting the target device to the user agent, so that the web app can achieve the use case without being exposed to platform-specific details. -In the Web NN case, we cannot use such selection mechanisms delegated to the user agent, because the API is used by frameworks, not by web pages. +In the WebNN case, we cannot use such selection mechanisms delegated to the user agent, because the API is used by frameworks, not by web pages. As such, currently the handling of multiple NPUs (e.g. single model on multiple NPUs, or multiple models on multiple NPUs) is delegated to the implementations and underlying platforms. ### Hybrid execution scenarios using NPU, CPU and GPU -Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in Web NN. They are best selected and controlled by the implementations. +Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in WebNN. They are best selected and controlled by the implementations. 
However, we should distill the main use cases behind hybrid execution and define a hinting/mapping mechanism, such as the power preference mentioned earlier. As an example for handling hybrid execution as well as the underlying challenges, take a look at [OpenVINO device selection](https://blog.openvino.ai/blog-posts/automatic-device-selection-and-configuration). From 6d736c9d1ba96ad19f068bef6fdc2c479f3aedf7 Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Mon, 20 Jan 2025 10:19:07 +0200 Subject: [PATCH 09/15] Apply suggestions from code review Corrections from Dwayne Co-authored-by: Dwayne Robinson --- device-selection-explainer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 2fac1668..0998c1e5 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -10,7 +10,7 @@ A context represents the global state of WebNN model graph execution, including When creating a context, an application may want to provide hints to the implementation on what device(s) are preferred for execution. -Implementations, browsers and underlying OS may want to control the allocation of compute devices for various use cases and system conditions. +Implementations, browsers, and the underlying OS may want to control the allocation of compute devices for various use cases and system conditions. The question is, for which use cases, who should control the execution context, and to what extent. @@ -40,15 +40,15 @@ Later the need for explicit device selection support was challenged in [[MLContextOptions.deviceType seems unnecessary outside of conformance testing #749]](https://github.com/webmachinelearning/webnn/issues/749), with the main arguments also summarized in a W3C TPAC group meeting [presentation](https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0006/MLDeviceType.pdf). 
The main points were the following: - The [device type](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) option is hard to standardize because of the heterogeneity of the compute units across various platforms, and even across their versions; for instance, `"npu"` might not be available as a standalone option, only as a combined form of `"npu"` and `"cpu"`. - As for error management vs. fallback policies: fallback is preferable to failing, and implementations/the underlying platforms should determine the fallback type based on runtime information. -- Implementation / browser / OS have better grasp of the system/compute/runtime/apps state then websites, therefore control should be relished to them. For instance, if rendering performance degrades, the implementation/underlying platform can possibly fix it the best way, not the web app. +- Implementation / browser / OS have a better grasp of the system/compute/runtime/apps state than websites, and therefore control should be relinquished to them. For instance, if rendering performance degrades, the implementation/underlying platform can possibly fix it the best way, not the web app. ## Key use cases and requirements Design decisions should take the following into account: -1. Allow the underlying platform ultimately choose the compute device. +1. Allow the underlying platform to ultimately choose the compute device. -2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance (throughput), low latency, stable sustained performance etc. +2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, or high performance (throughput), low latency, stable sustained performance, accuracy, etc. 3. Allow an easy way to create a context with a GPU device, i.e. without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice).
From a5acab4eae180abb11f87ddff1cef3d13dc5f1ef Mon Sep 17 00:00:00 2001 From: Zoltan Kis Date: Mon, 20 Jan 2025 10:43:07 +0200 Subject: [PATCH 10/15] Addressing feedback from Dwayne Signed-off-by: Zoltan Kis --- device-selection-explainer.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 0998c1e5..bf889535 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -70,12 +70,10 @@ Examples for user scenarios: context = await navigator.ml.createContext(); // create a context that will likely map to NPU, or NPU+CPU -context = - await navigator.ml.createContext({powerPreference: 'low-power'}); +context = await navigator.ml.createContext({powerPreference: 'low-power'}); // create a context that will likely map to GPU -context = - await navigator.ml.createContext({powerPreference: 'high-performance'}); +context = await navigator.ml.createContext({powerPreference: 'high-performance'}); // enumerate devices and limits (as allowed by policy/implementation) // and select one of them to create a context @@ -83,7 +81,7 @@ const limitsMap = await navigator.ml.opSupportLimitsPerDevice(); // analyze the map and select an op support limit set // ... const context = await navigator.ml.createContext({ - limits: deviceLimitsMap['npu1'] + limits: limitsMap['npu1'] }); // as an alternative, hint a preferred fallback order ["npu", "cpu"] @@ -95,9 +93,7 @@ const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] }); ## Open questions -- [WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device, called [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we align the naming between GPU adapter and WebNN device? - -- Should we expose a similar adapter API for NPUs? Or could NPUs be represented as [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter) (basically a few text attributes)? 
+- [WebGPU](https://gpuweb.github.io/gpuweb/) provides a way to select a GPU device via [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should we expose a similar adapter API for NPUs? - How should we extend the context options? What exactly is best to pass as context options? Op support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others? @@ -153,8 +149,10 @@ Based on the discussion above, the best starting point would be a simple solutio - Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as an explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions). - Update [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) so that it becomes device agnostic, or a _default_/_generic_ context. Allow supporting multiple devices with one context. - Add algorithmic steps or notes to implementations on how to map [power preference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference) to devices. +- Also, to align with [GPUPowerPreference](https://gpuweb.github.io/gpuweb/#enumdef-gpupowerpreference), we should remove the `"default"` [MLPowerPreference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), i.e. the lack of hints will result in creating a generic context. Also, the following topics could be discussed now and decided later: -- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, discuss whether should we also include a `"low-latency"` performance option. Also, discuss whether to rename `"default"` to `"auto"` (alluding to an underlying process, rather than a default setting).
+- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, discuss whether should we also include a `"low-latency"` performance option. + - Document the valid use cases for requesting a certain device type or combination of devices, and under what error conditions. Currently, after these changes there remains explicit support for a GPU-only context when an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) is created from a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) in [createContext()](https://webmachinelearning.github.io/webnn/#api-ml-createcontext). - Discuss option #3 from [Considered alternatives](#considered-alternatives). \ No newline at end of file From b71c679c14e8ff332bc1acf416aa2b298d91da1e Mon Sep 17 00:00:00 2001 From: Anssi Kostiainen Date: Tue, 21 Jan 2025 12:00:17 +0200 Subject: [PATCH 11/15] Remove stray MLContext --- device-selection-explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index bf889535..57009307 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -18,7 +18,7 @@ Currently this is captured by [context options](https://webmachinelearning.githu ## History -Previous discussion covered the following main topics:MLContext +Previous discussion covered the following main topics: - who controls the execution context: script vs. user agent (OS); - CPU vs GPU device selection, including handling multiple GPUs; - how to handle NPU devices, quantization/dequantization.
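The limits-based selection flow sketched in the patches above is still an open proposal. The following self-contained JavaScript sketch illustrates the idea using a hypothetical, hard-coded stand-in for the proposed per-device op-support-limits map; neither the `opSupportLimitsPerDevice()` method nor the map's shape (`maxTensorByteLength`, `dataTypes`) is specified anywhere yet, so all names here are illustrative assumptions:

```javascript
// Hypothetical stand-in for the map the proposed
// navigator.ml.opSupportLimitsPerDevice() might return.
// Keys, fields, and values are illustrative assumptions, not a specified API.
const limitsMap = {
  npu1: { maxTensorByteLength: 2 ** 28, dataTypes: ['float16', 'int8'] },
  gpu0: { maxTensorByteLength: 2 ** 32, dataTypes: ['float32', 'float16'] },
};

// Pick the first device whose limits satisfy the model's needs;
// a null result means "let the implementation pick" (generic context).
function selectDeviceKey(limits, needs) {
  for (const [key, caps] of Object.entries(limits)) {
    const typesOk = needs.dataTypes.every((t) => caps.dataTypes.includes(t));
    if (typesOk && caps.maxTensorByteLength >= needs.minTensorByteLength) {
      return key;
    }
  }
  return null;
}

// An int8 model that fits in 1 MiB matches 'npu1' in the map above.
console.log(selectDeviceKey(limitsMap, { dataTypes: ['int8'], minTensorByteLength: 2 ** 20 }));
```

A script would then pass the selected entry as the `limits` context option, as in the diff above; a `null` result would correspond to falling back to a generic, implementation-selected context.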
From 1e29e883e18572d52bfcc884d9f4d96687c0704c Mon Sep 17 00:00:00 2001 From: Anssi Kostiainen Date: Tue, 21 Jan 2025 12:00:54 +0200 Subject: [PATCH 12/15] Grammar consistency: adaptor -> adapter --- device-selection-explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 57009307..197e27d8 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -136,7 +136,7 @@ As an example for handling hybrid execution as well as the underlying challenges ## Considered alternatives -1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for a mapping these names to various real adaptors (with their given characteristics). However, this would be more limited than being able to specify device specific limits to context creation. (This is the current approach). +1. Keep the current [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype) as a context option, but improve the device type names and specify an algorithm for a mapping of these names to various real adapters (with their given characteristics). However, this would be more limited than being able to specify device specific limits to context creation. (This is the current approach). 2. Remove [MLDeviceType](https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype), but define a set of [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) that map well to GPU adapter/device selection and also to NPU device selection. (This is the proposed first approach.) 
From 0275a4fcbc4be4623fdb6212b81b4cd5aead25c6 Mon Sep 17 00:00:00 2001 From: Anssi Kostiainen Date: Tue, 21 Jan 2025 12:01:23 +0200 Subject: [PATCH 13/15] Grammar nit: noun -> verb --- device-selection-explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 197e27d8..3934a33c 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -130,7 +130,7 @@ As such, currently the handling of multiple NPUs (e.g. single model on multiple ### Hybrid execution scenarios using NPU, CPU and GPU -Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in WebNN. They are best selected and controlled by the implementations. However, we should distillate the main use cases behind hybrid execution and define a hinting/mapping mechanism, such as the power preference mentioned earlier. +Many platforms support various hybrid execution scenarios involving NPU, CPU, and GPU (e.g. NPU-CPU, NPU-GPU, NPU-CPU-GPU), but these are not explicitly exposed and controlled in WebNN. They are best selected and controlled by the implementations. However, we should distill the main use cases behind hybrid execution and define a hinting/mapping mechanism, such as the power preference mentioned earlier. As an example for handling hybrid execution as well as the underlying challenges, take a look at [OpenVINO device selection](https://blog.openvino.ai/blog-posts/automatic-device-selection-and-configuration). 
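As the explainer argues, the mapping from a power preference to concrete devices is best left to implementations. Still, a purely illustrative sketch of one possible mapping (an assumption for discussion, not specified behavior) may help ground the hinting/mapping idea:

```javascript
// Illustrative only: one way an implementation *might* map the
// MLPowerPreference hint to a candidate device ordering. The actual
// mapping is deliberately unspecified and left to implementations.
function candidateDevices(powerPreference) {
  switch (powerPreference) {
    case 'low-power':
      return ['npu', 'cpu'];        // favor power-efficient units
    case 'high-performance':
      return ['gpu', 'npu', 'cpu']; // favor throughput
    default:
      return [];                    // no hint: fully generic context
  }
}

console.log(candidateDevices('low-power')); // returns ['npu', 'cpu']
```

A real implementation would combine such a static preference table with runtime information (thermals, system load, model characteristics) before committing to any device, which is exactly why the explainer keeps this mapping out of the API surface.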
From 7a76758e30e0f41fa628b8095b94d640c8c2bcec Mon Sep 17 00:00:00 2001 From: Anssi Kostiainen Date: Tue, 21 Jan 2025 12:01:37 +0200 Subject: [PATCH 14/15] Add "Participate" header --- device-selection-explainer.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 3934a33c..09ed97ae 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -1,5 +1,12 @@ # Device Selection Explainer +## Participate + +Feedback on this explainer is welcome via the issue tracker: + +- https://github.com/webmachinelearning/webnn/labels/device%20selection - existing discussions about device selection mechanisms +- see also [all issues](https://github.com/webmachinelearning/webnn/issues) and feel free to open a new issue as appropriate + ## Introduction This explainer summarizes the discussion and background on [WebNN device selection](https://webmachinelearning.github.io/webnn/#programming-model-device-selection). From 15cd469014bdd13ced2cade12642b547ad0e661f Mon Sep 17 00:00:00 2001 From: Anssi Kostiainen Date: Tue, 21 Jan 2025 12:02:02 +0200 Subject: [PATCH 15/15] Grammar: simplify language --- device-selection-explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/device-selection-explainer.md b/device-selection-explainer.md index 09ed97ae..9d3d6ac7 100644 --- a/device-selection-explainer.md +++ b/device-selection-explainer.md @@ -159,7 +159,7 @@ Based on the discussion above, the best starting point would be a simple solutio - Also, to align with [GPUPowerPreference](https://gpuweb.github.io/gpuweb/#enumdef-gpupowerpreference), we should remove the `"default"` [MLPowerPreference](https://webmachinelearning.github.io/webnn/#enumdef-mlpowerpreference), i.e. the lack of hints will result in creating a generic context. 
Also, the following topics could be discussed now and decided later: -- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, discuss whether should we also include a `"low-latency"` performance option. +- Improve the device selection hints in [context options](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions) and define their implementation mappings. For instance, discuss whether to also include a `"low-latency"` performance option. - Document the valid use cases for requesting a certain device type or combination of devices, and under what error conditions. Currently, after these changes there remains explicit support for a GPU-only context when an [MLContext](https://webmachinelearning.github.io/webnn/#mlcontext) is created from a [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) in [createContext()](https://webmachinelearning.github.io/webnn/#api-ml-createcontext). - Discuss option #3 from [Considered alternatives](#considered-alternatives). \ No newline at end of file
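Finally, the fallback-order hint proposed earlier in the explainer (`fallback: ['npu', 'cpu']`) could be resolved by an implementation roughly as below. This is a hypothetical sketch: `available` stands in for whatever runtime device discovery the implementation performs, and none of this behavior is specified:

```javascript
// Purely illustrative resolution of the proposed fallback-order hint.
// 'fallback' is the script's preferred device kinds, in order;
// 'available' is an assumed stand-in for runtime device discovery.
function resolveDeviceKind(fallback, available) {
  for (const kind of fallback) {
    if (available.includes(kind)) return kind; // first preference present
  }
  // No preferred kind is available: the implementation picks any device
  // (a generic context), or null if nothing is usable.
  return available.length > 0 ? available[0] : null;
}

// On a system with no NPU, the ['npu', 'cpu'] hint falls back to the CPU.
console.log(resolveDeviceKind(['npu', 'cpu'], ['cpu', 'gpu']));
```

This matches the explainer's direction that the hint tells the implementation what to prefer or avoid, rather than mandating where execution happens.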