From 7291a5807e59b45623b669875baec8a038512712 Mon Sep 17 00:00:00 2001
From: Etienne Dechamps <etienne@edechamps.fr>
Date: Fri, 7 Dec 2018 22:23:56 +0000
Subject: [PATCH] Adjust buffer size and suggested latency defaults, and update
 docs.

Following experiments with full-duplex operation in REW, it has been
observed that the best way to get glitch-free operation is to use an
ASIO buffer size of at least 20 ms and a suggested latency of at
least 3 times the buffer size. This commit makes these the new
defaults, and updates the documentation with the findings.
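
To make the new defaults concrete, here is a minimal C++ sketch of the
arithmetic. It assumes the host application accepts the preferred buffer
size. `Defaults` and `ComputeDefaults` are hypothetical names used only for
illustration. The formulas mirror the ones changed in flexasio.cpp: buffer
sizes of 1 ms, 20 ms and 1 s, and a suggested latency of 3 buffers.

```cpp
#include <cassert>

// Hypothetical helper (not part of FlexASIO) reproducing the default
// buffer size computation and the default suggested latency of this patch.
struct Defaults {
    long minSize;                    // samples
    long preferredSize;              // samples
    long maxSize;                    // samples
    double suggestedLatencySeconds;  // seconds
};

Defaults ComputeDefaults(double sampleRate) {
    assert(sampleRate > 0);
    Defaults d{};
    d.minSize = long(sampleRate * 0.001);       // 1 ms
    d.preferredSize = long(sampleRate * 0.02);  // 20 ms (previously 40 ms)
    d.maxSize = long(sampleRate);               // 1 s
    // 3 buffers' worth of audio, assuming the host uses the preferred size.
    d.suggestedLatencySeconds = 3 * d.preferredSize / sampleRate;
    return d;
}
```

At 48 kHz this gives a 960-sample preferred buffer and a 0.06 s (60 ms)
default suggested latency. At 44.1 kHz it gives 882 samples and the same
60 ms.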
---
 BACKENDS.md               | 11 +++---
 CONFIGURATION.md          | 82 ++++++++++++++++++++++-----------------
 FAQ.md                    | 17 ++++++--
 README.md                 |  2 +-
 src/FlexASIO/flexasio.cpp |  7 ++--
 5 files changed, 71 insertions(+), 48 deletions(-)
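
In flexasio.cpp, this change moves the default suggested latency into
`common_parameters`. A per-direction config value now overrides it only when
present. The sketch below illustrates that precedence in isolation with
`std::optional`. `EffectiveSuggestedLatency` is a hypothetical name, not a
FlexASIO function. `framesPerBuffer` is assumed to be the ASIO buffer size in
samples, and `sampleRate` the sample rate in Hz.

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch (names are illustrative, not FlexASIO's own):
// an explicit suggestedLatencySeconds from the config wins; otherwise
// the default of three ASIO buffers applies.
double EffectiveSuggestedLatency(std::optional<double> configured,
                                 long framesPerBuffer, double sampleRate) {
    assert(sampleRate > 0);
    // Default shared by both directions (set on common_parameters).
    double latency = 3 * framesPerBuffer / sampleRate;
    // Per-direction override (input_parameters / output_parameters).
    if (configured.has_value()) latency = *configured;
    return latency;
}
```

For example, with a 480-sample buffer at 48 kHz and no configured value, the
result is 0.03 s (30 ms). Setting `suggestedLatencySeconds = 0.0` in the
`[input]` or `[output]` section yields 0.0 instead.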

diff --git a/BACKENDS.md b/BACKENDS.md
index 2203788a..6d14bf19 100644
--- a/BACKENDS.md
+++ b/BACKENDS.md
@@ -35,7 +35,8 @@ choice of backend can have a large impact on the behaviour of the overall audio
 pipeline. In particular, choice of backend can affect:
 
 - **Reliability:** some backends can be more likely to work than others, because
-  of the quality and maturity of the code involved.
+  of the quality and maturity of the code involved. Some might be more likely
+  to introduce audio discontinuities (glitches).
 - **Ease of use:** Some backends are more "opinionated" than others about when
   it comes to which audio formats (i.e. sample rate, number of channels) they
   will accept. They might refuse to work if the audio format is not exactly the
@@ -93,8 +94,8 @@ conversion, mixing, and APOs. As a result it is extremely permissive when it
 comes to audio formats. It should be expected to behave just like a typical
 Windows application would. Its latency should be expected to be mediocre at
 best, as MME was never designed for low-latency operation. This is compounded by
-the fact that MME appears to [behave very poorly][issue30] with small (<40 ms)
-buffer sizes.
+the fact that MME appears to [behave very poorly][issue30] with small buffer
+sizes.
 
 Latency numbers reported by MME do not seem to take the Windows audio pipeline
 into account. This means the reported latency is underestimated by at least 20
@@ -118,8 +119,8 @@ Windows pipeline, converting as necessary.
 
 One would expect latency to be somewhat better than MME, though it's not clear
 if that's really the case in practice. The DirectSound backend has been observed
-to [behave very poorly][issue29] with small (<=30 ms) buffer sizes on the input
-side, making it a poor choice for low-latency capture use cases.
+to [behave very poorly][issue29] with small buffer sizes on the input side,
+making it a poor choice for low-latency capture use cases.
 
 Modern versions of Windows implement the DirectSound API by using WASAPI
 internally, making this backend a "second-class citizen" compared to WASAPI and
diff --git a/CONFIGURATION.md b/CONFIGURATION.md
index 74a65c79..1ec41896 100644
--- a/CONFIGURATION.md
+++ b/CONFIGURATION.md
@@ -52,7 +52,7 @@ channels = 6
 wasapiExclusiveMode = true
 ```
 
-Experimentally, the following set of options have been shown to be a good
+Experimentally, the following set of options has been shown to be a good
 starting point for low latency operation:
 
 ```toml
@@ -60,9 +60,11 @@ backend = "Windows WASAPI"
 bufferSizeSamples = 480
 
 [input]
+suggestedLatencySeconds = 0.0
 wasapiExclusiveMode = true
 
 [output]
+suggestedLatencySeconds = 0.0
 wasapiExclusiveMode = true
 ```
 
@@ -103,37 +105,34 @@ The default behaviour is to use DirectSound.
 *Integer*-typed option that determines which ASIO buffer size (in samples)
 FlexASIO will suggest to the ASIO Host application.
 
-This option can have a major impact on reliability and latency. Smaller buffers
-will reduce latency but will increase the likelihood of glitches/discontinuities
-(buffer overflow/underrun) if the audio pipeline is not fast enough.
+This option, in combination with
+[`suggestedLatencySeconds`][suggestedLatencySeconds],
+can have a major impact on reliability and latency. Smaller buffers will reduce
+latency but will increase the likelihood of glitches/discontinuities (buffer
+overflow/underrun) if the audio pipeline is not fast enough.
 
 Note that some host applications might already provide a user-controlled buffer
 size setting; in this case, there should be no need to use this option. It is
 useful only when the application does not provide a way to customize the buffer
 size.
 
-The ASIO buffer size is also used as the PortAudio buffer size, as FlexASIO
-bridges the two. Note that, for various technical reasons and depending on the
-backend and settings used (especially the
+The ASIO buffer size is also used as the PortAudio "front" (user) buffer size,
+as FlexASIO bridges the two. Note that, for various technical reasons and
+depending on the backend and settings used (especially the
 [`suggestedLatencySeconds` option][suggestedLatencySeconds]), there are many
 scenarios where additional buffers will be inserted in the audio pipeline
 (either by PortAudio or by Windows itself), *in addition* to the ASIO buffer.
 This can result in overall latency being higher than what the ASIO buffer size
 alone would suggest.
 
-**Note:** each [backend][BACKENDS] has its own inherent limitations when it
-comes to buffer sizes. It has been observed that some backends (especially
-DirectSound and MME) simply cannot work properly with small buffer sizes (e.g.
-30 ms or less).
-
 Example:
 
 ```toml
-bufferSizeSamples = 3840 # 80 ms at 48 kHz
+bufferSizeSamples = 1920 # 40 ms at 48 kHz
 ```
 
 The default behaviour is to advertise minimum, preferred and maximum buffer
-sizes of 1 ms, 40 ms and 1 s, respectively. The resulting sizes in samples are
+sizes of 1 ms, 20 ms and 1 s, respectively. The resulting sizes in samples are
 computed based on whatever sample rate the driver is set to when the application
 enquires.
 
@@ -163,9 +162,14 @@ FlexASIO will fail to initialize.
 
 If the option is set to the empty string (`""`), no device will be used; that
 is, the input or output side of the stream will be disabled, and all other
-options in the section will be ignored. Note that making your ASIO Host
-Application unselect all input channels or all output channels will achieve the
-same result.
+options in the section will be ignored. Making your ASIO Host Application
+unselect all input channels or all output channels will achieve the same result.
+
+**Note:** using both input and output devices (full duplex mode) puts additional
+constraints on the [backend][BACKENDS] due to the need to synchronize buffer
+delivery. It makes discontinuities (glitches) more likely and increases the
+lowest achievable latency. It is recommended to only use a single device (half
+duplex mode) if possible.
 
 Example:
 
@@ -266,41 +270,46 @@ default.
 #### Option `suggestedLatencySeconds`
 
 *Floating-point*-typed option that determines the amount of audio latency (in
-seconds) that FlexASIO will "suggest" to PortAudio. In some cases this can
-influence the amount of additional buffering that will be introduced in the
-audio pipeline in addition to the ASIO buffer itself. As a result, this option
-can have a major impact on reliability and latency.
-
-**Note:** it rarely makes sense to use this option; the default value should be
-appropriate for most use cases. It usually makes more sense to adjust the ASIO
-buffer size (see the [`bufferSizeSamples` option][bufferSizeSamples]) instead.
+seconds) that FlexASIO will "suggest" to PortAudio. Typically, this has the
+effect of increasing the amount of additional buffering that PortAudio will
+introduce in the audio pipeline in addition to the ASIO buffer itself (see
+[`bufferSizeSamples`][bufferSizeSamples]). As a result, this option can have a
+major impact on reliability and latency.
 
 The value of this option is only a hint; the resulting latency can be very
 different from the value of this option. PortAudio [backends][BACKENDS]
-interpret this setting in complicated and confusing ways, so it is recommended
-to experiment with various values.
+interpret this setting in complicated and confusing ways, and it interacts
+strongly with the ASIO buffer size, so it is recommended to experiment with
+various values.
+
+Setting this option to `0.0` requests the lowest latency that FlexASIO can
+provide.
 
-**Note:** with WASAPI Exclusive, when using only a single device (i.e. input
-*or* output), it is recommended to leave this option to its default value of
-`0.0`. Other values have been observed to make latency much worse.
+**Note:** using both input and output devices (full duplex mode) puts more
+buffering constraints on the backend due to synchronization requirements. Using
+a low suggested latency value in this case is likely to cause audio
+discontinuities (glitches). This is less of a problem when using a single device
+(half duplex mode).
 
-**Note:** the TOML parser that FlexASIO uses require all floating point values
-to have a decimal point. So, for example, `1` will not work, but `1.0` will.
+**Note:** when using the [WASAPI backend][WASAPI] in Exclusive mode and a single
+device (input *or* output), a zero suggested latency is
+[handled specially][portaudio287] and makes the backend use a different
+buffering scheme, dramatically reducing latency.
 
 Example:
 
 ```toml
 [output]
-suggestedLatencySeconds = 0.010
+suggestedLatencySeconds = 0.050 # 50 ms
 ```
 
-The default value is `0.0`.
+The default value is 3 times the ASIO buffer duration.
 
 #### Option `wasapiExclusiveMode`
 
 *Boolean*-typed option that determines if the stream should be opened in
 *WASAPI Shared* or in *WASAPI Exclusive* mode. For more information, see
-the [WASAPI backend documentation][].
+the [WASAPI backend documentation][WASAPI].
 
 This option is ignored if the backend is not WASAPI. See the
 [`backend` option][backend].
@@ -325,7 +334,8 @@ The default behaviour is to open the stream in *shared* mode.
 [INI files]: https://en.wikipedia.org/wiki/INI_file
 [logging]: README.md#logging
 [official TOML documentation]: https://github.com/toml-lang/toml#toml
+[portaudio287]: https://app.assembla.com/spaces/portaudio/tickets/287-wasapi-interprets-a-zero-suggestedlatency-in-surprising-ways
 [PortAudioDevices]: README.md#device-list-program
 [suggestedLatencySeconds]: #option-suggestedLatencySeconds
 [TOML]: https://en.wikipedia.org/wiki/TOML
-[WASAPI backend documentation]: BACKENDS.md#wasapi-backend
+[WASAPI]: BACKENDS.md#wasapi-backend
diff --git a/FAQ.md b/FAQ.md
index e32cdad4..a6c71bce 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -98,12 +98,18 @@ two typical causes:
    poorly optimized, and the pipeline is unable to keep up.
  - **Overly tight scheduling constraints**, which make it impossible to run the
    audio streaming event code (callback) in time.
-   - This is especially likely to occur when using very small buffer sizes (e.g.
-     10 ms or less).
+   - This is especially likely to occur when using very small buffer sizes
+     (smaller than the default values). See the
+     [`bufferSizeSamples`][bufferSizeSamples] and
+     [`suggestedLatencySeconds`][suggestedLatencySeconds] options.
    - Small buffer sizes require audio streaming events to fire with very tight
      deadlines, which can put too much pressure on the Windows thread scheduler
      or other parts of the system, especially when combined with expensive
      processing (see above).
+   - Scheduling constraints are tighter when using both input and output
+     devices (full duplex mode), even if both devices are backed by the same
+     hardware. Problems are less likely to occur when using only the input, or
+     only the output (half duplex mode).
  - **[FlexASIO logging][logging] is enabled**.
    - FlexASIO writes to the log using blocking file I/O from critical real-time
      code paths. This can easily lead to missed deadlines, especially with small
@@ -130,6 +136,9 @@ ASIO Host Applications make it possible to change the ASIO Buffer Size in the
 application itself. For those that don't, use the
 [`bufferSizeSamples` option][bufferSizeSamples].
 
+Finally, the [`suggestedLatencySeconds`][suggestedLatencySeconds] option should
+be set to the smallest possible value that works.
+
 In the end, a typical low-latency configuration might look something like this:
 
 ```toml
@@ -137,9 +146,11 @@ backend = "Windows WASAPI"
 bufferSizeSamples = 480 # 10 ms at 48 kHz
 
 [input]
+suggestedLatencySeconds = 0.0
 wasapiExclusiveMode = true
 
 [output]
+suggestedLatencySeconds = 0.0
 wasapiExclusiveMode = true
 ```
 
@@ -160,7 +171,7 @@ There are also a couple of things to keep in mind when interpreting latency
 numbers:
 
  - The reported latency **can be higher than the ASIO buffer size**.
-   - This is perfectly normal with some configurations, because some backends
+   - This is perfectly normal in most configurations, because some backends
      will add additional buffering on top of the ASIO buffer itself.
  - The reported latency usually **does not take the underlying hardware into
    account**.
diff --git a/README.md b/README.md
index 26de1f1d..7d892225 100644
--- a/README.md
+++ b/README.md
@@ -49,7 +49,7 @@ The default settings are as follows:
  - DirectSound [backend][BACKENDS]
  - Uses the Windows default recording and playback audio devices
  - 32-bit float sample type
- - 40 ms "preferred" buffer size
+ - 20 ms "preferred" buffer size
 
 All of the above can be customized using a [configuration file][CONFIGURATION].
 
diff --git a/src/FlexASIO/flexasio.cpp b/src/FlexASIO/flexasio.cpp
index be62c937..bf952d9e 100644
--- a/src/FlexASIO/flexasio.cpp
+++ b/src/FlexASIO/flexasio.cpp
@@ -370,7 +370,7 @@ namespace flexasio {
 			Log() << "Calculating default buffer size based on " << sampleRate << " Hz sample rate";
 			*minSize = long(sampleRate * 0.001); // 1 ms, there's basically no chance we'll get glitch-free streaming below this
 			*maxSize = long(sampleRate); // 1 second, more would be silly
-			*preferredSize = long(sampleRate * 0.04); // 40 ms - see https://github.com/dechamps/FlexASIO/issues/29
+			*preferredSize = long(sampleRate * 0.02); // 20 ms
 			*granularity = 1; // Don't care
 		}
 		Log() << "Returning: min buffer size " << *minSize << ", max buffer size " << *maxSize << ", preferred buffer size " << *preferredSize << ", granularity " << *granularity;
@@ -474,6 +474,7 @@ namespace flexasio {
 		PaStreamParameters common_parameters = { 0 };
 		common_parameters.sampleFormat = paNonInterleaved;
 		common_parameters.hostApiSpecificStreamInfo = NULL;
+		common_parameters.suggestedLatency = 3 * framesPerBuffer / sampleRate;
 
 		PaWasapiStreamInfo common_wasapi_stream_info = { 0 };
 		if (hostApi.info.type == paWASAPI) {
@@ -490,7 +491,7 @@ namespace flexasio {
 			input_parameters.device = inputDevice->index;
 			input_parameters.channelCount = GetInputChannelCount();
 			input_parameters.sampleFormat |= inputSampleType->pa;
-			input_parameters.suggestedLatency = config.input.suggestedLatencySeconds.has_value() ? *config.input.suggestedLatencySeconds : defaultSuggestedLatency;
+			if (config.input.suggestedLatencySeconds.has_value()) input_parameters.suggestedLatency = *config.input.suggestedLatencySeconds;
 			if (hostApi.info.type == paWASAPI)
 			{
 				const auto inputChannelMask = GetInputChannelMask();
@@ -514,7 +515,7 @@ namespace flexasio {
 			output_parameters.device = outputDevice->index;
 			output_parameters.channelCount = GetOutputChannelCount();
 			output_parameters.sampleFormat |= outputSampleType->pa;
-			output_parameters.suggestedLatency = config.output.suggestedLatencySeconds.has_value() ? *config.output.suggestedLatencySeconds : defaultSuggestedLatency;
+			if (config.output.suggestedLatencySeconds.has_value()) output_parameters.suggestedLatency = *config.output.suggestedLatencySeconds;
 			if (hostApi.info.type == paWASAPI)
 			{
 				const auto outputChannelMask = GetOutputChannelMask();