-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QEMU Audio support in VNC Viewer (currently Windows only) #1478
Open
mkupchik
wants to merge
6
commits into
TigerVNC:master
Choose a base branch
from
mkupchik:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
31aa3f8
QEMU Audio support on Windows
mkupchik fc2c633
Revert "QEMU Audio support on Windows"
mkupchik 8c032e2
Merge branch 'TigerVNC:master' into master
mkupchik 91ca066
QEMU Audio support on Windows
mkupchik 7e7483d
Merge branch 'TigerVNC:master' into master
mkupchik 23b2e4f
Switched to 48 kHz output sample rate
mkupchik File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This complexity is something that doesn't really fit well in
CMsgReader
. How large are these audio chunks? Can't we let the input stream handle the buffering?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Audio chunks may be large, there are no protocol limits besides UINT32_MAX. Bad-behaved VNC server may outgrow MAX_BUF_SIZE in common/rdr/BufferedInStream.cxx. That's why I prefer not to leave audio samples in the input buffer of BufferedInStream, and move them to circular playback buffer of Win32AudioOutput as soon as they arrive. Calls to memcpy happen in Win32AudioOutput::addSamples(), usually once, or twice if wraparound in the target circular buffer happens, and caller of that interface must provide a pointer and length of continuous chunk of source data. This continuous chunk of source data is owned by BufferedInStream.
When audio frame is fully received, Win32AudioOutput::submitSamples() makes one or two heap allocations of struct Win32AudioOutput::HdrInSlist, then calls waveOutPrepareHeader() WinAPI function (which boils down to allocation of struct OVERLAPPED on heap, creation of kernel event object and some pointer bookkeeping), then it calls waveOutWrite() WinAPI function (which boils down to submitting asyncronous I/O request to the NT kernel device driver, the audio mixer). There's no need to move bytes around in this context, asyncronous NT I/O request in flight refers to the audio samples in circular buffer owned by Win32AudioOutput which were added there earlier, by Win32AudioOutput::addSamples(). So while audio playback I/O request is still in flight, original socket buffer owned by BufferedInStream may contain different data, may be reallocated, may be empty etc.
When kernel completes asyncronous I/O request, it signals via kernel event object and awakes internal worker thread started by winmm.dll/wdmaud.drv. This thread invokes our Win32AudioOutput::waveOutCallback(). There we link struct Win32AudioOutput::HdrInSlist into slist to dispose it later, during the next call to Win32AudioOutput::submitSamples() or in the destructor of Win32AudioOutput. Before disposing Win32AudioOutput::HdrInSlist we call waveOutUnprepareHeader(), which boils down to deallocating struct OVERLAPPED on heap and destruction of kernel event object.
If a bad-behaved VNC server sends unreasonably large audio frame, then we clamp it at Win32AudioOutput::addSamples(): samples which do not fit into the circular buffer owned by Win32AudioOutput are de facto discarded, but reported as consumed to the caller, so the caller can release this space in BufferedInStream, and continue to do so until Win32AudioOutput::submitSamples() is called at the end of audio frame.
Summarizing all of the above, I think two buffers (socket I/O buffer and async playback buffer) are necessary.
So, what we need from BufferedInStream in this context of CMsgReader is an API to fetch some fresh data from socket and provide raw pointer to the continuous chunk of memory along with number of available bytes. There's an upper limit on the number of bytes we need here (nAudioBytesLeft), and there's no lower limit. We're happy to consume less than nAudioBytesLeft bytes here (especially if nAudioBytesLeft is large, i.e. this is a large audio frame). If there's partial audio sample/frame (e.g. less than 4 bytes for default audio format, 16-bit stereo), then up to 3 bytes will be left unconsumed in the BufferedInStream, for later use when more data arrives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double buffering sounds fine, given the varying lifetime.
But I'd still like to get rid of the complexity. Since there is a limit anyway (which seems to be about 1 MiB if I read the code correctly), then this extra complexity doesn't seem worth it.
We don't need to support the theoretical extremes. Supporting what is actually used out there is quite sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this complexity can be reduced by adding a new method to rdr::InStream, which takes a number of bytes wanted by caller, similar to hasData(), but unlike hasData() does not require all of that bytes to be available. Instead it just returns a pointer and length of what is available. Length may be less than requested. Also this new method must call rdr::InStream::overrun() for actual I/O on socket to happen. Then code in CMsgReader will be shorter. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think I need to provide justification for MSGSTATE_AUDIO_DATA. It's true that entire 1 MiB of audio samples can fit in socket I/O buffer and be submitted for playback in one go. However consider ill-behaved VNC server which sends 1 GiB of audio data, so 1 MiB of these data is going to be submitted for playback and the rest (1023 MiB) is going to be discarded. How we can model this logic in the terms of rdr::InStream: "keep N bytes at the beginning of the buffer, but discard everything above N?" I think we cannot do this, rdr::InStream is designed for FIFO-style consumption of incoming data, we can't cut away samples/bytes/whatever in the middle of that buffer. Hence the compexity in CMsgReader, the new state of the FSM, and the double buffering. It's just a coincidence that double buffering fits nicely with async nature of audio I/O on Windows.