Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept bytes-like objects #478

Closed
aaugustin opened this issue Sep 23, 2018 · 5 comments
Closed

Accept bytes-like objects #478

aaugustin opened this issue Sep 23, 2018 · 5 comments

Comments

@aaugustin
Copy link
Member

isinstance(thing, collections.abc.ByteString) likely beats isinstance(thing, bytes).

We just need to make sure that it doesn't require any other changes.

Suggested by @cjerdonek in #296 (comment).

@aaugustin
Copy link
Member Author

In fact ByteString will provide support for bytes and bytearray but not memoryview.

@aaugustin
Copy link
Member Author

memoryview objects won't work out of the box.

Frame.write writes the payload to an io.BytesIO object. This is only supported if the memoryview is contiguous:

>>> import io
>>> output = io.BytesIO()
>>> m = memoryview(b'binary payload')
>>> output.write(m)
14
>>> output.write(m[2:10:3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: memoryview: underlying buffer is not C-contiguous

Let's leave users call tobytes() until someone comes up with a good use case.

@aaugustin
Copy link
Member Author

I thought this would do the job:

diff --git a/src/websockets/framing.py b/src/websockets/framing.py
index afbc664..c393f79 100644
--- a/src/websockets/framing.py
+++ b/src/websockets/framing.py
@@ -245,10 +245,10 @@ def encode_data(data):
     # Expect str or bytes, return bytes.
     if isinstance(data, str):
         return data.encode('utf-8')
-    elif isinstance(data, bytes):
+    elif isinstance(data, collections.abc.ByteString):
         return data
     else:
-        raise TypeError("data must be bytes or str")
+        raise TypeError("data must be bytes-like or str")


 def parse_close(data):
diff --git a/src/websockets/protocol.py b/src/websockets/protocol.py
index 6e15853..14893e7 100644
--- a/src/websockets/protocol.py
+++ b/src/websockets/protocol.py
@@ -446,7 +446,7 @@ class WebSocketCommonProtocol(asyncio.StreamReaderProtocol):
         if isinstance(data, str):
             yield from self.write_frame(True, OP_TEXT, data.encode('utf-8'))

-        elif isinstance(data, bytes):
+        elif isinstance(data, collections.abc.ByteString):
             yield from self.write_frame(True, OP_BINARY, data)

         # Fragmented message -- regular iterator.
@@ -463,7 +463,7 @@ class WebSocketCommonProtocol(asyncio.StreamReaderProtocol):
             if isinstance(data, str):
                 yield from self.write_frame(False, OP_TEXT, data.encode('utf-8'))
                 encode_data = True
-            elif isinstance(data, bytes):
+            elif isinstance(data, collections.abc.ByteString):
                 yield from self.write_frame(False, OP_BINARY, data)
                 encode_data = False
             else:
diff --git a/tests/test_framing.py b/tests/test_framing.py
index ee85157..fb43b25 100644
--- a/tests/test_framing.py
+++ b/tests/test_framing.py
@@ -158,6 +158,9 @@ class FramingTests(unittest.TestCase):
     def test_encode_data_bytes(self):
         self.assertEqual(encode_data(b'tea'), b'tea')

+    def test_encode_data_bytearray(self):
+        self.assertEqual(encode_data(b'tea'), bytearray(b'tea'))
+
     def test_encode_data_other(self):
         with self.assertRaises(TypeError):
             encode_data(None)
diff --git a/tests/test_protocol.py b/tests/test_protocol.py
index d9aea19..1180ebd 100644
--- a/tests/test_protocol.py
+++ b/tests/test_protocol.py
@@ -471,6 +471,10 @@ class CommonTests:
         self.loop.run_until_complete(self.protocol.send(b'tea'))
         self.assertOneFrameSent(True, OP_BINARY, b'tea')

+    def test_send_binary_from_bytearray(self):
+        self.loop.run_until_complete(self.protocol.send(bytearray(b'tea')))
+        self.assertOneFrameSent(True, OP_BINARY, b'tea')
+
     def test_send_type_error(self):
         with self.assertRaises(TypeError):
             self.loop.run_until_complete(self.protocol.send(42))
@@ -490,6 +494,14 @@ class CommonTests:
             (False, OP_BINARY, b'te'), (False, OP_CONT, b'a'), (True, OP_CONT, b'')
         )

+    def test_send_iterable_binary_from_bytearray(self):
+        self.loop.run_until_complete(
+            self.protocol.send([bytearray(b'te'), bytearray(b'a')])
+        )
+        self.assertFramesSent(
+            (False, OP_BINARY, b'te'), (False, OP_CONT, b'a'), (True, OP_CONT, b'')
+        )
+
     def test_send_empty_iterable(self):
         self.loop.run_until_complete(self.protocol.send([]))
         self.assertNoFrameSent()

Alas:

$ python setup.py build_ext --inplace
running build_ext
building 'websockets.speedups' extension
creating build
creating build/temp.macosx-10.13-x86_64-3.7
creating build/temp.macosx-10.13-x86_64-3.7/src
creating build/temp.macosx-10.13-x86_64-3.7/src/websockets
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/myk/.pyenv/versions/3.7.0/include/python3.7m -c src/websockets/speedups.c -o build/temp.macosx-10.13-x86_64-3.7/src/websockets/speedups.o
creating build/lib.macosx-10.13-x86_64-3.7
creating build/lib.macosx-10.13-x86_64-3.7/websockets
clang -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/openssl/lib -L/Users/myk/.pyenv/versions/3.7.0/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/openssl/lib -L/Users/myk/.pyenv/versions/3.7.0/lib build/temp.macosx-10.13-x86_64-3.7/src/websockets/speedups.o -o build/lib.macosx-10.13-x86_64-3.7/websockets/speedups.cpython-37m-darwin.so
copying build/lib.macosx-10.13-x86_64-3.7/websockets/speedups.cpython-37m-darwin.so -> src/websockets
$ make test
python -W default -m unittest
...............................................................................................................................................................................................................................................................................................................................................................................E..E...........................................................................................................
======================================================================
ERROR: test_send_binary_from_bytearray (tests.test_protocol.ClientTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/dev/websockets/tests/test_protocol.py", line 475, in test_send_binary_from_bytearray
    self.loop.run_until_complete(self.protocol.send(bytearray(b'tea')))
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/coroutines.py", line 61, in send
    return self.gen.send(value)
  File "/Users/myk/dev/websockets/src/websockets/protocol.py", line 450, in send
    yield from self.write_frame(True, OP_BINARY, data)
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/coroutines.py", line 58, in __next__
    return self.gen.send(None)
  File "/Users/myk/dev/websockets/src/websockets/protocol.py", line 862, in write_frame
    frame.write(self.writer.write, mask=self.is_client, extensions=self.extensions)
  File "/Users/myk/dev/websockets/src/websockets/framing.py", line 201, in write
    data = apply_mask(frame.data, mask_bits)
TypeError: argument 1 must be read-only bytes-like object, not bytearray

======================================================================
ERROR: test_send_iterable_binary_from_bytearray (tests.test_protocol.ClientTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/dev/websockets/tests/test_protocol.py", line 499, in test_send_iterable_binary_from_bytearray
    self.protocol.send([bytearray(b'te'), bytearray(b'a')])
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/coroutines.py", line 61, in send
    return self.gen.send(value)
  File "/Users/myk/dev/websockets/src/websockets/protocol.py", line 467, in send
    yield from self.write_frame(False, OP_BINARY, data)
  File "/Users/myk/.pyenv/versions/3.7.0/lib/python3.7/asyncio/coroutines.py", line 58, in __next__
    return self.gen.send(None)
  File "/Users/myk/dev/websockets/src/websockets/protocol.py", line 862, in write_frame
    frame.write(self.writer.write, mask=self.is_client, extensions=self.extensions)
  File "/Users/myk/dev/websockets/src/websockets/framing.py", line 201, in write
    data = apply_mask(frame.data, mask_bits)
TypeError: argument 1 must be read-only bytes-like object, not bytearray

----------------------------------------------------------------------
Ran 478 tests in 7.856s

FAILED (errors=2)

@aaugustin
Copy link
Member Author

I'd have to use a Py_buffer instead of a const char * in speedups.c to support mutable buffers.

Since this discussion started from a rather theoretical "it would be nice if...", I'm not convinced it's worth putting a lot of effort into this change.

@aaugustin
Copy link
Member Author

I was exploring using bytearray and memoryview to minimize memory copies, which led me to implement this.

This use case only involves C-contiguous memoryviews. For non-C-contiguous memoryviews, making a copy is inevitable, so we can simply use tobytes().

aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
Minimize memory copies when they're C-contiguous.

Fix #478.
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
Minimize memory copies when they're C-contiguous.

Fix #478.
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
Minimize memory copies when they're C-contiguous.

Fix #478.
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 4, 2018
Minimize memory copies when they're C-contiguous.

Fix #478.
aaugustin added a commit that referenced this issue Nov 4, 2018
aaugustin added a commit that referenced this issue Nov 6, 2018
aaugustin added a commit that referenced this issue Nov 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant