Flooding GRPC errors when downloading models #655

Open
austinbv opened this issue Jan 29, 2025 · 9 comments

@austinbv
Contributor

When I try to download models, the downloads start, but I am flooded with errors like this:

et (en0))'], ai-mac-5: ['ai-mac-4(Ethernet (en0))', 'ai-mac-3(Ethernet (en0))'], ai-mac-4: ['ai-mac-5(Ethernet (en0))', 'ai-mac-3(Ethernet (en0))']})
Error sending opaque status to ai-mac-4: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Received RST_STREAM with error code 7"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Received RST_STREAM with error code 7", grpc_status:14, created_time:"2025-01-29T15:55:34.05777-07:00"}"
>
Traceback (most recent call last):
  File "/Users/exo/workspace/exo/.venv/lib/python3.12/site-packages/exo/orchestration/node.py", line 606, in send_status_to_peer
    await asyncio.wait_for(peer.send_opaque_status(request_id, status), timeout=15.0)
  File "/Users/exo/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/Users/exo/workspace/exo/.venv/lib/python3.12/site-packages/exo/networking/grpc/grpc_peer_handle.py", line 197, in send_opaque_status
    await self.stub.SendOpaqueStatus(request)
  File "/Users/exo/workspace/exo/.venv/lib/python3.12/site-packages/grpc/aio/_call.py", line 327, in __await__
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Received RST_STREAM with error code 7"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Received RST_STREAM with error code 7", grpc_status:14, created_time:"2025-01-29T15:55:34.05777-07:00"}"
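
For readers hitting the same trace: HTTP/2 error code 7 is `REFUSED_STREAM`, and gRPC surfaces it as `StatusCode.UNAVAILABLE`, which is generally treated as a transient, retryable condition. The sketch below shows one way a status send could be retried with exponential backoff instead of failing on the first reset. It is not exo's code: `TransientSendError` is a hypothetical stand-in for `grpc.aio.AioRpcError` with `StatusCode.UNAVAILABLE`, and `send_with_backoff` and `make_flaky_send` are illustrative names. Only the 15-second `asyncio.wait_for` timeout is taken from the traceback above.

```python
import asyncio
import random


class TransientSendError(Exception):
    """Hypothetical stand-in for grpc.aio.AioRpcError with StatusCode.UNAVAILABLE."""


async def send_with_backoff(send, *, attempts=4, base_delay=0.05, timeout=15.0):
    """Await the zero-argument coroutine function `send`, retrying transient failures.

    Returns the number of attempts used. Re-raises the last error if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            await asyncio.wait_for(send(), timeout=timeout)
            return attempt
        except (TransientSendError, asyncio.TimeoutError):
            if attempt == attempts:
                raise
            # Exponential backoff with jitter so peers do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay))


def make_flaky_send(failures):
    """Build a fake send that fails `failures` times, then succeeds (demo only)."""
    state = {"calls": 0}

    async def send():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientSendError("Received RST_STREAM with error code 7")

    return send


if __name__ == "__main__":
    used = asyncio.run(send_with_backoff(make_flaky_send(2), base_delay=0.001))
    print(f"succeeded after {used} attempts")
```

Whether retrying is appropriate here depends on why the peer refuses the stream. If the peer is saturated by the download traffic, retries without backoff would likely make the flood worse.
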
@AlexCheema
Contributor

Looks like one node lost connection. Did it go away after a few seconds?

@austinbv
Contributor Author

It shouldn't have. They are all on Thunderbolt and going through a 1 Gb switch.

@austinbv
Contributor Author

Following up: I don't think that's it. It floods thousands of these errors pretty consistently when downloading models. Tiny chat also freezes and downloads stall. Restarting exo solves it for a bit, but it happens again on any large download.
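
As a stopgap for the log flood itself (not the underlying connection resets), repeated errors for the same peer can be throttled so that one line is emitted per interval along with a count of what was dropped. This is a minimal sketch under that assumption. `ErrorThrottle` and its parameters are hypothetical names and are not part of exo.

```python
import time


class ErrorThrottle:
    """Let one message per key through per `interval` seconds and count the rest."""

    def __init__(self, interval=5.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = {}        # key -> time the key was last reported
        self._suppressed = {}  # key -> repeats dropped since the last report

    def should_log(self, key):
        """Return (emit, dropped): whether to log now, and how many repeats were skipped."""
        now = self.clock()
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False, 0
        dropped = self._suppressed.pop(key, 0)
        self._last[key] = now
        return True, dropped


if __name__ == "__main__":
    throttle = ErrorThrottle(interval=5.0)
    for _ in range(1000):
        emit, dropped = throttle.should_log("ai-mac-4")
        if emit:
            print(f"Error sending opaque status to ai-mac-4 ({dropped} repeats suppressed)")
```

Keying on the peer ID keeps errors from different nodes visible while collapsing the repeats from any single node.
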

@cnsren

cnsren commented Jan 31, 2025

I have the exact same problem with larger model downloads and am waiting for a resolution.

@austinbv
Contributor Author

> I have exact same problems with larger model download, waiting for the resolution.

How are you connecting?

@AlexCheema
Contributor

Does the download complete successfully?
It looks like something related to high network load.

@austinbv
Contributor Author

austinbv commented Feb 3, 2025

It does not. You need to restart the exo process and restart the downloads.

@dakecrazy

Me too.

@dakecrazy

I use Thunderbolt 5.
