Which service (blob, file, queue) does this issue concern?
Async blob service. The following code can be used to reproduce the issue:
```python
import asyncio
from contextlib import AsyncExitStack
import os
import time

from azure.storage.blob.aio import BlobServiceClient
from dotenv import load_dotenv

load_dotenv()


async def check_if_blob_exists(container_client, blob):
    # We sleep for a while to slow things down and give us an opportunity
    # to monitor file descriptor use
    time.sleep(0.01)
    blob_client = container_client.get_blob_client(blob=blob)
    if await blob_client.exists():
        return True
    else:
        return False


async def gather_with_concurrency(n, *tasks):
    """A gather function that limits the concurrency to avoid overloading
    the backend.

    Params
    ------
    n: int
        The max number of concurrent coroutines that can be run
    tasks:
        The futures we want to execute
    """
    semaphore = asyncio.Semaphore(n)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(
        *(sem_task(task) for task in tasks), return_exceptions=True
    )


async def eat_file_descriptor(container_client):
    blob_name = "some_blob_name"
    _ = await check_if_blob_exists(
        container_client,
        blob=blob_name,
    )


async def main():
    async with AsyncExitStack() as stack:
        blob_service_client = await stack.enter_async_context(
            BlobServiceClient.from_connection_string(
                os.environ["BLOB_STORAGE_CONN_STR"]
            )
        )
        container_client = blob_service_client.get_container_client(
            os.environ["CONTAINER"]
        )
        # Create the futures and gather them
        results = await gather_with_concurrency(
            int(os.environ["MAX_CONCURRENCY_CONSOLIDATE"]),
            *[eat_file_descriptor(container_client) for _ in range(40000)],
        )
        return results


if __name__ == "__main__":
    res = asyncio.run(main())
```
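As a side note, the `gather_with_concurrency` helper above can be exercised on its own, without Azure. The following self-contained sketch (the `job` coroutine and the concurrency probe are illustrative, not part of the original report) checks that the semaphore really caps how many coroutines run at once:

```python
import asyncio


async def gather_with_concurrency(n, *tasks):
    # Same pattern as in the repro: a semaphore caps how many of the
    # wrapped awaitables run at once.
    semaphore = asyncio.Semaphore(n)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(
        *(sem_task(task) for task in tasks), return_exceptions=True
    )


async def demo():
    running = 0
    peak = 0

    async def job(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return i

    results = await gather_with_concurrency(5, *(job(i) for i in range(40)))
    return results, peak


results, peak = asyncio.run(demo())
print(results == list(range(40)), peak <= 5)  # True True
```

`asyncio.gather` preserves the order of its inputs, so the results come back in submission order even though execution is throttled.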
Which version of the SDK was used? Please provide the output of `pip freeze`.
Running Python 3.8.6 under WSL2. My per-process limit on file descriptors is 1024.
What problem was encountered?
The `exists` method of `azure.storage.blob.aio._blob_client_async.BlobClient` leaks file descriptors. If a large number of futures that use the method are launched, the per-process limit on open files kicks in very quickly. From that point on everything grinds to a halt and the OS starts raising "too many open files" errors all over the place.
I monitor file descriptor use with the following command, which prints the processes with the highest file descriptor usage:

```shell
for pid in $(ps -o pid -u some_user); do echo "$(ls /proc/$pid/fd/ 2>/dev/null | wc -l) for PID: $pid"; done | sort -n | tail
```
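The same check can also be done from inside the process itself. Here is a minimal Linux-only sketch (not part of the original report) that counts the entries under `/proc/self/fd`, which is what the shell one-liner above does per PID:

```python
import os


def open_fd_count():
    # Each entry under /proc/self/fd is one file descriptor currently
    # open in this process (Linux-only).
    return len(os.listdir("/proc/self/fd"))


before = open_fd_count()
f = open("/dev/null")   # opening any file consumes exactly one descriptor
after = open_fd_count()
f.close()
print(after - before)  # 1
```

Sampling this counter around the suspect SDK call makes a leak visible without leaving the Python process.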
Have you found a mitigation/solution?
Yes: not using the `exists` method. I surround the SDK calls in a try/except block that raises when the blob is not there. Using exceptions for flow control is not ideal, but it saved the day here.
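The workaround can be sketched roughly like this. With the real SDK you would catch `azure.core.exceptions.ResourceNotFoundError` around an async `get_blob_properties()` call; the sketch below substitutes a stand-in client and exception class so it runs without the SDK installed, and `blob_exists_via_properties` is a hypothetical helper name:

```python
import asyncio


class ResourceNotFoundError(Exception):
    """Stand-in for azure.core.exceptions.ResourceNotFoundError so the
    sketch runs without the SDK installed."""


class FakeBlobClient:
    """Hypothetical stand-in for an async BlobClient."""

    def __init__(self, present):
        self._present = present

    async def get_blob_properties(self):
        # The real async client raises ResourceNotFoundError when the
        # blob does not exist.
        if not self._present:
            raise ResourceNotFoundError("blob not found")
        return {"name": "some_blob_name"}


async def blob_exists_via_properties(blob_client):
    # Probe for the blob instead of calling `exists`; translate the
    # "not found" exception into False.
    try:
        await blob_client.get_blob_properties()
        return True
    except ResourceNotFoundError:
        return False


print(asyncio.run(blob_exists_via_properties(FakeBlobClient(True))))   # True
print(asyncio.run(blob_exists_via_properties(FakeBlobClient(False))))  # False
```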
Output of `pip freeze`:
```
aiohttp==3.4.4
appdirs==1.4.4
asgiref==3.2.10
async-timeout==3.0.1
attrs==21.2.0
azure-core==1.16.0
azure-identity==1.5.0
azure-kusto-data==2.3.0
azure-storage-blob==12.8.1
black==21.7b0
certifi==2021.5.30
cffi==1.14.6
chardet==3.0.4
charset-normalizer==2.0.3
click==8.0.1
cryptography==3.4.7
idna==3.2
isodate==0.6.0
msal==1.9.0
msal-extensions==0.3.0
msrest==0.6.21
multidict==4.7.6
mypy-extensions==0.4.3
numpy==1.21.1
oauthlib==3.1.1
pandas==1.2.5
pathspec==0.9.0
portalocker==1.7.1
pyarrow==4.0.1
pycparser==2.20
PyJWT==2.1.0
python-dateutil==2.8.2
python-dotenv==0.19.0
pytz==2021.1
regex==2021.7.6
requests==2.26.0
requests-oauthlib==1.3.0
river==0.7.1
scipy==1.7.0
six==1.16.0
structlog==21.1.0
tenacity==8.0.1
tomli==1.1.0
urllib3==1.26.6
yarl==1.6.3
```