-
Could you provide a reproduction sample for this? 🙏 Also, did you modify the request queue directory at all?
-
It's happened twice. This time it was after scraping about 170,000 results. I have not changed the request queue directory at all. It's running on ECS with an EFS mount for storage. In both cases there was an empty file that was throwing the error, and the file was locked. I'm not sure why the file is empty; maybe Crawlee can't recover after being shut down unexpectedly. Because it runs on ECS Fargate Spot instances, it could be shut down at any time.
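An empty file is consistent with the stack trace: `JSON.parse` on an empty string throws a `SyntaxError`, which is what `RequestQueueFileSystemEntry.get` runs into. Since a reproduction was asked for, here is a minimal TypeScript sketch that reproduces only that last step, without Crawlee. It creates a throwaway directory holding one valid and one zero-byte `.json` file, then scans for empty files and prints their parse errors. The directory and file names are hypothetical, and the sketch says nothing about how the file became empty on EFS. To check a real volume, pass the storage path to `findEmptyJsonFiles` instead (assumed here to be Crawlee's default `./storage`, unless `CRAWLEE_STORAGE_DIR` points elsewhere).

```typescript
import { mkdtempSync, readdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Recursively collect zero-byte *.json files under `root`.
function findEmptyJsonFiles(root: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const fullPath = join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...findEmptyJsonFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".json") && statSync(fullPath).size === 0) {
      found.push(fullPath);
    }
  }
  return found;
}

// Parse a file as JSON; return the error message, or null if it parses.
function parseFileError(path: string): string | null {
  try {
    JSON.parse(readFileSync(path, "utf8"));
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// Build a throwaway directory with one valid entry and one truncated (empty) entry.
// HYPOTHETICAL layout: it only mimics "an empty file in the queue directory".
function simulateTruncatedQueue(): string {
  const dir = mkdtempSync(join(tmpdir(), "rq-repro-"));
  writeFileSync(join(dir, "valid.json"), JSON.stringify({ id: "req-1", url: "https://example.com" }));
  writeFileSync(join(dir, "truncated.json"), "");
  return dir;
}

const demoDir = simulateTruncatedQueue();
for (const file of findEmptyJsonFiles(demoDir)) {
  // Prints the JSON.parse error for each zero-byte file found.
  console.log(file, "->", parseFileError(file));
}
```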
-
I imagine an update that removes the empty JSON file and its lock on startup would resolve the issue for completely unexpected shutdowns. For systems like ECS Fargate Spot, which send a SIGTERM signal as a two-minute warning before shutting down, adding a handler for SIGTERM would also work.
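Both suggestions can be sketched in a few lines of TypeScript. This is not a Crawlee API: `removeEmptyJsonAndLocks` and `installSigtermHandler` are hypothetical helpers you would call from your own entry point before starting the crawler. The sketch assumes the lock for a queue file sits next to it as `<file>.lock`, which is `proper-lockfile`'s default naming; verify that against the files actually on your volume. Deleting an empty entry drops whatever request it held, so run the cleanup only while nothing else is using the storage.

```typescript
import { existsSync, readdirSync, rmSync, statSync } from "node:fs";
import { join } from "node:path";

// Delete zero-byte *.json files under `root`, plus a "<file>.lock" entry beside each one.
// ASSUMPTION: locks are named "<file>.lock" (proper-lockfile's default). Run this only
// BEFORE the crawler starts; the request stored in a deleted file is lost.
function removeEmptyJsonAndLocks(root: string): string[] {
  if (!existsSync(root)) return []; // first run: nothing persisted yet
  const removed: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const fullPath = join(root, entry.name);
    if (entry.isDirectory() && !entry.name.endsWith(".lock")) {
      // Recurse into queue directories, but not into lock directories.
      removed.push(...removeEmptyJsonAndLocks(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".json") && statSync(fullPath).size === 0) {
      rmSync(fullPath, { force: true });
      // The lock may be a file or a directory; recursive + force handles both, and missing paths.
      rmSync(`${fullPath}.lock`, { recursive: true, force: true });
      removed.push(fullPath);
    }
  }
  return removed;
}

// Fargate Spot sends SIGTERM about two minutes before stopping the task.
// `stopCrawler` is a HYPOTHETICAL callback: wire it to whatever stops your crawler cleanly.
function installSigtermHandler(stopCrawler: () => Promise<void>): void {
  process.once("SIGTERM", async () => {
    try {
      await stopCrawler();
    } catch (err) {
      console.error("Clean shutdown failed:", err);
    }
    process.exit(0);
  });
}

// Usage sketch (the storage path is an assumption; adjust it to your EFS mount):
// removeEmptyJsonAndLocks(join("storage", "request_queues"));
// installSigtermHandler(async () => { /* stop the crawler here */ });
```

The SIGTERM handler only helps when the warning actually arrives; the startup cleanup covers the cases where the task is killed without one.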
-
We really need to see a complete reproduction for issues like this; otherwise we can't help. As I pointed out in #2088, you should probably adjust your setup so the JSON files are not created at all (the …)
Edit: moving this to discussions for now.
-
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/memory-storage
Issue description
SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at RequestQueueFileSystemEntry.get (/home/myuser/node_modules/@crawlee/memory-storage/fs/request-queue/fs.js:49:30)
at async RequestQueueClient.listHead (/home/myuser/node_modules/@crawlee/memory-storage/resource-clients/request-queue.js:162:29)
at async RequestQueue._ensureHeadIsNonEmpty (/home/myuser/node_modules/@crawlee/core/storages/request_queue.js:693:101)
at async RequestQueue.isEmpty (/home/myuser/node_modules/@crawlee/core/storages/request_queue.js:609:9)
at async PuppeteerCrawler._isTaskReadyFunction (/home/myuser/node_modules/@crawlee/basic/internals/basic-crawler.js:848:38)
at async AutoscaledPool._maybeRunTask (/home/myuser/node_modules/@crawlee/core/autoscaling/autoscaled_pool.js:482:27)
WARN  RequestQueue: The request queue seems to be stuck for 300s, resetting internal state. {"inProgress":[]}
undefined:1
Code sample
No response
Package version
3.5.4
Node.js version
18.16.0
Operating system
Windows 10
Apify platform
I have tested this on the next release
No response
Other context
No response