`MongoDbStorage`'s `insert_many` method should probably check both the total size of a batch and whether any single document is too large on its own. In some (pretty rare) cases, the size can exceed the 16 MB BSON limit and result in an exception:
```
mcrit-server | 2023-09-12 17:57:10 [FALCON] [ERROR] POST /samples => Traceback (most recent call last):
mcrit-server | File "/opt/mcrit/mcrit/storage/MongoDbStorage.py", line 229, in _dbInsertMany
mcrit-server | insert_result = self._database[collection].insert_many([self._toBinary(document) for document in data])
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/_csot.py", line 108, in csot_wrapper
mcrit-server | return func(self, *args, **kwargs)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/collection.py", line 757, in insert_many
mcrit-server | blk.execute(write_concern, session=session)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/bulk.py", line 580, in execute
mcrit-server | return self.execute_command(generator, write_concern, session)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/bulk.py", line 447, in execute_command
mcrit-server | client._retry_with_session(self.is_retryable, retryable_bulk, s, self)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 1413, in _retry_with_session
mcrit-server | return self._retry_internal(retryable, func, session, bulk)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/_csot.py", line 108, in csot_wrapper
mcrit-server | return func(self, *args, **kwargs)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/mongo_client.py", line 1460, in _retry_internal
mcrit-server | return func(session, conn, retryable)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/bulk.py", line 435, in retryable_bulk
mcrit-server | self._execute_command(
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/bulk.py", line 381, in _execute_command
mcrit-server | result, to_send = bwc.execute(cmd, ops, client)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/message.py", line 966, in execute
mcrit-server | request_id, msg, to_send = self.__batch_command(cmd, docs)
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/message.py", line 956, in __batch_command
mcrit-server | request_id, msg, to_send = _do_batched_op_msg(
mcrit-server | File "/usr/local/lib/python3.8/dist-packages/pymongo/message.py", line 1353, in _do_batched_op_msg
mcrit-server | return _batched_op_msg(operation, command, docs, ack, opts, ctx)
mcrit-server | pymongo.errors.DocumentTooLarge: BSON document too large (60427090 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
mcrit-server |
mcrit-server | During handling of the above exception, another exception occurred:
mcrit-server |
mcrit-server | Traceback (most recent call last):
mcrit-server | File "falcon/app.py", line 365, in falcon.app.App.__call__
mcrit-server | File "/opt/mcrit/mcrit/server/utils.py", line 51, in wrapper
mcrit-server | func(*args, **kwargs)
mcrit-server | File "/opt/mcrit/mcrit/server/SampleResource.py", line 126, in on_post_collection
mcrit-server | summary = self.index.addReportJson(req.media, username=username)
mcrit-server | File "/opt/mcrit/mcrit/index/MinHashIndex.py", line 280, in addReportJson
mcrit-server | return self.addReport(report, calculate_hashes=calculate_hashes, calculate_matches=calculate_matches, username=username)
mcrit-server | File "/opt/mcrit/mcrit/index/MinHashIndex.py", line 265, in addReport
mcrit-server | sample_entry = self._storage.addSmdaReport(smda_report)
mcrit-server | File "/opt/mcrit/mcrit/storage/MongoDbStorage.py", line 622, in addSmdaReport
mcrit-server | self._dbInsertMany("functions", function_dicts)
mcrit-server | File "/opt/mcrit/mcrit/storage/MongoDbStorage.py", line 238, in _dbInsertMany
mcrit-server | raise ValueError("Database insert failed.")
mcrit-server | ValueError: Database insert failed.
```
Unfortunately I didn't log which samples caused this, so I can't provide much context 😭. Overall it is pretty uncommon: it happened 4 times across over 120k files.
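The size check suggested above could be sketched roughly like this. This is a minimal illustration, not mcrit's actual code: `chunk_documents` and its parameters are hypothetical names, and in practice `size_of` would measure the encoded BSON length (e.g. `len(bson.encode(doc))` with pymongo installed):

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MiB server-side BSON document limit

def chunk_documents(docs, size_of, limit=MAX_BSON_SIZE):
    """Yield sub-lists of `docs` whose combined size stays under `limit`.

    Raises ValueError for any single document exceeding the limit,
    since no amount of batching can make such a document insertable.
    """
    batch, batch_size = [], 0
    for doc in docs:
        doc_size = size_of(doc)
        if doc_size > limit:
            raise ValueError(f"document of {doc_size} bytes exceeds the BSON limit")
        # flush the current batch before it would grow past the limit
        if batch and batch_size + doc_size > limit:
            yield batch
            batch, batch_size = [], 0
        batch.append(doc)
        batch_size += doc_size
    if batch:
        yield batch
```

`_dbInsertMany` could then call `insert_many` once per chunk, and handle the `ValueError` for individually oversized documents separately (e.g. by routing them to GridFS).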
`pymongo.errors.DocumentTooLarge: BSON document too large (60427090 bytes)` suggests that a single function's JSON representation was 60 MB+ in size.
If I had to guess, and you want to find any of the 4 samples for reproduction, I'd search for the largest binaries or `.text` sections among what you processed. :)
Now, a generic solution would go beyond just checking sizes.
To avoid losing any data, these oversized objects should instead be stored in GridFS. At that point, probably all control flow graphs should be stored in GridFS, and possibly compressed, saving roughly 5-10x of the space used, since they are only accessed rarely.
This would also require code for migrating the database layout or reindexing the samples.
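The GridFS-plus-compression idea could be sketched as follows. This is an illustration under assumed names, not mcrit's actual API: `put_function_cfg` / `get_function_cfg` are hypothetical, and `gridfs` is imported lazily so the compression helpers work without pymongo installed:

```python
import json
import zlib

def compress_doc(document: dict) -> bytes:
    """Serialize a document to JSON and compress it for GridFS storage."""
    return zlib.compress(json.dumps(document).encode("utf-8"))

def decompress_doc(blob: bytes) -> dict:
    """Inverse of compress_doc."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

def put_function_cfg(db, function_id, document):
    # hypothetical helper: store one (possibly oversized) CFG document
    import gridfs  # ships with pymongo
    fs = gridfs.GridFS(db, collection="function_cfgs")
    return fs.put(compress_doc(document), function_id=function_id)

def get_function_cfg(db, function_id):
    # hypothetical helper: fetch and decode a CFG document again
    import gridfs
    fs = gridfs.GridFS(db, collection="function_cfgs")
    blob = fs.find_one({"function_id": function_id}).read()
    return decompress_doc(blob)
```

GridFS splits the payload into 255 KB chunks internally, so the 16 MB per-document limit no longer applies; the zlib step is where the rough 5-10x space saving would come from for repetitive CFG JSON.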
I'll keep the issue open as a reminder that this edge case exists, even though your observations suggest it is very rare.