I had to re-index my Episode table after adding a new attribute and running into various other issues. The table contains about 26k records, each of reasonable size. Keep in mind all of this data was already indexed and I am attempting to re-index it.
Attempt 1 - Episode.reindex!
My first attempt was to simply follow the code in the README and call Episode.reindex!. This caused the following error after some time and resulted in a lot of records missing:
413 Payload Too Large - The provided payload reached the size limit. The maximum accepted payload size is 20 MiB.. See https://docs.meilisearch.com/errors#payload_too_large. (MeiliSearch::ApiError)
I tried running this multiple times but always ended up getting this error. I don't understand why data size would be an issue here, since all my records were previously indexed without encountering it, especially considering the next two attempts.
Attempt 2 - Episode.reindex! with smaller batch size
For this I ran Episode.reindex!(100), decreasing the batch size to about 10% of the default 1000. This seems to work but takes forever and eventually times out; the timeout could be due to my SSH connection dropping at that point. However, in this case I don't get the payload_too_large error, which is strange since the same data is being sent.
This indicates to me that the issue might be too large a batch size? This seems like a bug.
Attempt 3 - Custom batch size and background job
The way I was finally able to make it work (mostly; I will open a separate issue for the remaining problem) is by batching records myself and moving the indexing to a background job.
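A simplified sketch of that approach (the job class name and queue here are placeholders, not my exact code):

```ruby
# Placeholder sketch of the background job described above.
class ReindexEpisodesJob < ApplicationJob
  queue_as :default

  def perform(episode_ids)
    # Calling reindex! on a scoped set of records only sends those documents.
    Episode.where(id: episode_ids).reindex!
  end
end

# Enqueue one job per batch of 1000 ids; with several workers these run in parallel.
Episode.in_batches(of: 1000) do |batch|
  ReindexEpisodesJob.perform_later(batch.ids)
end
```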
Notice that this still creates batches of 1000 records, but without throwing a payload_too_large error. It also executes a lot faster since I am running 5 jobs in parallel.
I guess my questions are:
Why am I getting a payload_too_large error?
What is the proper way to re-index a Model, small or big?
Do I need to re-index the whole table when I add a new attribute, or is there a more efficient way?
Environment (please complete the following information):
OS: [e.g. Debian GNU/Linux]
Meilisearch server version: Cloud v1.11.3 (v1.12 in development due to some bug on the cloud dashboard)
meilisearch-rails version: v0.14.1
Rails version: v8.0
Thank you for your patience and detailed feedback. We're still investigating the issue, but we wanted to provide an update.
Why am I getting a payload_too_large error?
The Build plan in Meilisearch Cloud has a payload size limit of 20MB (compared to the default 100MB). Documents or batches exceeding this size can trigger this error, especially with the default reindex! batch size of 1000, which implicitly assumes the default 100MB payload limit. This is likely linked to the size of your documents in the Episode model.
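As a rough sanity check (assuming documents of roughly uniform size): 20 MiB spread over the default batch of 1000 documents is about 20 KiB per document, so if an average serialized Episode is larger than that, the default batch will exceed the Cloud limit while batches of 100 stay comfortably below it.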
Why does reindex! on the Model with a batch size of 1000 fail, while your custom job works, even though it also uses reindex! but on specific records?
We're still investigating this, but the difference likely comes from how Model.reindex! and records.reindex! handle data, combined with the size of your documents. If no bug is found in Model.reindex!, the difference might be due to how attributes are loaded: Model.reindex! potentially includes more data in each payload.
What is the proper way to re-index a model?
Using Model.reindex! is the intended way, and can be optimized with:
Defining a meilisearch_import scope to limit queried attributes and relationships with scope :meilisearch_import, -> { select(:id, :title, :description) }
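For example, a minimal sketch of an Episode model using such a scope could look like this (the attribute names are only illustrative):

```ruby
class Episode < ApplicationRecord
  include MeiliSearch::Rails

  # Load only the columns that are actually sent to Meilisearch,
  # keeping each import payload as small as possible.
  scope :meilisearch_import, -> { select(:id, :title, :description) }

  meilisearch do
    attribute :title, :description
  end
end
```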
In the meantime, since it seems that there is an issue with the SDK in your case, your custom script with background jobs and smaller batches is a valid workaround.
Do I need to re-index the whole table when adding a new attribute?
Usually yes, a full re-index is necessary when changing attributes. The reindex! method should do this job.
Additional notes:
The reindex! method could probably just update documents instead of deleting and re-adding them, which would greatly optimize how it currently works (update vs. replace).
There seems to be a difference between Model.reindex!(1000) and calling reindex! on batches of 1000 specific records that needs further investigation.