We are currently indexing the file list for each record twice: once as part of `pdc_describe_json_ss` and again as `files_ss`. This is not a problem for small datasets, but for datasets with 60K files it is rather inefficient.

There is really no need to index the second field (`files_ss`), since `pdc_describe_json_ss` already contains the file list.
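For illustration, here is a minimal sketch of what the redundant indexing step might look like in a traject configuration. The accessor names (`record.files`, `file.to_json`) are hypothetical and not taken from the actual config:

```ruby
# Hypothetical sketch of the redundant step: every file entry is written to
# files_ss even though the same list is already embedded in the
# pdc_describe_json_ss payload, so large datasets get serialized twice.
to_field "files_ss" do |record, accumulator|
  record.files.each do |file|
    accumulator << file.to_json
  end
end
```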
Making this change will require calculating the file list from the data in `pdc_describe_json_ss` (rather than storing it separately in Solr), while keeping in place the logic in https://github.com/pulibrary/pdc_discovery/blob/main/config/traject/pdc_describe_indexing_config.rb#L244-L261 (a sketch of the read-time derivation follows below).

Looks like staging is sometimes having issues processing some of our large datasets: sometimes they succeed, sometimes they don't (see details in Honeybadger: https://app.honeybadger.io/projects/95072/faults/116339390). The fix in this issue might help here.
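As a rough illustration of the read-time approach, the file list could be parsed out of the stored JSON when a document is loaded. This is only a sketch: the `"files"` key and the helper name are assumptions about the pdc_describe JSON shape, not confirmed against the actual schema:

```ruby
require "json"

# Sketch: derive the file list from the JSON blob already stored in Solr,
# instead of reading it from a separate files_ss field. The "files" key is
# an assumed element of the pdc_describe JSON document.
def files_from_solr_doc(solr_doc)
  raw = solr_doc["pdc_describe_json_ss"]
  return [] if raw.nil?

  JSON.parse(raw).fetch("files", [])
end
```

Parsing a 60K-entry JSON blob is not free either, so the derived list would presumably need to be memoized per document load rather than recomputed on every access.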