Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MODINVSTOR-1277: Delete 6 unused instance database indexes #1101

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

julianladisch
Copy link
Contributor

Purpose

Delete the 6 biggest database indexes that are never used.

  • instance_identifiers_idx_gin
  • instance_identifiers_idx_ft
  • instance_title_idx
  • instance_contributors_idx
  • instance_indextitle_idx
  • instance_publication_idx

Approach

  • Delete 6 unused instance database indexes by marking them for deletion in schema.json
  • This autoamtically drops CQL array support for identifiers, therefore delete identifiers array unit tests
  • Delete PgUtil.getWithOptimizedSql code because the title index has been deleted
  • Delete populateRmbInternalIndex.sql because it runs for mod-inventory-storage <= 19.1.0

Changes Checklist

  • API Changes: Document any API paths, methods, request or response bodies changed, added, or removed.
  • Database Schema Changes: Indicate any database schema changes and their impact. Confirm that migration scripts were created.
  • Interface Version Changes: Indicate any changes to interface versions.
  • Interface Dependencies: Document added or removed dependencies.
  • Permissions: Document any changes to permissions.
  • Logging: Confirm that logging is appropriately handled.
  • Unit Testing: Confirm that changed classes were covered by unit tests.
  • Integration Testing: Confirm that changed logic was covered by integration tests.
  • Manual Testing: Confirm that changes were tested on local or dev environment.
  • NEWS: Confirm that the NEWS file is updated with relevant information about the changes made in this pull request.

Related Issues

https://folio-org.atlassian.net/browse/MODINVSTOR-1277

…array support for identifiers

https://folio-org.atlassian.net/browse/MODINVSTOR-1277

* Delete 6 unused instance database indexes:
  * instance_identifiers_idx_gin
  * instance_identifiers_idx_ft
  * instance_title_idx
  * instance_contributors_idx
  * instance_indextitle_idx
  * instance_publication_idx
* Drop CQL array support for identifiers, delete identifiers array unit tests
* Delete PgUtil.getWithOptimizedSql because the title index has been deleted
* Delete populateRmbInternalIndex.sql because it runs for mod-inventory-storage <= 19.1.0
Copy link

Comment on lines -404 to +410
"tOps": "ADD",
"tOps": "DELETE",
"caseSensitive": false,
"removeAccents": true
},
{
"fieldName": "title",
"tOps": "ADD",
"tOps": "DELETE",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep these 2 indexes? The title index is used by FQM and the indexTitle index probably should be, so removing them would hurt FQM's performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example query where PostgreSQL actually uses the title or indexTitle index? I doubt that real world queries from mod-lists really need these db indexes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A list made with the query instance.title == 'abc' would use that index. And I can't remember for sure, but I believe instance.title starts_with 'abc' would, too (I can't remember for sure on that one, though... We still might be waiting on the changes in FQM that would make that the case)

Copy link
Contributor

@mweaver-ebsco mweaver-ebsco Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that when an index is available, we go out of our way to define the corresponding FQM field in a way that will use that index wherever possible. So even if the index is "left"(lower(diku_mod_inventory_storage.f_unaccent(jsonb ->> 'title')), 600) or something like that, FQM can use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such kind of queries should better go to mod-search because mod-search has the instances, holdings and items combined (de-normalized) which allows better search and better sorting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mweaver-ebsco
In general if you have a known list of instances you are interested in you do further queries using a unique identifier like id or hrid, not title.
I don't see a real world use case for the queries instance.title == 'abc' and instance.title == 'abc*'. Can you give a real world use case?
And why can't you use mod-search that we have introduced to replace non-transactional mod-inventory-storage queries like these title searches?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @julianladisch, seems that @mweaver-ebsco has just doubts but not strong objections. So, I'm approving the PR.

Copy link
Contributor

@mweaver-ebsco mweaver-ebsco Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have strong objections (sorry, I missed the follow-up questions). Please do not delete those 2 indexes. I'd love for mod-fqm-manager to use mod-search instead of relying on these indexes in the DB, but we're not there yet (that change will take significant architectural changes in FQM), so for now, these are definitely still being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general if you have a known list of instances you are interested in you do further queries using a unique identifier like id or hrid, not title.

FQM exists to cover uses like this where you're providing query conditions and not specific IDs (it can handle specific IDs, too, if needed). Title is one of the many fields users can search by. If they have those IDs handy, they should 100% use them, but we can't force them to use them.

I don't see a real world use case for the queries instance.title == 'abc' and instance.title == 'abc*'. Can you give a real world use case?

Maybe a user needs to export a list of loans, joined with the associated detailed inventory data, where the corresponding instance titles containing 'abc' or something similar ¯\_(ツ)_/¯ Maybe they are doing some work with various translations of some multi-volume work, where they are dealing with a large number of items with very similar names, and they need the circulation data alongside the corresponding item and instance data. Perhaps someone made a mistake and accidentally changed the title on 500 instances to "abc" somehow and they need to find and fix those quickly. With the instance title index in place, it could be fixed in minutes using Bulk Edit (which uses FQM). Without it, it could take hours.

And why can't you use mod-search that we have introduced to replace non-transactional mod-inventory-storage queries like these title searches?

Believe me, I really really really want to do exactly that. It's waaaaay up near the top of my own wishlist for mod-fqm-manager features. Unfortunately, that will require major architectural changes in mod-fqm-manager and a fairly significant amount of work, so we haven't been able to do it yet. It's not just a matter of switching to mod-search; it's introducing the ability to join search results from mod-search with query results from the DB. It is not a trivial change.

non-transactional mod-inventory-storage queries like these title searches?

I'm not sure I understand what you mean here. A query for instance.title = 'abc' in mod-fqm-manager is entirely transactional, in the sense that it happens within a transaction, and it depends pretty heavily on this. It only becomes non-transactional if you interact with it in a non-transactional way (for example, mod-lists does interact with it non-transactionally, but this is by design). If mod-fqm-manager starts using mod-search to handle parts of its queries, it makes things significantly less transactional, since it will need to execute different parts of the query on different platforms, and then join those together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

title and indexTitle have b-tree indexes using the default UTF8 locale. Therefore they only support exact match, they don't support right truncation, as explained on https://www.postgresql.org/docs/current/indexes-types.html#INDEXES-TYPES-BTREE :

Exact match using the title b-tree index:

folio=# select jsonb->>'title' from instance where ("left"(lower(f_unaccent(jsonb ->> 'title'::text)), 600)) LIKE 'abc-team : a';
   ?column?
--------------
 ABC-Team : A
(1 row)

Time: 0.776 ms

Right truncation uses a full table scan:

folio=# select jsonb->>'title' from instance where ("left"(lower(f_unaccent(jsonb ->> 'title'::text)), 600)) LIKE 'abc-team : a%';
   ?column?
--------------
 ABC-Team : A
(1 row)

Time: 1821.425 ms (00:01.821)

starts_with 'abc' is exactly the same as LIKE 'abc%' and therefore also needs a full table scan.

Please provide a real world use case where an exact title match is needed. I don't see why one would search for the exact title 'abc', if the exact title is known one can search using the id or hrid.

Copy link
Contributor

@psmagin psmagin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update NEWS by moving it's updates into not-released block.

Copy link
Contributor

@mweaver-ebsco mweaver-ebsco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the 2 indexes that are actively being used by mod-fqm-manager, as mentioned in my previous comment (adding a separate comment because github won't let me hit "request changes" without a new comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants