Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MARC 880 Support #1888

Closed
wants to merge 1 commit into from
Closed

Conversation

bbusenius
Copy link
Contributor

Fixes UC #109

This pull request aims to add support for 880 fields to search result and full record pages.

@bbusenius
Copy link
Contributor Author

So far, this very limited in scope. I've only added what I think might be needed for adding the linked title field as it would appear visually on the search results and full record pages. I feel like this alone gives us a lot to talk about. Before I proceed, I'd like to work through feedback around this to help chart a course. I think this raises quite a few questions as is. I'll add some of them in comments in the code.

Comment on lines +22 to +24
alt_title = LNK245ab, first
alt_title_sub = LNK245b, first
alt_title_short = LNK245a, first
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working on this, I felt like it was really nice to have all these fields named with a prefix rather than a suffix. This allows them to all be grouped and sorted together in one place. I felt like that was nice and made for an easy way to keep track of them. That being said, I don't know if other's would rather use a suffix and have them interfile with their corresponding fields. I don't feel strongly about this, so I'm happy to change it. Since, I'm assuming that multivalued fields will just get the fields added to their configuration, this "separation" of linked fields with prefixes, will not be pure anyhow. I will note, however, that if we use a suffix it cannot be _alt because title_alt already exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions about this, and the best option may naturally present itself as we explore further... but I do feel strongly that we should avoid having fields names both alt_title and title_alt -- nobody is going to understand the difference between those, and it will cause confusion. Looks like the current title_alt field is used for alternate title forms and secondary titles, while of course 880 is for alternate renderings of the same title. A perfect prefix or suffix for capturing this distinction is not leaping to my mind, but I'll keep pondering it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...this is perhaps too verbose, but you're using AlternateDisplay as the prefix for method names in the record driver, so would alt_display_ be a sensible prefix for these fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the best option might present itself as we explore further but alt_display_ is reasonable...

Comment on lines +856 to +868
/**
* Get an alternate display title.
*
* @return string
*/
public function getAlternateDisplayTitle()
{
$format = '%s %s';
$short = $this->fields['alt_title_short'] ?? '';
$sub = $this->fields['alt_title_sub'] ?? '';
return sprintf($short, $sub);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for convenience but maybe this should be split out into two functions like the corresponding "normal" fields are? Or maybe the corresponding fields could be combined into 1 function, e.g. getDisplayTitle? I wasn't sure what to do here so I just did something quick and easy. I'm not sure if it's problematic though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, you're indexing the short + sub titles in the alt_title field, so why do this concatenation at all? Wouldn't it be better to simply return the alt_title field directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I didn't explain this very well. I agree with not concatenating at all. This was my attempt to keep these fields analogous with their linked counterparts. This is the title that displays on the full record page and the "normal" versions of these fields are concatenated in the templates there. I was just being lazy and doing it here, rather than in the templates because I didn't want to write two redundant methods until I understood better the direction we were going to take. If we just use alt_title we would be treating the 880 title that displays on the full record slightly different than the analogous title that displays. Of course this would be mitigated by the fact that they use the same subfields (in this case) and people can index them however they want, but I thought there might be a reason for that separation in the analogous fields that might also pertain to 880 titles in how they might be used at other institutions, if that makes sense. That being said though, this is a good example of some of the weirdness that can come up here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is just that there are three record driver methods for regular titles: getTitle(), getShortTitle(), and getSubtitle(). Each of these methods corresponds to a field in the Solr index. If you want to create three parallel versions for 880's, it would make sense for each of those to also correspond with the equivalent Solr fields... and the concatenation you're doing here takes the fields I would expect you to use in ShortTitle and Subtitle and mashing them together in place of just using the Title field. It just seems like a confusingly mixed approach, though I understand that we may not want to go all-out in creating all these variations. Certainly worth discussing further before taking lots of action, especially with @EreMaijala's cautionary note. :-)

Comment on lines +300 to +329
/**
* Formats the title. This method will return both the regular title and an
* alternate title if both are present, otherwise it will return either the
* regular title or the alternate title if only one is present. This method
* tries to return the highlighted title first but will return the unhighlighted
* title if a one is not present. Returns 'Title not available' if there is no
* title.
*
* @param string $title Title not highlighted
* @param string $altTitle Alternate title not highlighted
* @param string $highlightedTitle Highlighted title
* @param string $highlightedAltTitle Alternate highlighted title
*
* @return string
*/
protected function formatTitle(
$title, $altTitle, $highlightedTitle, $highlightedAltTitle
) {
$format = '%s<br/>%s';
$title = !empty($highlightedTitle) ? $highlightedTitle : $title;
$altTitle = !empty($highlightedAltTitle) ? $highlightedAltTitle : $altTitle;
if ($title && $altTitle) {
return sprintf($format, $title, $altTitle);
}
if ($title || $altTitle) {
return $title ?? $altTitle;
}
$transEsc = $this->getView()->plugin('transEsc');
return $transEsc('Title not available');
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this function can't be generalized such that it might work for all linked field pairs. I think I'll get a feel for that as I add other single value fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, seems possible.

@bbusenius
Copy link
Contributor Author

@demiankatz and @lahmann this is just a very simple implementation of 1 field (well, 3 fields representing 1 visual element). I left some comments in the code. Please feel free to let me know if we need to alter this approach. Also, forgive me for my ignorance. I have not done much with the indexing side of things in VuFind.

@demiankatz demiankatz changed the title 880 Support MARC 880 Support Mar 19, 2021
Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't given this anything like an exhaustive review yet, but I did leave a few comments. In addition to that, some other high-level thoughts:

The 880 field is allowed to repeat according to the MARC spec, but this code currently seems to assume there will be exactly one 880 per record (or that any subsequent 880's should be ignored). Do we need to worry about multiple alternate title forms? I realize that doing so would make life more complicated, but if it's part of the spec, maybe we need to deal with it.

Another thought: I could easily imagine different institutions having different opinions about what the "primary" title is. Some might prefer to favor the 880's in order to give a more accurate representation of the original intent, while others might prefer the normalization of the "standard" MARC fields. I wonder if it would be a useful pattern to set up indexing rules in pairs using the "first" and "notfirst" modifiers, like:

title = 245ab:LNK245ab, first
alt_title = 245ab:LNK245ab, notfirst

Because then it would be easy to reverse the order of the clauses in the field list in order to seamlessly adjust VuFind's behavior. Of course, for this to work, we have to assume the alt_ fields are multi-valued, which leads to the complexity I alluded to earlier.

Does that make any sense?

Comment on lines +22 to +24
alt_title = LNK245ab, first
alt_title_sub = LNK245b, first
alt_title_short = LNK245a, first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions about this, and the best option may naturally present itself as we explore further... but I do feel strongly that we should avoid having fields names both alt_title and title_alt -- nobody is going to understand the difference between those, and it will cause confusion. Looks like the current title_alt field is used for alternate title forms and secondary titles, while of course 880 is for alternate renderings of the same title. A perfect prefix or suffix for capturing this distinction is not leaping to my mind, but I'll keep pondering it...

Comment on lines +856 to +868
/**
* Get an alternate display title.
*
* @return string
*/
public function getAlternateDisplayTitle()
{
$format = '%s %s';
$short = $this->fields['alt_title_short'] ?? '';
$sub = $this->fields['alt_title_sub'] ?? '';
return sprintf($short, $sub);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, you're indexing the short + sub titles in the alt_title field, so why do this concatenation at all? Wouldn't it be better to simply return the alt_title field directly?

Comment on lines +22 to +24
alt_title = LNK245ab, first
alt_title_sub = LNK245b, first
alt_title_short = LNK245a, first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...this is perhaps too verbose, but you're using AlternateDisplay as the prefix for method names in the record driver, so would alt_display_ be a sensible prefix for these fields?

Comment on lines +300 to +329
/**
* Formats the title. This method will return both the regular title and an
* alternate title if both are present, otherwise it will return either the
* regular title or the alternate title if only one is present. This method
* tries to return the highlighted title first but will return the unhighlighted
* title if a one is not present. Returns 'Title not available' if there is no
* title.
*
* @param string $title Title not highlighted
* @param string $altTitle Alternate title not highlighted
* @param string $highlightedTitle Highlighted title
* @param string $highlightedAltTitle Alternate highlighted title
*
* @return string
*/
protected function formatTitle(
$title, $altTitle, $highlightedTitle, $highlightedAltTitle
) {
$format = '%s<br/>%s';
$title = !empty($highlightedTitle) ? $highlightedTitle : $title;
$altTitle = !empty($highlightedAltTitle) ? $highlightedAltTitle : $altTitle;
if ($title && $altTitle) {
return sprintf($format, $title, $altTitle);
}
if ($title || $altTitle) {
return $title ?? $altTitle;
}
$transEsc = $this->getView()->plugin('transEsc');
return $transEsc('Title not available');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, seems possible.

@EreMaijala
Copy link
Contributor

I think it's worth a quick pause here and consider the pro's and con's of adding the index fields.

1.) I can see the benefit if you can avoid processing the MARC record when displaying search results, but otherwise, I'm not sure. I've found it quite impossible to avoid reading MARC even for the results list, so all it would take to display the information is to make the SolrMarc records fetch the data from MARC. We actually do that already. I'd be happy to contribute a SolrMarc method that gets the alt script version of a given field.

2.) If you want to find the titles using alternate script search terms in title search, you could just index them in title_alt.

3.) If you'd want to do things really really well, you'd have to add separate fields for each script so that you could do different analysis for them in the index.

@bbusenius
Copy link
Contributor Author

@EreMaijala I appreciate this comment. I'm really just looking to get agreement on an approach at this point, so this is exactly the kind of conversation I want to have. I might have been too hasty in my assumption that it would be desirable to avoid a performance hit here. What I can tell you is that we do something similar as well. Our implementation is very old and messy and could not be slotted over, however, I know this approach is possible. It also sounds similar to what @lahmann is doing, though his implementation only applies to the full record at the moment.

Regarding performance, anecdotally I have not noticed a difference between our custom VuFind and stock VuFind (and we display 60 results by default). Also, like I mentioned, we're not doing this in an optimal way, so I would expect your method to be better. I think I was operating under the assumption that this would not be desirable. Just the simple fact that you're open to this approach, makes me want to elevate it in consideration.

Here are a couple of considerations:

1.) By doing it the way you describe, we would have the slightly strange side effect that a matching title on the results page would not be highlighted. It would match, and the item would be found, but the search term would not be highlighted (though it could highlight below depending on how it was indexed). Regarding this side effect, I will note that it is not a deal breaker for us. This is how it currently works in our catalog and it does not bother anyone as far as I know. So we would be okay with this.

2.) The nice part about doing it in Solr is that it allows us to configure it however we want. We display many different subfields than what stock VuFind does. I'm sure your institutions do as well. Configurability would be a "nice to have". I can imagine there might be a way to achieve this but I will also note, it's probably not a deal breaker for us since we could simply override the method calls as needed and it would still put us more in line with core VuFind than we currently are. I also don't want to overly complicate things.

3.) This probably would not apply to the way you're talking about but I'll mention it because it's important to us. In our implementation, 880s only display if the corresponding linked field is present. This is a bug that we need to remedy. In whatever approach we use, 880s should always display. I don't see any reason why that couldn't be the case with what you're proposing.

Sorry this is so long! I'll just end in saying, the way you're proposing we might want to do this would be a lot cleaner and more concise. I like it for that reason. If everybody is okay with the tradeoffs, I'm certainly open to this. I'll be interested to hear what everyone else thinks.

@bbusenius
Copy link
Contributor Author

@demiankatz thanks for those thoughts. I was focusing on keeping fields analogous but I can see that was misguided. What you say about supporting multiple alternate title forms makes a lot of sense. I think the way I have it set up might work for us in this particular case but I can imagine it might not work for other institutions and might not work in other cases. Always treating the alternate fields as multivalued fields probably makes a lot of sense. You have a better idea as to how this might complicate things than I do but in some ways I could imagine this simplifying things in another way. I guess I'd find out if we took this approach. I think your notes on the set up of indexing rules in pairs make sense.

@demiankatz
Copy link
Member

Just to take a step back, I wonder if we should start by settling on a record driver implementation, and then work backwards from there. The record driver really sets the interface and the functionality, and we can fill it in from the index or the MARC as we see fit. I suspect we may be able to more easily reach consensus on that point first, and then the rest can follow. In fact, depending on how it works out, that might even inform the underlying implementation. But questions about naming, single/multi-valued fields, etc. would all be answered at the record driver level.

You do make a good point that we need to index the fields for proper highlighting, but if that's not a deal-breaker one way or the other, we can defer that part of the implementation.

I would also argue that another possible advantage of the index-oriented approach is that it allows us to populate the same fields from multiple metadata sources and get consistent functionality without having to write extra driver code... but honestly, 880 is pretty specialized, and so I'm not sure if it's likely to come into play much outside of MARC, at least for the moment.

Anyway, just some random thoughts for the moment....

@EreMaijala
Copy link
Contributor

I'm running out of time today, but a short note regarding title_short / title_sub. I find the division problematic in several ways. I understand that there are reasons for it, but the current structure isn't very non-MARC friendly. I'm aware of at least the following issues:

1.) title_short is often too short for identifying a record. Bookmark a record page, and the page title is often missing the crucial bit.
2.) Having to concatenate title_short and title_sub causes the problem of whether some punctuation is needed and what's correct. You can index 245a with trailing punctuation in title_short, but then it shows up also when there's no 245b. If you don't, you'll need to add one, and will be wrong in some cases.
3.) Many formats don't allow for splitting to these two fields anyway.

Here's two examples (first ones I found), see the page titles:
https://vufind.org/demo/Record/1242517
https://vufind.org/demo/Record/1839936

@demiankatz
Copy link
Member

@EreMaijala, I agree that these issues are problematic, and probably long overdue for some kind of solution... but I think that's a whole separate conversation. My feeling is that, no matter what path we take going forward, we should make sure that the 880 and non-880 versions work in a parallel fashion, because if they do not, it will become even more confusing.

Do you have any specific ideas for a better method of handling titles? I wonder if as a starting point we should take inventory of where the various title methods are used, and evaluate whether we can increase standardization. Perhaps a smart truncation algorithm would be preferable to using titleShort in some or all places, and if we reduce the need for that, the other parts might go away too.

@EreMaijala
Copy link
Contributor

@demiankatz Agreed, the discussion doesn't belong here, and working on this one doesn't prevent further work on title. In the long run I'd like to see title_short+title_sub replaced with title at least most of the time. In some places it would mean adding truncation, but then it would be possible to do it in a way suitable for each case.

Okay, back to the issue at hand. I'd propose these as a starting point:
https://github.com/NatLibFi/NDL-VuFind2/blob/dev/module/Finna/src/Finna/RecordDriver/SolrMarc.php#L1193
https://github.com/NatLibFi/NDL-VuFind2/blob/dev/module/Finna/src/Finna/RecordDriver/SolrMarc.php#L1208
https://github.com/NatLibFi/NDL-VuFind2/blob/dev/module/Finna/src/Finna/RecordDriver/SolrMarc.php#L1579

Some modification will be necessary to implement what @bbusenius said in point 3 of #1888 (comment). Currently our code also checks that the linked field sequence number matches the original field. It should be a straighforward change to alter that, but it should be an option since it would lose the link between the original and alt script field and take away the ability to e.g. display the corresponding fields as a pair. It might make sense to do it so that if just a field code is passed, the sequence is not checked, but if a complete MARC field is passed, only its counterpart would suffice.

I acknowledge that the downside of this, as @bbusenius mentioned, is that the fields won't be highlighted. So far it hasn't been a problem for us since title matches are typically quite obvious even without highlighting. In any case, I don't think starting with this would conflict with adding highlighting based on new fields later on.

@bbusenius
Copy link
Contributor Author

@EreMaijala just going by one of our past bug reports and browsing the MARC spec, it looks like the 880 should display without the associated field only when the 880 has a field code (reserved occurrence number) of 00, e.g. 880|6 500-00/$1 |a 卷分上, 中, 下. We have a record like this here: https://catalog.lib.uchicago.edu/vufind/Record/4671634/Details#tabnav. It's a frustrating detail...

@EreMaijala
Copy link
Contributor

@bbusenius Oh, yes, you're right. In that case I think it would be better to have a separate method for fetching the 880 field(s). Actually, it might make sense to add this functionality in the MarcReader class. I think I'll just quickly try something a bit.

@EreMaijala
Copy link
Contributor

See #1895 for some initial work. What do you think?

@EreMaijala
Copy link
Contributor

By the way, #1895 highlights the problem with handling the title in pieces. It would be quite impractical to try to piece together the corresponding parts if there are multiple fields. That's why I used methods that get all the required subfields in one go.

@bbusenius
Copy link
Contributor Author

@EreMaijala the work you've done here! I commented on the other PR. I'm very encouraged by what you've done there. From what I'm seeing, I feel like it would work for us.

@demiankatz
Copy link
Member

@bbusenius, now that #1895 is merged, how would you like to proceed with this PR? Is there functionality here that you still want to see in dev, or does the other solution fully meet your needs? (I know that highlighting is one obvious limitation, but that also didn't seem to be a top-priority problem). Please let me know what you see as the next step(s), and thanks for getting this process started!

@bbusenius
Copy link
Contributor Author

@demiankatz from a functionality standpoint, I think this is good. What we will need next would be "getOtherThingAltScript" methods that could be invoked in the templates the same as what was done for title. I don't really have a spec for this so I don't have a full list but getting the common ones like author seems like it would be good. I would probably just add additional methods to MarcReaderTrait.php and call them in the templates. I was imagining that I would just do that after the merge, so maybe I could take that up?

@EreMaijala
Copy link
Contributor

In some cases where we return structured data it might make sense to include the alt script presentations there. For example getAllSubjectHeadings(true) could include a headingAltScript field. This would allow the alternative script versions to be displayed alongside their counterparts. Otherwise it'll become quite complicated/unmanageable on the template level.
In other cases we may need to start returning structured data...

@demiankatz
Copy link
Member

I agree with @EreMaijala, in cases like subject headings where there is already an option to retrieve detailed structured data from the driver, adding the alt scripts there seems to make a lot of sense. In other cases, it may be justified to add a new switch to the methods to support an alternate structured return format so correlation is possible. (I guess it depends, though, on whether we care about grouping the primary and secondary versions of the same value together, or if we just want to display all the forms as if they were independent values... I'm not really sure how this should look from a UI perspective).

Adding alternate scripts to subject headings may also be motivation to finish up #1791, since this is going to make the subject list even longer. I believe on that one we're currently waiting for somebody to have time to use the generic truncation mechanism in more parts of the code (replacing other methods in facets, etc.). I have not forgotten about that, but I'm still trying to finish off some even older PRs right now.

In any case, for the moment, it looks like the first step is to reconcile conflicts here and decide whether #1895 obviates the need for any part(s) of this work, and then build out the rest on top of the result!

@bbusenius
Copy link
Contributor Author

bbusenius commented May 4, 2021

@demiankatz @EreMaijala thanks for the information. I've been pondering what to do with this one. Firstly, I don't think that anything in this current PR, in its current form, needs to be preserved. From my point of view @EreMaijala's work entirely replaces what is currently here. I'm inclined to say that it might be best for me to do what I can in MarcReaderTrait.php based on @EreMaijala's work in a local module, just to get us in reasonable shape for launching on Folio. Maybe I could skip fields such as subject headings or implement temporary solutions for those. Then that work could inform what I might do in a future PR where I could port what is repurposable and implement solutions for some of the fields that need structured data. That might also help flush out some of the potential UI issues. Then maybe we could write an issue for this somewhere and you guys could help me list out which fields will likely need to use structured data. How does that sound?

@demiankatz
Copy link
Member

Thanks, @bbusenius, that sounds good to me. I'm closing this PR now, and you can open a new one (or a JIRA ticket) when you've had a chance to evaluate more of your local needs and do some experimentation. If I can be of further assistance in the meantime, just let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

880 Support
3 participants