This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Commit

🐛 Skip derivative generation for thumbnails
We're seeing jobs that are trying to find the HOCR of a thumbnail; we
don't need that HOCR, and it's spawning 5 unnecessary jobs.

Related to:

- https://github.com/scientist-softserv/adventist-dl/issues/311
jeremyf committed Nov 22, 2023
1 parent 92879b9 commit 81bab28
Showing 2 changed files with 11 additions and 5 deletions.
12 changes: 9 additions & 3 deletions app/jobs/create_derivatives_job_decorator.rb
@@ -1,26 +1,32 @@
# frozen_string_literal: true

# OVERRIDE HYRAX 2.9.5 to conditionally skip derivative generation

# OVERRIDE HYRAX 2.9.6 to conditionally skip derivative generation
module CreateDerivativesJobDecorator
  # @note Override to include conditional validation
  def perform(file_set, file_id, filepath = nil)
    return unless CreateDerivativesJobDecorator.create_derivative_for?(file_set: file_set)
    super
  end

  ##
  # @see https://github.com/scientist-softserv/adventist-dl/issues/311 for discussion on structure
  # of non-Archival PDF.
  NON_ARCHIVAL_PDF_SUFFIXES = [".reader.pdf", ".pdf-r.pdf"].freeze

  ##
  # We should not be creating derivatives for thumbnails.
  FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION = ([".tn.jpg", ".tn.png"] + NON_ARCHIVAL_PDF_SUFFIXES).freeze

  # rubocop:disable Metrics/LineLength
  def self.create_derivative_for?(file_set:)
    # Our options appear to be `file_set.label` or `file_set.original_file.original_name`; in
    # favoring `#label` we are avoiding a call to Fedora. Is the label likely to be the original
    # file name? I hope so.
    return false if NON_ARCHIVAL_PDF_SUFFIXES.any? { |suffix| file_set.label.downcase.end_with?(suffix) }
    return false if FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.any? { |suffix| file_set.label.downcase.end_with?(suffix) }

    true
  end
  # rubocop:enable Metrics/LineLength
end

CreateDerivativesJob.prepend(CreateDerivativesJobDecorator)
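The decorator relies on `Module#prepend`: because `CreateDerivativesJobDecorator` sits ahead of `CreateDerivativesJob` in the ancestor chain, its `#perform` runs first and only calls `super` when `create_derivative_for?` returns true. Below is a minimal, self-contained sketch of that pattern. It assumes two hypothetical stand-ins that are not the real Hyrax classes: `FakeFileSet`, a `Struct` that carries only a `label`, and a stub `CreateDerivativesJob` whose `#perform` just records the labels it receives instead of generating derivatives.

```ruby
# Hypothetical stand-in for a Hyrax FileSet: only the label is needed here.
FakeFileSet = Struct.new(:label)

# Hypothetical stub for Hyrax's CreateDerivativesJob. The real job generates
# derivatives; this one only records which labels it was asked to process.
class CreateDerivativesJob
  attr_reader :performed

  def initialize
    @performed = []
  end

  def perform(file_set, _file_id, _filepath = nil)
    @performed << file_set.label
  end
end

module CreateDerivativesJobDecorator
  NON_ARCHIVAL_PDF_SUFFIXES = [".reader.pdf", ".pdf-r.pdf"].freeze
  FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION = ([".tn.jpg", ".tn.png"] + NON_ARCHIVAL_PDF_SUFFIXES).freeze

  # Runs before CreateDerivativesJob#perform because the module is prepended;
  # a bare `super` forwards the same arguments to the original method.
  def perform(file_set, file_id, filepath = nil)
    return unless CreateDerivativesJobDecorator.create_derivative_for?(file_set: file_set)
    super
  end

  # Case-insensitive suffix check against the combined skip list.
  def self.create_derivative_for?(file_set:)
    label = file_set.label.downcase
    FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.none? { |suffix| label.end_with?(suffix) }
  end
end

CreateDerivativesJob.prepend(CreateDerivativesJobDecorator)

job = CreateDerivativesJob.new
["page-001.tif", "page-001.TN.JPG", "book.reader.pdf", "book.pdf"].each do |label|
  job.perform(FakeFileSet.new(label), "file-id")
end
p job.performed # => ["page-001.tif", "book.pdf"]
```

The sketch folds the two `return false` checks into a single `none?` call. Because `FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION` already includes `NON_ARCHIVAL_PDF_SUFFIXES`, the result is the same as checking both lists in turn.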
4 changes: 2 additions & 2 deletions spec/jobs/create_derivatives_job_decorator_spec.rb
@@ -12,8 +12,8 @@

let(:file_set) { double(FileSet, label: label) }

context 'when the file set is for a non-archival PDF' do
  let(:label) { "my-non-archival#{described_class::NON_ARCHIVAL_PDF_SUFFIXES.first}" }
context 'when the file set is for a skipped suffix' do
  let(:label) { "my-non-archival#{described_class::FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.first}" }

  it { is_expected.to be_falsey }
end
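The updated example builds its label from `FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.first`, so it only exercises `.tn.jpg`. A table-driven check is one way to cover every skipped suffix. The sketch below does this in plain Ruby instead of RSpec so it runs without the application. `create_derivative_for_label?` is a hypothetical helper that copies the decorator's suffix logic but takes a label string in place of a `FileSet` double.

```ruby
# Copies of the decorator's suffix lists, so this sketch is standalone.
NON_ARCHIVAL_PDF_SUFFIXES = [".reader.pdf", ".pdf-r.pdf"].freeze
FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION = ([".tn.jpg", ".tn.png"] + NON_ARCHIVAL_PDF_SUFFIXES).freeze

# Hypothetical helper with the same logic as create_derivative_for?,
# taking the label directly instead of a FileSet double.
def create_derivative_for_label?(label)
  FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.none? { |suffix| label.downcase.end_with?(suffix) }
end

# Every skipped suffix should yield false, mirroring `is_expected.to be_falsey`.
results = FILE_SUFFIXES_TO_SKIP_DERIVATIVE_CREATION.map do |suffix|
  [suffix, create_derivative_for_label?("my-file#{suffix}")]
end
results.each { |suffix, result| puts "#{suffix}: #{result}" }

# A label without a skipped suffix should still get derivatives.
puts "my-file.jpg: #{create_derivative_for_label?('my-file.jpg')}"
```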