Canonical types and aliases #101
base: main
Branch updated from 216b5b3 to 075b285.
Marcel::MimeType.extend "audio/aac", extensions: %w( aac ), parents: "audio/x-aac"
This style of "overriding" a MIME type makes `.aac` files resolve to `audio/aac` but leaves existing magic-byte matches resolving to `audio/x-aac`.
Replaced with `canonicalize "audio/aac", instead_of: "audio/x-aac"`.
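To make the difference concrete, here is a minimal sketch using a hypothetical in-memory registry (the `Registry` class and its methods are invented for illustration and are not Marcel's API). In this model, `canonicalize` renames the canonical entry, so extension lookups and magic-byte lookups both report the new name, and the old name survives only as an alias. The `ADIF` signature is just sample magic, not a claim about Tika's matchers.

```ruby
# Hypothetical in-memory registry, only for illustration; Marcel's real
# data lives in tables generated from Tika.
class Registry
  def initialize
    @types = {}   # canonical type => { extensions: [...], magic: [[offset, bytes], ...] }
    @aliases = {} # alias => canonical type
  end

  def add(type, extensions: [], magic: [])
    @types[type] = { extensions: extensions, magic: magic }
  end

  # Rename `instead_of` to `type`, keeping its extensions and magic
  # matchers, and make the old name an alias of the new canonical type.
  def canonicalize(type, instead_of:)
    @types[type] = @types.delete(instead_of)
    @aliases.transform_values! { |c| c == instead_of ? type : c }
    @aliases[instead_of] = type
  end

  def canonical(type)
    @aliases.fetch(type, type)
  end

  def for_extension(ext)
    @types.find { |_, data| data[:extensions].include?(ext) }&.first
  end

  def for_magic(bytes)
    @types.find do |_, data|
      data[:magic].any? { |offset, sig| bytes.byteslice(offset, sig.bytesize) == sig }
    end&.first
  end
end

registry = Registry.new
# "ADIF" opens ADIF-format AAC files; used here only as sample magic.
registry.add("audio/x-aac", extensions: %w[aac], magic: [[0, "ADIF"]])
registry.canonicalize("audio/aac", instead_of: "audio/x-aac")

registry.for_extension("aac")     # => "audio/aac"
registry.for_magic("ADIF\0\0\0")  # => "audio/aac"
registry.canonical("audio/x-aac") # => "audio/aac"
```

In the same model, the original `extend "audio/aac", parents: "audio/x-aac"` approach would add a second entry that owns the extension while the matcher stays on `audio/x-aac`. That is the split described above.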
Marcel::MimeType.extend "application/vnd.apple.keynote", extensions: %w( key ), parents: "application/zip"
Marcel::MimeType.extend "application/vnd.apple.pages", parents: "application/zip"
Marcel::MimeType.extend "application/vnd.apple.numbers", parents: "application/zip"
Marcel::MimeType.extend "application/vnd.apple.keynote", parents: "application/zip"
Including the file extensions suggests they're being extended, but they aren't. Omit them to clarify that we're changing the parent.
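A rough illustration, assuming `extend` unions the new extensions and parents with the ones already recorded. This is a simplified model: `extend_type` and `BASE_TYPES` are hypothetical names, not Marcel's code. Under that assumption, passing `%w( key )` again yields the same extension list, so only the parent change has any effect.

```ruby
# Hypothetical starting data: each type maps to its extensions and parents.
BASE_TYPES = {
  "application/vnd.apple.keynote" => { extensions: %w[key], parents: [] }
}.freeze

# Merge new extensions/parents into the existing entry, dropping duplicates.
def extend_type(types, type, extensions: [], parents: [])
  existing = types.fetch(type, { extensions: [], parents: [] })
  types[type] = {
    extensions: (existing[:extensions] + Array(extensions)).uniq,
    parents: (existing[:parents] + Array(parents)).uniq
  }
end

with_ext = BASE_TYPES.transform_values(&:dup)
extend_type(with_ext, "application/vnd.apple.keynote",
            extensions: %w[key], parents: "application/zip")

without_ext = BASE_TYPES.transform_values(&:dup)
extend_type(without_ext, "application/vnd.apple.keynote", parents: "application/zip")

with_ext == without_ext # => true: repeating "key" changed nothing
```

Because the merge deduplicates, the repeated extension is harmless at runtime. It is misleading only to readers, who see what looks like an addition.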
Marcel::MimeType.extend "audio/ogg", extensions: %w( ogg oga ), magic: [[0, 'OggS', [[29, 'vorbis']]]]
Marcel::MimeType.canonicalize "audio/aac", instead_of: "audio/x-aac"
Marcel::MimeType.canonicalize "audio/flac", instead_of: "audio/x-flac"
Marcel::MimeType.canonicalize "audio/x-wav", instead_of: "audio/vnd.wave"
Again, change the canonical type rather than introducing misleading MIME subtypes.
Marcel::MimeType.extend "image/vnd.dwg", magic: [[0, "AC10"]]
Marcel::MimeType.extend "audio/mpc", magic: [[0, "MPCKSH"]], extensions: %w( mpc )
Marcel::MimeType.extend "audio/ogg", extensions: %w( ogg oga ), magic: [[0, 'OggS', [[29, 'vorbis']]]]
Really odd one: it breaks the MIME hierarchy entirely and uses the same magic matcher as `audio/vorbis`. Leaving this for later.
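The ogg matcher `[[0, 'OggS', [[29, 'vorbis']]]]` nests a second check under the first. The sketch below is a simplified evaluator that assumes each matcher is `[offset, bytes, children]` and that a hit also requires one of its children to match. It is not Marcel's actual implementation, which may support more matcher forms. It shows why two types sharing this matcher are ambiguous: whichever is checked first wins.

```ruby
# Simplified magic-matcher evaluation: [offset, bytes, optional children].
# A matcher hits when the bytes appear at the offset and, if children are
# given, at least one child also hits.
def matches?(data, matchers)
  matchers.any? do |offset, bytes, children|
    data.byteslice(offset, bytes.bytesize) == bytes &&
      (children.nil? || matches?(data, children))
  end
end

# Detection walks the table in order and returns the first hit.
def detect(data, table)
  table.find { |_, matchers| matches?(data, matchers) }&.first
end

# "OggS" at offset 0 and "vorbis" at offset 29 (4 + 25 filler bytes).
ogg_vorbis = "OggS" + ("\0" * 25) + "vorbis"
table = [
  ["audio/vorbis", [[0, "OggS", [[29, "vorbis"]]]]],
  ["audio/ogg",    [[0, "OggS", [[29, "vorbis"]]]]]
]

detect(ogg_vorbis, table)         # => "audio/vorbis"
detect(ogg_vorbis, table.reverse) # => "audio/ogg"
```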
Marcel::MimeType.extend "image/heic", magic: [[4, "ftypheic"]], extensions: %w( heic )
Marcel::MimeType.extend "image/avif", magic: [[4, "ftypavif"]]
Marcel::MimeType.extend "image/heif", magic: [[4, "ftypmif1"]]
Marcel::MimeType.extend "image/heic", magic: [[4, "ftypheic"]]
These matchers are already in the Tika data. It's unclear whether they're simply defunct and can be removed.
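The `[4, "ftyp…"]` matchers rely on the ISO base media file format layout. Such a file typically opens with a box whose 4-byte size is followed by the `ftyp` box type at offset 4 and a 4-byte major brand at offset 8. The sketch below reads the brand directly. The brand table covers only the three brands listed above; the `BRANDS` map and `type_from_ftyp` helper are hypothetical and not a complete registry.

```ruby
# Map the major brands from the matchers above to their MIME types.
BRANDS = {
  "heic" => "image/heic",
  "avif" => "image/avif",
  "mif1" => "image/heif"
}.freeze

# Return the MIME type for an ISO BMFF file's major brand, or nil.
def type_from_ftyp(data)
  return nil unless data.byteslice(4, 4) == "ftyp"

  BRANDS[data.byteslice(8, 4)]
end

# 4-byte box size, "ftyp", major brand "heic", then filler.
header = "\x00\x00\x00\x18ftypheic" + ("\0" * 12)
type_from_ftyp(header) # => "image/heic"
```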
Marcel::MimeType.extend "image/x-raw-sony", extensions: %w( arw ), parents: "image/tiff"
Marcel::MimeType.extend "image/x-raw-canon", extensions: %w( cr2 crw ), parents: "image/tiff"
Marcel::MimeType.extend "image/x-raw-canon", parents: "image/tiff"
Drop duplicate extensions.
Marcel::MimeType.extend "video/mp4", magic: [[4, "ftypisom"], [4, "ftypM4V "]], extensions: %w( mp4 m4v )
Marcel::MimeType.extend "audio/flac", magic: [[0, 'fLaC']], extensions: %w( flac ), parents: "audio/x-flac"
Marcel::MimeType.extend "audio/x-wav", magic: [[0, 'RIFF', [[8, 'WAVE']]]], extensions: %w( wav ), parents: "audio/vnd.wav"
Switched to `canonicalize`.
  test "detects #{content_type} given magic bytes from #{name} and aliased type #{aliased}" do
    assert_equal content_type, Marcel::MimeType.for(file, declared_type: aliased)
  end
end
Test that `declared_type:` with an alias resolves to the canonical type. This is what will allow browser uploads that use alias types to be identified correctly.
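A sketch of the idea behind this test, using a hypothetical alias table and `resolve` helper (neither is Marcel's API, and the alias pairs are illustrative). Both the magic-byte guess and the declared type are canonicalized, so an upload declared under an alias is still reported under the canonical name.

```ruby
# Hypothetical alias table (alias => canonical type), for illustration only.
ALIASES = {
  "audio/vnd.wave" => "audio/x-wav",
  "audio/wav"      => "audio/x-wav"
}.freeze

# Map an alias to its canonical type; unknown or canonical names pass through.
def canonical(type)
  type && ALIASES.fetch(type, type)
end

# Prefer the magic-byte guess, fall back to the declared type, and report
# either one under its canonical name.
def resolve(magic_type, declared_type: nil)
  canonical(magic_type) || canonical(declared_type) || "application/octet-stream"
end

resolve(nil, declared_type: "audio/vnd.wave")      # => "audio/x-wav"
resolve("audio/x-wav", declared_type: "audio/wav") # => "audio/x-wav"
```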
Marcel::Magic.add('canonical/child', aliases: 'alias/child', parents: 'canonical/parent')
  assert Marcel::Magic.child?('alias/child', 'alias/parent')
end
Arguments may be alias types, so check that they're resolved to canonicals before checking parentage.
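A hypothetical parentage check that canonicalizes both arguments before walking the parent chain. The type names mirror the test, but the tables are invented, and `alias/parent` is assumed to be registered as an alias of `canonical/parent`. The naive version looks up parents under the alias name and finds none.

```ruby
# Hypothetical tables mirroring the names used in the test.
ALIASES = {
  "alias/child"  => "canonical/child",
  "alias/parent" => "canonical/parent"
}.freeze
PARENTS = { "canonical/child" => ["canonical/parent"] }.freeze

def canonical(type)
  ALIASES.fetch(type, type)
end

# Naive check: looks up parents under whatever name it was given.
def naive_child?(child, parent)
  PARENTS.fetch(child, []).include?(parent)
end

# Alias-aware check: resolve both names, then walk the parent chain.
def child?(child, parent)
  child = canonical(child)
  parent = canonical(parent)
  PARENTS.fetch(child, []).any? { |p| p == parent || child?(p, parent) }
end

naive_child?("alias/child", "alias/parent") # => false
child?("alias/child", "alias/parent")       # => true
```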
existing_magic = Marcel::MAGIC.select { |t, _| t == type }.map(&:last)
if magic.any? && magic == existing_magic
  warn "#{type} already has magic matchers #{magic.inspect}"
end
This is all pretty grotesque. Trying to limit the API footprint and not mess with the carefully tuned data tables.
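A self-contained sketch of the duplicate-matcher warning. It assumes `MAGIC` is an array of `[type, matchers]` pairs, which is what the `select`/`map(&:last)` chain implies, and it uses a small stand-in table built from the matchers in this PR. It checks whether the exact matcher list is already registered for the type via `include?`. That reading of the intent is an assumption, and `warn_on_duplicate_magic` is a hypothetical name.

```ruby
# Stand-in for Marcel::MAGIC: an array of [type, matchers] pairs (assumed shape).
MAGIC = [
  ["image/vnd.dwg", [[0, "AC10"]]],
  ["audio/mpc",     [[0, "MPCKSH"]]]
].freeze

# Warn (and return true) when `type` already has exactly these matchers.
def warn_on_duplicate_magic(type, magic, table = MAGIC)
  existing = table.select { |t, _| t == type }.map(&:last)
  return false unless magic.any? && existing.include?(magic)

  warn "#{type} already has magic matchers #{magic.inspect}"
  true
end

warn_on_duplicate_magic("image/vnd.dwg", [[0, "AC10"]])   # => true, with a warning
warn_on_duplicate_magic("image/vnd.dwg", [[0, "AC1015"]]) # => false
```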
* MIME type aliases are now supported.
* Aliases are resolved to their canonical type in all APIs.
* Introduces `MimeType.canonicalize type, instead_of: old` to override a Tika canonical type with our own. This essentially renames the type and makes the old type an alias of the new one. It's a common scenario with types like WAV, which have multiple competing historical types, RFCs that aren't actually followed, and browser support trumping them all. This lets us give preference to browsers' actual MIME type support while keeping Tika's file extensions and magic-byte matchers.
* Warns when extending a type with preexisting extensions, parents, etc., and when extending an aliased type.
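The last bullet can be sketched with a hypothetical helper (not Marcel's API) that warns when asked to extend a name that is only an alias. The alias table stands in for what `canonicalize "audio/aac", instead_of: "audio/x-aac"` would record. Redirecting to the canonical type, rather than only warning, is an assumption of this sketch.

```ruby
# Hypothetical alias table, as canonicalize might leave it.
ALIASES = { "audio/x-aac" => "audio/aac" }.freeze

# Return the type to extend, warning when the caller named an alias.
# Redirecting to the canonical type is an assumption of this sketch.
def type_to_extend(type)
  canonical = ALIASES[type]
  return type unless canonical

  warn "#{type} is an alias of #{canonical}; extending #{canonical} instead"
  canonical
end

type_to_extend("audio/x-aac") # => "audio/aac", with a warning
type_to_extend("audio/ogg")   # => "audio/ogg"
```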