allow payloads to be propagated to new tasks #56

Open
wants to merge 1 commit into
base: master

conitrade-as
Contributor

This change set adds the ability to propagate payloads from received tasks to new tasks for extracted files.

This makes it possible, for example, to record in the parent task that the file originally came from an e-mail.

task.add_payload('ext_origin_id', 'email', persistent=True)

Any downstream karton consumers can use these propagated payloads to fine-tune their decision making. A simple example is provided below. We want to flag .docm files in a .zip archive received over e-mail:

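# Inside a downstream consumer's process(self, task) method:
sample = task.get_resource('sample')
# Payload propagated from the parent task
ext_origin_id = task.get_payload('ext_origin_id')
# Truthy when the file was extracted from an archive
extraction_level = task.get_payload('extraction_level')
if '.docm' in sample.name and extraction_level and ext_origin_id == 'email':
    print('.docm files extracted from .zip over email are considered super suspicious')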

conitrade-as force-pushed the feature/payload-propagation branch from dcc17bc to 197dfe2 on April 18, 2024, 10:39
@conitrade-as
Contributor Author

@psrok1 @nazywam @msm-code Could one of you please take a look at this PR?

Member

msm-cert left a comment

Hi, we have some ideas for improving the implementation, and those issues are solvable. However, we don't fully understand the use case. For example, take the snippet from the description:

task.add_payload('ext_origin_id', 'email', persistent=True)

You then use your changes to automatically copy the ext_origin_id payload to all the subtasks. But you don't have to do this, because the payload is already "persistent", so it is copied automatically to all child tasks.

In general, to the best of my knowledge, this PR is a no-op if a payload is already persistent.

The only situation where it does something is when you want to automatically copy a payload to the subtasks of the archive extractor but don't want it to be persistent. In the chain A -> archive extractor -> B -> C -> D, the payload would then propagate to B, but not to C and D.

Do you have a use case for this?
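To make the distinction concrete, here is a rough toy model of the two behaviours. It is not karton's implementation: ToyTask, its derive() method, the propagate argument and the 'mailbox-1' value are all made up for illustration. It only assumes that persistent payloads are copied to every derived task, while a non-persistent payload reaches a child only if it is explicitly propagated.

```python
class ToyTask:
    """Toy stand-in for a task; NOT the real karton Task class."""

    def __init__(self):
        self.payload = {}             # non-persistent payloads
        self.payload_persistent = {}  # payloads inherited by all descendants

    def add_payload(self, name, content, persistent=False):
        target = self.payload_persistent if persistent else self.payload
        target[name] = content

    def get_payload(self, name, default=None):
        if name in self.payload_persistent:
            return self.payload_persistent[name]
        return self.payload.get(name, default)

    def derive(self, propagate=()):
        """Create a child task (hypothetical helper).

        Persistent payloads are always copied. Non-persistent payloads
        are copied only if their name is listed in `propagate`.
        """
        child = ToyTask()
        child.payload_persistent = dict(self.payload_persistent)
        for name in propagate:
            if name in self.payload:
                child.payload[name] = self.payload[name]
        return child


# Task A, as received by the archive extractor
a = ToyTask()
a.add_payload('ext_origin_id', 'email', persistent=True)
a.add_payload('ext_source_id', 'mailbox-1')  # hypothetical, non-persistent

# Archive extractor creates B and propagates both names
b = a.derive(propagate=['ext_origin_id', 'ext_source_id'])
# A later service creates C without any propagation
c = b.derive()

for label, task in (('B', b), ('C', c)):
    print(label, task.get_payload('ext_origin_id'), task.get_payload('ext_source_id'))
```

In this model, listing ext_origin_id in propagate changes nothing because it is already persistent, while ext_source_id reaches B but is gone by C.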

If you have a use case for this (please provide an example), we are OK with this change (speaking mostly for myself here), but it should be configured in a much simpler way. Instead of:

[archive-extractor]
...
[archive-extractor-payload-propagation]
ext_origin_id = False
ext_source_id = False

we can just do:

[archive-extractor]
propagate_payloads = ext_origin_id,ext_source_id
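As a rough sketch of how such a comma-separated option could be read, the snippet below uses Python's standard configparser. The helper name parse_propagate_payloads is hypothetical, and a real service would read the option through whatever configuration mechanism it already uses; this only illustrates the parsing step.

```python
import configparser

CONFIG_TEXT = """
[archive-extractor]
propagate_payloads = ext_origin_id,ext_source_id
"""


def parse_propagate_payloads(config_text, section="archive-extractor"):
    """Return the payload names listed in propagate_payloads.

    Hypothetical helper: a missing section or option yields an empty list,
    and surrounding whitespace or empty entries are ignored.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get(section, "propagate_payloads", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


names = parse_propagate_payloads(CONFIG_TEXT)
print(names)
```

With a single option there is no need for a separate section, and an absent option naturally means "propagate nothing".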
