Video pre-processing pipeline? #1

Open
hkunzhe opened this issue Jul 3, 2024 · 4 comments

Comments

@hkunzhe

hkunzhe commented Jul 3, 2024

Hi, great work! I can't find the specific data collection and pre-processing pipeline. Could you elaborate on the video sources and text annotations?

@nankepan
Collaborator

nankepan commented Jul 4, 2024

Thank you for your attention. Please refer to the data collection and preprocessing procedures in our paper.

@hkunzhe hkunzhe closed this as completed Jul 12, 2024
@hkunzhe hkunzhe reopened this Jul 12, 2024
@hkunzhe
Author

hkunzhe commented Jul 12, 2024

Hi @nankepan, would you mind sharing the data-processing scripts?

@hkunzhe
Author

hkunzhe commented Jul 12, 2024

@nankepan In the third section of the paper, why do you compute the semantic similarity between adjacent frames to distinguish static and flickering videos, instead of directly computing a motion score?

Consider the following two scenarios: 1) a person speaking to the camera with slight body movements, and 2) camera movements such as dolly or pan shots. In the first case, since the background remains stationary, the similarity of the CLIP features should be relatively high. Conversely, in the second case, the camera's motion means the semantic similarity between sampled adjacent frames might actually be quite low. However, neither of these is a static or flickering video.

Could you provide a few examples to illustrate the necessity of this operation?
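To make the contrast concrete, here is a minimal sketch of the two signals being compared. It assumes per-frame embeddings have already been extracted (for example with a CLIP image encoder) and that frames are given as flattened grayscale pixel lists of equal length. The function names and the mean-absolute-difference motion proxy are hypothetical illustrations, not the paper's pipeline.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adjacent_similarities(embeddings: List[Sequence[float]]) -> List[float]:
    """Semantic signal: similarity of each pair of adjacent frame embeddings.

    Assumption: embeddings come from some image encoder such as CLIP.
    """
    return [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]


def mean_abs_frame_difference(frames: List[Sequence[float]]) -> float:
    """Hypothetical motion proxy: mean absolute pixel change between adjacent frames.

    Assumption: frames are equal-length flattened grayscale pixel lists.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append(sum(abs(p - c) for p, c in zip(prev, curr)) / len(prev))
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy embeddings: two identical frames followed by an abrupt content change.
    print(adjacent_similarities([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
    # Toy frames: one large change, then no change.
    print(mean_abs_frame_difference([[0, 0], [2, 2], [2, 2]]))
```

A slow pan can keep the pixel-level motion score high while adjacent embeddings stay fairly similar, and vice versa, which is why the two signals can disagree on the scenarios described above.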

@moonbow721

Hi @nankepan, I also have a question about data preprocessing. How do you annotate the camera motion classes, such as static, pan right, zoom in, zoom in + tilt down, etc.? It would be very helpful if you could release the code or a detailed tutorial.
