Video pre-processing pipeline? #1
Comments
Thank you for your attention. Please refer to the data collection and preprocessing procedures in our paper.
Hi @nankepan, would you mind sharing the data-processing scripts?
@nankepan In the third section of the paper, why is the semantic similarity between adjacent frames computed to differentiate static and flicker videos, instead of directly computing a motion score? Consider two scenarios: 1) a person speaking to the camera with slight bodily movements, and 2) camera movements such as dolly or pan shots. In the former case, since the background remains stationary, the similarity of the CLIP features should also be relatively high. Conversely, in the latter scenario, due to the camera's motion, the semantic similarity between sampled adjacent frames might actually be quite low. However, neither of these is a static or flicker video. Could you provide a few examples to illustrate the necessity of this operation?
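For reference, here is a minimal sketch (not the authors' released code) of the two signals being contrasted above: mean CLIP-feature cosine similarity between sampled adjacent frames versus a raw optical-flow motion score. The model choice (open_clip ViT-B-32), the sampling stride, and any thresholds you would apply on top are illustrative assumptions.

```python
# Sketch only: adjacent-frame CLIP similarity vs. optical-flow motion score.
# Model, stride, and thresholds are assumptions, not the paper's settings.
import cv2
import numpy as np
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai", device=device
)

def sample_frames(video_path, stride=8):
    """Read every `stride`-th frame of the video as an RGB numpy array."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames

def clip_adjacent_similarity(frames):
    """Mean cosine similarity of CLIP image features between adjacent sampled frames."""
    with torch.no_grad():
        feats = torch.cat([
            model.encode_image(preprocess(Image.fromarray(f)).unsqueeze(0).to(device))
            for f in frames
        ])
        feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats[:-1] * feats[1:]).sum(dim=-1).mean().item()

def optical_flow_motion_score(frames):
    """Mean Farneback optical-flow magnitude between adjacent sampled frames."""
    mags = []
    for a, b in zip(frames[:-1], frames[1:]):
        ga = cv2.cvtColor(a, cv2.COLOR_RGB2GRAY)
        gb = cv2.cvtColor(b, cv2.COLOR_RGB2GRAY)
        flow = cv2.calcOpticalFlowFarneback(ga, gb, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(np.linalg.norm(flow, axis=-1).mean())
    return float(np.mean(mags))
```

Under this sketch, a talking-head clip would typically score high on CLIP similarity but low on flow magnitude, while a dolly or pan shot could score lower on CLIP similarity despite being a valid dynamic video, which is exactly the tension the question above points at.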
Hi @nankepan, I also have a question about data preprocessing. How do you annotate the camera motion classes such as static, pan right, zoom in, zoom in + tilt down, etc.? It would be very helpful if you could release the code or a detailed tutorial.
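For context, one common heuristic (not necessarily what the authors did) is to infer a coarse camera-motion label from dense optical flow: near-zero mean flow suggests a static camera, a dominant horizontal or vertical mean suggests pan or tilt, and a consistent radial component around the frame center suggests zoom. A rough sketch with arbitrary thresholds follows; sign conventions depend on the flow definition and would need checking against the actual label set.

```python
# Rough heuristic sketch (not the paper's annotation pipeline):
# classify camera motion between two grayscale frames from Farneback optical flow.
import cv2
import numpy as np

def camera_motion_label(prev_gray, next_gray, trans_thresh=1.0, zoom_thresh=0.5):
    h, w = prev_gray.shape
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_dx, mean_dy = flow[..., 0].mean(), flow[..., 1].mean()

    # Radial component: project the flow onto directions pointing away from the
    # frame center; a consistently positive mean suggests zoom in, negative zoom out.
    ys, xs = np.mgrid[0:h, 0:w]
    rx, ry = xs - w / 2, ys - h / 2
    norm = np.sqrt(rx**2 + ry**2) + 1e-6
    mean_radial = ((flow[..., 0] * rx + flow[..., 1] * ry) / norm).mean()

    labels = []
    # Camera panning right makes scene content move left in image coordinates,
    # hence the inverted signs below (an assumption about the convention).
    if abs(mean_dx) > trans_thresh:
        labels.append("pan right" if mean_dx < 0 else "pan left")
    if abs(mean_dy) > trans_thresh:
        labels.append("tilt down" if mean_dy < 0 else "tilt up")
    if abs(mean_radial) > zoom_thresh:
        labels.append("zoom in" if mean_radial > 0 else "zoom out")
    return " + ".join(labels) if labels else "static"
```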
Hi, great work! I can't find a description of the data collection and pre-processing pipeline. Could you elaborate on the video sources and text annotations?