
Performance evaluation of PySceneDetect in terms of both latency and accuracy #481

Open
awkrail opened this issue Feb 11, 2025 · 4 comments


awkrail commented Feb 11, 2025

Problem/Use Case

Currently, PySceneDetect does not support evaluation. However, evaluating its performance is crucial for further development. I propose a feature to integrate evaluation code into PySceneDetect. This issue describes the procedure.

Solutions

Datasets

To evaluate the performance, we need datasets that consist of videos and manually annotated shots. I surveyed shot detection on Google Scholar and found that the following datasets have been proposed. I think BCC and RAI are a good starting point because they are frequently used in the shot detection literature and are small, so they are easy to download. In addition, Kinetics-GEBD, ClipShot, and AutoShot collected their videos from YouTube, so using them in our evaluation protocol may violate YouTube's policy.

| Dataset | Conference | Domain | #Videos | Avg. video length (s) | #Citations | Paper title |
| --- | --- | --- | --- | --- | --- | --- |
| BCC | ACMMM15 | Broadcast | 11 | 2,945 | 133 | A deep siamese network for scene detection in broadcast videos |
| RAI | CAIP15 | Broadcast | 10 | 591 | 86 | Shot and scene detection via clustering for re-using broadcast video |
| Kinetics-GEBD | ICCV21 | General | 55,351 | n/a | 81 | Generic Event Boundary Detection: A Benchmark for Event Detection |
| ClipShot | ACCV18 | General | 4,039 | 237 | 54 | Fast Video Shot Transition Localization with Deep Structured Models |
| AutoShot | CVPR Workshop 23 | General | 853 | 39 | 13 | AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection |

Metrics

The previous literature uses recall, precision, and F1 scores to evaluate shot detection methods. Let $\hat{Y}=(\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_k, \cdots, \hat{y}_K)$ be the predicted shot boundary frame numbers and $Y=(y_1, y_2, \cdots, y_l, \cdots, y_L)$ be the manually annotated shot boundary frame numbers.
Recall and precision are computed as in the following Python code:

```python
def compute_f1(hat_ys, ys):
    """Compute recall, precision, and F1 for predicted boundaries (hat_ys)
    against manually annotated boundaries (ys), both given as frame numbers."""
    threshold = 5  # a prediction is correct if abs(hat_y - y) <= threshold
    correct = 0
    for hat_y in hat_ys:
        if min(abs(hat_y - y) for y in ys) <= threshold:
            correct += 1
    recall = correct / len(ys)
    precision = correct / len(hat_ys)
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1
```

Note that this code is only a rough sketch of the evaluation process. For a precise implementation I will need to handle edge cases, e.g. two predictions $\hat{y}$ falling near the same annotated boundary $y$ (the many-to-one case).
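
As one way the many-to-one case could be handled (my suggestion here, not necessarily how the papers above evaluate), each annotated boundary could be matched to at most one prediction:

```python
def compute_f1_matched(hat_ys, ys, threshold=5):
    """Variant of compute_f1 where each annotated boundary is matched to at
    most one prediction, so duplicate predictions count against precision."""
    matched = set()  # indices of annotated boundaries already claimed
    correct = 0
    for hat_y in sorted(hat_ys):
        # Distance to every annotated boundary that is still unmatched.
        candidates = [(abs(hat_y - y), i) for i, y in enumerate(ys)
                      if i not in matched]
        if not candidates:
            break
        dist, idx = min(candidates)
        if dist <= threshold:
            matched.add(idx)
            correct += 1
    recall = correct / len(ys) if ys else 0.0
    precision = correct / len(hat_ys) if hat_ys else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f1
```

With this variant, a second prediction near an already-matched boundary no longer inflates the correct count; it lowers precision instead.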

Implementation

I believe two evaluation modes are necessary: a local mode and a CI mode.
For local mode, I created an evaluation/ directory at the top level of the repository and wrote Python scripts to run the evaluation on a local laptop.
For CI mode, building on the evaluation/ directory, we would set up GitHub Actions to automatically run the evaluation commands whenever new commits are pushed. A rough sketch of a local evaluation script is included below.
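
To make the local mode concrete, here is a sketch of what such a script could look like. The dataset layout, annotations.json format, and function names are assumptions for illustration only; it reuses compute_f1 from the metrics snippet above and PySceneDetect's detect()/ContentDetector API, and it also reports per-video latency since this issue covers both accuracy and speed.

```python
import json
import time
from pathlib import Path

from scenedetect import ContentDetector, detect

# Hypothetical: assumes the metric snippet above lives in evaluation/metrics.py.
from evaluation.metrics import compute_f1


def predicted_boundaries(video_path):
    """Run PySceneDetect on one video and return predicted cut frame numbers."""
    scene_list = detect(str(video_path), ContentDetector())
    # Each cut is the first frame of every scene except the first one.
    return [start.get_frames() for start, _ in scene_list[1:]]


def run_benchmark(dataset_dir):
    """Evaluate every video in a dataset directory.

    Assumed (hypothetical) layout: <dataset_dir>/videos/*.mp4 plus an
    annotations.json mapping each video file name to its list of
    ground-truth boundary frame numbers.
    """
    dataset_dir = Path(dataset_dir)
    annotations = json.loads((dataset_dir / "annotations.json").read_text())
    for video_path in sorted((dataset_dir / "videos").glob("*.mp4")):
        ys = annotations[video_path.name]
        start_time = time.perf_counter()
        hat_ys = predicted_boundaries(video_path)
        latency = time.perf_counter() - start_time
        recall, precision, f1 = compute_f1(hat_ys, ys)
        print(f"{video_path.name}: recall={recall:.3f} precision={precision:.3f} "
              f"f1={f1:.3f} latency={latency:.1f}s")


if __name__ == "__main__":
    run_benchmark("datasets/RAI")
```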

Questions

How do we store the RAI and BCC video datasets? Because the video files are larger than GitHub's file size limit (100 MB), we need a storage service.
Zenodo is one candidate because it allows us to store datasets for academic purposes and to download them in a CLI-friendly manner (e.g. with curl or wget). A hypothetical download helper is sketched below.
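
As an illustration of that download path, a fetch-and-verify step could look like the following. The Zenodo record URL, file name, and checksum are placeholders, since no deposit exists yet.

```python
import hashlib
import urllib.request

# Placeholder values: a real Zenodo deposit would be created for the
# benchmark data and its URL/checksum pinned here.
DATASET_URL = "https://zenodo.org/records/<record-id>/files/rai_dataset.zip"
DATASET_SHA256 = "<expected-sha256>"


def download_dataset(url=DATASET_URL, out_path="rai_dataset.zip"):
    """Download a dataset archive and verify its checksum before use."""
    urllib.request.urlretrieve(url, out_path)
    with open(out_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != DATASET_SHA256:
        raise RuntimeError(f"Checksum mismatch for {out_path}: {digest}")
    return out_path
```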

@Breakthrough (Owner) commented

This would be fantastic to have, thanks for writing this up. Have you been able to run any evaluations locally? Feel free to upload a pull request with any scripts you might have. Even if they can only be run locally by a developer, or if they need to download files, that's okay for now.

Once we have a workflow that's easy enough to run locally, I don't mind looking into the other issues you raised about how to do this with Github Actions. Thanks for the link to Zenodo as well by the way, that looks super useful. We might be able to find other existing datasets on there as well in the future.

Note that we could also use Git LFS on GitHub and store the artifacts in a separate repository. I registered a GitHub organization called PySceneDetect, so we could set up a repo there for this purpose.


awkrail commented Feb 12, 2025

@Breakthrough Thank you for your reply.

> Have you been able to run any evaluations locally? Feel free to upload a pull request with any scripts you might have. Even if they can only be run locally by a developer, or if they need to download files, that's okay for now.

I just started writing code for it. Which directory structure do you prefer: PySceneDetect/scenedetect/evaluation or PySceneDetect/evaluation? I’m wondering if I should place my Python code in PySceneDetect/scenedetect/evaluation, since the current code is located in PySceneDetect/scenedetect.

> Note that we could also use Git LFS on GitHub and store the artifacts in a separate repository. I registered a GitHub organization called PySceneDetect, so we could set up a repo there for this purpose.

GitHub LFS seems like a good option. Because the video sizes are limited, LFS might be better if you allow me to use it.
In any case, I will first create a PR for the local evaluation commands, and then a follow-up PR for GitHub Actions (CI mode).


Breakthrough commented Feb 13, 2025

> Which directory structure do you prefer: PySceneDetect/scenedetect/evaluation or PySceneDetect/evaluation? I’m wondering if I should place my Python code in PySceneDetect/scenedetect/evaluation, since the current code is located in PySceneDetect/scenedetect.

There are actually issues with placing sub-folders under the scenedetect/ folder, since it acts as the Python package. Could you create a new folder called benchmarks in the root of the repo and use that?

> GitHub LFS seems like a good option. Because the video sizes are limited, LFS might be better if you allow me to use it. In any case, I will first create a PR for the local evaluation commands, and then a follow-up PR for GitHub Actions (CI mode).

Hmm, I did some more digging and using Github for this might not be tenable - the bandwidth limit for free accounts is 1 GiB. Pricing is $0.07/GiB of storage and ~$0.09/GiB for bandwidth. The ClipShot dataset alone is around 45 GiB, which works out to $3.15, plus say ~$4 per download of the entire thing, so that would add up fast. We unfortunately would have to pay those bandwidth costs for Github Actions too each time we run this here.

Let me do some more research into this aspect. I think we definitely need to have some kind of backup mirror for the project's purposes, but we shouldn't need to be blocked on that. Running it locally is also fine for now. Maybe we can choose a small sub-set of the full data that will be good enough for most purposes. E.g. we all choose ~100 videos or so, and limit the dataset size used in the CI actions so we don't hit bandwidth limits. If we can keep the costs under control, I'm happy to cover them.

For now, could we just include links to where to download the datasets, and instructions on where to put them to run the benchmarks?


awkrail commented Feb 13, 2025

> There are actually issues with placing sub-folders under the scenedetect/ folder, since it acts as the Python package. Could you create a new folder called benchmarks in the root of the repo and use that?

> For now, could we just include links to where to download the datasets, and instructions on where to put them to run the benchmarks?

Got it! Thanks.

> Hmm, I did some more digging and using Github for this might not be tenable - the bandwidth limit for free accounts is 1 GiB. Pricing is $0.07/GiB of storage and ~$0.09/GiB for bandwidth. The ClipShot dataset alone is around 45 GiB, which works out to $3.15, plus say ~$4 per download of the entire thing, so that would add up fast. We unfortunately would have to pay those bandwidth costs for Github Actions too each time we run this here.
> Let me do some more research into this aspect. I think we definitely need to have some kind of backup mirror for the project's purposes, but we shouldn't need to be blocked on that. Running it locally is also fine for now. Maybe we can choose a small sub-set of the full data that will be good enough for most purposes. E.g. we all choose ~100 videos or so, and limit the dataset size used in the CI actions so we don't hit bandwidth limits. If we can keep the costs under control, I'm happy to cover them.

I agree with selecting a subset of videos from the datasets to reduce the cost. RAI and BCC are broadcast videos, so I want to pick diverse videos from ClipShot or AutoShot as well. In addition, I think each video's license is also important; Creative Commons or other free-to-use licenses are desirable.
