[benchmark/WIP] Benchmarking pyscenedetect detectors' performance #484
I looked into the LoVEU Challenge metric, but a relative distance threshold is not appropriate for shot detection, especially on long videos. For a 1-hour video, 5% corresponds to 180 seconds (3 minutes), which is far too loose for shot detection. Computing correctness from frame numbers is much more accurate, so I will implement the evaluation based on absolute frame numbers (plus a threshold distance).
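To make the frame-based idea concrete, here is a minimal sketch of how such an evaluation could look. It is not the code in this PR: the function name `evaluate_cuts`, the `tolerance` parameter, and the greedy one-to-one matching are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def evaluate_cuts(
    predicted: Sequence[int],
    ground_truth: Sequence[int],
    tolerance: int = 2,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted cut frames.

    Hypothetical helper: a predicted cut counts as correct when it lies
    within `tolerance` frames of a ground-truth cut that has not been
    matched yet (greedy one-to-one matching over sorted frame numbers).
    """
    pred = sorted(predicted)
    gt = sorted(ground_truth)
    matched = 0
    i = j = 0
    while i < len(pred) and j < len(gt):
        if abs(pred[i] - gt[j]) <= tolerance:
            # Close enough: consume both the prediction and the ground truth.
            matched += 1
            i += 1
            j += 1
        elif pred[i] < gt[j]:
            # Prediction is too early to match anything: false positive.
            i += 1
        else:
            # Ground-truth cut was passed without a match: false negative.
            j += 1
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up frame numbers, for illustration only.
precision, recall, f1 = evaluate_cuts([100, 250, 400, 900], [101, 255, 400], tolerance=2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the tolerance is a fixed number of frames, it does not grow with the video length the way a 5% relative threshold does.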
Hmm... I am implementing the benchmarking code on RAI, but the performance is quite low.
The output of ContentDetector is here (94 scenes are detected):
Actually, the predicted scenes are not bad. I attached examples of the detected shots here. Predicted shots:
This leaderboard also notes that the ground truth in RAI contains imperfections (they were collected in their private benchmark).
I implemented the performance evaluation. The results are reported as recall, precision, F1, and elapsed time (seconds).
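For the elapsed-time column, a rough sketch of how a single detector run could be timed is shown below. The helper `time_detector` and the stand-in `fake_detector` are hypothetical and not part of this PR; in a real run the callable would wrap an actual PySceneDetect detector applied to a video file.

```python
import time
from typing import Callable, List, Tuple


def time_detector(run: Callable[[], List[int]]) -> Tuple[List[int], float]:
    """Call `run` once and return (detected cut frames, elapsed seconds).

    Hypothetical helper: `run` is any zero-argument callable that performs
    detection and returns a list of cut frame numbers.
    """
    start = time.perf_counter()
    cuts = run()
    elapsed = time.perf_counter() - start
    return cuts, elapsed


def fake_detector() -> List[int]:
    # Stand-in for a real detector run; returns made-up cut frames.
    return [120, 480, 910]


cuts, elapsed = time_detector(fake_detector)
print(f"{len(cuts)} cuts detected in {elapsed:.4f} s")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring intervals.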
As you can see, ContentDetector achieves the highest performance among the detectors. Unfortunately, the other detectors achieve poor performance on the BBC dataset. It might be necessary to tune parameters, e.g. the threshold, to obtain better results.
Awesome, thanks so much for getting started on this! We definitely have some work to do with tuning the other detectors, but this gives us a nice framework to actually validate those changes. I've approved these changes as-is so you can merge it whenever you would like, but feel free to make more changes if you have any plans. Otherwise feel free to submit a follow-up PR with other changes. I'm excited to start trying this myself - thank you again for getting this up and running!
@Breakthrough
Discussed in #481
This PR aims to evaluate the performance of PySceneDetect detectors in terms of both latency and accuracy.
The accuracy measurement is based on LoveU Challenge Track 1, which calculates recall, precision, and F1 scores. The following is quoted from the task description:
Question: Is F1@5% OK? I think a 5% difference is significant, so changing the threshold to <1% may be better.
I will use two datasets: RAI and BBC. Both datasets are uploaded on Zenodo. Please check README.md in the `benchmark/` directory to download them.