[benchmark/WIP] Benchmarking pyscenedetect detectors' performance #484

Open
wants to merge 2 commits into
base: main
from


awkrail
Contributor

@awkrail awkrail commented Feb 14, 2025

Discussed in #481
This PR aims to evaluate the performance of PySceneDetect's detectors in terms of both latency and accuracy.
The accuracy measurement is based on LoveU Challenge Track 1, which calculates recall, precision, and F1 scores. The following is quoted from the task description:

- We use Relative Distance (Rel.Dis) to determine the correctness of each prediction. Rel.Dis is the error between detected and ground-truth timestamps, divided by the length of the whole video. Given a fixed threshold for Rel.Dis, we can determine whether a detection is correct (i.e. <=threshold) or incorrect (i.e. >threshold), then compute precision, recall and F1 score on the whole dataset.
- Note that for each video, we have multiple raters who make annotations independently. We (1) compare our detection result with each rater’s annotation and (2) select GT for this video as the rater’s annotation which leads to the best F1 score among all raters.
- The official metric used in this task is F1@5%, which is defined as the F1 score computed with threshold 5%.
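
The quoted metric can be sketched in Python as follows. This is a minimal illustration, not the official LoveU evaluation code: the function names, the greedy one-to-one matching, and the per-video scoring are assumptions of this sketch (the task description aggregates over the whole dataset).

```python
from typing import List, Sequence, Tuple


def match_counts(pred: Sequence[float], gt: Sequence[float],
                 duration: float, threshold: float = 0.05) -> Tuple[int, int, int]:
    """Return (true positives, number of predictions, number of GT boundaries).

    A prediction counts as correct when |pred - gt| / duration <= threshold.
    Each ground-truth boundary is matched at most once (assumption).
    """
    unmatched: List[float] = sorted(gt)
    tp = 0
    for p in sorted(pred):
        # Closest ground-truth boundary that has not been matched yet.
        best = min(unmatched, key=lambda g: abs(g - p), default=None)
        if best is not None and abs(best - p) / duration <= threshold:
            unmatched.remove(best)
            tp += 1
    return tp, len(pred), len(gt)


def f1_score(tp: int, n_pred: int, n_gt: int) -> float:
    """F1 from match counts; 0.0 when there is nothing to score."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_rater_f1(pred: Sequence[float], raters: Sequence[Sequence[float]],
                  duration: float, threshold: float = 0.05) -> float:
    """Score against every rater and keep the best F1, as in step (2) above."""
    return max(f1_score(*match_counts(pred, gt, duration, threshold))
               for gt in raters)
```

Timestamps and `duration` are in the same unit (seconds here). Lowering `threshold` from 0.05 to 0.01 tightens the tolerance from 5% to 1% of the video length.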

Question: is F1@5% acceptable? A 5% tolerance seems too loose to me, so lowering the threshold to below 1% may be better.

I will use two datasets: RAI and BBC. Both are uploaded to Zenodo; see README.md in the benchmark/ directory for how to download them.

@awkrail
Contributor Author

awkrail commented Feb 14, 2025

I looked into the LoVEU Challenge metric, but relative distance is not appropriate for shot detection, especially on long videos. For a one-hour video, a 5% threshold allows an error of 180 seconds (3 minutes), which is far too loose for shot detection. Measuring correctness in frame numbers is much more accurate, so I will implement the evaluation on absolute frame numbers, with a tolerance expressed as a distance in frames.
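
As a rough sketch of the difference, the snippet below first converts a relative threshold into seconds and then scores cuts on absolute frame numbers with a frame tolerance. The function names and the one-to-one matching rule are assumptions made for illustration, not the code in this PR.

```python
from typing import Iterable, Tuple


def relative_tolerance_seconds(duration_s: float, rel_threshold: float) -> float:
    """Error allowed by a Rel.Dis threshold, in seconds."""
    return duration_s * rel_threshold


def frame_prf(pred_cuts: Iterable[int], gt_cuts: Iterable[int],
              tolerance_frames: int = 0) -> Tuple[float, float, float]:
    """Precision, recall and F1 on absolute frame numbers.

    A ground-truth cut is found when an unused predicted cut lies within
    `tolerance_frames` of it; each prediction is used at most once.
    """
    remaining = sorted(set(pred_cuts))
    n_pred = len(remaining)
    gt = sorted(set(gt_cuts))
    tp = 0
    for g in gt:
        candidates = [p for p in remaining if abs(p - g) <= tolerance_frames]
        if candidates:
            # Consume the nearest prediction so it cannot match twice.
            remaining.remove(min(candidates, key=lambda p: abs(p - g)))
            tp += 1
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# One hour at a 5% threshold: the 3-minute tolerance mentioned above.
print(relative_tolerance_seconds(3600, 0.05))
# The same predictions scored with zero tolerance and with 5 frames.
print(frame_prf([100, 205, 400], [100, 200, 300], tolerance_frames=0))
print(frame_prf([100, 205, 400], [100, 200, 300], tolerance_frames=5))
```

With a frame tolerance the allowed error stays constant regardless of video length, which is the property the relative threshold lacks.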

@awkrail
Contributor Author

awkrail commented Feb 15, 2025

Hmm... I am implementing the benchmarking code on RAI, but the measured performance is quite low.
I applied scenedetect w/ ContentDetector to https://zenodo.org/records/14865179/files/1.mp4?download=1 (1.mp4)
The annotated scenes (scenes_1.txt) are:

1 717
718 1590
1591 2333
2334 2488
2489 2733
2734 5576
5577 14541
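
For reference, an annotation in this form can be turned into cut positions with a few lines of Python. This is a sketch that assumes the file holds one whitespace-separated "start end" pair of frame numbers per line, as shown above; the helper names are hypothetical.

```python
from typing import List, Tuple

# Contents of scenes_1.txt as listed above.
ANNOTATION = """1 717
718 1590
1591 2333
2334 2488
2489 2733
2734 5576
5577 14541
"""


def parse_scene_file(text: str) -> List[Tuple[int, int]]:
    """Parse "start end" frame pairs, skipping blank or malformed lines."""
    scenes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            scenes.append((int(parts[0]), int(parts[1])))
    return scenes


def scenes_to_cuts(scenes: List[Tuple[int, int]]) -> List[int]:
    """A cut is the first frame of every scene except the first one."""
    return [start for start, _ in scenes[1:]]


print(scenes_to_cuts(parse_scene_file(ANNOTATION)))
# [718, 1591, 2334, 2489, 2734, 5577]
```

Seven annotated scenes therefore give only six ground-truth cuts for this video.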

The output of ContentDetector (94 scenes detected), obtained with scenedetect -i RAI/1.mp4 list-scenes, is:

Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,522,00:00:20.880,20.880,522,00:00:20.880,20.880
2,523,00:00:20.880,20.880,717,00:00:28.680,28.680,195,00:00:07.800,7.800
3,718,00:00:28.680,28.680,764,00:00:30.560,30.560,47,00:00:01.880,1.880
4,765,00:00:30.560,30.560,823,00:00:32.920,32.920,59,00:00:02.360,2.360
5,824,00:00:32.920,32.920,872,00:00:34.880,34.880,49,00:00:01.960,1.960
6,873,00:00:34.880,34.880,918,00:00:36.720,36.720,46,00:00:01.840,1.840
7,919,00:00:36.720,36.720,964,00:00:38.560,38.560,46,00:00:01.840,1.840
8,965,00:00:38.560,38.560,1017,00:00:40.680,40.680,53,00:00:02.120,2.120
9,1018,00:00:40.680,40.680,1067,00:00:42.680,42.680,50,00:00:02.000,2.000
10,1068,00:00:42.680,42.680,1123,00:00:44.920,44.920,56,00:00:02.240,2.240
11,1124,00:00:44.920,44.920,1168,00:00:46.720,46.720,45,00:00:01.800,1.800
12,1169,00:00:46.720,46.720,1225,00:00:49.000,49.000,57,00:00:02.280,2.280
13,1226,00:00:49.000,49.000,1274,00:00:50.960,50.960,49,00:00:01.960,1.960
14,1275,00:00:50.960,50.960,1328,00:00:53.120,53.120,54,00:00:02.160,2.160
15,1329,00:00:53.120,53.120,1590,00:01:03.600,63.600,262,00:00:10.480,10.480
16,1591,00:01:03.600,63.600,1640,00:01:05.600,65.600,50,00:00:02.000,2.000
17,1641,00:01:05.600,65.600,1686,00:01:07.440,67.440,46,00:00:01.840,1.840
18,1687,00:01:07.440,67.440,1734,00:01:09.360,69.360,48,00:00:01.920,1.920
19,1735,00:01:09.360,69.360,1889,00:01:15.560,75.560,155,00:00:06.200,6.200
20,1890,00:01:15.560,75.560,1936,00:01:17.440,77.440,47,00:00:01.880,1.880
21,1937,00:01:17.440,77.440,1988,00:01:19.520,79.520,52,00:00:02.080,2.080
22,1989,00:01:19.520,79.520,2041,00:01:21.640,81.640,53,00:00:02.120,2.120
23,2042,00:01:21.640,81.640,2088,00:01:23.520,83.520,47,00:00:01.880,1.880
24,2089,00:01:23.520,83.520,2138,00:01:25.520,85.520,50,00:00:02.000,2.000
25,2139,00:01:25.520,85.520,2189,00:01:27.560,87.560,51,00:00:02.040,2.040
26,2190,00:01:27.560,87.560,2236,00:01:29.440,89.440,47,00:00:01.880,1.880
27,2237,00:01:29.440,89.440,2285,00:01:31.400,91.400,49,00:00:01.960,1.960
28,2286,00:01:31.400,91.400,2333,00:01:33.320,93.320,48,00:00:01.920,1.920
29,2334,00:01:33.320,93.320,2385,00:01:35.400,95.400,52,00:00:02.080,2.080
30,2386,00:01:35.400,95.400,2430,00:01:37.200,97.200,45,00:00:01.800,1.800
31,2431,00:01:37.200,97.200,2488,00:01:39.520,99.520,58,00:00:02.320,2.320
32,2489,00:01:39.520,99.520,2547,00:01:41.880,101.880,59,00:00:02.360,2.360
33,2548,00:01:41.880,101.880,2594,00:01:43.760,103.760,47,00:00:01.880,1.880
34,2595,00:01:43.760,103.760,2651,00:01:46.040,106.040,57,00:00:02.280,2.280
35,2652,00:01:46.040,106.040,2698,00:01:47.920,107.920,47,00:00:01.880,1.880
36,2699,00:01:47.920,107.920,2733,00:01:49.320,109.320,35,00:00:01.400,1.400
37,2734,00:01:49.320,109.320,3432,00:02:17.280,137.280,699,00:00:27.960,27.960
38,3433,00:02:17.280,137.280,3701,00:02:28.040,148.040,269,00:00:10.760,10.760
39,3702,00:02:28.040,148.040,3743,00:02:29.720,149.720,42,00:00:01.680,1.680
40,3744,00:02:29.720,149.720,3794,00:02:31.760,151.760,51,00:00:02.040,2.040
41,3795,00:02:31.760,151.760,3851,00:02:34.040,154.040,57,00:00:02.280,2.280
42,3852,00:02:34.040,154.040,4155,00:02:46.200,166.200,304,00:00:12.160,12.160
43,4156,00:02:46.200,166.200,4284,00:02:51.360,171.360,129,00:00:05.160,5.160
44,4285,00:02:51.360,171.360,4339,00:02:53.560,173.560,55,00:00:02.200,2.200
45,4340,00:02:53.560,173.560,4388,00:02:55.520,175.520,49,00:00:01.960,1.960
46,4389,00:02:55.520,175.520,4437,00:02:57.480,177.480,49,00:00:01.960,1.960
47,4438,00:02:57.480,177.480,4487,00:02:59.480,179.480,50,00:00:02.000,2.000
48,4488,00:02:59.480,179.480,4535,00:03:01.400,181.400,48,00:00:01.920,1.920
49,4536,00:03:01.400,181.400,4691,00:03:07.640,187.640,156,00:00:06.240,6.240
50,4692,00:03:07.640,187.640,4768,00:03:10.720,190.720,77,00:00:03.080,3.080
51,4769,00:03:10.720,190.720,4808,00:03:12.320,192.320,40,00:00:01.600,1.600
52,4809,00:03:12.320,192.320,4870,00:03:14.800,194.800,62,00:00:02.480,2.480
53,4871,00:03:14.800,194.800,5077,00:03:23.080,203.080,207,00:00:08.280,8.280
54,5078,00:03:23.080,203.080,5172,00:03:26.880,206.880,95,00:00:03.800,3.800
55,5173,00:03:26.880,206.880,5576,00:03:43.040,223.040,404,00:00:16.160,16.160
56,5577,00:03:43.040,223.040,5639,00:03:45.560,225.560,63,00:00:02.520,2.520
57,5640,00:03:45.560,225.560,5757,00:03:50.280,230.280,118,00:00:04.720,4.720
58,5758,00:03:50.280,230.280,5869,00:03:54.760,234.760,112,00:00:04.480,4.480
59,5870,00:03:54.760,234.760,6002,00:04:00.080,240.080,133,00:00:05.320,5.320
60,6003,00:04:00.080,240.080,6499,00:04:19.960,259.960,497,00:00:19.880,19.880
61,6500,00:04:19.960,259.960,6546,00:04:21.840,261.840,47,00:00:01.880,1.880
62,6547,00:04:21.840,261.840,6612,00:04:24.480,264.480,66,00:00:02.640,2.640
63,6613,00:04:24.480,264.480,6682,00:04:27.280,267.280,70,00:00:02.800,2.800
64,6683,00:04:27.280,267.280,6738,00:04:29.520,269.520,56,00:00:02.240,2.240
65,6739,00:04:29.520,269.520,6895,00:04:35.800,275.800,157,00:00:06.280,6.280
66,6896,00:04:35.800,275.800,6963,00:04:38.520,278.520,68,00:00:02.720,2.720
67,6964,00:04:38.520,278.520,7034,00:04:41.360,281.360,71,00:00:02.840,2.840
68,7035,00:04:41.360,281.360,7083,00:04:43.320,283.320,49,00:00:01.960,1.960
69,7084,00:04:43.320,283.320,7204,00:04:48.160,288.160,121,00:00:04.840,4.840
70,7205,00:04:48.160,288.160,7267,00:04:50.680,290.680,63,00:00:02.520,2.520
71,7268,00:04:50.680,290.680,7917,00:05:16.680,316.680,650,00:00:26.000,26.000
72,7918,00:05:16.680,316.680,7990,00:05:19.600,319.600,73,00:00:02.920,2.920
73,7991,00:05:19.600,319.600,8050,00:05:22.000,322.000,60,00:00:02.400,2.400
74,8051,00:05:22.000,322.000,8153,00:05:26.120,326.120,103,00:00:04.120,4.120
75,8154,00:05:26.120,326.120,8208,00:05:28.320,328.320,55,00:00:02.200,2.200
76,8209,00:05:28.320,328.320,8304,00:05:32.160,332.160,96,00:00:03.840,3.840
77,8305,00:05:32.160,332.160,9213,00:06:08.520,368.520,909,00:00:36.360,36.360
78,9214,00:06:08.520,368.520,9268,00:06:10.720,370.720,55,00:00:02.200,2.200
79,9269,00:06:10.720,370.720,9327,00:06:13.080,373.080,59,00:00:02.360,2.360
80,9328,00:06:13.080,373.080,9439,00:06:17.560,377.560,112,00:00:04.480,4.480
81,9440,00:06:17.560,377.560,9512,00:06:20.480,380.480,73,00:00:02.920,2.920
82,9513,00:06:20.480,380.480,9611,00:06:24.440,384.440,99,00:00:03.960,3.960
83,9612,00:06:24.440,384.440,9664,00:06:26.560,386.560,53,00:00:02.120,2.120
84,9665,00:06:26.560,386.560,9730,00:06:29.200,389.200,66,00:00:02.640,2.640
85,9731,00:06:29.200,389.200,9929,00:06:37.160,397.160,199,00:00:07.960,7.960
86,9930,00:06:37.160,397.160,10120,00:06:44.800,404.800,191,00:00:07.640,7.640
87,10121,00:06:44.800,404.800,10238,00:06:49.520,409.520,118,00:00:04.720,4.720
88,10239,00:06:49.520,409.520,10296,00:06:51.840,411.840,58,00:00:02.320,2.320
89,10297,00:06:51.840,411.840,11073,00:07:22.920,442.920,777,00:00:31.080,31.080
90,11074,00:07:22.920,442.920,11162,00:07:26.480,446.480,89,00:00:03.560,3.560
91,11163,00:07:26.480,446.480,11229,00:07:29.160,449.160,67,00:00:02.680,2.680
92,11230,00:07:29.160,449.160,11585,00:07:43.400,463.400,356,00:00:14.240,14.240
93,11586,00:07:43.400,463.400,11599,00:07:43.960,463.960,14,00:00:00.560,0.560
94,11600,00:07:43.960,463.960,14541,00:09:41.640,581.640,2942,00:01:57.680,117.680
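
To compare a listing like this against the annotation, the predicted cut frames can be extracted with the standard csv module. The sketch below assumes the column names shown in the header row above; the helper name is hypothetical, and only the first three rows are embedded as sample input.

```python
import csv
from typing import List

# First rows of the listing above, copied verbatim as sample input.
CSV_EXCERPT = """Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,522,00:00:20.880,20.880,522,00:00:20.880,20.880
2,523,00:00:20.880,20.880,717,00:00:28.680,28.680,195,00:00:07.800,7.800
3,718,00:00:28.680,28.680,764,00:00:30.560,30.560,47,00:00:01.880,1.880
"""


def predicted_cuts(csv_text: str) -> List[int]:
    """Return the start frame of every scene except the first."""
    lines = csv_text.splitlines()
    # Start at the column header in case extra rows precede it (assumption).
    header = next(i for i, line in enumerate(lines)
                  if line.startswith("Scene Number"))
    rows = csv.DictReader(lines[header:])
    starts = [int(row["Start Frame"]) for row in rows]
    return starts[1:]


print(predicted_cuts(CSV_EXCERPT))  # [523, 718]
```

Applied to the full listing, this gives 93 predicted cuts. All six annotated cuts (718, 1591, 2334, 2489, 2734, 5577) appear among them, so with exact frame matching the recall on this video would be 6/6 while the precision would be only 6/93, about 6.5%.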

Actually, the predicted scenes are not bad; I attached examples of the detected shots below.
The shot boundaries we assume therefore differ from those annotated in RAI, so RAI may not be suitable for this evaluation.
Let me try BBC.

Predicted shots:
https://github.com/user-attachments/assets/74d3ebff-1b7d-4560-9efa-c10072acb8d2
https://github.com/user-attachments/assets/abfd2f93-692b-4f9d-9182-88d43dba47de
https://github.com/user-attachments/assets/dad6d2d0-371e-4fc2-951a-facb99be5805
https://github.com/user-attachments/assets/671067bc-a2ee-4dcb-98d6-8013620fd828

@awkrail
Contributor Author

awkrail commented Feb 15, 2025

This leaderboard also notes that the ground truth in RAI contains imperfections (which they fixed in their own private benchmark):

Our collection contains videos from popular RAI dataset, videos from MSU codecs comparison 2019 and 2020 test sets, and also videos collected from different sources. Our analysis has shown that ground truth in RAI contains imperfections, which we fixed in our collections.

@awkrail
Contributor Author

awkrail commented Feb 15, 2025

I implemented the performance evaluation. It reports recall, precision, F1, and elapsed time (in seconds).
The results are as follows:

| Detector | Recall | Precision | F1 | Elapsed time (seconds) |
| --- | --- | --- | --- | --- |
| AdaptiveDetector | 7.80 | 96.18 | 14.44 | 25.75 |
| ContentDetector | 84.52 | 88.77 | 86.59 | 25.50 |
| HashDetector | 8.57 | 80.27 | 15.48 | 23.78 |
| HistogramDetector | 8.22 | 70.82 | 14.72 | 18.60 |
| ThresholdDetector | 0.00 | 0.00 | 0.00 | 18.95 |

As you can see, ContentDetector achieves the highest performance among the detectors. Unfortunately, the other detectors perform poorly on the BBC dataset, mainly because of very low recall. It may be necessary to tune their parameters, e.g. the threshold, to obtain better results.
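
As a quick sanity check on these numbers, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported columns. The sketch below assumes the values are percentages and only re-derives F1; it does not run any detector.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) as listed in the results above.
REPORTED = {
    "AdaptiveDetector": (7.80, 96.18, 14.44),
    "ContentDetector": (84.52, 88.77, 86.59),
    "HashDetector": (8.57, 80.27, 15.48),
    "HistogramDetector": (8.22, 70.82, 14.72),
    "ThresholdDetector": (0.00, 0.00, 0.00),
}

for name, (recall, precision, reported_f1) in REPORTED.items():
    recomputed = f1(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.2f} (reported {reported_f1:.2f})")
```

The recomputed values agree with the reported ones up to rounding of the inputs. They also show why AdaptiveDetector scores a low F1 despite 96.18 precision: the harmonic mean is dominated by the smaller of the two values, here the recall.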

@Breakthrough Breakthrough self-assigned this Feb 15, 2025
@Breakthrough
Owner

Awesome, thanks so much for getting started on this! We definitely have some work to do with tuning the other detectors, but this gives us a nice framework to actually validate those changes. I've approved these changes as-is so you can merge it whenever you would like, but feel free to make more changes if you have any plans. Otherwise feel free to submit a follow-up PR with other changes.

I'm excited to start trying this myself - thank you again for getting this up and running!

@awkrail
Contributor Author

awkrail commented Feb 16, 2025

@Breakthrough
Thank you for approving this PR. Once it is merged, I will open a new one to evaluate the detectors on AutoShots (maybe a subset). I don't have permission to merge this PR, so could you merge it? Thanks.

Development

Successfully merging this pull request may close these issues.

Performance evaluation of PySceneDetect in terms of both latency and accuracy
2 participants