[benchmark/WIP] Benchmarking pyscenedetect detectors' performance #484

awkrail · 2025-02-14T03:04:13Z

Discussed in #481
This PR aims to evaluate the performance of PySceneDetect's detectors in terms of both latency and accuracy.
The accuracy measurement is based on LoveU Challenge Track 1, which calculates recall, precision, and F1 scores. The following is quoted from the task description:

- We use Relative Distance (Rel.Dis) to determine the correctness of each prediction. Rel.Dis is the error between detected and ground-truth timestamps, divided by the length of the whole video. Given a fixed threshold for Rel.Dis, we can determine whether a detection is correct (i.e. <=threshold) or incorrect (i.e. >threshold), then compute precision, recall and F1 score on the whole dataset.
- Note that for each video, we have multiple raters who make annotations independently. We (1) compare our detection result with each rater’s annotation and (2) select GT for this video as the rater’s annotation which leads to the best F1 score among all raters.
- The official metric used in this task is F1@5%, which is defined as the F1 score computed with threshold 5%.
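For reference, the quoted metric can be sketched roughly as below. This is only my reading of the task description, not the official LoveU evaluation code: the function names, the greedy nearest-neighbour matching, and the per-video scoring are all assumptions made for illustration (the official metric aggregates over the whole dataset).

```python
def f1_at_rel_dis(pred, gt, duration, threshold=0.05):
    """Precision, recall, and F1 for one video under a Rel.Dis threshold.

    pred, gt: boundary timestamps in seconds.
    duration: length of the whole video in seconds.
    """
    tol = threshold * duration  # Rel.Dis threshold converted to seconds
    unmatched = sorted(gt)
    tp = 0
    for p in sorted(pred):
        if not unmatched:
            break
        # Assumption: each ground-truth boundary can be matched at most once,
        # and a prediction is paired with its nearest unused boundary.
        nearest = min(unmatched, key=lambda g: abs(g - p))
        if abs(nearest - p) <= tol:
            tp += 1
            unmatched.remove(nearest)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_rater_f1(pred, rater_annotations, duration, threshold=0.05):
    """Pick the rater whose annotation gives the best F1 for this video."""
    return max(f1_at_rel_dis(pred, gt, duration, threshold)[2]
               for gt in rater_annotations)
```

Calling `f1_at_rel_dis(pred, gt, duration, threshold=0.01)` gives the stricter F1@1% variant.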

Question: is F1@5% an appropriate metric? I think a 5% tolerance is too large to ignore, so tightening the threshold to below 1% may be better.

I will use two datasets: RAI and BBC. Both datasets are uploaded on Zenodo. Please check README.md in the benchmark/ directory to download them.

RAI
BBC

awkrail · 2025-02-14T12:04:01Z

I looked into the LoveU Challenge metric, and relative distance is not appropriate for shot detection, especially on long videos. For a 1-hour video, 5% corresponds to 180 seconds (3 minutes), which is far too loose for shot detection. Judging correctness by frame numbers is much more accurate, so I will implement the evaluation based on absolute frame numbers (plus a distance threshold).
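A minimal sketch of both points, under stated assumptions: `rel_dis_tolerance_seconds` reproduces the arithmetic above, and `match_cuts` illustrates one way to count correct detections with an absolute frame tolerance. Both names and the two-pointer matching are hypothetical and are not taken from the code in benchmark/.

```python
def rel_dis_tolerance_seconds(duration_s, threshold=0.05):
    """Tolerance in seconds implied by a relative-distance threshold."""
    return duration_s * threshold


def match_cuts(pred_frames, gt_frames, tolerance=0):
    """Count predicted cuts that lie within `tolerance` frames of a GT cut.

    Both lists are walked in sorted order, and each ground-truth cut
    is matched by at most one prediction.
    """
    pred = sorted(pred_frames)
    gt = sorted(gt_frames)
    i = j = tp = 0
    while i < len(pred) and j < len(gt):
        diff = pred[i] - gt[j]
        if abs(diff) <= tolerance:
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # prediction is too early for this and any later GT cut
        else:
            j += 1  # GT cut is too early for this and any later prediction
    return tp


def precision_recall_f1(tp, n_pred, n_gt):
    """Turn match counts into precision, recall, and F1."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 5% of a 1-hour video, in seconds
print(rel_dis_tolerance_seconds(3600))
```

With `tolerance=0` only exact frame matches count; a small positive tolerance forgives off-by-a-few-frames differences without allowing the minutes-long slack of Rel.Dis on long videos.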

awkrail · 2025-02-15T02:02:38Z

Hmm... I am implementing the benchmarking code on RAI, but the measured accuracy is quite low.
I applied scenedetect w/ ContentDetector to https://zenodo.org/records/14865179/files/1.mp4?download=1 (1.mp4)
The annotated scenes (scenes_1.txt) are here:

The output of ContentDetector is here (94 scenes are detected): scenedetect -i RAI/1.mp4 list-scenes

Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,522,00:00:20.880,20.880,522,00:00:20.880,20.880
2,523,00:00:20.880,20.880,717,00:00:28.680,28.680,195,00:00:07.800,7.800
3,718,00:00:28.680,28.680,764,00:00:30.560,30.560,47,00:00:01.880,1.880
4,765,00:00:30.560,30.560,823,00:00:32.920,32.920,59,00:00:02.360,2.360
5,824,00:00:32.920,32.920,872,00:00:34.880,34.880,49,00:00:01.960,1.960
6,873,00:00:34.880,34.880,918,00:00:36.720,36.720,46,00:00:01.840,1.840
7,919,00:00:36.720,36.720,964,00:00:38.560,38.560,46,00:00:01.840,1.840
8,965,00:00:38.560,38.560,1017,00:00:40.680,40.680,53,00:00:02.120,2.120
9,1018,00:00:40.680,40.680,1067,00:00:42.680,42.680,50,00:00:02.000,2.000
10,1068,00:00:42.680,42.680,1123,00:00:44.920,44.920,56,00:00:02.240,2.240
11,1124,00:00:44.920,44.920,1168,00:00:46.720,46.720,45,00:00:01.800,1.800
12,1169,00:00:46.720,46.720,1225,00:00:49.000,49.000,57,00:00:02.280,2.280
13,1226,00:00:49.000,49.000,1274,00:00:50.960,50.960,49,00:00:01.960,1.960
14,1275,00:00:50.960,50.960,1328,00:00:53.120,53.120,54,00:00:02.160,2.160
15,1329,00:00:53.120,53.120,1590,00:01:03.600,63.600,262,00:00:10.480,10.480
16,1591,00:01:03.600,63.600,1640,00:01:05.600,65.600,50,00:00:02.000,2.000
17,1641,00:01:05.600,65.600,1686,00:01:07.440,67.440,46,00:00:01.840,1.840
18,1687,00:01:07.440,67.440,1734,00:01:09.360,69.360,48,00:00:01.920,1.920
19,1735,00:01:09.360,69.360,1889,00:01:15.560,75.560,155,00:00:06.200,6.200
20,1890,00:01:15.560,75.560,1936,00:01:17.440,77.440,47,00:00:01.880,1.880
21,1937,00:01:17.440,77.440,1988,00:01:19.520,79.520,52,00:00:02.080,2.080
22,1989,00:01:19.520,79.520,2041,00:01:21.640,81.640,53,00:00:02.120,2.120
23,2042,00:01:21.640,81.640,2088,00:01:23.520,83.520,47,00:00:01.880,1.880
24,2089,00:01:23.520,83.520,2138,00:01:25.520,85.520,50,00:00:02.000,2.000
25,2139,00:01:25.520,85.520,2189,00:01:27.560,87.560,51,00:00:02.040,2.040
26,2190,00:01:27.560,87.560,2236,00:01:29.440,89.440,47,00:00:01.880,1.880
27,2237,00:01:29.440,89.440,2285,00:01:31.400,91.400,49,00:00:01.960,1.960
28,2286,00:01:31.400,91.400,2333,00:01:33.320,93.320,48,00:00:01.920,1.920
29,2334,00:01:33.320,93.320,2385,00:01:35.400,95.400,52,00:00:02.080,2.080
30,2386,00:01:35.400,95.400,2430,00:01:37.200,97.200,45,00:00:01.800,1.800
31,2431,00:01:37.200,97.200,2488,00:01:39.520,99.520,58,00:00:02.320,2.320
32,2489,00:01:39.520,99.520,2547,00:01:41.880,101.880,59,00:00:02.360,2.360
33,2548,00:01:41.880,101.880,2594,00:01:43.760,103.760,47,00:00:01.880,1.880
34,2595,00:01:43.760,103.760,2651,00:01:46.040,106.040,57,00:00:02.280,2.280
35,2652,00:01:46.040,106.040,2698,00:01:47.920,107.920,47,00:00:01.880,1.880
36,2699,00:01:47.920,107.920,2733,00:01:49.320,109.320,35,00:00:01.400,1.400
37,2734,00:01:49.320,109.320,3432,00:02:17.280,137.280,699,00:00:27.960,27.960
38,3433,00:02:17.280,137.280,3701,00:02:28.040,148.040,269,00:00:10.760,10.760
39,3702,00:02:28.040,148.040,3743,00:02:29.720,149.720,42,00:00:01.680,1.680
40,3744,00:02:29.720,149.720,3794,00:02:31.760,151.760,51,00:00:02.040,2.040
41,3795,00:02:31.760,151.760,3851,00:02:34.040,154.040,57,00:00:02.280,2.280
42,3852,00:02:34.040,154.040,4155,00:02:46.200,166.200,304,00:00:12.160,12.160
43,4156,00:02:46.200,166.200,4284,00:02:51.360,171.360,129,00:00:05.160,5.160
44,4285,00:02:51.360,171.360,4339,00:02:53.560,173.560,55,00:00:02.200,2.200
45,4340,00:02:53.560,173.560,4388,00:02:55.520,175.520,49,00:00:01.960,1.960
46,4389,00:02:55.520,175.520,4437,00:02:57.480,177.480,49,00:00:01.960,1.960
47,4438,00:02:57.480,177.480,4487,00:02:59.480,179.480,50,00:00:02.000,2.000
48,4488,00:02:59.480,179.480,4535,00:03:01.400,181.400,48,00:00:01.920,1.920
49,4536,00:03:01.400,181.400,4691,00:03:07.640,187.640,156,00:00:06.240,6.240
50,4692,00:03:07.640,187.640,4768,00:03:10.720,190.720,77,00:00:03.080,3.080
51,4769,00:03:10.720,190.720,4808,00:03:12.320,192.320,40,00:00:01.600,1.600
52,4809,00:03:12.320,192.320,4870,00:03:14.800,194.800,62,00:00:02.480,2.480
53,4871,00:03:14.800,194.800,5077,00:03:23.080,203.080,207,00:00:08.280,8.280
54,5078,00:03:23.080,203.080,5172,00:03:26.880,206.880,95,00:00:03.800,3.800
55,5173,00:03:26.880,206.880,5576,00:03:43.040,223.040,404,00:00:16.160,16.160
56,5577,00:03:43.040,223.040,5639,00:03:45.560,225.560,63,00:00:02.520,2.520
57,5640,00:03:45.560,225.560,5757,00:03:50.280,230.280,118,00:00:04.720,4.720
58,5758,00:03:50.280,230.280,5869,00:03:54.760,234.760,112,00:00:04.480,4.480
59,5870,00:03:54.760,234.760,6002,00:04:00.080,240.080,133,00:00:05.320,5.320
60,6003,00:04:00.080,240.080,6499,00:04:19.960,259.960,497,00:00:19.880,19.880
61,6500,00:04:19.960,259.960,6546,00:04:21.840,261.840,47,00:00:01.880,1.880
62,6547,00:04:21.840,261.840,6612,00:04:24.480,264.480,66,00:00:02.640,2.640
63,6613,00:04:24.480,264.480,6682,00:04:27.280,267.280,70,00:00:02.800,2.800
64,6683,00:04:27.280,267.280,6738,00:04:29.520,269.520,56,00:00:02.240,2.240
65,6739,00:04:29.520,269.520,6895,00:04:35.800,275.800,157,00:00:06.280,6.280
66,6896,00:04:35.800,275.800,6963,00:04:38.520,278.520,68,00:00:02.720,2.720
67,6964,00:04:38.520,278.520,7034,00:04:41.360,281.360,71,00:00:02.840,2.840
68,7035,00:04:41.360,281.360,7083,00:04:43.320,283.320,49,00:00:01.960,1.960
69,7084,00:04:43.320,283.320,7204,00:04:48.160,288.160,121,00:00:04.840,4.840
70,7205,00:04:48.160,288.160,7267,00:04:50.680,290.680,63,00:00:02.520,2.520
71,7268,00:04:50.680,290.680,7917,00:05:16.680,316.680,650,00:00:26.000,26.000
72,7918,00:05:16.680,316.680,7990,00:05:19.600,319.600,73,00:00:02.920,2.920
73,7991,00:05:19.600,319.600,8050,00:05:22.000,322.000,60,00:00:02.400,2.400
74,8051,00:05:22.000,322.000,8153,00:05:26.120,326.120,103,00:00:04.120,4.120
75,8154,00:05:26.120,326.120,8208,00:05:28.320,328.320,55,00:00:02.200,2.200
76,8209,00:05:28.320,328.320,8304,00:05:32.160,332.160,96,00:00:03.840,3.840
77,8305,00:05:32.160,332.160,9213,00:06:08.520,368.520,909,00:00:36.360,36.360
78,9214,00:06:08.520,368.520,9268,00:06:10.720,370.720,55,00:00:02.200,2.200
79,9269,00:06:10.720,370.720,9327,00:06:13.080,373.080,59,00:00:02.360,2.360
80,9328,00:06:13.080,373.080,9439,00:06:17.560,377.560,112,00:00:04.480,4.480
81,9440,00:06:17.560,377.560,9512,00:06:20.480,380.480,73,00:00:02.920,2.920
82,9513,00:06:20.480,380.480,9611,00:06:24.440,384.440,99,00:00:03.960,3.960
83,9612,00:06:24.440,384.440,9664,00:06:26.560,386.560,53,00:00:02.120,2.120
84,9665,00:06:26.560,386.560,9730,00:06:29.200,389.200,66,00:00:02.640,2.640
85,9731,00:06:29.200,389.200,9929,00:06:37.160,397.160,199,00:00:07.960,7.960
86,9930,00:06:37.160,397.160,10120,00:06:44.800,404.800,191,00:00:07.640,7.640
87,10121,00:06:44.800,404.800,10238,00:06:49.520,409.520,118,00:00:04.720,4.720
88,10239,00:06:49.520,409.520,10296,00:06:51.840,411.840,58,00:00:02.320,2.320
89,10297,00:06:51.840,411.840,11073,00:07:22.920,442.920,777,00:00:31.080,31.080
90,11074,00:07:22.920,442.920,11162,00:07:26.480,446.480,89,00:00:03.560,3.560
91,11163,00:07:26.480,446.480,11229,00:07:29.160,449.160,67,00:00:02.680,2.680
92,11230,00:07:29.160,449.160,11585,00:07:43.400,463.400,356,00:00:14.240,14.240
93,11586,00:07:43.400,463.400,11599,00:07:43.960,463.960,14,00:00:00.560,0.560
94,11600,00:07:43.960,463.960,14541,00:09:41.640,581.640,2942,00:01:57.680,117.680
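To compare this output against an annotation file, the cut positions have to be pulled out of the CSV first. Below is a rough sketch using only the standard library. It assumes the text starts with the header row shown above and treats the start frame of every scene after the first as a cut; `cut_frames_from_csv` is a hypothetical helper, not part of PySceneDetect.

```python
import csv
import io


def cut_frames_from_csv(text):
    """Return the start frame of every scene except the first.

    Assumption: `text` begins with the header row containing a
    "Start Frame" column, as in the list-scenes output above.
    """
    reader = csv.DictReader(io.StringIO(text))
    starts = [int(row["Start Frame"]) for row in reader]
    # The first scene starts at the beginning of the video, which is not a cut.
    return starts[1:]


sample = """Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,522,00:00:20.880,20.880,522,00:00:20.880,20.880
2,523,00:00:20.880,20.880,717,00:00:28.680,28.680,195,00:00:07.800,7.800
3,718,00:00:28.680,28.680,764,00:00:30.560,30.560,47,00:00:01.880,1.880
"""
print(cut_frames_from_csv(sample))  # [523, 718]
```

Whether the annotation file uses the same frame numbering (the output above starts at frame 1) needs to be checked before comparing the two lists.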

Actually, the predicted scenes are not bad. I attached some examples of the detected shots below.
So the low score seems to come from RAI defining shot boundaries differently from what we assume, which means RAI may not be suitable for this evaluation.
Let me try BBC.

Predicted shots:
https://github.com/user-attachments/assets/74d3ebff-1b7d-4560-9efa-c10072acb8d2
https://github.com/user-attachments/assets/abfd2f93-692b-4f9d-9182-88d43dba47de
https://github.com/user-attachments/assets/dad6d2d0-371e-4fc2-951a-facb99be5805
https://github.com/user-attachments/assets/671067bc-a2ee-4dcb-98d6-8013620fd828

awkrail · 2025-02-15T02:14:31Z

This leaderboard also notes that the ground truth in RAI contains imperfections (they fixed these in their own collection, which is a private benchmark).

Our collection contains videos from popular RAI dataset, videos from MSU codecs comparison 2019 and 2020 test sets, and also videos collected from different sources. Our analysis has shown that ground truth in RAI contains imperfections, which we fixed in our collections.

awkrail · 2025-02-15T08:07:23Z

I implemented performance evaluation. The results are computed as recall, precision, F1, and elapsed time (seconds).
The following is the result:

Detector	Recall	Precision	F1	Elapsed time (seconds)
AdaptiveDetector	7.80	96.18	14.44	25.75
ContentDetector	84.52	88.77	86.59	25.50
HashDetector	8.57	80.27	15.48	23.78
HistogramDetector	8.22	70.82	14.72	18.60
ThresholdDetector	0.00	0.00	0.00	18.95

As you can see, ContentDetector achieves the highest performance among the detectors. Unfortunately, the other detectors achieve poor performance on the BBC dataset, mostly because of very low recall. It might be necessary to tune their parameters, e.g. the threshold, to obtain better results.
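As a quick sanity check, F1 can be recomputed from the recall and precision columns with the usual harmonic-mean formula. The sketch below copies the numbers from the table; `f1_score` is a helper written only for this check. Differences of about 0.01 from the reported F1 are likely because the table's recall and precision are already rounded to two decimals.

```python
# (recall, precision, reported F1) copied from the table above, in percent
results = {
    "AdaptiveDetector": (7.80, 96.18, 14.44),
    "ContentDetector": (84.52, 88.77, 86.59),
    "HashDetector": (8.57, 80.27, 15.48),
    "HistogramDetector": (8.22, 70.82, 14.72),
    "ThresholdDetector": (0.00, 0.00, 0.00),
}


def f1_score(recall, precision):
    """Harmonic mean of recall and precision (0 when both are 0)."""
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


for name, (recall, precision, reported) in results.items():
    recomputed = f1_score(recall, precision)
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.2f}")
```

The check also makes the pattern visible: AdaptiveDetector, HashDetector, and HistogramDetector have reasonable precision but recall below 10%, so their F1 is dominated by missed cuts.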

Breakthrough · 2025-02-15T17:22:27Z

Awesome, thanks so much for getting started on this! We definitely have some work to do with tuning the other detectors, but this gives us a nice framework to actually validate those changes. I've approved these changes as-is so you can merge it whenever you would like, but feel free to make more changes if you have any plans. Otherwise feel free to submit a follow-up PR with other changes.

I'm excited to start trying this myself - thank you again for getting this up and running!

awkrail · 2025-02-16T01:27:03Z

@Breakthrough
Thank you for approving this PR. Once this is merged, I will send a new PR to evaluate the detectors on AutoShots (maybe a subset). I don't have permission to merge this PR, so could you merge it? Thanks.

create benchmark/ directory for pyscenedetect performance evaluation

d7d2851

implemented evaluator on the BBC dataset

61d1cda

Breakthrough self-assigned this Feb 15, 2025

Breakthrough self-requested a review February 15, 2025 17:22

Breakthrough assigned awkrail and unassigned Breakthrough Feb 15, 2025

Breakthrough added the technical item label Feb 15, 2025

Breakthrough linked an issue Feb 15, 2025 that may be closed by this pull request

Performance evaluation of PySceneDetect in terms of both latency and accuracy #481

Open

Breakthrough approved these changes Feb 15, 2025
