Subtitles appear earlier in specific cases #113

Open
Lod3 opened this issue Dec 18, 2024 · 8 comments

Comments

@Lod3

Lod3 commented Dec 18, 2024

So I tested the subtitle feature on an Italian movie that was released with non-English subtitles only. The accuracy is good, surprisingly good. But the first sentence of the subtitles appears at the beginning of the movie, during the intro.
And in other parts of the movie where there is a long pause, it shows the upcoming subtitle way too soon.

This only happens when there is a long part of silence before a spoken line. It is back in sync afterwards.

Not sure how to describe this better, sorry if this is unclear.
Let me know if I can help with clarifying.

@kaixxx
Owner

kaixxx commented Dec 18, 2024

That's an interesting observation. I have a suspicion about what might be the culprit, but I have to look deeper into this. Is the movie publicly available? It would be nice to have a good test case.

@Lod3
Author

Lod3 commented Dec 19, 2024

Good news that you already have a suspicion.
The movie I tested it with was Ariaferma.
I think it would be possible to test this with any film, provided there are long pauses before a line comes up.
I can send you the vtt file if you want?

@kaixxx
Owner

kaixxx commented Dec 19, 2024

Let me first try to reproduce the issue with my own test files.

@kaixxx
Owner

kaixxx commented Dec 19, 2024

Ok, I have tried, but I cannot reproduce the issue.
My suspicion was that the time of the pause (segment without speech) would be added to the next section of text in the vtt file, but that is not the case. A pause is simply a gap in between time codes, like so:

2
00:00:07.920 --> 00:00:10.380
<v S00>some text

3
00:00:11.080 --> 00:00:11.980
<v S00>text after pause

The pause in this case would be between the two segments, so between 00:00:10.380 and 00:00:11.080, which is less than a second in this example. But it should work the same way for longer pauses.

Can you check in your vtt file if and how pauses are marked there?
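
One way to run this check on a longer file is a small script that lists the gap between each cue and the next. The following is only a minimal sketch, not part of noScribe: the helper names `read_cues` and `gaps_between` are made up for illustration, and it assumes every timing line uses the full `hh:mm:ss.mmm` form seen in the examples in this thread.

```python
import re

# Matches a cue timing line such as "00:00:07.920 --> 00:00:10.380".
# Assumption: timestamps always include the hours part, as in this thread.
CUE_TIMING = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2,}):(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the captured timestamp parts to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def read_cues(vtt_text):
    """Return a list of (start, end) tuples in seconds, in file order."""
    cues = []
    for line in vtt_text.splitlines():
        match = CUE_TIMING.match(line.strip())
        if match:
            parts = match.groups()
            cues.append((to_seconds(*parts[:4]), to_seconds(*parts[4:])))
    return cues


def gaps_between(cues):
    """Return the pause in seconds between each cue and the following one."""
    return [round(nxt[0] - cur[1], 3) for cur, nxt in zip(cues, cues[1:])]


sample = """WEBVTT

2
00:00:07.920 --> 00:00:10.380
<v S00>some text

3
00:00:11.080 --> 00:00:11.980
<v S00>text after pause
"""

print(gaps_between(read_cues(sample)))  # [0.7]
```

A long silence should show up here as a large gap value. If the gaps are all small although the movie has long silent stretches, the silence has probably been absorbed into the duration of a neighbouring cue.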

@Lod3
Author

Lod3 commented Dec 20, 2024

NOTE
Transcribed with noScribe vers. 0.5
Audio file: Z:/movies/Ariaferma (2021)/Ariaferma (2021).mkv
(Start (hh:mm:ss): 00:00:00 | Quality: precise | Language: en | Speaker detection: auto | Overlapping speech: 1 | Timestamps: 1 | Mark pause: 0)


NOTE media: Z:/movies/Ariaferma (2021)/Ariaferma (2021).mkv

1
00:01:41.610 --> 00:02:15.670
<v S21>When I was little, I used to go hunting with my older brother.

Not sure to be honest

@kaixxx
Owner

kaixxx commented Dec 20, 2024

What I can tell already:

"the first sentence of the subtitle appears at the beginning of the movie during the intro."

This must be an error with your software showing the subtitles, because the first segment of text in your example clearly starts at 00:01:41.610 (so, about 1 minute and 40 seconds in).

Could you post an example which is a bit longer? The pauses will be between the segments...

@Lod3
Author

Lod3 commented Dec 20, 2024

I tried it with Plex and VLC; in both, the first line shows up early.

The first entry, "00:01:41.610 --> 00:02:15.670", means the line is shown on screen from 00:01:41.610 until 00:02:15.670. That is about 34 seconds, which is very long for the line "When I was little, I used to go hunting with my older brother."
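
To make the arithmetic explicit, here is a minimal sketch that computes how long a cue stays on screen from its timing line. The helper names and the 7-second threshold are assumptions chosen for illustration; they do not come from noScribe or from any subtitle standard.

```python
import re

# Assumption: timestamps use the hh:mm:ss.mmm form seen in this vtt file.
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")


def seconds(timestamp):
    """Convert a timestamp such as '00:01:41.610' to seconds."""
    h, m, s, ms = TIMESTAMP.fullmatch(timestamp).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def cue_duration(timing_line):
    """Return the on-screen time in seconds for one cue timing line."""
    start, rest = timing_line.split("-->")
    end = rest.split()[0]  # ignore any cue settings after the end time
    return round(seconds(end) - seconds(start.strip()), 3)


MAX_SECONDS = 7.0  # hypothetical limit for a single short subtitle line

line = "00:01:41.610 --> 00:02:15.670"
duration = cue_duration(line)
print(duration)                # 34.06
print(duration > MAX_SECONDS)  # True
```

Running `cue_duration` over every timing line in the file would list all cues that stay on screen unusually long, which may be the places where a line appears to show up early.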

I don't see any markings.
As far as I saw when checking the movie with the generated subtitles, there were 2 moments where a line would show up early.
I will watch the movie completely soon and check.
Ariaferma (2021).en.vtt.txt

@kaixxx
Owner

kaixxx commented Dec 20, 2024

Thank you. I've tried your vtt file with a random video in VLC, and the first line "When I was little..." appeared exactly at 1 minute and 41 seconds, so the vtt file seems to work as intended. I don't have access to the original movie. Is this too early?

Indeed, the first line stays on screen very long, but that might be a different problem. noScribe is not really optimized for subtitles.

I would suggest that you turn off the speaker detection, since the speaker is not shown in the subtitles anyway (at least not in VLC). This will save you about half the transcription time and might even give you a better timing of your subtitles.
