Unable to capture precise publication time for YouTube video metadata #1343
Comments
Can I work on this?
@Dev-Khant Has the migration from pytube to yt-dlp been merged? If so, yt-dlp returns metadata such as publication time, so this would be a very easy fix. I could do it.
No, sorry, it has not been merged yet. But please feel free to open a PR for this using yt-dlp.
@Dev-Khant, I was writing the script for the yt-dlp migration and realized that although the YouTube data loader does return metadata for videos, the LLM does not seem to have access to that metadata. It only seems to have access to the video context. I ran these 4 queries: I was wondering if this is by design or an issue?
@MoizKhuzema If you use So it's by design that you get metadata separately.
Understood, thanks.
@Dev-Khant Is this issue open for contribution?
@MoizKhuzema Please let us know if you are working on this, or else @02shanks would like to pick this up. Thanks.
I would like to pick this up, as I see it's still open.
@Dev-Khant, can I pick this up?
@burnerlee Sure, feel free to work on this!
@Dev-Khant May I pick up this issue?
🐛 Describe the bug
Embedchain currently captures the publish date of YouTube videos but omits the precise publication time.
The metadata stored for each video records the date in the format YYYY-MM-DD 00:00:00. The time always defaults to 00:00:00, which indicates that the time component is not being processed or stored. Storing only the date is a problem because a date without a time is ambiguous across time zones.
Example of the issue:
For a video published at 3 PM UTC on April 15, 2024, Embedchain stores the publish date as 2024-04-15 00:00:00,
whereas it should be stored in ISO 8601 format (example: 2024-04-15T15:00:00Z).
This format allows the publication time to be used globally without confusion about time zone differences.
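As a rough illustration of the fix discussed in the comments, the sketch below turns a yt-dlp style metadata dictionary into an ISO 8601 UTC string. It assumes the dictionary carries a `timestamp` field (Unix seconds) and an `upload_date` field (`YYYYMMDD`), which is how yt-dlp's info dictionary is commonly described; the helper name `publish_time_iso8601` is hypothetical and not part of Embedchain or yt-dlp.

```python
from datetime import datetime, timezone
from typing import Optional


def publish_time_iso8601(info: dict) -> Optional[str]:
    """Return the publication time as an ISO 8601 UTC string, if available.

    `info` is assumed to look like a yt-dlp info dictionary; the field
    names used here ("timestamp", "upload_date") are assumptions.
    """
    ts = info.get("timestamp")
    if ts is not None:
        # A Unix timestamp is time zone independent, so convert it to UTC
        # and format it with an explicit "Z" suffix.
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

    upload_date = info.get("upload_date")
    if upload_date:
        # Only the date is known: return a date-only string instead of
        # inventing a misleading 00:00:00 time component.
        return datetime.strptime(upload_date, "%Y%m%d").date().isoformat()

    return None


# Example with a hand-built dictionary standing in for real metadata.
sample = {
    "timestamp": datetime(2024, 4, 15, 15, 0, tzinfo=timezone.utc).timestamp(),
    "upload_date": "20240415",
}
print(publish_time_iso8601(sample))
```

Falling back to a date-only value when no timestamp exists keeps the stored metadata honest about what is actually known, rather than padding it with midnight.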