Fix_#1343: Correctly Store YouTube Video Publish Date with Timezone Support #2114
Hey @hensikavar, thanks for the PR. Can you please share an example here where the publication date is properly captured?
Thanks, @hensikavar! This looks great. Could you share a code example where a YouTube URL is passed in, a search call is made, and the output includes metadata? That would help us see it in action.
Hey @hensikavar Please run it while being in the |
I have updated test_youtube_video.py to check the specific date format. All the tests pass, and the sample test data is shown in the console. @Dev-Khant, is this the example code you required, or is anything else needed?
If anything else is required, please let me know.
Hey @hensikavar Can you please run |
Please remove the poetry file changes, as they are causing merge conflicts.
@register_deserializable
class YoutubeVideoLoader(BaseLoader):
    def load_data(self, url):
        """Load data from a Youtube video."""
        video_id = _parse_video_id(url)

        print(url)
Please remove this print line.
Removed that print statement and also restored the poetry.lock file.
except Exception as e:
    logging.warning(f"Failed to parse publishedAt field '{published_at}': {e}")

print(metadata)
Same here
}
]
print(expected_data)
Same here
Description
This PR resolves the issue where the publish date of YouTube videos was being stored as YYYY-MM-DD 00:00:00, without the time component. This caused issues when handling time zones. The fix ensures the publish date is stored with the correct time and in ISO 8601 format.
Fixes #1343
file: embedchain/loaders/youtube_video.py
Type of change
How Has This Been Tested?
Updated code to convert publishedAt timestamp to ISO 8601 format.
Validated that timestamps are now stored with time and timezone, e.g., 2024-04-15T15:00:00Z.
Manually tested the following cases:
* Correct conversion for UTC (Z suffix) timestamps.
* Conversion for time zone-aware timestamps (e.g., +02:00).
* Ensured any invalid timestamps are logged with a warning.
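The three manually tested cases above can be sketched as a small standalone helper. This is only an illustration, not the code from this PR: the function name normalize_published_at is hypothetical, and normalizing every timestamp to UTC with a Z suffix (and treating naive timestamps as UTC) is an assumption that may differ from what the loader actually does.

```python
import logging
from datetime import datetime, timezone
from typing import Optional


def normalize_published_at(published_at: str) -> Optional[str]:
    """Return published_at as a UTC ISO 8601 string, or None if it cannot be parsed.

    Hypothetical helper for illustration; not part of embedchain.
    """
    try:
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so map it to an explicit UTC offset first.
        parsed = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    except (ValueError, TypeError, AttributeError) as e:
        # Invalid input is logged with a warning instead of raising.
        logging.warning(f"Failed to parse publishedAt field '{published_at}': {e}")
        return None

    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)

    # Assumption: store everything in UTC with a "Z" suffix.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(normalize_published_at("2024-04-15T15:00:00Z"))
    print(normalize_published_at("2024-04-15T17:00:00+02:00"))
    print(normalize_published_at("not a date"))
```

Under these assumptions, a +02:00 timestamp and its UTC equivalent map to the same stored string, which keeps the time component that the old YYYY-MM-DD 00:00:00 format dropped.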
Checklist:
Maintainer Checklist