
Fix_#1343: Correctly Store YouTube Video Publish Date with Timezone Support #2114

Open
Wants to merge 4 commits into base: main

hensikavar

Description

This PR fixes an issue where the publish date of YouTube videos was stored as YYYY-MM-DD 00:00:00, dropping the time component, which made time-zone handling incorrect. With this fix, the publish date is stored with its actual time and in ISO 8601 format.
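A small illustration of why dropping the time component is a problem, using only the standard library. This is not the loader's original code; date_only merely mimics storing YYYY-MM-DD 00:00:00, and the timestamp is a made-up example with a UTC offset.

```python
from datetime import datetime, timezone

# Made-up publish timestamp with an explicit UTC offset (illustrative only).
published_at = "2024-04-15T23:30:00-05:00"

# Mimics the old behavior: only the date survives, so the time becomes 00:00:00.
date_only = datetime.strptime(published_at[:10], "%Y-%m-%d")

# Parsing the full string keeps both the time and the offset.
full = datetime.fromisoformat(published_at)

print(date_only.isoformat())                      # 2024-04-15T00:00:00
print(full.isoformat())                           # 2024-04-15T23:30:00-05:00
print(full.astimezone(timezone.utc).isoformat())  # 2024-04-16T04:30:00+00:00
```

In UTC this example was published on 2024-04-16, a different calendar day from the truncated value, which is the kind of time-zone error that storing the full timestamp avoids.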

Fixes #1343
file: embedchain/loaders/youtube_video.py

  • Updated the code to properly handle the publishedAt field from the metadata.
  • Converted the publishedAt timestamp to ISO 8601 (e.g., 2024-04-15T14:30:00Z) so that both the date and the time are stored, with UTC timestamps handled explicitly.
  • Introduced logic to replace the Z suffix with +00:00 for UTC timestamps, which keeps the offset explicit and lets datetime.fromisoformat() parse the value on Python versions before 3.11.
  • Added error logging to capture any parsing issues with the publishedAt field.
  • Imported datetime.
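The changes listed above could look roughly like the following minimal sketch. It is not the PR's actual diff (which isn't reproduced in this thread): normalize_published_at is a hypothetical name, and the sketch assumes publishedAt arrives as an ISO 8601 string such as 2024-04-15T14:30:00Z.

```python
import logging
from datetime import datetime
from typing import Optional

logger = logging.getLogger(__name__)


def normalize_published_at(published_at: str) -> Optional[str]:
    """Hypothetical helper: parse publishedAt and return it as an ISO 8601 string.

    Returns None and logs a warning if the value cannot be parsed.
    """
    try:
        value = published_at
        # Rewrite a trailing "Z" (UTC) as "+00:00"; datetime.fromisoformat()
        # only accepts "Z" natively from Python 3.11 onward.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)
    except (AttributeError, ValueError) as e:
        # AttributeError covers non-string input such as None.
        logger.warning("Failed to parse publishedAt field '%s': %s", published_at, e)
        return None
    # isoformat() keeps the time and, if the input had one, the UTC offset.
    return parsed.isoformat()


print(normalize_published_at("2024-04-15T14:30:00Z"))       # 2024-04-15T14:30:00+00:00
print(normalize_published_at("2024-04-15T14:30:00+02:00"))  # 2024-04-15T14:30:00+02:00
print(normalize_published_at("not a date"))                 # None (a warning is logged)
```

One design detail worth noting: isoformat() renders UTC as +00:00, not Z, so a value stored through this path would read 2024-04-15T14:30:00+00:00. If the stored string must end in Z, as in the examples in this description, the suffix would have to be restored explicitly.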

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Updated the code to convert the publishedAt timestamp to ISO 8601 format.
Validated that timestamps are now stored with both time and time zone, e.g., 2024-04-15T15:00:00Z.
Manually tested the following cases:
* Correct conversion for UTC (Z suffix) timestamps.
* Conversion for time zone-aware timestamps (e.g., +02:00).
* Ensured any invalid timestamps are logged with a warning.

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have checked my code and corrected any misspellings

Maintainer Checklist

@CLAassistant commented Dec 25, 2024

CLA assistant check
All committers have signed the CLA.

@Dev-Khant (Member)

Hey @hensikavar, thanks for the PR! Can you please share an example here where the publication date is captured properly?

@hensikavar (Author)

Changes in file: mem0/embedchain/embedchain/youtube_video

[Screenshot: date format conversion]

I have converted the publish date, which comes in ISO format, so that it is stored in the proper format in the metadata.

@Dev-Khant (Member)

Thanks, @hensikavar! This looks great. Could you share a code example where a YouTube URL is passed in, a search call is made, and the output includes metadata? That would help us see it in action.

@hensikavar (Author)

When I run test_youtube_video.py, a module-not-found error occurs:
[Screenshot: error output]

@Dev-Khant (Member)

Hey @hensikavar, please run it from the \embedchain directory. The command would look like: poetry run pytest tests\loaders\test_youtube_video.py

@hensikavar (Author)

I have made changes in the test_youtube_video.py file to check the specific date format:

[Screenshot: updated test_youtube_video.py]

All the tests pass, and the sample test data is shown in the console:

[Screenshot: console output]

@Dev-Khant, is this the example code you needed, or is anything else required?
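For reference, a date-format check of the kind described above might be sketched as follows. The real assertions are only visible in the screenshots, so has_time_and_offset, test_publish_date_format, and the publish_date key are hypothetical; the test is written in plain pytest style and can also be called directly.

```python
from datetime import datetime


def has_time_and_offset(value: str) -> bool:
    """Hypothetical check: True if value is an ISO 8601 timestamp with a time and a UTC offset."""
    try:
        # Map a trailing "Z" to "+00:00" so Python versions before 3.11 can parse it too.
        parsed = datetime.fromisoformat(value[:-1] + "+00:00" if value.endswith("Z") else value)
    except (AttributeError, ValueError):
        return False
    # The old "YYYY-MM-DD 00:00:00" form has no "T" separator and no offset, so it fails.
    return "T" in value and parsed.tzinfo is not None


def test_publish_date_format():
    # Hypothetical metadata; the key name and value are illustrative, not the PR's fixture data.
    metadata = {"publish_date": "2024-04-15T15:00:00Z"}
    assert has_time_and_offset(metadata["publish_date"])
    assert not has_time_and_offset("2024-04-15 00:00:00")
    assert not has_time_and_offset("2024-04-15")
```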

@hensikavar (Author)

If anything else is required, please let me know.

@Dev-Khant (Member)

Hey @hensikavar, can you please run poetry lock [--no-update]? The tests are failing. Thanks.

@Dev-Khant (Member) left a review comment:

Please remove the poetry file changes, as they are causing merge conflicts.


@register_deserializable
class YoutubeVideoLoader(BaseLoader):
    def load_data(self, url):
        """Load data from a Youtube video."""
        video_id = _parse_video_id(url)

        print(url)
Member:

Please remove this print line.

Author:

Removed that print statement and also restored the poetry.lock file.

except Exception as e:
    logging.warning(f"Failed to parse publishedAt field '{published_at}': {e}")

print(metadata)
Member:

Same here

}
]
print(expected_data)
Member:

Same here

Development

Successfully merging this pull request may close these issues.

Unable to capture precise publication time for YouTube video metadata
3 participants