Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- Implemented _parse_date method to convert dateTimeString to a UNIX … #33068

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Asry18
Copy link

@Asry18 Asry18 commented Feb 21, 2025

This PR adds functionality to parse the dateTimeString from events and return a valid timestamp for each event. The following changes have been made:

Implemented _parse_date method: This method parses the dateTimeString (in ISO 8601 format) and converts it into a UNIX timestamp. The timestamp is also adjusted to UTC before being returned.
Modified _real_extract method: The method now includes the parsed timestamp field in the returned event data.
This enhancement ensures that each event's timestamp is accurately parsed, providing consistent date/time information for further processing.

Additional Information:
The dateTimeString field is expected to be in ISO 8601 format (e.g., 2022-01-01T12:00:00-05:00).
The timestamp is now returned as part of the event data, which can be used for further processing or display.
This PR improves the extraction process by adding proper timestamp handling.

@dirkf
Copy link
Contributor

dirkf commented Feb 22, 2025

pytz cannot be included in the project because it's MIT licensed. Therefore a fallback would be needed, which would make pytz irrelevant anyway.

Review unified_timstamp() and parse_iso8601() in utils.py which should address the case of numeric TZ (but perhaps both don't). If you find test-cases (try the ones from yt-dlp first) that fail, please raise an issue with the details. Also, please raise an issue for the extractor problem that this PR was intended to address.

For alphabetic TZ we should compat the zoneinfo module from Py3.9+, so that utils.py can say from ..compat import compat_zoneinfo as zoneinfo or similar.

@dirkf dirkf marked this pull request as draft February 22, 2025 13:20
@Asry18
Copy link
Author

Asry18 commented Feb 22, 2025

I’ve updated the implementation to remove pytz and use zoneinfo (for Python 3.9+) with a compat_zoneinfo fallback for older versions.
Additionally, I’ve integrated unified_timestamp() and parse_iso8601() from utils.py to handle timestamp parsing where applicable. If these fail, the extractor falls back to manual ISO 8601 parsing while ensuring UTC conversion.

@dirkf
Copy link
Contributor

dirkf commented Feb 22, 2025

One approach is just to fall back to not handling alphabetic TZ if zoneinfo is unavailable.

A better but harder one is to borrow and convert the zoneinfo from Py3.9+ into a youtube_dl.zoneinfo module. If we only need the ZoneInfo class it can then be loaded in compat.py:

try:
    from zoneinfo import ZoneInfo as compat_ZoneInfo
except ImportError:
    from .zoneinfo import _ZoneInfo as compat_ZoneInfo

Or the passthrough module mechanism from yt-dlp could be used.

The ZoneInfo class uses the system TZ data, which is less likely to succeed in older systems, or the TZ data from the PyPi tzdata package.

The first problem here is that the tzdata package is under the Apache Licence, which means we can't distribute it either. Users would have to install it themselves.

The second problem is that, although the package is supposedly compatible with Py2 (we care about 2.6 and 2.7), PyPi isn't, because it requires HTTPS features that don't exist in Py2.6, and so pip access to PyPi fails. So Py2.6 users can't install with pip in the normal way, although it's still possible to install from a copy of the package wheel stored locally or on a less fussy network server: perhaps it would be acceptable to proxy the wheel in a separate repo forked from the tzdata repo.

@Asry18
Copy link
Author

Asry18 commented Feb 22, 2025

Thanks for the feedback! I've removed pytz and replaced it with zoneinfo (Python 3.9+). If zoneinfo is unavailable, the code falls back to ignoring alphabetic time zones. Numeric time zones are still parsed correctly.

@Asry18 Asry18 marked this pull request as ready for review February 23, 2025 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants