-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[twitter] Shorten title attribute for #29912 #31151
Conversation
@dirkf, I believe I have achieved your request with a little less code. The uploader name is pretty strictly limited by Twitter, so I don't think we need to slice it. That means we can use a more straightforward slice like the one just pushed. Give it a look and let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM, with comments as below. I'll enable the CI tests so that you can check the results.
Co-authored-by: dirkf <[email protected]>
@dirkf, those changes are in. |
The flake errors in the test run appear to be in other files. |
Apparently the default flake8 settings that the project uses changed in some way so that a keyword must be separated from the next token by a space, which is basically good but irritating. Otherwise GH somehow removed some spaces from the code, which seems unlikely and would be much worse. If you update from the current master the 2 places where this caused problems are fixed. |
I think I've done that. |
imo, this should not be implemented.
As I see, changing the default outtmpl to something like Or, if the |
This extractor is unusual in that there is no title in the plain webpage, although a title and an
No. But Twitter was the source of some of the original issues complaining about filename length errors with the default output template, so this is an opportunity to avoid those in future, without expecting users to check for a FAQ that tells them to use Instead of fixing the title, we could generate a warning if the title is "long", or possibly include the FAQ text in the error message if we could reliably detect an exception caused by an over-long filename (a generic "solution" in place of #29989). Changing the default output template could be an alternative but I would be concerned that setting a sufficiently low value to avoid filename length issues would affect filenames on filesystems with no effective limit, attracting a different set of complaints. The extractor could manipulate the output template by inserting a length field into any
I believe that the default output template was chosen on the assumption that Apparently the page title for a tweet set by Twitter has the form (in terms of the extracted fields) Nitter tries to improve Twitter by making a tweet title thus In the browser title bar I see the Twitter title presented with
After all, yt-dl itself is just a bandaid in some sense. What would make for a better implementation? Surely even a harsh critic would call the current implementation at least "adequate"? Footnotes
|
I gave one suggestion above. As you pointed out, that has it's own issues too. (I still think this is a better solution though). If a proper solution was easy, I doubt this issue would have existed for so long From a user perspective, I do agree this is better than nothing. But I also think that it'll become a headache to maintain any kind of consistency b/w extractors over time. And I disagree that the twitter is special in this regard. There are plenty of other extractors that face this issue. This search is a quick way to get some examples That said, since twitter is so popular, I can understand why you'd consider this technical debt to be acceptable. And for the record, while an FAQ entry might help some users, I don't think it will reduce the number of issues opened. The kind of people who actually read the FAQ are the same people who'd look for duplicate issues. |
If you really want to get existential about it, I think the core problem is requiring a title for all sites. As detailed above, there's not a tidy title for a tweet analogous to a YouTube title. So there will never be tidy answer to this problem. That said, making the title field optional seems like a can of worms that isn't worth opening either. Some kind of compromise is necessary and, as documented in the issue that prompted this pull request, the current solution is bad for users. So, IMHO, some alternative is needed. Something along the lines of what's proposed here does seem like the simplest solution. If universality is what you're after, I think enforcing a global trim on all filenames seems like the most natural extension. |
The rationale for
Amen to that. If we can get something like #29989 to work, that would fix the problem for all sites. Meanwhile:
Exactly. The Twitter title field is up for grabs: its extraction shouldn't routinely crash the program. I guess that extractor is used more than all the other possibly problematic extractors together, which are also a small proportion of the ~800 (crikey) we have. And
Absolutely, why I mentioned putting some FAQ text in error message appears, if only we knew reliably what that would be. |
I'm withdrawing this request. |
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
This request trims the standard Twitter ID field to workaround the bug reported in #29912