poor rendering of some incoming emails #23
Comments
It already does use the HTML part if there is one. (See _get_attachment_leaves_recursive in incoming_message.rb, near the comment "Take an HTML one as even higher priority".) For the main part, this is currently then rendered to text using elinks. (See _convert_part_body_to_text, where it says "XXX This is a bit of a hack as it is calling a convert to text routine. Could instead call a sanitize HTML one.", and then the code for HTML in IncomingMessage._get_attachment_text_internal_one_file.) It's using this elinks command line: This is relatively new code - it's absolutely true that the plain text part itself is often really bad, e.g. not even containing vital hyperlinks. But the conversion via elinks of the HTML part improved that kind of problem loads. So what are the specific requests where this is a problem? Is it to do with colour or tables or what? It certainly shouldn't be to do with hyperlinks. The answer is to find a good HTML renderer/sanitiser library and update _convert_part_body_to_text to show more of the HTML. |
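For anyone following along without the code open, here is a minimal sketch of the "HTML part takes priority" idea described above. It is not the code from incoming_message.rb: `Part`, `PRIORITY` and `choose_main_part` are hypothetical names made up for illustration, and the real method walks a MIME tree rather than a flat list.

```ruby
# Hypothetical stand-in for a MIME leaf part; not Alaveteli's real class.
Part = Struct.new(:content_type, :body)

# Higher number wins. HTML outranks plain text, mirroring the
# "Take an HTML one as even higher priority" comment quoted above.
PRIORITY = { 'text/html' => 2, 'text/plain' => 1 }.freeze

# Return the part to treat as the main body, or nil if there are no parts.
def choose_main_part(parts)
  parts.max_by { |part| PRIORITY.fetch(part.content_type, 0) }
end

# A made-up multipart/alternative message with both renditions.
parts = [
  Part.new('text/plain', 'See the report.'),
  Part.new('text/html', '<p>See <a href="http://example.org/report">the report</a>.</p>')
]

main = choose_main_part(parts)
puts main.content_type
```

The chosen HTML body would then still need converting to text (currently via elinks) or sanitising before display.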
A couple of examples of badly-rendered emails would be useful when we come to address this. |
I see, sorry for jumping to conclusions about which part was being rendered! Here's one example of a reply where the user mentioned it to us: http://www.whatdotheyknow.com/request/grit_bins_locations_dates_reason_19#incoming-140959 In the past I've seen others which were hard to read, and in some cases the users complained to the authority about this. I'll try to remember to add them here when I come across more. |
http://www.whatdotheyknow.com/request/power_line_technology_plt#incoming-154812 Here's another request where the response is hard to read, because (a) the different colour and indentation of the response is not shown, and (b) the response is all inline with the quoted message and hidden behind "Show quoted sections". |
There are examples of replies with formatting issues in todo.txt |
Another example where real content is hidden inside the quoted sections: http://www.whatdotheyknow.com/request/quality_risk_profile_south_essex#incoming-189862 |
Example of a .bmp embedded in the html part not showing up as an attachment: http://www.whatdotheyknow.com/request/chronic_disease_in_lambeth?unfold=1#incoming-138418 |
In this case MIME decoding seems to have failed completely: the raw email opens fine in Thunderbird. |
There's a specific issue at https://www.whatdotheyknow.com/request/children_and_adult_social_servic#incoming-1253330 where only an HTML response was provided, and the problem is Alaveteli's display of links containing spaces in URLs. Perhaps a similar issue to #3400 |
The problem with spaces in a URL breaking the link occurred again at https://www.whatdotheyknow.com/request/fly_tipping_enforcement_in_londo_2#comment-90013 |
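One possible mitigation, sketched under the assumption (not confirmed in this thread) that the breakage comes from raw whitespace inside `href` values surviving into the rendered output: percent-encode the whitespace before the HTML is converted or displayed. `encode_spaces_in_hrefs` is a hypothetical helper, not an existing Alaveteli method, and a regex like this only handles double-quoted attributes.

```ruby
# Hypothetical helper: percent-encode whitespace inside double-quoted
# href values so the whole URL survives as a single token.
def encode_spaces_in_hrefs(html)
  html.gsub(/href="([^"]*)"/) do
    # Trim stray leading/trailing whitespace, then encode what remains.
    url = Regexp.last_match(1).strip.gsub(/\s/, '%20')
    %(href="#{url}")
  end
end

# Made-up example link of the kind seen in the responses above.
html = '<a href="http://example.org/docs/my file.pdf">my file</a>'
puts encode_spaces_in_hrefs(html)
```

A real fix would more likely sit in whatever step extracts or autolinks URLs, and would want a proper HTML parser rather than a regex.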
More specific issue: Improve/fix HTML rendering of tables #1528 |
A further example which doesn't obviously fit into any of the more specific tickets: |
Oh dear, that response is pretty mangled. It appears that the reply is being generated using the same software we've noted on #5905, so I wonder if there is some commonality between these issues. In any case, the raw email renders as you'd expect in an email client; however, it's not clear whether or not the MIME encoding in the email is without error. |
Yep - I can confirm it's caused by the same issue |
A user contacted us regarding https://www.whatdotheyknow.com/request/secure_email_contracts_23?unfold=1#incoming-1685108 because the inline responses using a different colour are not clearly shown on WhatDoTheyKnow.com |
Noting a corrupted response at https://www.whatdotheyknow.com/request/ipc_grants_11 In this case the responses don't open legibly in Mac Mail when the raw message is downloaded, so they're either corrupt on receipt or they're being mangled at an early stage by the system. |
Another example of a response only being viewable after clicking "show quoted sections" https://www.whatdotheyknow.com/request/milton_keynes_city_status_bid#incoming-1934835 |
Another example of a response only being viewable after clicking "show quoted sections" https://www.whatdotheyknow.com/request/request_for_details_of_parking_i#incoming-1973914 This case is from the same council, and sent via the same system, as the previous one. |
What looks like a one-off case at https://www.whatdotheyknow.com/request/freedom_of_information_request_a_162#incoming-1454731 Is it possible the problem is content which looks like an email address, which would have been redacted, in the header of the part of the email containing an image?
The system will have replaced that Content-ID element with a note saying the email is redacted? |
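To make that hypothesis concrete, here is a sketch of how an email-style redaction could break an inline image. Everything in it is an assumption for illustration: the `EMAIL_LIKE` pattern, the `[email address]` placeholder text, and the header and HTML are made up, and none of it is taken from Alaveteli's actual redaction code.

```ruby
# Deliberately loose pattern for "looks like an email address".
# An assumption for illustration, not Alaveteli's actual redaction rule.
EMAIL_LIKE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/

# Hypothetical redaction step applied to raw message text.
def redact_email_like(text)
  text.gsub(EMAIL_LIKE, '[email address]')
end

# Made-up part header and the HTML that refers to it by Content-ID.
header = 'Content-ID: <image001.png@01D7F0A1.2B3C4D50>'
html   = '<img src="cid:image001.png@01D7F0A1.2B3C4D50">'

redacted_header = redact_email_like(header)

# The cid: reference in the HTML can no longer be matched to the part.
cid = html[/cid:([^"]+)/, 1]
puts redacted_header
puts redacted_header.include?(cid)
```

If this is what is happening, exempting Content-ID headers (and matching `cid:` references) from redaction would be one thing to try.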
All Frimley Health NHS Foundation Trust responses start with "Description:" repeated multiple times https://www.whatdotheyknow.com/request/foi_request_patients_treated_56#incoming-1933094 |
That's a message from December 2021. This email displayed without the repeated "Description:" when opened in the Mail App on OSX. The raw email contains the following:
And the HTML version contains:
The email appears to have been generated by NHS Microsoft systems. There is a further example from the same body at https://www.whatdotheyknow.com/request/ooutsourcing_radiology_imaging#incoming-473633 (from 2014) Googling suggests the issue might be related to a Microsoft Outlook bug where saving an OFT (Outlook File Template) I don't think there's anything Alaveteli should do here, though we could point those generating the problematic emails to the above link describing the bug. Upgrading or changing email system might be a way for public bodies to prevent this issue. |
Another example of the description spam from this month: https://www.whatdotheyknow.com/request/epr_solutions_26#incoming-1999411 |
The substantive part of the response to this request isn't displayed at all for some reason: https://www.whatdotheyknow.com/request/avaliable_propertties_foi_2000#incoming-1046540 |
Another occurrence of spaces in URLs causing broken links: https://www.whatdotheyknow.com/request/pass_card_guidance#incoming-2117148 |
This issue has been automatically closed due to a lack of discussion or resolution for over 12 months. |
Incoming emails from authorities are often multipart/alternative, and we display the text part. Often the HTML part is much more readable because of the extra formatting. Usually the writers of the emails don't even know or think about the text part and so it can be very hard to read.
Directly inlining the HTML might be problematic because it could play havoc with the layout etc of the request thread, but it would be good to use it somehow.