poor rendering of some incoming emails #23

Closed
hsenag opened this issue Jan 15, 2011 · 26 comments
Labels
f:request-analysis improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) stale Issues with no activity for 12 months user-experience

Comments

@hsenag
Collaborator

hsenag commented Jan 15, 2011

Incoming emails from authorities are often multipart/alternative, and we display the text part. Often the HTML part is much more readable because of the extra formatting. Usually the writers of the emails don't even know or think about the text part and so it can be very hard to read.

Directly inlining the HTML might be problematic because it could play havoc with the layout etc of the request thread, but it would be good to use it somehow.

@frabcus

frabcus commented Jan 16, 2011

It already does use the HTML part if there is one. (See _get_attachment_leaves_recursive in incoming_message.rb, near the comment "Take an HTML one as even higher priority").
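As a rough illustration of that priority rule (a sketch only, not the actual body of _get_attachment_leaves_recursive; the Part struct, PART_PRIORITY table and choose_main_part name are all made up for this example):

```ruby
# Hypothetical stand-in for a parsed MIME part; the real code works on mail objects.
Part = Struct.new(:content_type, :body)

# Assumed ranking: HTML beats plain text, anything else is never the main part.
PART_PRIORITY = { "text/html" => 2, "text/plain" => 1 }.freeze

# Pick the highest-priority alternative from a multipart/alternative container.
def choose_main_part(alternatives)
  alternatives.max_by { |part| PART_PRIORITY.fetch(part.content_type, 0) }
end

parts = [
  Part.new("text/plain", "Dear Requester"),
  Part.new("text/html", "<p>Dear Requester</p>")
]
main_part = choose_main_part(parts)
```

With the two parts above, main_part is the text/html one, which is the behaviour described here.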

For the main part, this is then rendered to text using elinks. (See _convert_part_body_to_text, where it says "XXX This is a bit of a hack as it is calling a convert to text routine. Could instead call a sanitize HTML one.", and then the code for HTML in IncomingMessage._get_attachment_text_internal_one_file.)

It's using this elinks command line:
IO.popen("/usr/bin/elinks -dump-charset utf-8 -force-html -dump " + tempfile.path, "r") do |child|
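
For anyone wanting to try the conversion outside the app, here is a minimal self-contained sketch built around the same flags. The method names and the ELINKS_PATH constant are assumptions for illustration, and it passes the command as an array rather than a concatenated string so the temp file path never goes through a shell:

```ruby
require "tempfile"

# Path taken from the command line quoted above; adjust for your system.
ELINKS_PATH = "/usr/bin/elinks"

# Build the argument list (hypothetical helper, same flags as the quoted command).
def elinks_dump_command(html_path)
  [ELINKS_PATH, "-dump-charset", "utf-8", "-force-html", "-dump", html_path]
end

# Write the HTML to a temp file and return the text dump produced by elinks.
def html_to_text(html)
  Tempfile.create(["incoming", ".html"]) do |tempfile|
    tempfile.write(html)
    tempfile.flush
    IO.popen(elinks_dump_command(tempfile.path), "r") { |child| child.read }
  end
end
```

Calling html_to_text("<p>Hello</p>") needs elinks installed at that path; the helper that builds the command can be checked without it.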

This is relatively new code. It's absolutely true that the plain text part itself is often really bad, e.g. not even containing vital hyperlinks, but converting the HTML part via elinks has improved that kind of problem a lot.

So what are the specific requests where this is a problem? Is it to do with colour, or tables, or what? It certainly shouldn't be to do with hyperlinks.

The answer is to find a good HTML renderer/sanitiser library and update _convert_part_body_to_text to show more of the HTML.
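
To make the idea concrete, here is a toy allowlist filter using only the standard library. It is a sketch of the "escape everything, then restore a few known-safe tags" approach, not a recommendation to hand-roll this; a vetted sanitiser library would be the real answer. The tag list and the method name are assumptions:

```ruby
require "cgi"

# Assumed allowlist of attribute-less structural tags worth keeping.
ALLOWED_TAGS = %w[p br b i em strong ul ol li table tr td th blockquote].freeze

# Escape the whole fragment, then un-escape only bare allowlisted tags.
# Tags carrying attributes (style, onclick, href...) never match and stay escaped.
def sanitize_html_fragment(html)
  escaped = CGI.escapeHTML(html)
  tag_pattern = /&lt;(\/?)(#{ALLOWED_TAGS.join("|")})\s*(\/?)&gt;/i
  escaped.gsub(tag_pattern) { "<#{$1}#{$2.downcase}#{$3}>" }
end

safe = sanitize_html_fragment("<b>hi</b><script>alert(1)</script>")
```

Because nothing with attributes is restored, this keeps bold text, lists and table structure but not colour, and an opening tag with attributes would leave its closing tag unmatched; both are limits of the sketch.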

@sebbacon
Contributor

A couple of examples of badly-rendered emails would be useful when we come to address this.

@hsenag
Collaborator Author

hsenag commented Jan 16, 2011

I see, sorry for jumping to conclusions about which part was being rendered!

Here's one example of a reply where the user mentioned it to us: http://www.whatdotheyknow.com/request/grit_bins_locations_dates_reason_19#incoming-140959

In the past I've seen others which were hard to read, and in some cases the users complained to the authority about this. I'll try to remember to add them here when I come across more.

@hsenag
Collaborator Author

hsenag commented Mar 5, 2011

http://www.whatdotheyknow.com/request/power_line_technology_plt#incoming-154812

Here's another request where the response is hard to read, because (a) the different colour and indentation of the response is not shown, and (b) the response is all inline with the quoted message and hidden behind "Show quoted sections".

@skenaja
Collaborator

skenaja commented Mar 5, 2011

There are examples of replies with formatting issues in todo.txt

https://github.com/sebbacon/alaveteli/blob/master/todo.txt

@skenaja
Collaborator

skenaja commented Jun 25, 2011

@hsenag
Collaborator Author

hsenag commented Jul 5, 2011

Another example where real content is hidden inside the quoted sections: http://www.whatdotheyknow.com/request/quality_risk_profile_south_essex#incoming-189862

@skenaja
Collaborator

skenaja commented Jul 15, 2011

Example of a .bmp embedded in the html part not showing up as an attachment:

http://www.whatdotheyknow.com/request/chronic_disease_in_lambeth?unfold=1#incoming-138418

@hsenag
Collaborator Author

hsenag commented Jun 7, 2013

In this case MIME decoding seems to have failed completely:
https://www.whatdotheyknow.com/request/pct_contacts_and_gp_systems_3#incoming-396756

The raw email opens fine in Thunderbird.

garethrees added the f:request-analysis and improvement labels on May 29, 2018

@RichardTaylor

RichardTaylor commented Oct 22, 2018

There's a specific issue at

https://www.whatdotheyknow.com/request/children_and_adult_social_servic#incoming-1253330

Only an HTML response was provided there, and links whose URLs contain literal spaces (" " rather than "%20") are displayed by Alaveteli as broken links.

Perhaps a similar issue to #3400
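
One possible mitigation, sketched under the assumption that the fix would percent-encode spaces in href values before the HTML is converted or displayed (both method names are hypothetical, and this only handles double-quoted href attributes):

```ruby
# Replace literal spaces in a URL with %20, leaving everything else untouched.
def encode_spaces_in_url(url)
  url.strip.gsub(" ", "%20")
end

# Apply the fix to every double-quoted href attribute in an HTML fragment.
def fix_hrefs_with_spaces(html)
  html.gsub(/href="([^"]*)"/) { %(href="#{encode_spaces_in_url($1)}") }
end

fixed = fix_hrefs_with_spaces('<a href="https://example.org/my file.pdf">file</a>')
```

The example.org URL is a placeholder, not one of the links from the affected responses.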

@RichardTaylor

The issue with spaces in a URL causing an issue occurred again at

https://www.whatdotheyknow.com/request/fly_tipping_enforcement_in_londo_2#comment-90013

@RichardTaylor

More specific issue: Improve/fix HTML rendering of tables #1528

@RichardTaylor

A further example which doesn't obviously fit into any of the more specific tickets:
https://www.whatdotheyknow.com/request/missed_bin_collections_data_7#incoming-1659864

@mdeuk
Collaborator

mdeuk commented Oct 27, 2020

> A further example which doesn't obviously fit into any of the more specific tickets:
> https://www.whatdotheyknow.com/request/missed_bin_collections_data_7#incoming-1659864

Oh dear, that response is pretty mangled. It appears that the reply is being generated using the same software we've noted on #5905, so I wonder if there is some commonality between these issues.

In any case, the raw email renders as you'd expect in an email client; however, it's not clear whether the MIME encoding in the email is free of errors.

@gbp
Member

gbp commented Oct 27, 2020

> It appears that the reply is being generated using the same software we've noted on #5905, so I wonder if there is some commonality between these issues.

Yep - I can confirm it's caused by the same issue

@MattK1234
Collaborator

A user contacted us regarding https://www.whatdotheyknow.com/request/secure_email_contracts_23?unfold=1#incoming-1685108 because the inline responses using a different colour are not clearly shown on WhatDoTheyKnow.com

@RichardTaylor

Noting a corrupted response at

https://www.whatdotheyknow.com/request/ipc_grants_11

In this case the responses don't open legibly in Mac Mail when the raw message is downloaded, so they're either corrupt on receipt or being mangled at an early stage by the system.

@RichardTaylor

Another example of a response only being viewable after clicking "show quoted sections"

https://www.whatdotheyknow.com/request/milton_keynes_city_status_bid#incoming-1934835

@RichardTaylor

Another example of a response only being viewable after clicking "show quoted sections"

https://www.whatdotheyknow.com/request/request_for_details_of_parking_i#incoming-1973914

This case is from the same council, and sent via the same system, as the previous one.

@RichardTaylor

What looks like a one-off case at

https://www.whatdotheyknow.com/request/freedom_of_information_request_a_162#incoming-1454731

Is it possible that the problem is content which looks like an email address, and which would therefore have been redacted, in the header of the part of the email containing an image?


"--_005_CWXP265MB1797D3B3C4A15889D320B57CA6680CWXP265MB1797GBRP_
Content-Type: image/jpeg; name="119102214405700977.jpg"
Content-Disposition: inline; filename="119102214405700977.jpg"
Content-Id: <[email protected]>
Content-Transfer-Encoding: base64"

Would the system have replaced that Content-ID element with a note saying the email address is redacted?
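
To show why that could happen, here is a sketch with a deliberately naive email-matching pattern. This is not Alaveteli's actual redaction code; the pattern, the replacement text and the Content-ID value are all made up for the example:

```ruby
# Naive, hypothetical email matcher: anything shaped like local@domain.tld.
EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Replace every match with a placeholder, as a blanket redaction pass might.
def redact_emails(text)
  text.gsub(EMAIL_PATTERN, "[email address]")
end

# Invented Content-ID in the usual name@host shape.
header = "Content-Id: <logo.jpg@mail.example.org>"
redacted_header = redact_emails(header)
```

Here redacted_header comes out as "Content-Id: <[email address]>", so an HTML part pointing at the original cid: value would no longer find its image if redaction ran over the raw MIME headers.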

@FOIMonkey
Collaborator

All Frimley Health NHS Foundation Trust responses start with "Description:" repeated multiple times https://www.whatdotheyknow.com/request/foi_request_patients_treated_56#incoming-1933094

@RichardTaylor

> All Frimley Health NHS Foundation Trust responses start with "Description:" repeated multiple times https://www.whatdotheyknow.com/request/foi_request_patients_treated_56#incoming-1933094

That's a message from December 2021.

This email displayed without the repeated "Description:" when opened in the Mail App on OSX. The raw email contains the following:

--_000_LO2P265MB5546B640EC9FCD7027BB944D95719LO2P265MB5546GBRP_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

[Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Description: Descri=
ption: Description: Description: Description: Description: Description: Des=
cription: Description: Description: Description: Description: Description: =
Description: Description: Description: Description: Description: Descriptio=
n: Description: Description: Description: Description: Description: Descrip=
tion: Description: Description: Description: Description: Description: Desc=
ription: Description: Description: Description: Description: Description: D=
escription: Description: Description: Description: Description: Description=
: Description: Description: Description: Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Description: Description: Description: Description: De=
scription: Description: Description: Description: Description: Description:=
 Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Frimley Health FT c=
ol (3)]

Dear Requester

And the HTML version contains:


<body lang=3D"EN-GB" link=3D"blue" vlink=3D"purple" style=3D"word-wrap:brea=
k-word">
<div class=3D"WordSection1">
<p align=3D"right" style=3D"text-align:right"><span style=3D"font-size:10.0=
pt;font-family:&quot;Arial&quot;,sans-serif"><img width=3D"381" height=3D"7=
6" style=3D"width:3.9687in;height:.7916in" id=3D"Picture_x0020_2" src=3D"ci=
d:[email protected]" alt=3D"Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Description: Description: Description: Description: De=
scription: Description: Description: Description: Description: Description:=
 Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Description: Descri=
ption: Description: Description: Description: Description: Description: Des=
cription: Description: Description: Description: Description: Description: =
Description: Description: Description: Description: Description: Descriptio=
n: Description: Description: Description: Description: Description: Descrip=
tion: Description: Description: Description: Description: Description: Desc=
ription: Description: Description: Description: Description: Description: D=
escription: Description: Description: Description: Description: Description=
: Description: Description: Description: Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Frimley Health FT col (3)"></span><span style=3D"font-=
size:10.0pt;font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p=
>

The email appears to have been generated by NHS Microsoft systems.
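
For anyone wanting to confirm that the repetition is in the raw email itself rather than introduced during display, the quoted-printable text can be decoded with Ruby's built-in "M" unpack directive. A small sketch, using a shortened stand-in for the excerpt above (the method name is an assumption):

```ruby
# Decode a quoted-printable body declared as Windows-1252 and return UTF-8 text.
# unpack1("M") joins soft line breaks, i.e. a trailing "=" followed by a newline.
def decode_quoted_printable(raw, charset = "Windows-1252")
  raw.unpack1("M").force_encoding(charset).encode("UTF-8")
end

# Shortened sample in the same shape as the raw part quoted above.
raw_part = "[Description: Descripti=\non: Descri=\nption: Frimley Health FT c=\nol (3)]\n"

decoded = decode_quoted_printable(raw_part)
repeat_count = decoded.scan("Description:").size
```

After decoding, the words split across lines are whole again, and repeat_count gives the number of "Description:" prefixes already present in the source text.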

There is a further example from the same body at

https://www.whatdotheyknow.com/request/ooutsourcing_radiology_imaging#incoming-473633 (from 2014)

Googling suggests the issue might be related to a Microsoft Outlook bug that occurs when saving an OFT (Outlook File Template):

https://answers.microsoft.com/en-us/outlook_com/forum/all/outlook-2010-picture-alt-text/59aac086-8ac0-4afb-83e2-ca765b8e8bab

I don't think there's anything Alaveteli should do here. We could point those generating the problematic emails to the above link describing the bug. Upgrading or changing email system might be a way for public bodies to prevent this issue.

@FOIMonkey
Collaborator

Another example of the description spam from this month: https://www.whatdotheyknow.com/request/epr_solutions_26#incoming-1999411
It does look like the Outlook bug is to blame. Other authorities are also affected, to a much lesser extent, e.g. https://www.whatdotheyknow.com/request/information_on_facilities_manage_188#incoming-2002336 and https://www.whatdotheyknow.com/request/invoices_for_sp_beautiful_brows#incoming-2002002 from today.

@FOIMonkey
Collaborator

The substantive part of the response to this request isn't displayed at all for some reason: https://www.whatdotheyknow.com/request/avaliable_propertties_foi_2000#incoming-1046540
[Screenshots attached: "2022-04-11" and "2022-04-11 (1)"]

@FOIMonkey
Collaborator

Another occurrence of spaces in URLs causing broken links: https://www.whatdotheyknow.com/request/pass_card_guidance#incoming-2117148

@HelenWDTK
Contributor

This issue has been automatically closed due to a lack of discussion or resolution for over 12 months.
Should we decide to revisit this issue in the future, it can be reopened.

HelenWDTK closed this as not planned on Nov 19, 2024