Improve/fix HTML rendering of tables #1528

TomSteinberg · 2014-05-22T10:21:37Z

e.g. http://www.whatdotheyknow.com/request/it_support_services_1295#incoming-258044
http://www.whatdotheyknow.com/request/it_support_services_347#incoming-258014
http://www.whatdotheyknow.com/request/it_support_services_1236#incoming-257000

taken from #18 by @hsenag

garethrees · 2014-11-05T10:10:57Z

At the moment the conversion doesn't actually render the content into an HTML table, so it's a conversion issue rather than simply a matter of adding some styling.

hsenag · 2015-01-06T06:18:19Z

Another example: https://www.whatdotheyknow.com/request/mental_health_services_4#incoming-601920

The original email has a nice looking HTML table.

tomchance · 2015-10-12T15:52:36Z

I would really welcome this fix/improvement. I submit a fair few FOI requests where I end up with incomprehensible tables. For example:

https://www.whatdotheyknow.com/request/homeless_due_to_end_of_private_t_19#incoming-713565

As a result I usually write a reply asking that they email me the contents directly, which means others can't access the data if they happen across the request on the web site.

RichardTaylor · 2017-07-24T13:09:18Z

Another example:

https://www.whatdotheyknow.com/request/number_children_home_educated_wi

What WhatDoTheyKnow shows:

Summary of Elective Home Education Cases

Academic Year

2014/15 2015/16 2016/17

SEN All From From All From From All From From
status Cases mainstream Special Cases mainstream Special Cases mainstream Special
Statement 10 <10 < 5 11 6 <5 9 5 <5
(1)
EHCP (2) 6 <5 < 5 6 6 0 8 7 0
SEN
Support 67 63 0 81 77 0 102 99 0
(3)
All SEN 83 72 < 5 98 89 <5 119 111 <5
(4, 5)

Table in HTML: (image not preserved)

In this case we offered the user an image of the table and may upload the HTML version of the response and link it from an annotation:

https://twitter.com/WhatDoTheyKnow/status/889469208383418368

RichardTaylor · 2018-04-05T14:29:13Z

Adding another example where a table provided in HTML format wasn't legible:
https://www.whatdotheyknow.com/request/housing_register_8?unfold=1#incoming-1135657
In that case we put the tables in a Google spreadsheet and linked to it.

Also noting #4003 is a related, broader, ticket for preserving all HTML formatting in response emails.

RichardTaylor · 2018-07-04T18:18:36Z

Another example:
https://www.whatdotheyknow.com/request/requests_for_reception_start_at

garethrees · 2018-10-12T20:41:57Z

https://blog.socialcops.com/technology/engineering/camelot-python-library-pdf-data/

lizconlan · 2018-10-17T10:04:02Z

> https://blog.socialcops.com/technology/engineering/camelot-python-library-pdf-data/

Had a quick play with that last night (I used an older laptop, so made the install harder for myself than it needed to be) and it looks interesting. I picked a random PDF attachment with a table in it (the first one I stumbled across) and it made a reasonable job of it, reducing the entire message to just the tabular content (detail here).

(But, as far as I can tell, it's only useful for tables stuck inside PDFs)

garethrees · 2018-11-24T13:52:12Z

https://github.com/adworse/iguvium – Ruby gem for extracting tables from PDF as a structured info

mikejamesthompson · 2019-06-25T08:39:48Z

I've come across this recently with tables of data pasted into response emails and then rendered unintelligible by the conversion.

It looks as if most of the examples above are HTML tables that get mangled, so that seems like a good place to focus. Parsing tables out of PDFs is a whole separate challenge in itself (I've been doing this a lot recently ...) and there are plenty of products/libraries out there that (attempt to) do this - Camelot, PDFTables.com, Tabula etc.

Is there any reason you couldn't give people access to the raw response? That would give people the ability to copy-paste the table direct into a spreadsheet or just to eyeball it.

RichardTaylor · 2019-10-25T12:28:19Z

+1 see issue at:

https://www.whatdotheyknow.com/request/july_2019_discretionary_deferral_17#comment-89964

RichardTaylor · 2019-11-01T13:52:48Z

+1 another example at

https://www.whatdotheyknow.com/request/tenancies_ended_by_death_of_the#incoming-1455601

RichardTaylor · 2019-11-01T13:53:07Z

A WhatDoTheyKnow user writes:

> A problem which I frequently encounter is that figures are supplied by the respondent as tables.
>
> These are scrambled when they appear on WDTK (titles are shown detached from the figures), which is usually decipherable, but which occasionally necessitates a further clarification request.
>
> Could you find a way to show the tables on WDTK as received (presumably received in the body of an email)?

garethrees · 2020-01-22T12:10:34Z

https://nanonets.com/blog/table-extraction-deep-learning/

MattK1234 · 2020-04-05T12:20:09Z

+1 from me following the poor formatting of the table at https://www.whatdotheyknow.com/request/potholes_rights_of_way_maintenan

The WDTK admin team have provided the user with a copy of the raw email.

RichardTaylor · 2020-10-18T23:06:07Z

+1
https://www.whatdotheyknow.com/request/minimum_maximum_and_median_fpas#incoming-1640619

The data has been copied to a Google sheet linked from an annotation.

RichardTaylor · 2021-03-15T14:13:01Z

+1

https://www.whatdotheyknow.com/request/details_of_current_housing_stock#incoming-1743561

RichardTaylor · 2021-08-07T18:13:11Z

+1 https://www.whatdotheyknow.com/request/healthcare_worker_accommodation_6#incoming-1848176

garethrees · 2021-11-02T09:41:10Z

Sometimes we manually convert these for pro users on request. Process is:

Download raw email > open in Apple Mail > copy and paste into TextEdit (to preserve rich text formatting) > make any redactions > print as PDF > upload file > link in annotation.

RichardTaylor · 2021-12-18T12:49:05Z

+1

https://www.whatdotheyknow.com/request/vser_payments_teaching_staff#incoming-1938493

Data supplied to user by email, and made available via Google Sheets

FOIMonkey · 2022-03-23T21:09:27Z

+1 The response here is almost illegible
https://www.whatdotheyknow.com/request/property_and_assets_and_building_291#incoming-1221782

garethrees · 2022-04-06T11:01:42Z

Wondering whether we can convert the main body HTML part to PDF and add it as an "attachment" or similar, so that:

- We still render plain text by default
- We don't have to sanitise and render random HTML
- Users have better access to the underlying email presentation

Would have to consider possible extra work when it comes to hiding and censor rules though.

RichardTaylor · 2022-04-27T14:04:32Z

Further example

https://www.whatdotheyknow.com/request/deaths_and_hospital_admissions_f_117#incoming-1943749

WilliamWDTK · 2022-06-01T20:01:56Z

This issue (alongside, perhaps, #4578) has actually been cited in an FOI response:

> I have attached a PDF document with my full response. This includes some tables and hyperlinks that may not otherwise be supported by the WhatDoTheyKnow website.

https://www.whatdotheyknow.com/request/docs_21#incoming-1517708

WilliamWDTK · 2022-06-06T11:34:11Z

Another example on this request.

The table in the raw email actually looks like this:

	Officers	Staff
	Total	Economic Crime
2013	2083.38	13.8
2014	1955.04	13.8
2015	1927.24	15.8
2016	2068.73	10.8
2017	2056.54	10.8
2018	1974.72	11.8
2019	1944.71	10.8
2020	2145.17	10.68
2021	2240.64	9.88
2022	2328.23	11.73

RichardTaylor · 2022-06-14T13:32:53Z

Further example at https://www.whatdotheyknow.com/request/cumulative_amounts_owed_to_the_c#incoming-1776516

ajparsons · 2022-06-28T16:38:46Z

Just to add to the above, this seems like one of the few areas where WhatDoTheyKnow is less useful than the user just sending an email.

Is making the original email available as a download just to the requester an option?

RichardTaylor · 2022-06-28T17:47:02Z

Classifying requests on WhatDoTheyKnow reveals lots more examples that we're not logging or acting on; often the lack of formatting doesn't prevent accessing the information with care and effort, but it does make it less easy to read than intended.

RichardTaylor · 2022-06-28T17:52:44Z

> Is making the original email available as a download just to the requester an option?

If we did this for all incoming messages it would become another thing to check, to see whether redactions had worked, when putting censor rules in place.

We probably shouldn't do it retrospectively, not without careful consideration, as we might end up publishing lots of new material.

We could do it on a per-request basis, but that wouldn't help as significantly.

Could we detect a table, and present it in HTML form at the end of a message in a similar way as we extract some links/references and present them at the end?

ajparsons · 2022-06-28T18:22:27Z

Just for ref, the problem I was looking at was a request where some cells were blank, making it hard to easily read or convert the table when it got flattened. Automated approaches might move values around in this case, so the table would need to be extracted from the HTML version rather than the text (but that's probably possible?)

Could extract all table elements and make new attachments from them as a reduced version of Gareth's pdf idea above.
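A minimal sketch of the "extract all table elements" idea, assuming the HTML body part of the email is available as a string. It uses only Python's standard-library `html.parser`. The names `TableExtractor` and `extract_tables` are hypothetical and not part of the existing codebase, and the sample data is invented for illustration. Empty cells are kept as empty strings so values stay in their columns. Nested tables, `colspan` and `rowspan` are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML document as rows of cell text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self._rows is None:
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace; a blank cell becomes "" and keeps its position.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None
            self._row = None
            self._cell = None


def extract_tables(html_body):
    """Return a list of tables; each table is a list of rows of strings."""
    parser = TableExtractor()
    parser.feed(html_body)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    # Invented sample with one blank header cell and one blank data cell.
    sample = """
    <p>Please see the figures below.</p>
    <table>
      <tr><th></th><th>Officers</th><th>Staff</th></tr>
      <tr><td>2013</td><td>2083.38</td><td>13.8</td></tr>
      <tr><td>2014</td><td></td><td>13.8</td></tr>
    </table>
    """
    for row in extract_tables(sample)[0]:
        print(row)
```

Working from the HTML part rather than the flattened text is what keeps the blank cells in place, which is the case described above.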

A recurring problem seems to be that pulling content out of the html is creating extra problems for redaction. Could just have an approach of deleting any derived attachments if a change is applied as a way of balancing the big benefit in many cases with tables? Those requests would just end up like all requests are at the moment.
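The "delete derived attachments when a change is applied" approach could look roughly like the sketch below. It is an in-memory illustration only: `DerivedAttachmentStore`, its methods, and the idea of keying by message ID are all assumptions made for this example, not a description of how censor rules or attachments are actually stored.

```python
class DerivedAttachmentStore:
    """Hypothetical store for attachments generated from a message's HTML part."""

    def __init__(self):
        # message_id -> list of derived attachment payloads (e.g. extracted tables)
        self._derived = {}

    def add(self, message_id, payload):
        """Record a derived attachment for a message."""
        self._derived.setdefault(message_id, []).append(payload)

    def for_message(self, message_id):
        """Return a copy of the derived attachments for a message."""
        return list(self._derived.get(message_id, []))

    def on_censor_rule_changed(self, affected_message_ids):
        """Drop derived attachments for every affected message.

        Returns the number of attachments removed. Afterwards those messages
        fall back to the plain-text rendering they have today.
        """
        removed = 0
        for message_id in affected_message_ids:
            removed += len(self._derived.pop(message_id, []))
        return removed


if __name__ == "__main__":
    store = DerivedAttachmentStore()
    store.add(101, "<table>...</table>")
    store.add(102, "<table>...</table>")
    # A censor rule is applied that affects message 101 only.
    store.on_censor_rule_changed([101])
    print(store.for_message(101), store.for_message(102))
```

The design choice here is to discard rather than re-redact: nothing derived survives a rule change, so there is no second copy of the content to check.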

Email downloads might not be the right answer in this case, but do censor rules matter as much if just available to the original requester? Feel like giving people the option of all the content they'd have got by email is good. This is especially in terms of the case for pro vs email. But can take that to another issue.

RichardTaylor · 2022-06-28T18:46:08Z

I think we want to solve this issue for all readers, not specifically requesters.

If we manually act in these cases we don't just provide the readable table to the requester; we also publish it.

Making the material available to the requester easily/automatically might stop them contacting us, which is what prompts us to make the material available to all. It might make the issue worse from the point of view of the non-requester reader.

> Could extract all table elements and make new attachments from them as a reduced version of Gareth's pdf idea above.

I think that's very similar to the proposal of extracting the table and presenting it on the request page. A separate page / "generated attachment" might be preferable though due to the risk of formatting being disrupted by a large table. As with "view as HTML" we'd want a header linking back to the request page.
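As a rough illustration of the separate page / "generated attachment" idea, the sketch below builds a standalone HTML page from already-extracted rows, with a header link back to the request page. The function name `render_table_page`, its parameters, and the assumption that the first row is the header are all hypothetical. Every cell is escaped, so no HTML from the original email is passed through.

```python
from html import escape


def render_table_page(rows, request_url, title="Extracted table"):
    """Build a minimal standalone HTML page for one table.

    rows is a list of rows, each a list of plain-text cell strings.
    Assumption: the first row is treated as the header row.
    """
    body_rows = []
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"
        cells = "".join(f"<{tag}>{escape(cell)}</{tag}>" for cell in row)
        body_rows.append(f"<tr>{cells}</tr>")
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f'<p><a href="{escape(request_url, quote=True)}">Back to the request page</a></p>\n'
        '<table border="1">\n' + "\n".join(body_rows) + "\n</table>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # Invented rows and a placeholder URL, for illustration only.
    page = render_table_page(
        [["Year", "Officers", "Staff"], ["2013", "2083.38", "13.8"]],
        "https://example.org/request/some_request",
    )
    print(page)
```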

FOIMonkey · 2022-06-28T19:03:32Z

> but do censor rules matter as much if just available to the original requester?

Yes. Though not the main use, sometimes they are in place to keep information from the requester. We would have to be extremely careful.

FOIMonkey · 2022-09-22T07:08:02Z

Another example here: https://www.whatdotheyknow.com/request/scaffolding_contractors_used_by#incoming-2126200

The FOI officer noticed and provided the data as a spreadsheet.

FOIMonkey · 2022-09-22T10:33:27Z

+1 This is almost unreadable:

https://www.whatdotheyknow.com/request/sexual_orientation_workforce_dat_113#incoming-1850225

RichardTaylor · 2022-11-11T12:13:57Z

+1 Another example https://www.whatdotheyknow.com/request/insider_trading_action_taken_by#incoming-2161478

HelenWDTK · 2023-04-26T12:25:32Z

+1 An authority have contacted us because they are unhappy about the way that a response that they have provided is displaying. The table in the email got mangled by the system.

WilliamWDTK · 2023-07-28T16:35:29Z

Here is another example where it is difficult to read. Here, the pipe separators that you sometimes see aren't present:

https://www.whatdotheyknow.com/request/availability_of_sober_accommodat_34#incoming-1938694

The formatted email appears as below: (image not preserved)

garethrees · 2024-04-02T09:23:04Z

In the admin view for the main body part (#7999) I used simple_format, which is doing a good enough job at rendering these in most cases I've looked at (haven't looked at loads, so might want to take a slightly larger sample before adding).

e.g. https://www.whatdotheyknow.com/request/foi_request_modern_slavery_train_13#incoming-2502777 → https://www.whatdotheyknow.com/admin/attachments/4466504/edit

HelenWDTK · 2024-05-29T09:09:40Z

+1 An authority has contacted us to complain that our mangling of their response has damaged the reputation of the information governance team.

WilliamWDTK · 2024-08-18T20:03:03Z

This also affects responses from Welsh bodies, which often have bilingual signatures using tables/similar, ending up like this:

Original: (image not preserved)

Admin view: (image not preserved)

HelenWDTK · 2024-10-03T10:28:47Z

+1 This continues to be a frequent source of user support requests. It must be particularly frustrating for pro users to have to write to us again and again about multiple issues in the same batch when they are paying for the service.

garethrees · 2024-11-22T12:41:19Z

This is desirable, but unlikely to be worked on in the next 12 months so closing for now.

TomSteinberg mentioned this issue May 22, 2014

Improve document conversion #18

Closed

crowbot mentioned this issue May 22, 2014

Mangled response messages #913

Closed

garethrees added user-experience t:design framework improvement labels Nov 5, 2014

crowbot mentioned this issue Jun 4, 2015

Formatting Tabular Information in Responses #2518

Closed

RichardTaylor mentioned this issue Oct 17, 2016

Enable better formatting of the text of requests #3547

Closed

crowbot added the 0 - backlog label Sep 21, 2017

RichardTaylor mentioned this issue Mar 22, 2018

Reconsider presenting links in responses in "reference-style" #4578

Closed

garethrees added enhancement f:request-analysis labels May 29, 2018

RichardTaylor mentioned this issue Nov 1, 2019

poor rendering of some incoming emails #23

Closed

garethrees mentioned this issue Nov 1, 2021

Show table excerpts, and/or graphs from data released #6625

Closed

garethrees removed the backlog label Feb 11, 2022

garethrees removed the framework-improvement label Jul 14, 2022

RichardTaylor mentioned this issue Feb 2, 2023

Reduce Gold-Standard of site-admin work mysociety/whatdotheyknow-theme#1253

Closed

garethrees closed this as not planned Nov 22, 2024
