Bug with standard email redaction #6400

Closed
RichardTaylor opened this issue Jul 29, 2021 · 2 comments
Labels
bug (Breaks expected functionality), f:redaction, stale (Issues with no activity for 12 months), x:uk

Comments

@RichardTaylor

Email addresses in request correspondence should be replaced with a placeholder, e.g.:

[email address]

(The email addresses that requests are sent to are treated specially and identified as the public body's request address.)

At the moment on WhatDoTheyKnow.com, even on non /cy/ (Welsh) threads, we are in some circumstances seeing email addresses replaced with

[cyfeiriad ebost]

which is the Welsh for email address.

No link to help text is offered either.

For example:

https://www.whatdotheyknow.com/request/notification_of_spa_requests#incoming-1760631
https://www.whatdotheyknow.com/request/humberside_catalytic_converter_t#incoming-1627530
https://www.whatdotheyknow.com/request/data_collecting_storing_sharing_6#incoming-1742773

Looking at the examples, this appears to be linked to the provision of links in reference style (#4578).

This issue also appears to be affecting attachments. For example, search Google for:

"[cyfeiriad ebost] " site:whatdotheyknow.com -attach

and

"[cyfeiriad ebost] " site:whatdotheyknow.com

for more examples

I suspect this is not just a WhatDoTheyKnow theme issue. Do move this issue if it is.

Note: Google hit counts suggest "cyfeiriad e-bost" is more commonly used than "cyfeiriad ebost".

@RichardTaylor RichardTaylor added the bug Breaks expected functionality label Jul 29, 2021
@garethrees
Member

> we are in some circumstances seeing email addresses replaced with [cyfeiriad ebost]

Taking a quick look at one of the requests, I can see that the Welsh phrase has been cached in the database:

im.cached_main_body_text_folded
#=> "...snip... Visible links\n 1. mailto:[cyfeiriad ebost]\n 2. http://www.gov.uk/dwp\n\n\n"

I assume the page has been visited via the CY locale first, and as such the Welsh replacement phrase was used when generating the cache.
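The suspected mechanism can be sketched in plain Ruby. This is only an illustration under assumptions, not Alaveteli's code: `PLACEHOLDERS`, `redact_emails` and `FakeMessage` are hypothetical names, and the cache is modelled as a single memoised string that does not record which locale produced it.

```ruby
# Hypothetical stand-ins; not Alaveteli's real classes or translation lookup.
PLACEHOLDERS = {
  en: '[email address]',
  cy: '[cyfeiriad ebost]'
}.freeze

# Deliberately simple email pattern, good enough for the illustration.
EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# Replace every email address with the placeholder for the given locale.
def redact_emails(text, locale)
  text.gsub(EMAIL_PATTERN, PLACEHOLDERS.fetch(locale))
end

class FakeMessage
  def initialize(raw_text)
    @raw_text = raw_text
    @cached_text = nil
  end

  # The cache is filled on first access, using whichever locale is active
  # at that moment, and is then reused for every later locale.
  def cached_body(locale)
    @cached_text ||= redact_emails(@raw_text, locale)
  end

  # Forget the cached text so the next access regenerates it.
  def clear_cache!
    @cached_text = nil
  end
end

msg = FakeMessage.new("Visible links\n 1. mailto:foi@example.gov.uk\n")
welsh_first = msg.cached_body(:cy)   # first visit happens under CY
english_later = msg.cached_body(:en) # later EN visit reuses the CY text
```

If the real cache behaves like this, whichever locale hits an uncached message first decides the placeholder that every later visitor sees, which would fit how rare the affected rows are.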

It seems this is a very rare edge case…

SELECT COUNT(*) FROM incoming_messages WHERE cached_main_body_text_folded LIKE '%mailto:[cyfeiriad ebost]%';
 count
-------
   131 -- out of 1821178 incoming_messages total!

…and first started happening in 2014 (though I don't know whether something changed in 2014 to cause this).

SELECT id,created_at FROM incoming_messages WHERE cached_main_body_text_folded LIKE '%mailto:[cyfeiriad ebost]%' ORDER BY created_at ASC LIMIT 10;
   id    |         created_at
---------+----------------------------
  484508 | 2014-02-19 14:16:51.818949
  516847 | 2014-05-15 06:52:46.575731
  622506 | 2015-02-26 10:56:57.008951
  860266 | 2016-08-30 11:10:43.623673
  888699 | 2016-10-31 10:51:58.185605
  895408 | 2016-11-15 13:50:55.315092
 1035276 | 2017-09-11 10:44:35.511878
 1035281 | 2017-09-11 10:48:21.632624
 1049717 | 2017-10-09 09:29:49.574985
 1049749 | 2017-10-09 09:45:22.540962

> No link to help text is offered either

It's not clickable because, when we visit the request in EN, the function at https://github.com/mysociety/alaveteli/blob/0.39.1.1/app/models/incoming_message.rb#L620 matches against the EN phrase ("email address"), which isn't in the cached content. The replacement is clickable when the request is viewed in Welsh because the function then matches against the CY phrase, which does exist in the content.
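A rough sketch of why the link goes missing, assuming the linking step does a literal search for the current locale's placeholder. `PLACEHOLDER_BY_LOCALE`, `HELP_PATH` and `link_placeholder` are hypothetical names for illustration; the real logic is in the function linked above and is not reproduced here.

```ruby
# Hypothetical stand-ins for the per-locale placeholder phrases.
PLACEHOLDER_BY_LOCALE = {
  en: '[email address]',
  cy: '[cyfeiriad ebost]'
}.freeze

HELP_PATH = '/help/example' # assumed path, for illustration only

# Wrap the current locale's placeholder in a help link. Placeholders written
# in any other language are left untouched because they never match.
def link_placeholder(cached_text, locale)
  phrase = PLACEHOLDER_BY_LOCALE.fetch(locale)
  # A String pattern is matched literally, so the brackets are not a regex class.
  cached_text.gsub(phrase) { |match| %(<a href="#{HELP_PATH}">#{match}</a>) }
end

cached = 'mailto:[cyfeiriad ebost]'
viewed_in_en = link_placeholder(cached, :en) # no match, text unchanged
viewed_in_cy = link_placeholder(cached, :cy) # match, placeholder is linked
```

Under that assumption, the Welsh placeholder passes straight through when the page is rendered in EN, so no help link is produced.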


We could regenerate the caches for each of these incoming messages. I tested this theory on one of the other requests, which, now that I've cleared the cache (msg.clear_in_database_caches!) and re-visited the page in EN, has [email address] cached.
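A minimal sketch of what clearing the affected caches in bulk might look like. Only `clear_in_database_caches!` and `cached_main_body_text_folded` come from the console session above; `clear_welsh_caches`, `WELSH_MARKER` and `StubMessage` are hypothetical, with the stub standing in for the real model so the example runs outside the app.

```ruby
WELSH_MARKER = 'mailto:[cyfeiriad ebost]'

# Clear the cache of every message whose cached text contains the Welsh
# placeholder. `messages` is any enumerable of objects responding to
# `cached_main_body_text_folded` and `clear_in_database_caches!`.
# Returns the number of messages that were cleared.
def clear_welsh_caches(messages)
  affected = messages.select do |msg|
    msg.cached_main_body_text_folded.to_s.include?(WELSH_MARKER)
  end
  affected.each(&:clear_in_database_caches!)
  affected.size
end

# Stand-in for the real model; here clearing the cache simply nils the text.
StubMessage = Struct.new(:id, :cached_main_body_text_folded) do
  def clear_in_database_caches!
    self.cached_main_body_text_folded = nil
  end
end

messages = [
  StubMessage.new(1, "Visible links\n 1. mailto:[cyfeiriad ebost]\n"),
  StubMessage.new(2, "Visible links\n 1. mailto:[email address]\n")
]
cleared = clear_welsh_caches(messages)
```

In a Rails console the same helper could presumably be fed the rows matched by the LIKE query above instead of stubs, though that is untested here. Each cleared message would then be re-cached in whichever locale next views it, so the problem could recur unless the first visit is in EN.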

@HelenWDTK HelenWDTK added the stale Issues with no activity for 12 months label Nov 19, 2024
@HelenWDTK
Contributor

This issue has been automatically closed due to a lack of discussion or resolution for over 12 months.
Should we decide to revisit this issue in the future, it can be reopened.

@HelenWDTK HelenWDTK closed this as not planned Nov 19, 2024