Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic Error in SQL query invalidates editorial statistics #9132

Closed
ronste opened this issue Jul 4, 2023 · 16 comments
Closed

Logic Error in SQL query invalidates editorial statistics #9132

ronste opened this issue Jul 4, 2023 · 16 comments
Assignees
Labels
Bug:1:Low A bug that does not have a severe consequence or affects a small number of users.
Milestone

Comments

@ronste
Copy link

ronste commented Jul 4, 2023

Describe the bug
When compiling the number of published submissions for all our journals I recognized differences in the reported OJS stats (editorial stats) as compared to the actual number of publications listed in the frontend. For some journals even zero published submissions are shown.

Some of these differences can be accounted for by the known issues with missing publication dates and imported articles.

However, sticking to one particular case with no imported articles at all, 37 articles listed in the frontend and 0 published submissions reported in editorial stats over the last 2 years, I debugged the code and came across the followling lines intended to exclude imported submissions:

// Exclude submissions when the date_submitted is later
// than the first date_published. This prevents imported
// submissions from being counted in editorial stats.
$q->where(function($q) {
$q->whereNull('pi.date_published')
->orWhere('s.date_submitted', '<=', Capsule::raw('pi.date_published'));
});

In line 369 all publications with no publication date are selected, i.e. included in the response instead of excluded as was intended. Changeing line 369 to:

$q->whereNotNull('pi.date_published')

generates the correct number of 37 published articles in the editorial stats backend.

To Reproduce
Steps to reproduce the behavior:

  1. Count the number of publsihed submission in the frontend (maybe for a certian period to make it easier)
  2. Compare to the number of published submissions in the editorial stats in the backend

What application are you using?
OJS version 3.3.0-14

Additional Information
Original discussion of this code: #6011


PRs:

Fix1:

@asmecher
Copy link
Member

@bozana, could you have a look at this one?

@bozana bozana added this to the 3.3.0-16 milestone Jul 11, 2023
@bozana bozana added the Bug:1:Low A bug that does not have a severe consequence or affects a small number of users. label Jul 11, 2023
@bozana
Copy link
Collaborator

bozana commented Jul 11, 2023

Hi @ronste, good catch!
I think that part of the code (whereNull) is correct -- it is an OR condition -- else, if using whereNotNull, the other editorial stats (e.g. number of received submissions) would be wrong because those not published articles would not be considered.

I think the problem is in this line:
->orWhere('s.date_submitted', '<=', Capsule::raw('pi.date_published'));
and I think it should be:
->orWhere(Capsule::raw('DATE(s.date_submitted)'), '<=', Capsule::raw('pi.date_published'));
The date_submitted is a datetime, e.g. 2023-06-27 18:31:12, and date_published is a date, e.g. 2023-06-27. Thus, if both are the same dates, on the same day, the condition will never be true.

Could you please check if those missing articles are maybe submitted and published on the same day i.e. how are the submitted and published date of those articles? Could you maybe also test this change and investigate all the stats counts (not only of the published articles) then?

@ronste
Copy link
Author

ronste commented Jul 11, 2023

Hi @bozana ,

when I try your suggestion, i.e.

$q->where(function($q) {
			$q->whereNull('pi.date_published')
			->orWhere(Capsule::raw('DATE(s.date_submitted)'), '<=', Capsule::raw('pi.date_published'));

I get:

  • 25 published submission
  • 37 other submission
  • 37 imported submissions

instead of 37 published and 0 imported:

grafik

And I just did another short test. I created a new empty journal (actually a fresh installation) and created and published a single submission. With the original code the editorial activity shows:

grafik

With the code you suggested I get:

grafik

That is certainly better. But the submissions are still counted as imported.

If I apply your fix to replace 's.date_submitted'with Capsule::raw('DATE(s.date_submitted)') to the function countImported() I indeed get a count of zero imported submissions. In the same file I can see a number of other occurances of 's.date_submitted' where this might kick in.

However, i still can't figure out why I only get 25 published submissions for my joural instead of 37. When I run the query in the database, i.e.:

select * from `submissions` as `s` left join `publications` as `pi` on
  `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where
    `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3
      order by `pi2`.`date_published` asc limit 1) where `s`.`context_id` in (1) and
        `s`.`submission_progress` = 0 and
      (`pi`.`date_published` is null or DATE(s.date_submitted) >= pi.date_published) and
        `pi`.`date_published` > '2021-07-05';

I get the correct number of 37 rows of publications.

@bozana
Copy link
Collaborator

bozana commented Jul 13, 2023

yes, I would then need to apply the changes to those other places in the code 👍
Hmmm... let me think about that 25 vs. 37 published....

@bozana
Copy link
Collaborator

bozana commented Jul 13, 2023

what do you get wit this sql:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or `s`.`date_submitted` <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= 2021-07-11 and `p`.`date_published` <= 2023-07-10

@ronste
Copy link
Author

ronste commented Jul 13, 2023

what do you get wit this sql:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or `s`.`date_submitted` <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= 2021-07-11 and `p`.`date_published` <= 2023-07-10

I get an empty set.

And I get 4 if I ommit the date range, 8 if I ommit s.status = 3 and 109 if I ommit (pi.date_published is null or s.date_submitted <= pi.date_published).

By the way, this is the journal used for this test: https://dedo.ub.uni-siegen.de/index.php/de_do.
We are talking about the issues 2021 and 2022 which together have 37 publications (plus two issue pdfs) on the web site.

@bozana
Copy link
Collaborator

bozana commented Jul 13, 2023

Ah sorry the DATE around was missing, thus:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or DATE(`s`.`date_submitted`) <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= 2021-07-11 and `p`.`date_published` <= 2023-07-10

@ronste
Copy link
Author

ronste commented Jul 13, 2023

Ah sorry the DATE around was missing, thus:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or DATE(`s`.`date_submitted`) <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= 2021-07-11 and `p`.`date_published` <= 2023-07-10

Still an empty set unfortunately. Maybe we can have a call to try these things? Can you send me your number by e-mail?

@bozana
Copy link
Collaborator

bozana commented Jul 13, 2023

start and end date maybe need ' around it:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or DATE(`s`.`date_submitted`) <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= '2021-07-11' and `p`.`date_published` <= '2023-07-10'

@ronste
Copy link
Author

ronste commented Jul 13, 2023

start and end date maybe need ' around it:

select * from `submissions` as `s` 
left join `publications` as `pi` on `pi`.`publication_id` = (select `pi2`.`publication_id` from `publications` as `pi2` where `pi2`.`submission_id` = `s`.`submission_id` and `pi2`.`status` = 3 order by `pi2`.`date_published` asc limit 1) 
left join `publications` as `p` on `p`.`publication_id` = (select `p2`.`publication_id` from `publications` as `p2` where `p2`.`submission_id` = s.submission_id and `p2`.`status` = 3 order by `p2`.`date_published` asc limit 1) 
where `s`.`context_id` in (1) and `s`.`submission_progress` = 0 and (`pi`.`date_published` is null or DATE(`s`.`date_submitted`) <= pi.date_published) and `s`.`status` = 3 and `p`.`date_published` >= '2021-07-11' and `p`.`date_published` <= '2023-07-10'

Now its 25

@bozana
Copy link
Collaborator

bozana commented Jul 13, 2023

This is the SQL executed to get published submissions. How do other 12 submissions from you not fit into it... ? 🤔
EDIT: do all those published submissions have only one version/publication?

@ronste
Copy link
Author

ronste commented Jul 13, 2023

This is the SQL executed to get published submissions. How do other 12 submissions from you not fit into it... ? 🤔 EDIT: do all those published submissions have only one version/publication?

Yes, the database in total contains 140 submission IDs and 140 publication IDs.

@bozana
Copy link
Collaborator

bozana commented Jul 14, 2023

@ronste, I've just tested the case when an article is submitted, assigned to an already published issue, and then published. The article date published is not the issue's date published, but the current date. I have tested it using the latest release i.e. the current stable_3_3_0 branch. Thus, this is not your case. Maybe QuickSubmitt plugin behaves differently, maybe you can double check that i.e. if your users have used that plugin and entered the date published that way...
However I will provide the SQL fix here, that we agreed solves the wrong numbers on the editorial statistics page.

bozana added a commit to bozana/pkp-lib that referenced this issue Jul 14, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 14, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Jul 14, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 14, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Jul 18, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 18, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Jul 18, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 18, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Jul 18, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 18, 2023
bozana added a commit that referenced this issue Jul 19, 2023
#9132 fix SQL datetime and date comparison for editorial s…
bozana added a commit to pkp/ojs that referenced this issue Jul 19, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_4_0##
@ronste
Copy link
Author

ronste commented Jul 21, 2023

@ronste, I've just tested the case when an article is submitted, assigned to an already published issue, and then published. The article date published is not the issue's date published, but the current date. I have tested it using the latest release i.e. the current stable_3_3_0 branch. Thus, this is not your case. Maybe QuickSubmitt plugin behaves differently, maybe you can double check that i.e. if your users have used that plugin and entered the date published that way... However I will provide the SQL fix here, that we agreed solves the wrong numbers on the editorial statistics page.

Hi @bozana ,

thanks for the discuission of the issue and the fixes! I will have a look at our specific case(s) when I'm back from vacation. This aprticular journal started with OJS 3 but other have an OJS 2 history, so there might be different reasons for the remaining issue.

@ronste ronste closed this as completed Jul 21, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Jul 29, 2023
bozana added a commit to bozana/ojs that referenced this issue Jul 29, 2023
@asmecher
Copy link
Member

asmecher commented Aug 2, 2023

@bozana, could you check out this comment?

@asmecher asmecher reopened this Aug 2, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Aug 9, 2023
bozana added a commit to bozana/ojs that referenced this issue Aug 9, 2023
bozana added a commit to bozana/omp that referenced this issue Aug 9, 2023
bozana added a commit to bozana/ops that referenced this issue Aug 9, 2023
bozana added a commit to bozana/pkp-lib that referenced this issue Aug 9, 2023
bozana added a commit to bozana/ojs that referenced this issue Aug 9, 2023
bozana added a commit that referenced this issue Aug 9, 2023
#9132 fix SQL datetime and date comparison for editorial s…
bozana added a commit to pkp/ojs that referenced this issue Aug 9, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132##
bozana added a commit that referenced this issue Aug 9, 2023
bozana added a commit to pkp/ojs that referenced this issue Aug 9, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_3_0##
bozana added a commit to pkp/omp that referenced this issue Aug 9, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_3_0##
bozana added a commit to pkp/ops that referenced this issue Aug 9, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_3_0##
bozana added a commit to bozana/pkp-lib that referenced this issue Aug 10, 2023
bozana added a commit to bozana/ojs that referenced this issue Aug 10, 2023
bozana added a commit to bozana/omp that referenced this issue Aug 10, 2023
bozana added a commit to bozana/ops that referenced this issue Aug 10, 2023
bozana added a commit that referenced this issue Aug 11, 2023
bozana added a commit to pkp/ojs that referenced this issue Aug 11, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_4_0##
bozana added a commit to pkp/omp that referenced this issue Aug 11, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_4_0##
bozana added a commit to pkp/ops that referenced this issue Aug 11, 2023
pkp/pkp-lib#9132 submodule update ##bozana/9132-3_4_0##
@bozana
Copy link
Collaborator

bozana commented Aug 11, 2023

Fixes merged, thus closing...

@bozana bozana closed this as completed Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug:1:Low A bug that does not have a severe consequence or affects a small number of users.
Projects
None yet
Development

No branches or pull requests

3 participants