Citation order within brackets affects citation grouping #11971

Open
apcamargo opened this issue Jan 28, 2025 · 13 comments
Labels
bug Something isn't working citations Issues with citations pandoc

Comments

@apcamargo

Bug description

This issue is triggered under very specific conditions. I tried to reduce the problem as much as possible, and a reproducible example is available in this repository: https://github.com/apcamargo/quarto-citations-bug

In certain cases, multiple citations within square brackets fail to group correctly in the resulting HTML (see the first line of the screenshot below). Through testing, I identified that this issue is triggered, in this specific example, when the following conditions are met:

  1. The text preceding the citations is highlighted using quarto-highlight-text.
  2. The text preceding the citations is enclosed in parentheses ().
  3. The word Figures is used instead of Figure.
  4. The citations within the square brackets are in a specific order. In this case: [@soto_perez_crispr_cas_2019; @gregory_gut_2020; @nayfach_metagenomic_2021] or [@soto_perez_crispr_cas_2019; @nayfach_metagenomic_2021; @gregory_gut_2020].
Image

Steps to reproduce

git clone git@github.com:apcamargo/quarto-citations-bug.git
cd quarto-citations-bug
quarto preview

Expected behavior

Citations within square brackets should be grouped regardless of the order of the citations or the text preceding the opening of the brackets.

Actual behavior

The text preceding the citations as well as the order of citations within the square brackets can prevent citations from being grouped.

Your environment

  • IDE: Positron Version: 2025.01.0 (Universal) build 39
  • OS: macOS 15.2 (24C101)

Quarto check output

Quarto 1.6.40
[✓] Checking environment information...
      Quarto cache location: /Users/APCamargo-M55/Library/Caches/quarto
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.4.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.46.3: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.6.40
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Tex:  (not detected)

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.9.20 (Conda)
      Path: /Users/APCamargo-M55/.mambaforge/bin/python
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with conda install jupyter

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/
@apcamargo apcamargo added the bug Something isn't working label Jan 28, 2025
@mcanouil
Collaborator

mcanouil commented Jan 28, 2025

Without testing, I would say this is likely due to my extension (i.e., on me).

Just to be sure: only the first line is unexpected, right?

@mcanouil
Collaborator

mcanouil commented Jan 29, 2025

Nope, the extension has nothing to do with the issue.

references.bib
@article{soto_perez_crispr_cas_2019,
  title        = {{CRISPR}-{Cas} {System} of a {Prevalent} {Human} {Gut} {Bacterium} {Reveals} {Hyper}-targeting against {Phages} in a {Human} {Virome} {Catalog}},
  author       = {Soto-Perez, Paola and Bisanz, Jordan E. and Berry, Joel D. and Lam, Kathy N. and Bondy-Denomy, Joseph and Turnbaugh, Peter J.},
  year         = 2019,
  month        = sep,
  journal      = {Cell Host \& Microbe},
  volume       = 26,
  number       = 3,
  pages        = {325--335.e5},
  doi          = {10.1016/j.chom.2019.08.008},
  issn         = 19313128,
  url          = {https://linkinghub.elsevier.com/retrieve/pii/S1931312819304172},
  language     = {en}
}
@article{gregory_gut_2020,
  title        = {The {Gut} {Virome} {Database} {Reveals} {Age}-{Dependent} {Patterns} of {Virome} {Diversity} in the {Human} {Gut}},
  author       = {Gregory, Ann C. and Zablocki, Olivier and Zayed, Ahmed A. and Howell, Allison and Bolduc, Benjamin and Sullivan, Matthew B.},
  year         = 2020,
  month        = nov,
  journal      = {Cell Host \& Microbe},
  volume       = 28,
  number       = 5,
  pages        = {724--740.e8},
  doi          = {10.1016/j.chom.2020.08.003},
  issn         = 19313128,
  url          = {https://linkinghub.elsevier.com/retrieve/pii/S193131282030456X},
  language     = {en}
}
@article{nayfach_metagenomic_2021,
  title        = {Metagenomic compendium of 189,680 {DNA} viruses from the human gut microbiome},
  author       = {Nayfach, Stephen and Páez-Espino, David and Call, Lee and Low, Soo Jen and Sberro, Hila and Ivanova, Natalia N. and Proal, Amy D. and Fischbach, Michael A. and Bhatt, Ami S. and Hugenholtz, Philip and Kyrpides, Nikos C.},
  year         = 2021,
  month        = jun,
  journal      = {Nature Microbiology},
  volume       = 6,
  number       = 7,
  pages        = {960--970},
  doi          = {10.1038/s41564-021-00928-6},
  issn         = {2058-5276},
  url          = {https://www.nature.com/articles/s41564-021-00928-6},
  language     = {en}
}
Quarto
---
title: "Bug demo"
bibliography: references.bib
format: html
---

[(Figures 1A and S1A, Table S1)]{} [@soto_perez_crispr_cas_2019; @gregory_gut_2020; @nayfach_metagenomic_2021].

[(Figures 1A and S1A, Table S1)]{} [
  @soto_perez_crispr_cas_2019;
  @gregory_gut_2020;
  @nayfach_metagenomic_2021
].
Image

@mcanouil
Collaborator

mcanouil commented Jan 29, 2025

If you shorten the IDs of the references ...

---
title: "Bug demo"
bibliography: references.bib
format: html
---

[(Figures 1A and S1A, Table S1)]{} [@soto_2019; @gregory_2020; @nayfach_2021].
Image

That's very weird: it's not really about the length, but it is ...

Image

Here is a fully self-contained example:

---
title: "Bug demo"
format: html
references:
  - id: soto-perez-crispr-cas-2019
    type: article-journal
    title: "Article 1a"
    author:
      - "John Doe"
      - "Jane Doe"
    issued: 2019
  - id: soto-perez-crispr-cas-201
    type: article-journal
    title: "Article 1b"
    author:
      - "John Doe"
      - "Jane Doe"
    issued: 2019
  - id: soto-perezH-crispr-cas-2019
    type: article-journal
    title: "Article 1c"
    author:
      - "John Doe"
      - "Jane Doe"
    issued: 2019
  - id: gregory_2020
    type: article-journal
    title: "Article 2"
    author: "John Doe"
    issued: 2020
  - id: nayfach_2021
    type: article-journal
    title: "Article 3"
    author: "Jane Doe"
    issued: 2021
---

- Issue

[(Figures 1A and S1A, Table S1)]{} [@soto-perez-crispr-cas-2019; @gregory_2020; @nayfach_2021].

- No span before the citation

(Figures 1A and S1A, Table S1) [@soto-perez-crispr-cas-2019; @gregory_2020; @nayfach_2021].

- ID a bit shorter

[(Figures 1A and S1A, Table S1)]{} [@soto-perez-crispr-cas-201; @gregory_2020; @nayfach_2021].

- Extra space before semicolon

[(Figures 1A and S1A, Table S1)]{} [@soto-perez-crispr-cas-2019 ; @gregory_2020; @nayfach_2021].

- One extra character in the ID

[(Figures 1A and S1A, Table S1)]{} [@soto-perezH-crispr-cas-2019; @gregory_2020; @nayfach_2021].

@apcamargo
Author

It looks like this is somehow triggered by the words [T|t]ables and [F|f]igures.

Image

@mcanouil
Collaborator

To me, that's a parsing issue somewhere:

[ (Figures 1A and S1A, Table S1)]{} [@soto-perez-crispr-cas-2019; @gregory_2020; @nayfach_2021].

will produce the expected output. Note the extra space [ (.

Indeed, Figures/Tables seem to trigger some feature somewhere when inside [(...)].

@apcamargo
Author

I was wondering if this could be an issue with Pandoc, but apparently it also affects other outputs. See the Typst PDF below.

Image

@mcanouil
Collaborator

mcanouil commented Jan 29, 2025

The issue is not in the output; it happens earlier. Use format: native to see it.
Citations are handled by Pandoc citeproc.

@mcanouil mcanouil added crossref citations Issues with citations labels Jan 29, 2025
@cderv
Collaborator

cderv commented Jan 30, 2025

This is quite a puzzling issue, but I believe it is related to Pandoc parsing.

I can reproduce it using this simple doc (with this bib file: https://raw.githubusercontent.com/apcamargo/quarto-citations-bug/refs/heads/main/references.bib):

---
title: "Bug demo"
bibliography: "references.bib"
---

[(xxxxxxxxxxxxxxxxxxxxxxxxxxxx)]{key=value} [@soto_perez_crispr_cas_2019; @gregory_gut_2020].

Note the brackets left in the output:

Image

Now remove one x.

No more brackets, and the parentheses are correct:

---
title: "Bug demo"
bibliography: "references.bib"
---

[(xxxxxxxxxxxxxxxxxxxxxxxxxxx)]{key=value} [@soto_perez_crispr_cas_2019; @gregory_gut_2020].

Image

So this is something to investigate and report on the Pandoc side.

@cderv cderv added pandoc and removed crossref labels Jan 30, 2025
@cderv cderv self-assigned this Jan 30, 2025
@cderv cderv added external and removed external labels Jan 30, 2025
@cderv
Collaborator

cderv commented Jan 30, 2025

There is definitely a difference in the native representation:

diff --git "a/.\\test2.native" "b/.\\test.native"
index 6087bf1..04ce27f 100644
--- "a/.\\test2.native"
+++ "b/.\\test.native"
@@ -1,48 +1,43 @@
 [ Para
     [ Span
         ( "" , [] , [ ( "key" , "value" ) ] )
-        [ Str "(xxxxxxxxxxxxxxxxxxxxxxxxxxxx)" ]
+        [ Str "(xxxxxxxxxxxxxxxxxxxxxxxxxxx)" ]
     , Space
-    , Str "["
     , Cite
         [ Citation
             { citationId = "soto_perez_crispr_cas_2019"
             , citationPrefix = []
             , citationSuffix = []
-            , citationMode = AuthorInText
+            , citationMode = NormalCitation
             , citationNoteNum = 1
             , citationHash = 0
             }
-        ]
-        [ Str "Soto-Perez"
-        , Space
-        , Str "et"
-        , Space
-        , Str "al."
-        , Space
-        , Str "(2019)"
-        ]
-    , Str ";"
-    , Space
-    , Cite
-        [ Citation
+        , Citation
             { citationId = "gregory_gut_2020"
             , citationPrefix = []
             , citationSuffix = []
-            , citationMode = AuthorInText
-            , citationNoteNum = 2
+            , citationMode = NormalCitation
+            , citationNoteNum = 1
             , citationHash = 0
             }
         ]
-        [ Str "Gregory"
+        [ Str "(Soto-Perez"
+        , Space
+        , Str "et"
+        , Space
+        , Str "al."
+        , Space
+        , Str "2019;"
+        , Space
+        , Str "Gregory"
         , Space
         , Str "et"
         , Space
         , Str "al."
         , Space
-        , Str "(2020)"
+        , Str "2020)"
         ]
-    , Str "]."
+    , Str "."
     ]
 , Div
     ( "refs"

So definitely something in the pandoc parser 🤔
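To check the grouping without diffing native output by hand, a small script can walk Pandoc's JSON AST (`pandoc -f markdown -t json`) and list each `Cite` element with its citation IDs and modes. This is a hypothetical helper, not part of Quarto or Pandoc. It assumes `pandoc` is on `PATH` and, as the diff above suggests, that the split already happens in the markdown reader, so `--citeproc` is not run.

```python
import json
import subprocess
import sys


def find_cites(node):
    """Yield every Cite element found anywhere inside a Pandoc JSON AST fragment."""
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            yield node
        for value in node.values():
            yield from find_cites(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_cites(item)


def summarize_cites(ast: dict) -> list:
    """Return one (citation IDs, citation modes) pair per Cite in the document body."""
    summary = []
    for cite in find_cites(ast["blocks"]):
        # A Cite's contents are [list of Citation objects, rendered inlines].
        citations, _rendered = cite["c"]
        ids = [c["citationId"] for c in citations]
        modes = [c["citationMode"]["t"] for c in citations]
        summary.append((ids, modes))
    return summary


def parse(markdown: str) -> dict:
    """Run only Pandoc's markdown reader (no --citeproc) and return the JSON AST."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "json"],
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Usage (hypothetical file name): python cites.py bug.qmd
    with open(sys.argv[1], encoding="utf-8") as fh:
        for ids, modes in summarize_cites(parse(fh.read())):
            print(ids, modes)
```

Going by the native diff, the failing input should show two single-ID entries with mode `AuthorInText`, while the working input should show one entry holding both IDs with mode `NormalCitation`.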

@cderv
Collaborator

cderv commented Jan 30, 2025

Note the extra space [ (.

Adding a space after [ helps with this.

This gives expected output:

---
title: "Bug demo"
bibliography: "references.bib"
---

[ (xxxxxxxxxxxxxxxxxxxxxxxxxxxx) ]{key=value} [@soto_perez_crispr_cas_2019; @gregory_gut_2020].

@cderv
Collaborator

cderv commented Jan 30, 2025

A more minimal reprex:

---
title: "Bug demo"
references:
- author: Jane
  id: soto_perez_crispr_cas_2019
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: gregory_gut_2020
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(xxxxxxxxxxxxxxxxxxxxxxxxxxxx)]{key=value} [@soto_perez_crispr_cas_2019; @gregory_gut_2020].

Using {..} around the keys helps:

[@{soto_perez_crispr_cas_2019}; @{gregory_gut_2020}].

This is recommended in the Pandoc manual: https://pandoc.org/MANUAL.html#extension-citations

Unless a citation key starts with a letter, digit, or _, and contains only alphanumerics and single internal punctuation characters (:.#$%&-+?<>~/), it must be surrounded by curly braces, which are not considered part of the key.

@apcamargo you could try adding those curly braces in your project to see if this helps.
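The quoted rule can be read as a small predicate, and the suggested workaround as a text rewrite. Both functions below are hypothetical sketches, not Pandoc code. `needs_braces` encodes the manual text with one labeled assumption: `_` is treated like an alphanumeric character. `brace_keys` wraps only the keys you pass in (for example, the IDs from `references.bib`), so Quarto cross-references and e-mail addresses are left alone. Under this reading the keys in this thread would not strictly need braces, which fits the observation that braces act as a workaround here rather than a requirement.

```python
import re

# Internal punctuation allowed in bare keys, copied from the manual text quoted above.
INTERNAL_PUNCT = ":.#$%&-+?<>~/"


def _is_regular(c: str) -> bool:
    # Assumption: '_' is treated like an alphanumeric character, since the
    # manual allows it as a first character. Check Pandoc itself if it matters.
    return c.isalnum() or c == "_"


def needs_braces(key: str) -> bool:
    """True if `key` cannot be written as a bare @key under the quoted rule."""
    if not key or not _is_regular(key[0]):
        return True
    for i, c in enumerate(key):
        if _is_regular(c):
            continue
        # Punctuation must be single and internal: followed by a regular character.
        if c in INTERNAL_PUNCT and i + 1 < len(key) and _is_regular(key[i + 1]):
            continue
        return True
    return False


def brace_keys(text: str, keys: set) -> str:
    """Rewrite bare @key references to @{key}, only for the given keys."""
    if not keys:
        return text
    # Try longer keys first; the lookahead below also rejects partial matches.
    alternatives = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    pattern = re.compile(
        r"(?<!\w)@("  # '@' not preceded by a word character (skips e-mail addresses)
        + alternatives
        + r")(?!\w|[:.#$%&\-+?<>~/]\w)"  # not followed by anything that extends the key
    )
    return pattern.sub(lambda m: "@{" + m.group(1) + "}", text)
```

For example, `brace_keys("[@gregory_gut_2020; @nayfach_metagenomic_2021].", {"gregory_gut_2020", "nayfach_metagenomic_2021"})` returns `[@{gregory_gut_2020}; @{nayfach_metagenomic_2021}].`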

@cderv
Collaborator

cderv commented Jan 30, 2025

I wasn't sure whether the IDs were related, so I reduced the example again.

This reproduces the issue:

---
title: "Bug demo"
references:
- author: Jane
  id: aaaaaaaaaaaaaaaaaaaaaaaaaa
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(xxxxxxxxxxxxxxxxxxxxxxxxxxxx)]{key=value} [@aaaaaaaaaaaaaaaaaaaaaaaaaa; @id2].
  • Removing one x removes the issue.
  • Removing one a removes the issue too.
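Since removing a single x or a single a flips the result, a sweep over both lengths might help locate the threshold before digging into Pandoc's Haskell code. This is a hypothetical sketch: it rebuilds the minimal document above with a configurable number of x characters and key length, runs `pandoc -f markdown -t json` (assumed to be on `PATH`), and treats the citations as grouped when the body contains exactly one `Cite` element. The scanned ranges are arbitrary guesses, not values taken from the thread.

```python
import json
import subprocess


def reprex(n_x: int, n_a: int) -> str:
    """Rebuild the minimal document with n_x 'x' characters and an n_a-character key."""
    xs = "x" * n_x
    key = "a" * n_a
    return f"""---
title: "Bug demo"
references:
- author: Jane
  id: {key}
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[({xs})]{{key=value}} [@{key}; @id2].
"""


def count_cites(node) -> int:
    """Count Cite elements anywhere inside a Pandoc JSON AST fragment."""
    if isinstance(node, dict):
        own = 1 if node.get("t") == "Cite" else 0
        return own + sum(count_cites(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_cites(v) for v in node)
    return 0


def is_grouped(n_x: int, n_a: int) -> bool:
    """True if both keys end up in a single Cite (requires pandoc on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "json"],
        input=reprex(n_x, n_a),
        capture_output=True,
        text=True,
        check=True,
    )
    return count_cites(json.loads(result.stdout)["blocks"]) == 1


if __name__ == "__main__":
    # Print a small grid: rows are key lengths, columns are numbers of 'x'.
    xs_range = range(24, 33)
    print("n_a \\ n_x", *xs_range)
    for n_a in range(22, 31):
        row = ["ok" if is_grouped(n_x, n_a) else "SPLIT" for n_x in xs_range]
        print(n_a, *row)
```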

Adding {..} around the key helps:

---
title: "Bug demo"
references:
- author: Jane
  id: aaaaaaaaaaaaaaaaaaaaaaaaa
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(xxxxxxxxxxxxxxxxxxxxxxxxxxxx)]{key=value} [@{aaaaaaaaaaaaaaaaaaaaaaaaa} ; @id2].

I think we now need to look into Pandoc's Haskell code to understand what the problem could be.

@cderv
Collaborator

cderv commented Jan 30, 2025

This has been reported to Pandoc; we'll see.

Thanks for the report here @apcamargo !

@cderv cderv added this to the Future milestone Jan 30, 2025

3 participants