Citation parsing in Markdown depending on specific content leads to AuthorInText instead of NormalCitation #10584

Closed
cderv opened this issue Jan 30, 2025 · 1 comment
cderv commented Jan 30, 2025

We got a report on the Quarto side about a puzzling citation-handling issue that occurs when a Span is used before a citation with two keys.

Here is a minimal reproducible example I managed to come up with:

---
title: "Bug demo"
references:
- author: Jane
  id: some-very-long-id-26-chars
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(string of 28 characters long)]{key=value} [@some-very-long-id-26-chars; @id2].

The citation here is parsed as AuthorInText, with the brackets kept as literal strings (Str "["):

❯ pandoc -f markdown -t native --citeproc index.qmd
[ Para
    [ Span
        ( "" , [] , [ ( "key" , "value" ) ] )
        [ Str "(string"
        , Space
        , Str "of"
        , Space
        , Str "28"
        , Space
        , Str "characters"
        , Space
        , Str "long)"
        ]
    , Space
    , Str "["
    , Cite
        [ Citation
            { citationId = "some-very-long-id-26-chars"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "Jane" , Space , Str "(2019)" ]
    , Str ";"
    , Space
    , Cite
        [ Citation
            { citationId = "id2"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 2
            , citationHash = 0
            }
        ]
        [ Str "John" , Space , Str "(2020)" ]
    , Str "]."
    ]
(...truncated...)
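
To compare runs without reading the whole native dump, the citation ids and modes can be pulled out of the output text. This is only a rough sketch: the function name `citation_modes` is made up for illustration, and it assumes the native output keeps the `citationId = ...` / `citationMode = ...` field layout shown above, with the id printed before the mode.

```python
import re

# Matches the two record fields of interest in pandoc's native output.
# Group 1 is "Id" or "Mode"; group 3 is the quoted id (None for a mode).
FIELD = re.compile(r'citation(Id|Mode)\s*=\s*("([^"]*)"|\w+)')


def citation_modes(native: str) -> list[tuple[str, str]]:
    """Return (citationId, citationMode) pairs in order of appearance."""
    pairs = []
    current_id = None
    for match in FIELD.finditer(native):
        if match.group(1) == "Id":
            current_id = match.group(3)
        elif current_id is not None:
            # A mode field closes the record opened by the last id field.
            pairs.append((current_id, match.group(2)))
            current_id = None
    return pairs


# Excerpt shaped like the output above (fields trimmed for brevity).
sample = '''
        [ Citation
            { citationId = "some-very-long-id-26-chars"
            , citationPrefix = []
            , citationMode = AuthorInText
            }
        ]
'''
print(citation_modes(sample))
```

Feeding it the full output of `pandoc -f markdown -t native --citeproc index.qmd` should give one pair per citation key, which makes the AuthorInText versus NormalCitation difference easier to spot.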

In HTML this looks like:

[Image: screenshot of the rendered HTML output]

This syntax is expected to produce NormalCitation, which is what you get with a more generic example:

---
title: "Bug demo"
references:
- author: Jane
  id: id1
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(some content)]{key=value} [@id1; @id2].
❯ pandoc -f markdown -t native --citeproc index.qmd
[ Para
    [ Span
        ( "" , [] , [ ( "key" , "value" ) ] )
        [ Str "(some" , Space , Str "content)" ]
    , Space
    , Cite
        [ Citation
            { citationId = "id1"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 1
            , citationHash = 0
            }
        , Citation
            { citationId = "id2"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "(Jane"
        , Space
        , Str "2019;"
        , Space
        , Str "John"
        , Space
        , Str "2020)"
        ]
    , Str "."
    ]

What is puzzling is that:

  • Removing 1 char in the id leads to NormalCitation

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-25-char
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [(string of 28 characters long)]{key=value} [@some-very-long-id-25-char; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
    
  • Removing 1 char in the span content between (...) also leads to NormalCitation

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-26-chars
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [(string of 27 character long)]{key=value} [@some-very-long-id-26-chars; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
    
  • Adding a space after [ at the start (i.e. [ (...)]) also leads to NormalCitation

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-26-chars
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [ (string of 28 characters long)]{key=value} [@some-very-long-id-26-chars; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
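
The variants above differ by a single character each, so generating them from one template makes it easier to try further cases. The following is a sketch under assumptions: `build_doc`, `VARIANTS` and `modes_for` are illustrative names, and `modes_for` assumes `pandoc` is on the PATH and is fed the document on stdin instead of via index.qmd. The pandoc flags are the ones used in the commands above.

```python
import re
import subprocess

# YAML header shared by every variant; only the first reference id changes.
HEADER = """---
title: "Bug demo"
references:
- author: Jane
  id: {key}
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

"""

LONG_KEY = "some-very-long-id-26-chars"
LONG_SPAN = "string of 28 characters long"


def build_doc(key: str, span_text: str, lead: str = "") -> str:
    """Build one test document: a Span followed by a two-key citation."""
    body = f"[{lead}({span_text})]{{key=value}} [@{key}; @id2].\n"
    return HEADER.format(key=key) + body


VARIANTS = {
    "original report": build_doc(LONG_KEY, LONG_SPAN),
    "id shortened by one": build_doc("some-very-long-id-25-char", LONG_SPAN),
    "span shortened by one": build_doc(LONG_KEY, "string of 27 character long"),
    "space after [": build_doc(LONG_KEY, LONG_SPAN, lead=" "),
}


def modes_for(doc: str) -> list[str]:
    """Run pandoc on `doc` and return every citationMode it reports."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "native", "--citeproc"],
        input=doc, capture_output=True, text=True, check=True,
    )
    return re.findall(r"citationMode = (\w+)", result.stdout)
```

Looping over `VARIANTS.items()` and printing `modes_for(doc)` for each one reproduces the comparison in the list above. The results will depend on the pandoc version in use.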
    

I don't have any more ideas for pinpointing what triggers this. It seems like a bug, or at least something to protect against.

Using the @{..} syntax helps, as it also leads to NormalCitation:

---
title: "Bug demo"
references:
- author: Jane
  id: some-very-long-id-26-chars
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(string of 28 characters long)]{key=value} [@{some-very-long-id-26-chars}; @id2].
❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
            , citationMode = NormalCitation
            , citationMode = NormalCitation
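
For documents with many citations, the `@{..}` workaround could be applied mechanically as a stopgap. This is a rough sketch, not pandoc's own key grammar: the `KEY` pattern is a simplified approximation of a citation key, `brace_citation_keys` is a hypothetical helper, and it rewrites every `@key` it finds, including any that happen to sit in code blocks or YAML.

```python
import re

# Simplified approximation of a citation key (an assumption, not pandoc's
# parser): starts and ends with a letter, digit or underscore, with some
# punctuation allowed in between. The lookbehind skips e-mail-like text.
KEY = re.compile(
    r"(?<![A-Za-z0-9])@"
    r"([A-Za-z0-9_](?:[A-Za-z0-9_:.#$%&+?<>~/-]*[A-Za-z0-9_])?)"
)


def brace_citation_keys(markdown: str) -> str:
    """Rewrite @key as @{key}; keys that are already braced are left alone."""
    return KEY.sub(r"@{\1}", markdown)


line = "[(string of 28 characters long)]{key=value} [@some-very-long-id-26-chars; @id2]."
print(brace_citation_keys(line))
```

This braces both keys, whereas the example above only needed the first one braced.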

Original Context

For context, the original use case is something like this:
---
title: "Bug demo"
---

[(Figures 1A and S1A, Table S1)]{fg="#4D4D0C" bg="#F0F352"} [@soto_perez_crispr_cas_2019; @gregory_gut_2020; @nayfach_metagenomic_2021].

where Span is used to set attributes used in a Lua filter.

Information

This is on Windows with

❯ pandoc --version
pandoc.exe 3.6.2
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: C:\Users\chris\AppData\Roaming\pandoc
Copyright (C) 2006-2024 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
@cderv cderv added the bug label Jan 30, 2025
jgm commented Jan 30, 2025

This is highly bizarre. It's not just about the number of characters, though.
This is AuthorInText:

[(Figures 1A and S1A, Table S1)]{.bar}
[@soto_perez_crispr_cas_2019; @foo]

but this is not (replacing ( and ) with .):

[.Figures 1A and S1A, Table S1.]{.bar}
[@soto_perez_crispr_cas_2019; @foo]

@jgm jgm closed this as completed in 8e18a81 Feb 1, 2025