Add Pandoc Meta data to Context #643
Comments
+1. In my case, I have a Pandoc filter that counts the number of words in each blog post and derives the time-to-read from that count. This information is then stored in the Pandoc metadata, and I would like to access it from a template. I'm willing to draft a pull request if someone can help me understand the steps required. |
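A minimal sketch of the kind of filter described above, assuming pandoc-types 1.20 or later (where metadata keys and strings are `Text`). The key name `reading-time` and the rate of 200 words per minute are illustrative assumptions, not something taken from the commenter's setup.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid (Sum (..))
import qualified Data.Text as T
import Text.Pandoc.Builder (setMeta)
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query)

-- | Approximate the word count by counting 'Str' inlines in the body.
-- Only the blocks are queried, so words in the metadata are not counted.
wordCount :: Pandoc -> Int
wordCount (Pandoc _ blocks) = getSum (query countStr blocks)
  where
    countStr :: Inline -> Sum Int
    countStr (Str _) = Sum 1
    countStr _       = Sum 0

-- | Store an estimated reading time in minutes (rounded up, at least 1)
-- under the hypothetical metadata key "reading-time".
addReadingTime :: Pandoc -> Pandoc
addReadingTime doc =
  setMeta "reading-time" (MetaString (T.pack (show minutes))) doc
  where
    wordsPerMinute = 200 :: Int  -- assumed reading speed
    minutes = max 1 ((wordCount doc + wordsPerMinute - 1) `div` wordsPerMinute)
```

The value ends up in the document's `Meta`, which is exactly the part that the template cannot currently see.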
I have an idea which I'm going to try: extend the Provide type by adding a new field - |
Thank you for your work! However I don't think it is the right solution for this specific issue just yet. A more appropriate solution would be to improve how files are loaded into the store (See Given the lack of response from hakyll's maintainer @jaspervdj , I did not start working on this as I had very little hope for such a change to be merged, and thought that designing a proper solution required some more discussion. Might look at it in the future if I can find some time. Please tell me if I got your PR wrong. |
Meh, Pandoc metadata is not trivial (I personally do not want to lose title formatting - from LaTeX), someone may use Hakyll without Pandoc, etc. After all I want a universal solution. Thus just a simple |
Let me reformulate. I have no doubt your PR is useful and would love to see it merged. What I'm arguing is that it does not resolve this issue, hence I'd rather you did not put "Closes #643" in the PR comment. As for providing only a generic solution and not an additional one for Pandoc documents, I don't find the argument that "someone may use Hakyll without Pandoc" sufficient. Hakyll is very much made to work well with Pandoc (the Pandoc compilers would not have been included otherwise), and I think there is value in optionally easing the handling of Pandoc metadata. The upvotes this issue received suggest other people are interested as well. The performance concern about parsing every document at least twice still stands, and if we can do better by being less generic then so be it. |
Closing this now. After @ip1981's comment and PR more than a week ago I started investigating whether it could be used as a starting point for solving this issue. Still, I was just trying to work against every abstraction Hakyll is using. All in all Hakyll was simply not a generic enough tool for what I wanted (that's not bad per se!). |
@flupe, do you mean that you tried implementing that solution you suggested, which would make Hakyll work with an |
I think this issue should be unclosed. I just ran into a similar problem where I assumed that the |
@gwern Can you post a summary of your instance of the problem, so I can poke around and understand what pieces are involved? Then we can discuss possible solutions. |
It's fairly straightforward: I use a My preferred solution would be for Hakyll to simply not erase the original Pandoc metadata. Does anyone expect it to do that? You'd expect it to read a A quick reminder of the relevant Pandoc types:
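For reference, a sketch of those types, mirroring the definitions in `Text.Pandoc.Definition` from pandoc-types 1.20 or later as I understand them. The `Inline` and `Block` types here are cut down to placeholders so that the snippet stands alone; the real ones have many more constructors.

```haskell
import Data.Map (Map)
import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T

-- Placeholders only: the real types are much larger.
data Inline = Str Text | Space deriving (Show, Eq)
data Block = Para [Inline] deriving (Show, Eq)

-- A document is its metadata plus a list of blocks.
data Pandoc = Pandoc Meta [Block] deriving (Show, Eq)

-- Metadata is a map from field names to values.
newtype Meta = Meta { unMeta :: Map Text MetaValue } deriving (Show, Eq)

-- Values can be nested, and can carry formatted inlines or whole blocks.
data MetaValue
  = MetaMap (Map Text MetaValue)
  | MetaList [MetaValue]
  | MetaBool Bool
  | MetaString Text
  | MetaInlines [Inline]
  | MetaBlocks [Block]
  deriving (Show, Eq)
```

Note that a value can hold formatted content (`MetaInlines`, `MetaBlocks`), which is why flattening metadata to plain strings loses information.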
|
Using
According to the documentation, we can retrieve the raw content including the yaml preamble part, using Since |
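A minimal sketch of that approach, assuming the function meant here is Hakyll's `getResourceString` and that the default Hakyll reader options keep Pandoc's `yaml_metadata_block` extension enabled. The names `pandocWithMeta` and `compileKeepingMeta` are hypothetical helpers, not part of Hakyll.

```haskell
import Hakyll
import Text.Pandoc.Definition (Meta, Pandoc (..))

-- | Parse the whole source file, YAML preamble included, so that Pandoc's
-- own reader fills in the document's 'Meta'.
pandocWithMeta :: Compiler (Item Pandoc)
pandocWithMeta = getResourceString >>= readPandoc

-- | Render the page and return the Pandoc metadata alongside it.
compileKeepingMeta :: Compiler (Meta, Item String)
compileKeepingMeta = do
  doc <- pandocWithMeta
  let Pandoc meta _ = itemBody doc
  pure (meta, writePandoc doc)
```

The preamble is still read once by Hakyll for its own metadata and once more by Pandoc here, so this does not address the double-parsing concern raised earlier in the thread.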
That's not really a solution: that's an ad hoc workaround which forces all the work of parsing and munging and compiling onto the user (defeating the whole point of Hakyll, which is to not spend my time doing that piping - the sort of thing which, say, forces people to stop using Hakyll entirely or write their own libraries & close the issue...), to fix a design decision to silently destroy user data - a choice which still has not yet been given any justification at all, currently does not appear to even have been intentional (just an oversight), and which has many clear reasons to reject. |
I also think that being able to reuse metadata from Pandoc would be good. One of the reasons why this wasn't done when I initially wrote Hakyll is that:
|
If we modify the default implementation of |
I am not sure. Apparently Pandoc metadata is not necessarily just a preamble at the start of the Markdown file, and you can have multiple YAML blocks anywhere in a file. (I guess this is to support templating / concatenating files and overriding defaults.) This came up recently in trying to add a safe lint warning about the YAML metadata: jgm/pandoc#10312 |
Another example of how this bit me, and how deeply unexpected it is for Hakyll to go out of its way to erase the metadata on a A few days ago I noticed a misspelled field in one of my old essays while checking for another problem ( Only for it to crash on the first page the moment I tried to rebuild my site to winkle out all remaining typos. Because the metadata was... empty. Huh?!?!?! How can it possibly be empty? That particular page is perfect, I check it all the time because it's the first one, of course it has Pandoc metadata, it's impossible for the metadata to not be there. ...Oh right. That bug. I then looked at I will just have to spot the remaining metadata problems the hard way, it seems. |
Another example: several hundred pages on Gwern.net have an associated thumbnail image (a graph, a chart, some AI-generated art, etc.). Those are shown in social media 'card' previews & in popups of that page, but not in the page itself. We'd like to show the image in the page itself somewhere (since I work pretty hard on some of those), and have currently settled on appending it to the end of the abstract. But we obviously do not want to manually edit in several hundred image links, which will be redundant with the page metadata (DRY), clutter the abstracts, cause various downstream problems, etc. So the logical way would be, when compiling, to simply extract it from the page metadata, walk the Pandoc, and inject a Figure element inside the Except oh wait, right, you can't - not because the walking is infeasible or the logic is too squirrelly or any real problem, but because the metadata has been erased! So now we just do that with JavaScript. It damages layout reflow and is more runtime load on the client, but at least it was easy to write and doesn't involve anything crazy like 'rereading and parsing every Markdown file twice in order to get a single metadata value to pass into the Pandoc object in order to rewrite that (after querying it to make sure the rewrite hadn't happened before)'. |
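To illustrate how small that rewrite would be if the `Pandoc` value still carried its `Meta`, here is a simplified sketch, assuming pandoc-types 1.20 or later. It appends the image at the end of the document rather than inside the abstract, uses a plain `Para` with an `Image` instead of a dedicated figure block, and the metadata key `thumbnail` is a hypothetical name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as M
import Text.Pandoc.Definition

-- | Append the image named by the hypothetical "thumbnail" metadata key.
-- Documents without that key (or with a non-string value) are unchanged.
appendThumbnail :: Pandoc -> Pandoc
appendThumbnail doc@(Pandoc meta blocks) =
  case lookupMeta "thumbnail" meta of
    Just (MetaString url)        -> Pandoc meta (blocks ++ [figure url])
    -- Strings from a YAML block usually arrive as parsed inlines.
    Just (MetaInlines [Str url]) -> Pandoc meta (blocks ++ [figure url])
    _                            -> doc
  where
    figure url = Para [Image nullAttr [] (url, "")]

-- | A tiny example document with the metadata still attached.
example :: Pandoc
example =
  Pandoc (Meta (M.fromList [("thumbnail", MetaString "/images/graph.png")]))
         [Para [Str "Abstract."]]
```

With the metadata erased before this point, `lookupMeta` returns `Nothing` and the function silently does nothing, which matches the failure described above.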
If you need a workaround, this may suffice anyway. In this way, the metadata aren't removed and you can use them within |
One interesting feature of ReST is that you can define document variables and meta data directly inside the document, without having to rely on some additional YAML header.
For example, the title of the document and its eventual subtitle can be inferred from the first headers of the file (see Markup Specification (Document Structure)),
and if the first non-comment element is a definition list, its fields update the bibliographic information of the document (see Markup Specification (Bibliographic Fields)).
Although the ReST Parser of Pandoc is far from perfect (it does not support custom directives or roles, and the Pandoc AST is quite restrictive), it does implement the aforementioned features, in standalone mode, and populates the Meta information of the Pandoc document.
However, it seems as though the Pandoc compilers provided by Hakyll completely ignore the meta information of the parsed Pandoc documents.
I don't really know if other markup languages supported by Pandoc also populate the meta information, but I do think it would be useful to provide an easier way to inject this meta information into Hakyll contexts.
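One possible shape for such a helper, as a rough sketch rather than a proposal for Hakyll's actual API: flatten the top-level `Meta` entries to strings and expose each one as a constant field. The names `metaFields` and `metaContext` are hypothetical, and the code assumes pandoc-types 1.20 or later.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as M
import qualified Data.Text as T
import Hakyll (Context, constField)
import Text.Pandoc.Definition (Meta (..), MetaValue (..))
import Text.Pandoc.Shared (stringify)

-- | Flatten top-level metadata entries to plain strings.
-- Formatting inside values is dropped by 'stringify';
-- nested maps and lists are skipped entirely.
metaFields :: Meta -> [(String, String)]
metaFields (Meta m) =
  [ (T.unpack k, s) | (k, v) <- M.toList m, Just s <- [render v] ]
  where
    render (MetaString t)   = Just (T.unpack t)
    render (MetaInlines is) = Just (T.unpack (stringify is))
    render (MetaBlocks bs)  = Just (T.unpack (stringify bs))
    render (MetaBool b)     = Just (if b then "true" else "false")
    render _                = Nothing

-- | Expose every flattened entry as a constant template field.
metaContext :: Meta -> Context a
metaContext = foldMap (uncurry constField) . metaFields
```

A context built this way can be combined with `defaultContext` using `<>`. It still requires the document to have been parsed first, so it does not by itself solve the problem of sharing the result across rules.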
For a custom site I've just set up, I have this partially working. Here is the relevant part (source):
Essentially, we:
Item Pandoc
ctx
populated with the metadata from the document.
However, this mechanism in ReST implies that you actually need to parse the document to get the context, and I have no idea how to make this work well with other rules, such as creating an index.html or archives.html page (without having to parse again for each rule).
In conclusion, here's what I am suggesting: