Retrieving the parsed structures for external processing by third-party tooling #5

rdw-software · 2022-08-16T18:59:53Z

Hi,

is there any way to get the parsed structures in a format that can be processed by other applications?

I'd like to use a JSON or Lua table representation of the API to generate content dynamically so that it can be embedded in a static documentation website (created by Docusaurus). While serviceable, the website that luadox generates by default doesn't lend itself well to customization, and I can think of other ways to process the API structures that would be useful to me as well.

I see they're stored in the Parser class, but I'm not sure if the internal layout is appropriate if dumped. Ideally, I would have something along the lines of Blizzard's WOW API documentation that could easily be used to populate React components in Docusaurus, but any standardized format would likely work.

Do you think this would be possible, and if yes what approach would you suggest?


jtackaberry · 2022-08-16T20:02:08Z

It's certainly technically possible to dump the pre-render stage data structures to JSON. One concern is that this creates an API contract that has an expectation of stability, while for rendered content there's much more latitude for change. So that's a new thing that would require some thought.

My other main question is how to deal with markdown and references? Leave them entirely unparsed/raw in the JSON?

In LuaDox, the renderer takes care of parsing and converting markdown to HTML, and at that time all references (both @{foo} and `foo`) are resolved to hyperlinks. This works because at this stage we know which things go in which files and in which sections within those files, so references can be resolved to specific anchor tags in specific files.

How do you think this should be handled with JSON renders? Leave them raw/unresolved?

rdw-software · 2022-08-17T13:57:03Z

Thanks for the quick response! I haven't thought about the design much, but here are a few ideas:

- I would simply add a schemaVersion (number) and then consumers can make sure they support the latest one
- Markdown should not be modified in any way, since rendering it would be the responsibility of the consumer
- References could be transformed into a unique ID/URL-style string, e.g. MyApp.MyModule.HelloWorldFunction or even table<someID> in typical Lua style, depending on the type of the serialized values

If I had a JSON of all the functions, modules, etc. organized by file, I would probably construct URLs based on them so that they fit into the existing website. For example, the MyApp.MyModule.HelloWorldFunction structure could be used to create an entry at https://mydocs.github.io/api/MyModule#HelloWorldFunction or similar (1:1 mapping).
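To make the 1:1 mapping above concrete, here is a minimal sketch in Python. The function name `ref_to_url`, the default base URL, and the rule of using the last two dotted components are all assumptions made for illustration; nothing here reflects what LuaDox actually emits.

```python
def ref_to_url(ref_id, base_url="https://mydocs.github.io/api"):
    """Map a dotted id such as 'App.Module.Symbol' to '<base>/Module#Symbol'.

    Hypothetical scheme: the second-to-last component becomes the page,
    the last component becomes the anchor.
    """
    parts = ref_id.split(".")
    if len(parts) < 2:
        # A bare name is treated as a page without an anchor.
        return f"{base_url}/{parts[0]}"
    module, symbol = parts[-2], parts[-1]
    return f"{base_url}/{module}#{symbol}"


print(ref_to_url("MyApp.MyModule.HelloWorldFunction"))
```

A consumer with a different site layout would only need to swap out this one function.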

I don't really know how luadox handles this internally so I can't comment on what would work best. But I guess if you consider this an experimental feature you would have plenty of opportunity to iterate after seeing how it turns out in practice :)

pakeke-constructor · 2022-09-26T07:33:15Z

Has anyone done any more thinking about this?
I like this feature, and I'd be willing to implement it.

~~I agree with the whole namespaced id thing, with the MyApp.MyModule.Func stuff.~~
Woops, I misunderstood the issue.
Yeah, that is a bit annoying... perhaps we could keep a file value in the json entry that keeps track of the file that each Reference was defined in? That way everything could be namespaced correctly and we wouldn't get collisions.
Each pass, we could also assign each Reference object a unique integer id, and we could use that for referencing within the JSON object.
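A rough sketch of that idea, assuming a small registry keyed on (file, name). `RefEntry` and `RefRegistry` are hypothetical names for illustration and are not LuaDox classes.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class RefEntry:
    id: int      # unique integer assigned during the pass
    name: str    # symbol name as written in the source
    file: str    # file the symbol was defined in


class RefRegistry:
    """Hands out one integer id per (file, name) pair."""

    def __init__(self):
        self._next_id = count(1)
        self._by_key = {}

    def register(self, name, file):
        key = (file, name)
        if key not in self._by_key:
            self._by_key[key] = RefEntry(next(self._next_id), name, file)
        return self._by_key[key]


registry = RefRegistry()
a = registry.register("new", "a.lua")
b = registry.register("new", "b.lua")   # same name, different file: no collision
print(a.id, b.id, registry.register("new", "a.lua").id)
```

Because the key includes the file, two symbols with the same name in different files get distinct ids, while registering the same symbol twice returns the existing entry.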

This is a substantial refactoring with the primary goal of reducing reliance on duck typing by using more concrete types in order to benefit from static type checking. The Reference object has been broken up into untyped and typed references, where typed refs are subclasses of Reference that implement the various data needed for rendering. This also begins paving the way to supporting multiple renderers, which will be needed for #5. In order to use slightly more recent type hinting features, Python 3.8 is now required. That makes this commit a breaking change, meaning the next release will require a major version bump.

jtackaberry · 2023-09-12T00:03:12Z

I'm planning to implement this for the next release (LuaDox 2.0). I've begun some refactoring work to enable this (among other things, such as support for other annotation conventions), and in the process have been thinking about how best to approach it.

First, the basic idea is that the JSON structure will reflect a hierarchical layout:

Top-level elements (@module, @class, and manual pages)
- Collections within the top-level element (@section and @table)
  - Functions and fields within the collection

Each element will contain an id field that uniquely identifies the element within the project. References within markdown (`symbol` and @{symbol}) will be converted to markdown links where the hyperlink is in the form luadox:<id>. (Thanks @Duckwhale for inspiring that idea.)
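As an illustration of the reference conversion described above, here is a simplified sketch. The regular expression, the `known_ids` lookup table, and the choice of link text are assumptions; the real parser may handle more syntax and resolve names differently.

```python
import re

# Matches @{symbol} or `symbol` (simplified; no escaping or nesting).
REF_PATTERN = re.compile(r"@\{([^}]+)\}|`([^`]+)`")


def link_references(markdown, known_ids):
    """Rewrite resolvable references as [symbol](luadox:<id>) links.

    known_ids is a hypothetical mapping of symbol name -> element id.
    Anything that does not resolve is left untouched.
    """
    def replace(match):
        symbol = match.group(1) or match.group(2)
        ref_id = known_ids.get(symbol)
        if ref_id is None:
            return match.group(0)
        return f"[{symbol}](luadox:{ref_id})"

    return REF_PATTERN.sub(replace, markdown)


ids = {"foo.bar": "foo.bar", "baz": "foo.baz"}
print(link_references("See @{foo.bar} and `baz`, but not `unknown`.", ids))
```

Leaving unresolved backtick spans alone matters because most inline code in a docstring is not a reference at all.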

In terms of markdown, there are a couple wrinkles that seem to necessitate a bit of extra complexity. LuaDox has some tags that need to be parsed, but yet don't directly map to any markdown. Currently these are @see and the two admonition tags @note and @info.

So I'm thinking about handling this by representing markdown content fields in the JSON as an array instead of a string, where the array would contain a list of objects that represent either a markdown string, or some more complex parsed field such as an admonition.

For example:

{
  "id": "foo.baz",
  "type": "class",
  "content": [
    {
      "markdown": "### Some heading\n\nSome text goes here."
    },
    {
      "type": "admonition",
      "level": "warning",
      "title": "Beware!",
      "content": [
        {
          "markdown": "Markdown within the admonition that has a @see tag"
        },
        {
          "type": "see",
          "ids": [
            "bar.one",
            "bar.two"
          ]
        }
      ]
    },
    {
      "markdown": "More markdown after the admonition [with a link](luadox:bar.two)"
    }
  ],
  "functions": [
    "stuff goes here"
  ],
  "fields": [
    "stuff goes here"
  ]
}

Or, as yaml, because it'll be trivial to support:

id: foo.baz
type: class
content:
  - markdown: |-
      ### Some heading

      Some text goes here.
  - type: admonition
    level: warning
    title: Beware!
    content:
      - markdown: Markdown within the admonition that has a @see tag
      - type: see
        ids:
          - bar.one
          - bar.two
  - markdown: More markdown after the admonition [with a link](luadox:bar.two)
functions:
  - stuff goes here
fields:
  - stuff goes here

That isn't a fully baked document, just depicts how a single collection might be represented within the larger document, and how markdown content is split up into an array like that.
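For consumers, walking such a mixed content array could look roughly like the sketch below. It assumes the key names from the example (markdown, type, level, title, content, ids), treats plain text items as carrying a markdown key, and uses a hypothetical helper name `flatten_content`; the final format may well differ.

```python
def flatten_content(items, depth=0):
    """Return one indented text line per content item, recursing into admonitions."""
    lines = []
    indent = "  " * depth
    for item in items:
        if "markdown" in item:
            lines.append(f"{indent}markdown: {item['markdown']!r}")
        elif item.get("type") == "admonition":
            lines.append(f"{indent}admonition[{item['level']}]: {item['title']}")
            # Admonitions carry their own nested content array.
            lines.extend(flatten_content(item["content"], depth + 1))
        elif item.get("type") == "see":
            lines.append(f"{indent}see: {', '.join(item['ids'])}")
        else:
            # Unknown item kinds are reported rather than silently dropped.
            lines.append(f"{indent}unknown item with keys {sorted(item)}")
    return lines


content = [
    {"markdown": "Some text goes here."},
    {
        "type": "admonition",
        "level": "warning",
        "title": "Beware!",
        "content": [{"type": "see", "ids": ["bar.one", "bar.two"]}],
    },
]
print("\n".join(flatten_content(content)))
```

A real renderer would replace the string formatting with calls into a markdown library or component templates, but the dispatch on item shape would stay the same.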

Let me know what you think.

rdw-software · 2023-09-12T04:01:35Z

I can't really comment on the design, but if you have a working prototype I'm happy to give it a spin to get you some feedback :)

Since this would be the input for scripts and other tools, it's probably not too important how the structures are laid out exactly.


Reference resolution logic has been moved from the renderer to the parser (invoked by the prerenderer), where refs are now converted to markdown links using an intermediate `luadox:<refid>` link format. It's up to the renderer to resolve these links to whatever is appropriate.

This required introducing the notion of an id to references. Ids are globally unique opaque strings that are tracked by the parser, which the renderer can consult in order to convert an id to a Reference object.

This refactoring continues to pave the way for #5 and will allow for different kinds of renderers (not just HTML), where the common logic that applies to all renderers has been moved to the parser and run during the prerender stage.

Additionally, tag parsing within content blocks (e.g. handling @tparam, @note, etc.) has been rewritten and hopefully simplified (Parser.parse_raw_content()).

Finally, this commit includes some optimizations:

- Compiled regexp objects are now cached and reused, reducing compilation overhead
- First sentence detection has been rewritten using a more naive, lower level approach that is significantly faster. During profiling, get_first_sentence() was one of the most disproportionately expensive functions called.
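On the consuming side, a renderer (or an external tool) could resolve the intermediate `luadox:<refid>` links with something like the following sketch. The `url_for_id` mapping and the function name are assumptions for illustration, not part of LuaDox.

```python
import re

# Matches the target part of a markdown link that uses the luadox: scheme.
LUADOX_LINK = re.compile(r"\]\(luadox:([^)\s]+)\)")


def resolve_links(markdown, url_for_id):
    """Replace luadox:<id> link targets with URLs from url_for_id.

    url_for_id is a hypothetical dict of element id -> final URL.
    Links whose id is unknown are left in their intermediate form.
    """
    def replace(match):
        url = url_for_id.get(match.group(1))
        if url is None:
            return match.group(0)
        return f"]({url})"

    return LUADOX_LINK.sub(replace, markdown)


text = "More markdown after the admonition [with a link](luadox:bar.two)"
print(resolve_links(text, {"bar.two": "bar.html#two"}))
```

Keeping unknown ids untouched makes broken references easy to spot in the output instead of turning them into dead URLs.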

Implements #5.

jtackaberry · 2023-09-18T02:30:39Z

This is implemented in master now, if anyone's interested in trying it out.

You can install and run out of master using a pipx editable install:

git clone https://github.com/jtackaberry/luadox.git
pipx install -e luadox/
luadox [...your usual arguments...] -r json

-r (or --renderer) controls how the output is rendered. The yaml renderer is also available (which produces smaller and more readable files but is slower).

The structure isn't documented yet, but hopefully it's obvious enough to figure out. Probably the most counterintuitive thing is that for classes and modules, the sections array includes the class or module itself (as evidenced by the id key). This is intentional because top-level classes/modules have all the same semantics as sections (they can contain documented content, and fields and functions).
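Since the structure is undocumented, the following is only a guess at how a consumer might separate the class/module's own section from the rest. It assumes a top-level entry has an id key and a sections array whose items also have an id key; the helper name `split_sections` and the sample data are made up.

```python
def split_sections(top_level):
    """Return (own_section, other_sections) for a class or module entry.

    Assumption: the section whose id equals the top-level id is the
    class/module itself; all others are regular sections.
    """
    own = None
    others = []
    for section in top_level.get("sections", []):
        if own is None and section.get("id") == top_level.get("id"):
            own = section
        else:
            others.append(section)
    return own, others


# Made-up sample data, not real LuaDox output.
example = {
    "id": "foo.baz",
    "sections": [
        {"id": "foo.baz", "functions": [], "fields": []},
        {"id": "foo.baz.helpers", "functions": [], "fields": []},
    ],
}
own, others = split_sections(example)
print(own["id"], [section["id"] for section in others])
```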

Feedback welcome.

rdw-software · 2023-09-22T11:19:31Z

I tried to generate yaml and json files to test the new feature, but I'm always getting this error:

/home/rdw/.local/bin/luadox test.lua --renderer yaml
2023-09-22 13:12:39,507 [INFO] parsing /tmp/luadox-json-test/test.lua
2023-09-22 13:12:39,510 [INFO] prerendering 1 pages
2023-09-22 13:12:39,511 [ERROR] unhandled error rendering around /tmp/luadox-json-test/test.lua:-1: No option 'name' in section: 'project'
Traceback (most recent call last):
  File "/usr/lib/python3.11/configparser.py", line 805, in get
    value = d[option]
            ~^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 1004, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 996, in __missing__
    raise KeyError(key)
KeyError: 'name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/luadox-json-test/luadox/luadox/main.py", line 249, in main
    renderer.render(toprefs, out)
  File "/tmp/luadox-json-test/luadox/luadox/render/yaml.py", line 47, in render
    project = self._generate(toprefs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/luadox-json-test/luadox/luadox/render/json.py", line 33, in _generate
    name = self.config.get('project', 'name')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/configparser.py", line 808, in get
    raise NoOptionError(option, section)
configparser.NoOptionError: No option 'name' in section: 'project'

A few observations:

- Using --renderer html works (but yaml and json cause the above error)
- I tried passing an empty file, the middleclass example, setting --name TEST and adding the example luadox.conf

This was on WSL (Kali Linux). I could test on other systems as well, but it doesn't seem like a platform-specific issue.
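For reference, the exception in the traceback can be reproduced with the standard library alone. The sketch below shows the strict lookup failing and the `fallback` keyword of `ConfigParser.get()` avoiding it. This only illustrates the mechanism and is not necessarily how it should be fixed in LuaDox.

```python
import configparser

config = configparser.ConfigParser()
# A [project] section that exists but has no 'name' option.
config.read_dict({"project": {}})

try:
    config.get("project", "name")
except configparser.NoOptionError as exc:
    print(f"strict lookup failed: {exc}")

# With a fallback, a missing option yields the default instead of raising.
name = config.get("project", "name", fallback=None)
print(name)
```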

jtackaberry · 2023-09-22T18:49:29Z

@Duckwhale silly oversight on my part, sorry about that. Just committed a fix.

jtackaberry · 2023-09-22T18:59:45Z

BTW @Duckwhale, a specific renderer for Docusaurus is theoretically possible now, and I'd like to have that capability natively in LuaDox. So I'm quite interested in your findings here, and really more generally any advice or thoughts you might have on the subject. I've not used Docusaurus yet (and it certainly generates significantly more polished output than LuaDox's current HTML renderer :)) so I don't yet have any intuitions on the ideal approach.

rdw-software · 2023-09-26T23:47:06Z

Just FYI, I've started building a prototype to see if I can use the JSON output to generate something remotely close to my manually-created docs. I've written down a bunch of feedback already, but it'll take some time to get more insights.

One thing that I can say already is that I wanted a way to find out which source file a (top-level) entry originates from. This is so I can add project-specific tags that likely wouldn't have to be added to the tool itself, such as "FFI/Unsafe API" or "External", which are useful things to list in documentation but needn't necessarily be custom tags. Or maybe it's already possible to get this info?

I guess it would be possible to chain find into luadox and then save the file path alongside the output. Pretty awkward, though.

jtackaberry · 2023-09-26T23:57:14Z

I can definitely add a source key or some such to the top-level entries. Good idea. Looking forward to learning more about your experience with the prototype!

LeighMcRae · 2024-03-12T22:31:24Z

> This is implemented in master now, if anyone's interested in trying it out.
>
> You can install and run out of master using a pipx editable install:
>
> git clone https://github.com/jtackaberry/luadox.git
> pipx install -e luadox/
> luadox [...your usual arguments...] -r json
>
> -r (or --renderer) controls how the output is rendered. The yaml renderer is also available (which produces smaller and more readable files but is slower).

I was having trouble getting this to run from source. I'm sure it was my lack of Python experience. Maybe add this to the front page for other people. It was really useful for me.

jtackaberry added a commit that referenced this issue Sep 18, 2023

Add JSON and YAML renderers

7211b1a

Implements #5.
