like jq but for Markdown: find specific elements in a md doc

mdq: jq for Markdown

What is mdq?

mdq aims to do for Markdown what jq does for JSON: provide an easy way to zero in on specific parts of a document.

For example, GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug. Instead, you can (for example) ask mdq for all uncompleted tasks:

mdq '- [ ]'

mdq is available under the Apache 2.0 or MIT licenses, at your option. I am open to other permissive licenses, if you have one you prefer.

Installation

Any of these will work:

  1. # (Mac and Linux, with brew installed)
    brew install mdq
  2. docker pull yshavit/mdq
    echo 'My [example](https://github.com/yshavit/mdq) markdown' | docker run --rm -i yshavit/mdq '[]()'
  3. Download binaries from the latest release (or any other release, of course).

    • Macs quarantine downloads from the internet by default. If you get an error saying that Apple cannot check the binary for malicious software, you can remove this flag by running the following on the binary after extracting it from the artifact zip:
      xattr -d com.apple.quarantine mdq
    • You can also grab the binaries from the latest build-release workflow run. You must be logged into GitHub to do that (this is GitHub's limitation, not mine). You'll have to chmod +x them before you can run them.
  4. cargo install --git https://github.com/yshavit/mdq

    Requires rustc >= 1.78.0

Security concerns: The release and latest-workflow binaries are built on GitHub's servers, so if you trust my code (and dependencies), and you trust GitHub, you can trust the binaries. See https://github.com/yshavit/mdq/wiki/Release-binaries for information on how to verify them.

Basic Usage

Simple example to select sections containing "usage":

cat example.md | mdq '# usage'

Use pipe (|) to chain filters together. For example, to select sections containing "usage", and within those find all unordered list items:

cat example.md | mdq '# usage | -'

The filter syntax is designed to mirror Markdown syntax. You can select...

| Element          | Syntax                             |
|------------------|------------------------------------|
| Sections         | # title text                       |
| Lists            | - unordered list item text         |
| "                | 1. ordered list item text          |
| "                | - [ ] uncompleted task             |
| "                | - [x] completed task               |
| "                | - [?] any task                     |
| Links            | [display text](url)                |
| Images           | ![alt text](url)                   |
| Block quotes     | > block quote text                 |
| Code blocks      | ```language <code block text>      |
| Raw HTML         | </> html_tag                       |
| Plain paragraphs | P: paragraph text                  |
| Tables           | :-: header text :-: row text       |

(Table selection differs from other selections in that you can select only certain headers and rows, so the resulting element has a different shape than the original. See the example below, or the wiki for more detail.)

In any of the above, the text may be:

  • an unquoted string that starts with a letter; this is case-insensitive
  • a "quoted string" (either single or double quotes); this is case-sensitive
  • a string (quoted or unquoted) anchored by ^ or $ (for start and end of string, respectively)
  • a /regex/
  • omitted or *, to mean "any"
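
To make the matcher forms concrete, here is a minimal sketch that runs one section selector per form against the same file. The helper names `run_selector` and `demo_selectors` are hypothetical, as is the idea that your file has "usage" and "examples" sections; the selectors themselves only follow the rules listed above.

```shell
# Hypothetical helper: print a selector, then run it against the file in $2.
run_selector() {
  printf '== %s\n' "$1"
  mdq "$1" < "$2"
}

# Hypothetical demo: one section selector per matcher form.
# $1 is the path to a Markdown file.
demo_selectors() {
  run_selector '# usage' "$1"             # unquoted: case-insensitive
  run_selector '# "Usage"' "$1"           # quoted: case-sensitive
  run_selector '# ^usage' "$1"            # anchored at the start of the title
  run_selector '# usage$' "$1"            # anchored at the end of the title
  run_selector '# /usage|examples/' "$1"  # regex
  run_selector '# *' "$1"                 # any section
}
```

Calling `demo_selectors example.md` prints each selector followed by whatever mdq selects for it, which makes it easy to compare the forms side by side.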

See the tutorial for a bit more detail, and user manual for the full picture.

Examples

Ensuring that people have searched existing issues before submitting a bug report

Many projects have bug report templates that ask the submitter to attest that they've checked existing issues for possible duplicates. In mdq, you can do:

if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues' ; then
  ...

(The -q option is like grep's: it doesn't output anything to stdout, but exits 0 if any items were found, or non-0 otherwise.)

This will match:

  - [x] I have searched for existing issues

... but will fail if the checkbox is unchecked:

  - [ ] I have searched for existing issues
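
If a template has several required checkboxes, the same `-q` check can be repeated per label. The following is a sketch under assumptions, not part of mdq: `missing_attestations` is a hypothetical helper, and it assumes each label is plain text that starts with a letter and contains no selector syntax such as `|`.

```shell
# Hypothetical helper: $1 is the issue body; the remaining arguments are
# checkbox labels that must be checked. Prints every label that is not
# checked and returns 1 if any were missing, 0 otherwise.
missing_attestations() {
  ma_body=$1
  shift
  ma_status=0
  for ma_label in "$@"; do
    # mdq -q exits 0 only if a completed task with this text was found
    if ! printf '%s\n' "$ma_body" | mdq -q "- [x] $ma_label"; then
      printf 'not checked: %s\n' "$ma_label"
      ma_status=1
    fi
  done
  return "$ma_status"
}
```

For example: `missing_attestations "$ISSUE_TEXT" 'I have searched for existing issues' || exit 1`. Note that a missing or failing mdq binary also produces a non-zero exit, so this sketch would report every label as unchecked in that case.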

Extracting a referenced ticket

Some organizations use GitHub Actions to update their ticket tracker, if a PR mentions a ticket. You can use mdq to extract the link from Markdown as JSON, and then use jq to get the URL:

# Each line ends with | so the shell continues the pipeline on the next line
TICKET_URL="$(echo "$PR_TEXT" |
  mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
  jq -r '.items[].link.url')"

This will match Markdown like:

Ticket

https://tickets.example.com/PROJ-1234
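
In a GitHub Action you would probably also want to handle PRs that have no ticket link. A minimal sketch, assuming the same selector and jq filter as above: `ticket_url` is a hypothetical helper, and taking only the first match with `head -n 1` is a design choice of this sketch, not something mdq does.

```shell
# Hypothetical helper: print the first ticket URL found in the PR body
# passed as $1, or return 1 if no matching link was found.
ticket_url() {
  tu_url=$(printf '%s\n' "$1" |
    mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
    jq -r '.items[].link.url' |
    head -n 1)
  [ -n "$tu_url" ] || return 1
  printf '%s\n' "$tu_url"
}
```

Usage might look like `TICKET_URL=$(ticket_url "$PR_TEXT") || echo "no ticket referenced"`.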

Whittling down a big table

Let's say you have a table whose columns reference people in an on-call schedule, and whose rows correspond to weeks in YYYY-MM-DD format:

On-Call Alice Bob Sam Pat
2024-01-08 x
2024-01-15 x
2024-01-22 x

To find out when Alice is on call:

cat oncall.md | mdq ':-: /On-Call|Alice/:-: *'
|  On-Call   | Alice |
|:----------:|:-----:|
| 2024-01-08 |   x   |
| 2024-01-15 |       |
| 2024-01-22 |       |

Or, to find out who's on call for the week of Jan 15:

cat oncall.md | mdq ':-: * :-: 2024-01-15'
|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-15 |       |     |  x  |     |
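
The week lookup can be wrapped in a small function so the date becomes a parameter. This is only a sketch: `oncall_for_week` is a hypothetical name, and the date check is an addition of this example so that arbitrary text is never spliced into the selector.

```shell
# Hypothetical helper: print the on-call row for one week.
# $1 is a YYYY-MM-DD date, $2 is the path to the Markdown file.
oncall_for_week() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $1" >&2; return 2 ;;
  esac
  # Keep every header (*), but only rows whose text matches the date
  mdq ":-: * :-: $1" < "$2"
}
```

For example, `oncall_for_week 2024-01-15 oncall.md` runs the same selector as the command above.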

Development

Requires rustc >= 1.78.0

cargo build
cargo test
