Roadmap #1

Open
14 of 62 tasks
TiarkRompf opened this issue Nov 22, 2023 · 0 comments

TiarkRompf commented Nov 22, 2023

Collecting some notes on possible short- and medium-term todo items here:

Short-Term: Initial Open Source Release

Logistics

  • Publish NPM package #2
  • Reorganize files (split into front-end, IR, codegen)
  • Add test suite (use Node.js test runner)
  • CI setup (GitHub Actions): automate testing on pull requests, automate website updates, etc.
  • Basic command-line tool (similar to jq, awk, ...)

Presentation

  • Readme files
  • Website: set up a suitable template such as https://www.docsy.dev
  • Logo, some images (I got some not-terrible results from ChatGPT/DALL-E asking for visualizations of code poetry)
  • Populate website with documentation: getting started, examples, howto, API reference

Straightforward improvements

  • Basic parser for expressions (e.g. data.*.foo / total)
  • JS template literal parser (e.g. rh"data.${q}.b")
  • Surface syntax: extend to incl pipe syntax, e.g. for Advent of Code
  • Consistent internal naming (xxkey etc, use something like _rhkey?)
  • Ensure reasonably complete set of operators (e.g. string split, string join, to number, to string, comparison ops, ...)
  • Consistent use and interop between different APIs (standard, pipe, template literal)
  • Improve error handling, ensure sensible error messages
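
To make the "basic parser for expressions" item concrete, here is a minimal sketch of a recursive-descent parser for inputs such as data.*.foo / total. The function names (tokenize, parseExpr) and the AST node shapes ("path", "binop") are hypothetical and chosen for illustration only; they are not Rhyme's actual API or IR. The sketch handles only paths and the binary operators /, + and -, all at one precedence level.

```javascript
// Hypothetical sketch, not Rhyme's parser: turn "data.*.foo / total" into a small AST.
function tokenize(src) {
  // Sticky regex: generator (*A or *), identifier, number, or a single-char operator.
  const re = /\s*(\*[A-Za-z_]\w*|\*|[A-Za-z_]\w*|\d+|[.\/+\-])/y;
  const text = src.trimEnd();
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    re.lastIndex = pos;
    const m = re.exec(text);
    if (!m) throw new SyntaxError(`unexpected input at position ${pos}`);
    tokens.push(m[1]);
    pos = re.lastIndex;
  }
  return tokens;
}

function parseExpr(src) {
  const tokens = tokenize(src);
  let i = 0;
  const isStep = (t) => t !== undefined && /^(\*\w*|[A-Za-z_]\w*|\d+)$/.test(t);

  // path ::= step ("." step)*
  function parsePath() {
    if (!isStep(tokens[i])) throw new SyntaxError(`expected a name, got ${tokens[i]}`);
    const steps = [tokens[i++]];
    while (tokens[i] === ".") {
      i++;
      if (!isStep(tokens[i])) throw new SyntaxError("expected a name after '.'");
      steps.push(tokens[i++]);
    }
    return { type: "path", steps };
  }

  // expr ::= path (("/" | "+" | "-") path)*   (left-associative, no precedence)
  function parseBinary() {
    let left = parsePath();
    while (["/", "+", "-"].includes(tokens[i])) {
      const op = tokens[i++];
      left = { type: "binop", op, left, right: parsePath() };
    }
    return left;
  }

  const ast = parseBinary();
  if (i < tokens.length) throw new SyntaxError(`unexpected token ${tokens[i]}`);
  return ast;
}
```

For example, parseExpr("data.*.foo / total") yields a "binop" node whose left side is the path with steps data, *, foo and whose right side is the path total. Malformed input such as "data..foo" raises a SyntaxError, which is one place where the "sensible error messages" item would apply.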

Near-Term

Improve UX

  • Surface syntax: include let/def, string literals with escape chars, source comments (important for stand-alone / CLI usage)
  • Standardize API syntax (for integrations, should not rely on meta-language variables and functions for composition)

Performance

  • array with multiple generator elements: efficient flatten!
  • function calls like 'split': need cse!
  • tmp[4] could be a local var tmp4

Correctness/expressiveness

  • handling of undefined/null/NaN
    • specifically: array.push currently accumulates 'undefined' but should not
  • related: observe 'undefined'
    • comparison as failure
    • if/else via '??' (else/or) and '&&' (if/and), e.g. data.*.a ?? "(n/a)" or *a % 3 == 1 && sum(data.*a)
      • design question: should the sum of an empty collection be 0 or undefined? (the current implementation returns 0)
    • outer joins
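
The behaviors in question can be illustrated in plain JavaScript (this is not Rhyme code; the sample rows and helper names are made up). The sketch contrasts accumulating 'undefined' with skipping it, shows '??' acting as an else branch, and shows the two candidate answers for the sum of an empty collection.

```javascript
// Plain-JavaScript sketch (not Rhyme code); 'rows' is made-up sample data.
const rows = [{ a: 1 }, { b: 2 }, { a: 3 }];

// Accumulating every lookup keeps the 'undefined' from the second row.
const accumulated = rows.map((row) => row.a);

// Treating 'undefined' as "no result" drops that path instead.
const skipped = rows.map((row) => row.a).filter((v) => v !== undefined);

// '??' as else/or: substitute a default when a path yields no value.
const withDefault = rows.map((row) => row.a ?? "(n/a)");

// Sum of an empty collection: 0 when the fold starts from a seed of 0 ...
const sumWithSeed = (xs) => xs.reduce((acc, x) => acc + x, 0);
// ... or 'undefined' if an empty input is treated as "no value".
const sumOrUndefined = (xs) => (xs.length === 0 ? undefined : sumWithSeed(xs));
```

Here accumulated has three entries, the middle one undefined, while skipped has only the two defined values. The choice between sumWithSeed and sumOrUndefined is exactly the open design question: the second makes "no input" observable, at the cost of callers having to handle a missing result.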

Examples/demos

  • joins across rest calls: example with 'fetch' primitive/udf
  • release visualization package
  • React Todo app (first cut)

Medium-Term: Integrations & Features

Integrations

  • JS Ecosystem: React, Vega, ...
  • Python ecosystem: numpy, Jax, einsum/einops (as front-end or back-end)

Features I

Sorting

  • Sorted tree data structures in addition to hash maps (towards RPAI)

Null, False (see above, some questions may be postponed here)

  • Missing values: deal with null, undefined in a principled way (see note on reification of failure below). Default should be to produce no result on any path hitting undefined (inner join semantics), but we could have .*A? or something similar as marker to propagate undefined.
  • Explicit conditionals (observe when a path produces no value)
  • Outer joins
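
As a reference point for the intended semantics, the following plain-JavaScript sketch (not Rhyme code) contrasts the inner-join default with a left outer join. The data and function names are invented for the example, and using null to represent a propagated missing value is an assumption.

```javascript
// Plain-JavaScript sketch (not Rhyme code); data and names are illustrative.
const orders = [
  { id: 1, customer: "ann" },
  { id: 2, customer: "bob" },
];
const regions = { ann: "EU" }; // no entry for "bob"

// Inner-join semantics: a path that hits 'undefined' produces no result.
function innerJoin(orderList, regionOf) {
  const out = [];
  for (const o of orderList) {
    const region = regionOf[o.customer];
    if (region === undefined) continue;
    out.push({ id: o.id, region });
  }
  return out;
}

// Left-outer-join semantics: the missing value is propagated (here as null)
// so that downstream code can observe it.
function leftOuterJoin(orderList, regionOf) {
  return orderList.map((o) => ({ id: o.id, region: regionOf[o.customer] ?? null }));
}
```

innerJoin(orders, regions) returns a single row for order 1, while leftOuterJoin(orders, regions) also returns order 2 with a null region. A marker such as .*A? would, under this reading, select the second behavior for one path only.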

Features II

Structural Recursion

  • Variables that can abstract over multiple index steps, i.e. data..*A.foo for data.foo, data.a.foo, data.a.b.foo, etc (similar to https://jsonpath.com, also related to shape-polymorphic arrays in APL, J, Q, K)
  • Question: longest or shortest match? (compare greedy vs lazy operators in regexps)
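
A plain-JavaScript sketch of what data..*A.foo would need to enumerate, assuming the variable *A stands for the list of intermediate keys. The function name findAtAnyDepth is hypothetical. The sketch returns every match, so the longest-versus-shortest question becomes a choice among the results rather than something the traversal decides.

```javascript
// Plain-JavaScript sketch (not Rhyme code): find 'key' at any nesting depth.
// Yields [path, value] pairs, where 'path' lists the intermediate keys that a
// multi-step variable such as *A would be bound to.
function* findAtAnyDepth(node, key, path = []) {
  if (node === null || typeof node !== "object") return;
  if (Object.prototype.hasOwnProperty.call(node, key)) {
    yield [path, node[key]];
  }
  // Descend into every child, including the one stored under 'key' itself.
  for (const [k, child] of Object.entries(node)) {
    yield* findAtAnyDepth(child, key, [...path, k]);
  }
}
```

On the input { foo: 1, a: { foo: 2, b: { foo: 3 } } }, the generator yields the paths [], ["a"] and ["a", "b"], corresponding to data.foo, data.a.foo and data.a.b.foo.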

Generative Recursion (Fixpoints)

  • Datalog-style seminaive evaluation, combining structural recursion and incrementality (below)
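
For reference, seminaive evaluation in its simplest form looks as follows for transitive closure, i.e. the Datalog rules path(x, y) :- edge(x, y) and path(x, z) :- path(x, y), edge(y, z). This is a standalone JavaScript sketch with a hypothetical function name; it does not show how the technique would be integrated into Rhyme.

```javascript
// Seminaive evaluation of transitive closure over a list of [from, to] edges.
// Each round joins only the facts derived in the previous round (delta)
// against the edge relation, instead of re-joining all known facts.
function transitiveClosure(edges) {
  const encode = (x, y) => JSON.stringify([x, y]);

  // Index edges by source node so the join with 'delta' is a lookup.
  const successors = new Map();
  for (const [x, y] of edges) {
    if (!successors.has(x)) successors.set(x, []);
    successors.get(x).push(y);
  }

  const known = new Set(edges.map(([x, y]) => encode(x, y)));
  let delta = edges.map(([x, y]) => [x, y]);

  while (delta.length > 0) {
    const next = [];
    for (const [x, y] of delta) {
      for (const z of successors.get(y) ?? []) {
        const k = encode(x, z);
        if (!known.has(k)) {
          known.add(k);
          next.push([x, z]); // genuinely new fact: part of the next delta
        }
      }
    }
    delta = next;
  }
  return [...known].map((k) => JSON.parse(k));
}
```

The loop terminates once a round derives no new facts, including on cyclic graphs, because every fact enters delta at most once after the initial round.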

Type and schema support

  • Key idea: same language for expressing schema, e.g. { foo: { *: Int } }
  • Identify dense arrays for tensor workloads
  • Identify sorted collections
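
One way to read the "same language for schema" idea is that a schema is just a value of the same nested shape, with type names at the leaves. The sketch below checks a value against such a schema in plain JavaScript. The function name conforms, the use of the strings "Int" and "String" as leaf types, and the "*" key as a wildcard are all assumptions made for illustration.

```javascript
// Hypothetical sketch: check a value against a schema such as { foo: { "*": "Int" } }.
// Leaf types are given as strings; "*" means "every key at this level".
function conforms(value, schema) {
  if (schema === "Int") return Number.isInteger(value);
  if (schema === "String") return typeof value === "string";
  if (typeof schema === "string") throw new Error(`unknown type: ${schema}`);

  // Non-leaf schema: the value must be an object (or array) to descend into.
  if (value === null || typeof value !== "object") return false;

  if (Object.prototype.hasOwnProperty.call(schema, "*")) {
    return Object.values(value).every((v) => conforms(v, schema["*"]));
  }
  return Object.keys(schema).every((k) => conforms(value[k], schema[k]));
}
```

With this reading, conforms({ foo: { a: 1, b: 2 } }, { foo: { "*": "Int" } }) holds, while replacing 2 with a string makes it fail. Detecting dense arrays or sorted collections would need additional schema vocabulary that this sketch does not attempt.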

Incrementality

  • Generate update triggers as alternative to loops
  • Pick a representation for delta inputs (insertion and deletion; an explicit null could mark deletion), take a look at Delta-JSON, JSONDiff, etc.
  • Fuse multiple updates into a single query
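
To illustrate update triggers as an alternative to loops, here is a plain-JavaScript sketch that maintains a per-key sum under insertions and deletions without rescanning the input. The delta format ({ op, key, value } with op being "insert" or "delete") and the name makeGroupedSum are assumptions; no particular delta representation has been chosen yet.

```javascript
// Sketch: maintain sum-per-key incrementally. Each delta triggers a constant-time
// update instead of re-running a loop over all inputs.
function makeGroupedSum() {
  const state = new Map(); // key -> { sum, count }

  function apply(delta) {
    if (delta.op !== "insert" && delta.op !== "delete") {
      throw new Error(`unknown delta op: ${delta.op}`);
    }
    const sign = delta.op === "insert" ? 1 : -1;
    const entry = state.get(delta.key) ?? { sum: 0, count: 0 };
    entry.sum += sign * delta.value;
    entry.count += sign;
    // Drop the group once its last contributing row has been deleted.
    if (entry.count === 0) state.delete(delta.key);
    else state.set(delta.key, entry);
  }

  function result() {
    return Object.fromEntries([...state].map(([k, e]) => [k, e.sum]));
  }

  return { apply, result };
}
```

The count is tracked alongside the sum so that a group disappears when it becomes empty, rather than lingering with a sum of 0. This only works directly for invertible aggregates such as sum; operators like min or max would need more state to support deletion.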

Bidirectionality

  • Track provenance of output values
  • Partial re-evaluation based on output changes, e.g. filter or select in UI, re-evaluate incrementally
  • AutoDiff for numeric computations?

Semantics

Reification of the search process

  • Semantic view of data.*.key as stream, sum(data.*.key) as reification of entire stream
  • last and first as key reification operators (sum has a built-in last, but could also be used as a running sum; first corresponds to one in Verse).
  • Empty stream = missing value, implement proper outer joins by making this observable (reification of failure)
  • Should actual errors be observable as well? Exception handling as another form of reification
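
The stream view can be modeled with JavaScript generators. In the sketch below (plain JavaScript, not Rhyme code, with hypothetical function names), data.*.key is a generator, and sumOf, firstOf and lastOf each reify the stream into a single value. An empty stream makes firstOf and lastOf return undefined, which is one way the "missing value" case becomes observable.

```javascript
// Plain-JavaScript sketch: data.*.key as a stream, plus operators that reify it.
function* streamOf(data, key) {
  for (const row of Object.values(data)) {
    if (row != null && row[key] !== undefined) yield row[key];
  }
}

function sumOf(stream) {
  let total = 0;
  for (const v of stream) total += v;
  return total;
}

// Running sum: the same aggregation, but emitting every intermediate state.
function* runningSum(stream) {
  let total = 0;
  for (const v of stream) {
    total += v;
    yield total;
  }
}

// first: stop at the first element; an empty stream yields undefined.
function firstOf(stream) {
  for (const v of stream) return v;
  return undefined;
}

// last: consume the whole stream; an empty stream yields undefined.
function lastOf(stream) {
  let last;
  for (const v of stream) last = v;
  return last;
}
```

For a non-empty stream, lastOf(runningSum(s)) equals sumOf(s), which matches the remark that sum has a built-in last. On an empty stream the two differ in this sketch (undefined versus 0), which is the same design question raised earlier for the sum of an empty collection.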

Unification of functions and data

  • Equivalence of {*x: e} and $\lambda x. e$
  • Equivalence of f: { data.*x.key: e} and $\forall x. f(data.x.key) = e$

Optimization

Graph IR implementation

  • Base implementation more directly on variants of LMS-IR, extended with first, last reification operators to guide loop reconstruction
  • How do these reification operators fit into a framework of 'gen, use, kill' effects as in reachability types?

Loop scheduling algorithms

  • Incorporate query planning algorithms for database-like workloads
  • Incorporate polyhedral approaches (or similar) for numeric workloads

Instruction selection for pre-existing kernels

  • Pattern match to target BLAS, cuDNN, etc.

Back-ends

  • C, CUDA, WASM, WebGPU
  • parallelism, distributed execution

TiarkRompf pinned this issue Nov 26, 2023
TiarkRompf unpinned this issue Dec 27, 2024