
Comments on Matrix Autodiff Type #32

Open
betanalpha opened this issue Jan 5, 2021 · 16 comments
@betanalpha

betanalpha commented Jan 5, 2021

A few comments for consideration in the matrix autodiff design document.

  1. As much as the var of matrices enables better performance, it also greatly facilitates the implementation of matrix-valued reverse mode updates, as demonstrated in the matrix product example. The resulting code is not only easier to develop but also easier to read -- all in addition to likely being much faster. Perhaps worth mentioning in the summary.

  2. Given the amount of compiler logic proposed, it would be great to have a compiler option that returns an annotated Stan program indicating which variables were assigned matrix<var> and which were assigned var<matrix>. This would help users track down undesired casts down to matrix<var> and optimize their code without having to guess at the underlying compiler logic.

  3. It would also be great to have some way to force a conversion from a matrix<var> to a var<matrix>, even if only with a function meant for advanced users. I'm thinking in particular of Gaussian process applications where custom Gram matrices are built element by element in a user-defined function, but afterwards are used only as monolithic matrices. My understanding of the proposed logic is that this would propagate a matrix<var> throughout the Stan program and miss out on potential performance unless there is some way to force a user-defined function to return a var<matrix> by casting the return value, for example

functions {
  matrix custom_covar(...);
}
...
model {
  matrix[N, N] custom_gram = to_varmat(custom_covar(...));
}
@rok-cesnovar
Member

rok-cesnovar commented Jan 5, 2021

Given the amount of compiler logic proposed, it would be great to have a compiler option that returns an annotated Stan program indicating which variables were assigned matrix<var> and which were assigned var<matrix>. This would help users track down undesired casts down to matrix<var> and optimize their code without having to guess at the underlying compiler logic.

A user could do this by printing the transformed MIR from stanc3. bin/stanc --debug-transformed-mir model.stan or --debug-transformed-mir-pretty

var<matrix> parameters will have a _varmat__ suffix, which will annotate internally that the parameter's C++ code should be generated differently. Is that enough, or should it be more "user-friendly"?

My understanding of the proposed logic is that this would propagate a matrix<var> throughout the Stan program and miss out on potential performance unless there is some way to force a user-defined function to return a var<matrix> by casting the return value, for example

Your understanding is correct; it would propagate as matrix<var>. Such opportunities are missed because we did not want to get into the compiler analyzing potential opportunities where doing a to_varmat might actually be worth it, because it's much harder to validate that there will be no performance regressions for current models. I like the to_varmat call honestly, though it should come with a warning that the C++ might not compile.

@SteveBronder
Collaborator

I like the to_varmat call honestly, though it should come with a warning that the C++ might not compile.

Yeah I'd be fine with having that exposed

@betanalpha
Author

var<matrix> parameters will have a _varmat__ suffix, which will annotate internally that the parameter's C++ code should be generated differently. Is that enough, or should it be more "user-friendly"?

I think that would be friendly enough for the more advanced users who would be digging into these optimizations, in which case it's more of a documentation concern. A good opportunity for a nice section in the manual, users' guide, or even Discourse about effective use of Stan's linear algebra library.

Such opportunities are missed because we did not want to get into the compiler analyzing potential opportunities where doing a to_varmat might actually be worth it, because it's much harder to validate that there will be no performance regressions for current models.

I totally understand why this would be hard for the compiler logic (although it's a great example of why Stan3 sets the stage for all kinds of sophisticated optimizations going forward). My only concern is that there are likely to be enough exceptions, where advanced users would know safe places to cast, that the functionality would be beneficial.

I like the to_varmat call honestly, though it should come with a warning that the C++ might not compile.

The more warnings the better!

@bob-carpenter
Collaborator

I do not think we should add a to_varmat() function to the language, primarily because it's using the arbitrary name given to a class in Stan math rather than something in the user's language.

I would much prefer to add a notion of const variable to Stan. So that'd be something like:

const matrix[M, N] Sigma = my_funky_gm_covariance(...);

Then it would be illegal to subsequently reassign the elements Sigma[m, n]. This is more restrictive than we need---there's no problem with reassigning all of Sigma; it's just the elements that are problematic. But I think that'll be OK. It'd also support meaningful error messages for users trying to reassign.

I'd even be OK with const_matrix as a unified type if we don't want to do this for other types, but we probably want to allow anything to be declared const if we're going to allow const.

From the compiler perspective, any matrix that is never assigned to again can be assigned a const type. This is just like const in C++. I don't see much room for useful user choice here. I guess for a matrix that doesn't ever get used for arithmetic or linear algebra, it would be wasteful to eagerly reassign to varmat.

@betanalpha
Author

betanalpha commented Jan 5, 2021 via email

@SteveBronder
Collaborator

I'd be fine with adding a new type to Stan for these, but tbh the biggest thing is what the gosh dang to name these things! It's really just moving from an array of structs to a struct of arrays. const is okay, and the sparse matrix prototype in the Stan compiler has the doo-dads to figure out when a const matrix is being assigned to illegally. Though I kind of worry that with a const keyword, people will start writing their programs in weird ways to trick the compiler into giving them the new matrix type, which is why I'd rather keep it 99% a thing the compiler just does and then have some weird esoteric function/keyword for advanced users.

I kind of like Bob's suggestion of encase, though it would be nice to avoid a new keyword if possible. It might be easier on the compiler side to have some function that, when the compiler sees it, forces the use of the new matrix type.

@betanalpha
Author

betanalpha commented Jan 5, 2021 via email

@SteveBronder
Collaborator

Let me flip the table, what if we gave it a bad name on purpose so it looks smelly? I kind of like exposing to_var_matrix() then we have in the docs something like

soa_matrix to_var_matrix(aos_matrix)
soa_matrix to_var_matrix(soa_matrix)

WARNING: This function is intended only for advanced users.

Takes in the standard Stan matrix type as an array of structs (aos) and forces the compiler to convert it to the struct of arrays (soa) format (a.k.a. from an Eigen::Matrix<var, R, C> to a var_value<Eigen::Matrix<double, R, C>>). This function can cause the Stan compiler to produce code that will not compile. This function has O(N^2) cost.

I like that because it sounds kind of scary/technical, which should hopefully dissuade anyone who doesn't know what they are doing from using it. Though the one bad thing about not having a keyword is that this can't be used to make parameters default to var<matrix>.

@betanalpha
Author

betanalpha commented Jan 6, 2021 via email

@bob-carpenter
Collaborator

I kind of like exposing to_var_matrix()

This is in developer language, not user language. We could change the name, but I think having something that looks more like a compiler hint as @betanalpha is suggesting without adding a new type is the way to go.

I kind of like bobs suggestion of encase

I don't remember making that suggestion. I prefer "immutable" as the term for the new matrix structures. That's because we can allow holistic assignment, just not element-level assignment. I imagine something like this.

immutable matrix[M, M] a;
vector[M] v;
a = diag_matrix(v);  // LEGAL: assign a matrix to an immutable matrix
a[i, j] = y; // ILLEGAL: assign an element of an immutable matrix

This is a constraint on how a variable is used. It does not introduce a new type of variable.

What's the state of functions accepting matrices? Have they all been updated to allow var_mat? If so, then declaring immutable guarantees that var_mat is an acceptable target. Otherwise, it's just a compiler hint.

@rok-cesnovar
Member

That's because we can allow holistic assignment, just not element-level assignment.

Just to clarify, we do allow element-level assignment to varmat as well. It's just a lot slower (2-4 times was the estimate from a benchmark @bbbales2 made).

It's nontrivial for the compiler to "guess" when varmat would actually still be faster for a given expression even if it contains some element-level accesses, which is why we resorted to a conservative approach of falling back to matvar in these cases, because that guarantees the model will at worst have the same performance as it had in the last release.

For example

matrix[N, N] a;
matrix[N, N] b;

matrix[N, N] c = a*b;

a[10,10] = a[10,10] + 2.0;

would definitely be faster with varmat, but as a has element-level access, the compiler will make it a matvar.

We have not figured out any simple and clear set of rules that would guarantee no performance regression and would also be easy to test. I am personally scared of any sort of heuristic rules in the compiler, but I am sure there are some optimization strategies that someone with more knowledge of compiler design can figure out in the future.

Have they all been updated to allow var_mat?

Not all of them, but a lot of them have. The compiler implementation (on a branch, nothing is on master yet) uses a list of varmat-supported functions to figure out when varmat can be used.

@rok-cesnovar
Member

rok-cesnovar commented Jan 6, 2021

Another idea that does come to mind is that we would have a --varmat flag (name not final :)) for the compiler that would use var<matrix> anywhere legally possible and only convert to matrix<var> when there is no other option (like for a function that does not support var<matrix>).

This would work and be faster under the assumption that the user writes the model using vectorized functions wherever possible.

My hunch is that once all or almost all functions support var<matrix>, this will always be faster than any conservative approach, apart from models written inefficiently. There is just no way of proving that (EDIT: by that I mean I don't have a way of proving that), which is why I think this mode cannot be the default one.

@bob-carpenter
Collaborator

Just to clarify, we do allow element-level assignment to varmat as well. It's just a lot slower (2-4 times was the estimate from a benchmark @bbbales2 made).

Thanks for clarifying. But why allow that? It seems like it'd be a huge hassle on the math lib side and be really inefficient in the case of either a lot of assignments or big matrices, depending on how it's implemented.

Is there a link to the benchmark somewhere? Where do I find the implementation of var<matrix> that allows assignment?

I somehow missed most of this discussion as it was going down.

element-level access

Access is fine for getting values, just not for assigning.

once all or almost all functions support var<matrix>, this will always be faster than any conservative approach

How do we build up a covariance matrix for GPs if the only type we have is var<matrix>? I would think we'd need something like comprehensions.

apart from models written inefficiently.

The way to make that precise would be to claim that the most efficient way to write any model is using var<mat>. That's why I'm asking about GPs.

@rok-cesnovar
Member

But why allow that? It seems like it'd be a huge hassle on the math lib side and be really inefficient in the case of either a lot of assignments or big matrices, depending on how it's implemented.

By "we allow it", I mean it can be done in Stan Math. The current conservatively-envisioned compiler implementation would not allow it in Stan. For benchmarks see #29 (comment) and two posts below that. For row and column-wise see #30 (comment) (those are faster for varmat).

For where to find all the parts of that implementation I am going to defer to @t4c1 @SteveBronder and @bbbales2. I would just point you to the wrong place :)

@bbbales2
Member

bbbales2 commented Jan 6, 2021

Here's the benchmark of reading or writing scalars: link. There's another one that reads + writes below it.

As an example of why accessing subsets is handy, here are benchmarks of similar operations on rows/columns: link

This is the implementation (could be other versions of this elsewhere, but this is one place where this is done).

The routine is: on assignment to a varmat, add a vari to the stack that zeroes the adjoints and undoes the assignment in the reverse pass. I think this changes the behavior of doing two reverse passes without a zero, but I don't know of a reason to do that.

How do we build up a covariance matrix for GPs if the only type we have is var?

Slowly

I would think we'd need something like comprehensions.

The latest on that conversation is the question here and the response here.

Edit: The short answer on comprehensions is that varmat ended up being useful without them, and I'm not actually sure how to do them in a way that would be much more efficient than the way we build GPs in Stan code now -- there'd still be tons of varis everywhere.

@bbbales2
Member

Relevant to this discussion, we're postponing the varmat release until 2.27 (check here)
