I'm trying to implement a multi-headed self-attention layer in an absolutely minimal amount of code. I want to do this with TensorCast.jl, both to learn the syntax better and perhaps as a nice demo of ML in Julia.
Right now I am trying to compute the query matrix in one go. This works:
using Flux, TensorCast

batch = 4
length = 100
m = 32

# Data:
x = randn(Float32, length, m, batch)

heads = 10

# Layer to compute Q for all heads:
Q = Dense(m, m*heads)

# Computation:
@cast q1[ℓ,ch,n] := Q(x[ℓ,:,n])[ch]
@cast q2[ℓ,c,h,n] := q1[ℓ,(c,h),n]  (h in 1:heads)
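For reference, a quick check of the shapes these two steps produce (they follow from length = 100, m = 32, heads = 10, batch = 4 above, assuming Flux's Dense):

size(q1)  # (100, 320, 4): ch runs over all m*heads = 320 outputs of Q
size(q2)  # (100, 32, 10, 4): ch split into c (fast, 1:m) and h (slow, 1:heads)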
However, if I try to combine them in one go:

@cast q[ℓ,c,h,n] := Q(x[ℓ,:,n])[(c,h)]  (h in 1:heads)

it gives me the error:

ERROR: LoadError: can't tensor product inner indices

Is this sort of thing possible? Or maybe it's too tricky to stack notation like this?

Thanks,
Miles

I think it's not meant to work, but it should give a friendlier error. A simpler example which hits the same error:

julia> using TensorCast

julia> slices = eachcol(rand(12, 12));

julia> size(@cast _[a,b,c] := slices[(a,b)][c]  a in 1:3)
(3, 4, 12)

julia> size(@cast _[a,b,c] := slices[a][(b,c)]  b in 1:3)
ERROR: LoadError: can't tensor product inner indices
    @cast _[a, b, c] := (slices[a])[(b, c)]  b in 1:3
    @ Main REPL[62]:1

The reason (if I remember right) is that reshape comes before stack in its list of possible operations. I don't think there's a strong reason it couldn't allow this, i.e. allow two different reshapes, except for a bit of added complexity.
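In the meantime, a workaround is to do what the two-step version above does: stack first, then split the combined index in a separate @cast call. A minimal sketch of that pattern (the names tmp and out are just illustrative):

using TensorCast

slices = eachcol(rand(12, 12));

# Step 1: stack the slices into a matrix; no inner reshape needed here.
@cast tmp[a, bc] := slices[a][bc]

# Step 2: split the combined index, giving one factor a known range.
@cast out[a, b, c] := tmp[a, (b, c)]  (b in 1:3)

size(out)  # (12, 3, 4)

This keeps each @cast call to a single stack or reshape, which is what the macro currently handles.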