Sum Factorization
A single quadrature loop that performs all computations in a kernel is not suitable for sum factorization. The first step is therefore to rearrange the computations into separate micro-kernels, and then apply the sum-factorization technique to the relevant ones.
We have identified four micro-kernels (MK) that can be computed in sequence:
1. Evaluate coefficient functions at quadrature points.
   - Input: Nc pairs of coefficients + tabulated basis functions
     - W_0[N0], W_1[N1], ..., W_{Nc-1}[N_{Nc-1}]
     - Phi_0[Nq][N0], Phi_1[Nq][N1], ..., Phi_{Nc-1}[Nq][N_{Nc-1}]
   - Output: M arrays of coefficient values at quadrature points
     - w_0[Nq], w_1[Nq], ..., w_{M-1}[Nq]

   Note that M != Nc is allowed. For a simple matrix-free Poisson kernel in 3D, M = 3 and Nc = 1, since the gradient of the single coefficient has three components.
2. Compute and store the Jacobian for each quadrature point.
   - Input: cell coordinates, first-order derivatives of the coordinate element basis.
   - Output: J[Nq][gdim][tdim]
   - Storage: at most Nq * 9 values per cell (gdim = tdim = 3).

   Currently, the Jacobians for non-affine coordinate maps are computed on the fly, one per quadrature point.
3. Scale and transform quadrature data using the Jacobian.
   - Input: Jacobian J and coefficient data at quadrature points w_i (0 <= i <= M-1).
     - J[Nq][gdim][tdim], w_0[Nq], w_1[Nq], ..., w_{M-1}[Nq]
   - Output: Q arrays with scaled and transformed data at quadrature points, where Q is the number of basis-function terms plus the number of basis-derivative terms in the weak form.
     - fw_0[Nq], fw_1[Nq], ..., fw_{Q-1}[Nq]

   For example:
   - Poisson in 3D: fw_0[Nq], fw_1[Nq], fw_2[Nq] for Dphi_0, Dphi_1, Dphi_2 respectively.
   - Mass action in 3D: fw_0[Nq] for Phi_0.
4. Compute contributions to the local tensor.
   - Input: Q pairs of scaled quadrature data + tabulated basis functions.
     - fw_0[Nq], fw_1[Nq], ..., fw_{Q-1}[Nq], Phi_0, Phi_1, ..., Phi_{Q-1}
   - Output: local tensor
     - A[Nd]
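
The split into micro-kernels can be illustrated with a deliberately small case. The sketch below is not the generated kernel code. It assumes a mass action on a single 1D interval cell with a linear (P1) element and 2-point Gauss quadrature, so the Jacobian is a scalar and Q = 1. All function names (`mk1_evaluate_coefficient`, `mk2_jacobian`, `mk3_scale`, `mk4_local_tensor`, `mass_action`) are hypothetical and chosen only to mirror the four steps above.

```python
import math

# Hypothetical setup: one interval cell, P1 element, 2-point Gauss
# quadrature on the reference interval [0, 1].
g = 0.5 / math.sqrt(3.0)
points = [0.5 - g, 0.5 + g]
weights = [0.5, 0.5]
Phi = [[1.0 - x, x] for x in points]   # Phi[Nq][N0]: tabulated basis values
dPhi = [[-1.0, 1.0] for _ in points]   # derivatives of the coordinate basis


def mk1_evaluate_coefficient(W, Phi):
    """MK1: w[q] = sum_j Phi[q][j] * W[j]."""
    return [sum(p * c for p, c in zip(Phi_q, W)) for Phi_q in Phi]


def mk2_jacobian(coords, dPhi):
    """MK2: J[q] = sum_j dPhi[q][j] * coords[j] (scalar, gdim = tdim = 1)."""
    return [sum(d * x for d, x in zip(dPhi_q, coords)) for dPhi_q in dPhi]


def mk3_scale(J, w, weights):
    """MK3: fw[q] = weights[q] * |J[q]| * w[q]."""
    return [wt * abs(j) * wq for wt, j, wq in zip(weights, J, w)]


def mk4_local_tensor(fw, Phi):
    """MK4: A[i] = sum_q fw[q] * Phi[q][i]."""
    Nd = len(Phi[0])
    return [sum(fw[q] * Phi[q][i] for q in range(len(fw))) for i in range(Nd)]


def mass_action(coords, W):
    """Run the four micro-kernels in sequence for one cell."""
    w = mk1_evaluate_coefficient(W, Phi)
    J = mk2_jacobian(coords, dPhi)
    fw = mk3_scale(J, w, weights)
    return mk4_local_tensor(fw, Phi)


# Cell [0, 2] with coefficient values 0 and 1 at the two vertices.
A = mass_action(coords=[0.0, 2.0], W=[0.0, 1.0])
print(A)  # approximately [1/3, 2/3]
```

Each micro-kernel only consumes arrays indexed by quadrature point, which is what allows them to be optimized independently.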
Data from MK2 can be pre-computed and stored, since it only depends on geometry and coordinate element type.
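
To get a feel for what storing the MK2 output costs, the following back-of-the-envelope sketch multiplies out the J[Nq][gdim][tdim] storage over a mesh. The helper name `jacobian_storage_bytes`, the choice of 27 quadrature points per cell, the mesh size of one million cells, and double-precision storage are all assumptions made for illustration.

```python
def jacobian_storage_bytes(num_cells, Nq, gdim=3, tdim=3, bytes_per_value=8):
    """Bytes needed to store J[Nq][gdim][tdim] for every cell."""
    return num_cells * Nq * gdim * tdim * bytes_per_value


# Hypothetical numbers: 27 quadrature points per cell, doubles, 3D.
per_cell = jacobian_storage_bytes(1, Nq=27)
total = jacobian_storage_bytes(1_000_000, Nq=27)
print(per_cell)     # 1944 bytes per cell
print(total / 1e9)  # 1.944 (GB for one million cells)
```

Whether pre-computing pays off therefore depends on the available memory and on how often the same geometry is reused.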
Once the computations are rearranged into micro-kernels, sum factorization can be applied to kernels 1, 2 and 4 separately.
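
As an illustration of what sum factorization does to MK1, the sketch below evaluates a coefficient at quadrature points in 2D. It assumes that both the basis and the quadrature rule have tensor-product structure, so that the 2D table factors as Phi[(q1, q2)][(i1, i2)] = phi[q1][i1] * phi[q2][i2]. The function names and the sample data are hypothetical. The naive version costs on the order of nq^2 * n^2 operations, while the factorized version contracts one direction at a time and costs on the order of nq * n^2 + nq^2 * n.

```python
import math


def eval_naive(W, phi):
    """w[q1][q2] = sum_{i1,i2} phi[q1][i1] * phi[q2][i2] * W[i1][i2]."""
    nq, n = len(phi), len(phi[0])
    return [[sum(phi[q1][i1] * phi[q2][i2] * W[i1][i2]
                 for i1 in range(n) for i2 in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


def eval_sum_factorized(W, phi):
    """Same result, computed as two successive 1D contractions."""
    nq, n = len(phi), len(phi[0])
    # Stage 1: contract direction 1,
    # T[q1][i2] = sum_{i1} phi[q1][i1] * W[i1][i2].
    T = [[sum(phi[q1][i1] * W[i1][i2] for i1 in range(n))
          for i2 in range(n)]
         for q1 in range(nq)]
    # Stage 2: contract direction 2,
    # w[q1][q2] = sum_{i2} phi[q2][i2] * T[q1][i2].
    return [[sum(phi[q2][i2] * T[q1][i2] for i2 in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


# Hypothetical data: 1D linear basis tabulated at three points,
# and a 2 x 2 array of coefficients indexed by (i1, i2).
phi = [[1.0 - x, x] for x in (0.1, 0.5, 0.9)]
W = [[1.0, 2.0], [3.0, 4.0]]

w_naive = eval_naive(W, phi)
w_fact = eval_sum_factorized(W, phi)
print(all(math.isclose(a, b)
          for row_a, row_b in zip(w_naive, w_fact)
          for a, b in zip(row_a, row_b)))
```

The same pattern extends to 3D with three successive contractions, and MK4 uses the transposed contractions to go from quadrature data back to degrees of freedom.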