
Sum Factorization

Igor Baratta edited this page Dec 16, 2021 · 5 revisions

A single quadrature loop that covers all computations in a kernel is not suitable for sum-factorization. The first step is therefore to rearrange the computations into separate micro-kernels, and then apply the sum-factorization technique to the relevant ones.

We have identified four micro-kernels (MK) that could be computed in "sequence":

MK 1:

Evaluate coefficient functions at quadrature points.

  • Input: Nc pairs of coefficients + tabulated basis functions
    • W_0[N0], W_1[N1], .., W_{Nc-1}[N_{Nc-1}]
    • Phi_0[Nq][N0], Phi_1[Nq][N1], .., Phi_{Nc-1}[Nq][N_{Nc-1}]
  • Output: M arrays of coefficient values at quadrature points
    • w_0[Nq], w_1[Nq], .., w_{M-1}[Nq]

Note that M != Nc is allowed. For a simple matrix-free Poisson kernel in 3D, M = 3 and Nc = 1.
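As a minimal sketch of MK 1 for a single coefficient, the evaluation is a small matrix-vector product, w[q] = sum_i Phi[q][i] * W[i]. The function name `evaluate_coefficient`, the P1 interval basis and the two sample points are illustrative assumptions, not part of the generated kernels.

```python
def evaluate_coefficient(W, Phi):
    """Return w[q] = sum_i Phi[q][i] * W[i] for every quadrature point q."""
    return [sum(phi_qi * w_i for phi_qi, w_i in zip(row, W)) for row in Phi]


# Assumed example data: P1 basis on the reference interval [0, 1],
# phi_0 = 1 - x and phi_1 = x, tabulated at x = 0.25 and x = 0.75.
points = [0.25, 0.75]
Phi_0 = [[1.0 - x, x] for x in points]  # shape [Nq][N0]
W_0 = [2.0, 4.0]                        # shape [N0]

w_0 = evaluate_coefficient(W_0, Phi_0)  # shape [Nq]
print(w_0)
```

When M != Nc, the same routine would be called once per output array, for example once per tabulated derivative direction of the same coefficient.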

MK 2:

Compute and store Jacobian for each quadrature point

  • Input: Cell coordinates, 1st order derivative coordinate element basis.
  • Output: J[Nq][gdim][tdim]

Storage: at most Nq * 9 values (gdim = tdim = 3).

Currently the Jacobians for non-affine coordinate maps are computed on the fly, one per quadrature point.
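A sketch of MK 2 for a non-affine cell, assuming a bilinear (Q1) coordinate element on the reference square with vertices ordered (0,0), (1,0), (0,1), (1,1). The helper names and the sample cell are hypothetical. Each Jacobian entry is J[q][i][j] = sum_k x[k][i] * d(phi_k)/d(X_j) evaluated at point q.

```python
def q1_basis_derivatives(X, Y):
    """Derivatives of the four bilinear basis functions on [0, 1]^2.

    Returns [d/dX values, d/dY values], one entry per basis function.
    """
    return [
        [-(1.0 - Y), 1.0 - Y, -Y, Y],
        [-(1.0 - X), -X, 1.0 - X, X],
    ]


def compute_jacobians(coords, points):
    """Return J[q][i][j] = sum_k coords[k][i] * dphi[j][k] at each point q."""
    gdim = len(coords[0])
    J = []
    for X, Y in points:
        dphi = q1_basis_derivatives(X, Y)
        J.append([[sum(coords[k][i] * dphi[j][k] for k in range(len(coords)))
                   for j in range(len(dphi))]
                  for i in range(gdim)])
    return J


# Assumed cell: unit square with the top-right vertex moved to (2, 2),
# so the map is non-affine and J differs between quadrature points.
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
points = [(0.25, 0.25), (0.75, 0.75)]
J = compute_jacobians(coords, points)  # shape [Nq][gdim][tdim]
print(J)
```

For this cell the map is x = (X + XY, Y + XY), so the Jacobian is [[1 + Y, X], [Y, 1 + X]], which is what the sketch reproduces point by point.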

MK 3:

Scale and transform quadrature data using Jacobian.

  • Input: Jacobian J and coefficient data at quadrature points w_i (0 <= i <= M-1).
    • J[Nq][gdim][tdim], w_0[Nq], w_1[Nq], .., w_{M-1}[Nq]
  • Output: Q arrays with scaled and transformed data at quadrature points, where Q is the number of basis functions plus the number of basis derivatives that appear in the weak form.
    • fw_0[Nq], fw_1[Nq], .., fw_{Q-1}[Nq]

Examples in 3D:

  • Poisson: fw_0[Nq], fw_1[Nq], fw_2[Nq] for Dphi_0, Dphi_1, Dphi_2 respectively.
  • Mass action: fw_0[Nq] for Phi_0.
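A sketch of MK 3 for the Poisson case, reduced to 2D to keep the inverse explicit. It assumes that w_0 and w_1 hold the reference-gradient components of the coefficient and that the quadrature weight is folded into the scaling, so that fw = weight * |det J| * (K K^T) w with K = J^{-1}. The function name and the sample data are hypothetical.

```python
def transform_poisson_2d(J, w0, w1, weights):
    """Return (fw0, fw1) with fw = weight * |det J| * (K K^T) w, K = J^{-1}."""
    fw0, fw1 = [], []
    for Jq, g0, g1, wt in zip(J, w0, w1, weights):
        (a, b), (c, d) = Jq
        det = a * d - b * c
        # Entries of G = K K^T, using K = [[d, -b], [-c, a]] / det.
        G00 = (d * d + b * b) / (det * det)
        G01 = -(c * d + a * b) / (det * det)
        G11 = (c * c + a * a) / (det * det)
        scale = wt * abs(det)
        fw0.append(scale * (G00 * g0 + G01 * g1))
        fw1.append(scale * (G01 * g0 + G11 * g1))
    return fw0, fw1


# Assumed data for two quadrature points: a uniform scaling and a shear.
J = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [0.0, 1.0]]]
w0 = [1.0, 1.0]
w1 = [2.0, 2.0]
weights = [0.5, 1.0]

fw0, fw1 = transform_poisson_2d(J, w0, w1, weights)
print(fw0, fw1)
```

For mass action the transform reduces to the scaling alone, fw_0[q] = weight[q] * |det J[q]| * w_0[q].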

MK 4:

Compute contributions to local tensor

  • Input: Q pairs of coefficients + basis functions.
    • fw_0[Nq], fw_1[Nq], .., fw_{Q-1}[Nq], Phi_0, Phi_1, .., Phi_{Q-1}
  • Output: Local tensor
    • A[Nd]
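A sketch of MK 4 for a local vector (an action), where each pair contributes A[i] += sum_q fw_p[q] * Phi_p[q][i]. The function name and the single-pair mass-action data are assumptions chosen for illustration.

```python
def assemble_local_vector(fw, Phi):
    """Return A[i] = sum over pairs p and points q of fw[p][q] * Phi[p][q][i]."""
    num_dofs = len(Phi[0][0])
    A = [0.0] * num_dofs
    for fw_p, Phi_p in zip(fw, Phi):
        for fw_pq, row in zip(fw_p, Phi_p):
            for i, phi in enumerate(row):
                A[i] += fw_pq * phi
    return A


# Assumed data: one pair (Q = 1), P1 basis on [0, 1] tabulated at
# x = 0.25 and x = 0.75, with already scaled quadrature data fw_0.
Phi_0 = [[0.75, 0.25], [0.25, 0.75]]  # shape [Nq][Nd]
fw_0 = [1.0, 2.0]                     # shape [Nq]

A = assemble_local_vector([fw_0], [Phi_0])  # shape [Nd]
print(A)
```

For Poisson, the list of pairs would instead hold one entry per derivative direction, pairing each fw_d with the tabulated Dphi_d.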

Dependency graph (as implied by the inputs and outputs listed above): MK 1 and MK 2 feed MK 3, which feeds MK 4.

Data from MK2 can be pre-computed and stored, since it only depends on geometry and coordinate element type.

Once the computations are rearranged into micro-kernels, sum-factorization can be applied to kernels 1, 2 and 4 separately.
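To illustrate what sum-factorization buys in a kernel such as MK 1, the sketch below evaluates a coefficient on a 2D tensor-product element in two ways. It assumes the basis factorizes as phi_ij(x, y) = phi_i(x) * phi_j(y) and that the quadrature rule is a tensor product of the same 1D points. All names and data are illustrative. The naive version costs O(nq^2 * n^2) operations, while the factorized version contracts one index at a time and costs O(nq * n^2 + nq^2 * n).

```python
def evaluate_naive(W, phi):
    """w[q0][q1] = sum_{i,j} phi[q0][i] * phi[q1][j] * W[i][j]."""
    nq, n = len(phi), len(phi[0])
    return [[sum(phi[q0][i] * phi[q1][j] * W[i][j]
                 for i in range(n) for j in range(n))
             for q1 in range(nq)]
            for q0 in range(nq)]


def evaluate_sum_factorized(W, phi):
    """Same result as evaluate_naive, computed as two 1D contractions."""
    nq, n = len(phi), len(phi[0])
    # Pass 1: contract the first index, tmp[q0][j] = sum_i phi[q0][i] * W[i][j].
    tmp = [[sum(phi[q0][i] * W[i][j] for i in range(n)) for j in range(n)]
           for q0 in range(nq)]
    # Pass 2: contract the second index, w[q0][q1] = sum_j phi[q1][j] * tmp[q0][j].
    return [[sum(phi[q1][j] * tmp[q0][j] for j in range(n)) for q1 in range(nq)]
            for q0 in range(nq)]


# Assumed data: 1D P1 basis tabulated at x = 0.25 and x = 0.75, and
# coefficients W[i][j] representing u(x, y) = 1 + 2x + y on the unit square.
phi = [[0.75, 0.25], [0.25, 0.75]]
W = [[1.0, 2.0], [3.0, 4.0]]

w_naive = evaluate_naive(W, phi)
w_fact = evaluate_sum_factorized(W, phi)
print(w_naive, w_fact)
```

The same two-pass structure applies in reverse to MK 4, where quadrature data is contracted against the 1D tabulations one direction at a time.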
