Loop Domain¶
Note
(Vaporware Warning). This section mostly contain pseudo-codes that are not implemented yet. The plans may change at anytime.
The loop domain of a lazy array is an integer set of all work items.
In lappy
, it is represented as a set expression.
Expression Tree¶
The expression tree of an array domain_expr
starts from leaf nodes
of type SetVar
, SetParam
or SetZero
.
It has two levels:
(Affine Expressions). The nodes close to the leaf level that are of type
PwAff
. They consist the expression trees of the (irreducible) low-dimensional polygons of typePwAffComparison
. To avoid branches, the polygons’ subtrees should have disjoint variables (inames), but may share parameters (loop bounds).(Integer Set Expression). The remaining nodes of type
SetIntersection
andSetUnion
represent the expression from the basic polygons to the loop domain.
The integer set expression exposes structures within the loop domain that could benefit array manipulations, e.g.,
The
PwAff
objects are labeled consistently. ISL treat inames as purely positional placeholders, which may not be as handy. For example, an FEM code may want to tag an iname as'i_q_point'
. Position of the iname may change as the array is manipulated (e.g. transpose).The affine expressions are nonlinearity-tolerant w.r.t paramters. ISL only allow affine operations among paramters. When assembling the array expression, some loop bounds do not become affine until more shaped information are fixed (e.g. reshape).
Temporaries and CSEs¶
Here is a sample integer set expression, where
OR
represents set union, AND
represents set intersection.
Consider the following pseudo-code:
n = Unsigned('n')
k = arange(n)
c = outer(sin(k), cos(k))
The expression for c
is flattened by default, yielding a single instruction like
(names are replaced for readability).
for i, j
c[i, j] = sin(i) * cos(j)
end
If the user choose to make k
a temporary,
n = Unsigned('n')
k = arange(n)
k.pin()
c = outer(sin(k), cos(k))
The instructions for c
shall be like
for i, j
<> k[i] = i {id=k}
c[i, j] = sin(k[i]) * cos(k[j]) {dep=k}
end
The loop domain is unchanged,
but depending on the loop nesting order, k
may be computed multiple times.
AND
/-------/ \-------\
| |
{ 0 <= i < n } { 0 <= j < n }
If the user choose to make k
a CSE,
n = Unsigned('n')
k = arange(n)
k.pin_cse()
c = outer(sin(k), cos(k))
The inames and temporaries of k
are then duplicated.
It should produce the same results as evaluating k
separately
by calling k.eval()
and pass in its value as an argument.
for i_dup
<> k[i_dup] = i_dup {id=k}
end
for i, j
c[i, j] = sin(k[i]) * cos(k[j]) {dep=k}
end
And the loop domain acquires a new branch,
OR
/------------------/ \-------------\
| |
AND { 0 <= i_dup < n }
/-------/ \-------\
| |
{ 0 <= i < n } { 0 <= j < n }
Diamond Problem¶
The following code produces a diamond structure in the computation graph:
A = ndarray('A', shape='n, n')
x = ndarray('x', shape='n')
k = A @ x
sk = sin(k)
ck = cos(k)
C = outer(sk, ck)
The data flow looks like (top to bottom):
A @ x
|
k
/\
/ \
sk ck
\ /
\/
c
By default, the lazy evaluation expands all expressions, yielding redundant computations.
c[i, j] = sin(sum([k], A[i, k] * x[k])) * cos(sum([k], A[j, k] * x[k]))
Note that the matvec
is computed twice above.
Besides Loopy’s ability to recognize optimization opportunities, user can
also make k
a temporary or a CSE.
A = ndarray('A', shape='n, n')
x = ndarray('x', shape='n')
k = A @ x
k.pin()
sk = sin(k)
ck = cos(k)
C = outer(sk, ck)
Making k
a temporary yields:
<> k[i] = sum([k], A[i, k] * x[k]) {id=k}
c[i, j] = sin(k[i]) * cos(k[j]) {dep=k}
Similarly,
A = ndarray('A', shape='n, n')
x = ndarray('x', shape='n')
k = A @ x
k.pin_cse()
sk = sin(k)
ck = cos(k)
C = outer(sk, ck)
Making k
a CSE yields:
for i_dup
<> k[i_dup] = sum([k], A[i_dup, k] * x[k]) {id=k}
end
for i, j
c[i, j] = sin(k[i]) * cos(k[j]) {dep=k}
end