NJU Static Program Analysis 06: Data Flow Analysis IV

NJU Static Program Analysis 06: Data Flow Analysis IV

Abstract

  • Understand the functional view of iterative algorithm
  • The definitions of lattice and complete lattice
  • Understand the fixed-point theorem
  • How to summarize may and must analyses in lattices
  • The relation between MOP and the solution produced by the iterative algorithm
  • Constant propagation analysis
  • Worklist algorithm

Notes

Following the previous lecture, we continue to answer the question:

  • Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution?

  • If so, is there only one solution or only one fixed point? If more than one, is our solution the best one(most precise)?

For the first question, we have already described that it is related to the monotonicity of function \(F\). While for the second question, after the introduction of fixed point theorem, we arrive at the conclusion that the iterative algorithm will finally arrive at a best fixed point on the lattice, if \(F\) is monotonic.

Thus we continue to prove the monotonicity of \(F\), which consists of:

  • transfer function \(f_i: L \to L\) for every node, which is monotonic
  • join/meet function \(\sqcap, \sqcup: L \to L\) for control-flow confluence

To prove this, we only need to prove the monotonicity of \(\sqcup\) and \(\sqcap\). Here given is the proof of \(\sqcup\), that of \(\sqcap\) is obviously similar.


\(P.f.\)

\(\forall~x, y, z \in L, x \sqsubseteq y,\) by the definition of \(\sqcup\) we have:

\[y \sqsubseteq y \sqcup z \]

and by the transitivity of \(\sqsubseteq\) we have:

\[x \sqsubseteq y \sqcup z \]

thus \(y \sqcup z\) is an upper bound for \(x\) and \(z\),

as \(x \sqcup z\) is the least upper bound of \(x\) and \(z\).

thus \(x \sqcup z \sqsubseteq y \sqcup z\), which means \(\sqcup\) is monotonic.


Additionally, there is one more general question:

  • When will the algorithm reach the fixed point, or when can we get the solution?

    To answer this question, we need to introduce the height of a lattice \(\bf{h}\), the length of the longest path form \(\top\) to \(\bot\) in the lattice.

    In each iteration, assume the worst case that in a single node, only one step in the lattice is made. Note the height of the lattice is \(h\) and the number of nodes in the CFG is \(k\), naturally we only need \(h \cdot k\) iterations at most before the algorithm reach the fixed point.

So far, we can illustrate a conclusion for all the previous points:

In this context, another question emerges that how can we measure the precision of our algorithm? To answer this, we again need to introduce a new concepts:

  • Meet-Over-All-Paths(MOP) Solution

    Provided a path from \(entry\) to a specific node \(S_i\), considering a path on the graph from \(entry\) to \(S_i\) as \(P\), we can define a transfer function \(F_P = f_{S_1} \circ f_{S_2} \circ \dots \circ f_{S_i}\). Then we have:

    \[{\rm MOP}[S_i] = \bigsqcup_{P:~entry \to S_i} F_P({\rm OUT}[entry]),\text{ (or use } \sqcap \text{)} \]

    The paths may be not executable, which means MOP is not fully precise; the path can even be unbounded or not enumerable, which means MOP is somehow impractical.

    In this way we can find that the solution of iterative algorithm will be in the form of \(F(x \sqcup y)\), and that of MOP solution will be \(F(x) \sqcup F(y)\), which indicates that the iterative algorithm is no more precise than the MOP solution.


\(P.f.\)

By the definition of lub \(\sqcup\), we have:

\[x,y \sqsubseteq x \sqcup y \]

As the transfer function \(F\) is monotonic, we have:

\[F(x),F(y) \sqsubseteq F(x \sqcup y) \]

Then \(F(x \sqcup y)\) will be an upper bound of \(F(x)\) and \(F(y)\), so that

\[F(x) \sqcup F(y) \sqsubseteq F(x \sqcup y) \]


From the proof we can see that when \(F\) is distributive, the iterative algorithm will be as precise as MOP. We have a classic analysis whose \(F\) is not distributive: the constant propagation analysis.

  • Constant Propagation

    Definition: Given a variable \(x\) at program point \(p\), determine whether \(x\) is guaranteed to hold a constant value at \(p\)

    By the definition, we can see that the \({\rm OUT}\) of each node in the CFG is a set of pairs \((x, v)\), standing for the variable \(x\) holding the value \(v\). Next, we consider the framework of a data flow analysis \((D,L,F)\):

    • \(D\): the direction. The analysis should be a forwards analysis.

    • \(L\): the lattice.

      • The domain of the lattice should be like:

        image-20210717233721631

        Here NAC is short for not a constant.

      • The meet operator \(\sqcap\) will be:

        \({\rm NAC} \sqcap v = {\rm NAC},~{\rm UNDEF} \sqcap v = v\)

        \(c_1 \sqcap c_2 = {\rm NAC},~ c \sqcap c = c,~\text{where } c \text{ is a constant value}\)

    • \(F:\) the transfer function.

      For a given statement \(s\), the function will be like:

      \[F: {\rm OUT}[s] = gen \cup ({\rm IN}[s] \setminus \{(x, v)\}) \]

      and there are:

      \(s\) \(gen\)
      x = c \(\{(x, c)\}\)
      x = y \(\{(x, val(y)\}\)
      x = y op z \(\{(x, f(y,z))\}\) , where
      \(f(y,z) = \begin{cases} val(y)~op~val(z) & val(y),val(z)~\text{are both contants} \\ {\rm NAC} & \text{either } val(y), val(z) \text{ is NAC} \\ {\rm UNDEF} & \text{otherwise} \end{cases}\)

      As we have mentioned, \(F\) is non-distributive. Here is an example:

      image-20210718000702111

Based on these contents, we shall be able to design a constant propagation analysis algorithm, and that will be the Lab 01 for our curriculum.

So far we have learned a lot about the iterative algorithm, then we start to consider a optimization for it -- the Worklist Algorithm:

image-20210718001128839

Once upon the time, Mr. jpggg said that:

那个迭代算法不就是个 Bellman-Ford 吗,那个 Worklist Algorithm 也就是个 SPFA!

Definitely true and inspiring his words are that it can be a key for our comprehension.

posted @ 2021-07-18 00:15  Jane_leaves  阅读(59)  评论(0编辑  收藏  举报