NJU Static Program Analysis 05: Data Flow Analysis III

NJU Static Program Analysis 05: Data Flow Analysis III

Abstract

  • Understand the functional view of iterative algorithm
  • The definitions of lattice and complete lattice
  • Understand the fixed-point theorem
  • How to summarize may and must analyses in lattices (in next lecture)
  • The relation between MOP and the solution produced by the iterative algorithm (in next lecture)
  • Constant propagation analysis (in next lecture)
  • Worklist algorithm (in next lecture)

Notes

In this lecture, we will learn about some mathematical, formal and theoretical foundations about data flow analysis. To start with, recalling the following concepts is necessary:

  • Partial Order

    A poset is defined as a pair \((P, \sqsubseteq)\), where \(P\) is a set, and \(\sqsubseteq\) is a binary relation that defines a partial ordering over \(P\) which has the properties of Reflexivity, Antisymmetry and Transitivity.

  • Upper and Lower Bounds

    Given a poset \((P, \sqsubseteq)\) and its subset \(S, S \subseteq P\), we can say that:

    • an upper bound of \(S\) is defined as a \(u, u \in P\) satisfying \(\forall x \in S, x \sqsubseteq u\).
    • a lower bound of \(S\) is defined as a \(l, l \in P\) satisfying \(\forall x \in S, l \sqsubseteq x\).
    • the least upper bound(lub, or join) of \(S\), written \(\sqcup S\) satisfy that for every upper bound \(u\) of \(S\), \(\sqcup S \sqsubseteq u\).
    • the greatest lower bound(glb, or meet) of \(S\), written \(\sqcap S\), satisfy that for every lower bound \(l\) of \(S\), \(l \sqsubseteq \sqcap S\).

    Note that not every poset has the lub or glb, and if a poset has, it will be unique.
    The proof can be simple. Firstly for example for a poset \((P, \subseteq)\), where \(P = \left\{ \left\{a, b,c\right\}, \left\{a, c\right\}, \left\{a\right\}, \left\{c\right\} \right\}\), the poset do not have the glb. Secondly, for a poset \(P\) assume \(\exist \sqcap P_1 \not= \sqcap P_2\), then we have \(\sqcap P_1 \sqsubseteq \sqcap P_2\) and \(\sqcap P_1 \sqsubseteq \sqcap P_2\), which indicating a contradiction that \(\sqcap P_1 = \sqcap P_2\).

    What's more, if \(S = \{a, b\}\), then \(\sqcap S\) can be written as \(a \sqcap b\), \(\sqcup S\) can be written as \(a \sqcup b\).

  • Lattice

    Given a poset \((P, \sqsubseteq)\), \(\forall~ a, b \in P, \exist~ a \sqcup b, a \sqcap b\), we call \(P\) a lattice.

    • Semilattice

      Given a poset \((P, \sqsubseteq)\), \(\forall~a,b \in P\),

      • if only \(a \sqcup b\) exists, then \((P, \sqsubseteq)\) is called a join semilattice。
      • if only \(a \sqcap b\) exists, then \((P, \sqsubseteq)\) is called a meet semilattice.
    • Complete Lattice

      If a lattice \((P, \sqsubseteq)\) satisfy that \(\forall S \subseteq P, \exist~\sqcap S, \sqcup S\), it is called a complete lattice, in which \(\top = \sqcup P\) called top and \(\bot = \sqcap P\) called bottom.

      \((\Z, \subseteq)\) is not a complete lattice, considering \(\N \subseteq \Z, \not \exist \sqcup \N (+ \infty)\). Moreover, every finite lattice is a complete lattice, but not every complete lattice is finite.

      For most of our data flow analysis, we focus on the finite complete lattices.

    • Product Lattice

      Given lattices \(L_1 = (P_1, \sqsubseteq), L_2 = (P_2, \sqsubseteq), \dots, L_n = (P_n, \sqsubseteq)\), if \(\forall i, \exist \sqcup L_i, \sqcap L_i\), then we can have a product lattice \(L^n = (P,\sqsubseteq)\), where \(P = P_1 \times P_2 \times \dots \times P_n\).

      Moreover, there are

      • \((x_1,\dots,x_n) \sqsubseteq (y_1,\dots,y_2) \Leftrightarrow (x_1 \sqsubseteq y_1) \and \dots \and (x_n \sqsubseteq y_n)\)
      • \((x_1,\dots,x_n) \sqcap (y_1,\dots,y_2) = (x_1 \sqcap y_1, \dots, x_n \sqcap y_n)\)
      • \((x_1,\dots,x_n) \sqcup (y_1,\dots,y_2) = (x_1 \sqcup y_1, \dots, x_n \sqcup y_n)\)

      If every \(L_i\) is complete, then \(L^n\) is complete.

On the abstraction basis of lattice, a data flow analysis can be viewed as a framework \((D,L,F)\) consisting:

  • \(D\): a direction of data flow, whether forwards or backwards.
  • \(L\): a lattice including domain of the values \(V\) with two operators \(\sqcap\) and \(\sqcup\).
  • \(F\): a family of transfer functions \(F: V \to V\).

Add some details to the framework, we can get a formal expression of our iterative algorithm:

  • Given a CFG with \(k\) nodes, the iterative algorithm updates \({\rm OUT}[n]\) for every node \(n\) in each iteration.

  • Assume the domain of the values in data flow analysis is \(V\), then we can define a k-tuple

    \[({\rm OUT}[n_1], {\rm OUT}[n_2],\dots,{\rm OUT}[n_k]) \]

    as an element of set \((V_1 \times V_2 \times \dots \times V_k)\) denoted as \(V^k\), to hold the values of the analysis after each iteration.

  • Each iteration can be considered as taking an action to map an element of \(V^k\) to a new element of \(V^k\), through applying the transfer functions and control-flow handing, abstracted as a function \(F:V^k \to V^k\)

  • Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations.

In this context, the algorithm would begin with the state \(X_0 = (\bot, \bot, \dots, \bot)\) and iteratively apply a function \(F\) to the current state. Then we got \(X_1 =(v_1^1, v_2^1, \dots, v_k^1) \to X_2 = (v_1^2, v_2^2, \dots, v_k^2) \to \dots \to X_s\), where for each \(i\), \(X_{i+1} = F(X_i)\). And for the final state, there is \(F(X_s) = X_s\), that is to say, \(X_s\) is a fixed point of the function \(F\).

After all the work done previously, we can eventually start to use mathematical tricks to answer a few important questions:

  • Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution?

    To answer this question, we should inspect the monotonicity of the function \(F\), as we do in Lecture 03.

    A function \(F: L \to L\) is monotonic if \(\forall~x, y \in L, x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)\). (\(L\) is a lattice)

  • If so, is there only one solution or only one fixed point? If more than one, is our solution the best one(most precise)?

    We have a Fixed-Point Theorem that:

    Given a complete lattice \((L, \sqsubseteq)\), if \(F: L \to L\) is monotonic and \(L\) is finite, then

    • the least fixed point of \(F\) can be found by iterating \(F(\bot), F(F(\bot)), \dots, F^k(\bot)\) until a fixed point is reached.
    • the greatest fixed point of \(F\) can be found by iterating \(F(\top), F(F(\top)), \dots, F^k(\top)\) until a fixed point is reached.

    We can give a proof for the least fixed point, that of the greatest is similar.


    \(P.f.\)

    1. We firstly prove the existence of a fixed point.

      By the definition of \(\bot\) and \(F: L \to L\), we have:

      \[\bot \sqsubseteq F(\bot) \]

      As \(F\) is monotonic, we have:

      \[F(\bot) \sqsubseteq F(F(\bot)) \sqsubseteq \dots \sqsubseteq F^i(\bot) \]

      As \(L\) is finite, for some \(k\), we have:

      \[F^{\rm Fix} = F^k(\bot) = F^{k+1}(\bot) \]

      Thus we found the fixed point.

    2. Secondly, we prove the fixed point is the least one.

      Assume we have another fixed point \(x\). By the definition of \(\bot\), we have \(\bot \sqsubseteq x\).

      As \(F\) is monotonic, using induction, we can easily get

      \[\forall~i,F^i(\bot) \sqsubseteq F^i(x) \]

      Thus \(\exist~k,F^k(\bot) \sqsubseteq F^k(x) = x\), which means

      \[F^{\rm Fix} = F^k(\bot) \sqsubseteq x \]

      So that the fixed point is uniquely the least one.


    Using the fixed point theorem, we cal easily describe that the iterative algorithm will finally arrive at a best fixed point on the lattice, if \(F\) is monotonic.

posted @ 2021-07-14 23:50  Jane_leaves  阅读(91)  评论(0编辑  收藏  举报