PGM_Foundations

Chain Rule and Bayes' Rule

From the definition of the conditional distribution, we see that

$$P(\alpha_1 \cap ... \cap \alpha_k)=P(\alpha_1)P(\alpha_2 \vert \alpha_1)...P(\alpha_k \vert \alpha_1 \cap ...\cap \alpha_{k-1})~~(Chain~Rule)$$

$$P(\alpha \lvert \beta)=\frac{P(\beta \lvert \alpha)P(\alpha)}{P(\beta)}~~(Bayes~Rule)$$

A more general conditional version of Bayes’ rule, where all our probabilities are conditioned on some background event $\gamma$, also holds: $$P(\alpha \vert \beta \cap \gamma)=\frac{P(\beta \vert \alpha \cap \gamma)P(\alpha \vert \gamma)}{P(\beta \vert \gamma)}$$
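As a small numerical illustration of Bayes' rule, here is a minimal Python sketch; the numbers (a rare event $\alpha$ and a noisy observation $\beta$) are made up purely for illustration.

```python
# Minimal numerical check of Bayes' rule: P(alpha | beta) = P(beta | alpha) P(alpha) / P(beta).
# The numbers below (a rare event alpha and a noisy observation beta) are made up.

p_alpha = 0.01                     # prior P(alpha)
p_beta_given_alpha = 0.95          # P(beta | alpha)
p_beta_given_not_alpha = 0.05      # P(beta | not alpha)

# Law of total probability: P(beta) = P(beta|alpha)P(alpha) + P(beta|not alpha)P(not alpha)
p_beta = p_beta_given_alpha * p_alpha + p_beta_given_not_alpha * (1 - p_alpha)

posterior = p_beta_given_alpha * p_alpha / p_beta
print(f"P(alpha | beta) = {posterior:.3f}")   # ~0.161: much larger than the prior 0.01, yet still small
```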

 Independence and Conditional Independence

Independent Events

Event $\alpha$ is independent of event $\beta$ in $P$, denoted $P \vDash (\alpha  \bot \beta)$, if $$P(\alpha \lvert \beta)=P(\alpha)~~or~~P(\beta)=0$$ Equivalently, $$P \vDash (\alpha  \bot \beta)~if~and~only~if~P(\alpha \cap \beta)=P(\alpha)P(\beta)$$ The two formulations agree because $P(\alpha \cap \beta)=P(\alpha \lvert \beta)P(\beta)$. For example, toss a coin twice and let $\alpha=the~first~toss~results~in~a~head$, $\beta=the~second~toss~results~in~a~head$; this is a case where two different physical processes lead to independence. Now roll a fair die and let $\alpha=the~die~outcome~is~even$, $\beta=the~die~outcome~is~1~or~2$; this is a case where the same process leads to independence (at first sight $\alpha$ and $\beta$ seem dependent, but they are not: $P(\alpha \cap \beta)=1/6=P(\alpha)P(\beta)$).
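The die example can be checked directly; a minimal sketch, assuming a fair six-sided die:

```python
from fractions import Fraction

# Fair six-sided die: every outcome has probability 1/6.
p = {o: Fraction(1, 6) for o in range(1, 7)}
P = lambda event: sum(p[o] for o in event)

alpha = {o for o in p if o % 2 == 0}   # "the die outcome is even"
beta = {1, 2}                          # "the die outcome is 1 or 2"

# P(alpha ∩ beta) = 1/6 equals P(alpha) P(beta) = (1/2)(1/3), so alpha ⊥ beta.
print(P(alpha & beta) == P(alpha) * P(beta))   # True
```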

Conditional Independence 

While independence is a useful property, it is not often that we encounter two independent events. A more common situation is when two events are independent given an additional event. Event $\alpha$ is conditionally independent of event $\beta$ given event $\gamma$ in $P$, denoted $P \vDash (\alpha \bot \beta \lvert \gamma)$, if $$P(\alpha \lvert \beta \cap \gamma)=P(\alpha \lvert \gamma)~~or~~P(\beta \cap \gamma)=0$$ Equivalently, $$P \vDash (\alpha \bot \beta \lvert \gamma)~if~and~only~if~P(\alpha \cap \beta \lvert \gamma)=P(\alpha \lvert \gamma)P(\beta \lvert \gamma)$$
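To make this concrete, here is a minimal numerical sketch. The setup (two tosses of a coin chosen at random from a biased pair) is a hypothetical example rather than one from the text: the two toss results are dependent events, but they become independent once we condition on which coin was picked.

```python
from itertools import product

# Hypothetical setup: pick coin c1 (bias 0.9) or c2 (bias 0.1) uniformly at random,
# then toss it twice.  alpha = "first toss is heads", beta = "second toss is heads",
# gamma = "coin c1 was picked".
bias = {"c1": 0.9, "c2": 0.1}
joint = {}
for coin, t1, t2 in product(bias, "HT", "HT"):
    p1 = bias[coin] if t1 == "H" else 1 - bias[coin]
    p2 = bias[coin] if t2 == "H" else 1 - bias[coin]
    joint[(coin, t1, t2)] = 0.5 * p1 * p2

def P(pred):
    return sum(v for k, v in joint.items() if pred(*k))

alpha = lambda coin, t1, t2: t1 == "H"
beta  = lambda coin, t1, t2: t2 == "H"
gamma = lambda coin, t1, t2: coin == "c1"
both  = lambda coin, t1, t2: t1 == "H" and t2 == "H"

# Unconditionally, alpha and beta are NOT independent:
print(abs(P(both) - P(alpha) * P(beta)) < 1e-9)          # False (0.41 vs 0.25)

# Conditioned on gamma they are: P(alpha ∩ beta | gamma) = P(alpha | gamma) P(beta | gamma).
pg = P(gamma)
lhs = P(lambda c, a, b: both(c, a, b) and gamma(c, a, b)) / pg
rhs = (P(lambda c, a, b: alpha(c, a, b) and gamma(c, a, b)) / pg) \
    * (P(lambda c, a, b: beta(c, a, b) and gamma(c, a, b)) / pg)
print(abs(lhs - rhs) < 1e-9)                             # True
```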

Independent Variables

Until now, we have focused on independence between events. Thus, we can say that two events, such as one toss landing heads and a second toss also landing heads, are independent. However, we would like to say that any pair of outcomes of the coin tosses is independent. To capture such statements, we generalize independence to sets of random variables. Let $X,Y,Z$ be three sets of random variables. We say that $X$ is conditionally independent of $Y$ given $Z$ in $P$, denoted $P \vDash (X \bot Y \lvert Z)$, if $P$ satisfies $(X=x \bot Y=y \lvert Z=z)$ for all values $x\in Val(X)~,~y \in Val(Y)~,~z \in Val(Z)$. The following properties hold for conditional independence (a numerical sanity check in code follows the proofs below):

(1) Decomposition: $(X \bot Y,W \lvert Z)\Rightarrow (X \bot Y \lvert Z)$

PROOF: IF $(X \bot Y,W \lvert Z)$, THEN $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$.

Therefore, $P(X,Y \lvert Z)=\sum_w P(X,Y,w \lvert Z)=\sum_w P(X \lvert Z)P(Y,w \lvert Z)=P(X \lvert Z)\sum_w P(Y,w \lvert Z)=P(X \lvert Z)P(Y \lvert Z)$

(2) Weak Union: $(X\bot Y,W \lvert Z)\Rightarrow (X\bot Y \lvert Z,W)$

PROOF: IF $(X \bot Y,W \lvert Z)$, THEN $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$. Therefore,

$P(X,Y \lvert Z,W)=P(X \lvert Y,W,Z)P(Y \lvert Z,W)~(Chain~Rule)$

$=P(X \lvert Z)P(Y \lvert Z,W)~(X \bot Y,W \lvert Z)$

$=P(X \lvert Z,W)P(Y \lvert Z,W)~(Decomposition: X \bot W \lvert Z)$

(3) Contraction: $(X\bot W \lvert Z,Y) \& (X \bot Y \lvert Z) \Rightarrow (X \bot Y,W \lvert Z)$

PROOF: $P(X,Y,W \lvert Z)=P(X \lvert Y,W,Z)P(Y,W \lvert Z)~(Chain~Rule)$

$=P(X \lvert Y,Z)P(Y,W \lvert Z)~(X\bot W \lvert Z,Y)$

$=P(X \lvert Z)P(Y,W \lvert Z)~(X \bot Y \lvert Z)$

(4) Intersection: For positive distributions (i.e., distributions in which every joint assignment has positive probability), and for mutually disjoint sets of variables $X,Y,Z,W$: $(X \bot Y \lvert Z,W) \& (X \bot W \lvert Z,Y) \Rightarrow (X \bot Y,W \lvert Z)$.

PROOF: $P(X,Y,W \lvert Z)=P(X \lvert Y,W,Z)P(Y,W \lvert Z)~(Chain~Rule)$

$=P(X \lvert W,Z)P(Y,W \lvert Z)~(X \bot Y \lvert Z,W)$

Similarly, $(X\bot W \lvert Z,Y)$ gives $P(X \lvert Y,W,Z)=P(X \lvert Y,Z)$, so $P(X \lvert W,Z)=P(X \lvert Y,Z)$ for every assignment to $Y$ and $W$ (positivity ensures all of these conditional probabilities are well defined). The left-hand side does not depend on $Y$ and the right-hand side does not depend on $W$, so both equal a function of $Z$ alone; marginalizing over $W$, $P(X \lvert Z)=\sum_w P(X \lvert w,Z)P(w \lvert Z)=P(X \lvert W,Z)$. Substituting back, $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$, which is exactly $(X \bot Y,W \lvert Z)$.
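These properties can be sanity-checked numerically. The sketch below constructs a positive joint distribution over four binary variables in which $(X \bot Y,W \lvert Z)$ holds by construction (the factorized construction and the random seed are arbitrary assumptions), and then verifies Decomposition and Weak Union.

```python
from itertools import product
import random

random.seed(0)

def rand_dist(n):
    """A strictly positive distribution over n values."""
    ps = [random.random() + 0.1 for _ in range(n)]
    s = sum(ps)
    return [x / s for x in ps]

# Build a positive joint P(X, Y, W, Z) over binary variables in which (X ⊥ Y,W | Z)
# holds by construction:  P(x, y, w, z) = P(z) P(x | z) P(y, w | z).
vals = [0, 1]
pz = rand_dist(2)
px_z = {z: rand_dist(2) for z in vals}        # P(X | z)
pyw_z = {z: rand_dist(4) for z in vals}       # P(Y, W | z), indexed by 2*y + w

joint = {(x, y, w, z): pz[z] * px_z[z][x] * pyw_z[z][2 * y + w]
         for x, y, w, z in product(vals, repeat=4)}

def P(pred):
    return sum(v for k, v in joint.items() if pred(*k))

def cond(num, den):
    return P(num) / P(den)

def close(a, b):
    return abs(a - b) < 1e-9

# Decomposition: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z), i.e. P(x, y | z) = P(x | z) P(y | z).
ok_dec = all(close(cond(lambda X, Y, W, Z: X == x and Y == y and Z == z,
                        lambda X, Y, W, Z: Z == z),
                   cond(lambda X, Y, W, Z: X == x and Z == z,
                        lambda X, Y, W, Z: Z == z)
                   * cond(lambda X, Y, W, Z: Y == y and Z == z,
                          lambda X, Y, W, Z: Z == z))
             for x, y, z in product(vals, repeat=3))

# Weak Union: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W), i.e. P(x, y | z, w) = P(x | z, w) P(y | z, w).
ok_wu = all(close(cond(lambda X, Y, W, Z: X == x and Y == y and W == w and Z == z,
                       lambda X, Y, W, Z: W == w and Z == z),
                  cond(lambda X, Y, W, Z: X == x and W == w and Z == z,
                       lambda X, Y, W, Z: W == w and Z == z)
                  * cond(lambda X, Y, W, Z: Y == y and W == w and Z == z,
                         lambda X, Y, W, Z: W == w and Z == z))
            for x, y, w, z in product(vals, repeat=4))

print(ok_dec, ok_wu)   # True True
```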

Querying a Distribution

Our focus throughout this book is on using a joint probability distribution over multiple random variables to answer queries of interest.

Probability Queries

The evidence: a subset $E$ of random variables in the model, and an instantiation $e$ to these variables;

the query variables: a subset $Y$ of random variables in the network.

Our task is to compute $P(Y \lvert E=e)$, the posterior probability distribution over the values $y$ of $Y$ , conditioned on the fact that $E=e$.
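With the joint distribution stored as an explicit table, a probability query amounts to selecting the entries consistent with the evidence, renormalizing, and summing out everything except the query variables. A brute-force sketch with made-up numbers (variables are indexed by position in the assignment tuple):

```python
from itertools import product

# Hypothetical joint P(A, B, C) over binary variables, stored as an explicit table.
# The (unnormalized) weights are made up for illustration.
weights = [3, 1, 2, 2, 1, 4, 2, 1]
joint = {assignment: w / sum(weights)
         for assignment, w in zip(product([0, 1], repeat=3), weights)}

def query(target, evidence):
    """Return P(variable `target` | evidence), with variables indexed by position."""
    consistent = {k: v for k, v in joint.items()
                  if all(k[i] == val for i, val in evidence.items())}
    z = sum(consistent.values())              # P(evidence)
    dist = {}
    for k, v in consistent.items():           # marginalize everything except `target`
        dist[k[target]] = dist.get(k[target], 0.0) + v / z
    return dist

# P(A | C = 1), where the variable order is (A, B, C): A is index 0, C is index 2.
print(query(target=0, evidence={2: 1}))       # {0: 0.375, 1: 0.625}
```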

MAP Queries

The aim is to find the MAP assignment (the most likely assignment) to all of the non-evidence variables $W=\mathcal{X}-E$, where $E$ denotes the evidence variables. The task is to find the most likely assignment to the variables in $W$ given the evidence $E = e$: $$MAP(W \lvert e)=\mathop{argmax}\limits_{w} P(w,e)$$
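On a small explicit joint this is just an argmax over the full assignments consistent with the evidence. A brute-force sketch with made-up probabilities:

```python
from itertools import product

# Hypothetical joint P(A, B, E) over binary variables; the probabilities are made up.
probs = [0.10, 0.05, 0.15, 0.05, 0.20, 0.10, 0.05, 0.30]
joint = dict(zip(product([0, 1], repeat=3), probs))

def map_query(evidence):
    """MAP(W | e) = argmax_w P(w, e), where W is every non-evidence variable."""
    consistent = {k: v for k, v in joint.items()
                  if all(k[i] == val for i, val in evidence.items())}
    return max(consistent, key=consistent.get)

# Most likely assignment to (A, B) given evidence E = 1 (E is index 2).
print(map_query({2: 1}))   # (1, 1, 1): the MAP assignment sets A = 1, B = 1
```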

Marginal MAP Queries

In the marginal MAP query, we have a subset of variables $Y$ that forms our query. The task is to find the most likely assignment to the variables in $Y$ given the evidence $E = e$: $$MAP(Y \lvert e)=\mathop{argmax}\limits_{y}P(y \lvert e)$$ If we let $Z=\mathcal{X}-Y-E$, the marginal MAP task is to compute: $$MAP(Y \lvert e)=\mathop{argmax}\limits_{y} \sum_{Z}P(Y,Z \lvert e)$$
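Note that the marginal MAP assignment is in general not the restriction of the full MAP assignment to $Y$: summing out $Z$ can change which value wins. A minimal sketch with made-up numbers (two binary variables, no evidence, for simplicity):

```python
# Hypothetical joint P(A, B) over two binary variables (no evidence, for simplicity).
joint = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# Full MAP over (A, B): the single most likely joint assignment.
full_map = max(joint, key=joint.get)                         # (0, 0), with probability 0.35

# Marginal MAP over A alone: argmax_a sum_b P(a, b).
p_a = {a: sum(joint[(a, b)] for b in [0, 1]) for a in [0, 1]}
marginal_map_a = max(p_a, key=p_a.get)                       # A = 1, since P(A=1) = 0.6

print(full_map, marginal_map_a)   # (0, 0) 1 -- the two answers disagree on A
```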

 Graphs

Nodes and Edges

For directed graphs, there are notions such as the children and parents of a node $X$, and the in-degree and out-degree of a node $X$.

For undirected graphs, there is the notion of the neighbors of $X$.

For both kinds of graphs, there is the notion of the boundary of a node $X$, which is Pa($X$) for directed graphs and Nb($X$) for undirected graphs, and the degree of a graph, which is the maximal degree of a node in the graph.
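These notions are easy to compute from a dictionary-based adjacency representation. A minimal sketch (the small graph below is an arbitrary example, not the book's figure 2.3):

```python
# Directed graph stored as node -> set of children; an arbitrary example graph.
children = {
    "A": {"C"},
    "B": {"E"},
    "C": {"D", "F"},
    "D": {"E"},
    "E": set(),
    "F": set(),
}

def parents(g, x):
    return {u for u, ch in g.items() if x in ch}

def degree(g, x):
    """Number of edges the node participates in (in-degree + out-degree)."""
    return len(parents(g, x)) + len(g[x])

# The boundary of X in a directed graph is Pa(X); the degree of the graph is the
# maximal degree over its nodes.
graph_degree = max(degree(children, x) for x in children)

print(parents(children, "E"), degree(children, "C"), graph_degree)   # e.g. {'B', 'D'} 3 3
```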

Subgraphs

Clique (also called a complete subgraph): a subset of nodes in which every pair of nodes is connected by an edge. A maximal clique is a clique that cannot be extended by adding any other node while remaining a clique.

Upward Closure: a subset of nodes $X$ is upwardly closed in $\mathcal{K}$ if $\forall x \in X~,~Boundary_x \subseteq X$. We define the upward closure of $X$ to be the minimal upwardly closed subset $Y$ that contains $X$. We define the upwardly closed subgraph of $X$, denoted $\mathcal{K}^{+}[X]$, to be the induced subgraph over $Y$, $\mathcal{K}[Y]$.

For example, the set {A, B, C, D, E} is the upward closure of the set {C} in $\mathcal{K}$. The upwardly closed subgraph of {C} is shown in figure 2.4b, and the upwardly closed subgraph of {C, D, I} is shown in figure 2.4c.

Paths and Trails

$X_1,...,X_k$ forms a path in $\mathcal{K}=(\mathcal{X},\varepsilon)$ if, for every $i=1,...,k-1$, we have either $X_i \rightarrow X_{i+1}$ or $X_i - X_{i+1}$. A path is directed if, for at least one $i$, we have $X_i \rightarrow X_{i+1}$. (Note that a directed path need not have every edge directed; it suffices that at least one edge is directed.)

$X_1,...,X_k$ forms a trail in $\mathcal{K}=(\mathcal{X},\varepsilon)$ if, for every $i=1,...,k-1$, we have $X_i \rightleftharpoons X_{i+1}$, i.e., $X_i$ and $X_{i+1}$ are connected by some edge, traversed in either direction.

In the graph K of figure 2.3, A, C, D, E, I is a path, and hence also a trail. On the other hand, A, C, F, G, D is a trail, which is not a path.

Connected graph: a graph in which there is a trail between every pair of nodes $X_i, X_j$.

We say that $X$ is an ancestor of $Y$ and $Y$ is a descendant of $X$, if there exists a directed path $X_1,..., X_k$ with $X_1=X~,~X_k=Y$. F, G, I are descendants of C. The ancestors of C are A, via the path A, C, and B, via the path B, E, D, C.
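Descendants and ancestors can be computed with a simple traversal over directed edges. A minimal sketch on an arbitrary example graph (again not the book's figure 2.3):

```python
# Directed graph as node -> set of children; an arbitrary small example.
children = {
    "A": {"C"},
    "B": {"D"},
    "C": {"D", "F"},
    "D": {"E"},
    "E": set(),
    "F": set(),
}

def descendants(g, x):
    """All nodes reachable from x along a directed path (iterative depth-first search)."""
    seen, stack = set(), list(g[x])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(g[v])
    return seen

def ancestors(g, x):
    return {u for u in g if x in descendants(g, u)}

print(descendants(children, "C"))   # {'D', 'E', 'F'}
print(ancestors(children, "E"))     # {'A', 'B', 'C', 'D'}
```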

Cycles and Loops

A cycle is a directed path $X_1,...,X_k$ where $X_1=X_k$.  A graph is acyclic if it contains no cycles.

A directed acyclic graph (DAG) is one of the central concepts in this book. An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph (PDAG).
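Acyclicity of a directed graph can be checked with a depth-first search that looks for back edges. A minimal sketch (the two example graphs are made up):

```python
def is_acyclic(children):
    """True iff the directed graph (node -> set of children) contains no directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on the current DFS path / finished
    color = {v: WHITE for v in children}

    def visit(v):
        color[v] = GRAY
        for w in children[v]:
            if color[w] == GRAY:          # back edge: a directed cycle
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in children if color[v] == WHITE)

dag = {"A": {"B"}, "B": {"C"}, "C": set()}
cyclic = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
print(is_acyclic(dag), is_acyclic(cyclic))   # True False
```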

Exercises

Exercise 2.2

2.2.1. Show that for binary random variables $X, Y$ , the event-level independence $(x^0 \perp y^0)$ implies random variable independence ($X\perp Y$).

Given: 

$(x^0 \perp y^0) \Rightarrow P(x^0 \lvert y^0)=P(x^0), P(y^0 \lvert x^0)=P(y^0)$

Find:

$P(x^1 \lvert y^0)=1-P(x^0 \lvert y^0)=1-P(x^0)=P(x^1)$

$P(y^1 \lvert x^0)=1-P(y^0 \lvert x^0)=1-P(y^0)=P(y^1)$

$P(x^1 \lvert y^1)=1-P(x^0|y^1)=1-\frac{P(y^1 \lvert x^0)P(x^0)}{P(y^1)}=1-\frac{P(y^1)P(x^0)}{P(y^1)}=1-P(x^0)=P(x^1)$

By Bayes' rule we similarly obtain $P(y^0 \lvert x^1)=P(y^0)~,~P(x^0 \lvert y^1)=P(x^0)~,~P(y^1 \lvert x^1)=P(y^1)$. Since $P(x \lvert y)=P(x)$ holds for every pair of values, $(X \perp Y)$ follows.

2.2.2 Show a counterexample for nonbinary variables.

Consider the joint distribution $P(X,Y)$ below, where $X$ takes values $x^0,x^1,x^2$ with marginals $0.5, 0.25, 0.25$ and $Y$ takes values $y^0,y^1$ with marginals $0.5, 0.5$:

| $P(X,Y)$ | $x^0$ | $x^1$ | $x^2$ |
| --- | --- | --- | --- |
| $y^0$ | 0.25 | 0.10 | 0.15 |
| $y^1$ | 0.25 | 0.15 | 0.10 |

Here $P(x^0 \cap y^0)=0.25=P(x^0)P(y^0)$, so $(x^0 \perp y^0)$ holds, but $P(x^1 \cap y^0)=0.10 \neq P(x^1)P(y^0)=0.125$, so $X$ and $Y$ are not independent.
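A quick numerical check of this table (a minimal sketch; the value names are just strings):

```python
# The joint P(X, Y) from the table above.
joint = {
    ("x0", "y0"): 0.25, ("x1", "y0"): 0.10, ("x2", "y0"): 0.15,
    ("x0", "y1"): 0.25, ("x1", "y1"): 0.15, ("x2", "y1"): 0.10,
}
px = {x: sum(v for (xx, _), v in joint.items() if xx == x) for x in ["x0", "x1", "x2"]}
py = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in ["y0", "y1"]}

# Event-level independence (x0 ⊥ y0) holds: 0.25 = 0.5 * 0.5 ...
print(abs(joint[("x0", "y0")] - px["x0"] * py["y0"]) < 1e-9)    # True
# ... but X and Y are not independent as variables: 0.10 != 0.25 * 0.5.
print(abs(joint[("x1", "y0")] - px["x1"] * py["y0"]) < 1e-9)    # False
```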

2.2.3 Is it the case that, for a binary-valued variable $Z$, we have that $(X \perp Y \lvert z^0)$ implies $(X \perp Y \lvert Z)$?

No. Independence given $z^0$ constrains only the conditional distribution $P(X,Y \lvert z^0)$; it says nothing about $P(X,Y \lvert z^1)$. For instance, $X$ and $Y$ can be perfectly correlated given $z^1$, in which case $(X \perp Y \lvert Z)$ fails even though $(X \perp Y \lvert z^0)$ holds.

Exercise 2.5

Let $X,Y,Z$ be three disjoint subsets of variables such that $\mathcal{X}=X\cup Y\cup Z$. Prove that $P\models (X \perp Y \lvert Z)$ if and only if we can write $P$ in the form: $$P(\mathcal{X})=\phi_1(X,Z)\phi_2(Y,Z)$$
