In [3]:

%matplotlib inline

Note: I wrote this post in an IPython notebook. It might be rendered better on NBViewer.

Dirichlet Distribution

The symmetric Dirichlet distribution (DD) can be considered a distribution of distributions. Each sample from the DD is acategorial distribution over

The expected value of the DD is

We demonstrate below by setting

In [10]:

import numpy as np
from scipy.stats import dirichlet
np.set_printoptions(precision=2)

def stats(scale_factor, G0=[.2, .2, .6], N=10000):
    samples = dirichlet(alpha = scale_factor * np.array(G0)).rvs(N)
    print "                          alpha:", scale_factor
    print "              element-wise mean:", samples.mean(axis=0)
    print "element-wise standard deviation:", samples.std(axis=0)
    print
    
for scale in [0.1, 1, 10, 100, 1000]:
    stats(scale)

                          alpha: 0.1
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.38  0.38  0.47]

                          alpha: 1
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.28  0.28  0.35]

                          alpha: 10
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.12  0.12  0.15]

                          alpha: 100
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.04  0.04  0.05]

                          alpha: 1000
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.01  0.01  0.02]

Dirichlet Process

The Dirichlet Process can be considered a way to generalizethe Dirichlet distribution. While the Dirichlet distribution is parameterized by a discrete distribution

We can construct a sample

H = \sum k = 1 \infty π k \cdot δ (x - θ k)

where the

The weights (

Gregor Heinrich writes:

The defining property of the DP is that its samples have weights

As an example, Heinrich imagines a DP with a standard normal base measure

H (S 1), H (S 2), H (S 3) \sim Dir (α

where

These

There are several equivalent ways to choose the

To generate

π k = β k \prod j = 1 k - 1 (1 - β j) .

Thus, if we want to draw a sample from a Dirichlet distribution, we could, in theory, sample an infinite number of

However, by noting that the

We use this method below to draw approximate samples from several Dirichlet processes with a standard normal (

Recall that a single sample from a Dirichlet process is a probability distribution over a countably infinite subset of the support of the base measure.

The blue line is the PDF for a standard normal. The black lines represent the

We generate enough

In [13]:

import matplotlib.pyplot as plt
from scipy.stats import beta, norm

def dirichlet_sample_approximation(base_measure, alpha, tol=0.01):
    betas = []
    pis = []
    betas.append(beta(1, alpha).rvs())
    pis.append(betas[0])
    while sum(pis) < (1.-tol):
        s = np.sum([np.log(1 - b) for b in betas])
        new_beta = beta(1, alpha).rvs() 
        betas.append(new_beta)
        pis.append(new_beta * np.exp(s))
    pis = np.array(pis)
    thetas = np.array([base_measure() for _ in pis])
    return pis, thetas

def plot_normal_dp_approximation(alpha):
    plt.figure()
    plt.title("Dirichlet Process Sample with N(0,1) Base Measure")
    plt.suptitle("alpha: %s" % alpha)
    pis, thetas = dirichlet_sample_approximation(lambda: norm().rvs(), alpha)
    pis = pis * (norm.pdf(0) / pis.max())
    plt.vlines(thetas, 0, pis, )
    X = np.linspace(-4,4,100)
    plt.plot(X, norm.pdf(X))

plot_normal_dp_approximation(.1)
plot_normal_dp_approximation(1)
plot_normal_dp_approximation(10)
plot_normal_dp_approximation(1000)

Often we want to draw samples from a distribution sampled from a Dirichlet process instead of from the Dirichlet process itself. Much of the literature on the topic unhelpful refers to this as sampling from a Dirichlet process.

Fortunately, we don't have to draw an infinite number of samples from the base distribution and stick breaking process to do this. Instead, we can draw these samples as they are needed.

Suppose, for example, we know a finite number of the

π 1 = 0.5,

To sample from

The class below will take a base distribution

In [20]:

from numpy.random import choice

class DirichletProcessSample():
    def __init__(self, base_measure, alpha):
        self.base_measure = base_measure
        self.alpha = alpha
        
        self.cache = []
        self.weights = []
        self.total_stick_used = 0.

    def __call__(self):
        remaining = 1.0 - self.total_stick_used
        i = DirichletProcessSample.roll_die(self.weights + [remaining])
        if i is not None and i < len(self.weights) :
            return self.cache[i]
        else:
            stick_piece = beta(1, self.alpha).rvs() * remaining
            self.total_stick_used += stick_piece
            self.weights.append(stick_piece)
            new_value = self.base_measure()
            self.cache.append(new_value)
            return new_value
        
    @staticmethod 
    def roll_die(weights):
        if weights:
            return choice(range(len(weights)), p=weights)
        else:
            return None

This Dirichlet process class could be called stochastic memoization. This idea was first articulated in somewhat abstruse terms by Daniel Roy, et al.

Below are histograms of 10000 samples drawn from samplesdrawn from Dirichlet processes with standard normal base distribution and varying

In [22]:

import pandas as pd

base_measure = lambda: norm().rvs()
n_samples = 10000
samples = {}
for alpha in [1, 10, 100, 1000]:
    dirichlet_norm = DirichletProcessSample(base_measure=base_measure, alpha=alpha)
    samples["Alpha: %s" % alpha] = [dirichlet_norm() for _ in range(n_samples)]

_ = pd.DataFrame(samples).hist()

Note that these histograms look very similar to the corresponding plots of sampled distributions above. However, these histograms are plotting points sampled from a distribution sampled from a Dirichlet process while the plots above were showing approximate distributions samples from the Dirichlet process. Of course, as the number of samples from each

In a future post, I will look at how this DirichletProcessSampleclass can be used to draw samples from a hierarchicalDirichlet process.

菜鸡一枚

Notes on the Dirichlet Distribution and Dirichlet Process

Notes on the Dirichlet Distribution and Dirichlet Process

Dirichlet Distribution

Dirichlet Process

公告