Generalized Linear Models
1. Introduction
So far, we’ve seen a regression example and a classification example. In the regression example, we had y|x; θ ∼ N(μ, σ2), and in the classification one, y|x; θ ∼ Bernoulli(φ), for appropriate definitions of μ and φ as functions of x and θ (namely, μ = θTx and φ = g(θTx)).
2. The exponential family
We say that a class of distributions is in the exponential family if it can be written in the form:
p(y; η) = b(y) exp(ηTT(y) − a(η))
Here, η is called the natural parameter (also called the canonical parameter) of the distribution; T(y) is the sufficient statistic (for the distributions we consider, it will often be the case that T(y) = y); and a(η) is the log partition function. The quantity e−a(η) essentially plays the role of a normalization constant, making sure the distribution p(y; η) sums/integrates over y to 1.
A fixed choice of T, a and b defines a family (or set) of distributions that is parameterized by η; as we vary η, we then get different distributions within this family.
3. Examples
a. Bernoulli distribution (we consider η to be a real number):
We can write the Bernoulli pmf as
p(y; φ) = φy(1 − φ)1−y = exp(y log(φ/(1 − φ)) + log(1 − φ)),
which matches the exponential-family form.
Thus, the natural parameter is given by η = log(φ/(1 − φ)). Interestingly, if we invert this definition for η by solving for φ in terms of η, we obtain φ = 1/(1 + e−η). This is the familiar sigmoid function! This will come up again when we derive logistic regression as a GLM:
η = log(φ/(1 − φ))
T(y) = y
a(η) = −log(1 − φ)= log(1 + eη)
b(y) = 1
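As a quick numerical sanity check (a minimal sketch; the helper names below are ours, not from the notes), we can verify that b(y) exp(ηT(y) − a(η)) with these choices reproduces the Bernoulli pmf, and that the sigmoid inverts the natural parameter:

```python
import math

def bernoulli_pmf(y, phi):
    # Standard Bernoulli pmf: phi^y * (1 - phi)^(1 - y)
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    # Exponential-family form with eta = log(phi / (1 - phi)),
    # T(y) = y, a(eta) = log(1 + e^eta), b(y) = 1
    eta = math.log(phi / (1 - phi))
    a = math.log(1 + math.exp(eta))
    return 1 * math.exp(eta * y - a)

phi = 0.3
for y in (0, 1):
    assert abs(bernoulli_pmf(y, phi) - bernoulli_exp_family(y, phi)) < 1e-12

# Inverting eta = log(phi / (1 - phi)) via the sigmoid recovers phi:
eta = math.log(phi / (1 - phi))
assert abs(1 / (1 + math.exp(-eta)) - phi) < 1e-12
```

The assertions pass because exp(ηy − log(1 + eη)) equals φ when y = 1 and 1 − φ when y = 0.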
b. Gaussian distribution (for simplicity, set σ2 = 1):
p(y; μ) = (1/√2π) exp(−(y − μ)2/2) = (1/√2π) exp(−y2/2) exp(μy − μ2/2)
Thus, we see that the Gaussian is in the exponential family:
η = μ
T(y) = y
a(η) = μ2/2 = η2/2
b(y) = (1/√2π) exp(−y2/2).
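The same kind of check works for the Gaussian (again a minimal sketch with our own helper names, assuming σ2 = 1): the factorization b(y) exp(ηT(y) − a(η)) should agree with the N(μ, 1) density.

```python
import math

def gaussian_pdf(y, mu):
    # N(mu, 1) density: (1 / sqrt(2*pi)) * exp(-(y - mu)^2 / 2)
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-(y - mu)**2 / 2)

def gaussian_exp_family(y, mu):
    # Exponential-family form with eta = mu, T(y) = y,
    # a(eta) = eta^2 / 2, b(y) = (1 / sqrt(2*pi)) * exp(-y^2 / 2)
    eta = mu
    a = eta**2 / 2
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-y**2 / 2)
    return b * math.exp(eta * y - a)

for y in (-1.0, 0.0, 2.5):
    assert abs(gaussian_pdf(y, 1.7) - gaussian_exp_family(y, 1.7)) < 1e-12
```

Expanding the exponent −(y − μ)2/2 = −y2/2 + μy − μ2/2 shows why the two functions agree term by term.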