Generalized Linear Models & Softmax Regression

η is called the natural parameter (also called the canonical parameter) of the distribution;

T (y) is the sufficient statistic (for the distributions we consider, it will often be the case that T (y) = y);

a(η) is the log partition function. The quantity e^(−a(η)) essentially plays the role of a normalization constant, that makes sure the distribution p(y; η) sums/integrates over y to 1.

A fixed choice of T , a and b defines a family (or set) of distributions that is parameterized by η; as we vary η, we then get different distributions within this family.