Discrete Choice Model - Multinomial Choice Models

1. Multinomial Choice Models

Let $C$ denote the travel mode choice set (including all potential mode choices for some population), and define $J$ to be the number of travel modes in the choice set.

Example: $C$ may consist of more than two travel modes, such as:

Driving alone
Sharing a ride
Taxi
Motorcycle
Bicycle
Walking
Transit bus
Rail rapid transit

For a particular traveler $n$, the actual choice set, $C_n$, may be considerably smaller (e.g., if the traveler does not own a car, then driving alone is infeasible).

Given that each individual has a feasible choice set denoted by $C_n$, we define $J_n \leq J$ to be the number of feasible choices. From travel mode choice modelling based on random utility maximization, we have

\[\begin{align*} \Pr{}_n(i) &= \Pr(U_{in} \geq U_{jn}, \forall j \in C_{n}, j \neq i) \\ &= \Pr(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn}, \forall j \in C_{n}, j \neq i) \\ &= \Pr(\varepsilon_{jn} \leq V_{in} - V_{jn} + \varepsilon_{in}, \forall j \in C_{n}, j \neq i) \end{align*} \]

Any particular multinomial choice model can be derived using the above equation given the specific joint distribution of random terms.

Suppose the joint density function of the disturbance terms is $f(\varepsilon_{1n}, \varepsilon_{2n}, \cdots, \varepsilon_{J_n n})$. Without loss of generality, consider alternative $i$ to be the first alternative in $C_n$:

\[\begin{aligned} P_n(1)= & \int_{\varepsilon_{1 n}=-\infty}^{\infty} \int_{\varepsilon_{2 n} = -\infty}^{V_{1n} - V_{2 n} + \varepsilon_{1n}} \ldots \\ & \int_{\varepsilon_{J_n n}= -\infty}^{V_{1n} - V_{J_n n} + \varepsilon_{1n}} f \left(\varepsilon_{1 n}, \varepsilon_{2 n}, \ldots, \varepsilon_{J_n n} \right) \mathrm{d} \varepsilon_{J_n n} \mathrm{d} \varepsilon_{J_{n}-1, n} \ldots \mathrm{d} \varepsilon_{1 n} \end{aligned} \]

We can also write

\[\Pr{}_n(i) = \Pr{} \left[ V_{i n} + \varepsilon_{in} \geq \max_{j \in C_n, j \neq i} (V_{j n} + \varepsilon_{j n} ) \right] \]

1.1 Multinomial Logit (MNL) Model

(1) Assumptions

\[U_{in} = V_{in} + \varepsilon_{in} \]

The multinomial logit model can be obtained based on the following assumptions on the random terms $\varepsilon_{in}$:

They are independent,
Identically distributed
Gumbel variates with the same parameters, $\text{Gumbel}(\eta, \mu)$
- with mean of $\eta + \frac{\gamma}{\mu}$
- with variance of $\frac{\pi^2}{6\mu^2}$
- where Euler-Mascheroni constant $\gamma=0.57721\ldots$

Remarks: The assumption that the random terms follow the Gumbel distribution can be defended as an approximation of the normal distribution. It is used for analytical convenience.

However, the assumption that the random terms are independent and identically distributed (IID) is a strong restriction:

variances of the random components of the utilities are equal;
the random terms are, in some situations, not independent.

(2) Calculation of probability $\Pr{}_n(i)$

Under the IID assumption, the multinomial logit (MNL) model can be derived as:

\[\Pr{}_n(i) = \Pr(V_{in}+\varepsilon_{in} \geq V_{jn}+\varepsilon_{jn}, \forall j \neq i, j \in C_n) = \frac{\exp(\mu V_{in})}{\displaystyle \sum_{j \in C_n} \exp(\mu V_{jn})} \]

where parameter $\mu$ is related to the common standard deviation (variance) of the Gumbel variate by

\[\sigma^2 = \frac{\pi^2}{6 \mu^2}, \quad \text{ or } \quad \mu = \frac{\pi}{\sqrt{6}\sigma} \]

where $\mu$ is a scaling parameter.
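The closed form above is simple to evaluate numerically. The following is a minimal sketch, not a reference implementation; the function name `mnl_probabilities` and the utility values are hypothetical and chosen only for illustration.

```python
import math

def mnl_probabilities(V, mu=1.0):
    """Return MNL choice probabilities for one traveler.

    V maps each feasible mode in C_n to its systematic utility V_in.
    """
    # Subtract the largest scaled utility before exponentiating to avoid
    # overflow; the common factor cancels in the ratio.
    m = max(mu * v for v in V.values())
    expV = {mode: math.exp(mu * v - m) for mode, v in V.items()}
    total = sum(expV.values())
    return {mode: e / total for mode, e in expV.items()}

# Hypothetical systematic utilities for one traveler's feasible set C_n
V_n = {"drive": -1.0, "bus": -1.5, "rail": -1.2}
probs = mnl_probabilities(V_n, mu=1.0)
print(probs)
```

The probabilities sum to one over $C_n$, and modes with higher systematic utility receive higher probability.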

Proof: Formulation of multinomial logit model

For convenience, assume $\eta=0$ for all the disturbances $\varepsilon_{in}$, i.e., $\varepsilon_{in} \overset{I.I.D.}{\sim} \text{Gumbel}(0, \mu)$. Then, order the alternatives so that $i = 1$, then we have:

\[\begin{aligned} \Pr{}_n(1) &= \Pr{} \left[ V_{1 n} + \varepsilon_{1n} \geq \max_{j =2, 3, \cdots, J_n} (V_{j n} + \varepsilon_{j n} ) \right] \\ &= \Pr{} \left[ V_{1 n} + \varepsilon_{1n} \geq U_n^* \right] \end{aligned} \]

where we define $U_n^* = \max \limits_{j =2, 3, \cdots, J_n} ( V_{j n} + \varepsilon_{j n})$. Because the maximum of independent Gumbel variates with a common scale parameter $\mu$ is again Gumbel distributed, $U_n^*$ follows a Gumbel distribution:

\[U_n^* = \max \limits_{j =2, 3, \cdots, J_n} ( V_{j n} + \varepsilon_{j n}) \sim \text{Gumbel} \left(\frac{1}{\mu} \ln \left[ \sum_{j=2}^{J_n} \exp (\mu V_{jn}) \right], \ \mu \right) \]

Then, we can write $U^*_n = V^*_n + \varepsilon^*_n$, where

\[V^*_n = \frac{1}{\mu} \ln \left[ \sum_{j=2}^{J_n} \exp (\mu V_{jn}) \right], \quad \text{and} \quad \varepsilon^*_n \sim \text{Gumbel} \left(0, \mu \right) \]

Then, we have:

\[\begin{aligned} \Pr{}_n(1) & = \Pr{} \left(V_{1 n}+\varepsilon_{1 n} \geq V_n^*+\varepsilon_n^*\right) \\ & = \Pr{} \left[\left(V_n^*+\varepsilon_n^*\right)-\left(V_{1 n}+\varepsilon_{1 n}\right) \leq 0\right] \\ & = \Pr{} \left[\varepsilon_n^* - \varepsilon_{1 n} \leq V_{1 n} - V_n^* \right] \end{aligned} \]

Since $\varepsilon_n^*$ and $\varepsilon_{1 n}$ are independent Gumbel variates with the same scale parameter, their difference follows the logistic distribution, i.e., $\varepsilon_n^* - \varepsilon_{1 n} \sim \text{Logistic} \left(0, \mu \right)$. Thus, we have

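\[\begin{aligned} \Pr{}_n(1) &= F(V_{1 n} - V_n^*) \\ &= \frac{1}{1+\exp[-\mu (V_{1 n} - V_n^*)]} \\ &= \frac{\exp(\mu V_{1n})}{\exp(\mu V_{1n}) + \exp(\mu V_n^*)} \\ &= \frac{\exp(\mu V_{1n})}{\displaystyle \sum_{j \in C_n} \exp(\mu V_{jn})} \end{aligned} \]

where $F$ is the CDF of $\text{Logistic}(0, \mu)$, and the last step uses $\exp(\mu V_n^*) = \sum_{j=2}^{J_n} \exp(\mu V_{jn})$.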

Q.E.D.
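The result can also be checked by simulation. The sketch below draws IID $\text{Gumbel}(0, \mu)$ disturbances by inverting the CDF $F(\varepsilon) = \exp(-\exp(-\mu \varepsilon))$, picks the mode with the highest utility in each draw, and compares the simulated shares with the logit formula. The utilities, the value of $\mu$, the seed, and the function names are all assumptions made for illustration.

```python
import math
import random

def gumbel_draw(mu, rng):
    """Draw from Gumbel(0, mu) by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u)) / mu

def simulate_choice_shares(V, mu, n_draws, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(V)
    for _ in range(n_draws):
        utilities = [v + gumbel_draw(mu, rng) for v in V]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

def logit_shares(V, mu):
    """Closed-form MNL probabilities."""
    expV = [math.exp(mu * v) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

V = [-1.0, -1.5, -1.2]  # hypothetical systematic utilities
mu = 2.0
simulated = simulate_choice_shares(V, mu, n_draws=100_000)
closed_form = logit_shares(V, mu)
print(simulated)
print(closed_form)
```

With enough draws the two lists should agree up to simulation noise.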

Remarks: by setting $\mu=1.0$, we have

\[\Pr{}_n(i) = \frac{\exp(V_{in})}{\displaystyle \sum_{j \in C_n} \exp(V_{jn})} \]

Different values of $\mu$ only rescale the parameters in the utility functions $V_{jn}, j \in C_{n}$:

\[\begin{align*} V^{\text{new}}_{jn} &= \mu V_{jn}, \quad j \in C_{n} \\ V_{jn} {\ \ \ } &= \beta_{1} x_{jn1} + \beta_{2} x_{jn2} + \beta_{3} x_{jn3} + \cdots + \beta_{K} x_{jnK} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn}, \quad j \in C_{n} \end{align*} \]
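A consequence of this rescaling is that only the product $\mu \boldsymbol{\beta}$ affects the choice probabilities, which is why $\mu$ is commonly fixed to 1. The sketch below illustrates this with hypothetical attribute values (travel time and cost) and hypothetical coefficients; the function name `mnl_linear` is likewise an assumption.

```python
import math

def mnl_linear(beta, X_n, mu):
    """MNL probabilities with V_jn = beta^T x_jn; X_n has one vector per mode."""
    scores = [mu * sum(b * x for b, x in zip(beta, x_j)) for x_j in X_n]
    m = max(scores)  # shift for numerical stability
    expS = [math.exp(s - m) for s in scores]
    total = sum(expS)
    return [e / total for e in expS]

# Hypothetical attributes [travel time, cost] for three modes
X_n = [[10.0, 2.0], [25.0, 1.0], [18.0, 1.5]]
beta = [-0.05, -0.4]

p_a = mnl_linear(beta, X_n, mu=1.0)
# Doubling mu while halving beta leaves mu * beta unchanged
p_b = mnl_linear([b / 2.0 for b in beta], X_n, mu=2.0)
print(p_a)
print(p_b)
```

Both calls return the same probabilities, so the data cannot distinguish the two parameterizations.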

(3) Extreme Cases:

There are two extreme cases of the MNL model that result from extreme values of $\mu$:

Case 1: $\mu \to 0$, i.e., $\sigma \to + \infty$ (recall $\displaystyle \mu = \frac{\pi}{\sqrt{6} \sigma}$)

\[\lim_{\mu \to 0} \Pr{}_n(i) = \frac{1}{J_n}, \quad \forall i \in C_n \]

As $\mu \to 0$, the variance of the disturbances approaches infinity, i.e., $\sigma \to + \infty$. The choice model then provides no information, so the travel modes are equally likely to be selected.

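Case 2: $\mu \to + \infty$, i.e., $\sigma \to 0$ (recall $\displaystyle \mu = \frac{\pi}{\sqrt{6} \sigma}$)

\[\lim_{\mu \to + \infty} \Pr{}_n(i) = \lim_{\mu \to + \infty} \frac{1}{\displaystyle 1 + \sum_{j \in C_n, j \neq i} \exp \big[ \mu (V_{jn} - V_{in}) \big]} = \begin{cases} 1, & \text{if } V_{in} > \max \limits_{j \in C_n, j \neq i} V_{jn} \\ 0, & \text{if } V_{in} < \max \limits_{j \in C_n, j \neq i} V_{jn} \end{cases} \]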

As $\mu \to + \infty$, the variance of the utility disturbances approaches zero, i.e., $\sigma \to 0$, and a deterministic choice model is obtained because all the information about individual preferences is included in the systematic utilities.
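Both limits can be seen numerically by evaluating the MNL formula at a very small and a very large $\mu$. This is only an illustrative sketch; the utilities and the chosen values of $\mu$ are hypothetical.

```python
import math

def mnl_probabilities(V, mu):
    """MNL probabilities for a list of systematic utilities V."""
    scaled = [mu * v for v in V]
    m = max(scaled)  # shift so the largest exponent is 0 (no overflow)
    expV = [math.exp(s - m) for s in scaled]
    total = sum(expV)
    return [e / total for e in expV]

V = [-1.0, -1.5, -1.2]  # hypothetical utilities; the first mode is best

p_small = mnl_probabilities(V, mu=1e-6)  # near-infinite variance
p_large = mnl_probabilities(V, mu=1e3)   # near-zero variance
for mu in (1e-6, 1.0, 1e3):
    print(mu, [round(p, 4) for p in mnl_probabilities(V, mu)])
```

For small $\mu$ the shares approach $1/J_n$, while for large $\mu$ essentially all probability goes to the mode with the highest systematic utility.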

(4) Linear-in-Parameters Logit Model

Restricting $V_{in}$ to the class of linear-in-parameters functions gives:

\[\Pr{}_n(i) = \frac{\exp \left(\mu \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} \right)}{\displaystyle \sum_{j \in C_n} \exp \left(\mu \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} \right)} \]

where $\boldsymbol{x}_{in}$ and $\boldsymbol{x}_{jn}$ are vectors describing the attributes of travel modes $i$ and $j$ and of traveler $n$.

(5) Properties of MNL Model

Independence of Irrelevant Alternatives (IIA) Property: The IIA property holds that for a specific individual, the ratio of the choice probabilities of any two travel modes is entirely unaffected by the systematic utilities of any other travel mode in the choice set.

\[\frac{\Pr{}_n(i)}{\Pr{}_n(l)} = \frac{\dfrac{\exp \left(\mu V_{in}\right)}{\sum_{j \in C_n} \exp \left(\mu V_{jn} \right)} }{\dfrac{\exp \left(\mu V_{ln} \right)}{\sum_{j \in C_n} \exp \left(\mu V_{jn} \right)} } = \frac{\exp \left(\mu V_{in}\right)}{\exp \left(\mu V_{ln}\right)} = \exp \big[\mu (V_{in} - V_{ln}) \big] \]

The ratio depends only on $V_{in}$ and $V_{ln}$; thus, it is independent of the remaining travel modes.

This property is perceived as a disadvantage which makes the model fail in the presence of correlated alternatives.

Remark: Although models other than multinomial logit model produce different numerical results, any models based on the assumption that all the random terms are mutually independent and identically distributed would necessarily yield counterintuitive choice probabilities for the classical red bus/blue bus problem.
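The red bus/blue bus problem can be reproduced in a few lines. The sketch assumes, purely for illustration, that car and bus have equal systematic utilities and that the blue bus is identical to the red bus except for its colour. Intuition suggests the car should keep a share of 1/2 with the two buses splitting the rest, but MNL assigns 1/3 to each mode because the car-to-red-bus ratio cannot change.

```python
import math

def mnl_probabilities(V, mu=1.0):
    """MNL probabilities for a dict of systematic utilities."""
    expV = {mode: math.exp(mu * v) for mode, v in V.items()}
    total = sum(expV.values())
    return {mode: e / total for mode, e in expV.items()}

# Hypothetical equal utilities for all modes
two_modes = mnl_probabilities({"car": 0.0, "red bus": 0.0})
three_modes = mnl_probabilities({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})

# IIA: the car / red bus ratio is the same before and after adding the blue bus
ratio_before = two_modes["car"] / two_modes["red bus"]
ratio_after = three_modes["car"] / three_modes["red bus"]
print(two_modes, three_modes, ratio_before, ratio_after)
```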

1.2 Multinomial Probit (MNP) Model

(1) Assumption

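The utility is again $U_{in} = V_{in} + \varepsilon_{in}$, where the random terms are jointly normally distributed: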

\[(\varepsilon_{1n}, \varepsilon_{2n}, \cdots, \varepsilon_{{J_n} \,n}) \sim \mathcal{N}(0, \mathbf{\Sigma}) \]

Covariance matrix:

\[\mathbf{\Sigma} = \Big[ \mathbb{E} \big[(\varepsilon_{in} -\mathbb{E}[\varepsilon_{in}]) \times (\varepsilon_{jn} -\mathbb{E}[\varepsilon_{jn}]) \big] \Big]_{J_{n} \times J_{n}} \]

Means of the random terms: $\mathbb{E} [\varepsilon_{in}] = 0, i \in C_{n}$

(2) Calculation of Probability $\Pr{}_n(i)$

\[\begin{align*} \Pr{}_n(i) &= \Pr{}(V_{in}+\varepsilon_{in} \geq V_{jn} + \varepsilon_{jn}, \, \forall j \neq i, j \in C_n) \\ &= \int \limits_{y_1<y_i} \ \int \limits_{y_2<y_i} \cdots \int \limits_{y_i=-\infty}^{y_i=+\infty} \cdots \int \limits_{y_{_{J_n}}<y_i} \left\{ \left(\frac{1}{2\pi}\right)^{\frac{J_n}{2}} \times |\mathbf{\Sigma}|^{-\frac{1}{2}} \times \exp \left[ -\frac{1}{2} (\boldsymbol{Y} - \boldsymbol{V})^{\top} \mathbf{\Sigma}^{-1} (\boldsymbol{Y} - \boldsymbol{V}) \right] \right\}\, \mathrm{d} y_1 \, \mathrm{d} y_2 \cdots \mathrm{d} y_{{J_n}} \end{align*} \]

where $y_j$ denotes a value of the utility $U_{jn}$, and the vectors $\boldsymbol{Y}$ and $\boldsymbol{V}$ are:

\[\begin{align*} \boldsymbol{Y} &= \left[y_1, y_2, \cdots, y_{{J_n}} \right]^{\top} \\ \boldsymbol{V} &= \left[V_{1n}, V_{2n}, \cdots, V_{J_n \, n} \right]^{\top} \end{align*} \]

The travel mode choice probability cannot be expressed analytically.
In the binary case, it reduces to a one-dimensional normal CDF and is easy to compute.
For the general case, we have to use numerical approximation or Monte Carlo simulation methods.
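As a rough illustration of the Monte Carlo approach, the sketch below uses a simple frequency simulator: it draws correlated normal disturbances through a Cholesky factor of $\mathbf{\Sigma}$ and counts how often each mode has the highest utility. This is one of several possible simulators, not the method of any particular package; the covariance matrices, utilities, and function names are hypothetical.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def simulate_mnp_shares(V, Sigma, n_draws=50_000, seed=0):
    """Frequency simulator for multinomial probit choice probabilities."""
    rng = random.Random(seed)
    L = cholesky(Sigma)
    n = len(V)
    counts = [0] * n
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # eps = L z has mean 0 and covariance Sigma
        eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        U = [V[i] + eps[i] for i in range(n)]
        counts[U.index(max(U))] += 1
    return [c / n_draws for c in counts]

V = [0.0, 0.0, 0.0]  # car, red bus, blue bus (hypothetical equal utilities)
Sigma_iid = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Sigma_corr = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.99], [0.0, 0.99, 1.0]]

shares_iid = simulate_mnp_shares(V, Sigma_iid)
shares_corr = simulate_mnp_shares(V, Sigma_corr)
print(shares_iid)
print(shares_corr)
```

With independent disturbances the three shares are close to 1/3 each. When the two bus disturbances are highly correlated, the car share moves back toward 1/2, which is the flexibility that the IID-based models lack.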

1.3 Remarks on Function Specification of Deterministic Component

If a given variable does not vary over travel modes (e.g., traveler income), then we can include it in at most $J-1$ travel modes, where $J$ is the total number of alternatives/travel modes.
Categorizing a continuous variable is usually inadvisable.
Use dummy variables to encode a categorical variable, with at most $L-1$ dummy variables for a variable with $L$ categories.
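The $L-1$ rule for dummy variables can be sketched as follows. The first category is treated as the reference level and gets no indicator of its own; the function name `dummy_encode` and the income categories are hypothetical.

```python
def dummy_encode(value, categories):
    """Encode a categorical value with L-1 indicators.

    The first entry of `categories` is the reference level and is
    represented by all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[1:]]

income_levels = ["low", "medium", "high"]  # L = 3 hypothetical categories
encoded = {level: dummy_encode(level, income_levels) for level in income_levels}
print(encoded)
```

Using all $L$ indicators together with a constant would make the columns linearly dependent, so one level has to be dropped.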

5. Maximum Likelihood Method

5.1 Likelihood

Likelihood:

Likelihood function for the multinomial choice model with a sample of $N$ travelers:

\[L(\beta_1, \beta_2, \cdots, \beta_K) = \prod_{n=1}^{N} \prod_{i \in C_n} \Pr{}_n(i)^{y_{n,i}} \]

where:

\[y_{n,i} = \begin{cases} 1, & \text{If traveler $n$ selects mode $i$} \\ 0, & \text{Otherwise} \end{cases} \]

Log-Likelihood Functions

\[LL(\beta_1, \beta_2, \cdots, \beta_K) \triangleq \ln [L(\beta_1, \beta_2, \cdots, \beta_K)] \]
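Taking the logarithm turns the product into a sum. As a worked step, and assuming the MNL form with $\mu = 1$ and linear-in-parameters utilities $V_{in} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}$ (one possible specification, not the only one), the log-likelihood expands to:

\[\begin{aligned} LL(\boldsymbol{\beta}) &= \sum_{n=1}^{N} \sum_{i \in C_n} y_{n,i} \ln \Pr{}_n(i) \\ &= \sum_{n=1}^{N} \sum_{i \in C_n} y_{n,i} \left[ \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} - \ln \sum_{j \in C_n} \exp \left( \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} \right) \right] \end{aligned} \]

Only the chosen mode of each traveler contributes to the sum, since $y_{n,i} = 0$ for every other mode.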

5.2 Maximum Likelihood Method

Purpose: Find the values of $\boldsymbol{\beta}$ that are most likely to result in the travel mode choices observed in the sample.

Therefore, we maximize, over $\{\beta_1, \beta_2, \cdots, \beta_K\}$ the likelihood function $L(\beta_1, \beta_2, \cdots, \beta_K)$ or log-likelihood function $\ln [L(\beta_1, \beta_2, \cdots, \beta_K)]$, namely:

\[\max_{\beta_1, \beta_2, \cdots, \beta_K} L(\beta_1, \beta_2, \cdots, \beta_K) \qquad \Leftrightarrow \qquad \max_{\beta_1, \beta_2, \cdots, \beta_K} LL(\beta_1, \beta_2, \cdots, \beta_K) \]
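To make the maximization concrete, the sketch below estimates an MNL model (with $\mu = 1$) by plain gradient ascent on the average log-likelihood, using the gradient $\partial LL / \partial \beta_k = \sum_n \sum_i (y_{n,i} - \Pr{}_n(i)) x_{ink}$. Everything here is an assumption for illustration: the data are synthetic, the step size and iteration count are hand-picked, and dedicated packages use more refined optimizers.

```python
import math
import random

def probabilities(beta, X_n):
    """MNL probabilities (mu = 1) for one traveler; X_n holds one vector per mode."""
    V = [sum(b * x for b, x in zip(beta, x_j)) for x_j in X_n]
    m = max(V)
    expV = [math.exp(v - m) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

def log_likelihood(beta, X, y):
    """LL(beta); y[n] is the index of the mode chosen by traveler n."""
    return sum(math.log(probabilities(beta, X_n)[y_n]) for X_n, y_n in zip(X, y))

def gradient(beta, X, y):
    """Gradient of LL: sum over n, i of (y_ni - Pr_n(i)) * x_in."""
    g = [0.0] * len(beta)
    for X_n, y_n in zip(X, y):
        P = probabilities(beta, X_n)
        for i, x_i in enumerate(X_n):
            w = (1.0 if i == y_n else 0.0) - P[i]
            for k, x in enumerate(x_i):
                g[k] += w * x
    return g

def fit_mnl(X, y, K, step=2.0, n_iter=300):
    """Gradient ascent on the average log-likelihood, starting from beta = 0."""
    beta = [0.0] * K
    N = len(X)
    for _ in range(n_iter):
        g = gradient(beta, X, y)
        beta = [b + step * g_k / N for b, g_k in zip(beta, g)]
    return beta

# Synthetic sample: 200 travelers, 3 modes, 2 attributes drawn from U(0, 1)
rng = random.Random(42)
true_beta = [-2.0, -1.0]  # hypothetical "true" coefficients
X = [[[rng.random(), rng.random()] for _ in range(3)] for _ in range(200)]
y = [rng.choices(range(3), weights=probabilities(true_beta, X_n))[0] for X_n in X]

beta_hat = fit_mnl(X, y, K=2)
print(beta_hat, log_likelihood(beta_hat, X, y), log_likelihood([0.0, 0.0], X, y))
```

Because the MNL log-likelihood is concave in $\boldsymbol{\beta}$ for linear-in-parameters utilities, a simple ascent scheme like this reaches the global maximum, although Newton-type methods typically need far fewer iterations.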

5.3 Variable Selection Principles

Signs and relative magnitudes of the estimated parameters.
- For example, we expect the coefficient of price to be negative in a choice model.
Statistical test to test the significance of each parameter.
Goodness of fit measure.

\[\begin{array}{ll|cc} \hline & & \text{Policy} & \text{Other} \\ \hline \text{Correct} & \text{Significant} & \text{Include} & \text{Include} \\ \text{sign} & \text{Not significant} & \text{Include} & \text{May reject} \\ \hline \text{Wrong} & \text{Significant} & \text{Big Problem} & \text{Reject} \\ \text{sign} & \text{Not significant} & \text{Problem} & \text{Reject} \\ \hline \end{array} \]

5.4 Properties of Log-Likelihood Function

We find the estimation of $\boldsymbol{\beta}$, denoted by $\hat{\boldsymbol{\beta}}$, by maximizing the log-likelihood function.

The log-likelihood function is non-positive, because every choice probability is at most 1.

$LL = 0$: the likelihood function equals 1, i.e., $L=1$, which indicates perfect prediction.
$LL(0)$: the log-likelihood with all parameters set to zero, which corresponds to equally likely choices.

Goodness of Fit:

$\rho^2$ is analogous to $R^2$ in the linear regression method.
Likelihood ratio index $\rho^2$: measures the fraction of the initial log-likelihood that is explained by the model

\[\rho^2 \triangleq 1 - \frac{LL(\hat{\boldsymbol{\beta}})}{LL(0)} \]

$\rho^2$ is non-decreasing in the number of parameters $K$, so adding variables never lowers it.
$\rho^2$ must lie between 0 and 1.
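A small numerical sketch of the likelihood ratio index follows. It assumes $LL(0)$ is computed under equal shares, so that traveler $n$ contributes $\ln(1/J_n)$; the sample size and the fitted log-likelihood value are hypothetical.

```python
import math

def rho_squared(ll_beta_hat, ll_zero):
    """Likelihood ratio index: 1 - LL(beta_hat) / LL(0)."""
    return 1.0 - ll_beta_hat / ll_zero

def ll_equal_shares(choice_set_sizes):
    """LL(0): traveler n picks each feasible mode with probability 1 / J_n."""
    return -sum(math.log(J_n) for J_n in choice_set_sizes)

sizes = [3] * 100             # 100 travelers, each with 3 feasible modes
ll_zero = ll_equal_shares(sizes)
ll_hat = -80.0                # hypothetical fitted log-likelihood
rho2 = rho_squared(ll_hat, ll_zero)
print(ll_zero, rho2)
```

A model that does no better than equal shares gives $\rho^2 = 0$, and a perfect prediction ($LL = 0$) gives $\rho^2 = 1$.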

5.5 Free Choice Modelling Software

Apollo: R, official website

"Apollo is a completely free package which does not rely on commercial statistical software as a host environment. It relies on R, which is very widely used across disciplines and works well across different operating systems. Several existing packages refer to specific models in their name (e.g. ALogit, NLogit) which is not applicable in our case given the wider set of models we cover."

mlogit: R, paper

"Estimation of multinomial logit models in R: The mlogit Packages"

biogeme: Python, official website

posted @ 2023-01-12 11:12 veager