Calculus 微积分


In differential calculus, an infinitesimally small increment \(\Delta x\) of a variable is called its "differential", usually written \(\mathrm{d} x\).

Limit 极限

The limit of a function 函数极限

[Def] limit: Given a function, \(f(x):\R\mapsto \R\) ,

\[\lim_{x\to a}f(x)=L \iff \forall \epsilon >0 \exists \delta >0: 0<|x-a|<\delta \implies |f(x)-L|<\epsilon \]

\[\lim_{x\to x_0} f(x)= A\in \R\cup\{+\infty,-\infty\} \iff \lim_{x\to x_0^+} f(x)=A \wedge \lim_{x\to x_0^-} f(x)=A \]

[Def] limit for multivariate functions 多元函数极限:
Given a multivariate function \(f(\bm x):\R^n\mapsto \R\) , and its domain \(D\) :

\[\lim_{\bm x \to \bm a} f(\bm x)=L \iff \forall \epsilon>0\ \exists\delta>0: 0<\|\bm x-\bm a\|_2=\sqrt{\sum_{i=1}^n (x_i-a_i)^2}<\delta \implies |f(\bm x)-L|<\epsilon \]

That is, the "neighborhood" of \(\bm a\) is bounded using the Euclidean distance.

One-side limit 单侧极限

Limit laws: suppose \(\lim_{x\to a} f(x)\) and \(\lim_{x\to a} g(x)\) both exist, and let \(c\) be a constant. Then

\[\begin{aligned} &\lim _{x \rightarrow a}[f(x)+g(x)]=\lim _{x \rightarrow a} f(x)+\lim _{x \rightarrow a} g(x) \\ &\lim _{x \rightarrow a}[f(x)-g(x)]=\lim _{x \rightarrow a} f(x)-\lim _{x \rightarrow a} g(x) \\ &\lim _{x \rightarrow a}[c f(x)]=c \lim _{x \rightarrow a} f(x) \\ &\lim _{x \rightarrow a}[f(x) g(x)]=\lim _{x \rightarrow a} f(x) \cdot \lim _{x \rightarrow a} g(x) \\ &\lim _{x \rightarrow a} \frac{f(x)}{g(x)}=\frac{\lim _{x \rightarrow a} f(x)}{\lim _{x \rightarrow a} g(x)} \text { if } \lim _{x \rightarrow a} g(x) \neq 0 \end{aligned} \]


The Squeeze Theorem 夹逼定理

If \(f(x)\le g(x)\le h(x)\) holds everywhere in a punctured neighborhood of \(a\), and \(\lim_{x\to a}f(x)=\lim_{x\to a}h(x)=L\) , then \(\lim_{x\to a} g(x)=L\) .
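As a rough numerical illustration of the squeeze theorem (a sketch, not a proof), the snippet below uses the textbook example \(g(x)=x^2\sin(1/x)\) squeezed between \(-x^2\) and \(x^2\) near \(a=0\). The example functions and the names `g`, `squeezed`, and `samples` are choices made here, not taken from the notes.

```python
import math

def g(x: float) -> float:
    """Middle function x^2 * sin(1/x), defined for x != 0."""
    return x * x * math.sin(1.0 / x)

def squeezed(x: float) -> bool:
    """Check -x^2 <= g(x) <= x^2 at one point of the punctured neighborhood."""
    return -x * x <= g(x) <= x * x

# Sample points approaching a = 0 from both sides.
samples = [s * 10.0 ** (-k) for k in range(1, 8) for s in (1.0, -1.0)]

for x in samples:
    print(f"x={x:+.0e}  lower={-x * x:.3e}  g(x)={g(x):+.3e}  upper={x * x:.3e}")
```

Both bounds shrink to 0 as \(x\to 0\), so the values of `g` printed between them are forced toward 0 as well.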


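L'Hôpital's rule (also spelled L'Hospital's rule) is named after the French mathematician Guillaume de l'Hôpital, but it was actually discovered by the Swiss mathematician Johann Bernoulli.

Statement: let \(\bar\R=\R\cup\{\infty, -\infty\}\) denote the extended real numbers and let \(c\in \bar\R\) . Suppose \(f(x), g(x)\) are differentiable near \(x=c\) with derivatives \(f'(x),g'(x)\) , that \(g'(x)\ne 0\) near \(c\) , and that \(\lim_{x\to c}\frac{f'(x)}{g'(x)}\in \bar\R\) exists. If \(\lim_{x\to c}{f(x)}=\lim_{x\to c}g(x)=0\) or \(\lim_{x\to c}|f(x)|=\lim_{x\to c}|g(x)|=\infty\) (in other words, \(\frac{f(x)}{g(x)}\) is of type \(\frac00\) or \(\frac{\infty}{\infty}\) ), then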

\[\lim_{x\to c}\frac{f(x)}{g(x)}=\lim_{x\to c} \frac{f'(x)}{g'(x)} \]

Indeterminate forms of type \(0\cdot\infty, \infty-\infty, 0^0, 1^{\infty}, \infty^0\) can all be converted to the basic types \(\frac{0}{0}, \frac{\infty}{\infty}\) and then evaluated. See the "Other indeterminate forms" section of L'Hôpital's_rule.
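A small numerical sanity check of the rule, assuming the illustrative pair \(f(x)=e^x-1\), \(g(x)=\sin x\) (a \(\frac00\) form at \(c=0\) chosen here, not from the notes): the ratio \(f/g\) and the ratio of derivatives \(f'/g'=e^x/\cos x\) should approach the same value.

```python
import math

def ratio(x: float) -> float:
    """f(x)/g(x) with f = e^x - 1 and g = sin x, a 0/0 form at x = 0."""
    return math.expm1(x) / math.sin(x)

def derivative_ratio(x: float) -> float:
    """f'(x)/g'(x) = e^x / cos x."""
    return math.exp(x) / math.cos(x)

for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"x={x:.0e}  f/g={ratio(x):.8f}  f'/g'={derivative_ratio(x):.8f}")
```

Both columns tend to 1, which agrees with \(\lim_{x\to 0} e^x/\cos x = 1\).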


\[\lim_{x\to 0} \frac{\sin x}{x} = 1 \]

Note how the shape of the curve \(y=\sin x /x\ (x\neq 0)\) , especially near \(x=0\) , relates to the limit value \(\lim_{x\to 0}\sin x/x\) .


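If two infinitesimals \(f(x), g(x)\) satisfy

\[\lim_{f(x)\to 0, g(x)\to 0}\frac{f(x)}{g(x)}= 0 \]

then \(f(x)\) is called a higher-order infinitesimal than \(g(x)\) .

If \(\lim_{f(x)\to 0, g(x)\to 0}\frac{f(x)}{g(x)}= 1\) , then \(f(x)\) and \(g(x)\) are called equivalent infinitesimals.
If \(\lim_{f(x)\to 0, g(x)\to 0}\frac{f(x)}{g(x)}= c\ (c\neq 0)\) , then \(f(x)\) and \(g(x)\) are called infinitesimals of the same order.
If \(\lim_{f(x)\to 0, g(x)\to 0}\frac{f(x)}{[g(x)]^k}= c\ (c\neq 0, k>0)\) , then \(f(x)\) is called an infinitesimal of order \(k\) with respect to \(g(x)\) .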
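To make the notion of order concrete, here is a minimal sketch that estimates the order \(k\) numerically from two sample points. The pair \(f(x)=1-\cos x\), \(g(x)=x\) and the helper `estimate_order` are illustrative assumptions; the estimate relies on \(f\approx c\,g^k\) already holding at the sampled points.

```python
import math

def estimate_order(f, g, x1=1e-2, x2=1e-3):
    """Estimate k in f(x) ~ c * g(x)**k as x -> 0 from two sample points."""
    return math.log(abs(f(x1) / f(x2))) / math.log(abs(g(x1) / g(x2)))

def f(x: float) -> float:
    return 1.0 - math.cos(x)

def g(x: float) -> float:
    return x

k = estimate_order(f, g)          # close to 2
c = f(1e-3) / g(1e-3) ** 2        # close to 1/2
print(f"estimated order k = {k:.5f}, limit constant c = {c:.5f}")
```

The estimate is consistent with \(1-\cos x\) being an infinitesimal of order 2 with respect to \(x\), with \(c=\frac12\).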

Techniques of Evaluating Limits 求解极限的技巧
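  1. Direct substitution.
  2. Rationalizing radicals.
  3. Factoring.
  4. Reducing to well-known standard forms.
  5. L'Hôpital's rule.
  6. Boundedness of functions + limit laws + the squeeze theorem.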

连续性 continuity
[Def] \(f(x)\) is said to be continuous at \(x=a\) if

\[\lim_{x\to a} f(x) = f(a) \]



[Theorem] If a function \(f\) is continuous on an interval, then its inverse function (if it exists) is also continuous.



\[\lim_{x\to a} f(g(x)) = f(\lim_{x\to a} g(x)), \text{ if } f \text{ is continuous at } \lim_{x\to a}g(x) \]

渐近线 asymptote


If

\[\lim_{x\to +\infty} f(x)=c \text { or } \lim_{x\to -\infty} f(x)=c \]

then the horizontal line (parallel to the x-axis) \(y=c\) is called a horizontal asymptote of \(f(x)\) .


If

\[\lim_{x\to a} f(x)=+\infty \text { or } \lim_{x\to a} f(x)= -\infty \]

then the vertical line (parallel to the y-axis) \(x=a\) is called a vertical asymptote of \(f(x)\) .


Derivative 导数


[Def] Derivative: For a function \(f: \R\mapsto\R, x\mapsto f(x), x\in \R\) , the derivative of \(f\) with respect to \(x\) is defined as the limit

\[\frac{\mathrm{d}f}{\mathrm{d} x} := \lim_{\Delta x\to 0} \frac{f(x+\Delta x)-f(x)}{\Delta x} \]


In calculus, we also denote the derivative of a function \(f(x)\) w.r.t. \(x\) simply as \(f'(x)\) .
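The limit in the definition can be watched numerically by shrinking \(\Delta x\). This is a sketch under the assumption \(f(x)=x^3\) (so \(f'(2)=12\)); the function `difference_quotient` is a name introduced here.

```python
def difference_quotient(f, x: float, dx: float) -> float:
    """(f(x + dx) - f(x)) / dx, the quotient inside the limit."""
    return (f(x + dx) - f(x)) / dx

def f(x: float) -> float:
    return x ** 3  # exact derivative: 3 x^2

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"dx={dx:.0e}  quotient={difference_quotient(f, 2.0, dx):.6f}")
```

The printed quotients approach 12 as `dx` shrinks; taking `dx` extremely small eventually runs into floating-point round-off, so this is an illustration rather than a robust differentiation method.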

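Tools for differentiating a function of a single variable: the derivative rules. For the derivative of a relation that entangles two variables in an equation (such as \(x^2+y^2=4\) ): differential equations, obtained by differentiating both sides of the equation.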

[Def] Partial Derivative: For a real-valued multivariate function \(f: \R^n \mapsto \R, \bm x \mapsto f(\bm x), \bm x = (x_1,x_2,..., x_n)\) , the partial derivative of \(f\) with respect to \(x_i\) is defined as

\[\frac{\partial f}{\partial x_i}= \lim_{\Delta\to 0} \frac{f(x_1, ...,x_{i-1}, x_i +\Delta, x_{i+1}, ...,x_n) - f(x_1, ...,x_{i-1}, x_i , x_{i+1}, ...,x_n)}{\Delta} \]

The gradient of a real-valued function \(f: \R^n \mapsto \R, \bm x\mapsto f(\bm x), \bm x=[x_1, x_2,...,x_n]^T\in \R^{n\times 1}\) with respect to the column vector \(\bm x\) is defined as a row vector of partial derivatives:

\[\nabla_{\bm x} f = \frac{\mathrm{d} f}{\mathrm{d} \bm x}:=\left[ \frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}\right] \in \R^{1\times n} \]



It is not uncommon to define the gradient as a column vector, following the convention that vectors are generally column vectors.

The reason why we define the gradient as a row vector is twofold: (1) we can consistently generalize the gradient to vector-valued functions \(f:\R^n \mapsto \R^m\) (the gradient then becomes a matrix); (2) we can immediately apply the multivariate chain rule without paying attention to the dimension of the gradient.
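A minimal sketch of the gradient as a list of partial derivatives, approximated by central differences. The test function \(f(x_1,x_2)=x_1^2x_2+3x_2\), the step `h`, and the helper `numerical_gradient` are assumptions for illustration; the returned list plays the role of the row vector \(\nabla_{\bm x} f\).

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate [df/dx_1, ..., df/dx_n] at x by central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h    # perturb only coordinate i
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def f(x):
    # Analytic partials: df/dx_1 = 2 x_1 x_2, df/dx_2 = x_1^2 + 3
    return x[0] ** 2 * x[1] + 3.0 * x[1]

grad = numerical_gradient(f, [1.0, 2.0])
print(grad)  # approximately [4, 4]
```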


Hessian: For a real-valued function \(f: \R^n \mapsto \R, \bm x\mapsto f(\bm x)\) , the Hessian is the matrix of second-order partial derivatives of \(f\) with respect to \(\bm x\) ,

\[H\in\R^{n\times n} \text{ , where } H_{i,j}=\frac{\partial^2 f}{\partial x_i \partial x_j} \]

which is a symmetric matrix when the second partial derivatives are continuous.

If \(f: \R^{n} \mapsto \R^{m}, \bm x \mapsto f(\bm x)\) is vector-valued, then the Jacobian is

\[J \in\R^{m \times n} \text{ , where } J_{i,j}=\frac{\partial f_i}{\partial x_j} \]

and the Hessian

\[H\in\R^{m\times n\times n} \text{ , where } H_{ijk}=\frac{\partial^2 f_i}{\partial x_j \partial x_k} \]

is an ( \(m\times n\times n\) )-tensor.

In the context of neural networks, where the input dimensionality is often much higher than the dimensionality of the labels, evaluating the chain rule in reverse mode (backpropagation) is computationally significantly cheaper than evaluating it in forward mode.


[Theorem] Chain rule for multivariate composite functions 多变元复合函数的链式法则:
For a real-valued function \(f(\bm x ): \R^n\mapsto \R, \bm x=[x_1,x_2,\dots,x_n]\) , and \(\bm x(\bm u)=[ x_1(\bm u), x_2(\bm u), \dots, x_n(\bm u)]:\R^m\mapsto \R^n, \bm u=[u_1,\dots,u_m]\) , we have

\[\frac{\partial{f}}{\partial{u_j}}=\sum_{i=1}^n \frac{\partial{f}}{\partial{x_i}}\frac{\partial{x_i}}{\partial{u_j}} \\ \frac{\partial{f}}{\partial{\bm{\vec{u}}}}=\frac{\partial{f}}{\partial{\bm{\vec{x}}}}\frac{\partial{\bm{\vec{x}}}}{\partial{\bm{\vec{u}}}} \]

, where \(\frac{\partial{f}}{\partial{\bm{\vec{u}}}}\in\R^{1\times m}\) is the gradient of \(f\) w.r.t. \(\bm{\vec u}\) written as a row vector, \(\frac{\partial{f}}{\partial{\bm{\vec{x}}}}\in\R^{1\times n}\) is the gradient of \(f\) w.r.t. \(\bm{\vec x}\) , and \(\frac{\partial{\bm{\vec{x}}}}{\partial{\bm{\vec{u}}}}\in\R^{n\times m}\) is the Jacobian matrix of \(\bm{\vec x}\) w.r.t. \(\bm{\vec u}\) .
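The shapes in the chain rule can be checked on a small case. The sketch below assumes \(n=3\), \(m=2\), \(f(\bm x)=x_1x_2+x_3\) and \(\bm x(\bm u)=[u_1^2,\ u_1u_2,\ u_2]\); these functions and all helper names are made up for illustration. It multiplies the \(1\times 3\) gradient by the \(3\times 2\) Jacobian and compares the result with finite differences of the composite.

```python
def f(x):
    return x[0] * x[1] + x[2]

def x_of_u(u):
    return [u[0] ** 2, u[0] * u[1], u[1]]

def grad_f(x):
    """Row vector df/dx, shape 1 x 3."""
    return [x[1], x[0], 1.0]

def jacobian_x(u):
    """Jacobian dx/du, shape 3 x 2, with entry [i][j] = dx_i/du_j."""
    return [[2.0 * u[0], 0.0],
            [u[1], u[0]],
            [0.0, 1.0]]

def chain_rule_gradient(u):
    """df/du = (df/dx) (dx/du), shape 1 x 2."""
    row = grad_f(x_of_u(u))
    jac = jacobian_x(u)
    return [sum(row[i] * jac[i][j] for i in range(len(row)))
            for j in range(len(u))]

def finite_difference_gradient(u, h=1e-6):
    """Central differences of the composite f(x(u)), for comparison."""
    out = []
    for j in range(len(u)):
        up, down = list(u), list(u)
        up[j] += h
        down[j] -= h
        out.append((f(x_of_u(up)) - f(x_of_u(down))) / (2.0 * h))
    return out

u0 = [1.5, -0.5]
print(chain_rule_gradient(u0), finite_difference_gradient(u0))
```

For this choice the composite is \(u_1^3u_2+u_2\), so the exact gradient is \([3u_1^2u_2,\ u_1^3+1]\), which both computations should reproduce up to discretization error.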

Total differential: for a real-valued function \(f(\bm x)\) of several independent variables,

\[df=\nabla_{\bm x} f\cdot d\bm x=\sum_i \frac{\partial f}{\partial x_i}dx_i \]


[Def] Directional derivative of a multivariate function 多变元函数的方向导数: For a function \(f(\bm{\vec x}):\R^n\mapsto \R\) and a constant unit vector (direction) \(\bm{\vec u}\) , the directional derivative of \(f(\bm{\vec x})\) in the direction \(\bm{\vec u}\) is

\[D_{\bm u}f(\bm x):=\lim_{h\to 0} \frac{f(\bm x+h\bm u)-f(\bm x)}{h} \]


Partial derivatives are a special case of directional derivatives (take \(\bm u=[0,\dots, 0,1,0,\dots,0]\) , with the 1 in the i-th position).

[Theorem] For a differentiable \(f(\bm x)\) and a unit vector \(\bm u\) , the directional derivative is

\[D_{\bm u}f(\bm x)=\nabla_{\bm x} f \cdot\bm u \]
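The identity \(D_{\bm u}f=\nabla_{\bm x} f\cdot\bm u\) can be compared against the limit definition numerically. In this sketch the function \(f(x_1,x_2)=x_1^2+3x_1x_2\), the point \((1,2)\), and the unit vector \(\bm u=(0.6, 0.8)\) are all illustrative assumptions.

```python
def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    """Analytic gradient [2 x_1 + 3 x_2, 3 x_1]."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]

def directional_by_gradient(x, u):
    """Gradient dotted with the unit direction u."""
    return sum(g * ui for g, ui in zip(grad_f(x), u))

def directional_by_limit(x, u, h=1e-6):
    """Difference quotient (f(x + h u) - f(x)) / h for a small h."""
    shifted = [xi + h * ui for xi, ui in zip(x, u)]
    return (f(shifted) - f(x)) / h

point, direction = [1.0, 2.0], [0.6, 0.8]   # |direction| = 1
print(directional_by_gradient(point, direction),
      directional_by_limit(point, direction))
```

With these values the gradient is \([8, 3]\), so both numbers should be close to \(8\cdot 0.6+3\cdot 0.8=7.2\).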

级数 Series: the sum of infinitely many terms:

\[\sum_{n=1}^{\infty} a_n :=\lim_{N\to \infty} \sum_{n=1}^N a_n \]

自然常数e The Natural Constant e

\[e = \lim_{n\to \infty}{(1+\frac1{n})^n}=\sum_{n=0}^{\infty}\frac1{n!} \]
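Both expressions for \(e\) can be evaluated directly; the sketch below (function names are made up here) suggests how much faster the factorial series converges than the limit form.

```python
import math

def e_by_limit(n: int) -> float:
    """(1 + 1/n)^n for a finite n."""
    return (1.0 + 1.0 / n) ** n

def e_by_series(terms: int) -> float:
    """Partial sum of 1/k! for k = 0 .. terms-1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term /= (k + 1)   # turns 1/k! into 1/(k+1)!
    return total

print(f"limit form, n=10^6 : {e_by_limit(10 ** 6):.10f}")
print(f"series, 20 terms   : {e_by_series(20):.10f}")
print(f"math.e             : {math.e:.10f}")
```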

几何级数 geometric series:

\[\sum_{n=0}^{\infty}ar^n=a+ar+ar^2+\cdots \]

is convergent if \(|r|<1\) , and the sum is

\[\sum_{n=0}^{\infty}ar^n=\frac{a}{1-r} ,\; |r|<1 \]

. If \(|r|\ge 1\) (and \(a\ne 0\) ), the geometric series is divergent.
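A quick check of the closed form against partial sums, assuming the sample values \(a=3\), \(r=0.5\) (chosen here for illustration):

```python
def geometric_partial_sum(a: float, r: float, N: int) -> float:
    """Sum of a * r**n for n = 0 .. N."""
    total, term = 0.0, a
    for _ in range(N + 1):
        total += term
        term *= r
    return total

a, r = 3.0, 0.5
closed_form = a / (1.0 - r)   # valid because |r| < 1
for N in (5, 10, 20, 60):
    print(N, geometric_partial_sum(a, r, N), closed_form)
```

With \(|r|\ge 1\) the same loop keeps growing or oscillating instead of settling, which matches the divergence statement.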

Taylor series 泰勒级数:
For a real-valued function \(f: \R \mapsto \R, x\mapsto f(x)\) , the Taylor series at \(x_0\) is defined as

\[f(x)=\sum_{k=0}^\infty \frac{D_x^k f(x_0)}{k!}(x-x_0)^k \]

, where \(D_x^k f(x_0)\) is the k-th derivative of f with respect to \(x\) , evaluated at \(x_0\) .
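A truncated version of the series (a Taylor polynomial) can be evaluated from a list of derivative values. This sketch assumes \(f=\sin\) expanded at \(x_0=0\), whose derivatives cycle through \(0, 1, 0, -1\); the helper names are hypothetical.

```python
import math

def taylor_polynomial(derivatives_at_x0, x0: float, x: float) -> float:
    """Sum of D^k f(x0) / k! * (x - x0)^k over the supplied derivatives."""
    return sum(d / math.factorial(k) * (x - x0) ** k
               for k, d in enumerate(derivatives_at_x0))

def sin_derivatives_at_zero(count: int):
    """Values of sin, cos, -sin, -cos, ... at 0."""
    cycle = [0.0, 1.0, 0.0, -1.0]
    return [cycle[k % 4] for k in range(count)]

x = 0.5
for count in (2, 4, 6, 10):
    approx = taylor_polynomial(sin_derivatives_at_zero(count), 0.0, x)
    print(f"{count:2d} terms: {approx:.12f}   sin(x) = {math.sin(x):.12f}")
```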

Multivariate Taylor series: For a real-valued function \(f: \R^n \mapsto \R, \bm x\mapsto f(\bm x), \bm x\in\R^n\) , the multivariate Taylor series at \(\bm x_0\in\R^n\) is defined as

\[f(\bm x)=\sum_{k=0}^\infty \frac{D_{ \bm x}^k f(\bm x_0)}{k!} (\bm x- \bm x_0)^{\otimes k} \]

, where \(D_{\bm x}^k f(\bm x_0)\) is the k-th (total) derivative of f with respect to \(\bm x\) , evaluated at \(\bm x_0\) , and \((\bm x-\bm x_0)^{\otimes k}\) is the k-fold outer product of the vector \((\bm x -\bm x_0)\) with itself. Note that \(D_{\bm x}^k f(\bm x_0)\) and \((\bm x-\bm x_0)^{\otimes k}\) are both k-th order tensors, i.e. \(D_{\bm x}^k f(\bm x_0), (\bm x-\bm x_0)^{\otimes k} \in \R^{\overbrace{n\times ...\times n}^{k \text{ times}}}\) , and their product in the series is the full contraction

\[\left[D_{\bm x}^k f(\bm x_0)\right] \left[(\bm x- \bm x_0)^{\otimes k}\right] = \sum_{i_1=1}^n...\sum_{i_k=1}^n \left[D_{\bm x}^k f(\bm x_0) \right]_{i_1,...,i_k} \left[(\bm x-\bm x_0)^{\otimes k}\right]_{i_1,...,i_k} \]

For a vector \(\bm x \in \R^n\) , the k-fold outer product is defined as

\[\bm x^{\otimes k} = \overbrace{\bm x\otimes ...\otimes \bm x}^{k \text{ times}}=\mathbf Y \in \R^{\overbrace{n\times ...\times n}^{k \text{ times}}}, \text{ where } Y_{i_1,...,i_k}=x_{i_1}x_{i_2}...x_{i_k} \]
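A minimal pure-Python sketch of the k-fold outer product and of the full contraction used in the multivariate Taylor terms. Storing a tensor as a dict keyed by index tuples is a simplification chosen here (with 0-based indices), and `outer_power` and `contract` are hypothetical helper names.

```python
from itertools import product
from math import prod

def outer_power(x, k: int):
    """k-fold outer product: maps (i_1, ..., i_k) to x[i_1] * ... * x[i_k]."""
    n = len(x)
    return {idx: prod(x[i] for i in idx)
            for idx in product(range(n), repeat=k)}

def contract(A, B):
    """Sum over all index tuples of A[idx] * B[idx]."""
    return sum(A[idx] * B[idx] for idx in A)

Y = outer_power([1.0, 2.0, 3.0], 3)      # a 3 x 3 x 3 tensor
print(len(Y), Y[(0, 1, 2)])

# Second-order example: a 2 x 2 Hessian contracted with d (x) d.
H = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
d = [0.1, -0.2]
print(contract(H, outer_power(d, 2)))    # equals d^T H d
```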


Derivative Properties 导数运算性质

\[\{a\}' = 0 \\ \{e^x\}'=e^x \\ \{a^x\}' = a^x \ln a \\ \{x^n\}' = n x^{n-1}, n\neq 0 \wedge n\in \Z \\ \{a f(x)\}'=a f'(x) \\ \{f(x)\pm g(x)\}' = f'(x) \pm g'(x) \\ \{f(g(x))\}' = f'(g(x))\, g'(x) \\ \{f(x)g(x)\}' = f'(x)g(x)+f(x)g'(x) \\ \left\{\frac{f(x)}{g(x)} \right\}'=\left\{f(x)[g(x)]^{-1}\right\}' = \frac{1}{g^2(x)}\left[f'(x)g(x)-f(x)g'(x)\right] \]

Differential Properties 微分运算性质

[Theorem] For a differentiable function \(x\mapsto f(x)\) , we have

\[\mathrm{d} f(x) = \frac{\mathrm{d} f(x)}{\mathrm{d} x} \cdot \mathrm{d} x = f'(x) \mathrm{d} x \]



\[d[e^x] = [e^x] dx \\ d[a^x] = [a^x \ln a] dx \]

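临界点 critical point: a point where the first derivative is 0 or does not exist. Points where the first derivative is 0 are called stationary points, and points where the derivative does not exist are called singular points.
驻点 stationary point: a point where the first derivative exists and equals 0.
奇点 singular point: (for a function of one variable) a point where the derivative does not exist.
拐点 inflection point: a point where the second derivative has opposite signs on the two sides.
极值 extremum (local maximum/minimum): a local maximum/minimum point is one whose function value is the largest/smallest within a neighborhood of the point.
最值 (global maximum/minimum): a global maximum/minimum point is one whose function value is the largest/smallest over the whole domain of the function. A (global) maximum or minimum may be attained at several points, as with \(\sin x\) .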

plural forms: extrema, maxima, minima.


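Lagrange Mean Value Theorem 拉格朗日中值定理

If \(f\) is continuous on \([a,b]\) and differentiable on \((a,b)\) , then there exists \(\xi\in(a,b)\) such that \(f'(\xi)=\frac{f(b)-f(a)}{b-a}\) .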


弧微分 arc differential: the idea is to approximate arc length by line segments.

\[(ds)^2=(dx)^2+(dy)^2 \\ ds=\sqrt{(dx)^2+\left(\frac{dy}{dx}dx\right)^2}=\left(\sqrt{1+\left(\frac{dy}{dx}\right)^2}\right)dx \]

(figure source: Baidu-Baike)
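The "segments approximate the arc" idea can be tried numerically. The sketch below assumes the curve \(y=\cosh x\) on \([0,1]\), whose exact length is \(\sinh 1\); it sums chord lengths and, separately, applies a midpoint rule to \(\sqrt{1+(dy/dx)^2}\,dx\). Function names and the curve are illustrative choices.

```python
import math

def polyline_length(f, a: float, b: float, n: int) -> float:
    """Sum of the lengths of n chords joining points on the curve y = f(x)."""
    total = 0.0
    x_prev, y_prev = a, f(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        y = f(x)
        total += math.hypot(x - x_prev, y - y_prev)   # sqrt(dx^2 + dy^2)
        x_prev, y_prev = x, y
    return total

def arc_length_formula(dydx, a: float, b: float, n: int) -> float:
    """Midpoint-rule estimate of the integral of sqrt(1 + (dy/dx)^2) dx."""
    dx = (b - a) / n
    return sum(math.sqrt(1.0 + dydx(a + (i + 0.5) * dx) ** 2)
               for i in range(n)) * dx

print(polyline_length(math.cosh, 0.0, 1.0, 1000))
print(arc_length_formula(math.sinh, 0.0, 1.0, 1000))
print(math.sinh(1.0))
```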

Integration 积分

\[\int_a^b f(x)dx := \lim_{n\to \infty} \sum_{i=1}^n f(x_i^*)\Delta x, \quad \Delta x=\frac{b-a}{n},\ x_i^*\in[a+(i-1)\Delta x,\ a+i\Delta x] \]
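A minimal Riemann-sum sketch of this definition, assuming the right endpoint of each subinterval as the sample point \(x_i^*\) and the test integrand \(f(x)=x^2\) on \([0,1]\) (exact value \(\frac13\)); both are choices made here.

```python
def riemann_sum(f, a: float, b: float, n: int) -> float:
    """Sum of f(x_i*) * dx with x_i* the right endpoint of subinterval i."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

def f(x: float) -> float:
    return x * x

for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(f, 0.0, 1.0, n))
```

The sums decrease toward \(\frac13\) as `n` grows; any other choice of sample point inside each subinterval would converge to the same limit for a continuous integrand.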

Definite Integration Properties 定积分性质

\[\int_a^b f(x)dx = \lim_{\epsilon\to 0^+} \int_{a+\epsilon}^b f(x)dx \\ \int_b^a f(x)dx = -\int_a^b f(x)dx \]

Indefinite Integration Properties 不定积分运算性质

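The indefinite-integral identity \(\int f(x)dx = F(x) +C\) really means that \(f\) is integrated with the variable \(x\) of \(F(x)\) as the upper limit; because no lower limit is specified, a constant \(C\) appears. In other words, \(F(x)+C=\int_?^x f(t)dt=\int f(x)dx\) .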

\[ \left[\int f(x) d x\right]^{\prime}=f(x) \\ d \int f(x) d x=f(x) d x \\ \int F^{\prime}(x) d x=F(x)+C \\ \int d F(x)=F(x)+C \\ \int f(g(x))g'(x)dx = \int f(u)du \]

[Theorem] The Substitution Rule.

If \(u=g(x)\) is a differentiable function whose range is an interval \(I\) and \(f\) is continuous on \(I\) , then \(\int f(g(x))g'(x)dx = \int f(u)du\) .
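The rule can be checked numerically in its definite-integral form (with the limits transformed by \(u=g(x)\)), which is an extension assumed here since the statement above is for indefinite integrals. The example \(f=\cos\), \(g(x)=x^2\) on \([0,1]\) and the helper `midpoint_integral` are illustrative.

```python
import math

def midpoint_integral(f, a: float, b: float, n: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Left side: integral of f(g(x)) g'(x) dx with g(x) = x^2, g'(x) = 2x.
lhs = midpoint_integral(lambda x: math.cos(x * x) * 2.0 * x, 0.0, 1.0)

# Right side: integral of f(u) du from u = g(0) = 0 to u = g(1) = 1.
rhs = midpoint_integral(math.cos, 0.0, 1.0)

print(lhs, rhs, math.sin(1.0))
```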


\[\int \frac{d F(x)}{d x} d x=F(x)+C \\ \int a d x=a x+C \\ \int x^{n} d x=\frac{1}{n+1} x^{n+1}+C, n \neq-1 \wedge n \in \Z \\ \int x^{k} d x=\frac{1}{k+1} x^{k+1}+C, k \neq-1 \wedge k \in \mathbb{Q} \\ \int \frac{1}{x} d x=\ln |x|+C \\ \int e^{x} d x=e^{x}+C \\ \int a^{x} d x=\frac{a^{x}}{\ln a}+C, a>0 \wedge a \neq 1 \\ \int \sin x d x=-\cos x +C \\ \int \cos x d x=\sin x+C \]


\[\int f(a x+b) d x=\frac{1}{a} \int f(a x+b) d(a x+b)(a \neq 0) \\ \int f\left(a x^{2}+b\right) x d x=\frac{1}{2 a} \int f\left(a x^{2}+b\right) d\left(a x^{2}+b\right)(a \neq 0) \\ \int f\left(a x^{n}+b\right) x^{n-1} d x=\frac{1}{n a} \int f\left(a x^{n}+b\right) d\left(a x^{n}+b\right)(a\neq 0, n \neq 0) \\ \int f\left(\frac{1}{x}\right) \frac{1}{x^{2}} d x=-\int f\left(\frac{1}{x}\right) d\left(\frac{1}{x}\right) \\ \int f(\sqrt{x}) \frac{1}{\sqrt{x}} d x=2 \int f(\sqrt{x}) d(\sqrt{x}) \\ \int f(\ln x) \frac{1}{x} d x=\int f(\ln x) d(\ln x) \\ \int f\left(e^{a x}\right) e^{a x} d x=\frac{1}{a} \int f\left(e^{a x}\right) d\left(e^{a x}\right)(a \neq 0) \\ \int f(\sin x) \cos x d x=\int f(\sin x) d(\sin x) \\ \int f(\cos x) \sin x d x=-\int f(\cos x) d(\cos x) \]

体积 Volume

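Cylindrical shell method 旋转体体积求解

\[V=\int_a^b \underbrace{f(x)}_{\text{circumference}} \underbrace{h(x)}_{\text{height}} \underbrace{dx}_{\text{thickness}} \]

Here \(f(x)\) denotes the circumference of the shell at \(x\) (for rotation about the y-axis, \(f(x)=2\pi x\) ). The method is well suited to the many complicated solids formed by rotation (solids of revolution).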
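As a sketch of the shell formula for rotation about the y-axis (circumference \(2\pi x\)), the snippet below approximates the volume generated by the region under \(y=2x^2-x^3\) on \([0,2]\), for which the integral evaluates to \(\frac{16\pi}{5}\). The region and the helper `shell_volume` are assumptions for illustration.

```python
import math

def shell_volume(height, a: float, b: float, n: int = 2000) -> float:
    """Midpoint-rule sum of circumference * height * thickness."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += 2.0 * math.pi * x * height(x) * dx
    return total

def height(x: float) -> float:
    return 2.0 * x ** 2 - x ** 3

print(shell_volume(height, 0.0, 2.0), 16.0 * math.pi / 5.0)
```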



Differential Equations 微分方程

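斜率场 Direction Field (Slope Field) method: \(y(x)\) is the unknown to be solved for, and the derivative \(dy/dx\) is a known function of \(x\) and \(y\) , i.e. \(\frac{dy}{dx}=F(x,y)\) . Using \(x\) and \(y\) as Cartesian coordinates, sample a number of points \((x,y)\) and compute \(dy/dx\) at each; at every sampled point draw a short line segment whose slope is \(dy/dx|_{(x,y)}\) . The way the slopes flow across the whole picture reveals (approximates) the rough shape of the unknown \(y(x)\) ; there may be several solution curves.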
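The sampling step can be sketched without any plotting library: compute, for each grid point, the endpoints of a short segment centered there with slope \(F(x,y)\). The right-hand side \(F(x,y)=x-y\), the grid, and the function `slope_field` are hypothetical choices; drawing the segments is left to whatever plotting tool is at hand.

```python
import math

def slope_field(F, xs, ys, length: float = 0.4):
    """Return segments ((x0, y0), (x1, y1)) of the given length, one per grid
    point, each centered at (x, y) with slope F(x, y)."""
    segments = []
    for x in xs:
        for y in ys:
            m = F(x, y)
            dx = 0.5 * length / math.sqrt(1.0 + m * m)  # half-width in x
            dy = m * dx                                  # half-height in y
            segments.append(((x - dx, y - dy), (x + dx, y + dy)))
    return segments

segments = slope_field(lambda x, y: x - y, [0.0, 1.0], [0.0, 2.0])
for start, end in segments:
    print(start, end)
```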


Separable equation: writing \(h(y)=1/g(y)\) ,

\[\frac{dy}{dx}=f(x) g(y) \iff h(y)dy=f(x)dx \text{ if } g(y)\ne 0 \\ \implies \int h(y)dy = \int f(x)dx \]

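正交轨线 orthogonal trajectory: a curve that intersects every curve of a family at a right angle is called an orthogonal trajectory of that family (there may be many of them, which together form a family of orthogonal trajectories).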


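Procedure for finding orthogonal trajectories: given a family of curves \(F(x,y,c)=0\) , where \(x\) and \(y\) are the independent and dependent variables and \(c\) is the arbitrary constant that defines the family (constant with respect to \(x\) and \(y\) ), derive the differential equation \(H(\frac{dy}{dx}, x, y, c)=0\) for \(\frac{dy}{dx}\) . Then replace \(\frac{dy}{dx}\) by \(-\frac{dx}{dy}\) to obtain another differential equation \(H(-\frac{dx}{dy},x,y,c)=0\) , and combine it with \(F(x,y,c)=0\) (an intersection point \((x,y)\) lies both on the orthogonal trajectory and on the curve \(F(x,y,c)=0\) ). After eliminating \(c\) , the solution of this differential equation is the orthogonal trajectory of the curves \(F(x,y,c)=0\) .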

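Example: find the family of orthogonal trajectories of the circles \(F(x,y,r)=0:x^2+y^2=r^2\) .
Solution: differentiating both sides with respect to \(x\) (applying \(\frac{d\cdot}{dx}\) ) gives \(2x+2y\frac{dy}{dx}=0\) . This is the equation for the derivative (the tangent slope). An orthogonal trajectory crosses each circle at a right angle, so replacing the derivative yields the differential equation of the orthogonal trajectories, \(2x-2y\frac{dx}{dy}=0\) . Solving this equation (the radius \(r\) that defines the family does not appear in it, so there is no need to combine it with \(x^2+y^2=r^2\) ),

\[2x-2y\frac{dx}{dy}=0\implies \frac{dx}{x}=\frac{dy}{y}\implies \\ \int\frac{dx}{x}=\int\frac{dy}{y}\implies \ln|y|+C_1=\ln|x|+C_2\implies |y|=e^{C_3}|x| \implies y=Cx \]

, that is, the orthogonal trajectories of the family of circles \(x^2+y^2=r^2\) are the family of lines \(y=Cx\) (together with the vertical line \(x=0\) ).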
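A quick numerical check of the result: at any point off the axes, the tangent slope of the circle through that point (from \(2x+2y\frac{dy}{dx}=0\)) times the slope of the line \(y=Cx\) through it should be \(-1\). The sample points and function names below are made up for illustration.

```python
import math

def circle_tangent_slope(x: float, y: float) -> float:
    """dy/dx on the circle through (x, y), from 2x + 2y dy/dx = 0."""
    return -x / y

def line_slope(x: float, y: float) -> float:
    """Slope C of the line y = C x passing through (x, y)."""
    return y / x

# Points on circles of several radii, avoiding the coordinate axes.
points = [(r * math.cos(t), r * math.sin(t))
          for r in (0.5, 1.0, 3.0)
          for t in (0.3, 1.1, 2.0, 4.0)]

for x, y in points:
    print(f"({x:+.3f}, {y:+.3f})  product of slopes = "
          f"{circle_tangent_slope(x, y) * line_slope(x, y):+.12f}")
```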


一阶线性微分方程 first-order Linear differential equation

A first-order linear differential equation has the form

\[\frac{dy}{dx}+P(x)y=Q(x) \]


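Calculus by James Stewart teaches through a wealth of examples. Its definitions mostly start from a descriptive, intuitive way of thinking and then give the formal statement, which makes it friendly to beginners.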


  • Calculus, by James Stewart, 8th edition.
