Deep Learning Week 1 Notes
1. Tensors
A tensor is a generalized matrix: an element of \(\mathbb{R}^3\) is a 3-dimensional vector, but a 1-dimensional tensor. The 'dimension' of a tensor is the number of its indices.
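A quick check in torch (shapes chosen arbitrarily; dim() returns the number of indices):
>>> import torch
>>> v = torch.randn(3)        # one index: a 1-dimensional tensor
>>> m = torch.randn(3, 3)     # two indices: a 2-dimensional tensor
>>> t = torch.randn(2, 3, 4)  # three indices: a 3-dimensional tensor
>>> v.dim(), m.dim(), t.dim()
(1, 2, 3)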
2. PyTorch operations
@ corresponds to matrix/vector or matrix/matrix multiplication.
* is the component-wise (element-wise) product.
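A minimal contrast of the two, with values chosen so the difference is obvious:
>>> a = torch.tensor([[1., 2.], [3., 4.]])
>>> b = torch.tensor([[0., 1.], [1., 0.]])
>>> a @ b   # matrix product
tensor([[2., 1.],
        [4., 3.]])
>>> a * b   # component-wise product
tensor([[0., 2.],
        [3., 0.]])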
torch.linalg.lstsq solves the least-squares problem \(mq = y\):
>>> y = torch.randn(3)
>>> y
tensor([ 1.3663, -0.5444, -1.7488])
>>> m = torch.randn(3, 3)
>>> q = torch.linalg.lstsq(m, y).solution
>>> m @ q
tensor([ 1.3663, -0.5444, -1.7488])
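Here m is square and invertible, so the system is solved exactly and m @ q reproduces y. With more equations than unknowns, lstsq instead returns the q minimizing \(\|mq - y\|\). A sketch with arbitrary shapes and random data (only the shape is deterministic):
>>> m = torch.randn(5, 2)                  # 5 equations, 2 unknowns
>>> y = torch.randn(5)
>>> q = torch.linalg.lstsq(m, y).solution  # minimizes ||m @ q - y||
>>> q.shape
torch.Size([2])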
3. Data Sharing
>>> a = torch.full((2, 3), 1)
>>> a
tensor([[1, 1, 1],
        [1, 1, 1]])
>>> b = a.view(-1)
>>> b
tensor([1, 1, 1, 1, 1, 1])
>>> a[1, 1] = 2
>>> a
tensor([[1, 1, 1],
        [1, 2, 1]])
>>> b
tensor([1, 1, 1, 1, 2, 1])
>>> b[0] = 9
>>> a
tensor([[9, 1, 1],
        [1, 2, 1]])
>>> b
tensor([9, 1, 1, 1, 2, 1])
Note: many operations return a new tensor which shares the same underlying storage as the original tensor, so changing the values of one changes the other as well: view, transpose, squeeze, unsqueeze, expand, permute.
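When an independent copy is wanted, clone() duplicates the storage first, so later writes do not propagate; a quick check:
>>> a = torch.full((2, 3), 1)
>>> c = a.clone().view(-1)   # clone() copies the storage before the view
>>> a[0, 0] = 9
>>> c                        # unaffected by the write to a
tensor([1, 1, 1, 1, 1, 1])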
4. Einstein summation convention
torch.einsum:
Matrix multiplication:
>>> p = torch.rand(2, 5)
>>> q = torch.rand(5, 4)
>>> torch.einsum('ij,jk->ik', p, q)
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])
>>> p @ q
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])
Matrix-vector product:
w = torch.einsum('ij,j->i', m, v)
Component-wise product:
m = torch.einsum('ij,ij->ij', p, q)
Trace (a scalar; 'ii->i' would extract the diagonal instead):
s = torch.einsum('ii->', m)
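A quick check of the last two patterns, plus an outer product, on small deterministic inputs:
>>> m = torch.tensor([[1., 2.], [3., 4.]])
>>> torch.einsum('ii->', m)         # trace: 1 + 4
tensor(5.)
>>> torch.einsum('ii->i', m)        # diagonal
tensor([1., 4.])
>>> u = torch.tensor([1., 2.])
>>> v = torch.tensor([10., 20., 30.])
>>> torch.einsum('i,j->ij', u, v)   # outer product
tensor([[10., 20., 30.],
        [20., 40., 60.]])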
5. Storage
>>> x = torch.zeros(2, 4)
>>> x.storage()
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
[torch.FloatStorage of size 8]
>>> q = x.storage()
>>> q[4] = 1.0
>>> x
tensor([[ 0., 0., 0., 0.],
        [ 1., 0., 0., 0.]])
The main idea of functions like view, narrow, transpose, etc., and of operations involving broadcasting, is to never replicate data in memory, but to “play” with the offsets and strides of the underlying storage.
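This can be seen directly: stride() gives, for each index, the step in storage between consecutive elements, and transposing just swaps those steps without touching the data:
>>> x = torch.zeros(2, 4)
>>> x.stride()
(4, 1)
>>> x.t().stride()   # same storage, strides swapped
(1, 4)
>>> x.t().storage().data_ptr() == x.storage().data_ptr()
True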
Therefore:
>>> x = torch.empty(100, 100)
>>> x.t().view(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with
input tensor's size and stride (at least one dimension spans across
two contiguous subspaces). Call .contiguous() before .view()
x.t() shares its storage with x, so it cannot be flattened to 1d with view(). We can use reshape() instead, which copies the data only when necessary.
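Both fixes in action (only the shapes are shown, since the tensors are uninitialized):
>>> x = torch.empty(100, 100)
>>> x.t().contiguous().view(-1).shape   # copy to contiguous memory, then view
torch.Size([10000])
>>> x.t().reshape(-1).shape             # reshape() handles the copy itself
torch.Size([10000])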