Introduction to PyTorch

The purpose of this notebook is to introduce you to the basics of PyTorch, the deep learning framework that we will be using for the labs.

Many good introductions to PyTorch are available online, including the 60 Minute Blitz on the official PyTorch website. This notebook focuses on those basics that you will encounter in the labs. Beyond the notebook, you will also need to get comfortable with the PyTorch documentation.

We start by importing the PyTorch module:

import torch

The following code prints the current version of the module:

print(torch.__version__)

The version of PyTorch at the time of writing this notebook was 1.10.1.

Tensors

The fundamental data structure in PyTorch is the tensor, a multi-dimensional matrix containing elements of a single numerical data type. Tensors are similar to arrays as you may know them from NumPy or MATLAB.

Creating tensors

One way to create a tensor is to call the function torch.tensor() on a Python list or NumPy array.

The code in the following cell creates a 2-dimensional tensor with 4 elements.

x = torch.tensor([[0, 1], [2, 3]])
x

Each tensor has a shape, which specifies the number and sizes of its dimensions:

x.shape

Each tensor also has a data type for its elements. More information about data types is available in the PyTorch documentation.

x.dtype

When creating a tensor, you can explicitly pass the intended data type as a keyword argument:

y = torch.tensor([[0, 1], [2, 3]], dtype=torch.float)
y.dtype
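
An existing tensor can also be converted to another data type with the to() method. A minimal sketch (the target type torch.long is chosen arbitrarily):

z = y.to(torch.long)    # convert the float tensor back to a 64-bit integer tensor
z.dtype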

For many data types, there also exists a specialised constructor:

z = torch.FloatTensor([[0, 1], [2, 3]])
z.dtype

More creation operations

Create a 3D-tensor of the specified shape filled with the scalar value zero:

x = torch.zeros(2, 3, 5)
x

Create a 3D-tensor filled with random values:

x = torch.rand(2, 3, 5)
x

Create a tensor with the same shape as another one, but filled with ones:

y = torch.ones_like(x)
y    # shape: [2, 3, 5]

For a complete list of tensor-creating operations, see Creation ops.

Embrace vectorisation!

Iteration is one of the most useful techniques for processing data in Python. However, you should not loop over tensors. Instead, you should look for ways to vectorise your operations. This is because looping over tensors is slow, while vectorised operations on tensors are fast (and can be made even faster when the code is run on a GPU). To illustrate this point, let us create a 1D-tensor containing the first one million non-negative integers:

x = torch.arange(1000000)
x

Summing up the elements of the tensor using a loop is relatively slow:

sum(i for i in x)

Doing the same thing using a tensor operation is much faster:

x.sum()
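
If you want to see the difference on your own machine, here is a minimal timing sketch using Python's time module (the exact numbers will depend on your hardware):

import time

start = time.perf_counter()
sum(i for i in x)
print(f'Python loop: {time.perf_counter() - start:.3f} s')

start = time.perf_counter()
x.sum()
print(f'tensor operation: {time.perf_counter() - start:.3f} s')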

Indexing and slicing

To access the contents of a tensor, you can use an extended version of Python’s syntax for indexing and slicing. Essentially the same syntax is used by NumPy. For more information, see Indexing on ndarrays.

To illustrate this, we create a 3D-tensor with random numbers:

x = torch.rand(2, 3, 5)
x

Index an element by a 3D-coordinate; this gives a 0D-tensor:

x[0,1,2]

(If you want the result as a non-tensor, use the method item().)
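
For example, the following sketch extracts the element above as a plain Python float:

x[0, 1, 2].item()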

Index the second element; this gives a 2D-tensor:

x[1]

Index the second-to-last element:

x[-2]

Slice out the sub-tensor with elements from index 1 onwards; this gives a 3D-tensor:

x[1:]

Here is a more complex example of slicing. As in Python, the colon : selects all indices of a dimension.

x[:,:,2:4]

The syntax for indexing and slicing is very powerful. For example, the same effect as in the previous cell can be obtained with the following code, which uses the ellipsis (...) to match all dimensions but the ones explicitly mentioned:

x[...,2:4]
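
As a quick sketch, we can confirm that the two expressions select the same sub-tensor:

torch.equal(x[:, :, 2:4], x[..., 2:4])    # True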

Creating views

You will sometimes want to use a tensor with a different shape than the one it was created with. In these situations, you can re-shape the tensor or create a view of it. The latter is preferable because a view shares its data with the base tensor and thus does not require copying.

We create a 3D-tensor of 12 random values:

x = torch.rand(2, 3, 2)
x

Create a view of this tensor as a 2D-tensor:

x.view(3, 4)

When creating a view, the special size -1 is inferred from the other sizes:

x.view(3, -1)

Modifying a view affects the data in the base tensor:

y = torch.rand(2, 3, 2)
z = y.view(3, 4)
z[2, 3] = 42
y

More viewing operations

There are a few other useful methods that create views. More information about views is available in the PyTorch documentation.

x = torch.rand(2, 3, 5)
x

The permute() method returns a view of the base tensor with some of its dimensions permuted. In the example, we maintain the first dimension but swap the second and the third dimension:

y = x.permute(0, 2, 1)
print(y)
y.shape

The unsqueeze() method returns a tensor with a dimension of size one inserted at the specified position. This is useful e.g. in the training of neural networks when you want to create a batch with just one example.

y = x.unsqueeze(0)
print(y)
y.shape

The inverse operation to unsqueeze() is squeeze():

y = y.squeeze(0)
print(y)
y.shape

Re-shaping a tensor

There are some cases where you cannot create a view and need to explicitly re-shape a tensor. In particular, this happens when the elements of the base tensor are not stored in a contiguous region of memory, so that the requested view cannot be constructed without copying the data.

x = torch.rand(2, 3, 5)
x

We permute the tensor x to create a new tensor y in which the data is no longer consecutive in memory:

y = x.permute(0, 2, 1)
# y = y.view(-1)    # raises a runtime error
y

In such a case, you can explicitly re-shape the tensor, which will copy the data if necessary:

y = x.permute(0, 2, 1)
y = y.reshape(-1)
y
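
A roughly equivalent sketch uses the contiguous() method, which copies the data into a contiguous block of memory if necessary; after that, a view can be created as usual:

y = x.permute(0, 2, 1)
y = y.contiguous().view(-1)    # contiguous() copies the data, view() then succeeds
y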

Modifying a reshaped tensor will not necessarily change the data in the base tensor. This depends on whether the reshaped tensor is able to share the data with the base tensor.

y = torch.rand(2, 3, 2)
y = y.permute(0, 2, 1)    # if commented out, data can be shared
z = y.reshape(-1)
z[0] = 42
y

Computing with tensors

Element-wise operations

Unary mathematical operations defined on numbers can be ‘lifted’ to tensors by applying them element-wise. This includes multiplication by a constant, exponentiation (**), taking roots (torch.sqrt()), and the logarithm (torch.log()).

x = torch.rand(2, 3)
print(x)
x * 2    # element-wise multiplication with 2
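
The other operations mentioned above are applied element-wise in the same way. A minimal illustration:

print(x ** 2)          # element-wise squaring
print(torch.sqrt(x))   # element-wise square root
torch.log(x)           # element-wise natural logarithm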

Similarly, we can do binary mathematical operations on tensors with the same shape. For example, the Hadamard product of two tensors \(X\) and \(Y\) is the tensor \(X \odot Y\) obtained by the element-wise multiplication of the elements of \(X\) and \(Y\).

x = torch.rand(2, 3)
y = torch.rand(2, 3)
torch.mul(x, y)    # shape: [2, 3]

The Hadamard product can be written more succinctly as follows:

x * y

Matrix product

When computing the matrix product between two tensors \(X\) and \(Y\), the sizes of the last dimension of \(X\) and the first dimension of \(Y\) must match. The shape of the resulting tensor is the concatenation of the shapes of \(X\) and \(Y\), with the last dimension of \(X\) and the first dimension of \(Y\) removed.

x = torch.rand(2, 3)
y = torch.rand(3, 5)
torch.matmul(x, y)    # shape: [2, 5]

The matrix product can be written more succinctly as follows:

x @ y
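
The same shape rule carries over to higher-dimensional arguments. A small sketch (the shapes are chosen arbitrarily):

x = torch.rand(4, 2, 3)
y = torch.rand(3, 5)
torch.matmul(x, y)    # shape: [4, 2, 5]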

Sum and argmax

Let us define a tensor of random numbers:

x = torch.rand(2, 3, 5)
x

You have already seen that we can compute the sum of a tensor:

torch.sum(x)

There is a second form of the sum operation where we can specify the dimension along which the sum should be computed. This will return a tensor with the specified dimension removed.

torch.sum(x, dim=0)    # shape: [3, 5]
torch.sum(x, dim=1)   # shape: [2, 5]
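
If you want to keep the summed-out dimension as a dimension of size one, you can additionally pass keepdim=True. A small sketch:

torch.sum(x, dim=1, keepdim=True)    # shape: [2, 1, 5]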

The same idea also applies to the operation argmax() which returns the index of the component with the maximal value along the specified dimension.

torch.argmax(x)    # index of the largest element in the flattened tensor
torch.argmax(x, dim=0)    # indices of the largest elements along the first dimension; shape: [3, 5]

Concatenating tensors

A list of tensors can be combined into one long tensor by concatenation.

x = torch.rand(2, 3)
y = torch.rand(3, 3)
z = torch.cat([x, y])
print(z)
z.shape

You can also concatenate along a specific dimension:

x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)
print(torch.cat([x, y], dim=0))
print(torch.cat([x, y], dim=1))
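
To make the resulting shapes explicit (a small sketch):

print(torch.cat([x, y], dim=0).shape)    # torch.Size([4, 2])
print(torch.cat([x, y], dim=1).shape)    # torch.Size([2, 4])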

Broadcasting

The term broadcasting describes how PyTorch treats tensors with different shapes. Subject to certain constraints, the ‘smaller’ tensor is ‘broadcast’ across the larger tensor so that they have compatible shapes. Broadcasting is a way to avoid looping. In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data).

In the simplest case, two tensors have the same shapes. This is the case for the matrix x @ W and the bias vector b in the linear model below:

x = torch.rand(1, 2)
W = torch.rand(2, 3)
b = torch.rand(1, 3)
z = x @ W    # shape: [1, 3]
z = z + b    # shape: [1, 3]
print(z)
z.shape

Now suppose that we have a whole batch of inputs. Watch what happens when adding the bias vector b:

X = torch.rand(5, 2)
Z = X @ W    # shape: [5, 3]
Z = Z + b    # shape: [5, 3]    Broadcasting happens here!
print(Z)
Z.shape

In the example, broadcasting expands the shape of b from \([1, 3]\) into \([5, 3]\). The matrix Z is formed by effectively adding b to each row of X @ W. However, this is not implemented by a Python loop but happens implicitly through broadcasting.
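
As one more sketch of these rules, dimensions of size one are expanded to match, so adding a column-shaped tensor and a row-shaped tensor yields a full matrix (the shapes are chosen arbitrarily):

a = torch.rand(3, 1)
b = torch.rand(1, 4)
(a + b).shape    # both arguments are broadcast to shape [3, 4]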

PyTorch uses the same broadcasting semantics as NumPy. More information about broadcasting is available in the PyTorch documentation.

To be expanded!