Module 1

This module introduces one of the most fundamental ideas in modern NLP: that words can be represented as vectors learned from text data. On the application side, we focus on one of the most fundamental NLP tasks: text categorization.

Before working with the material in this module, you may want to have a look at the following:

We will discuss the material during the first course meeting. Please see the meeting page for details.

Unit 1-1: Introduction to representations of words and documents

In the first unit, we introduce word representations: the basic building blocks of the deep learning models we use to process language. Specifically, we look at word embeddings and how they can be trained. We also discuss some challenges in representing words that can be addressed by working with lower-level subword units.
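To make the idea concrete, here is a minimal sketch in Python (using NumPy) of a word embedding table: one vector per vocabulary word, with word similarity measured by cosine similarity. The toy vocabulary, the dimensionality, and the random vectors are illustrative assumptions; the lectures cover how such vectors are actually learned from text.

    # Minimal sketch of an embedding table: one vector per word (illustrative only;
    # the vectors are random here, not learned from data as in the lectures).
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "dog", "sat"]          # toy vocabulary (assumption)
    dim = 8                                       # embedding dimensionality (assumption)
    embeddings = {w: rng.normal(size=dim) for w in vocab}

    def cosine(u, v):
        # Cosine similarity: the standard way to compare two word vectors.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(embeddings["cat"], embeddings["dog"]))

With trained embeddings, words that occur in similar contexts end up with high cosine similarity; with the random vectors above, only the mechanics are shown.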

Lecture videos

Title Slides Video
Introduction to word representations [slides] [video]
Learning word embeddings with neural networks [slides] [video]
Subword models [slides] [video]

Unit 1-2: Language modelling and contextualized embeddings

This unit begins with an overview of language modelling. It highlights the historical significance of n-gram models in NLP, which laid the foundation for the transition to neural language models. We then turn to pre-Transformer neural architectures, focusing on recurrent neural networks (RNNs) and the pivotal Long Short-Term Memory (LSTM) architecture. At the end of the unit, we explore the use of RNNs as language models and introduce the concept of contextualized embeddings, which capture the varying meanings of words across different contexts.
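As a concrete illustration of the n-gram idea, the sketch below estimates a bigram model from a three-sentence toy corpus using plain maximum-likelihood counts. The corpus is made up for illustration, and the absence of smoothing is a simplifying assumption; the lectures discuss estimation and its pitfalls properly.

    # Minimal bigram language model from a toy corpus (illustrative only;
    # maximum-likelihood estimates without smoothing are a simplification).
    from collections import Counter, defaultdict

    corpus = ["the cat sat", "the dog sat", "the cat ran"]   # toy corpus (assumption)
    bigram_counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[prev][word] += 1

    def prob(word, prev):
        # P(word | prev) = count(prev, word) / count(prev)
        total = sum(bigram_counts[prev].values())
        return bigram_counts[prev][word] / total if total else 0.0

    print(prob("cat", "the"))   # 2/3 in this toy corpus

The neural language models covered later in the unit replace these counts with a learned function of the preceding context.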

Lecture videos

Title Slides Video
Introduction to language modelling [slides] [video] (14:49)
N-gram language models [slides] [video] (14:46)
Neural language models [slides] [video] (14:49)
Recurrent neural networks [slides] [video] (14:47)
The LSTM architecture [slides] [video] (14:45)
RNN language models [slides] [video] (14:32)
Contextualized word embeddings [slides] [video] (14:48)

Unit 1-3: Transformer-based language models

This unit explores the evolution of Transformer-based language models, starting with the sequence-to-sequence (encoder–decoder) architecture for neural machine translation. We then delve into the concept of attention, followed by the Transformer architecture itself, which set a new benchmark in machine translation and many other natural language processing applications. We go through two families of Transformer-based models: the GPT family, which builds on the Transformer's decoder, and BERT, which builds on its encoder. The unit ends by discussing text generation algorithms, including beam search.
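As a concrete illustration of the attention mechanism at the heart of the Transformer, the sketch below implements plain scaled dot-product attention in Python with NumPy. The random query, key, and value matrices are stand-ins, and masking, multiple heads, and the learned projections of the full architecture are deliberately left out.

    # Minimal scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    # (illustrative only; no masking, heads, or learned projections).
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                     # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
        return weights @ V                                  # weighted average of the values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 16)) for _ in range(3))  # 4 positions, dimension 16
    print(scaled_dot_product_attention(Q, K, V).shape)      # (4, 16)

Each output position is a weighted average of the value vectors, with weights given by how strongly its query matches each key; the lectures build from this operation up to the full Transformer.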

Lecture videos

Title Slides Video
Neural machine translation [slides] [video] (14:48)
Attention [slides] [video] (14:49)
The Transformer architecture [slides] [video] (14:45)
Decoder-based language models (GPT) [slides] [video] (14:50)
Encoder-based language models (BERT) [slides] [video] (14:44)
Generation algorithms [slides] [video] (26:42)

Reading