
Architectures for written-text processing

In this module, we will study some of the neural models used to process text. The professor of this module is Juan Antonio Pérez Ortiz. The module begins with a review of how logistic regression works, which establishes the background needed to understand the later models. Next, we study in some detail the skip-gram algorithm, one of the methods for obtaining non-contextual word embeddings. Then, we review feedforward neural architectures and study their application to language models. The ultimate goal is to study the most important architecture in current text processing systems: the transformer. Once we have covered these architectures, we will conclude with an analysis of how pretrained (foundation) models work in general, and language models in particular.

The class materials complement the reading of selected chapters from the textbook "Speech and Language Processing" by Dan Jurafsky and James H. Martin (third edition draft, available online) with annotations made by the professor.

First session of this module (December 11, 2024)

Contents to prepare before the session on Dec 11

The activities to complete before this class are:

  • Reading and studying the contents of this page on logistic regression. As you will see, the page indicates which contents you should read from the book. After a first reading, read the professor's annotations, whose purpose is to help you understand the key concepts of the chapter. Then, perform a second reading of the book's chapter. In total, this part should take you about 4 hours 🕒️ of work.
  • Watching and studying the video tutorials in this official PyTorch playlist. Study at least the first 4 videos (“Introduction to PyTorch”, “Introduction to PyTorch Tensors”, “The Fundamentals of Autograd”, and “Building Models with PyTorch”). In total, this part should take you about 2 hours 🕒️ of work. A tiny tensor-and-autograd warm-up sketch is included after this list.
  • Reading and studying the contents of this page on embeddings. As you will see, the page indicates which contents you should read from the book. After a first reading, read the professor's annotations to help you understand the key concepts of the chapter. Then, perform a second reading of the chapter from the book. In total, this part should take you about 3 hours 🕒️ of work.
  • After completing the previous parts, take this assessment test on these contents. There are few questions, and it will take you a few minutes.
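
As a quick warm-up after the PyTorch videos, the following minimal sketch shows tensor creation and automatic differentiation with autograd. It uses only the standard torch API and is not taken from the playlist or from the course notebooks.

    import torch

    # A tensor whose operations autograd will track.
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

    # A simple scalar function of x: y = sum(x ** 2).
    y = (x ** 2).sum()

    # Backpropagation: autograd computes dy/dx = 2 * x.
    y.backward()
    print(x.grad)  # tensor([2., 4., 6.])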

Contents for the in-person session on Dec 11

In the in-person class (5 hours 🕒️ long), we will see how to implement a logistic regressor in PyTorch by following the implementations of a binary logistic regressor and a multinomial one (both available as Colab notebooks) discussed in this section. We will also explore an implementation of the skip-gram algorithm (another Colab notebook) discussed here.
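
As a preview of what we will build in class, here is a minimal sketch of a binary logistic regressor in PyTorch. It is not the notebook's code: the synthetic data, the dimensions and the hyperparameters are arbitrary illustrative choices.

    import torch
    import torch.nn as nn

    # Toy binary logistic regressor: a single linear layer producing a logit,
    # trained with the (numerically stable) sigmoid + binary cross-entropy loss.
    torch.manual_seed(0)
    X = torch.randn(100, 5)                 # 100 examples, 5 features
    y = (X[:, 0] > 0).float().unsqueeze(1)  # synthetic binary labels, shape (100, 1)

    model = nn.Linear(5, 1)                 # computes the logit w·x + b
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        preds = (torch.sigmoid(model(X)) > 0.5).float()
        print("training accuracy:", (preds == y).float().mean().item())

In the multinomial case, the linear layer outputs one logit per class and the loss becomes nn.CrossEntropyLoss.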

The idea is for you to study and slightly modify the notebooks we are working on. In a later class, a more advanced assignment involving modifying the transformer's code will be presented.

Second session (December 18, 2024)

Contents to prepare before the session on Dec 18

The activities to complete before this class are:

  • Reading and studying the contents of this page on feedforward neural networks and their use as very basic language models. As in the previous session, perform at least two readings complemented with the professor's notes. In total, this part should take you about 2 hours 🕒️ of work.
  • Reading and studying the contents of this page as an introduction to transformers. As always, perform at least two readings complemented with the professor's notes. In total, this part should take you about 4 hours 🕒️ of work.
  • After completing the previous parts, take this assessment test on these contents. There are few questions, and it will take you a few minutes.
  • If you have time left, take the opportunity to review the contents of the first session.

Contents for the in-person session on Dec 18

In the in-person class (5 hours 🕒️ in duration), we will see how to implement in PyTorch a language model based on a feedforward neural network and a transformer (both available as Colab notebooks), following the implementations discussed in this section and the next two.
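
To fix ideas before the session, the sketch below shows the general shape of a feedforward language model: the previous tokens in a fixed-size window are embedded, concatenated and passed through a hidden layer that predicts the next token. It is an illustrative toy version, not the course notebook's implementation; the vocabulary size, context length and layer sizes are arbitrary.

    import torch
    import torch.nn as nn

    class FeedforwardLM(nn.Module):
        """Toy feedforward language model over a fixed window of previous tokens."""
        def __init__(self, vocab_size=1000, context=3, emb_dim=32, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.net = nn.Sequential(
                nn.Linear(context * emb_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, vocab_size),       # logits over the whole vocabulary
            )

        def forward(self, ctx):                      # ctx: (batch, context) token ids
            e = self.emb(ctx).flatten(start_dim=1)   # concatenate the window's embeddings
            return self.net(e)

    model = FeedforwardLM()
    ctx = torch.randint(0, 1000, (4, 3))             # a batch of 4 contexts of 3 tokens
    targets = torch.randint(0, 1000, (4,))           # a (random) next token for each context
    logits = model(ctx)                              # shape (4, 1000)
    loss = nn.functional.cross_entropy(logits, targets)
    print(logits.shape, loss.item())

The transformer studied in the same session replaces the fixed window and the concatenation with self-attention over the whole context.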

The idea is for you to study and slightly modify the notebooks we are working on. We will also present the assignment on mechanistic interpretability you need to submit for this module of the course.

Third session (January 8, 2025)

Contents to prepare before the session on Jan 8

The activities to complete before this class are:

  • Reading and studying the contents of this page on the complete transformer model (with encoder and decoder) and the possible uses of an architecture that only includes the encoder. As you will see, the page indicates which contents you should read from the book. In particular, you will need to read some sections of the chapter on machine translation and others from the chapter on pretrained models, in addition to standalone sections on beam search and subword tokenization. After a first reading, read the professor's annotations to help you understand the key concepts of each section. Then, perform a second reading of the book's contents. In total, this part should take you about 4 hours 🕒️ of work.
  • Watching and studying Jesse Mu's lecture titled “Prompting, Reinforcement Learning from Human Feedback” from Stanford's CS224N course in 2023 about language models based on the transformer's decoder. This should take you about 2 hours 🕒️ of work, as you'll need to take notes so you don't have to rewatch the video when reviewing. Downloading the slides and annotating them may be helpful. Regarding the topic discussed between minutes 39 and 46, you can simply focus on the basic ideas, as the reinforcement learning equations are not a priority topic for this course and will be covered in other courses. It's important to review what you've already studied about transformers as a language model based on the decoder before watching the video. Don't be confused by encoder-based models also sometimes being called language models. This video discusses the properties of decoder-based models initially trained to predict the next token in a sequence.
  • Studying the description of multilingual models in this section of one of the pages on transformers. It's a brief section that will take you about 15 minutes 🕒️ of work.
  • After completing the previous parts, take this assessment test on these contents. There are few questions, and it will take you a few minutes.
  • If you have time left, take the opportunity to review all the contents from previous sessions.

Contents for the in-person session on Jan 8

In the in-person class (5 hours 🕒️ in duration), we will see how to implement, on top of our transformer architecture code, both a language model based on a decoder and a named entity recognition model based on an encoder (both available as Colab notebooks).
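
As an orientation, the sketch below isolates the two pieces that mainly distinguish these models once the transformer stack has produced hidden states: a causal mask plus a next-token projection for the decoder-based language model, and a per-token label projection for the encoder-based named entity recognizer. The names, dimensions and number of labels are illustrative assumptions, not those of the course code.

    import torch
    import torch.nn as nn

    # Hidden states as a transformer stack might produce them:
    # (batch, sequence length, model dimension); values here are random.
    hidden = torch.randn(2, 6, 64)

    # Decoder side: a causal mask so that position i cannot attend to positions > i,
    # and a language-modelling head projecting each position to next-token logits.
    causal_mask = torch.triu(torch.ones(6, 6, dtype=torch.bool), diagonal=1)
    lm_head = nn.Linear(64, 1000)          # assuming a 1000-token vocabulary
    next_token_logits = lm_head(hidden)    # shape (2, 6, 1000)

    # Encoder side: no causal mask; a token-classification head projects each
    # position to logits over the NER label set (e.g. 9 BIO labels).
    ner_head = nn.Linear(64, 9)
    label_logits = ner_head(hidden)        # shape (2, 6, 9)
    predicted_labels = label_logits.argmax(dim=-1)

    print(causal_mask.shape, next_token_logits.shape, predicted_labels.shape)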

We will take the opportunity to review some aspects of the code from previous sessions and relate theoretical aspects with practical ones.

Fourth session (January 15, 2025)

This fourth session is actually the first and only session devoted to speech. See the page on speech for the contents to prepare before that session.