
CS329D: ML Under Distribution Shifts

A graduate course surveying topics in machine learning when the training and test data arise from different distributions.

Tatsunori Hashimoto


Office Hours: 11am-12pm Fri

Week 0 Announcement

Mar 8

The course website is live! Questions about the course or issues with the site should be directed to the course staff.



The progress of machine learning systems has seemed remarkable and inexorable — a wide array of benchmark tasks including image classification, speech recognition, and question answering have seen consistent and substantial accuracy gains year on year. However, these same models are known to fail consistently on atypical examples and domains not contained within the training data. This course will cover methods for understanding and improving machine learning under distributional shifts, where the training and test distribution for a model are mismatched.

Course goals

The course aims to cover recent research on the following topics:

  • Definitions of various distribution shifts, in terms of distributional overlap or as the result of changes to the environment.
  • Real-world distribution shifts: domain adaptation in NLP and vision as well as fairness in prediction tasks.
  • Methods for improving robustness: neural approaches, invariance constraints, and minimax losses.
  • Adversarial shifts: adversarial examples in image recognition, provable defenses, and data poisoning.

The goal of the course is to introduce the variety of areas in which distributional shifts are central and equip students with the fundamentals necessary to conduct research on developing more robust machine learning methods. Because of this goal, the course will aim to cover the classic papers and basic concepts in this area, rather than spend the quarter on any single task or problem.

Course activities

The course will consist of three kinds of activities:

  • Lectures: The course will consist of 10 lectures, covering domain adaptation theory and methods, representation-based approaches to robustness, minimax methods, adversarial examples, and data poisoning.
  • Paper discussions: There will be 9 student-driven discussion and critique sessions in which we go over and discuss selected papers in each area.
  • Project: Each student will be responsible for implementing and testing one of the methods from the class on a distribution shift task of their choice.

The instructors will have open office hours on Zoom. Please check Canvas for the Zoom link (this is to restrict the office hours to enrolled students).

For details on grading and other accommodations, see the course policies page.


All lectures, discussions, and office hours will be held over Zoom, and links will be posted on Canvas. Lectures and discussions will be recorded and posted on Canvas by the following day. Group office hours are listed on the staff page, and presenters for paper discussions can sign up for additional office hours via a link on Canvas. You will submit all assignments via Gradescope, to which you will be automatically added in the first week of instruction. Course announcements will be made on Piazza, which you can join using the access code shared on Canvas. If you would like to contact the course staff, please make a Piazza post or email us.

Weekly Schedule

Week-to-week schedule and papers covered are tentative, and may change by the start of the quarter.

Introduction and taxonomy of distribution shifts

Mar 30
  1. Overview of the course
  2. Distribution shifts in the real world
  3. A taxonomy of distribution shifts and how they arise
Lecture video Lecture notes
Apr 1
Covariate and label shifts
Lecture + Discussion
  1. What is a covariate shift?
  2. Handling covariate shift under distributional overlap.
  3. Shortcut Learning in Deep Neural Networks
Lecture video Lecture notes
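The standard fix for covariate shift under distributional overlap is importance weighting: reweight each training point by the density ratio p_test(x)/p_train(x). A minimal sketch, assuming both densities are known Gaussians (in practice they must be estimated from data):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used here as a stand-in for known densities."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=100_000)  # draws from p_train = N(0, 1)

# Covariate-shift correction w(x) = p_test(x) / p_train(x), with p_test = N(1, 1).
w = gaussian_pdf(x_train, 1.0, 1.0) / gaussian_pdf(x_train, 0.0, 1.0)

# The weighted average of f(x) = x over p_train approximates E_{p_test}[x] = 1.
estimate = np.average(x_train, weights=w)
```

When overlap fails (p_train has little mass where p_test concentrates), these weights blow up, which motivates the impossibility results covered later in the quarter.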
Apr 6
Covariate and label shifts 2
  1. Improving Predictive Inference Under Covariate Shift by Weighting the Log-Likelihood
  2. Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure
Discussion video
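The "simple procedure" in the Saerens et al. reading handles label shift by rescaling each class posterior by the ratio of new to old class priors, then renormalizing. A minimal sketch (the priors below are illustrative):

```python
import numpy as np

def adjust_posteriors(posteriors, old_priors, new_priors):
    """Rescale class posteriors by the ratio of new to old priors,
    then renormalize each row to sum to one (label-shift correction)."""
    adjusted = np.asarray(posteriors) * (np.asarray(new_priors) / np.asarray(old_priors))
    return adjusted / adjusted.sum(axis=-1, keepdims=True)

# A classifier trained with balanced priors, deployed where class 1 is rare.
p = np.array([[0.6, 0.4]])
adjusted = adjust_posteriors(p, old_priors=[0.5, 0.5], new_priors=[0.9, 0.1])
# the posterior for class 0 rises from 0.6 to about 0.93
```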

Domain adaptation and impossibility results

Apr 8
Domain adaptation
  1. When can we provably learn under distribution shift?
  2. Can unlabeled data help?
  3. Defining generalization bounds under distribution shift.
Lecture video Lecture notes
Apr 13
Domain adaptation 2
  1. A Theory of Learning from Different Domains
  2. Optimal Transport for Domain Adaptation
Discussion video
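For intuition on the optimal-transport view of domain adaptation: in one dimension, with equal-size source and target samples, the earth mover's (Wasserstein-1) distance reduces to matching sorted samples. A toy sketch:

```python
import numpy as np

def wasserstein_1d(source, target):
    """W1 distance between equal-size 1-D samples: the optimal coupling
    matches points in sorted order, so the cost is the mean absolute gap."""
    return np.abs(np.sort(source) - np.sort(target)).mean()

# A target domain that is the source shifted by +1 is at W1 distance 1.
d = wasserstein_1d(np.array([0.0, 1.0, 2.0]), np.array([1.0, 2.0, 3.0]))
```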
Apr 15
Impossibility theorems
  1. When do domain adaptation bounds fail?
  2. Impossibility bounds for covariate shift
Lecture video Lecture notes
Apr 20
Impossibility theorems 2
  1. On the Hardness of Domain Adaptation and the Utility of Unlabeled Target Samples
  2. Learning Bounds for Importance Weighting
Discussion video

Neural and representation-based methods

Apr 22
Neural domain adaptation
  1. Indistinguishability over representations.
  2. Adversarial approaches to neural domain adaptation.
  3. Connections to classical theory.
Lecture video
Apr 27
Neural domain adaptation 2
  1. Domain-Adversarial Training of Neural Networks
  2. Conditional Adversarial Domain Adaptation
Discussion video
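The mechanism behind the Ganin et al. reading is the gradient reversal layer: identity on the forward pass, but the backward pass flips (and scales) the gradient, so the feature extractor maximizes the domain classifier's loss while the rest of the network minimizes its own. A schematic sketch of the two passes:

```python
import numpy as np

def grad_reversal_forward(features):
    """Forward pass: identity -- the domain classifier sees features unchanged."""
    return features

def grad_reversal_backward(grad_from_domain_classifier, lam=1.0):
    """Backward pass: multiply the incoming gradient by -lam, so the feature
    extractor ascends the domain loss (pushing domains to be indistinguishable)."""
    return -lam * grad_from_domain_classifier

g = np.array([0.5, -2.0])
reversed_g = grad_reversal_backward(g, lam=0.1)  # flips sign, scales by 0.1
```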
Apr 29
Learning from invariant representations
Lecture + Project (Progress report due)
  1. Provable guarantees from representational indistinguishability
  2. Tradeoffs between overlap and covariate shift
  3. Challenges to representation-based approaches.
Lecture video
May 4
Learning from invariant representations 2
  1. On Learning Invariant Representations for Domain Adaptation
  2. Support and Invertibility in Domain-Invariant Representations
Discussion video

Causal and minimax approaches to domain generalization

May 6
Connections to causality
  1. Distribution shifts as arising from causal interventions.
  2. Existing connections between causality and robustness.
  3. Robustness and invariance as tools for causal inference.
Lecture video
May 11
Connections to causality 2
  1. Conditional Variance Penalties and Domain Shift Robustness
  2. Invariant Risk Minimization
Discussion video
May 13
Minimax methods
  1. Robustness as a minimax game between nature and the model.
  2. Tractable families of worst-case distributions and duality.
  3. Pitfalls and pessimism from worst-case bounds.
Lecture video
May 18
Minimax methods 2
  1. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
  2. Learning Models with Uniform Performance via Distributionally Robust Optimization
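The group DRO objective in the Sagawa et al. reading is a tractable instance of the minimax game above: instead of the average loss, training minimizes the worst average loss over predefined groups. A minimal sketch:

```python
import numpy as np

def worst_group_loss(per_example_losses, group_ids):
    """Group DRO objective (sketch): the maximum over groups of the
    per-group average loss; training minimizes this instead of the mean."""
    losses, groups = np.asarray(per_example_losses), np.asarray(group_ids)
    return max(losses[groups == g].mean() for g in np.unique(groups))

losses = np.array([0.1, 0.2, 0.9, 1.1])
groups = np.array([0, 0, 1, 1])
# overall mean is 0.575, but group 1's average loss of 1.0 drives the objective
```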

Adversarial robustness

May 20
Adversarial examples
  1. Defining and motivating adversarial examples.
  2. Heuristic defenses and their pitfalls
  3. Provable defenses.
Lecture video
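A concrete instance of an adversarial example: the fast gradient sign method perturbs an input by eps in the direction of the sign of the loss gradient, the largest step allowed inside an L-infinity ball. A sketch for a linear classifier (the weights and input below are illustrative):

```python
import numpy as np

def fgsm(x, loss_grad, eps):
    """Fast gradient sign method: step of size eps along the sign of the
    loss gradient with respect to the input (an L-infinity-bounded attack)."""
    return x + eps * np.sign(loss_grad)

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.2])       # clean score w @ x = 0.1 -> predicted positive
loss_grad = -w                 # gradient of a margin loss for the positive label
x_adv = fgsm(x, loss_grad, eps=0.2)
# adversarial score w @ x_adv = 0.1 - 0.2 * (1 + 2) = -0.5 -> prediction flips
```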
May 25
Adversarial examples 2
  1. Unlabeled Data Improves Adversarial Robustness
  2. Certified Adversarial Robustness via Randomized Smoothing
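The certified defense in the Cohen et al. reading builds a smoothed classifier by majority vote over Gaussian perturbations of the input; the vote margin then translates into a certified L2 radius. A toy sketch of the prediction step only (the base classifier here is illustrative):

```python
import numpy as np

def smoothed_predict(x, base_classifier, sigma, n, rng):
    """Classify n Gaussian perturbations of x and return the majority vote.
    (The full method also converts the vote margin into a certified radius.)"""
    votes = [base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n)]
    return int(np.argmax(np.bincount(votes)))

base = lambda x: int(x[0] > 0.0)  # toy base classifier: sign of first coordinate
rng = np.random.default_rng(0)
pred = smoothed_predict(np.array([1.0]), base, sigma=0.25, n=200, rng=rng)
```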
May 27
Data poisoning
  1. Classical robust statistics
  2. High-dimensional mean estimation
  3. Convex optimization under data poisoning
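A classical robust-statistics baseline for the data-poisoning setting: the trimmed mean drops the extreme fraction of samples before averaging, so a small number of adversarially placed points cannot move the estimate arbitrarily. A one-dimensional sketch (the lectures cover why high dimensions require more sophisticated estimators):

```python
import numpy as np

def trimmed_mean(samples, trim_frac):
    """Drop the trim_frac smallest and largest values, then average --
    robust to a small fraction of adversarially placed points."""
    x = np.sort(np.asarray(samples))
    k = int(len(x) * trim_frac)
    return x[k:len(x) - k].mean()

clean = np.zeros(98)                                  # true mean is 0
poisoned = np.concatenate([clean, [1000.0, 1000.0]])  # 2% poisoned points
# np.mean(poisoned) is 20.0, while the trimmed mean stays at 0.0
```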
Jun 1
Data Poisoning 2
  1. Recent Advances in Algorithmic High-Dimensional Robust Statistics
  2. SEVER: A Robust Meta-Algorithm for Stochastic Optimization
Jun 3
Short project presentations