CS 498 Machine Learning System, Spring 2025

Basic Information

Instructor: Minjia Zhang
Schedule: Tuesdays and Thursdays 2-3:15pm CST
Location: 1214 Siebel Center for Computer Science
Instructor Email: minjiaz AT illinois.edu
TA: Ahan Gupta
TA Email: ag82@illinois.edu
TA Office Hours: Every Friday 1-2pm, virtually on Zoom
LMS: Canvas
Recommended Prerequisites: CS 425 - Distributed Systems, CS 484 - Parallel Programming, CS 533 - Parallel Computer Architecture, CS 446 - Machine Learning

Course Description

Welcome to the Spring 2025 offering of CS 498: Machine Learning System!

This is a new undergraduate course, offered at UIUC for the first time, so we may adjust the schedule and content based on your learning progress.

The goal of this course is to give students an in-depth understanding of the elements of modern machine learning systems, including the performance characteristics of ML models such as transformers and diffusion models; optimization techniques that reduce the compute, memory, and communication costs of training and inference for large ML models; and compression algorithms that make ML models smaller and cheaper. The course will also include case studies of modern large language model training and serving, and will cover the design rationale behind state-of-the-art machine learning frameworks.

Course Schedule

Course Policy

Grading

The course assessments consist of (i) attendance and class participation, (ii) lab assignments, (iii) reading summaries, (iv) a final project presentation, and (v) an open-ended research project with written reports. The breakdown is as follows:

Grading Breakdown
Attendance and class participation	20%
Lab assignments	20% (2 lab assignments, 10% each)
Reading summary	20% (10 readings, 2% each)
Final research project presentation	15%
Project reports	25% (proposal 5% + mid-term report 5% + final report 15%)

Paper Reviews

The instructor will select 10 highly relevant MLSys papers (mostly under 12 pages). We will read one paper per week, starting the week of Jan 27. Submit your reading summary by midnight on Friday of each week.

Reading List

Each reading summary must be written independently and should cover the following:

The problem the paper is trying to tackle.
The impact of the work, e.g., why is it an important problem to solve?
The main proposed idea(s).
A summary of your understanding of different components of the proposed technique, e.g., the purpose of critical design choices.
Your perceived strengths and weaknesses of the work, e.g., novelty, significance of improvements, quality of the evaluation, ease of use.
Is there room for improvement? If so, what idea do you have for improving the techniques?

A reading summary should be about 4-5 paragraphs long. The paragraphs do not need to be long, as long as each one covers its key points. You may discuss the paper with other students, but all of your writing must be your own.

Each summary is graded out of 12 points. Each of the six items above is scored as follows:

2: The summary item demonstrates a clear understanding of the paper.
1: The summary item misses the point of the paper.
0: The summary item is missing.

Course Project

The course also includes proposing and completing a course project. The project can involve, but is not limited to, any of the following tasks:

Benchmark and analyze important DL workloads to understand their performance gaps and identify promising directions for optimizing them.
Apply and evaluate how existing solutions work in the context of emerging AI/DL workloads.
Design and implement new algorithms that are both theoretically and practically efficient.
Design and implement system optimizations, e.g., parallelism, cache-locality, IO-efficiency, to improve the compute/memory/communication efficiency of AI/DL workloads.
Offer customized optimizations for critical DL workloads with extremely tight latency requirements.
Build a library, tool, or framework to improve the efficiency of a class of problems.
Integrate important optimizations into existing frameworks (e.g., DeepSpeed), providing fast and agile inference.
Combine system optimization with modeling optimizations.
Combine and leverage hardware resources (e.g., GPU/CPU, on-device memory/DRAM/NVMe/SSD) in a principled way.
...

Projects will be done in groups of 2-3 students. Each project consists of a proposal, a mid-term report, a final presentation, and a final report. The tentative timeline for the project is as follows.

Late Submissions

All assignments are due on their respective due dates. Late submissions will not be accepted.

Computing Resources

We will use computing resources provided by the National Center for Supercomputing Applications (NCSA).