CS 598 AIE - AI Efficiency: Systems & Algorithms, Fall 2024

Basic Information

Instructor: Minjia Zhang
Schedule: Wednesday and Friday 12:30-1:45pm CST
Location: 1302 Siebel Center for Computer Science
Office hours: By appointment
Instructor Email: minjiaz AT illinois.edu
TA: Akshat Sharma
TA Email: akshat7 AT illinois.edu
LMS: Canvas
Recommended Prerequisites: CS 425 - Distributed Systems, CS 484 - Parallel Programming, CS 533 - Parallel Computer Architecture, CS 446 - Machine Learning

Course Description

Are you curious about how system techniques enable today's large-scale model training and deliver ultra-fast inference? Do you have a passion for making AI accessible to all by using advanced system and algorithm techniques, significantly reducing the cost of training and serving deep learning models? If so, this course is for you.

This is a research-oriented course on AI efficiency that covers both the core concepts of AI systems and algorithmic methods. Students will learn about system and algorithm design and analysis, and about performance optimization of algorithm implementations. The course will look into the design and implementation of seminal works in the field of AI efficiency, such as the parallelism strategies that enable large-scale model training on parallel, distributed, and heterogeneous hardware. The course will also cover inference optimization techniques, including parallel, cache-efficient, and compression-based algorithms for fundamental problems in AI efficiency. Many of the principles of system and algorithm design will be illustrated in the context of natural language processing and large language models. Students will read and present research papers, write paper reviews, participate in classroom discussions, and complete a semester-long research project.

Course Schedule

Course Policy

Lectures

In each lecture, we will study 1-2 research papers. For each paper, one student will give a presentation followed by a discussion.

Grading

The course assignments consist of (i) writing paper reviews, (ii) doing paper presentations, (iii) class participation, and (iv) completing an open-ended research project. The breakdown is as follows.

Grading Breakdown
  Paper reviews          20%
  Paper presentations    25%
  Research project       35%
  Class participation    20%

Paper Reviews

Before each lecture, you are expected to read the papers scheduled for that lecture. You must also submit a review before every lecture. The review should cover one paper of your choice from the papers for that lecture.

The review should be done independently and include the following contents:
  • The problem the paper is trying to tackle.
  • The impact of the work, e.g., why the problem is important to solve.
  • The main proposed idea(s).
  • A summary of your understanding of different components of the proposed technique, e.g., the purpose of critical design choices.
  • Your perceived strengths and weaknesses of the work, e.g., novelty, significance of improvements, quality of the evaluation, ease of use.
  • Room for improvement, if any: which directions would you explore, or what ideas do you have for improving the techniques?
The review should be around 4-5 paragraphs long. The paragraphs do not need to be long, as long as each one lists the key points. You can discuss the paper with other students, but all of your written work should be your own.

In terms of grading criteria, each review is worth 12 points in total (six review items, up to 2 points each). For each review item above, you get:
  • 2: The review item demonstrates a clear understanding of the paper.
  • 1: The review item misses the point of the paper.
  • 0: The review item is missing.
For each paper, the instructor will select a "best review" and post it on the course website.

Paper Presentations

Each student will give at least one presentation during the semester on a paper chosen from the AI efficiency reading list. The instructor has created a tentative schedule sheet. Column A of the sheet contains the category of the paper; Column B contains the paper name (the link can be found in the reading list). Please select two papers that you are interested in presenting and add your name in either column D or column E (first come, first served). Please make your selection by 11:59pm on August 30th.

The presentations should be 25-30 minutes long with slides, followed by a discussion. The presenter should share what they learned from the paper. The content of the presentation can include:
  • The background and motivation of the problem. (4 points)
  • Existing work in the field. If any relevant information from related work is necessary for fully understanding the presented paper, that should be included as well. Also include key concepts and definitions needed to understand the paper. (4 points)
  • Key ideas and design in the paper. (4 points)
  • Evaluation methodology and experimental results. (4 points)
  • Your own thoughts on the paper, such as its strengths and weaknesses, the purpose of design choices, and future directions based on the work. (4 points)
Each presentation is worth 20 points in total, with 4 points for each part above. Each part is scored 4 (excellent), 3 (very good), 2 (good), or 1 (fair), based on its correctness, organization, clarity, graphics, and time management.
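As a rough illustration of how the rubric points and the Grading Breakdown weights could fit together, the sketch below converts each component to a fraction of its maximum and applies the stated weights. This is only an assumption for illustration: the syllabus does not specify how component scores are normalized or combined, and the function name, the 0-to-1 fractions for the project and participation, and the example scores are all hypothetical.

```python
# Weights taken from the Grading Breakdown (percent of the final grade).
WEIGHTS = {
    "paper_reviews": 20,
    "paper_presentations": 25,
    "research_project": 35,
    "class_participation": 20,
}


def weighted_total(fractions):
    """Combine per-component scores, each a fraction in [0, 1], into a 0-100 total.

    Assumption: every component is first normalized by its own maximum
    (e.g., 12 points per review, 20 points per presentation).
    """
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student: average review score 10.5/12, presentation 17/20.
example = {
    "paper_reviews": 10.5 / 12,
    "paper_presentations": 17 / 20,
    "research_project": 0.9,       # hypothetical fraction, no rubric given
    "class_participation": 1.0,    # hypothetical fraction, no rubric given
}

print(round(weighted_total(example), 2))
```

The actual grade computation is at the instructor's discretion; the sketch only shows that the four weights sum to 100% and how a 12-point review and a 20-point presentation would scale under this assumption.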

Research Project

A large portion of the work of this course is in proposing and completing a research project. The project can involve, but certainly is not limited to, any of the following tasks:
  • Benchmark and analyze important DL workloads to understand their performance gap and identify important angles to optimize their performance.
  • Apply and evaluate how existing solutions work in the context of emerging AI/DL workloads.
  • Design and implement new algorithms that are both theoretically and practically efficient.
  • Design and implement system optimizations, e.g., parallelism, cache-locality, IO-efficiency, to improve the compute/memory/communication efficiency of AI/DL workloads.
  • Offer customized optimizations for critical DL workloads where latency requirements are extremely tight.
  • Build a library, tool, or framework to improve efficiency for a class of problems.
  • Integrate important optimizations into existing frameworks (e.g., DeepSpeed), providing fast and agile inference.
  • Combine system optimization with modeling optimizations.
  • Combine and leverage hardware resources (e.g., GPU/CPU, on-device memory/DRAM/NVMe/SSD) in a principled way.
  • ...

The project can be related to a research project that you are currently working on, but this requires the instructor's approval.

The project will be done in groups of 2-3 people and consists of a proposal, weekly progress reports, a midterm report, a final presentation, and a final report. The tentative timeline for the project is as follows.

Assignment                 Due Date
Proposal submission        10/13
Weekly progress reports    10/20, 10/27, 11/3
Midterm report             11/10
Project presentation       12/4, 12/6
Final report               12/13

  • Proposal submission: The proposal should be 1 page (excluding references) and should describe the project that you are proposing to work on, the motivation, related work you have already identified, the main components of the project, and a projected weekly schedule that describes how you plan to accomplish the project throughout the semester. The instructor can provide feedback on the direction and on additional aspects that can be incorporated into the proposal.
  • Weekly progress reports: You will submit a weekly report due at 11:59pm on Sunday, starting 10/20. In the report, please describe (i) what was done, i.e., your progress on the project during the week, (ii) any issues that you encountered and need feedback on, and (iii) what's next, in order of priority. Each student will submit an individual progress report.
  • Midterm report: Each person needs to submit a midterm report, which summarizes what you have accomplished so far. If you work in a group, the report should state your own contribution to the project. The report can also include any obstacles you have encountered, adjustments to the proposed tasks, and a schedule of the remaining tasks. It should be about 3 pages long (excluding references).
  • Project presentation: The course has final project presentations, where each group will present their research to the instructor and classmates, and learn about other projects.
  • Final report: The final report will be in the style of a research paper describing your project. The recommended length is about 6-8 pages (excluding references), and a potential division is:
    • An abstract, which summarizes the project (0.25 pages).
    • An introduction, which describes and motivates the problem and summarizes the main results of the work (0.75 pages).
    • A brief discussion of related work (0.5 pages).
    • A brief overview of the preliminaries and background knowledge needed to understand the paper (0.75 pages).
    • Analysis and characterization to show the existence and severity of the problem (1 page).
    • Main design (1-2 pages).
    • Implementation (0.5 pages).
    • Evaluation methodology and experiment results (1-2 pages).
    • Concluding remarks, which can include a discussion of open questions or directions for future work (0.25 pages).
    Students are encouraged to submit the outline of their final report to the instructor to get feedback before finalizing the writing.