(SP24) CS 598 AIE - AI Efficiency: Systems & Algorithms

Schedule (Tentative)

Date Presenter Topics/Readings Slides
Jan 16 Minjia Zhang Course Introduction
Jan 18 Minjia Zhang Training Efficiency pdf
Jan 23 Minjia Zhang Inference Efficiency pdf
System Optimizations for Training Massive Models
Jan 25 Olatunji Ruwase (Invited speaker) DeepSpeed Library pdf
Jan 30 Yiqi Liu (ZeRO-style data parallelism) ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC 2020) pdf
Feb 1 Haoyang Zhang ZeRO-Offload: Democratizing Billion-Scale Model Training (ATC 2021) pdf
Feb 6 Yufeng Du (Tensor-slicing parallelism) Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (Arxiv 2019) pdf
Feb 8 Siyuan Chai ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning (SC 2021) pdf
Feb 13 Gangmuk Lim, Ahan Gupta
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging (ICML 2022) pdf
(Sequence parallelism) Reducing Activation Recomputation in Large Transformer Models (Arxiv 2022) pdf
System Optimizations for Low Inference Latency and Cost
Feb 15 Yuhao Ge, Yuqi Xue
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (Arxiv 2023) pdf
(vLLM) Efficient Memory Management for Large Language Model Serving with PagedAttention (SOSP 2023) pdf
Feb 20 Yanzhuo Chen ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs (IPDPS 2023 Best Paper) pdf
Feb 22 Aditya Prerepa Orca: A Distributed Serving System for Transformer-Based Generative Models (OSDI 2022) pdf
Feb 27 Vignesh Suresh Efficiently Scaling Transformer Inference (Arxiv 2022) pdf
Feb 29 Steven Gao FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU (ICML 2023) pdf
Efficient Algorithms to Make DL Models Smaller, Faster, and Cheaper
Mar 5 Xinyu Lian, Mayank Bhatia
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers (NeurIPS 2022) pdf
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (ICML 2023) pdf
Pre-proposal meetings
Mar 7 Students & Instructors Pre-proposal: 15 minutes per group (to be scheduled ahead of time)
Efficient Algorithms to Make DL Models Smaller, Faster, and Cheaper (continued)
Mar 19 Selin Yildirim, -
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time (ICML 2023) pdf
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (NeurIPS 2023) -
Mar 21 Akhil Bhimaraju, Lingzhi Zhao
Efficient Streaming Language Models with Attention Sinks (Arxiv 2023) pdf
Fast Inference from Transformers via Speculative Decoding (ICML 2023) pdf
Mar 26 Akshat Sharma, Henry Zhu
Mixed Precision Training (Arxiv 2017) pdf
QLoRA: Efficient Finetuning of Quantized LLMs (NeurIPS 2023) pdf
System and Algorithm Co-Design for Efficient Training and Inference
Mar 28 Wanyu Zhao E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs (SC 2021) pdf
Apr 2 Bakshree Mishra Training and Inference of Large Language Models using 8-bit Floating Point (Arxiv 2023) pdf
Apr 11 Chunyuan Li (Invited speaker) Invited Talk on Multi-Modal Models pdf
Apr 16 Wei Wen (Invited speaker) Invited Talk on NAS + DLRM from Meta
Efficiency Improvements for Emerging Real-World Models and Applications
Apr 4 Ritik Dutta Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (JMLR 2022) pdf
Apr 9 Hari Umesh When Parameter-efficient Tuning Meets General-purpose Vision-language Models (Arxiv 2023) pdf
Apr 18 James Soole InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation (Arxiv 2023) pdf
Apr 23 Tanay Dixit AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Arxiv 2023) pdf
Apr 25 Haochen Shen Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Arxiv 2023) pdf
Apr 30 Zhenrui Yue Scalable Diffusion Models with Transformers (CVPR 2023) pdf
TBD Final Project Presentations