| Date | Presenter | Topics/Readings | Slides |
| --- | --- | --- | --- |
| Jan 16 | Minjia Zhang | Course Introduction | |
| Jan 18 | Minjia Zhang | Training Efficiency | pdf |
| Jan 23 | Minjia Zhang | Inference Efficiency | pdf |
| **System Optimizations for Training Massive Models** | | | |
| Jan 25 | Olatunji Ruwase (Invited speaker) | DeepSpeed Library | pdf |
Jan 30 |
Yiqi Liu |
(ZeRO-style data parallelism) ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
(SC 2020)
|
pdf |
| Feb 1 | Haoyang Zhang | ZeRO-Offload: Democratizing Billion-Scale Model Training (ATC 2021) | pdf |
| Feb 6 | Yufeng Du | (Tensor-slicing parallelism) Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (Arxiv 2019) | pdf |
| Feb 8 | Siyuan Chai | ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning (SC 2021) | pdf |
| Feb 13 | Gangmuk Lim, Ahan Gupta | POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging (ICML 2022)<br>(Sequence parallelism) Reducing Activation Recomputation in Large Transformer Models (Arxiv 2022) | pdf<br>pdf |
| **System Optimizations for Low Inference Latency and Cost** | | | |
| Feb 15 | Yuhao Ge, Yuqi Xue | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (Arxiv 2023)<br>(vLLM) Efficient Memory Management for Large Language Model Serving with PagedAttention (SOSP 2023) | pdf<br>pdf |
| Feb 20 | Yanzhuo Chen | ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs (IPDPS 2023 Best Paper) | pdf |
| Feb 22 | Aditya Prerepa | Orca: A Distributed Serving System for Transformer-Based Generative Models (OSDI 2022) | pdf |
| Feb 27 | Vignesh Suresh | Efficiently Scaling Transformer Inference (Arxiv 2022) | pdf |
| Feb 29 | Steven Gao | FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU (ICML 2023) | pdf |
| **Efficient Algorithms to Make DL Models Smaller, Faster, and Cheaper** | | | |
| Mar 5 | Xinyu Lian, Mayank Bhatia | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers (NeurIPS 2022)<br>SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (ICML 2023) | pdf<br>pdf |
| **Pre-proposal meetings** | | | |
| Mar 7 | Students & Instructors | Pre-proposal presentations: 15 minutes per group (must be scheduled ahead of time) | |
| **Efficient Algorithms to Make DL Models Smaller, Faster, and Cheaper (cont.)** | | | |
| Mar 19 | Selin Yildirim, - | Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time (ICML 2023)<br>H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (NeurIPS 2023) | pdf<br>- |
| Mar 21 | Akhil Bhimaraju, Lingzhi Zhao | Efficient Streaming Language Models with Attention Sinks (Arxiv 2023)<br>Fast Inference from Transformers via Speculative Decoding (ICML 2023) | pdf<br>pdf |
| Mar 26 | Akshat Sharma, Henry Zhu | Mixed Precision Training (Arxiv 2017)<br>QLoRA: Efficient Finetuning of Quantized LLMs (NeurIPS 2023) | pdf<br>pdf |
| **System and Algorithm Co-Design for Efficient Training and Inference** | | | |
| Mar 28 | Wanyu Zhao | E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs (SC 2021) | pdf |
| Apr 2 | Bakshree Mishra | Training and Inference of Large Language Models using 8-bit Floating Point (Arxiv 2023) | pdf |
| Apr 11 | Chunyuan Li (Invited speaker) | Invited talk on multi-modal models | pdf |
| Apr 16 | Wei Wen (Invited speaker) | Invited talk on NAS + DLRM from Meta | |
| **Efficiency Improvements for Emerging Real-World Models and Applications** | | | |
| Apr 4 | Ritik Dutta | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (JMLR 2022) | pdf |
| Apr 9 | Hari Umesh | When Parameter-efficient Tuning Meets General-purpose Vision-language Models (Arxiv 2023) | pdf |
| Apr 18 | James Soole | InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation (Arxiv 2023) | pdf |
| Apr 23 | Tanay Dixit | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Arxiv 2023) | pdf |
| Apr 25 | Haochen Shen | Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Arxiv 2023) | pdf |
| Apr 30 | Zhenrui Yue | Scalable Diffusion Models with Transformers (ICCV 2023) | pdf |
| TBD | | Final Project Presentations | |