Preprint
- Preprint "RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model", Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song
- Preprint "DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales", Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
- Preprint "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention", Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qi, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He
- Preprint "Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding", Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
2025
- HPCA2025 " Buffalo: Enabling Large-Scale GNN Training via Memory-Efficient Bucketization", Shuangyan Yang, Minjia Zhang, Dong Li (acceptance rate: 112/534 = 21%)
- HPCA2025 " VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference", Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni,Yangjie Zhou,Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Jingwen Leng, Chen Jin (acceptance rate: 112/534 = 21%)
- PSB 2025 "Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions", Guangzhi Xiong∗, Qiao Jin∗, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang
- ICSE 2025 "Large Language Models as Configuration Validators", Xinyu Lian, Yinfang Chen, Runxiang Cheng, Jie Huang, Parth Thakkar, Minjia Zhang, Tianyin Xu (acceptance rate: 22%)
2024
- NeurIPS 2024 "UltraEdit: Instruction-based Fine-Grained Image Editing at Scale", Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang (acceptance rate: 460/1820 = 25.3%)
- PODC 2024 "DeepSpeed-Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models", Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He (acceptance rate: 21.3%)
- NSDI 2024 "Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances", Jiangfei Duan, Ziang Song, Xupeng Miao, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, Zhihao Jia (acceptance rate: 40/227=17.6%)
- AAAI 2024 "DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing", Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He (acceptance rate: 2342/12100 = 23.75%)
- ICLR 2024 (Oral) "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs", Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao (acceptance rate: 85/7304 = 1.16%)
- SIGMOD 2024 "Vexless: A Serverless Vector Data Management System Using Cloud Functions", Yongye Su, Yinqi Sun, Minjia Zhag, Jianguo Wang
- Nature Methods 2024 "OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization", with Gustaf Ahdritz, Nazim Bouatta, et al.
2023
- ICLR 2023 "Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam", Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He (acceptance rate: 1574/4956=32%)
- ASPLOS 2023 "Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning", Shuangyan Yang, Minjia Zhang, Wenqian Dong, Dong Li (acceptance rate: 128/598=21%)
- NSDI 2023 "Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs", John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu (acceptance rates: 50/272=18.4%)
- MobiCom 2023 "Cost-effective On-device Continual Learning over Memory Hierarchy with Miro", Xinyue Ma, Suyeon Jeong, Minjia Zhang, Di Wang, Jonghyun Choi, Myeongjae Jeon (acceptance rate: 440/2972=15%)
- PPoPP 2023 "iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures", Zhen Peng, Minjia Zhang, Kai Li, Ruoming Jin, Bin Ren (acceptance rate: 31/131=23.6%)
- ECAI 2023 "Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning", Minjia Zhang, Niranjan Uma Naresh, Yuxiong He (acceptance rate: 392/1632=24%)
- IEEE Data Engineering Bulletin 2023 "Exploiting Modern Hardware Architectures for High-Dimensional Vector Search at Speed and Scale", Minjia Zhang, Jie Ren, Zhen Peng, Ruoming Jin, Dong Li, and Bin Ren
- NeurIPS 2023 AI4Science Workshop "DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies", Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, et al.
2022
- ICML 2022 "Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale", Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He (acceptance rate: 1117/5630=21.9%)
- ACL 2022 Workshop BigScience "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"
- WSDM 2022 "GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning", Minjia Zhang, Wenhan Wang, Yuxiong He (acceptance rate: 159/786=20.2%)
- DAC 2022 "CarM: Hierarchical Episodic Memory for Continual Learning", Soobee Lee, Minindu Weerakoon, Jonghyun Choi, Minjia Zhang, Di Wang, Myeongjae Jeon (acceptance rates: 20-25%)
- SC 2022 "Enabling Efficient Inference of Transformer Models at Unprecedented Scale", Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Yuxiong He (acceptance rate: 81/320=25.3%)
- MLSys 2022 Workshop on Cloud Intelligence/AIOps "A Survey of Multi-Tenant Deep Learning Inference on GPU", Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Chenchen Liu, Xiang Chen
- TECS 2022 "SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network", Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez, in the ACM Transactions on Embedded Computing Systems 2022
- NeurIPS 2022 (Spotlight) "The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models", Conglong Li, Minjia Zhang, Yuxiong He (acceptance rate: 2665/10411=25.6%)
- NeurIPS 2022 (Oral) "Extreme Compression for Pre-trained Transformers Made Simple and Efficient", Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He (acceptance rate: 183/10411=1.76%)
- NeurIPS 2022 (Spotlight) "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He (acceptance rate: 2665/10411=25.6%)
- AAAI 2022 "Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-Trained Transformers", Minjia Zhang, Niranjan Uma Naresh, Yuxiong He (acceptance rates: 1349/9251=15%)
2021
- USENIX ATC 2021 "ZeRO-Offload: Democratizing Billion-Scale Model Training", Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He (acceptance rate: 64/341=18.7%)
- HPCA 2021 "Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning", Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, Dong Li (acceptance rate: 63/258=24.4%)
- NeurIPS 2021 "NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM", Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu (acceptance rate: 2372/9122=26%)
- ICLR 2021 "DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation", Minjia Zhang*, Menghao Li*, Chi Wang, Minqin Li. (acceptance rate: 860/2997=28.7%)
- IPDPS 2021 "DUET: Compiler-Aware Subgraph Scheduling for Tensor Programs on a Coupled CPU-GPU Architecture", Minjia Zhang*, Zehua Hu*, Minqin Li. *Equal contribution. (acceptance rate: 105/462=22.7%)
2020
- NeurIPS 2020 "AdaTune: Adaptive Tensor Program Compilation Made Efficient", Menghao Li*, Minjia Zhang*, Chi Wang, Minqin Li. *Equal contribution. (acceptance rate: 1900/9454=20%)
- NeurIPS 2020 "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping", Minjia Zhang, Yuxiong He (acceptance rate: 1900/9454=20%)
- NeurIPS 2020 "HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory", Jie Ren, Minjia Zhang, Dong Li (acceptance rate: 1900/9454=20%)
- SIGMOD 2020 "Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination", Conglong Li, Minjia Zhang, Yuxiong He, David Anderson (acceptance rate: 123/458=26.9%)
2019
- CIKM 2019 "GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine", Minjia Zhang, Yuxiong He (acceptance rate: 200/1030=19.4%)
- USENIX OpML 2019 "Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft", Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, Yuxiong He
2018
- NeurIPS 2018 "Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models", Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He (acceptance rate: 1010/4854=20.8%)
- ICLR 2018 "Learning Intrinsic Sparse Structures within Long Short-Term Memory", Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li (acceptance rate: 337/937=36%)
- USENIX ATC 2018 "DeepCPU: Serving RNN-based Deep Learning Models 10x Faster", Minjia Zhang*, Samyam Rajbhandari*, Wenhan Wang, Yuxiong He. *Equal contribution. (acceptance rate: 76/378=20.1%)
2017 and older
- TOPC 2017 "Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support", Man Cao, Minjia Zhang, Aritra Sengupta, Swarnendu Biswas, and Michael D. Bond, In ACM Transactions on Parallel Computing.
- ISMM 2017 "Avoiding Consistency Exceptions Under Strong Memory Consistency Models", Minjia Zhang, Swarnendu Biswas, Michael D. Bond, in the 2017 ACM SIGPLAN International Symposium on Memory Management.
- CC 2017 "Lightweight Data Race Detection for Production Runs", Swarnendu Biswas, Man Cao, Minjia Zhang, Michael D. Bond, and Benjamin P. Wood, in the 26th International Conference on Compiler Construction.
- PPoPP 2017 "On the Problem of Consistency Exceptions in the Context of Strong Memory Models", Minjia Zhang, Swarnendu Biswas, Michael D. Bond, in the 22th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
- CC 2016 "Relaxed Dependence Tracking for Parallel Runtime Support", Minjia Zhang, Swarnendu Biswas, Michael D. Bond, in the 25th International Conference on Compiler Construction.
- PPoPP 2016 "Drinking from Both Glasses: Combining Pessimistic and Optimistic Tracking of Cross-Thread Dependences", Man Cao, Minjia Zhang, Aritra Sengupta, and Michael Bond, in the 21th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
- OOPSLA 2015 "Valor: Efficient, Software-Only Region Conflict Exceptions"(Distinguished Artifact Award, Distinguished Paper Award), Swarnendu Biswas, Minjia Zhang, Michael D. Bond, and Brandon Lucia, in the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications.
- PPoPP 2015 "Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics", Minjia Zhang, Jipeng Huang, Man Cao, and Michael D. Bond, in the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
- ASPLOS 2015 "Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability", Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond, and Milind Kulkarni, in the 20th International Conference on Architectural Support for Programming Languages and Operating Systems.
- SPLASH 2015 Companion "SIRe: An Efficient Snapshot Isolation based Memory Model for Detecting and Tolerating Region Conflicts", Minjia Zhang, in 2015 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity.
- WoDet 2014 "Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support", Man Cao, Minjia Zhang, and Michael D. Bond, in the 5th Workshop on Determinism and Correctness in Parallel Programming.
- OOPSLA 2013 "Octet: Capturing and Controlling Cross-Thread Dependences Efficiently", Michael D. Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, and Jipeng Huang, in the 2013 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.
- ICPP 2011 "Memcached Design on High Performance RDMA Capable Interconnects", J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. W. Rahman, N. S. Islam, X. Ouyang, S. Sur and D. K. Panda, in the 40th International Conference on Parallel Processing.
- ICPADS 2010 "VirtCFT: A Transparent VM-level Fault-Tolerant System for Virtual Clusters", Minjia Zhang,Hai Jin,Song Wu,Xuanhua Shi, in IEEE 16th International Conference on Parallel and Distributed Systems.