[NeurIPS'23] CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training

CD-GraB aims to find a distributed data permutation with provably better convergence guarantees than Distributed Random Reshuffling (D-RR), building on the gradient balancing framework introduced in the original GraB paper. The technical details can be found in our NeurIPS'23 paper. If you have any questions or suggestions about the paper or the code, please contact Wentao Guo at wg247@cornell.edu.
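
As a rough illustration of what "gradient balancing" means here, the following is a minimal single-worker sketch of a pair-balancing reorder step in plain Python. It is not this repository's implementation and it leaves out all distributed coordination. The function name `pair_balance_order`, the list-of-floats gradient representation, and the handling of an odd leftover example are assumptions made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _norm_sq(v: Vector) -> float:
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def pair_balance_order(order: List[int], grads: List[Vector]) -> List[int]:
    """Build the next epoch's order from this epoch's gradients.

    order[k] is the example visited at step k and grads[k] is the
    gradient observed at that step (hypothetical representation).
    """
    running = [0.0] * len(grads[0])  # signed running sum of pair differences
    front: List[int] = []            # examples that received sign +1
    back: List[int] = []             # examples that received sign -1

    for k in range(0, len(order) - 1, 2):
        first, second = order[k], order[k + 1]
        diff = [a - b for a, b in zip(grads[k], grads[k + 1])]
        plus = [r + d for r, d in zip(running, diff)]
        minus = [r - d for r, d in zip(running, diff)]
        # Greedily pick the sign that keeps the running sum small.
        if _norm_sq(plus) <= _norm_sq(minus):
            running = plus
            front.append(first)
            back.append(second)
        else:
            running = minus
            front.append(second)
            back.append(first)

    # Assumption for this sketch: an unpaired last example joins the front.
    if len(order) % 2 == 1:
        front.append(order[-1])

    # +1 examples keep their relative order; -1 examples are reversed.
    return front + back[::-1]
```

Calling `pair_balance_order(order, grads)` once per epoch returns a permutation of `order` to use in the following epoch.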

Requirements

Python >= 3.9

PyTorch >= 2.0.0

CUDA >= 11.7 on Linux

torchopt

torchvision

functorch

transformers
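
Before launching an experiment, it can help to confirm that the environment matches the list above. The script below is a small sketch of such a check, not part of this repository; the helper names `missing_modules` and `python_ok` are made up for the example, and it assumes PyTorch is imported as `torch`.

```python
import importlib.util
import sys

# Import names for the packages listed under Requirements.
REQUIRED_MODULES = ["torch", "torchopt", "torchvision", "functorch", "transformers"]


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version=(3, 9)):
    """Check that the running interpreter is at least `min_version`."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", missing or "none")
    if "torch" not in missing:
        import torch

        # torch.version.cuda is None for CPU-only builds.
        print("PyTorch:", torch.__version__)
        print("CUDA build:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
```

The script only reports what it finds; comparing the printed PyTorch and CUDA versions against the minimums above is left to the reader.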

Experiments

All plots in the paper can be found in the notebooks directory.

GraB repository: https://github.com/EugeneLYC/GraB

Logistic regression on HMDA

Please run the following command for CD-GraB:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter CD-GraB --seed 0 --lr 5e-3 --node_cnt 4

and the following command for D-RR:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter D-RR --seed 0 --lr 5e-3 --node_cnt 4
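
The two commands above differ only in the `--sorter` value. For side-by-side comparisons, one option is to generate both from a single helper, as in the sketch below. The helper `torchrun_command` is hypothetical and not part of this repository; it uses only the flags that appear in the commands above.

```python
import shlex


def torchrun_command(script, sorter, seed=0, extra_args=(),
                     nproc_per_node=4, master_port=35500):
    """Assemble a single-node torchrun invocation as an argument list."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--nnodes=1",
        "--master_addr=localhost",
        f"--master_port={master_port}",
        script,
        "--sorter", sorter,
        "--seed", str(seed),
        *extra_args,
    ]


# Print the CD-GraB and D-RR commands for the HMDA experiment.
for sorter in ("CD-GraB", "D-RR"):
    cmd = torchrun_command(
        "main-LR-HMDA.py", sorter, extra_args=["--lr", "5e-3", "--node_cnt", "4"]
    )
    print(shlex.join(cmd))
```

To launch a run instead of printing it, pass the list to `subprocess.run(cmd, check=True)`.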

LSTM on Wiki2

Please run the following command for CD-GraB:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter CD-GraB --seed 0 --lr 5.0 --B 16 --node_cnt 4

and the following command for D-RR:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter D-RR --seed 0 --lr 5.0 --B 16 --node_cnt 4

Autoregressive MLP on M4

Please run the following command for CD-GraB:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter CD-GraB --seed 0 --B 32 --epochs 50 --node_cnt 32

and the following command for D-RR:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter D-RR --seed 0 --B 32 --epochs 50 --node_cnt 32

Tiny GPT2 pretraining on WikiText-103

Please run the following command for CD-GraB:

python main-GPT2-Wiki103.py --sorter CD-GraB --seed 0

and the following command for D-RR:

python main-GPT2-Wiki103.py --sorter D-RR --seed 0

Authors

Wentao Guo (co-first author), wg247@cornell.edu
A. Feder Cooper (co-first author), afc78@cornell.edu
Khiem Pham (co-first author), dkp45@cornell.edu
Tiancheng Yuan, ty373@cornell.edu
Charlie F. Ruan, cfr54@cornell.edu
Yucheng Lu, yl2967@cornell.edu
Christopher De Sa, cdesa@cs.cornell.edu

License

CD-GraB is released under the Apache 2.0 license; see the LICENSE file.

Acknowledgement

A. Feder Cooper is supported by Christopher De Sa's NSF CAREER grant, and in part by the Artificial Intelligence Policy and Practice initiative at Cornell University and the John D. and Catherine T. MacArthur Foundation. Yucheng Lu is supported by a Meta Ph.D. Fellowship. We also acknowledge a gift from SambaNova Systems. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Cite us

If you find CD-GraB helpful in your research, please consider citing us:

@inproceedings{
  cooper2023cdgrab,
  title={CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training},
  author={A. Feder Cooper and Wentao Guo and Khiem Pham and Tiancheng Yuan and Charlie F. Ruan and Yucheng Lu and Christopher De Sa},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023},
  url={https://arxiv.org/pdf/2302.00845.pdf}
}

@inproceedings{
    lu2022grab,
    title={GraB: Finding Provably Better Data Permutations than Random Reshuffling},
    author={Yucheng Lu and Wentao Guo and Christopher De Sa},
    booktitle={Advances in Neural Information Processing Systems},
    editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
    year={2022},
    url={https://openreview.net/forum?id=nDemfqKHTpK}
}

@inproceedings{
    lu2022a,
    title={A General Analysis of Example-Selection for Stochastic Gradient Descent},
    author={Yucheng Lu and Si Yi Meng and Christopher De Sa},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=7gWSJrP3opB}
}
