# Pipeline Parallelism with Controllable Memory

Pipeline Parallelism with Controllable Memory introduces a framework for designing pipeline schedules and uses that framework to find efficient, memory-optimal schedules.

Our findings show that we need approximately 1/3 of the memory under ideal conditions (F, B and W have the same runtime), and 1/2 of the memory to create a zero-bubble schedule in realistic scenarios (the necessary condition being W + 2B ≥ 2F and W + 2F ≥ 2B).
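
The necessary condition stated above can be checked directly from measured pass times. The following is a minimal sketch, assuming `f`, `b` and `w` are the per-micro-batch runtimes of the F, B and W passes in the same unit; the function name `meets_necessary_condition` is hypothetical and not part of this repository. Passing the check does not by itself guarantee a zero-bubble schedule, since the condition is only necessary.

```python
def meets_necessary_condition(f: float, b: float, w: float) -> bool:
    """Return True if W + 2B >= 2F and W + 2F >= 2B both hold.

    f, b, w are assumed to be the runtimes of the F, B and W passes
    for one micro-batch, measured in the same unit.
    """
    return (w + 2 * b >= 2 * f) and (w + 2 * f >= 2 * b)


# Ideal case: all three passes take the same time.
print(meets_necessary_condition(1.0, 1.0, 1.0))  # True

# F dominates: W + 2B = 3 is smaller than 2F = 8, so the condition fails.
print(meets_necessary_condition(4.0, 1.0, 1.0))  # False
```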

Check out our paper at [arXiv](https://arxiv.org/abs/2405.15362).

| Method | 1F1B | V-Min | V-Half | V-ZB |
|------------------------------------------|-------|----------|----------|------|
| Bubble Rate <br> (assuming T_F=T_B=T_W) | ~p/m | ~2p/(3m) | ~p/(2m) | 0 |
| Activation Memory <br> (by #micro-batch) | p | (p+4)//3 | (p+2)//2 | p |

Bubble Rate here is calculated as `1 - (F+B+W)*m / longest_stage_time`.
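
As an illustration, the bubble-rate formula and the activation-memory row of the table can be evaluated as follows. This is a sketch under stated assumptions: `p` is read as the number of pipeline stages and `m` as the number of micro-batches, all times share one unit, and the helper names `bubble_rate` and `activation_memory` are hypothetical rather than functions provided by this repository.

```python
def bubble_rate(f: float, b: float, w: float, m: int,
                longest_stage_time: float) -> float:
    """Bubble rate as 1 - (F+B+W)*m / longest_stage_time."""
    return 1 - (f + b + w) * m / longest_stage_time


def activation_memory(method: str, p: int) -> int:
    """Activation memory in micro-batches, taken from the table above.

    p is assumed to be the number of pipeline stages.
    """
    formulas = {
        "1F1B": p,
        "V-Min": (p + 4) // 3,
        "V-Half": (p + 2) // 2,
        "V-ZB": p,
    }
    return formulas[method]


# A stage that is busy for its whole duration has no bubble.
print(bubble_rate(1.0, 1.0, 1.0, m=16, longest_stage_time=48.0))  # 0.0

# Activation memory for p = 8 under each schedule.
for method in ("1F1B", "V-Min", "V-Half", "V-ZB"):
    print(method, activation_memory(method, 8))
# 1F1B 8
# V-Min 4
# V-Half 5
# V-ZB 8
```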