Sharded Backwards
On distributed backwards passes. DDP, FSDP, TP, SP, EP, Partial placements, and more.
Einstein notation extended with sharding subscripts for reasoning about distributed ML. Discusses DDP, FSDP, PyTorch's DTensor, Tensor Parallel, Sequence Parallel, and Ring Attention.