On distributed backward passes: DDP, FSDP, TP, SP, EP, Partial placements, and more.
Sharded Einstein Notation
Einstein notation extended with sharding subscripts for reasoning about distributed ML. Discusses DDP, FSDP, PyTorch's DTensor, Tensor Parallel, Sequence Parallel, and Ring Attention.