| ResNet50 LARS (torch.compile) | 32 | bf16 | 46508 img/sec | 76.39 | 23.4 min | 256 | |
| ResNet50 LARS (torch.compile) | 8 | bf16 | 11959 img/sec | 76.39 | 2.7 min | 256 | |
| BERT Pre-Training Combined | 32 | bf16 | 4851 sent/sec | | 1735 min | 64 | |
| BERT Pre-Training Combined | 8 | bf16 | 1240 sent/sec | | | 64 | |
| BERT Pre-Training Phase 1 | 32 | bf16 | 5810 sent/sec | Loss: | 1302 min | 64 | |
| BERT Pre-Training Phase 1 | 8 | bf16 | 1489 sent/sec | | | 64 | |
| BERT Pre-Training Phase 2 | 32 | bf16 | 1932 sent/sec | Loss: | 433 min | 16 | |
| BERT Pre-Training Phase 2 | 8 | bf16 | 490 sent/sec | | | 16 | |
| BERT SQuAD Fine-Tuning | 8 | bf16 | 406 sent/sec | 90.68 | 12.96 min | 24 | |
| BART Fine-Tuning | 8 | bf16 | 1782 sent/sec | | | 32 | |
| Transformer | 8 | bf16 | 186020 tokens/sec | 27.8 | 1034 min | 4096 | |
| Unet2D (torch.compile) | 8 | bf16 | 4776 img/sec | 72.88 | 67.4 min | 64 | Lightning 2.3.3 |
| Unet3D PTL | 8 | bf16 | 60.77 img/sec | 74.28 | 59.4 min | 2 | Lightning 2.3.3 |
| YOLOX | 8 | bf16 | 312.37 img/sec | 39.93 | 2331.2 min | 16 | |
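
For the models listed at both 8 and 32 HPUs, the throughput columns can be compared through scaling efficiency. The sketch below is a minimal illustration, assuming efficiency is defined as measured speedup divided by ideal linear speedup (32 / 8 = 4x); that definition and the helper name `scaling_efficiency` are our own, not part of the table. The throughput numbers are copied from the rows above.

```python
# Throughput (8 HPUs, 32 HPUs) copied from the bf16 rows in the table above.
# Units: img/sec for ResNet50, sent/sec for BERT.
THROUGHPUT = {
    "ResNet50 LARS (torch.compile)": (11959, 46508),
    "BERT pre-training, combined": (1240, 4851),
    "BERT pre-training, phase 1": (1489, 5810),
    "BERT pre-training, phase 2": (490, 1932),
}


def scaling_efficiency(base, scaled, base_devices=8, scaled_devices=32):
    """Hypothetical helper: measured speedup over ideal (linear) speedup.

    Assumes perfect scaling would multiply throughput by
    scaled_devices / base_devices.
    """
    ideal_speedup = scaled_devices / base_devices
    measured_speedup = scaled / base
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    for model, (t8, t32) in THROUGHPUT.items():
        print(f"{model}: {scaling_efficiency(t8, t32):.1%}")
```

Run as a script, this prints roughly 97.2% for ResNet50 and between 97.5% and 98.6% for the three BERT pre-training configurations. Note that this measures throughput only; it says nothing about time to train, which also depends on convergence behavior at the larger global batch size.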