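| Model | # HPU | Precision | Input Length | Output Length | Batch Size | Throughput (tokens/sec) |
|---|---|---|---|---|---|---|
| LLaMA 2 7b | 1 | fp8 | 128 | 128 | 1,230 | 13,137 |
| LLaMA 2 7b | 1 | fp8 | 128 | 2,048 | 163 | 4,791 |
| LLaMA 2 7b | 1 | fp8 | 2,048 | 128 | 94 | 1,449 |
| LLaMA 2 7b | 1 | fp8 | 2,048 | 2,048 | 81 | 1,953 |
| LLaMA 2 70b | 2 | fp8 | 128 | 128 | 1,750 | 2,844 |
| LLaMA 2 70b | 4 | fp8 | 128 | 2,048 | 327 | 3,211 |
| LLaMA 2 70b | 2 | fp8 | 2,048 | 128 | 95 | 310 |
| LLaMA 2 70b | 2 | fp8 | 2,048 | 2,048 | 78 | 1,438 |
| Mistral 7b | 1 | fp8 | 128 | 128 | 896 | 18,023 |
| Mistral 7b | 1 | fp8 | 128 | 2,048 | 120 | 11,294 |
| Mistral 7b | 1 | fp8 | 2,048 | 128 | 120 | 1,427 |
| Mistral 7b | 1 | fp8 | 2,048 | 2,048 | 44 | 3,947 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 2,429 | 18,201 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 2,048 | 289 | 11,138 |
| LLaMA 3.1 8B | 1 | fp8 | 2,048 | 128 | 179 | 2,028 |
| LLaMA 3.1 8B | 1 | fp8 | 2,048 | 2,048 | 155 | 5,511 |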
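
The throughput figures above are reported per configuration, so rows with very different batch sizes are hard to compare directly. As a rough sketch, assuming the tokens/sec column is the aggregate across the whole batch (the table does not say so explicitly), dividing by the batch size gives an approximate per-sequence generation rate. The helper name `per_sequence_rate` is hypothetical, and the three rows are copied from the table for illustration only.

```python
# Rows copied from the throughput table:
# (model, input_len, output_len, batch_size, tokens_per_sec)
ROWS = [
    ("LLaMA 2 7b", 128, 128, 1230, 13137),
    ("LLaMA 2 7b", 2048, 2048, 81, 1953),
    ("LLaMA 3.1 8B", 128, 128, 2429, 18201),
]


def per_sequence_rate(batch_size: int, tokens_per_sec: float) -> float:
    """Approximate tokens/sec seen by a single sequence.

    Assumption: the reported throughput is summed over all sequences
    in the batch, so one sequence gets an equal share of it.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return tokens_per_sec / batch_size


if __name__ == "__main__":
    for model, in_len, out_len, batch, tps in ROWS:
        rate = per_sequence_rate(batch, tps)
        print(f"{model} ({in_len}/{out_len}, batch {batch}): "
              f"{rate:.2f} tokens/sec per sequence")
```

Under that assumption, a large batch raises total throughput while each individual sequence advances more slowly, which is the trade-off to keep in mind when reading the batch-size-1 latency figures alongside these rows.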

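| Model | # HPU | Precision | Input Length | Batch Size | Latency (ms) |
|---|---|---|---|---|---|
| LLaMA 2 7b | 1 | fp8 | 128 | 1 | 7.62 |
| LLaMA 2 7b | 1 | fp8 | 2,048 | 1 | 56.31 |
| LLaMA 2 70b | 8 | fp8 | 128 | 1 | 26.93 |
| LLaMA 2 70b | 8 | fp8 | 2,048 | 1 | 116 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 1 | 8.17 |
| LLaMA 3.1 8B | 1 | fp8 | 2,048 | 1 | 60.52 |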