FLOP: a single floating-point operation (a "float math step"). We assume float32 unless stated otherwise.
FLOP/s: how many floating-point operations a GPU performs per second.
We avoid writing "FLOPs" or "FLOPS," since those forms are ambiguous between "several FLOP" and "FLOP per second."
Formula:
FLOP/$ = (GPU FLOP/s) × (GPU life in s) / (GPU price in $)
We can find the cost per token if we know the FLOP needed per token:
Formula:
$/token = (FLOP / token) / (GPU FLOP/$)
        = e × (LLM params) / (GPU FLOP/$)
where e is the FLOP spent per parameter per token (a plain forward pass takes about 2 FLOP per parameter).
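To make the two formulas concrete, here is a minimal Python sketch. The function and argument names are ours, made up for this example:

```python
def flop_per_dollar(gpu_flop_per_s: float, life_s: float, price_usd: float) -> float:
    """FLOP/$ = (GPU FLOP/s) x (GPU life in s) / (GPU price in $)."""
    return gpu_flop_per_s * life_s / price_usd

def usd_per_token(e: float, params: float, flop_per_usd: float) -> float:
    """$/token = e x (LLM params) / (GPU FLOP/$), with e = FLOP per parameter per token."""
    return e * params / flop_per_usd
```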
We assume the GPUs run at full use for their whole life.
Memory need: 405B float32 weights take about 1.6 TB, which fits in the 16 H200s below (16 × 141 GB ≈ 2.3 TB of GPU memory).
Suppose we have 2 racks, each with 8×H200 (SXM). Each H200 gives 67 TFLOP/s at float32.
Total FLOP/$
= (2 × 8 × 67 TFLOP/s)
× (5 years in s)
/ (2 × $300k)
= 2.817×10^17 FLOP/$
(The denominator is 2 × $300k because each rack costs $300k, so the total is $600k.)
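As a quick check of this arithmetic in Python (assuming 365-day years, which is what reproduces the figure above):

```python
flop_per_s = 2 * 8 * 67e12    # 2 racks x 8 H200, 67 TFLOP/s each (float32 peak)
life_s = 5 * 365 * 24 * 3600  # 5 years in seconds, at 365 days per year
price_usd = 2 * 300_000       # $300k per rack, $600k in total
print(f"{flop_per_s * life_s / price_usd:.3e} FLOP/$")  # 2.817e+17 FLOP/$
```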
$/token
= e × (405B) / (2.817×10^17 FLOP/$)
= e × 1.44×10^-6 $/token
= e × $1.44 per 1M tokens
If e = 2 (the plain forward-pass count of about 2 FLOP per parameter per token), the cost is about $2.88 per 1M tokens.
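The same last step in Python, scaled to 1M tokens, for a few sample values of e:

```python
params = 405e9           # 405B model parameters
flop_per_usd = 2.817e17  # FLOP/$ from the rack example above
for e in (1, 2, 4):
    usd_per_mtok = e * params / flop_per_usd * 1e6
    print(f"e = {e}: ${usd_per_mtok:.2f} per 1M tokens")
# e = 2 prints about $2.88 per 1M tokens, matching the estimate above.
```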
OpenAI posts its prices on its pricing page; you can see how these estimates line up with what they charge.
Note: