
Simple Guide to FLOP and Cost Per Token

FLOP: one floating-point operation. We assume float32 unless stated otherwise.
FLOP/s: how many floating-point operations a GPU performs per second.
We avoid the ambiguous spellings "FLOPs" and "FLOPS" and stick to these two terms.


1. Cost of GPU in Terms of FLOP

Formula:

FLOP/$ = (GPU FLOP/s) * (GPU life in s) / (GPU price in $)
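
A minimal sketch of this formula in Python (the GPU numbers in the example are hypothetical placeholders, not from the text):

```python
def flop_per_dollar(flop_per_s: float, life_s: float, price_usd: float) -> float:
    """FLOP/$ = (GPU FLOP/s) * (GPU life in s) / (GPU price in $)."""
    return flop_per_s * life_s / price_usd

# Hypothetical example: a 100 TFLOP/s GPU, 5-year life, $30k price.
five_years_s = 5 * 365 * 24 * 3600
print(flop_per_dollar(100e12, five_years_s, 30_000))
```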

2. Cost Per Token for an LLM

We can find cost per token if we know the model's parameter count and the GPU FLOP/$ from section 1.
Formula:

$/token = (FLOP / token) / (GPU FLOP/$)
         = e * (LLM params) / (GPU FLOP/$)

We assume each token costs about e × (LLM params) FLOP, where e is a small constant factor (e ≈ 2 covers the multiply and add per weight in a forward pass).
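
The same cost formula as a sketch (the parameter count and FLOP/$ in the example are illustrative placeholders):

```python
def usd_per_token(e: float, params: float, flop_per_dollar: float) -> float:
    """$/token = e * (LLM params) / (GPU FLOP/$)."""
    return e * params / flop_per_dollar

# Illustrative: e = 2, a 70B-param model, 1e17 FLOP per dollar.
cost = usd_per_token(2, 70e9, 1e17)
print(cost * 1e6)  # dollars per 1M tokens
```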

3. Example: Llama3 with 405B Params

Memory need: at float32 (4 bytes per param), 405B params take about 1.62 TB for the weights alone, far more than a single GPU's memory, so the model must be split across many GPUs.
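
The memory arithmetic as a quick check (float32 per the note at the top):

```python
params = 405e9
bytes_per_param = 4  # float32
weight_bytes = params * bytes_per_param
print(weight_bytes / 1e12, "TB")  # weight memory in terabytes
```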

4. An Example Setup

Suppose we have 2 racks, each with 8×H200 (SXM). Each H200 delivers about 67 TFLOP/s (FP32 peak).

Total FLOP/$

= (2 × 8 × 67 TFLOP/s) 
  × (5 years in sec) 
  / (2 × $300k)
= 2.817×10^17 FLOP/$

(The denominator is 2 × $300k because each rack costs $300k, for a $600k total.)
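
The FLOP/$ arithmetic from this section, written out so each factor is visible:

```python
gpus = 2 * 8                     # 2 racks of 8 H200s
flop_per_s = gpus * 67e12        # 67 TFLOP/s per GPU
life_s = 5 * 365 * 24 * 3600     # 5 years in seconds
price_usd = 2 * 300_000          # $300k per rack
flop_per_dollar = flop_per_s * life_s / price_usd
print(f"{flop_per_dollar:.3e}")  # ~2.817e17 FLOP/$
```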


5. Cost Per Token for Llama3 (405B)

$/token 
= e × (405B) / (2.817×10^17 FLOP/$)
= e × 1.44×10^-6 $/token
= e × $1.44 per 1M tokens

If e = 2, then cost is about $2.88 per 1M tokens.
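
The final cost-per-token step, reusing the FLOP/$ result from section 4:

```python
flop_per_dollar = 2.817e17  # from section 4
params = 405e9
e = 2
usd_per_token = e * params / flop_per_dollar
usd_per_million = usd_per_token * 1e6
print(round(usd_per_million, 2))  # ~2.88 dollars per 1M tokens
```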


6. Compare With OpenAI

OpenAI publishes its API prices on its pricing page; you can compare the numbers above against those rates.

