FLOP: a single floating-point operation (a "float math step"). We assume float32 unless stated otherwise.
FLOP/s: how many floating-point operations a GPU performs per second.
We avoid writing "FLOPs" or "FLOPS," since those forms are ambiguous between "several FLOP" and "FLOP per second."
Formula:
FLOP/$ = (GPU FLOP/s) × (GPU life in s) / (GPU price in $)
We can find the cost per token if we know the FLOP needed per token:
Formula:
$/token = (FLOP / token) / (GPU FLOP/$)
        = e × (LLM params) / (GPU FLOP/$)
where e is the FLOP spent per parameter per token (a plain forward pass takes about 2 FLOP per parameter).
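To make the two formulas concrete, here is a minimal Python sketch. The function and argument names are ours, made up for this example:

```python
def flop_per_dollar(gpu_flop_per_s: float, life_s: float, price_usd: float) -> float:
    """FLOP/$ = (GPU FLOP/s) x (GPU life in s) / (GPU price in $)."""
    return gpu_flop_per_s * life_s / price_usd

def usd_per_token(e: float, params: float, flop_per_usd: float) -> float:
    """$/token = e x (LLM params) / (GPU FLOP/$), with e = FLOP per parameter per token."""
    return e * params / flop_per_usd
```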
We assume the GPUs run at full use for their whole life.
Memory need: 405B float32 weights take about 1.6 TB, which fits in the 16 H200s below (16 × 141 GB ≈ 2.3 TB of GPU memory).
Suppose we have 2 racks, each with 8×H200 (SXM). Each H200 gives 67 TFLOP/s at float32.
Total FLOP/$
= (2 × 8 × 67 TFLOP/s)
× (5 years in s)
/ (2 × $300k)
= 2.817×10^17 FLOP/$
(The denominator is 2 × $300k because each rack costs $300k, so the total is $600k.)
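As a quick check of this arithmetic in Python (assuming 365-day years, which is what reproduces the figure above):

```python
flop_per_s = 2 * 8 * 67e12    # 2 racks x 8 H200, 67 TFLOP/s each (float32 peak)
life_s = 5 * 365 * 24 * 3600  # 5 years in seconds, at 365 days per year
price_usd = 2 * 300_000       # $300k per rack, $600k in total
print(f"{flop_per_s * life_s / price_usd:.3e} FLOP/$")  # 2.817e+17 FLOP/$
```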
$/token
= e × (405B) / (2.817×10^17 FLOP/$)
= e × 1.44×10^-6 $/token
= e × $1.44 per 1M tokens
If e = 2 (the plain forward-pass count of about 2 FLOP per parameter per token), the cost is about $2.88 per 1M tokens.
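The same last step in Python, scaled to 1M tokens, for a few sample values of e:

```python
params = 405e9           # 405B model parameters
flop_per_usd = 2.817e17  # FLOP/$ from the rack example above
for e in (1, 2, 4):
    usd_per_mtok = e * params / flop_per_usd * 1e6
    print(f"e = {e}: ${usd_per_mtok:.2f} per 1M tokens")
# e = 2 prints about $2.88 per 1M tokens, matching the estimate above.
```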
OpenAI posts its prices on its pricing page; you can see how these estimates line up with what they charge.
Note: