AI architecture lab

LLM Transformer Size Calculator

Adjust the key parameters of a decoder-only Transformer model and see an approximate parameter count. The estimate assumes multi-head attention and a SwiGLU-style MLP.

hidden_size = n_heads x head_dim
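As a rough illustration of the relation above, the following Python sketch derives hidden_size and the attention weight count for one block. It assumes plain multi-head attention with four square projections (query, key, value, output) and no bias terms; the function name `attention_params` and the sample values are hypothetical, not taken from the calculator.

```python
def attention_params(n_heads: int, head_dim: int) -> int:
    """Approximate weight count of one multi-head attention layer.

    Assumption: Q, K, V and output projections are each
    hidden_size x hidden_size, with no bias terms.
    """
    hidden_size = n_heads * head_dim
    return 4 * hidden_size * hidden_size


# Illustrative values only.
n_heads, head_dim = 32, 128
print(n_heads * head_dim)                   # 4096
print(attention_params(n_heads, head_dim))  # 67108864
```

Variants such as grouped-query attention use smaller key and value projections, so this count would overestimate them.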

Architecture parameters


Number of attention heads (n_heads): together with head_dim, this determines hidden_size.

Head dimension (head_dim): common values include 64, 80, 96, and 128.

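Number of layers: model size grows almost linearly with depth, because every block adds the same number of parameters and only the embedding term stays fixed.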

MLP intermediate size: often around 2.5x to 4x hidden_size.
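A minimal sketch of how the MLP term and the MLP ratio could be computed, assuming a SwiGLU-style MLP with three bias-free weight matrices (gate, up, and down projections). The names `swiglu_mlp_params`, `mlp_ratio`, and `intermediate_size` are illustrative assumptions, as are the sample values.

```python
def swiglu_mlp_params(hidden_size: int, intermediate_size: int) -> int:
    """Approximate weight count of one SwiGLU-style MLP.

    Assumption: gate and up projections map hidden -> intermediate,
    the down projection maps intermediate -> hidden, no biases.
    """
    return 3 * hidden_size * intermediate_size


def mlp_ratio(hidden_size: int, intermediate_size: int) -> float:
    """Intermediate width expressed as a multiple of hidden_size."""
    return intermediate_size / hidden_size


# Illustrative values only.
print(mlp_ratio(4096, 11008))          # 2.6875
print(swiglu_mlp_params(4096, 11008))  # 135266304
```

A ratio near the lower end of the range keeps a three-matrix SwiGLU MLP close in size to a classic two-matrix MLP with a 4x ratio.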

If the input embedding and the output layer are untied, the output layer adds another vocab_size x hidden_size matrix.
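The effect of tying can be sketched in a few lines of Python. The function name `embedding_params` and the `tie_embeddings` flag are hypothetical names chosen for this example, and the values are illustrative.

```python
def embedding_params(vocab_size: int, hidden_size: int,
                     tie_embeddings: bool = True) -> int:
    """Weights in the token embedding plus the output layer.

    Tied: the output layer reuses the embedding matrix.
    Untied: a second vocab_size x hidden_size matrix is added.
    """
    embedding = vocab_size * hidden_size
    return embedding if tie_embeddings else 2 * embedding


# Illustrative values only.
print(embedding_params(32000, 4096, tie_embeddings=True))   # 131072000
print(embedding_params(32000, 4096, tie_embeddings=False))  # 262144000
```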

Estimated size

This is an architecture-level approximation, not an exact calculator for a specific model family.

Reported values: total parameters, hidden_size, MLP ratio, params / block, and embedding params, with a breakdown by Attention, MLP, and Embedding / head.
Scope note: This calculator focuses on the dominant weight matrices of a decoder-only Transformer. It intentionally ignores smaller implementation-specific details such as rotary embeddings, bias terms in some variants, and auxiliary adapters.
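Under the same scope, an end-to-end estimate might look like the Python sketch below. It is not the calculator's actual code: the `ModelConfig` class, its field names, and `estimate_params` are assumptions made for illustration, and the count covers only attention, SwiGLU MLP, and embedding / output head matrices.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical field names; values are supplied by the caller.
    n_layers: int
    n_heads: int
    head_dim: int
    intermediate_size: int
    vocab_size: int
    tie_embeddings: bool = True


def estimate_params(cfg: ModelConfig) -> dict:
    """Architecture-level estimate using only the dominant matrices."""
    hidden_size = cfg.n_heads * cfg.head_dim
    attention = 4 * hidden_size * hidden_size        # Q, K, V, output
    mlp = 3 * hidden_size * cfg.intermediate_size    # gate, up, down
    per_block = attention + mlp
    embedding = cfg.vocab_size * hidden_size
    if not cfg.tie_embeddings:
        embedding *= 2                               # separate output layer
    return {
        "hidden_size": hidden_size,
        "mlp_ratio": cfg.intermediate_size / hidden_size,
        "params_per_block": per_block,
        "embedding_params": embedding,
        "total": cfg.n_layers * per_block + embedding,
    }


# Illustrative configuration only.
cfg = ModelConfig(n_layers=32, n_heads=32, head_dim=128,
                  intermediate_size=11008, vocab_size=32000,
                  tie_embeddings=False)
result = estimate_params(cfg)
print(result["params_per_block"])  # 202375168
print(result["total"])             # 6738149376
```

Because normalization weights, biases, and adapters are left out, the result should land slightly below the published size of a comparable model.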