AI architecture lab
hidden_size = n_heads × head_dim
LLM Transformer Size Calculator
Adjust the key parameters of a decoder-only Transformer model and see an approximate parameter count. The estimate assumes multi-head attention and a SwiGLU-style MLP.
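One common way to write such an estimate under the stated assumptions (multi-head attention with n_heads × head_dim = hidden_size, a three-matrix SwiGLU MLP, no biases or normalization weights) is sketched below. The symbols are shorthand introduced here, not names from the page: h is hidden_size, L is the number of layers, d_ff is the MLP intermediate size, V is vocab_size, and u is 1 when the output head is untied and 0 otherwise. The calculator's own formula may differ in detail.

```latex
\begin{aligned}
P_{\text{attn}}  &= 4h^{2}            && \text{(query, key, value, and output projections)} \\
P_{\text{mlp}}   &= 3\,h\,d_{ff}      && \text{(gate, up, and down projections)} \\
P_{\text{embed}} &= V h\,(1 + u)      && \text{(token embedding, plus the output head if untied)} \\
P_{\text{total}} &\approx L\,\bigl(P_{\text{attn}} + P_{\text{mlp}}\bigr) + P_{\text{embed}}
\end{aligned}
```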
Architecture parameters
Hover the info icon for a plain-language analogy. Number inputs and sliders stay synchronized.
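Number of heads (n_heads): together with head_dim, this determines hidden_size.
Head dimension (head_dim): common values include 64, 80, 96, and 128.
Number of layers: the total parameter count grows almost linearly with depth.
MLP intermediate size: often around 2.5× to 4× hidden_size.
Embedding tying: if the input and output embeddings are untied, the output layer adds another vocab_size × hidden_size matrix.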
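The parameters above can be combined into a rough estimate along the following lines. This is a minimal sketch, not the calculator's actual code: `TransformerConfig`, `estimate_params`, and the field names `n_layers`, `intermediate_size`, and `tie_embeddings` are hypothetical, and the example values are purely illustrative. It assumes four hidden_size × hidden_size attention projections and three SwiGLU matrices per block, with no biases or normalization weights.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    """Hypothetical container for the calculator's inputs."""
    n_heads: int
    head_dim: int
    n_layers: int            # assumed name for the layer-count input
    intermediate_size: int   # assumed name for the MLP width input
    vocab_size: int
    tie_embeddings: bool = True


def estimate_params(cfg: TransformerConfig) -> dict:
    """Approximate weight counts for a decoder-only Transformer."""
    hidden_size = cfg.n_heads * cfg.head_dim
    # Multi-head attention: query, key, value, and output projections.
    attention = 4 * hidden_size * hidden_size
    # SwiGLU-style MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * cfg.intermediate_size
    params_per_block = attention + mlp
    # Token embedding, plus a separate output head when untied.
    embedding = cfg.vocab_size * hidden_size
    head = 0 if cfg.tie_embeddings else cfg.vocab_size * hidden_size
    return {
        "hidden_size": hidden_size,
        "mlp_ratio": cfg.intermediate_size / hidden_size,
        "params_per_block": params_per_block,
        "attention": cfg.n_layers * attention,
        "mlp": cfg.n_layers * mlp,
        "embedding_and_head": embedding + head,
        "total": cfg.n_layers * params_per_block + embedding + head,
    }


# Illustrative inputs only; they are not taken from the page.
example = TransformerConfig(
    n_heads=32, head_dim=128, n_layers=32,
    intermediate_size=11008, vocab_size=32000, tie_embeddings=False,
)
result = estimate_params(example)
print(f"{result['total']:,}")  # 6,738,149,376
```

With these inputs, hidden_size is 4096 and the MLP ratio is 2.6875, which falls inside the 2.5× to 4× range mentioned above.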
Estimated size
This is an architecture-level approximation, not an exact calculator for a specific model family.
Total parameters
hidden_size
MLP ratio
params / block
embedding params
Attention
MLP
Embedding / head
Scope note
This calculator focuses on the dominant weight matrices of a decoder-only Transformer. It intentionally ignores smaller implementation-specific details such as rotary embeddings, bias terms in some variants, and auxiliary adapters.
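To give a sense of scale for one of the omitted details, the sketch below compares the weight matrices of a single block with the bias terms that block would have if every linear projection carried one. The function `bias_overhead` and the input values are hypothetical, and the "bias on every projection" layout is an assumption made for illustration; real variants differ in which projections have biases.

```python
def bias_overhead(hidden_size: int, intermediate_size: int) -> tuple[int, int]:
    """Return (weight count, bias count) for one block.

    Assumes four attention projections and a three-matrix SwiGLU MLP,
    with a bias vector on every projection (an illustrative assumption).
    """
    weights = 4 * hidden_size**2 + 3 * hidden_size * intermediate_size
    # Query, key, value, and output biases each have hidden_size entries.
    attention_biases = 4 * hidden_size
    # Gate and up biases have intermediate_size entries; down has hidden_size.
    mlp_biases = 2 * intermediate_size + hidden_size
    return weights, attention_biases + mlp_biases


# Illustrative values only.
weights, biases = bias_overhead(hidden_size=4096, intermediate_size=11008)
print(f"{biases:,} bias terms vs {weights:,} weights per block")
```

For these inputs the bias terms come to roughly 0.02% of the block's weights, which is why leaving them out barely moves the estimate.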