AI architecture lab

LLM Transformer Size Calculator

Adjust the key parameters of a decoder-only Transformer model and see an approximate parameter count. The estimate assumes multi-head attention and a SwiGLU-style MLP.

hidden_size = n_heads x head_dim

Architecture parameters


n_heads

Together with head_dim, this determines hidden_size.

head_dim

Common values include 64, 80, 96, and 128.
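
As a rough sketch of how these two inputs feed the estimate, the Python below derives hidden_size and counts the attention weights in one block. It assumes standard multi-head attention with four square projections (query, key, value, output) and no bias terms. The function name and the example values are illustrative and are not taken from the calculator itself.

```python
def attention_params(n_heads: int, head_dim: int) -> int:
    """Approximate attention weights in one block (assumes plain MHA, no biases)."""
    hidden_size = n_heads * head_dim
    # Q, K, V and output projections, each hidden_size x hidden_size.
    return 4 * hidden_size * hidden_size


# Illustrative values: 32 heads of dimension 128 give hidden_size = 4096.
print(attention_params(n_heads=32, head_dim=128))  # 67108864
```

Because the count depends on hidden_size squared, doubling either n_heads or head_dim roughly quadruples the attention weights per block.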

n_layers

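Model size grows almost linearly with n_layers: every block adds the same number of parameters, and only the embedding term is independent of depth.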

intermediate_size

Often around 2.5x to 4x hidden_size.
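
To make the MLP term concrete, here is a minimal sketch that counts the weights of a SwiGLU-style MLP, assuming three bias-free matrices (gate, up, and down projections). The function name and the example sizes are hypothetical choices for illustration.

```python
def swiglu_mlp_params(hidden_size: int, intermediate_size: int) -> int:
    """Approximate MLP weights in one block (assumes SwiGLU, no biases)."""
    # Gate and up projections map hidden -> intermediate;
    # the down projection maps intermediate -> hidden.
    return 3 * hidden_size * intermediate_size


hidden_size = 4096
intermediate_size = 11008
print(swiglu_mlp_params(hidden_size, intermediate_size))  # 135266304
print(round(intermediate_size / hidden_size, 2))          # MLP ratio: 2.69
```

With three matrices instead of two, a SwiGLU MLP at a ratio near 2.7 holds about as many weights as a classic two-matrix MLP at a ratio of 4.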

vocab_size

The estimate assumes tied embeddings, so the lm_head shares the token embedding weights.

If untied, the output layer adds another vocab_size x hidden_size matrix.
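
The tied and untied cases can be compared with a short sketch. The function name, the `tied` flag, and the example sizes are assumptions made for illustration.

```python
def embedding_params(vocab_size: int, hidden_size: int, tied: bool = True) -> int:
    """Token embedding weights, plus a separate lm_head matrix when untied."""
    token_embedding = vocab_size * hidden_size
    # An untied lm_head adds a second vocab_size x hidden_size matrix.
    return token_embedding if tied else 2 * token_embedding


print(embedding_params(32000, 4096))              # 131072000
print(embedding_params(32000, 4096, tied=False))  # 262144000
```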

Estimated size

This is an architecture-level approximation, not an exact calculator for a specific model family.

Total parameters

hidden_size -

MLP ratio -

params / block -

embedding params -

Attention

MLP

Embedding / head

Scope note: This calculator focuses on the dominant weight matrices of a decoder-only Transformer. It intentionally ignores smaller implementation-specific details such as rotary embeddings, bias terms in some variants, and auxiliary adapters.
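
For readers who want to reproduce the estimate offline, the following is a minimal sketch of the whole calculation under the stated scope: multi-head attention with four square projections, a three-matrix SwiGLU MLP, no biases, and no normalization weights. The calculator's exact formula is not shown on this page, so the class, function, and key names here are hypothetical, and the example values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_heads: int
    head_dim: int
    n_layers: int
    intermediate_size: int
    vocab_size: int
    tied_embeddings: bool = True


def estimate_params(cfg: ModelConfig) -> dict:
    """Approximate weight counts from the dominant matrices only."""
    hidden_size = cfg.n_heads * cfg.head_dim
    attention = 4 * hidden_size * hidden_size        # Q, K, V, O projections
    mlp = 3 * hidden_size * cfg.intermediate_size    # gate, up, down projections
    per_block = attention + mlp
    # One vocab_size x hidden_size matrix if tied, two if untied.
    embedding = cfg.vocab_size * hidden_size * (1 if cfg.tied_embeddings else 2)
    return {
        "hidden_size": hidden_size,
        "mlp_ratio": cfg.intermediate_size / hidden_size,
        "params_per_block": per_block,
        "embedding_params": embedding,
        "total": cfg.n_layers * per_block + embedding,
    }


cfg = ModelConfig(n_heads=32, head_dim=128, n_layers=32,
                  intermediate_size=11008, vocab_size=32000)
result = estimate_params(cfg)
print(f"{result['total']:,}")  # 6,607,077,376
```

Doubling n_layers in this sketch doubles the per-block contribution while leaving the embedding term unchanged, which is why total size scales almost, but not exactly, linearly with depth.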