> BROMANDER_LABS

CAN YOUR RIGTRAIN IT?

Fine-tuning eats far more memory than inference — weights, gradients, optimizer state, and activations all have to live on the card at once. Here's the honest peak VRAM for full, LoRA, and QLoRA across 50+ models.

Math runs in your browserMixed precision · fp32 master copy

01 — Model & method

Model to fine-tune

Method

4-bit frozen base + adapters. The home-GPU path.

Optimizer

LoRA rank: 16

rank 4 – 256

02 — Batch & sequence

Sequence length: 2,048 tokens

256 – 32K

Batch size (per device): 1

1 – 64

Gradient checkpointing

03 — Your hardware

GPU to check against

Peak VRAM

QLoRA · peak training memory

6.03 GB

Trainable

of total

0.10%

GPU usable

22.1 GB

Fits on one NVIDIA RTX 4090. 6.03 GB of 22.1 GB usable VRAM.

Memory breakdown

Weights · 67%Gradients · 0%Optimizer · 2%Activations · 11%Overhead · 21%

Weights

4.03 GB

Gradients

0.02 GB

Optimizer + master

0.10 GB

Activations

0.64 GB

Overhead

1.24 GB

Estimate assumes bf16 mixed precision with an fp32 master weight copy. Activation memory is approximate — it varies with attention implementation (Flash Attention lowers it) and exact model architecture.

─── Your Training Card ───

The actual image that shows on X, LinkedIn, and Facebook when you share the link.

Live preview

Need a training pipeline built?

Talk to Bromander Studios