This page explains how billing relates to memory usage and storage, and how estimates are shown during deployment.
1. Memory-based compute pricing
You are billed for the memory used by your model instances on our GPU servers.
Example A: A single model instance uses 50 GB of VRAM.
Estimated cost: ~$5.50 per hour for that instance.
Example B: Two active instances of the same model, each using 50 GB.
Estimated cost: ~$11.00 per hour in total.
Notes:
Higher precision (e.g., Float32) uses more memory than lower precision (e.g., BFloat16), which can increase cost.
Instance memory usage depends on model size, precision, optional quantization, and context length for LLMs.
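The arithmetic behind Examples A and B can be summarized in a short sketch. This is not an official calculator: the $0.11 per GB per hour rate is inferred from Example A ($5.50 ÷ 50 GB) and is an assumption, and the constant and function names are hypothetical. The estimate shown at deploy time is always authoritative.

```python
# Rough compute-cost sketch based on the examples above.
# ASSUMPTION: ~$0.11 per GB of instance memory per hour, inferred from
# Example A ($5.50 / 50 GB); not a published rate.
ASSUMED_RATE_PER_GB_HOUR = 0.11  # USD per GB per hour

def estimated_hourly_cost(memory_gb_per_instance: float, instance_count: int = 1) -> float:
    """Estimate the hourly compute cost for a model across its active instances."""
    return memory_gb_per_instance * instance_count * ASSUMED_RATE_PER_GB_HOUR

# Example A: one 50 GB instance  -> ~$5.50 per hour
print(f"${estimated_hourly_cost(50, 1):.2f}")   # $5.50
# Example B: two 50 GB instances -> ~$11.00 per hour
print(f"${estimated_hourly_cost(50, 2):.2f}")   # $11.00
```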
2. Storage for Super fast readiness level
When you choose the Super fast readiness level, an additional storage charge applies for the prepared model artifacts used to accelerate load times.
Price per GB per month: $0.55
Example A: A prepared model requiring 20 GB of storage.
Estimated monthly storage cost: 20 GB × $0.55 = $11.00
Example B: A prepared model requiring 60 GB of storage.
Estimated monthly storage cost: 60 GB × $0.55 = $33.00
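A minimal sketch of the same calculation, using the $0.55 per GB per month price quoted above; the constant and function names are illustrative only.

```python
# Monthly storage estimate for the Super fast readiness level.
STORAGE_PRICE_PER_GB_MONTH = 0.55  # USD per GB per month, from the price above

def estimated_monthly_storage_cost(prepared_model_gb: float) -> float:
    """Estimate the monthly storage cost for a prepared model's artifacts."""
    return prepared_model_gb * STORAGE_PRICE_PER_GB_MONTH

print(f"${estimated_monthly_storage_cost(20):.2f}")  # Example A: $11.00
print(f"${estimated_monthly_storage_cost(60):.2f}")  # Example B: $33.00
```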
All prices are estimates and are shown before you deploy so you can review and confirm.
3. Storage for Always ready readiness level
When you choose the Always ready readiness level, we cover the storage cost for you, so you get the speed of Super fast without the additional storage charge.