Volt documentation
Run 70B models in your customer's metro. At Bedrock prices. Without your data leaving the city.
Volt is the Sovereign Inference Cloud: a distributed Tier-3 inference fabric across metro pods, with zero egress and cryptographic attestation. Three products share one stack:
Spark
Tokens-as-a-service. OpenAI-compatible. Llama 70B at $0.95 per million tokens standard, $1.45 per million tokens sovereign.
Forge
GPU-as-a-service. NVIDIA B200 at $2.36/GPU/hr on 36-month reserved.
Vault
Dedicated bare-metal. 8-GPU B200 rack from $85K/mo. Sovereign by default.
Start here
Quickstart
Install the SDK, get a key, make your first request in under 5 minutes.
Authentication
API keys, environment variables, and OAuth.
Sovereignty
How in-metro inference and zero egress work.
API reference
Full control-plane API for Spark, Forge, and Vault.
OpenAI drop-in
Already using OpenAI? Point your existing client at Volt by changing the base URL and key. The Volt SDK exposes the same chat.completions interface:
from volt import Volt

client = Volt(api_key="volt_sk_live_...")
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain CAP theorem"}],
)
print(resp.choices[0].message.content)
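For readers who want to see what "changing the base URL and key" amounts to on the wire, here is a minimal sketch using only the Python standard library. It builds, but does not send, an OpenAI-style chat completions request. The base URL below is a placeholder and not a real Volt endpoint, and the helper name build_chat_request is hypothetical; the /chat/completions path and Bearer header follow the usual OpenAI-compatible convention, which this page does not spell out for Volt.

```python
import json
import urllib.request

# Assumption: placeholder base URL, NOT a documented Volt endpoint.
# Substitute the base URL given in your Volt account or the API reference.
VOLT_BASE_URL = "https://volt.example.invalid/v1"


def build_chat_request(base_url, api_key, model, messages):
    """Build (without sending) an OpenAI-style chat completions request.

    Hypothetical helper for illustration; not part of the Volt SDK.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    VOLT_BASE_URL,
    "volt_sk_live_...",  # elided key, as in the sample above
    "llama-3.3-70b-instruct",
    [{"role": "user", "content": "Explain CAP theorem"}],
)
print(req.get_method(), req.full_url)
```

With a real base URL and key, passing req to urllib.request.urlopen and decoding the JSON response would complete the call. Only the URL and the Authorization value differ from a request aimed at OpenAI, which is why an existing OpenAI client can usually be repointed through configuration alone.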