Volt documentation

Run 70B models in your customer's metro. At Bedrock prices. Without your data leaving the city.

Volt is the Sovereign Inference Cloud — a distributed Tier-3 inference fabric across metro pods, with zero egress and cryptographic attestation. Three products under one stack:

Spark

Tokens-as-a-service. OpenAI-compatible. Llama 70B at $0.95/M standard, $1.45/M sovereign.

Forge

GPU-as-a-service. NVIDIA B200 at $2.36/GPU/hr on 36-month reserved.

Vault

Dedicated bare-metal. 8-GPU B200 rack from $85K/mo. Sovereign by default.
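
To make the list prices above concrete, here is a rough cost sketch. It rests on assumptions the page does not state: that "/M" means per million tokens, that input and output tokens are billed at the same rate, and that a month is about 730 hours. The function names are illustrative and are not part of any Volt SDK.

```python
# List prices quoted on this page, in USD.
# ASSUMPTION: "/M" is read as "per million tokens".
SPARK_PRICE_PER_M_TOKENS = {"standard": 0.95, "sovereign": 1.45}
# Forge B200 rate on the 36-month reserved plan.
FORGE_PRICE_PER_GPU_HOUR = 2.36


def spark_cost(tokens: int, tier: str = "standard") -> float:
    """Estimate Spark spend for a token volume (hypothetical helper)."""
    return tokens / 1_000_000 * SPARK_PRICE_PER_M_TOKENS[tier]


def forge_cost(gpus: int, hours: float) -> float:
    """Estimate Forge spend for a number of GPUs over some hours (hypothetical helper)."""
    return gpus * hours * FORGE_PRICE_PER_GPU_HOUR
```

For example, `spark_cost(250_000_000, "sovereign")` estimates a month of 250 million sovereign tokens, and `forge_cost(8, 730)` estimates eight reserved GPUs running for an assumed 730-hour month. Treat both as back-of-envelope figures; actual billing terms may differ.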

Start here

Quickstart

Install the SDK, get a key, make your first request in under 5 minutes.

Authentication

API keys, environment variables, and OAuth.

Sovereignty

How in-metro inference and zero egress work.

API reference

Full control-plane API for Spark, Forge, and Vault.

OpenAI drop-in

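Already using OpenAI? Spark is OpenAI-compatible, so an existing OpenAI client can target Volt once its base URL and API key are changed. The Volt SDK exposes the same chat completions interface: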

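from volt import Volt

# Create a client with your Volt API key (elided here).
client = Volt(api_key="volt_sk_live_...")

# Same call shape as the OpenAI chat completions API.
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain CAP theorem"}],
)

# Print the text of the first returned choice.
print(resp.choices[0].message.content)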
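
For readers who want to see what "OpenAI-compatible" implies on the wire, the following is a minimal standard-library sketch of the same request. Several names are assumptions and do not come from this page: the base URL `https://api.volt.example/v1` is a placeholder, the `/chat/completions` path and Bearer header follow OpenAI's convention, and `build_chat_request` and `send_chat_request` are hypothetical helpers. Check the API reference for the real endpoint.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint, not a real Volt URL.
VOLT_BASE_URL = "https://api.volt.example/v1"


def build_chat_request(prompt, api_key, base_url=VOLT_BASE_URL,
                       model="llama-3.3-70b-instruct"):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # OpenAI-style path (assumed)
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # OpenAI-style auth (assumed)
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the first choice's text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the official `openai` Python client, the same two values go into its constructor as `OpenAI(base_url=..., api_key=...)`, and the rest of the calling code stays unchanged. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.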