Streaming chat

Pass stream=True to get an iterator of chunks. The SDK parses the SSE framing for you.

from volt import Volt

client = Volt()
stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about sovereign clouds."}],
    stream=True,
)
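for chunk in stream:
    # Skip chunks without text so a None content is never printed as "None".
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)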
print()
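
For readers curious what "SSE framing" involves, here is a minimal standalone sketch of parsing a Server-Sent Events stream. It is not the SDK's implementation. The JSON payload shape and the "[DONE]" end-of-stream sentinel are assumptions borrowed from common chat-completion APIs, and the helper names iter_sse_data and iter_deltas are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon is dropped
        if field == "data":
            buf.append(value)  # multiple data lines are joined with newlines
    # An event not terminated by a blank line is discarded.


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from events carrying an assumed chunk JSON shape."""
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":  # assumed sentinel, not confirmed for this API
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text


sample = [
    ": keep-alive\n",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
print("".join(iter_deltas(sample)))
```

With stream=True none of this is needed in application code; the sketch only shows the kind of work the SDK takes off your hands.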

Streaming is resumable: if the connection drops, the SDK can resume from the last sequence when the pod's KV cache is still warm; otherwise it re-issues the request. Use the max_retries argument to bound the number of retries per call.
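
The resume-or-re-issue behaviour can be pictured with a small simulation. This is a sketch under stated assumptions, not the SDK's code: StreamDropped, stream_with_retries, make_flaky_server, and the (sequence, text) chunk shape are all hypothetical names invented for illustration. Only max_retries comes from the text above.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


class StreamDropped(Exception):
    """Hypothetical error raised when the connection drops mid-stream."""


Chunk = Tuple[int, str]  # (sequence number, text); an assumed shape


def stream_with_retries(
    open_stream: Callable[[Optional[int]], Iterator[Chunk]],
    max_retries: int = 2,
) -> Iterator[str]:
    """Yield text once per sequence number, retrying a dropped stream."""
    last_seq: Optional[int] = None
    retries = 0
    while True:
        try:
            for seq, text in open_stream(last_seq):
                if last_seq is not None and seq <= last_seq:
                    continue  # already delivered; seen after a full re-issue
                last_seq = seq
                yield text
            return  # stream finished normally
        except StreamDropped:
            if retries >= max_retries:
                raise  # retry budget exhausted
            retries += 1


def make_flaky_server(tokens: List[str], drop_at: int, cache_warm: bool):
    """Fake backend that drops the first connection at sequence drop_at."""
    calls: Dict[str, int] = {"n": 0}

    def open_stream(resume_after: Optional[int]) -> Iterator[Chunk]:
        calls["n"] += 1
        first_call = calls["n"] == 1
        # Warm cache: continue after the last sequence. Cold: start over.
        can_resume = cache_warm and resume_after is not None
        start = resume_after + 1 if can_resume else 0
        for seq in range(start, len(tokens)):
            if first_call and seq == drop_at:
                raise StreamDropped()
            yield seq, tokens[seq]

    return open_stream, calls


tokens = ["a", "b", "c", "d"]
warm_server, _ = make_flaky_server(tokens, drop_at=2, cache_warm=True)
cold_server, _ = make_flaky_server(tokens, drop_at=2, cache_warm=False)
print("".join(stream_with_retries(warm_server, max_retries=1)))
print("".join(stream_with_retries(cold_server, max_retries=1)))
```

Skipping already-delivered sequence numbers after a re-issue is only sound if the regenerated output matches the original. With sampled generation that may not hold, so a real client might instead discard the partial output and start over.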