Streaming

Server-Sent Events (SSE) streaming for real-time inference responses.

Enabling Streaming

Set "stream": true in the POST /v1/responses request body to receive a streaming response.

{
  "model": "claude-sonnet-4-20250514",
  "input": "Hello!",
  "stream": true
}
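
As a rough sketch, the request above could be built in Python with only the standard library. The base URL and the helper name build_streaming_request are assumptions for illustration, and any authentication headers your deployment requires are left out because this document does not describe them.

```python
import json
import urllib.request

# Hypothetical base URL (an assumption); point this at your own server.
BASE_URL = "http://localhost:8080"


def build_streaming_request(model: str, text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /v1/responses request with "stream": true in the body."""
    body = {"model": model, "input": text, "stream": True}
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


request = build_streaming_request("claude-sonnet-4-20250514", "Hello!")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen returns a response object that can be iterated line by line, which suits a streamed body.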

Response Format

Streaming responses use the Server-Sent Events (SSE) protocol:

  • Content-Type: text/event-stream; charset=utf-8
  • Cache-Control: no-cache
  • Connection: keep-alive

Each event consists of an event line naming the event type and a data line carrying a JSON payload, followed by a blank line:

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"Hello"}

The stream ends with a [DONE] sentinel, which is a literal string rather than a JSON payload:

data: [DONE]
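
A client has to split the stream into events, decode each JSON payload, and stop at the sentinel. The following is a minimal sketch of such a parser, not a full SSE implementation; the function name iter_sse_events is hypothetical, and the sketch assumes every data payload other than [DONE] is valid JSON.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple

DONE_SENTINEL = "[DONE]"


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one leading space if present."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (event_name, payload) pairs until the [DONE] sentinel is seen."""
    event_name: Optional[str] = None
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data:"))
        elif line == "":
            # A blank line terminates the current event.
            if data_parts:
                data = "\n".join(data_parts)
                if data == DONE_SENTINEL:
                    return
                yield event_name, json.loads(data)
            event_name, data_parts = None, []
        # Any other line (for example an SSE comment starting with ":") is ignored.


sample = [
    "event: response.output_text.delta\n",
    'data: {"type":"response.output_text.delta","item_id":"msg_1",'
    '"output_index":0,"content_index":0,"delta":"Hello"}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
events = list(iter_sse_events(sample))
print(events)
```

The same generator can be fed the decoded lines of a live HTTP response instead of the sample list.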

Event Lifecycle

Events arrive in a predictable order during a typical text response:

  1. response.created — Response object created (status: in_progress)
  2. response.in_progress — Processing started
  3. response.output_item.added — New output item (message or function_call)
  4. response.content_part.added — Content part started
  5. response.output_text.delta — Text chunk (the actual streaming content, repeated)
  6. response.output_text.done — Text complete for this content part
  7. response.content_part.done — Content part complete
  8. response.output_item.done — Output item complete
  9. response.completed — Full response complete (includes usage)
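
The ordering above can be checked on the client, for example in tests. This sketch assumes a response with a single output item and a single content part; a response with several items would presumably repeat steps 3 to 8, which this simple check would reject. The function name follows_lifecycle is hypothetical.

```python
from typing import Iterable

# Event types in the order listed above.
LIFECYCLE = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.content_part.added",
    "response.output_text.delta",
    "response.output_text.done",
    "response.content_part.done",
    "response.output_item.done",
    "response.completed",
]
RANK = {name: index for index, name in enumerate(LIFECYCLE)}


def follows_lifecycle(event_types: Iterable[str]) -> bool:
    """Return True if the listed event types never move backward in the lifecycle."""
    last = -1
    for event_type in event_types:
        rank = RANK.get(event_type)
        if rank is None:
            continue  # Types outside the list (e.g. function-call events) are not checked.
        if rank < last:
            return False
        last = rank  # Equal rank is allowed, so repeated deltas pass.
    return True


print(follows_lifecycle([
    "response.created",
    "response.output_text.delta",
    "response.output_text.delta",
    "response.completed",
]))
print(follows_lifecycle(["response.completed", "response.created"]))
```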

Event Types Reference

Event Type                              Key Fields                                       Description
--------------------------------------  -----------------------------------------------  ------------------------------
response.created                        response (full ResponseResource)                 Response initialized
response.in_progress                    response                                         Processing started
response.output_item.added              output_index, item                               New output item added
response.content_part.added             item_id, output_index, content_index, part       Content part started
response.output_text.delta              item_id, output_index, content_index, delta      Text chunk (streaming content)
response.output_text.done               item_id, output_index, content_index, text       Complete text for content part
response.content_part.done              item_id, output_index, content_index, part       Content part complete
response.output_item.done               output_index, item                               Output item complete
response.function_call_arguments.delta  item_id, output_index, call_id, delta            Function call argument chunk
response.function_call_arguments.done   item_id, output_index, call_id, name, arguments  Function call complete
response.completed                      response (with usage)                            Response complete
response.failed                         response (with error)                            Response failed
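
The function-call events can be assembled in much the same way as text. The sketch below works on already-decoded payloads and assumes that delta carries a string fragment and that arguments is a JSON-encoded string, neither of which is stated here. The get_weather call and the helper name collect_function_calls are made up for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def collect_function_calls(events: Iterable[dict]) -> Dict[str, dict]:
    """Assemble completed function calls from decoded event payloads, keyed by call_id."""
    partial: Dict[str, str] = defaultdict(str)
    finished: Dict[str, dict] = {}
    for event in events:
        kind = event.get("type")
        if kind == "response.function_call_arguments.delta":
            partial[event["call_id"]] += event["delta"]
        elif kind == "response.function_call_arguments.done":
            call_id = event["call_id"]
            # Prefer the complete string on the done event; fall back to the joined deltas.
            raw = event.get("arguments", partial[call_id])
            # Assumption: the arguments string is JSON-encoded.
            finished[call_id] = {"name": event["name"], "arguments": json.loads(raw)}
    return finished


# Hypothetical payloads for illustration only.
sample_events = [
    {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
     "output_index": 0, "call_id": "call_1", "delta": '{"city":'},
    {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
     "output_index": 0, "call_id": "call_1", "delta": '"Paris"}'},
    {"type": "response.function_call_arguments.done", "item_id": "fc_1",
     "output_index": 0, "call_id": "call_1", "name": "get_weather",
     "arguments": '{"city":"Paris"}'},
]
calls = collect_function_calls(sample_events)
print(calls)
```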

Example SSE Output

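An abbreviated streaming response for a simple text reply (intermediate lifecycle events such as response.in_progress and response.output_item.added are omitted):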

event: response.created
data: {"type":"response.created","response":{"id":"abc-123","object":"response","created_at":1700000000,"status":"in_progress","model":"claude-sonnet-4-20250514","output":[],"usage":{"input_tokens":0,"output_tokens":0,"total_tokens":0}}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"Hello"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":" world"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"!"}

event: response.completed
data: {"type":"response.completed","response":{"id":"abc-123","object":"response","created_at":1700000000,"status":"completed","model":"claude-sonnet-4-20250514","output":[{"type":"message","id":"msg_1","role":"assistant","content":[{"type":"output_text","text":"Hello world!"}]}],"usage":{"input_tokens":10,"output_tokens":5,"total_tokens":15}}}

data: [DONE]
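
Concatenating the delta values from the example should reproduce the text carried by the final response.completed event. A small sketch of that bookkeeping, working on already-decoded payloads (trimmed here to the fields it uses); the helper name assemble_text is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def assemble_text(payloads: Iterable[dict]) -> Tuple[Dict[Tuple[str, int], str], Optional[dict]]:
    """Join text deltas per (item_id, content_index) and capture the completed response."""
    text_by_part: Dict[Tuple[str, int], str] = {}
    final_response: Optional[dict] = None
    for payload in payloads:
        if payload["type"] == "response.output_text.delta":
            key = (payload["item_id"], payload["content_index"])
            text_by_part[key] = text_by_part.get(key, "") + payload["delta"]
        elif payload["type"] == "response.completed":
            final_response = payload["response"]
    return text_by_part, final_response


# Decoded payloads from the example above, trimmed to the relevant fields.
payloads = [
    {"type": "response.output_text.delta", "item_id": "msg_1", "output_index": 0,
     "content_index": 0, "delta": "Hello"},
    {"type": "response.output_text.delta", "item_id": "msg_1", "output_index": 0,
     "content_index": 0, "delta": " world"},
    {"type": "response.output_text.delta", "item_id": "msg_1", "output_index": 0,
     "content_index": 0, "delta": "!"},
    {"type": "response.completed", "response": {
        "id": "abc-123", "status": "completed",
        "output": [{"type": "message", "id": "msg_1", "role": "assistant",
                    "content": [{"type": "output_text", "text": "Hello world!"}]}],
        "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}}},
]

text_by_part, final_response = assemble_text(payloads)
streamed = text_by_part[("msg_1", 0)]
final_text = final_response["output"][0]["content"][0]["text"]
print(streamed == final_text)
```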

Error Handling

Error During Streaming

If an error occurs while streaming, the server sends a response.failed event with error details, followed by the [DONE] sentinel:

event: response.failed
data: {"type":"response.failed","response":{"id":"abc-123","status":"failed","error":{"message":"Request timed out","code":"request_timeout"}}}

data: [DONE]
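
On the client, one way to surface this is to turn a response.failed payload into an exception. This is only a sketch of one possible design; the StreamFailed class and raise_on_failure helper are hypothetical names, and the fallback values cover the case where the error fields are missing.

```python
class StreamFailed(Exception):
    """Raised when the stream reports a response.failed event."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_failure(payload: dict) -> dict:
    """Return the payload unchanged, or raise StreamFailed for a response.failed event."""
    if payload.get("type") == "response.failed":
        error = payload.get("response", {}).get("error") or {}
        raise StreamFailed(error.get("code", "unknown"), error.get("message", ""))
    return payload


# Decoded payload from the example above.
failed = {
    "type": "response.failed",
    "response": {
        "id": "abc-123",
        "status": "failed",
        "error": {"message": "Request timed out", "code": "request_timeout"},
    },
}

try:
    raise_on_failure(failed)
    caught = None
except StreamFailed as exc:
    caught = exc

print(caught.code if caught else "no failure")
```

Checking the code attribute lets a caller decide, for instance, whether a request_timeout is worth retrying.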

Timeout During Streaming

If the plugin does not respond within the request timeout (default: 2 minutes), the server sends a response.failed event with the code request_timeout.

Client Disconnect

If the client disconnects during streaming, the plugin is released back to idle and no payment is settled, so the client is not charged for an incomplete response.