Streaming
Server-Sent Events (SSE) streaming for real-time inference responses.
Enabling Streaming
Set "stream": true in the POST /v1/responses request body to receive a streaming response.
{
  "model": "claude-sonnet-4-20250514",
  "input": "Hello!",
  "stream": true
}
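As a minimal sketch, the request above could be assembled with Python's standard library. The base URL `http://localhost:8080` and the `Accept` header are assumptions for illustration, not values taken from this page, and authentication is omitted because this section does not describe it.

```python
import json
import urllib.request


def build_streaming_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/responses request with streaming enabled."""
    body = {"model": model, "input": text, "stream": True}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: asking for an event stream explicitly is harmless.
            "Accept": "text/event-stream",
        },
        method="POST",
    )


# Hypothetical host; replace with the address of your own deployment.
request = build_streaming_request(
    "http://localhost:8080", "claude-sonnet-4-20250514", "Hello!"
)
```

Passing `request` to `urllib.request.urlopen` would send it; the returned response object can then be read line by line as the stream arrives.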
Response Format
Streaming responses use the Server-Sent Events (SSE) protocol:
- Content-Type: text/event-stream; charset=utf-8
- Cache-Control: no-cache
- Connection: keep-alive
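A client may want to confirm that it received an event stream before it starts parsing, since a request rejected up front might come back in a different format (that possibility is an assumption; this section does not describe non-streaming error bodies). One way to check the `Content-Type` value, with a hypothetical helper name:

```python
def is_event_stream(content_type: str) -> bool:
    """Return True when the media type is text/event-stream, ignoring parameters."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"


streaming = is_event_stream("text/event-stream; charset=utf-8")
not_streaming = is_event_stream("application/json")
```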
Each event is formatted as:
event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"Hello"}
The stream ends with a [DONE] sentinel:
data: [DONE]
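The framing above can be parsed with a small generator. This is a sketch rather than a reference client: it assumes, per the SSE specification, that each event is terminated by a blank line (the examples on this page do not show those blank lines), and the function name `iter_sse_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple

DONE_SENTINEL = "[DONE]"


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (event_name, payload) pairs until the [DONE] sentinel or end of input."""
    event_name: Optional[str] = None
    data_lines: List[str] = []

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        if line:
            field, _, value = line.partition(":")
            # A single leading space after the colon is not part of the value.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)
            continue
        # Blank line: the buffered event is complete.
        if data_lines:
            data = "\n".join(data_lines)
            if data == DONE_SENTINEL:
                return
            yield event_name, json.loads(data)
        event_name, data_lines = None, []
    # An event that was not terminated by a blank line is discarded.


sample_lines = [
    "event: response.output_text.delta\n",
    'data: {"type":"response.output_text.delta","item_id":"msg_1",'
    '"output_index":0,"content_index":0,"delta":"Hello"}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
events = list(iter_sse_events(sample_lines))
```

Because the generator accepts any iterable of lines, it can be fed from a decoded HTTP response body or from a list of strings in a test.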
Event Lifecycle
Events arrive in a predictable order during a typical text response:
1. response.created: Response object created (status: in_progress)
2. response.in_progress: Processing started
3. response.output_item.added: New output item (message or function_call)
4. response.content_part.added: Content part started
5. response.output_text.delta: Text chunk (the actual streaming content, repeated)
6. response.output_text.done: Text complete for this content part
7. response.content_part.done: Content part complete
8. response.output_item.done: Output item complete
9. response.completed: Full response complete (includes usage)
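The ordering above can be turned into a simple check for tests or debugging. The sketch below covers only the typical single-message text response described here; function-call responses and failed responses follow different sequences, and the helper name is hypothetical.

```python
from itertools import groupby
from typing import Iterable, List

EXPECTED_TEXT_LIFECYCLE: List[str] = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.content_part.added",
    "response.output_text.delta",
    "response.output_text.done",
    "response.content_part.done",
    "response.output_item.done",
    "response.completed",
]


def follows_text_lifecycle(event_types: Iterable[str]) -> bool:
    """Check an observed sequence of event types against the typical text lifecycle."""
    # Collapse consecutive repeats so any number of delta events counts as one step.
    collapsed = [name for name, _group in groupby(event_types)]
    return collapsed == EXPECTED_TEXT_LIFECYCLE


# A stream with three text chunks between content_part.added and output_text.done.
observed = (
    EXPECTED_TEXT_LIFECYCLE[:4]
    + ["response.output_text.delta"] * 3
    + EXPECTED_TEXT_LIFECYCLE[5:]
)
in_order = follows_text_lifecycle(observed)
```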
Event Types Reference
| Event Type | Key Fields | Description |
|---|---|---|
| response.created | response (full ResponseResource) | Response initialized |
| response.in_progress | response | Processing started |
| response.output_item.added | output_index, item | New output item added |
| response.content_part.added | item_id, output_index, content_index, part | Content part started |
| response.output_text.delta | item_id, output_index, content_index, delta | Text chunk (streaming content) |
| response.output_text.done | item_id, output_index, content_index, text | Complete text for content part |
| response.content_part.done | item_id, output_index, content_index, part | Content part complete |
| response.output_item.done | output_index, item | Output item complete |
| response.function_call_arguments.delta | item_id, output_index, call_id, delta | Function call argument chunk |
| response.function_call_arguments.done | item_id, output_index, call_id, name, arguments | Function call complete |
| response.completed | response (with usage) | Response complete |
| response.failed | response (with error) | Response failed |
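One way to consume these event types is a small accumulator that dispatches on the `type` field. This is a sketch under assumptions: the class name, the sample identifiers (`fc_1`, `call_1`, `get_weather`), and the treatment of `arguments` as an opaque string are illustrative and not confirmed by this page.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple


class StreamAccumulator:
    """Collect text and function-call arguments from parsed event payloads."""

    def __init__(self) -> None:
        # Text keyed by (item_id, content_index), built up from delta events.
        self.text: Dict[Tuple[str, int], str] = defaultdict(str)
        # Partial function-call arguments keyed by call_id.
        self.argument_chunks: Dict[str, str] = defaultdict(str)
        # Completed function calls keyed by call_id.
        self.calls: Dict[str, Dict[str, Any]] = {}
        self.final_response: Optional[Dict[str, Any]] = None

    def handle(self, event: Dict[str, Any]) -> None:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            self.text[(event["item_id"], event["content_index"])] += event["delta"]
        elif kind == "response.output_text.done":
            # The done event carries the complete text; prefer it over the chunks.
            self.text[(event["item_id"], event["content_index"])] = event["text"]
        elif kind == "response.function_call_arguments.delta":
            self.argument_chunks[event["call_id"]] += event["delta"]
        elif kind == "response.function_call_arguments.done":
            self.calls[event["call_id"]] = {
                "name": event["name"],
                "arguments": event["arguments"],  # kept as received, not parsed
            }
        elif kind in ("response.completed", "response.failed"):
            self.final_response = event["response"]
        # Other event types are ignored in this sketch.


acc = StreamAccumulator()
for ev in [
    {"type": "response.output_text.delta", "item_id": "msg_1",
     "output_index": 0, "content_index": 0, "delta": "Hel"},
    {"type": "response.output_text.delta", "item_id": "msg_1",
     "output_index": 0, "content_index": 0, "delta": "lo"},
    {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
     "output_index": 1, "call_id": "call_1", "delta": '{"city":'},
    {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
     "output_index": 1, "call_id": "call_1", "delta": '"Paris"}'},
    {"type": "response.function_call_arguments.done", "item_id": "fc_1",
     "output_index": 1, "call_id": "call_1", "name": "get_weather",
     "arguments": '{"city":"Paris"}'},
]:
    acc.handle(ev)
```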
Example SSE Output
An abbreviated streaming response for a simple text reply:
event: response.created
data: {"type":"response.created","response":{"id":"abc-123","object":"response","created_at":1700000000,"status":"in_progress","model":"claude-sonnet-4-20250514","output":[],"usage":{"input_tokens":0,"output_tokens":0,"total_tokens":0}}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"Hello"}
event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":" world"}
event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_1","output_index":0,"content_index":0,"delta":"!"}
event: response.completed
data: {"type":"response.completed","response":{"id":"abc-123","object":"response","created_at":1700000000,"status":"completed","model":"claude-sonnet-4-20250514","output":[{"type":"message","id":"msg_1","role":"assistant","content":[{"type":"output_text","text":"Hello world!"}]}],"usage":{"input_tokens":10,"output_tokens":5,"total_tokens":15}}}
data: [DONE]
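In the example above, concatenating the three `delta` values gives the same text that the final `response.completed` event carries. The sketch below extracts that final text and the usage counters from the completed payload; the helper name `final_text` is hypothetical and the dictionary simply mirrors the example event.

```python
from typing import Any, Dict, List


def final_text(response: Dict[str, Any]) -> str:
    """Join every output_text part found in the message items of a response."""
    parts: List[str] = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as function calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# Payload of the response.completed event from the example above.
completed_event = {
    "type": "response.completed",
    "response": {
        "id": "abc-123",
        "object": "response",
        "created_at": 1700000000,
        "status": "completed",
        "model": "claude-sonnet-4-20250514",
        "output": [
            {
                "type": "message",
                "id": "msg_1",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello world!"}],
            }
        ],
        "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    },
}

streamed_text = "".join(["Hello", " world", "!"])  # the three deltas, in order
text = final_text(completed_event["response"])
usage = completed_event["response"]["usage"]
```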
Error Handling
Error During Streaming
If an error occurs while streaming, the server sends a response.failed event with error details, followed by the [DONE] sentinel:
event: response.failed
data: {"type":"response.failed","response":{"id":"abc-123","status":"failed","error":{"message":"Request timed out","code":"request_timeout"}}}
data: [DONE]
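A client can surface a `response.failed` event as an exception so that callers do not mistake a partial stream for a complete one. The sketch below is one possible approach; the exception class and helper names are hypothetical.

```python
from typing import Any, Dict, Optional


class StreamFailedError(Exception):
    """Raised when the stream reports a response.failed event."""

    def __init__(self, code: Optional[str], message: Optional[str],
                 response_id: Optional[str]) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.response_id = response_id


def raise_if_failed(event: Dict[str, Any]) -> None:
    """Raise StreamFailedError for response.failed events; ignore everything else."""
    if event.get("type") != "response.failed":
        return
    response = event.get("response", {})
    error = response.get("error") or {}
    raise StreamFailedError(
        code=error.get("code"),
        message=error.get("message"),
        response_id=response.get("id"),
    )


# Payload of the response.failed event from the example above.
failed_event = {
    "type": "response.failed",
    "response": {
        "id": "abc-123",
        "status": "failed",
        "error": {"message": "Request timed out", "code": "request_timeout"},
    },
}

try:
    raise_if_failed(failed_event)
    caught = None
except StreamFailedError as exc:
    caught = exc
```

Callers can then branch on `caught.code`, for example to treat `request_timeout` differently from other failures.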
Timeout During Streaming
If the plugin does not respond within the request timeout period (default: 2 minutes), the server sends a response.failed event with the error code request_timeout.
Client Disconnect
If the client disconnects during streaming, the plugin is released back to idle and no payment is settled. The client is not charged for incomplete responses.