Streaming
Stream chat completions in real time using Server-Sent Events (SSE).
The Chat Completions API supports real-time streaming using Server-Sent Events (SSE). When streaming is enabled, the response is delivered as a series of chunks, allowing your application to display content as it is generated.
Enabling Streaming
Set `stream: true` in your request body:
```bash
curl https://api.persly.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERSLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "persly-chat-v1",
    "messages": [{"role": "user", "content": "Explain hypertension treatment options."}],
    "stream": true
  }'
```

SSE Protocol
The streaming response uses the `text/event-stream` content type. Each chunk is prefixed with `data: ` and followed by two newlines. The format is fully compatible with the OpenAI streaming protocol.
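As a minimal illustration of this framing (not part of any official SDK), the sketch below splits a raw `text/event-stream` body into its `data:` payloads. The sample stream is invented for the example:

```python
import json

def parse_sse(raw: str) -> list[str]:
    """Split a raw text/event-stream body into its data payloads.

    Events are separated by a blank line; each data line carries
    a "data: " prefix that must be stripped.
    """
    payloads = []
    for event in raw.split("\n\n"):
        for line in event.splitlines():
            if line.startswith("data: "):
                payloads.append(line[len("data: "):])
    return payloads

# Invented sample stream for illustration
raw = (
    'data: {"choices":[{"delta":{"content":"Hyper"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"tension"}}]}\n\n'
    'data: [DONE]\n\n'
)

payloads = parse_sse(raw)
tokens = [
    json.loads(p)["choices"][0]["delta"]["content"]
    for p in payloads
    if p != "[DONE]"
]
print("".join(tokens))  # -> Hypertension
```

The real clients below do the same prefix-stripping, with the added complication of buffering partial lines as bytes arrive.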
Chunk Types
First chunk — Contains the assistant role:

```
data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
```

Content chunks — Each contains a token in `delta.content`:

```
data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{"content":"Hypertension"},"finish_reason":null}]}
```

Final data chunk — Contains `finish_reason`, `usage`, and optionally `sources`:

```
data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"request_count":1},"sources":[{"title":"Hypertension Guidelines - JNC 8","url":"https://...","relevance_score":0.92}]}
```

Termination signal:

```
data: [DONE]
```

Client Implementation
Python:

```python
import httpx
import json

with httpx.stream(
    "POST",
    "https://api.persly.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "persly-chat-v1",
        "messages": [{"role": "user", "content": "Explain hypertension treatment."}],
        "stream": True,
    },
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]  # Remove "data: " prefix
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
        if chunk["choices"][0].get("finish_reason") == "stop":
            sources = chunk.get("sources", [])
            if sources:
                print("\n\nSources:")
                for s in sources:
                    print(f"  - {s['title']}: {s['url']}")
```

JavaScript (Node.js):

```javascript
const response = await fetch("https://api.persly.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "persly-chat-v1",
    messages: [{ role: "user", content: "Explain hypertension treatment." }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // Keep incomplete line in buffer
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") break;
    const chunk = JSON.parse(data);
    const delta = chunk.choices[0].delta;
    if (delta.content) {
      process.stdout.write(delta.content);
    }
  }
}
```

Key Considerations
- Content-Type: The response header is `text/event-stream`.
- OpenAI compatibility: The chunk format uses the `chat.completion.chunk` object type and `delta` field, matching the OpenAI streaming protocol.
- Sources: Included in the final data chunk as an array when `include_sources: true` (the default). Returns `[]` if no sources were found.
- Follow-up questions: Included in the final data chunk as an array when `include_follow_ups: true`. Returns `[]` if no suggestions were generated.
- Usage: Always included in the final data chunk.
- Billing: Streaming and non-streaming requests have the same cost.
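Since sources and usage arrive only on the terminal chunk, a client typically inspects it separately. The sketch below shows one way to do that; the `sources` and `usage` shapes follow the final-chunk example above, while the `follow_ups` response field name and the sample URL are assumptions for illustration (the docs name only the `include_follow_ups` request parameter):

```python
# Sketch: pulling billing and citation info out of the terminal chunk.
# "usage" and "sources" match the final-chunk example in this document;
# "follow_ups" is an assumed response field name, and the URL below is
# a placeholder.
def summarize_final_chunk(chunk: dict) -> dict:
    choice = chunk["choices"][0]
    if choice.get("finish_reason") != "stop":
        return {}  # Not the terminal chunk
    return {
        "requests_billed": chunk.get("usage", {}).get("request_count", 0),
        "sources": [(s["title"], s["url"]) for s in chunk.get("sources", [])],
        "follow_ups": chunk.get("follow_ups", []),  # assumed field name
    }

final_chunk = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    "usage": {"request_count": 1},
    "sources": [
        {
            "title": "Hypertension Guidelines - JNC 8",
            "url": "https://example.org/jnc8",  # placeholder URL
            "relevance_score": 0.92,
        }
    ],
    "follow_ups": [],
}

summary = summarize_final_chunk(final_chunk)
print(summary["requests_billed"])  # -> 1
```

Checking `finish_reason` before reading `usage` or `sources` avoids key errors on intermediate content chunks, whose `delta` objects carry neither field.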