Persly API

Streaming

Stream chat completions in real-time using Server-Sent Events (SSE).

The Chat Completions API supports real-time streaming using Server-Sent Events (SSE). When streaming is enabled, the response is delivered as a series of chunks, allowing your application to display content as it's generated.

Enabling Streaming

Set stream: true in your request body. With curl, add the -N flag so output is printed as it arrives rather than buffered:

curl -N https://api.persly.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERSLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "persly-chat-v1",
    "messages": [{"role": "user", "content": "Explain hypertension treatment options."}],
    "stream": true
  }'

SSE Protocol

The streaming response uses the text/event-stream content type. Each chunk is prefixed with data: and followed by two newlines. The format is fully compatible with the OpenAI streaming protocol.

Chunk Types

First chunk — Contains the assistant role:

data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

Content chunks — Each carries an incremental fragment of generated text in delta.content:

data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{"content":"Hypertension"},"finish_reason":null}]}

Final data chunk — Contains finish_reason, usage, and optionally sources:

data: {"id":"chatcmpl_a1b2c3...","object":"chat.completion.chunk","created":1709000000,"model":"persly-chat-v1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"request_count":1},"sources":[{"title":"Hypertension Guidelines - JNC 8","url":"https://...","relevance_score":0.92}]}

Termination signal:

data: [DONE]
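The chunk sequence above can be decoded with a small line parser. The sketch below is illustrative and works offline on pre-collected lines; the helper name iter_sse_chunks is not part of the API:

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed chunk dicts from SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # termination signal: stop iterating
        yield json.loads(data)


# Offline example using the chunk shapes shown above (ids/usage omitted):
raw = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hypertension"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "") for c in iter_sse_chunks(raw)
)
print(text)  # Hypertension
```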

Client Implementation

Python

import httpx
import json

with httpx.stream(
    "POST",
    "https://api.persly.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "persly-chat-v1",
        "messages": [{"role": "user", "content": "Explain hypertension treatment."}],
        "stream": True,
    },
    timeout=None,  # streamed responses can outlive httpx's default 5-second read timeout
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]  # Remove "data: " prefix
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
        if chunk["choices"][0].get("finish_reason") == "stop":
            sources = chunk.get("sources", [])
            if sources:
                print("\n\nSources:")
                for s in sources:
                    print(f"  - {s['title']}: {s['url']}")

JavaScript

const response = await fetch("https://api.persly.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "persly-chat-v1",
    messages: [{ role: "user", content: "Explain hypertension treatment." }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // Keep incomplete line in buffer

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") break; // last event; the outer loop then sees done === true

    const chunk = JSON.parse(data);
    const delta = chunk.choices[0].delta;
    if (delta.content) {
      process.stdout.write(delta.content);
    }
  }
}

Key Considerations

  • Content-Type: The response header is text/event-stream
  • OpenAI compatibility: The chunk format uses chat.completion.chunk object type and delta field, matching the OpenAI streaming protocol
  • Sources: Included in the final data chunk as an array when include_sources: true (default). Returns [] if no sources were found.
  • Follow-up questions: Included in the final data chunk as an array when include_follow_ups: true. Returns [] if no suggestions were generated.
  • Usage: Always included in the final data chunk
  • Billing: Streaming and non-streaming requests have the same cost
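Putting the last three bullets together, the final data chunk can be unpacked like this. A minimal sketch: the follow_ups key is an assumption by analogy with sources (this section does not show the actual field name), and the JSON literal mirrors the example chunk above:

```python
import json

# Example final chunk, modeled on the one shown in "Chunk Types":
final = json.loads(
    '{"id":"chatcmpl_a1b2c3","object":"chat.completion.chunk",'
    '"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"request_count":1},'
    '"sources":[{"title":"Hypertension Guidelines - JNC 8",'
    '"url":"https://example.org","relevance_score":0.92}]}'
)

if final["choices"][0].get("finish_reason") == "stop":
    usage = final["usage"]                    # always present on the final chunk
    sources = final.get("sources", [])        # [] when no sources were found
    follow_ups = final.get("follow_ups", [])  # assumed key; check the API reference
    print(usage["request_count"], len(sources), len(follow_ups))
```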
