Streaming

Stream chat completions in real-time using Server-Sent Events (SSE).

When streaming is enabled, the Chat Completions API delivers the response as a series of events, so your application can display content as it is generated.

Enabling Streaming

Set stream: true in your request body:

curl https://api.persly.ai/v1/chat/completions \
  -H "Authorization: Bearer $PERSLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "persly-chat-v1",
    "messages": [{"role": "user", "text": "Explain hypertension treatment options."}],
    "stream": true
  }'

SSE Protocol

The streaming response uses the text/event-stream content type. Each event is prefixed with data: and followed by two newlines. Every event except the terminating [DONE] sentinel is a JSON object with a type field that determines the payload structure.
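As a minimal sketch of the framing described above, the helper below classifies one raw line of the stream. The names parse_sse_line and DONE are illustrative, not part of the API, and the sketch assumes each event arrives on a single data: line, as in the examples on this page.

```python
import json

# Sentinel object returned for the terminating "data: [DONE]" line (hypothetical name).
DONE = object()


def parse_sse_line(line: str):
    """Return a dict for a JSON event, DONE for the terminator, or None to skip the line."""
    line = line.rstrip("\r\n")
    if not line.startswith("data:"):
        return None  # blank separator lines, SSE comments, or other fields
    data = line[len("data:"):]
    if data.startswith(" "):
        data = data[1:]  # drop the single optional space after the colon
    if data == "[DONE]":
        return DONE  # not JSON, so check for it before parsing
    return json.loads(data)
```

A caller would typically skip None, stop on DONE, and dispatch on the type field of anything else.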

Event Types

type	Delivery	Description
`steps`	Snapshot (replace)	Current state of all processing steps
`message`	Delta (append)	Response text chunk
`sources`	Snapshot (replace)	Source citations
`follow_up_questions`	Snapshot (replace)	Follow-up suggestions
`error`	Terminal	Streaming error

Snapshot events contain the full current value: always replace the previous value. Delta events contain a fragment: append it to the accumulated value. A terminal event ends the stream: after an error event, only [DONE] follows.
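The replace-versus-append rule can be sketched as a small pure function that folds one event into an accumulated state. The function name apply_event and the future_event type are hypothetical; error events are deliberately left out here and would need to be handled separately.

```python
SNAPSHOT_TYPES = ("steps", "sources", "follow_up_questions")


def apply_event(state: dict, event: dict) -> dict:
    """Fold one parsed event into the accumulated state and return it."""
    kind = event.get("type")
    if kind == "message":
        state["message"] += event["content"]  # delta: append the fragment
    elif kind in SNAPSHOT_TYPES:
        state[kind] = event[kind]  # snapshot: replace the previous value
    # Any other type is ignored so that new event types do not break the client.
    return state


# Replay a short, made-up event sequence.
state = {"steps": [], "message": "", "sources": None, "follow_up_questions": None}
events = [
    {"type": "steps", "steps": [{"description": "Searching", "actions": []}]},
    {"type": "steps", "steps": [{"description": "Searching", "actions": []},
                                {"description": "Generating response", "actions": []}]},
    {"type": "message", "content": "Hypertension"},
    {"type": "message", "content": " treatment"},
    {"type": "future_event", "payload": 1},  # unknown type, skipped
    {"type": "sources", "sources": []},
]
for event in events:
    apply_event(state, event)
```

After the replay, state holds the latest steps snapshot (two entries), the concatenated message text, and an empty sources list.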

Steps Events

Each steps event contains the full current steps[] array. As processing progresses, new steps and actions are added to the array. Always replace the previous steps value with the latest event.

Each step object has:

description (string) — human-readable step label.
actions (array) — legacy compatibility slot. Empty while the step is in progress; populated with a thin {title, url} summary once the retrieving step completes. Prefer sources below for new integrations.
sources (array, optional) — only present on the retrieving step once retrieval completes. Carries the same {id, title, url, relevance_score} objects that will appear in the final sources event, letting clients resolve inline [SW1]-style citations while the message tokens are still streaming.

data: {"type":"steps","steps":[{"description":"Searching medical knowledge base","actions":[]}]}

data: {"type":"steps","steps":[{"description":"Searching medical knowledge base","actions":[{"type":"search_official_source","input":{"query":""},"result":[{"title":"JNC 8 Guidelines","url":"https://...","content":""}]}],"sources":[{"id":"SW1","title":"JNC 8 Guidelines","url":"https://...","relevance_score":0.92}]}]}

data: {"type":"steps","steps":[{"description":"Searching medical knowledge base","actions":[{"type":"search_official_source","input":{"query":""},"result":[{"title":"JNC 8 Guidelines","url":"https://...","content":""}]}],"sources":[{"id":"SW1","title":"JNC 8 Guidelines","url":"https://...","relevance_score":0.92}]},{"description":"Generating response","actions":[]}]}
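To resolve citations before the final sources event arrives, a client could build a lookup from the latest steps snapshot. This is a sketch under the field names shown above; sources_from_steps is a hypothetical helper name.

```python
from __future__ import annotations


def sources_from_steps(steps: list[dict]) -> dict[str, dict]:
    """Map source id -> source object for every step that carries a sources field."""
    lookup: dict[str, dict] = {}
    for step in steps:
        # sources is optional and only appears once retrieval has completed
        for source in step.get("sources") or []:
            lookup[source["id"]] = source
    return lookup


# Example with a trimmed-down steps snapshot.
snapshot = [
    {
        "description": "Searching medical knowledge base",
        "actions": [],
        "sources": [{"id": "SW1", "title": "JNC 8 Guidelines",
                     "url": "https://...", "relevance_score": 0.92}],
    },
    {"description": "Generating response", "actions": []},
]
early_sources = sources_from_steps(snapshot)
```

Because steps is a snapshot, the lookup can simply be rebuilt from each new steps event rather than merged.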

Message Events

Message events contain partial text fragments of the AI response. Concatenate all content values to build the full message. Inline citation tokens such as [SW1], [PM1], or [PF1] are streamed through as-is in the content fragments and are not renumbered by the API. PF* tokens reference a specific page of a user-attached PDF (one PF* entry per page — see PDF attachments) and appear only when pdf_urls are provided. See the Citation prefixes table for what each two-letter prefix means.

data: {"type":"message","content":"Hypertension"}

data: {"type":"message","content":" treatment typically begins with lifestyle changes [SW1]"}
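This page does not say whether a citation token such as [SW1] can be split across two content fragments. A client that renders citations as links may want to guard against that case; the sketch below assumes a split is possible and holds back an unfinished token at the end of the accumulated text. The name split_renderable is hypothetical.

```python
from __future__ import annotations

import re

# An unfinished citation token at the very end of the text: "[", "[S", "[SW", "[SW1".
_PARTIAL_TOKEN = re.compile(r"\[[A-Z]*\d*\Z")


def split_renderable(accumulated: str) -> tuple[str, str]:
    """Split accumulated text into (safe to render now, held back until more text arrives)."""
    match = _PARTIAL_TOKEN.search(accumulated)
    if match is None:
        return accumulated, ""
    return accumulated[:match.start()], accumulated[match.start():]
```

The held-back tail is released automatically on the next call, once the token is closed or turns out not to be a citation.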

Sources Event

Emitted after message generation completes. Contains the full list of source citations. Each source includes a stable id (e.g. SW1, PM1, PF1) that matches the inline citation tokens in the streamed message content. Parse citations from the accumulated message with the regex \[([A-Z]{2,}\d+)\] and look each captured token up by sources[*].id. See the Citation prefixes table for what each prefix means. When pdf_urls are provided, PF* entries (one per PDF page, globally sequential) are prepended to sources[] ahead of retrieval sources; pages of the same PDF share the same url so clients can group them.

data: {"type":"sources","sources":[{"id":"SW1","title":"Hypertension Guidelines - JNC 8","url":"https://...","relevance_score":0.92}]}

Returns [] when no sources are found.

The same source objects are also surfaced earlier, on the retrieving step's sources field inside steps events, so clients can resolve citations while tokens are still streaming.
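The citation lookup described above might look like the following sketch. It uses the regex given in this section; resolve_citations is a hypothetical helper, and tokens with no matching source are skipped rather than treated as errors.

```python
from __future__ import annotations

import re

CITATION = re.compile(r"\[([A-Z]{2,}\d+)\]")


def resolve_citations(message: str, sources: list[dict]) -> list[dict]:
    """Return the cited sources in order of first appearance, without duplicates."""
    by_id = {source["id"]: source for source in sources}
    seen: set[str] = set()
    cited: list[dict] = []
    for token in CITATION.findall(message):
        if token in by_id and token not in seen:
            seen.add(token)
            cited.append(by_id[token])
    return cited


message = "Treatment begins with lifestyle changes [SW1], then medication [SW1][PM2]."
sources = [{"id": "SW1", "title": "Hypertension Guidelines - JNC 8",
            "url": "https://...", "relevance_score": 0.92}]
cited = resolve_citations(message, sources)
```

Here PM2 has no entry in sources, so only the SW1 source is returned, once.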

Follow-up Questions Event

Emitted when include_follow_ups: true:

data: {"type":"follow_up_questions","follow_up_questions":["What are the causes of hypertension?","How is hypertension diagnosed?"]}

Returns [] when included but none are generated.

Error Event

If streaming fails during generation, an error event is emitted:

data: {"type":"error","error":{"type":"server_error","code":"internal_error","message":"AI processing failed"}}

In this error path, the stream terminates with [DONE] without emitting further events. When streaming is already underway, the HTTP status is 200; errors are reported in-band through SSE.
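Because the HTTP status is already 200 at this point, the client has to detect the failure from the event itself. One way to surface it is a dedicated exception, sketched below; StreamError and raise_on_error are hypothetical names, and the fallback strings are placeholders.

```python
class StreamError(Exception):
    """Raised when an in-band error event arrives on the stream."""

    def __init__(self, error: dict):
        self.type = error.get("type", "unknown")
        self.code = error.get("code", "unknown")
        self.detail = error.get("message", "Streaming failed")
        super().__init__(f"[{self.code}] {self.detail}")


def raise_on_error(event: dict) -> None:
    """Raise StreamError for an error event; do nothing for any other event."""
    if event.get("type") == "error":
        raise StreamError(event.get("error") or {})
```

Calling raise_on_error before any other handling means that message text accumulated before the failure is never mistaken for a complete answer.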

Termination Signal

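Every stream, on both the success path and the error path, ends with a [DONE] sentinel. The sentinel is not JSON, so check for it before parsing the payload.

data: [DONE]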

Event Order

Typical success-path order:

steps (emitted multiple times as processing progresses)
message chunks (multiple)
sources (always emitted, [] when none found)
follow_up_questions (only when include_follow_ups: true, [] when none generated)
[DONE]

Reconstructing the Response Object

Streaming delivers the same data as a non-streaming response, split across SSE events. Collect the events to build the identical response object:

SSE Event	Delivery	Maps to
`steps`	Replace	`steps`
`message`	Append `content`	`message`
`sources`	Replace	`sources`
`follow_up_questions`	Replace	`follow_up_questions`

For complete error envelope details (error vs 422 detail[]), see Error Handling.

import httpx
import json

result = {"steps": [], "message": "", "sources": None, "follow_up_questions": None}

with httpx.stream(
    "POST",
    "https://api.persly.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "persly-chat-v1",
        "messages": [{"role": "user", "text": "Explain hypertension treatment."}],
        "stream": True,
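    },
    # httpx applies a 5-second read timeout by default, which a slow stream
    # can exceed; the values here are illustrative, adjust them as needed.
    timeout=httpx.Timeout(10.0, read=60.0),
) as response: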
    if response.status_code != 200:
        response.read()
        payload = response.json()
        if response.status_code == 422:
            first = (payload.get("detail") or [{}])[0]
            raise RuntimeError(
                f"Validation error [{first.get('type', 'value_error')}]: "
                f"{first.get('msg', 'Invalid request parameters')}"
            )
        error = payload.get("error", {})
        raise RuntimeError(
            f"API error [{error.get('code', 'unknown')}]: "
            f"{error.get('message', 'Request failed')}"
        )

    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]
        if data == "[DONE]":
            break
        chunk = json.loads(data)

        if chunk["type"] == "error":
            raise RuntimeError(
                f"Streaming error [{chunk['error']['code']}]: "
                f"{chunk['error']['message']}"
            )
        elif chunk["type"] == "steps":
            result["steps"] = chunk["steps"]  # snapshot: replace
        elif chunk["type"] == "message":
            result["message"] += chunk["content"]  # delta: append
        elif chunk["type"] == "sources":
            result["sources"] = chunk["sources"] or None
        elif chunk["type"] == "follow_up_questions":
            result["follow_up_questions"] = chunk["follow_up_questions"] or None

# `result` is now identical to a non-streaming response
print(result["message"])
for source in (result["sources"] or []):
    print(f"  [{source['id']}] {source['title']}: {source['url']}")

const result = { steps: [], message: "", sources: null, follow_up_questions: null };

const response = await fetch("https://api.persly.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "persly-chat-v1",
    messages: [{ role: "user", text: "Explain hypertension treatment." }],
    stream: true,
  }),
});

if (!response.ok) {
  const payload = await response.json();
  if (response.status === 422) {
    const first = payload.detail?.[0];
    throw new Error(
      `Validation error [${first?.type ?? "value_error"}]: ${first?.msg ?? "Invalid request parameters"}`
    );
  } else {
    const error = payload?.error ?? {};
    throw new Error(
      `API error [${error.code ?? "unknown"}]: ${error.message ?? "Request failed"}`
    );
  }
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let streamDone = false;

while (!streamDone) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") {
      streamDone = true;
      break;
    }

    const chunk = JSON.parse(data);

    switch (chunk.type) {
      case "error":
        throw new Error(
          `Streaming error [${chunk.error.code}]: ${chunk.error.message}`
        );
      case "steps":
        result.steps = chunk.steps; // snapshot: replace
        break;
      case "message":
        result.message += chunk.content; // delta: append
        break;
      case "sources":
        result.sources = chunk.sources.length > 0 ? chunk.sources : null;
        break;
      case "follow_up_questions":
        result.follow_up_questions =
          chunk.follow_up_questions.length > 0 ? chunk.follow_up_questions : null;
        break;
    }
  }
}

// `result` is now identical to a non-streaming response
console.log(result.message);
for (const source of result.sources ?? []) {
  console.log(`  [${source.id}] ${source.title}: ${source.url}`);
}

Key Considerations

Content-Type: The response header is text/event-stream
Snapshot vs Delta: steps, sources, and follow_up_questions are snapshot events (replace the previous value); message is the only delta event (append content)
Error path: Runtime stream failures are sent as type: "error" events; clients should handle this before processing other event types
Forward compatibility: Clients should ignore unknown type values and unknown fields
Sources: Always emitted as a sources event; [] when none are found. Each source carries a stable id (e.g. SW1) that matches inline citation tokens in message. The retrieving step inside steps events also carries a sources field so clients can resolve citations while message tokens are still streaming
Citations: Inline citation tokens such as [SW1], [PM1], [PF1] are streamed as-is through message content fragments — parse with \[([A-Z]{2,}\d+)\] and match against sources[*].id. Prefix meaning: SW = curated medical web retrieval, PM = PubMed (pro only), PF = one page of a user-attached PDF (one entry per page, globally sequential). See the Chat Completions → Citation prefixes table for the full list. PF* entries are prepended to sources[] ahead of retrieval sources.
Follow-up questions: Only emitted when include_follow_ups: true; [] when included but none generated
Billing: Streaming and non-streaming use the same pricing rules; follow-up surcharge applies in both modes only when follow-up generation is actually invoked
