Chat completions

Send chat messages to an open LLM and get a completion back through the OpenAI-compatible /v1/chat/completions endpoint.

The /v1/chat/completions endpoint takes a list of messages and returns the model's reply. It follows the OpenAI chat format, so existing OpenAI-style code and SDKs should work once they are pointed at this endpoint with your API key.

Request

curl https://paraloncloud.com/v1/chat/completions \
  -H "Authorization: Bearer prlc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain what a GPU is in one sentence."}
    ]
  }'
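
The same request can be built from Python without an SDK. The following is a minimal sketch using only the standard library; build_chat_request and send are hypothetical helper names, and the URL, key placeholder, and model id are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://paraloncloud.com/v1/chat/completions"  # from the curl example


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


request = build_chat_request(
    "prlc_your_key_here",
    "qwen3-8b",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a GPU is in one sentence."},
    ],
)
# reply = send(request)  # uncomment with a real key to perform the call
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network call is made.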

Common parameters

Field        Type     Description
model        string   The model id to use (see Models).
messages     array    The conversation so far. Each message has a role (system, user, or assistant) and content.
temperature  number   Sampling randomness. Lower is more deterministic, higher is more creative.
max_tokens   integer  Maximum number of tokens to generate in the reply.
stream       boolean  When true, tokens are streamed back as they're generated.
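
As an illustration of how these fields combine into a request body, here is a small sketch. build_payload is a hypothetical helper, and the temperature and max_tokens values are arbitrary examples. It assumes that omitted optional fields fall back to server-side defaults, which this page does not specify.

```python
def build_payload(model, messages, temperature=None, max_tokens=None, stream=None):
    """Return a request body, leaving out optional fields that were not set."""
    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Only include optional fields the caller provided (assumption: the
    # server applies its own defaults to anything left out).
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "qwen3-8b",
    [{"role": "user", "content": "Explain what a GPU is in one sentence."}],
    temperature=0.2,  # illustrative value: lower means more deterministic
    max_tokens=64,    # illustrative cap on the reply length
)
```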

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A GPU is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}

The reply text is in choices[0].message.content. The usage block reports how many tokens the request consumed; your usage is metered on these counts.
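
To make the field access concrete, the sketch below parses the sample response shown above and pulls out the reply text and token counts. extract_reply is a hypothetical helper name; the JSON is the sample from this page, not live output.

```python
import json

# The sample response from this page, as a JSON string.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A GPU is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42 }
}
"""


def extract_reply(response):
    """Return (reply_text, usage_dict) from a decoded chat completion."""
    choice = response["choices"][0]
    return choice["message"]["content"], response["usage"]


reply, usage = extract_reply(json.loads(raw))
print(reply)
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```

In the sample, total_tokens is the sum of prompt_tokens and completion_tokens.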

Streaming

Set stream: true to receive tokens as they're produced, which makes interfaces feel responsive. With the OpenAI SDK:

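from openai import OpenAI

# Assumption: the SDK base URL is the /v1 root of the curl endpoint above.
client = OpenAI(base_url="https://paraloncloud.com/v1", api_key="prlc_your_key_here")

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)

# Print each text delta as it arrives; some chunks carry no content.
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)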
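
If you also need the complete reply after streaming finishes, the deltas can be accumulated as they arrive. This is a sketch: collect_stream and fake_chunk are hypothetical helpers, and the fake chunks only mimic the choices[0].delta.content shape used in the loop above so that the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas from an iterable of streamed chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content may be None, for example on a final chunk
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk, used here so the sketch runs offline."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


full_text = collect_stream(
    [fake_chunk("Silicon "), fake_chunk("dreams"), fake_chunk(None)]
)
print(full_text)
```

With a real stream, pass the object returned by client.chat.completions.create(..., stream=True) in place of the fake chunk list.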

System prompts

Use a system message to set the assistant's behavior or persona before the conversation starts:

{"role": "system", "content": "You are a terse assistant. Answer in one sentence."}

Want to experiment first? The Playground lets you tweak the system prompt and messages live, then copy the settings into your code.