Send chat messages to an open LLM and get a completion back — the OpenAI-compatible /v1/chat/completions endpoint.

The /v1/chat/completions endpoint takes a list of messages and returns the model's reply. It follows the OpenAI chat format, so existing OpenAI-compatible code and SDKs should work once they point at this endpoint and send your API key.

Request

curl https://paraloncloud.com/v1/chat/completions \
  -H "Authorization: Bearer prlc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain what a GPU is in one sentence."}
    ]
  }'
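
For readers who prefer Python to curl, here is a minimal sketch of the same request built with the standard library only. The URL, headers, and body mirror the curl example above; the helper names `build_request` and `send` are illustrative and not part of any SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://paraloncloud.com/v1/chat/completions"


def build_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (this performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_request("prlc_your_key_here", "qwen3-8b", messages))` would perform the actual call. Keeping construction separate from sending makes the request easy to inspect or test offline.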

Common parameters

Field	Type	Description
`model`	string	The model id to use — see Models.
`messages`	array	The conversation so far. Each message has a `role` (`system`, `user`, or `assistant`) and `content`.
`temperature`	number	Sampling randomness. Lower is more deterministic, higher is more creative.
`max_tokens`	integer	Maximum number of tokens to generate in the reply.
`stream`	boolean	When `true`, tokens are streamed back as they're generated.
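
As a rough illustration of how these fields combine, the sketch below assembles a request body and includes the optional parameters only when they are set. `build_payload` is a hypothetical helper, and the client-side role check is an assumption based on the three roles listed in the table; the server may accept more than this check allows.

```python
# Roles listed in the parameter table; checking them client-side is an assumption.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_payload(model, messages, temperature=None, max_tokens=None, stream=None):
    """Assemble a request body, including only the optional fields that were set."""
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {message.get('role')!r}")
        if "content" not in message:
            raise ValueError("each message needs a 'content' field")

    payload = {"model": model, "messages": list(messages)}
    optional = {"temperature": temperature, "max_tokens": max_tokens, "stream": stream}
    # Leave unset fields out of the body rather than sending explicit nulls.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_payload(
    "qwen3-8b",
    [{"role": "user", "content": "Explain what a GPU is in one sentence."}],
    temperature=0.2,
    max_tokens=64,
)
```

Here `payload` contains `model`, `messages`, `temperature`, and `max_tokens`, but no `stream` key, because that argument was left unset.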

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A GPU is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}

The reply text is in `choices[0].message.content`. The `usage` block reports how many tokens the request consumed, split into prompt and completion tokens, and your usage is metered on these counts.
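
A small sketch of reading those two parts out of a decoded response. The sample below is the response shown above, copied as-is; `extract_reply` is an illustrative helper name, not part of the API.

```python
import json

# The example response from this page, as a JSON string.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A GPU is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42 }
}
"""


def extract_reply(response: dict) -> tuple[str, dict]:
    """Return the reply text of the first choice and the usage block."""
    text = response["choices"][0]["message"]["content"]
    return text, response["usage"]


text, usage = extract_reply(json.loads(SAMPLE_RESPONSE))
print(text)
print(usage["prompt_tokens"], "+", usage["completion_tokens"], "=", usage["total_tokens"])
```

In the sample, the prompt and completion counts (24 and 18) add up to the reported total of 42.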

Streaming

Set stream: true to receive tokens as they're produced, which makes interfaces feel responsive. With the OpenAI SDK:

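from openai import OpenAI

# Assumption: the SDK base URL is the curl endpoint without /chat/completions.
client = OpenAI(base_url="https://paraloncloud.com/v1", api_key="prlc_your_key_here")

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)

# Print each text fragment as it arrives; some chunks carry no text.
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)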
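
If you also need the complete reply once streaming ends (for logging or for appending to the conversation), the fragments can be collected as they arrive. The sketch below assumes chunks have the shape used in the loop above, with the text in `choices[0].delta.content`; `collect_text` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_text(chunks) -> str:
    """Join the delta.content fragments of a stream into the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # assumption: some chunks may carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # skip None and empty fragments
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Hypothetical stand-in for an SDK chunk, for offline use only."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


full_reply = collect_text([fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")])
```

With a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)` to `collect_text` in place of the fake chunks.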

System prompts

Use a system message to set the assistant's behavior or persona before the conversation starts:

{"role": "system", "content": "You are a terse assistant. Answer in one sentence."}
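
In code that keeps a running conversation, one way to apply this is to make sure the message list starts with exactly one system message. The sketch below is one possible approach; `with_system_prompt` is a hypothetical helper, and replacing any earlier system message is a design choice rather than something the API requires.

```python
def with_system_prompt(messages, system_prompt):
    """Return a new message list with a single system message placed first."""
    # Drop any existing system messages so the new prompt is the only one.
    rest = [message for message in messages if message["role"] != "system"]
    return [{"role": "system", "content": system_prompt}] + rest


history = [
    {"role": "user", "content": "Explain what a GPU is."},
    {"role": "assistant", "content": "A GPU is ..."},
    {"role": "user", "content": "And a TPU?"},
]
messages = with_system_prompt(
    history, "You are a terse assistant. Answer in one sentence."
)
```

The original `history` list is left unmodified, so the same conversation can be replayed with a different system prompt.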

Want to experiment first? The Playground lets you tweak the system prompt and messages live, then copy the settings into your code.
