Chat completions
Send chat messages to an open LLM and get a completion back — the OpenAI-compatible /v1/chat/completions endpoint.
The /v1/chat/completions endpoint takes a list of messages and returns the model's reply. It follows the OpenAI chat format, so existing OpenAI client code and SDKs work once they are pointed at this base URL and API key.
Request
curl https://paraloncloud.com/v1/chat/completions \
  -H "Authorization: Bearer prlc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain what a GPU is in one sentence."}
    ]
  }'
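The same request can be assembled with Python's standard library. This is a minimal sketch that reuses the URL and key placeholder from the curl example; `build_request` is a hypothetical helper name, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_URL = "https://paraloncloud.com/v1/chat/completions"  # URL from the curl example


def build_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "prlc_your_key_here",
    "qwen3-8b",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a GPU is in one sentence."},
    ],
)
# To send it:
#   with urllib.request.urlopen(req) as resp:
#       data = json.load(resp)
```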
Common parameters
| Field | Type | Description |
|---|---|---|
| model | string | The model id to use (see Models). |
| messages | array | The conversation so far. Each message has a role (system, user, or assistant) and content. |
| temperature | number | Sampling randomness. Lower is more deterministic, higher is more creative. |
| max_tokens | integer | Maximum number of tokens to generate in the reply. |
| stream | boolean | When true, tokens are streamed back as they're generated. |
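To illustrate how the fields in the table combine, here is a minimal sketch of a payload builder. `build_payload` is a hypothetical helper, not part of any SDK. It assumes optional fields can simply be left out when unset, as in the OpenAI chat format, and that the three roles listed above are the only ones in use.

```python
VALID_ROLES = {"system", "user", "assistant"}  # roles listed in the table above


def build_payload(model, messages, temperature=None, max_tokens=None, stream=False):
    """Return a request body dict, leaving out optional fields that were not set."""
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unknown role: {message.get('role')!r}")
        if "content" not in message:
            raise ValueError("each message needs a 'content' field")
    payload = {"model": model, "messages": list(messages)}
    if temperature is not None:
        payload["temperature"] = float(temperature)
    if max_tokens is not None:
        payload["max_tokens"] = int(max_tokens)
    if stream:
        payload["stream"] = True
    return payload


payload = build_payload(
    "qwen3-8b",
    [{"role": "user", "content": "Explain what a GPU is in one sentence."}],
    temperature=0.2,
    max_tokens=64,
)
```

The resulting dict can be serialized with `json.dumps` and sent as the request body.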
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A GPU is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}
The reply text is in choices[0].message.content. The usage block reports how many tokens the request consumed (prompt, completion, and total); your usage is metered on these counts.
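Reading those fields from a parsed response might look like the following sketch. `extract_reply` is a hypothetical helper; the sample dict mirrors the response shown above, with the elided values left as placeholders.

```python
def extract_reply(response):
    """Return (reply_text, total_tokens) from a parsed chat completion response."""
    reply = response["choices"][0]["message"]["content"]
    total = response["usage"]["total_tokens"]
    return reply, total


sample = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "qwen3-8b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A GPU is ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42},
}

reply, total = extract_reply(sample)
print(reply)  # A GPU is ...
print(total)  # 42
```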
Streaming
Set stream: true to receive tokens as they're produced, which makes interfaces feel responsive. With the OpenAI SDK:
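from openai import OpenAI
# Base URL and key placeholder follow the curl example in the Request section.
client = OpenAI(base_url="https://paraloncloud.com/v1", api_key="prlc_your_key_here")
stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
# Print each text delta as it arrives; content may be None, so fall back to "".
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")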
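If the full reply is needed after streaming finishes, the deltas can be accumulated as they arrive. A minimal sketch: `collect_stream` is a hypothetical helper that accepts any iterable of chunks shaped like the SDK's (`chunk.choices[0].delta.content`). The fake chunks below stand in for a live stream so the example runs without a network call, and the guard for chunks with an empty `choices` list is a defensive assumption.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # assumption: some chunks may carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None, so skip empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as an SDK chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [fake_chunk("Silicon "), fake_chunk("rivers"), fake_chunk(None)]
full_text = collect_stream(chunks)
print(full_text)  # Silicon rivers
```

With a real stream, the object returned by `client.chat.completions.create` with `stream=True` would be passed in place of `chunks`.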
System prompts
Use a system message to set the assistant's behavior or persona before the conversation starts:
{"role": "system", "content": "You are a terse assistant. Answer in one sentence."}
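One way to apply a system prompt to an existing conversation is to place it at the front of the message list. A small sketch, with `with_system_prompt` as a hypothetical helper; replacing an existing leading system message rather than stacking a second one is a design choice made here, not something the API is known to require.

```python
def with_system_prompt(system_prompt, messages):
    """Return a new message list that starts with the given system message."""
    rest = list(messages)
    if rest and rest[0].get("role") == "system":
        rest = rest[1:]  # replace an existing leading system message
    return [{"role": "system", "content": system_prompt}] + rest


conversation = [{"role": "user", "content": "Explain what a GPU is."}]
messages = with_system_prompt(
    "You are a terse assistant. Answer in one sentence.", conversation
)
```

The original `conversation` list is left unchanged, so the same history can be replayed under different system prompts.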
Want to experiment first? The Playground lets you tweak the system prompt and messages live, then copy the settings into your code.