Chat Completions
OpenAI-Compatible Endpoints
Chat Completions
Create a chat completion using OpenAI-compatible format
POST
Chat Completions
Overview
The/v1/chat/completions endpoint provides full OpenAI Chat Completions API compatibility. It accepts chat-formatted messages and maps them internally to the Responses API format while preserving streaming behavior and tool calling capabilities.
Authentication
Bearer token for API authentication. Format:
Bearer YOUR_API_KEYRequest Body
ID of the model to use. Must be a valid model slug from the
/v1/models endpoint.Example: "gpt-4.1", "gpt-5.2"Array of message objects representing the conversation history. Must contain at least one message.Each message object has:
role(string, required): One of"system","developer","user","assistant", or"tool"content(string | array): Message content. Forsystem/developerroles, must be text-only.tool_calls(array, optional): For assistant messages, array of tool call objectstool_call_id(string, required for tool role): ID of the tool call this message responds to
Array of tool definitions available to the model.Each tool object:
type(string):"function"or"web_search"function(object): For function tools, containsname,description, andparameters
function: Custom function callsweb_searchorweb_search_preview: Web search capability
file_search,code_interpreter,computer_use,computer_use_preview,image_generation
Controls which tool the model should use.Options:
"none": Model will not call tools"auto": Model decides whether to call tools"required": Model must call at least one tool- Object with
{"type": "function", "function": {"name": "tool_name"}}: Force specific tool
Whether to enable parallel tool calling. When true, the model can call multiple tools simultaneously.
Whether to stream the response as server-sent events.
true: Returnstext/event-streamwithchat.completion.chunkobjectsfalse: Returns a singlechat.completionobject
Options for streaming responses.Properties:
include_usage(boolean): Include token usage in final chunkinclude_obfuscation(boolean): Include obfuscation data in stream
Sampling temperature between 0 and 2. Higher values make output more random.
Nucleus sampling parameter. Alternative to temperature.
Maximum number of tokens to generate. Alias for
max_completion_tokens.Maximum number of tokens in the completion.
Format for the model’s output.Options:
{"type": "text"}: Plain text (default){"type": "json_object"}: Valid JSON object{"type": "json_schema", "json_schema": {...}}: JSON matching provided schema
json_schema type:json_schema.name(string): Schema name, 1-64 chars, alphanumeric/underscore/hyphenjson_schema.schema(object): JSON Schema definitionjson_schema.strict(boolean): Enable strict schema adherence
Stop sequence(s). Generation stops when these tokens are encountered.
Penalty for token presence. Range: -2.0 to 2.0.
Penalty for token frequency. Range: -2.0 to 2.0.
Whether to return log probabilities of output tokens.
Number of most likely tokens to return at each position (requires
logprobs: true).Random seed for deterministic sampling.
Number of completions to generate. Must be 1 (only value supported).
Response (Non-Streaming)
Whenstream is false or omitted, returns a chat.completion object:
Unique identifier for the completion.
Always
"chat.completion".Unix timestamp of creation.
Model used for completion.
Array of completion choices (always contains one choice).Each choice object:
index(integer): Choice index (always 0)message(object): The assistant’s messagerole(string): Always"assistant"content(string | null): Text content of the messagerefusal(string | null): Refusal message if model declinedtool_calls(array | null): Tool calls made by the model
finish_reason(string): Why generation stopped"stop": Natural completion"length": Max tokens reached"tool_calls": Model called tools"content_filter": Content filtered
Token usage information.Properties:
prompt_tokens(integer): Tokens in the promptcompletion_tokens(integer): Tokens in the completiontotal_tokens(integer): Total tokens usedprompt_tokens_details(object | null):cached_tokens(integer): Cached prompt tokens
completion_tokens_details(object | null):reasoning_tokens(integer): Tokens used for reasoning
Response (Streaming)
Whenstream is true, returns text/event-stream with chat.completion.chunk objects:
Unique identifier for the chunk stream.
Always
"chat.completion.chunk".Unix timestamp of creation.
Model being used.
Array of delta choices.Each choice contains:
index(integer): Always 0delta(object): Incremental contentrole(string | null): Role (only in first chunk)content(string | null): Content deltarefusal(string | null): Refusal deltatool_calls(array | null): Tool call deltas
finish_reason(string | null): Reason when complete
Token usage (only in final chunk when
stream_options.include_usage is true).Examples
Basic Chat Completion
Streaming Response
Streaming with Usage
Tool Calling
Web Search Tool
JSON Schema Response Format
Multi-turn Conversation
Content Type Restrictions
System and Developer Messages
- Must contain text-only content
- Cannot include images, files, or other media types
- Violations return
400withinvalid_request_error
User Messages
Supported content types:- Text: String or
{"type": "text", "text": "..."} - Images:
{"type": "image_url", "image_url": {"url": "..."}}- Data URLs and HTTP(S) URLs supported
- Images over 8MB are automatically dropped
- Files:
{"type": "file", "file": {...}}file_idis not supported and will return error
- Audio input:
input_audiotype returns400error
Assistant Messages
- Can include
content(text) and/ortool_calls - Tool calls must have valid
idandfunctionwithname
Tool Messages
- Must include
tool_call_idmatching a previous assistant tool call - Content becomes the tool output
Error Handling
All errors return OpenAI-compatible error envelopes:invalid_request_error: Invalid request parametersmodel_not_allowed: API key lacks access to requested modelno_accounts: No upstream accounts availableupstream_error: Upstream service error
data: [DONE].
Model Restrictions
If your API key hasallowed_models configured, only those models can be used. Requests for other models return:
/v1/models.