Chat Completions

Overview

The /v1/chat/completions endpoint provides full OpenAI Chat Completions API compatibility. It accepts chat-formatted messages and maps them internally to the Responses API format while preserving streaming behavior and tool calling capabilities.

Authentication

Authorization

string

required

Bearer token for API authentication. Format: Bearer YOUR_API_KEY
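
As a minimal sketch, the headers for an authenticated JSON request could be assembled as follows. The helper name build_headers is hypothetical and not part of the API; only the Authorization format comes from the description above.

```python
def build_headers(api_key: str) -> dict:
    """Return HTTP headers for a JSON request authenticated with a bearer token."""
    # Reject empty keys and keys with stray whitespace, a common copy-paste mistake.
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key must be non-empty with no surrounding whitespace")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_API_KEY")
```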

Request Body

model

string

required

ID of the model to use. Must be a valid model slug from the /v1/models endpoint. Examples: "gpt-4.1", "gpt-5.2"

messages

array

required

Array of message objects representing the conversation history. Must contain at least one message. Each message object has:

role (string, required): One of "system", "developer", "user", "assistant", or "tool"
content (string | array): Message content. For system/developer roles, must be text-only.
tool_calls (array, optional): For assistant messages, array of tool call objects
tool_call_id (string, required for tool role): ID of the tool call this message responds to
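
The field rules above can be checked on the client before a request is sent. The following is a sketch under stated assumptions: validate_messages is a hypothetical helper, and it assumes that array content made up solely of text parts counts as text-only for system and developer messages. The server remains the authority on what is accepted.

```python
VALID_ROLES = {"system", "developer", "user", "assistant", "tool"}


def validate_messages(messages: list) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not messages:
        problems.append("messages must contain at least one message")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}]: unknown role {role!r}")
            continue
        content = msg.get("content")
        if role in ("system", "developer") and isinstance(content, list):
            # Assumption: an array is acceptable only if every part is a text part.
            if any(part.get("type") != "text" for part in content):
                problems.append(f"messages[{i}]: {role} content must be text-only")
        if role == "tool" and not msg.get("tool_call_id"):
            problems.append(f"messages[{i}]: tool messages require tool_call_id")
    return problems
```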

tools

array

Array of tool definitions available to the model. Each tool object:

type (string): "function", "web_search", or "web_search_preview"
function (object): For function tools, contains name, description, and parameters

Supported tool types:

function: Custom function calls
web_search or web_search_preview: Web search capability

Unsupported types (these return an error):

file_search, code_interpreter, computer_use, computer_use_preview, image_generation
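
A client can filter its tool list against the supported types before sending a request. This is only a sketch: unsupported_tools is a hypothetical helper, and the set below simply mirrors the supported types listed above.

```python
SUPPORTED_TOOL_TYPES = {"function", "web_search", "web_search_preview"}


def unsupported_tools(tools: list) -> list:
    """Return the type of every tool that is not in the documented supported set."""
    return [
        tool.get("type", "<missing>")
        for tool in tools
        if tool.get("type") not in SUPPORTED_TOOL_TYPES
    ]
```

An empty result means every tool uses a documented type; anything else is worth removing or replacing before the request is made.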

tool_choice

string | object

Controls which tool the model should use. Options:

"none": Model will not call tools
"auto": Model decides whether to call tools
"required": Model must call at least one tool
Object with {"type": "function", "function": {"name": "tool_name"}}: Force specific tool
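
The four forms above can be produced from a single string argument. In this sketch the helper make_tool_choice is hypothetical; it treats the three keywords as modes and any other string as a function name, so a function that is literally named "none", "auto", or "required" cannot be forced through it.

```python
from typing import Union


def make_tool_choice(choice: str) -> Union[str, dict]:
    """Map a mode keyword or a function name to a tool_choice value."""
    if choice in ("none", "auto", "required"):
        return choice
    # Any other string is treated as the name of the function to force.
    return {"type": "function", "function": {"name": choice}}
```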

parallel_tool_calls

boolean

Whether to enable parallel tool calling. When true, the model can request multiple tool calls in a single response.

stream

boolean

default: false

Whether to stream the response as server-sent events.

true: Returns text/event-stream with chat.completion.chunk objects
false: Returns a single chat.completion object

stream_options

object

Options for streaming responses. Properties:

include_usage (boolean): Include token usage in final chunk
include_obfuscation (boolean): Include obfuscation data in stream

temperature

number

Sampling temperature between 0 and 2. Higher values make output more random.

top_p

number

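Nucleus sampling parameter: the model samples only from the tokens that make up the top_p cumulative probability mass. Use it as an alternative to temperature.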

max_tokens

integer

Maximum number of tokens to generate. Alias for max_completion_tokens.

max_completion_tokens

integer

Maximum number of tokens in the completion.

response_format

object

Format for the model’s output. Options:

{"type": "text"}: Plain text (default)
{"type": "json_object"}: Valid JSON object
{"type": "json_schema", "json_schema": {...}}: JSON matching provided schema

For json_schema type:

json_schema.name (string): Schema name, 1-64 chars, alphanumeric/underscore/hyphen
json_schema.schema (object): JSON Schema definition
json_schema.strict (boolean): Enable strict schema adherence
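
The naming rule for json_schema.name is easy to get wrong, so it can help to check it locally. The sketch below assumes "alphanumeric" means ASCII letters and digits; json_schema_format is a hypothetical helper, not part of the API.

```python
import re

# Assumption: ASCII letters, digits, underscore, and hyphen, 1 to 64 characters.
_SCHEMA_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build a response_format value of type json_schema after checking the name."""
    if not _SCHEMA_NAME.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }
```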

stop

string | array

Stop sequence(s). Generation stops when any of these sequences is encountered.

presence_penalty

number

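Penalty applied to tokens that have already appeared in the text so far, regardless of how often. Positive values encourage the model to introduce new topics. Range: -2.0 to 2.0.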

frequency_penalty

number

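Penalty applied to tokens in proportion to how often they have already appeared in the text so far. Positive values reduce verbatim repetition. Range: -2.0 to 2.0.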

logprobs

boolean

Whether to return log probabilities of output tokens.

top_logprobs

integer

Number of most likely tokens to return at each position (requires logprobs: true).

seed

integer

Random seed for deterministic sampling.

n

integer

default: 1

Number of completions to generate. Must be 1, the only supported value.

Response (Non-Streaming)

When stream is false or omitted, returns a chat.completion object:

id

string

Unique identifier for the completion.

object

string

Always "chat.completion".

created

integer

Unix timestamp of creation.

model

string

Model used for completion.

choices

array

Array of completion choices (always contains one choice). Each choice object:

index (integer): Choice index (always 0)
message (object): The assistant’s message
- role (string): Always "assistant"
- content (string | null): Text content of the message
- refusal (string | null): Refusal message if model declined
- tool_calls (array | null): Tool calls made by the model
finish_reason (string): Why generation stopped
- "stop": Natural completion
- "length": Max tokens reached
- "tool_calls": Model called tools
- "content_filter": Content filtered
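
Because the response always carries exactly one choice, reading it is mostly a matter of handling the nullable fields. The sketch below uses a hypothetical helper, read_choice, and assumes that each tool call carries its arguments as a JSON-encoded string in function.arguments, as in the OpenAI format; this page does not spell that out.

```python
import json


def read_choice(completion: dict) -> dict:
    """Extract text, refusal, and decoded tool calls from a chat.completion object."""
    choice = completion["choices"][0]
    message = choice["message"]
    calls = []
    for call in message.get("tool_calls") or []:  # tool_calls may be null
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # Assumption: arguments is a JSON string; treat empty as no arguments.
            "arguments": json.loads(fn.get("arguments") or "{}"),
        })
    return {
        "text": message.get("content"),
        "refusal": message.get("refusal"),
        "tool_calls": calls,
        "truncated": choice["finish_reason"] == "length",
    }
```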

usage

object

Token usage information. Properties:

prompt_tokens (integer): Tokens in the prompt
completion_tokens (integer): Tokens in the completion
total_tokens (integer): Total tokens used
prompt_tokens_details (object | null):
- cached_tokens (integer): Cached prompt tokens
completion_tokens_details (object | null):
- reasoning_tokens (integer): Tokens used for reasoning
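
Both detail objects can be null, so code that reads them should fall back to zero. The following sketch assumes, as in the OpenAI format, that cached_tokens is a subset of prompt_tokens and reasoning_tokens is a subset of completion_tokens; summarize_usage and its output keys are hypothetical.

```python
def summarize_usage(usage: dict) -> dict:
    """Split token counts into cached/uncached prompt and reasoning/visible output."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    cached = prompt_details.get("cached_tokens", 0)
    reasoning = completion_details.get("reasoning_tokens", 0)
    return {
        # Assumption: cached tokens are counted inside prompt_tokens.
        "uncached_prompt_tokens": usage["prompt_tokens"] - cached,
        # Assumption: reasoning tokens are counted inside completion_tokens.
        "visible_completion_tokens": usage["completion_tokens"] - reasoning,
        "total_tokens": usage["total_tokens"],
    }
```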

Response (Streaming)

When stream is true, returns text/event-stream with chat.completion.chunk objects:

id

string

Unique identifier for the chunk stream.

object

string

Always "chat.completion.chunk".

created

integer

Unix timestamp of creation.

model

string

Model being used.

choices

array

Array of delta choices. Each choice contains:

index (integer): Always 0
delta (object): Incremental content
- role (string | null): Role (only in first chunk)
- content (string | null): Content delta
- refusal (string | null): Refusal delta
- tool_calls (array | null): Tool call deltas
finish_reason (string | null): Reason when complete

usage

object | null

Token usage (only in final chunk when stream_options.include_usage is true).
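
To show how the chunks fit together, here is a sketch that reassembles the text from the lines of an event stream. The function collect_stream is hypothetical, and it assumes each event arrives as a single "data: ..." line and that the stream ends with data: [DONE]. It ignores tool call and refusal deltas for brevity.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate content deltas, the finish reason, and usage from SSE lines."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]  # present only when include_usage is true
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {"text": "".join(text_parts), "finish_reason": finish_reason, "usage": usage}
```

In practice the lines would come from an HTTP client that iterates over the response body; any iterable of strings works here.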

Examples

Basic Chat Completion

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Streaming Response

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ],
    "stream": true
  }'

Streaming with Usage

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Tool Calling

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Web Search Tool

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "What is the latest news about AI?"}
    ],
    "tools": [
      {"type": "web_search"}
    ]
  }'

JSON Schema Response Format

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "user", "content": "Generate a person profile"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person_profile",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "number"},
            "city": {"type": "string"}
          },
          "required": ["name", "age"]
        },
        "strict": true
      }
    }
  }'

Multi-turn Conversation

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
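      {"role": "system", "content": "You are a helpful math tutor."},
      {"role": "user", "content": "What is 25 * 4?"},
      {"role": "assistant", "content": "25 * 4 = 100."},
      {"role": "user", "content": "Now divide that result by 8."}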
    ]
  }'

Content Type Restrictions

System and Developer Messages

Must contain text-only content
Cannot include images, files, or other media types
Violations return 400 with invalid_request_error

User Messages

Supported content types:

Text: String or {"type": "text", "text": "..."}
Images: {"type": "image_url", "image_url": {"url": "..."}}
- Data URLs and HTTP(S) URLs are supported
- Images over 8 MB are automatically dropped
Files: {"type": "file", "file": {...}}
- file_id is not supported and returns an error

Unsupported:

Audio input: the input_audio type returns a 400 error
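
Since oversized images are dropped rather than rejected, a local size check makes the failure visible. This sketch builds a user message with text and inline images as data URLs. The helper names are hypothetical, and it assumes the 8 MB limit means 8 MiB measured on the raw image bytes; the page does not say how the size is measured.

```python
import base64

# Assumption: the limit is 8 MiB of raw (decoded) image data.
MAX_IMAGE_BYTES = 8 * 1024 * 1024


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Encode raw image bytes as an image_url content part using a data URL."""
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds 8 MB and would be dropped by the endpoint")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def user_message(text: str, *images: bytes) -> dict:
    """Build a user message with one text part followed by any image parts."""
    parts = [{"type": "text", "text": text}]
    parts.extend(image_part(img) for img in images)
    return {"role": "user", "content": parts}
```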

Assistant Messages

Can include content (text) and/or tool_calls
Each tool call must have a valid id and a function object with a name

Tool Messages

Must include tool_call_id matching a previous assistant tool call
Content becomes the tool output
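
The pairing between assistant tool calls and tool messages can be sketched as follows. The helper tool_result_messages is hypothetical; it takes the assistant message returned by the API and a mapping from tool call id to output, and it serializes non-string outputs as JSON, which is a design choice of this sketch rather than a documented requirement.

```python
import json


def tool_result_messages(assistant_message: dict, results: dict) -> list:
    """Return the assistant message followed by one tool message per tool call."""
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        call_id = call["id"]
        if call_id not in results:
            raise KeyError(f"no result supplied for tool call {call_id}")
        output = results[call_id]
        # Strings pass through unchanged; other values are JSON-encoded.
        content = output if isinstance(output, str) else json.dumps(output)
        messages.append({"role": "tool", "tool_call_id": call_id, "content": content})
    return messages
```

Appending the returned list to the earlier conversation history gives the messages array for the follow-up request.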

Error Handling

All errors return OpenAI-compatible error envelopes:

{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "error_code",
    "param": "field_name"
  }
}

Common error codes:

invalid_request_error: Invalid request parameters
model_not_allowed: API key lacks access to requested model
no_accounts: No upstream accounts available
upstream_error: Upstream service error

For streaming requests, errors are emitted as error chunks followed by data: [DONE].
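
A client can decode the envelope into an exception so that callers branch on code rather than on message text. This is a sketch: ApiError and parse_error are hypothetical names, and the fallbacks for missing fields are assumptions, since the page does not say which fields are always present.

```python
class ApiError(Exception):
    """An error decoded from the documented error envelope."""

    def __init__(self, status, message, type_, code=None, param=None):
        super().__init__(f"HTTP {status} {code or type_}: {message}")
        self.status = status
        self.type = type_
        self.code = code
        self.param = param


def parse_error(status: int, body: dict) -> ApiError:
    """Turn a decoded JSON error body into an ApiError, tolerating missing fields."""
    err = body.get("error") or {}
    return ApiError(
        status=status,
        message=err.get("message", "unknown error"),
        type_=err.get("type", "unknown"),
        code=err.get("code"),
        param=err.get("param"),
    )
```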

Model Restrictions

If your API key has allowed_models configured, only those models can be used. Requests for other models return:

{
  "error": {
    "message": "This API key does not have access to model 'gpt-5.2'",
    "type": "invalid_request_error",
    "code": "model_not_allowed"
  }
}

Check available models at /v1/models.
