Audio Transcriptions
Transcribe audio to text using OpenAI-compatible format
POST /v1/audio/transcriptions
Overview
The /v1/audio/transcriptions endpoint provides OpenAI-compatible audio transcription. It accepts multipart audio file uploads and returns transcribed text.
This endpoint:
- Accepts various audio formats (WAV, MP3, M4A, etc.)
- Enforces strict model compatibility (gpt-4o-transcribe only)
- Supports optional transcription prompts for context
- Applies API key authentication and rate limiting
Authentication
Bearer token for API authentication. Format:
Bearer YOUR_API_KEY
Request
This endpoint requires multipart/form-data encoding.
file (required): The audio file to transcribe. Supported formats:
- WAV
- MP3
- M4A
- FLAC
- OGG
- WebM
model (required): Model to use for transcription. Must be exactly "gpt-4o-transcribe". Any other value returns a 400 Bad Request error.
prompt (optional): Text to guide the transcription style or context. Use this to:
- Provide context about the audio content
- Specify spelling of uncommon words or names
- Guide transcription style
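The two hard rules above (file is required, model must match exactly) can be checked before a request is sent. The following is a minimal client-side sketch, not the server's validation code; the function name `check_transcription_fields` is hypothetical, and `None` is used to stand in for "field not sent".

```python
from typing import List, Optional

REQUIRED_MODEL = "gpt-4o-transcribe"


def check_transcription_fields(file: Optional[bytes], model: Optional[str]) -> List[str]:
    """Return the problems the documented rules say the server would reject.

    Hypothetical pre-flight check; the server remains the authority.
    """
    problems: List[str] = []
    # file is required; None stands in for "field not sent" in this sketch.
    if file is None:
        problems.append("file is required (400 Bad Request)")
    # model must match exactly: no aliases, no case folding, no extra whitespace.
    if model != REQUIRED_MODEL:
        problems.append(f'model must be exactly "{REQUIRED_MODEL}" (400 Bad Request)')
    return problems
```

Because the match is exact, values such as "GPT-4o-Transcribe" or "whisper-1" are flagged even though they look like plausible transcription models.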
Response
Returns a JSON object with the transcribed text:
text (string): The transcribed text from the audio file.
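Since the success body carries a single `text` field, reading it is a one-liner. This sketch (the helper name `extract_text` is hypothetical) adds a guard so that a body without a string `text` field fails loudly rather than producing `None`.

```python
import json


def extract_text(response_body: bytes) -> str:
    """Pull the transcript out of a successful /v1/audio/transcriptions body."""
    payload = json.loads(response_body)
    # The documented success shape is {"text": "..."}.
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        raise ValueError("response has no string 'text' field")
    return text
```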
Examples
Basic Transcription
Response Example
With Transcription Prompt
Different Audio Formats
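A client can reject unsupported files before uploading by checking the extension against the six documented formats. This is a conservative sketch: the overview says "etc.", so the server may accept more than these six. The MIME types in the table are assumptions about conventional values, and the service may detect the format from the file contents instead.

```python
from pathlib import Path

# Assumed conventional MIME types for the six formats listed above.
AUDIO_CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def content_type_for(path: str) -> str:
    """Map a file name to an assumed audio MIME type, or raise if unlisted."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".WAV" == ".wav"
    try:
        return AUDIO_CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}") from None
```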
WAV file:
Using JavaScript
Using Python
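The following is a standard-library-only sketch of a Python client. It assumes that the bearer token goes in the usual `Authorization` header and that the file part can be sent as `application/octet-stream`. `BASE_URL` is a hypothetical placeholder for your deployment's host. Request construction is kept separate from sending, so the multipart body can be inspected without network access.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

# Hypothetical base URL; substitute the host of your deployment.
BASE_URL = "https://api.example.com"


def build_transcription_request(path: str, api_key: str,
                                prompt: Optional[str] = None,
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    fields = {"model": "gpt-4o-transcribe"}  # the only accepted value
    if prompt is not None:
        fields["prompt"] = prompt
    chunks = []
    for name, value in fields.items():
        chunks.append((f"--{boundary}\r\n"
                       f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                       f"{value}\r\n").encode("utf-8"))
    audio = Path(path)
    # Assumption: a generic content type is acceptable for the file part.
    chunks.append((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; filename="{audio.name}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"))
    chunks.append(audio.read_bytes() + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(chunks),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str, prompt: Optional[str] = None,
               base_url: str = BASE_URL) -> str:
    """Send the request and return the "text" field of the JSON response."""
    request = build_transcription_request(path, api_key, prompt, base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx error envelopes.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["text"]
```

A typical call would be `transcribe("meeting.wav", api_key, prompt="Speakers: Ada, Grace")`. Here the file name and prompt are illustrative.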
Error Handling
All errors return OpenAI-compatible error envelopes.
Common Errors
- Invalid Model: 400 Bad Request
- Missing File: 400 Bad Request
- Model Access Denied: 403 Forbidden
- Rate Limit Exceeded: 429 Too Many Requests
- Upstream Error: 502 Bad Gateway
- No Accounts Available: 503 Service Unavailable
Model Restrictions
Fixed Model Requirement
Unlike the Chat Completions and Responses endpoints, transcription supports only a single fixed model: gpt-4o-transcribe.
This is enforced for OpenAI API compatibility. If you need to use a different transcription model, contact your administrator.
API Key Restrictions
If your API key has allowed_models configured, it must include gpt-4o-transcribe to use this endpoint.
API Key Configuration:
A key whose allowed_models list omits gpt-4o-transcribe receives 403 Forbidden and a model_not_allowed error.
Valid Configuration:
Include gpt-4o-transcribe in allowed_models. Note: gpt-4o-transcribe may not appear in the /v1/models list, but it is still accessible if allowed.
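The rule above can be expressed as a small predicate. It deliberately does not consult the /v1/models listing, because gpt-4o-transcribe may be absent from that list even when it is allowed. The function name is hypothetical. Treating `None` as "not configured" follows the wording above. Treating an empty list as "configured, allows nothing" is our assumption; your deployment may handle it differently.

```python
from typing import Optional, Sequence


def key_can_transcribe(allowed_models: Optional[Sequence[str]]) -> bool:
    """True if a key with this allowed_models setting may call the endpoint."""
    # No allowed_models configured: the key is not model-restricted.
    if allowed_models is None:
        return True
    # Configured: gpt-4o-transcribe must be listed explicitly.
    # Assumption: an empty list means "configured, nothing allowed".
    return "gpt-4o-transcribe" in allowed_models
```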
Rate Limiting
Transcription requests count toward your API key’s rate limits using the effective model gpt-4o-transcribe.
Rate limit headers:
Best Practices
Audio Quality
- Clear audio: Higher quality audio produces better transcriptions
- Minimal background noise: Reduce noise for improved accuracy
- Appropriate volume: Ensure audio is not too quiet or distorted
Prompt Usage
- Provide context: Help the model understand domain-specific terminology
- Specify names: Include proper nouns that may be uncommon
- Set style: Indicate formal vs. casual transcription style
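The three prompt uses above can be combined into a single `prompt` string. This helper and its "Names and terms:" phrasing are illustrative only; the endpoint accepts free text, and nothing here reflects a required format. It returns `None` when there is nothing to say, so the optional field can be omitted entirely.

```python
from typing import Iterable, Optional


def build_prompt(context: Optional[str] = None,
                 names: Iterable[str] = (),
                 style: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: join context, proper nouns, and style hints."""
    parts = []
    if context:
        parts.append(context.strip())  # domain context for the audio
    cleaned = [n.strip() for n in names if n.strip()]
    if cleaned:
        # Spelling hints for uncommon names and terminology.
        parts.append("Names and terms: " + ", ".join(cleaned) + ".")
    if style:
        parts.append(style.strip())  # e.g. formal vs. casual punctuation
    return " ".join(parts) or None
```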
Error Handling
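For the transient statuses listed under Common Errors (429, 502, 503), a client can retry with exponential backoff. This is a generic sketch. The attempt count, base delay, and full-jitter strategy are assumptions, not limits published for this endpoint. The sketch does not read any rate limit or Retry-After header, because this page does not name them. `send` is any zero-argument callable that performs the request and raises `urllib.error.HTTPError` on failure, as `urllib.request.urlopen` does.

```python
import random
import time
import urllib.error
from typing import Callable

RETRYABLE_STATUSES = {429, 502, 503}


def with_retries(send: Callable[[], bytes], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call send(), retrying documented transient failures with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            # 4xx client errors (bad model, missing file, 403) are final.
            if exc.code not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real delays.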
Comparison with OpenAI
This endpoint follows the OpenAI /v1/audio/transcriptions format with these specifics:
Similarities:
- Multipart form data format
- Required file and model parameters
- Optional prompt parameter
- JSON response with a text field
- OpenAI-compatible error envelopes
Differences:
- Model restriction: Only gpt-4o-transcribe is supported (OpenAI also accepts other models, such as whisper-1)
- Account routing: Uses model-agnostic account selection for reliability
- Rate limiting: Counts against the request limit (not a token limit)
- No streaming: Transcription is always non-streaming
Related Endpoints
- Backend Transcription: /backend-api/transcribe (internal format, no model parameter required)
- Chat Completions: /v1/chat/completions (text generation with chat format)
- Responses: /v1/responses (text generation with responses format)