# Transcription Options

## Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | file | Yes | Audio or video file to transcribe |
| `model_id` | string | Yes | `scribe_v2` (or legacy `scribe_v1`) for batch transcription |
| `language_code` | string | No | Language hint (ISO 639-1 or ISO 639-3, e.g., `en` or `eng`) |
| `timestamps_granularity` | string | No | `none`, `word`, or `character` (default: `word`) |
| `diarize` | boolean | No | Enable speaker diarization (up to 32 speakers for batch) |
| `num_speakers` | integer | No | Maximum speakers to detect (up to 32 for batch) |
| `diarization_threshold` | number | No | Tune diarization sensitivity when `diarize=true` |
| `keyterms` | array | No | Terms to bias transcription (up to 100) |
| `tag_audio_events` | boolean | No | Detect non-speech sounds (laughter, applause) |
| `entity_detection` | string or array | No | Detect entities (e.g., `pii`, `phi`, `pci`, `offensive_language`) |
| `use_multi_channel` | boolean | No | Split multichannel audio into separate transcripts |
| `cloud_storage_url` | string | No | HTTPS URL to transcribe instead of uploading a file |
| `webhook` | boolean | No | Process asynchronously and send the result to a webhook |
| `webhook_metadata` | string or object | No | Custom metadata included in webhook responses |

## Python Example

```python
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

with open("audio.mp3", "rb") as audio_file:
    result = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v2",
        language_code="eng",
        timestamps_granularity="word",
        diarize=True,
        keyterms=["ElevenLabs", "Scribe"]
    )
```

## JavaScript Example

```javascript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream } from "fs";

const client = new ElevenLabsClient();

const result = await client.speechToText.convert({
  file: createReadStream("audio.mp3"),
  modelId: "scribe_v2",
  languageCode: "eng",
  timestampsGranularity: "word",
  diarize: true,
  keyterms: ["ElevenLabs", "Scribe"],
});
```

## cURL Example

```bash
curl -X POST "https://api.elevenlabs.io/v1/speech-to-text" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model_id=scribe_v2" \
  -F "language_code=eng" \
  -F "timestamps_granularity=word" \
  -F "diarize=true"
```

## Response Structure

```json
{
  "text": "The complete transcribed text from the audio file.",
  "language_code": "eng",
  "language_probability": 0.98,
  "words": [
    {
      "text": "The",
      "start": 0.0,
      "end": 0.15,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 0.15,
      "end": 0.16,
      "type": "spacing",
      "speaker_id": "speaker_0"
    }
  ]
}
```

## Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `text` | string | Full transcription text |
| `language_code` | string | Detected language (ISO 639-1 or ISO 639-3) |
| `language_probability` | float | Confidence in the language detection (0-1) |
| `words` | array | Word-level timestamps (if requested) |
| `words[].text` | string | The transcribed word or spacing |
| `words[].start` | float | Start time in seconds |
| `words[].end` | float | End time in seconds |
| `words[].type` | string | `word`, `spacing`, or `audio_event` |
| `words[].speaker_id` | string | Speaker identifier (if diarization is enabled) |

## Supported Languages (90+)

Common languages (ISO 639-3 codes):

| Code | Language | Code | Language |
|------|----------|------|----------|
| `eng` | English | `jpn` | Japanese |
| `spa` | Spanish | `kor` | Korean |
| `fra` | French | `zho` | Mandarin |
| `deu` | German | `ara` | Arabic |
| `ita` | Italian | `hin` | Hindi |
| `por` | Portuguese | `tur` | Turkish |
| `nld` | Dutch | `swe` | Swedish |
| `pol` | Polish | `dan` | Danish |
| `rus` | Russian | `fin` | Finnish |

Other supported languages: Afrikaans, Amharic, Armenian, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Cantonese, Catalan, Cebuano, Croatian, Czech, Estonian, Filipino, Georgian, Greek, Gujarati, Hausa, Hebrew, Hungarian, Icelandic, Indonesian, Irish, Javanese, Kannada, Kazakh, Khmer, Kyrgyz, Lao, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Māori, Marathi, Mongolian, Nepali, Norwegian, Odia, Pashto, Persian, Punjabi, Romanian, Serbian, Shona, Sindhi, Slovak, Slovenian, Somali, Swahili, Tamil, Tajik, Telugu, Thai, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Wolof, Xhosa, Yoruba, Zulu.

## Format Requirements

**Audio:** MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus

**Video:** MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP

**Limits:**
- Maximum file size: 3 GB
- Maximum duration: 10 hours

## Use Cases

### Subtitle Generation with Speakers

```python
# Reuses `client` from the Python example above
with open("audio.mp3", "rb") as audio_file:
    result = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v2",
        timestamps_granularity="word",
        diarize=True
    )

# Print each word with its speaker label and start time
# (the raw material for building subtitle cues)
for word in result.words:
    if word.type == "word":
        print(f"[{word.speaker_id}] {word.text} ({word.start:.2f}s)")
```

### Meeting Transcription with Custom Terms

```python
with open("meeting.mp3", "rb") as f:
    result = client.speech_to_text.convert(
        file=f,
        model_id="scribe_v2",
        diarize=True,
        keyterms=["Q4 forecast", "revenue target", "ACME Corp"]
    )

# Group consecutive words by speaker
current_speaker = None
for word in result.words:
    if word.type == "word":
        if word.speaker_id != current_speaker:
            # Speaker changed: start a new labeled line
            current_speaker = word.speaker_id
            print(f"\n[{current_speaker}]:", end=" ")
        # Spacing entries are skipped above, so add the separator here
        print(word.text, end=" ")
print()
```
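
The subtitle use case above prints one word at a time. To produce actual SRT cues, the words have to be grouped and their timestamps formatted. The following is a minimal sketch of one way to do that, not part of the SDK. It assumes word entries shaped like the `words` array in the Response Structure section (plain dicts with `text`, `start`, `end`, `type`, and `speaker_id`). The helper names `format_srt_time` and `words_to_srt` and the `max_words` cue limit are hypothetical, chosen for illustration.

```python
from typing import Iterable


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Iterable[dict], max_words: int = 8) -> str:
    """Group word entries into SRT cues.

    A new cue starts when the speaker changes or when the current cue
    already holds `max_words` words (an arbitrary illustrative limit).
    """
    cues = []
    for w in words:
        if w.get("type") != "word":
            continue  # skip spacing and audio_event entries
        speaker = w.get("speaker_id")
        last = cues[-1] if cues else None
        if last and last["speaker"] == speaker and len(last["texts"]) < max_words:
            # Same speaker and room left: extend the current cue
            last["texts"].append(w["text"])
            last["end"] = w["end"]
        else:
            cues.append({
                "speaker": speaker,
                "start": w["start"],
                "end": w["end"],
                "texts": [w["text"]],
            })

    blocks = []
    for index, cue in enumerate(cues, 1):
        label = f"[{cue['speaker']}] " if cue["speaker"] else ""
        timing = f"{format_srt_time(cue['start'])} --> {format_srt_time(cue['end'])}"
        # Joining with spaces assumes a space-delimited language
        blocks.append(f"{index}\n{timing}\n{label}{' '.join(cue['texts'])}\n")
    return "\n".join(blocks)


# Sample data in the shape of the documented `words` array
sample_words = [
    {"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.15, "end": 0.16, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "plan", "start": 0.16, "end": 0.5, "type": "word", "speaker_id": "speaker_0"},
    {"text": "Agreed", "start": 1.2, "end": 1.7, "type": "word", "speaker_id": "speaker_1"},
]
srt_text = words_to_srt(sample_words)
print(srt_text)
```

With SDK results, the word objects would first need converting to dicts, or the `w["text"]` lookups could be swapped for attribute access such as `w.text`. Breaking cues on a fixed word count is a simplification; a production subtitle generator would more likely break on pauses, punctuation, or line length.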