# Transcription Options

## Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | file | Yes | Audio or video file to transcribe |
| `model_id` | string | Yes | `scribe_v2` (or legacy `scribe_v1`) for batch transcription |
| `language_code` | string | No | Language hint (ISO 639-1 or ISO 639-3, e.g., `en` or `eng`) |
| `timestamps_granularity` | string | No | `none`, `word`, or `character` (default: `word`) |
| `diarize` | boolean | No | Enable speaker diarization (up to 32 speakers for batch) |
| `num_speakers` | integer | No | Maximum speakers to detect (up to 32 for batch) |
| `diarization_threshold` | number | No | Tune diarization sensitivity when `diarize=true` |
| `keyterms` | array | No | Terms to bias transcription (up to 100) |
| `tag_audio_events` | boolean | No | Detect non-speech sounds (laughter, applause) |
| `entity_detection` | string or array | No | Detect entities (e.g., `pii`, `phi`, `pci`, `offensive_language`) |
| `use_multi_channel` | boolean | No | Split multichannel audio into separate transcripts |
| `cloud_storage_url` | string | No | HTTPS URL to transcribe instead of uploading a file |
| `webhook` | boolean | No | Process async and send result to webhook |
| `webhook_metadata` | string or object | No | Custom metadata included in webhook responses |

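
Several rows in the parameter table carry limits (allowed `timestamps_granularity` values, 32 speakers, 100 key terms). A minimal sketch of a client-side check that catches these before a request is sent might look like the following. The helper name `check_transcription_options` and its messages are hypothetical and not part of the ElevenLabs SDK; the API performs its own validation, which may differ.

```python
from typing import Any

# Limits copied from the parameter table; adjust if the API changes.
ALLOWED_GRANULARITIES = {"none", "word", "character"}
MAX_SPEAKERS = 32
MAX_KEYTERMS = 100


def check_transcription_options(options: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none were found."""
    problems = []

    if not options.get("model_id"):
        problems.append("model_id is required")

    granularity = options.get("timestamps_granularity", "word")
    if granularity not in ALLOWED_GRANULARITIES:
        problems.append(
            f"timestamps_granularity must be one of {sorted(ALLOWED_GRANULARITIES)}"
        )

    num_speakers = options.get("num_speakers")
    if num_speakers is not None and not 1 <= num_speakers <= MAX_SPEAKERS:
        problems.append(f"num_speakers must be between 1 and {MAX_SPEAKERS}")

    if len(options.get("keyterms", [])) > MAX_KEYTERMS:
        problems.append(f"keyterms accepts at most {MAX_KEYTERMS} entries")

    # The table describes diarization_threshold as applying when diarize=true.
    if "diarization_threshold" in options and not options.get("diarize"):
        problems.append("diarization_threshold only applies when diarize is true")

    return problems
```

An empty list means none of the checks fired; it does not guarantee the request will be accepted.
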
## Python Example

```python
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

with open("audio.mp3", "rb") as audio_file:
    result = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v2",
        language_code="eng",
        timestamps_granularity="word",
        diarize=True,
        keyterms=["ElevenLabs", "Scribe"]
    )
```

## JavaScript Example

```javascript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream } from "fs";

const client = new ElevenLabsClient();

const result = await client.speechToText.convert({
  file: createReadStream("audio.mp3"),
  modelId: "scribe_v2",
  languageCode: "eng",
  timestampsGranularity: "word",
  diarize: true,
  keyterms: ["ElevenLabs", "Scribe"],
});
```

## cURL Example

```bash
curl -X POST "https://api.elevenlabs.io/v1/speech-to-text" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model_id=scribe_v2" \
  -F "language_code=eng" \
  -F "timestamps_granularity=word" \
  -F "diarize=true"
```

## Response Structure

```json
{
  "text": "The complete transcribed text from the audio file.",
  "language_code": "eng",
  "language_probability": 0.98,
  "words": [
    {
      "text": "The",
      "start": 0.0,
      "end": 0.15,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 0.15,
      "end": 0.16,
      "type": "spacing",
      "speaker_id": "speaker_0"
    }
  ]
}
```

## Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `text` | string | Full transcription text |
| `language_code` | string | Detected language (ISO 639-1 or ISO 639-3) |
| `language_probability` | float | Confidence in detection (0-1) |
| `words` | array | Word-level timestamps (if requested) |
| `words[].text` | string | The transcribed word or spacing |
| `words[].start` | float | Start time in seconds |
| `words[].end` | float | End time in seconds |
| `words[].type` | string | `word`, `spacing`, or `audio_event` |
| `words[].speaker_id` | string | Speaker identifier (if diarization enabled) |

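
As an illustration of how the `words[]` fields combine, the sketch below sums `end - start` over entries of type `word` to estimate talk time per speaker. It works on the raw JSON response as a plain dictionary. The function name, the `"unknown"` fallback label, and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def speaking_time_by_speaker(response: dict) -> dict[str, float]:
    """Sum the duration of `word` entries per speaker_id, in seconds."""
    totals = defaultdict(float)
    for item in response.get("words", []):
        # Skip `spacing` and `audio_event` entries; only spoken words count.
        if item.get("type") != "word":
            continue
        # speaker_id is only present when diarization is enabled.
        speaker = item.get("speaker_id") or "unknown"
        totals[speaker] += item["end"] - item["start"]
    return {speaker: round(seconds, 3) for speaker, seconds in totals.items()}


# Illustrative data shaped like the response structure, not real API output.
sample = {
    "words": [
        {"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"},
        {"text": " ", "start": 0.15, "end": 0.16, "type": "spacing", "speaker_id": "speaker_0"},
        {"text": "complete", "start": 0.16, "end": 0.55, "type": "word", "speaker_id": "speaker_0"},
        {"text": "Thanks", "start": 1.0, "end": 1.4, "type": "word", "speaker_id": "speaker_1"},
    ]
}
print(speaking_time_by_speaker(sample))
```
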
## Supported Languages (90+)

Common languages (ISO 639-3 codes):

| Code | Language | Code | Language |
|------|----------|------|----------|
| `eng` | English | `jpn` | Japanese |
| `spa` | Spanish | `kor` | Korean |
| `fra` | French | `zho` | Mandarin |
| `deu` | German | `ara` | Arabic |
| `ita` | Italian | `hin` | Hindi |
| `por` | Portuguese | `tur` | Turkish |
| `nld` | Dutch | `swe` | Swedish |
| `pol` | Polish | `dan` | Danish |
| `rus` | Russian | `fin` | Finnish |

Other supported languages: Afrikaans, Amharic, Armenian, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Cantonese, Catalan, Cebuano, Croatian, Czech, Estonian, Filipino, Georgian, Greek, Gujarati, Hausa, Hebrew, Hungarian, Icelandic, Indonesian, Irish, Javanese, Kannada, Kazakh, Khmer, Kyrgyz, Lao, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Māori, Marathi, Mongolian, Nepali, Norwegian, Odia, Pashto, Persian, Punjabi, Romanian, Serbian, Shona, Sindhi, Slovak, Slovenian, Somali, Swahili, Tamil, Tajik, Telugu, Thai, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Wolof, Xhosa, Yoruba, Zulu.

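
Because `language_code` accepts both ISO 639-1 and ISO 639-3 codes while the sample response uses the three-letter form, it can help to normalize codes before comparing a language hint with the detected language. The sketch below covers only the eighteen languages in the table. The two-letter keys are the standard ISO 639-1 codes and are not taken from this document, and `to_iso_639_3` is a hypothetical helper name.

```python
# ISO 639-1 -> ISO 639-3 for the common languages in the table.
ISO_639_1_TO_3 = {
    "en": "eng", "es": "spa", "fr": "fra", "de": "deu", "it": "ita",
    "pt": "por", "nl": "nld", "pl": "pol", "ru": "rus", "ja": "jpn",
    "ko": "kor", "zh": "zho", "ar": "ara", "hi": "hin", "tr": "tur",
    "sv": "swe", "da": "dan", "fi": "fin",
}


def to_iso_639_3(code: str) -> str:
    """Map a known two-letter code to its three-letter form; pass others through."""
    normalized = code.strip().lower()
    return ISO_639_1_TO_3.get(normalized, normalized)


# Example: check whether a hint and a detected code refer to the same language.
hint, detected = "en", "eng"
print(to_iso_639_3(hint) == to_iso_639_3(detected))
```

Codes outside the mapping are returned unchanged (lowercased), so languages that are not in the table need their own entries.
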
## Format Requirements

**Audio:** MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus

**Video:** MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP

**Limits:**
- Maximum file size: 3GB
- Maximum duration: 10 hours

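
A quick local check against these requirements can save a failed upload. The following is a rough sketch under stated assumptions: the mapping from format names to file extensions (for example `.3gp` for 3GPP) is a guess, "3GB" is read as 3 GiB, and `check_media_file` is a hypothetical helper. Duration is not checked because that requires decoding the media.

```python
import os

# Assumed extensions for the formats listed above.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".aac", ".aiff", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm", ".mpeg", ".3gp"}
MAX_FILE_BYTES = 3 * 1024**3  # assumption: "3GB" interpreted as 3 GiB


def check_media_file(path: str) -> list[str]:
    """Return a list of problems with the file, empty if none were found."""
    problems = []

    extension = os.path.splitext(path)[1].lower()
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unrecognized extension: {extension or '(none)'}")

    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file exceeds the 3GB limit")

    return problems
```

An extension check is only a heuristic; the service decides from the actual file contents whether it can decode the media.
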
## Use Cases

### Subtitle Generation with Speakers

```python
# Reuses `client` from the Python example above
with open("audio.mp3", "rb") as audio_file:
    result = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v2",
        timestamps_granularity="word",
        diarize=True
    )

# Print each word with its speaker label and start time,
# the raw material for building subtitle cues
for word in result.words:
    if word.type == "word":
        print(f"[{word.speaker_id}] {word.text} ({word.start:.2f}s)")
```

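
Word-level output still has to be grouped into timed cues to become a subtitle file. One possible way to do that, sketched here, is to start a new cue whenever the speaker changes or a cue reaches a word limit, then format the timestamps as SRT expects (`HH:MM:SS,mmm`). The function names and the `max_words` default are illustrative choices, and the code expects `words` as plain dictionaries shaped like the JSON response, so SDK objects would need attribute access instead.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[dict], max_words: int = 8) -> str:
    """Group `word` entries into cues and render them as SRT text."""
    cues, current = [], []
    for item in words:
        if item.get("type") != "word":
            continue
        # Close the current cue on a speaker change or when it is full.
        speaker_changed = current and item.get("speaker_id") != current[0].get("speaker_id")
        if current and (speaker_changed or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(item)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0]["start"])
        end = format_srt_time(cue[-1]["end"])
        text = " ".join(word["text"] for word in cue)
        blocks.append(f"{index}\n{start} --> {end}\n[{cue[0].get('speaker_id')}] {text}")
    return "\n\n".join(blocks) + "\n"
```

Splitting on a fixed word count is the simplest rule; splitting on pauses or punctuation would usually read better but needs more logic.
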
### Meeting Transcription with Custom Terms

```python
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

with open("meeting.mp3", "rb") as f:
    result = client.speech_to_text.convert(
        file=f,
        model_id="scribe_v2",
        diarize=True,
        keyterms=["Q4 forecast", "revenue target", "ACME Corp"]
    )

# Group by speaker: print a new label whenever the speaker changes
current_speaker = None
for word in result.words:
    if word.type == "word":
        if word.speaker_id != current_speaker:
            current_speaker = word.speaker_id
            print(f"\n[{current_speaker}]:", end=" ")
        # Spacing entries are skipped, so add the separator here
        print(word.text, end=" ")
```
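
When the transcript is needed as data rather than console output, the same grouping can return a list of speaker turns. This sketch keeps the `spacing` entries so the original whitespace is preserved, and drops `audio_event` entries. It assumes `words` are dictionaries shaped like the JSON response; `speaker_turns` and the `"unknown"` label are hypothetical names.

```python
def speaker_turns(words: list[dict]) -> list[tuple[str, str]]:
    """Collapse consecutive entries from the same speaker into (speaker, text) turns."""
    turns = []  # each element is [speaker, accumulated_text]
    for item in words:
        if item.get("type") == "audio_event":
            continue
        speaker = item.get("speaker_id") or "unknown"
        # Start a new turn when the speaker differs from the previous entry.
        if not turns or turns[-1][0] != speaker:
            turns.append([speaker, ""])
        turns[-1][1] += item["text"]
    # Trim whitespace and drop turns that contained only spacing.
    return [(speaker, text.strip()) for speaker, text in turns if text.strip()]
```
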