Transcribe Audio
Convert speech to text with automatic language detection.
POST /v1/transcribe
4 credits
Description
Transcribe speech from a video or audio URL using faster-whisper, a reimplementation of OpenAI's Whisper. The endpoint supports automatic language detection, three output formats (JSON with optional word-level timestamps, SRT, and VTT), and configurable model sizes that trade speed for accuracy.
Request Body
Send a JSON body with the following parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | — | Source video/audio URL |
| language | string | No | null | ISO 639-1 language code (e.g. "en", "fr"). Leave as null to auto-detect. |
| output_format | string | No | "json" | Output format: "json", "srt", "vtt" |
| word_timestamps | boolean | No | true | Include word-level timestamps (JSON format only) |
| model_size | string | No | "base" | Whisper model: "tiny", "base", "small", "medium". Larger = more accurate but slower. |
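
As an illustration of how these parameters fit together, here is a minimal Python sketch that builds the request body and rejects values not listed in the table. The helper name `build_transcribe_payload` is hypothetical and not part of any official client. Omitting `language` to get the documented default, and sending `word_timestamps` only for JSON output, are assumptions about how the server treats missing fields.

```python
from typing import Optional

# Allowed values copied from the parameter table above.
OUTPUT_FORMATS = {"json", "srt", "vtt"}
MODEL_SIZES = {"tiny", "base", "small", "medium"}


def build_transcribe_payload(
    url: str,
    language: Optional[str] = None,
    output_format: str = "json",
    word_timestamps: bool = True,
    model_size: str = "base",
) -> dict:
    """Return a JSON-serializable body for POST /v1/transcribe (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unsupported model_size: {model_size!r}")

    payload = {
        "url": url,
        "output_format": output_format,
        "model_size": model_size,
    }
    # Assumption: leaving "language" out is equivalent to the null default (auto-detect).
    if language is not None:
        payload["language"] = language
    # Word timestamps are documented as JSON-only, so the flag is sent only for JSON.
    if output_format == "json":
        payload["word_timestamps"] = word_timestamps
    return payload
```

The returned dict can be passed directly as the `json=` argument in the Python example under Code Examples.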
Model Sizes
Choose a model size based on your accuracy and speed requirements:
| Model | Speed | Accuracy | Best For |
|---|---|---|---|
| tiny | ~10x realtime | Good for clear speech | Quick previews, clear audio |
| base | ~5x realtime | Good general accuracy | Default choice, balanced |
| small | ~2x realtime | High accuracy | Accented speech, noisy audio |
| medium | ~1x realtime | Highest accuracy | Professional transcription, difficult audio |
Start with Base
Start with 'base' for most use cases. Only upgrade to 'small' or 'medium' if accuracy is insufficient.
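
To make the speed column concrete, the sketch below turns the approximate multipliers from the table into a rough processing-time estimate. It assumes "Nx realtime" means the model processes N seconds of audio per second of wall-clock time; the function name is hypothetical, and the estimate ignores download and queue time.

```python
# Approximate multipliers from the Model Sizes table (audio seconds per wall-clock second).
REALTIME_FACTOR = {"tiny": 10.0, "base": 5.0, "small": 2.0, "medium": 1.0}


def estimate_processing_seconds(audio_seconds: float, model_size: str = "base") -> float:
    """Rough transcription time for a clip; excludes download and queueing."""
    return audio_seconds / REALTIME_FACTOR[model_size]


# A 212-second clip under each model size.
for size in ("tiny", "base", "small", "medium"):
    print(size, round(estimate_processing_seconds(212.0, size), 1))
```

Under these assumptions a 212-second clip takes roughly 42 seconds with 'base' and about 212 seconds with 'medium'.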
Code Examples
cURL
curl -X POST "https://videoconduit.com/v1/transcribe" \
  -H "Authorization: Bearer vc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "output_format": "json",
    "language": "en"
  }'

Python
import requests

response = requests.post(
    "https://videoconduit.com/v1/transcribe",
    headers={"Authorization": "Bearer vc_your_api_key"},
    json={
        "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
        "output_format": "json",
        "language": "en",
    },
)
data = response.json()
print(data["job_id"])

JavaScript
const response = await fetch("https://videoconduit.com/v1/transcribe", {
  method: "POST",
  headers: {
    "Authorization": "Bearer vc_your_api_key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://youtube.com/watch?v=dQw4w9WgXcQ",
    output_format: "json",
    language: "en",
  }),
});
const data = await response.json();
console.log(data.job_id);

PHP
$client = new GuzzleHttp\Client();
$response = $client->post("https://videoconduit.com/v1/transcribe", [
    "headers" => ["Authorization" => "Bearer vc_your_api_key"],
    "json" => [
        "url" => "https://youtube.com/watch?v=dQw4w9WgXcQ",
        "output_format" => "json",
        "language" => "en",
    ],
]);
$data = json_decode($response->getBody(), true);
echo $data["job_id"];

Go
// Imports needed: encoding/json, fmt, log, net/http, strings
body := strings.NewReader(`{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "output_format": "json",
    "language": "en"
}`)
req, err := http.NewRequest("POST", "https://videoconduit.com/v1/transcribe", body)
if err != nil {
    log.Fatal(err)
}
req.Header.Set("Authorization", "Bearer vc_your_api_key")
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()
var data map[string]interface{}
if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
    log.Fatal(err)
}
fmt.Println(data["job_id"])

Ruby
require "net/http"
require "json"

uri = URI("https://videoconduit.com/v1/transcribe")
req = Net::HTTP::Post.new(uri)
req["Authorization"] = "Bearer vc_your_api_key"
req["Content-Type"] = "application/json"
req.body = {
  url: "https://youtube.com/watch?v=dQw4w9WgXcQ",
  output_format: "json",
  language: "en",
}.to_json
res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
data = JSON.parse(res.body)
puts data["job_id"]

Response
Initial Response
Returned immediately when the job is created:
{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "status": "pending",
  "credits_charged": 4
}
Completed Job
Returned from GET /v1/jobs/{id} when the job finishes:
{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "status": "completed",
  "download_url": "https://dl.videoconduit.com/files/b2c3d4e5.json",
  "result_data": {
    "language": "en",
    "text": "Never gonna give you up, never gonna let you down...",
    "duration": 212.0,
    "segment_count": 45,
    "output_format": "json"
  }
}
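
Because the job runs asynchronously, a client has to poll GET /v1/jobs/{id} until the status changes. The following is a minimal Python sketch of such a loop, not an official client. It assumes the jobs endpoint accepts the same Bearer token as the transcribe endpoint. Only the "pending" and "completed" statuses appear on this page, so the loop treats every other status as "keep waiting" and relies on a timeout as its backstop. The names `fetch_job` and `wait_for_job` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://videoconduit.com"


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET /v1/jobs/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/jobs/{job_id}",
        # Assumption: same Authorization header as POST /v1/transcribe.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(
    get_job: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_job() repeatedly until it reports status "completed"."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        if job.get("status") == "completed":
            return job
        # Failure status names are not documented here, so the timeout
        # is the only way this sketch gives up on a job.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not completed after {timeout} seconds")
        sleep(interval)
```

A typical call would be `job = wait_for_job(lambda: fetch_job(job_id, "vc_your_api_key"))`, after which `job["download_url"]` points at the transcript file. Passing the fetch step as a callable keeps the loop easy to test without network access.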
Output Formats
JSON (default)
Richest output. Contains full text, detected language, segments with timestamps, and optionally word-level timestamps with confidence scores.
{
  "text": "Never gonna give you up...",
  "language": "en",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": " Never gonna give you up,",
      "words": [
        {"word": " Never", "start": 0.0, "end": 0.5, "probability": 0.95},
        {"word": " gonna", "start": 0.5, "end": 0.8, "probability": 0.92}
      ]
    }
  ]
}
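
One way to use the per-word confidence scores is to flag words that may need manual review. The Python sketch below walks the structure shown above and collects words whose `probability` falls below a threshold. The function name and the default threshold of 0.5 are illustrative assumptions, not recommendations from the API.

```python
def low_confidence_words(transcript: dict, threshold: float = 0.5) -> list:
    """Return (word, start, end) tuples for words below the confidence threshold."""
    flagged = []
    for segment in transcript.get("segments", []):
        # "words" is present only when word_timestamps was enabled.
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((word["word"].strip(), word["start"], word["end"]))
    return flagged
```

Segments without a `words` list are skipped, so the same function also works on output produced with `word_timestamps` set to false (it simply returns an empty list).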
SRT
Standard subtitle format. Compatible with most video players and editors.
1
00:00:00,000 --> 00:00:03,500
Never gonna give you up,

2
00:00:03,500 --> 00:00:07,200
never gonna let you down,
VTT
WebVTT format. Native support in HTML5 <track> elements for web video players.
WEBVTT

00:00:00.000 --> 00:00:03.500
Never gonna give you up,

00:00:03.500 --> 00:00:07.200
never gonna let you down,
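
The subtitle formats differ from the JSON segments mainly in how timestamps are written: SRT uses a comma before the milliseconds, VTT a period. As a client-side illustration (not a description of how the service itself renders subtitles), the Python sketch below converts JSON segments into SRT cues. Both function names are hypothetical.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render JSON segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually carries a leading space, so strip it.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Requesting `output_format: "srt"` or `"vtt"` directly is simpler when only subtitles are needed; a conversion like this is mostly useful when the JSON output has already been fetched and edited.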
Notes
Language Auto-Detection
When language is not specified, Whisper automatically detects the spoken language from the first 30 seconds of audio. Specifying the language improves both speed and accuracy.
Word Timestamps for Karaoke
Use word_timestamps with JSON output to build karaoke-style displays, make precise audio edits, or index transcripts for word-level search.
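
For a karaoke-style display, the core operation is finding which word is being spoken at a given playback time. The Python sketch below shows one way to do that with a binary search over word start times. It assumes the words arrive sorted by `start`, as in the JSON example above; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def flatten_words(transcript: dict) -> list:
    """Collect every word entry from all segments, in order."""
    return [w for seg in transcript.get("segments", []) for w in seg.get("words", [])]


def active_word(words: list, t: float) -> Optional[dict]:
    """Return the word spoken at time t (seconds), or None between words."""
    # Assumption: words are sorted by their "start" time.
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None
```

A player would call `active_word(words, current_time)` on each time update and highlight the returned word. For long transcripts, the list of start times can be computed once and reused rather than rebuilt on every call.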