Transcribe Audio

Convert speech to text with automatic language detection.

POST /v1/transcribe (4 credits)

Description

Transcribe audio from a video or audio URL using faster-whisper, a faster implementation of OpenAI's Whisper model. The endpoint supports automatic language detection, three output formats (JSON with optional word-level timestamps, SRT, and VTT), and configurable model sizes so you can trade speed against accuracy.

Request Body

Send a JSON body with the following parameters:

Parameter | Type | Required | Default | Description
url | string | Yes | - | Source video/audio URL
language | string | No | null | ISO 639-1 language code (e.g. "en", "fr"). Null = auto-detect.
output_format | string | No | "json" | Output format: "json", "srt", "vtt"
word_timestamps | boolean | No | true | Include word-level timestamps (JSON format only)
model_size | string | No | "base" | Whisper model: "tiny", "base", "small", "medium". Larger = more accurate but slower.
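
As a minimal sketch of how the parameters above fit together, the helper below builds a request body and rejects values the table does not list. The function name `build_transcribe_payload` is hypothetical and not part of any official client, and it assumes that omitting `language` is equivalent to sending its default of null.

```python
ALLOWED_FORMATS = {"json", "srt", "vtt"}
ALLOWED_MODELS = {"tiny", "base", "small", "medium"}


def build_transcribe_payload(url, language=None, output_format="json",
                             word_timestamps=True, model_size="base"):
    """Build a JSON-serializable body for POST /v1/transcribe.

    Hypothetical helper: it mirrors the parameter table and validates
    locally before any request is sent.
    """
    if not url:
        raise ValueError("url is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if model_size not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model_size: {model_size!r}")

    payload = {
        "url": url,
        "output_format": output_format,
        "word_timestamps": word_timestamps,
        "model_size": model_size,
    }
    # Assumption: leaving "language" out behaves like the null default,
    # so the service auto-detects the language.
    if language is not None:
        payload["language"] = language
    return payload
```

The returned dict can be passed as the `json=` argument of `requests.post`, as in the Python example in Code Examples.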

Model Sizes

Choose a model size based on your accuracy and speed requirements:

Model | Speed | Accuracy | Best For
tiny | ~10x realtime | Good for clear speech | Quick previews, clear audio
base | ~5x realtime | Good general accuracy | Default choice, balanced
small | ~2x realtime | High accuracy | Accented speech, noisy audio
medium | ~1x realtime | Highest accuracy | Professional transcription, difficult audio

Start with Base

Start with 'base' for most use cases. Only upgrade to 'small' or 'medium' if accuracy is insufficient.
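
To make the speed column concrete, here is a rough sketch that turns the approximate factors into an estimated processing time. It reads "~5x realtime" as roughly five seconds of audio processed per second of wall-clock time, which is an interpretation rather than a documented guarantee. The function name is hypothetical, and real throughput will vary.

```python
# Approximate factors copied from the Model Sizes table. They are treated as
# "seconds of audio processed per second of wall-clock time", an assumption.
REALTIME_FACTOR = {"tiny": 10.0, "base": 5.0, "small": 2.0, "medium": 1.0}


def estimate_processing_seconds(audio_seconds, model_size="base"):
    """Return a rough processing-time estimate for the given audio length."""
    return audio_seconds / REALTIME_FACTOR[model_size]


# A 212-second clip, the duration used in the Completed Job example.
for size in REALTIME_FACTOR:
    print(size, round(estimate_processing_seconds(212.0, size), 1))
```

Under these assumptions a 212-second clip takes roughly 42 seconds with 'base' and about as long as the clip itself with 'medium'. Download and queue time are not included.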

Code Examples

cURL

curl -X POST "https://videoconduit.com/v1/transcribe" \
  -H "Authorization: Bearer vc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "output_format": "json",
    "language": "en"
  }'

Python

import requests

response = requests.post(
    "https://videoconduit.com/v1/transcribe",
    headers={"Authorization": "Bearer vc_your_api_key"},
    json={
        "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
        "output_format": "json",
        "language": "en",
    },
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
data = response.json()
print(data["job_id"])

JavaScript

const response = await fetch("https://videoconduit.com/v1/transcribe", {
  method: "POST",
  headers: {
    "Authorization": "Bearer vc_your_api_key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://youtube.com/watch?v=dQw4w9WgXcQ",
    output_format: "json",
    language: "en",
  }),
});
const data = await response.json();
console.log(data.job_id);

PHP

$client = new GuzzleHttp\Client();
$response = $client->post("https://videoconduit.com/v1/transcribe", [
    "headers" => ["Authorization" => "Bearer vc_your_api_key"],
    "json" => [
        "url" => "https://youtube.com/watch?v=dQw4w9WgXcQ",
        "output_format" => "json",
        "language" => "en",
    ],
]);
$data = json_decode($response->getBody(), true);
echo $data["job_id"];

Go

// Place inside a function. Imports: encoding/json, fmt, log, net/http, strings.
body := strings.NewReader(`{
  "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "output_format": "json",
  "language": "en"
}`)
req, err := http.NewRequest("POST", "https://videoconduit.com/v1/transcribe", body)
if err != nil {
	log.Fatal(err)
}
req.Header.Set("Authorization", "Bearer vc_your_api_key")
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
	log.Fatal(err) // resp is nil on error, so check before deferring Close
}
defer resp.Body.Close()
var data map[string]interface{}
if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
	log.Fatal(err)
}
fmt.Println(data["job_id"])

Ruby

require "net/http"
require "json"

uri = URI("https://videoconduit.com/v1/transcribe")
req = Net::HTTP::Post.new(uri)
req["Authorization"] = "Bearer vc_your_api_key"
req["Content-Type"] = "application/json"
req.body = {
  url: "https://youtube.com/watch?v=dQw4w9WgXcQ",
  output_format: "json",
  language: "en",
}.to_json
res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
data = JSON.parse(res.body)
puts data["job_id"]

Response

Initial Response

Returned immediately when the job is created:

{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "status": "pending",
  "credits_charged": 4
}

Completed Job

Returned from GET /v1/jobs/{id} when the job finishes:

{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "status": "completed",
  "download_url": "https://dl.videoconduit.com/files/b2c3d4e5.json",
  "result_data": {
    "language": "en",
    "text": "Never gonna give you up, never gonna let you down...",
    "duration": 212.0,
    "segment_count": 45,
    "output_format": "json"
  }
}
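
Because the job is created asynchronously, a client has to poll GET /v1/jobs/{id} until the status changes. The following is a minimal polling sketch, not an official client. The names `wait_for_job` and `fetch_job_http` are hypothetical, the "failed" status is an assumed name (only "pending" and "completed" appear above), and the GET request is assumed to accept the same Bearer token as the POST.

```python
import json
import time
import urllib.request


def fetch_job_http(job_id, api_key="vc_your_api_key"):
    """Fetch one job snapshot. Assumes GET uses the same Bearer auth as POST."""
    req = urllib.request.Request(
        f"https://videoconduit.com/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(fetch_job, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reports "completed" and return its final payload.

    fetch_job is any callable that takes a job ID and returns the decoded
    JSON of GET /v1/jobs/{id}; passing it in keeps the loop easy to test.
    """
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        # Assumption: a terminal failure is reported as "failed".
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

A call such as `wait_for_job(fetch_job_http, data["job_id"])` would then return the completed job, from which `download_url` and `result_data` can be read.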

Output Formats

JSON (default)

Richest output. Contains full text, detected language, segments with timestamps, and optionally word-level timestamps with confidence scores.

{
  "text": "Never gonna give you up...",
  "language": "en",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": " Never gonna give you up,",
      "words": [
        {"word": " Never", "start": 0.0, "end": 0.5, "probability": 0.95},
        {"word": " gonna", "start": 0.5, "end": 0.8, "probability": 0.92}
      ]
    }
  ]
}
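
As an illustration of working with this structure, the sketch below walks the segments and flags words whose `probability` falls below a threshold, for example to mark them for manual review. The helper names and the 0.93 threshold are arbitrary choices for the example, and it assumes the "words" list is simply absent when word timestamps are disabled.

```python
def iter_words(transcript):
    """Yield every word dict across all segments, in order."""
    for segment in transcript.get("segments", []):
        # Assumption: "words" is missing when word_timestamps was false.
        yield from segment.get("words", [])


def low_confidence_words(transcript, threshold=0.93):
    """Return (word, start time) pairs whose probability is below threshold."""
    return [
        (w["word"].strip(), w["start"])
        for w in iter_words(transcript)
        if w["probability"] < threshold
    ]


# The sample document shown above.
sample = {
    "text": "Never gonna give you up...",
    "language": "en",
    "segments": [
        {
            "start": 0.0,
            "end": 3.5,
            "text": " Never gonna give you up,",
            "words": [
                {"word": " Never", "start": 0.0, "end": 0.5, "probability": 0.95},
                {"word": " gonna", "start": 0.5, "end": 0.8, "probability": 0.92},
            ],
        }
    ],
}

print(low_confidence_words(sample))  # [('gonna', 0.5)]
```

Note that words carry a leading space in the raw output, so the example strips them before display.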

SRT

Standard subtitle format. Compatible with most video players and editors.

1
00:00:00,000 --> 00:00:03,500
Never gonna give you up,

2
00:00:03,500 --> 00:00:07,200
never gonna let you down,

VTT

WebVTT format. Native support in HTML5 <track> elements for web video players.

WEBVTT

00:00:00.000 --> 00:00:03.500
Never gonna give you up,

00:00:03.500 --> 00:00:07.200
never gonna let you down,
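
The two subtitle formats carry the same segment timings as the JSON output and differ mainly in the timestamp separator and the WEBVTT header. The sketch below shows that correspondence by rendering JSON segments locally. It is an illustration with hypothetical function names, not necessarily how the service generates its own files, which you can request directly through output_format.

```python
def cue_timestamp(seconds, ms_separator):
    """Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_separator}{ms:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = cue_timestamp(seg["start"], ",")
        end = cue_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def segments_to_vtt(segments):
    """Render segments as WebVTT cues after the required WEBVTT header."""
    cues = []
    for seg in segments:
        start = cue_timestamp(seg["start"], ".")
        end = cue_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

Calling `segments_to_srt` on the `segments` array of a JSON result yields cues in the same shape as the SRT sample above.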

Notes

Language Auto-Detection

When language is not specified, Whisper automatically detects the spoken language from the first 30 seconds of audio. Specifying the language skips this detection step, which improves speed, and avoids misdetection, which improves accuracy.

Word Timestamps for Karaoke

Use word_timestamps with JSON output for karaoke-style displays, precise audio editing, or word-level search indexing.
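
For a karaoke-style display, the core lookup is "which word is being spoken at playback time t". One possible sketch uses a binary search over word start times. The function name is hypothetical, and it assumes the words are sorted by start time, as they are in the sample JSON output.

```python
from bisect import bisect_right


def active_word_index(words, t):
    """Return the index of the word being spoken at time t, or None.

    words is a list of dicts with "start" and "end" keys, sorted by "start".
    """
    # For long transcripts, build this list once and reuse it between calls.
    starts = [w["start"] for w in words]
    # Last word whose start is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return i
    return None  # t falls before the first word or in a gap between words


words = [
    {"word": " Never", "start": 0.0, "end": 0.5, "probability": 0.95},
    {"word": " gonna", "start": 0.5, "end": 0.8, "probability": 0.92},
]
print(active_word_index(words, 0.6))  # 1
```

A player would call this on each time update and highlight the returned word, leaving nothing highlighted when the result is None.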