Frequently asked questions
Everything you need to know about Speechlab's speech-to-speech localization platform. Can't find your answer? Contact us.
General
Speechlab is a speech-to-speech AI platform for video and audio localization and accessibility. Upload spoken content — video, audio, podcasts, audiobooks — and dub, caption, or subtitle it in 50+ languages with a full editor. This is not document or file translation; Speechlab works exclusively on spoken audio and video.
Google Translate and DeepL translate text documents. Speechlab translates spoken content — the input is audio or video, and the output is dubbed audio, captions, or subtitles. The entire pipeline is speech-to-speech: ASR transcribes, AI translates, TTS generates the dubbed voice. You get localized media, not a translated text file.
Input: Video (MP4, MOV, MKV, WebM), audio (MP3, WAV, M4A, FLAC), YouTube link paste, and SRT import. Files up to 1.5 GB.
Output: Dubbed video, dubbed audio, SRT subtitles, VTT subtitles, captions (sidecar or burned-in), and plain-text transcripts.
50+ languages for dubbing, captioning, and subtitling — including major European, Asian, Middle Eastern, and African languages. Each language has native-voice options. Voice cloning availability varies by language pair.
No. Speechlab runs entirely in the browser. No desktop software, no plugins, no downloads.
Yes. 3 free projects of dubbing. All features. No credit card required.
Dubbing
AI dubbing replaces the spoken audio in a video or audio file with synthesized speech in another language. Unlike subtitles, the audience hears the content — they don't read it. It combines automatic speech recognition (ASR), machine translation, and text-to-speech synthesis (TTS) to produce a dubbed version of the original content without a recording studio or voice actors.
The pipeline works in steps: (1) upload your video or audio, (2) ASR transcribes with speaker diarization, (3) AI translates segment by segment, (4) you assign a voice per speaker: clone the original or pick a native voice from the Voice Library, (5) TTS generates the dubbed audio, (6) you export dubbed video, audio, or subtitles. Every step is editable.
Subtitles are text overlaid on video — the audience reads them. Dubbing replaces the audio — the audience hears the content in their language. Dubbing is the only localization option for audio-only content (podcasts, audiobooks) and produces a more natural experience for video where reading and watching compete for attention.
Yes. Source-clone mode captures the original speaker's voice characteristics and synthesizes them speaking the target language. For multi-speaker content, each speaker can be cloned independently.
Two modes per speaker: (1) Source clone: the AI replicates the original voice in the new language. (2) Native voice: a natural-sounding voice native to the target language, chosen from the Speechlab Voice Library, for a consistent brand voice across projects.
Yes. Speechlab supports segment-level re-rendering. Change a word in the translation, click "Merge Changes to Dub," and only the affected segments re-render. Your credits pay for the fix, not a full re-run.
ASR automatically identifies and labels speakers (diarization). Each speaker gets their own voice assignment. You can rename, merge, or reassign speakers across the project. Voices are controlled per speaker, not per file.
Lip-sync is available for enterprise accounts on request. Contact sales for details.
Any spoken content: video (films, documentaries, YouTube, product demos), audio (podcasts, audiobooks, training modules, lectures), and more. Any supported format, any length.
Transcription
Upload a video or audio file (or paste a YouTube link). ASR produces a diarized transcript with timestamps, speaker labels, and editable segments. The transcript appears in the editor where you can fix errors inline, adjust timing, and lock reviewed segments.
50+ languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, Korean, and many more.
Accuracy depends on audio quality, accent, and background noise. On clean audio, modern ASR models achieve 95%+ word accuracy. Speechlab's inline editor lets you fix any errors directly — no export/re-import cycle.
Yes. Paste the YouTube URL and Speechlab fetches and transcribes it automatically. No need to download the video first.
Most files are transcribed in under a minute per 10 minutes of audio. Longer files and bulk uploads are queued and processed sequentially.
Yes. Import a .srt file and skip ASR entirely. Speechlab parses the segments, timestamps, and text so you can continue with translation, dubbing, or subtitle editing.
Yes. Speechlab supports concurrent editing with conflict detection — you'll see who else is editing and which segments they're working on. No silent overwrites.
Speech translation
Translation works on the speech in your video or audio. Speechlab transcribes your media, then AI translates each segment preserving speaker attribution and timing. You edit inline, then the translation feeds directly into dubbing, captions, or subtitles.
Claude, DeepL, and GPT-4 — selected per language pair for best quality. The AI translates segment by segment, preserving the structure of the spoken content.
Yes. Every translated segment is editable inline. Fix errors, adjust phrasing, match the register your audience expects. Lock segments you've reviewed to protect them from re-processing.
50+ target languages. Each language gets its own tab within the project. Add as many target languages as you need from a single source.
Yes. Each language you add costs credits based on the source media duration.
The dub marks the edited segments as out of sync. Click "Merge Changes to Dub" to re-render only the changed segments — not the entire project.
No. Google Translate is a text-to-text tool for documents and web pages. Speechlab translates the speech in your video or audio as part of a localization pipeline — the output is dubbed audio, captions, or subtitles, not a translated text file.
Captions & accessibility
Captions are generated from the transcription. They inherit speaker labels, timestamps, and segment structure. Edit caption text inline, adjust display settings, then export as SRT/VTT sidecar files or burned-in captions.
Speechlab generates captions from a full ASR transcription pipeline — not a lightweight caption-specific model — so accuracy is typically higher than platform-native auto-captions (YouTube, TikTok, etc.). You can edit any errors inline before exporting.
Yes. Translate the speech in your video or audio into any target language, then export captions from that translation. A single project can have captions in as many languages as you need.
Yes. Speaker labels carry through from the transcription. Each caption segment knows which speaker is talking — important for accessibility compliance.
Yes. Speechlab captions include speaker identification, accurate timestamps, and editable text — core requirements for WCAG 2.1 Level AA and Section 508 compliance. Export as SRT/VTT sidecar files for web players that support accessible captions.
Captions are display-ready text synced to your media — adjustable font, position, and styling. SRT subtitles are a separate product surface with broadcast-grade formatting: frame-accurate split/merge, CPS validation, and profile-driven rules for professional distribution. Both export as .srt files, but the SRT product gives you production-level control.
Yes. Export captions rendered directly into the video file for social media, downloads, and offline viewing where sidecar files aren't supported.
SRT subtitles
An SRT (SubRip Subtitle) file is a plain-text subtitle format used by video players, streaming platforms, and broadcast systems. Each entry contains a sequence number, start/end timestamp, and the subtitle text. It's the most widely supported subtitle format.
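To make the entry structure concrete, here is a minimal Python sketch that parses SRT text into entries. The sample cues, the `SrtEntry` class, and the `parse_srt` helper are illustrative assumptions, not Speechlab's importer.

```python
import re
from dataclasses import dataclass

# Illustrative SRT content: sequence number, "start --> end" line, then text.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Welcome to the show.

2
00:00:04,000 --> 00:00:06,250
Today we talk about
localization.
"""

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class SrtEntry:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[SrtEntry]:
    """Parse SRT text into entries; entries are separated by blank lines."""
    entries = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip fragments without a timing line
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:])
        entries.append(SrtEntry(int(lines[0]), to_seconds(start), to_seconds(end), text))
    return entries
```

Calling `parse_srt(SAMPLE_SRT)` yields two entries, the second with a two-line text body.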
VTT (WebVTT) is a web-native subtitle format similar to SRT but with additional styling options. Speechlab exports both. SRT is standard for video editing and broadcast; VTT is preferred for web and podcast players.
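The two formats are close enough that a basic conversion is mostly mechanical: WebVTT files start with a `WEBVTT` header line and use a period rather than a comma before the milliseconds. The sketch below shows that transformation in Python; `srt_to_vtt` is a hypothetical helper for illustration, not Speechlab's exporter, and it ignores VTT styling features.

```python
import re

# Matches the "HH:MM:SS,mmm" form used on SRT timing lines.
SRT_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to basic WebVTT (header plus period-separated milliseconds)."""
    out = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        # Only rewrite timing lines so commas inside subtitle text are untouched.
        if "-->" in line:
            line = SRT_TIMING.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The SRT sequence numbers are kept as-is; WebVTT treats them as optional cue identifiers.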
Yes. Drag segments on a waveform timeline, edit start/end times inline, split and merge segments, and validate against broadcast profiles — all in the browser. No desktop software required.
CPS (characters per second) measures reading speed. Broadcast standards typically cap it somewhere between 15 and 25 CPS, depending on the standard. Above the cap, viewers can't finish reading the subtitle before it disappears. Speechlab validates CPS per segment and highlights violations.
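As a rough illustration of the calculation, the Python sketch below computes CPS for a single segment. The character-counting rule (everything except line breaks) and the default limit of 20 are assumptions chosen for the example; standards differ on both, and this is not Speechlab's validator.

```python
def chars_per_second(text: str, start: float, end: float) -> float:
    """Return characters per second for a subtitle shown from start to end (seconds)."""
    duration = end - start
    if duration <= 0:
        raise ValueError("segment must have a positive duration")
    # Assumption: count every character except line breaks.
    visible = text.replace("\n", "")
    return len(visible) / duration


def exceeds_cps(text: str, start: float, end: float, max_cps: float = 20.0) -> bool:
    """True when the segment reads faster than the (assumed) limit."""
    return chars_per_second(text, start, end) > max_cps
```

For example, a 20-character subtitle displayed for 2.5 seconds works out to 8 CPS, well under the assumed limit.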
Yes. Translate the speech into any target language, then generate SRT files from each. Every language gets its own formatting rules, CPS calculation, and line-breaking logic.
Yes. The SRT Generator applies proper RTL formatting, Unicode handling, and script-direction rules for Arabic, Hebrew, and Farsi subtitle files.
Profiles define formatting rules — CPS, max line length, max lines per subtitle, min/max duration. Select a profile per project, validate against it, and fix violations before export. Custom profiles can be added for specific distribution requirements.
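One way to picture a profile is as a small set of limits that each segment is checked against. The Python sketch below assumes a hypothetical `SubtitleProfile` with the rule types listed above; the field names and the example values are made up for illustration and do not describe Speechlab's built-in profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleProfile:
    name: str
    max_cps: float         # characters per second
    max_line_length: int   # characters per line
    max_lines: int         # lines per subtitle
    min_duration: float    # seconds
    max_duration: float    # seconds


# Hypothetical values, for illustration only.
EXAMPLE_PROFILE = SubtitleProfile("example", 20.0, 42, 2, 1.0, 7.0)


def validate_segment(text: str, start: float, end: float,
                     profile: SubtitleProfile) -> list[str]:
    """Return the names of the rules this segment violates (empty list if none)."""
    violations = []
    duration = end - start
    lines = text.splitlines()
    if duration < profile.min_duration:
        violations.append("min_duration")
    if duration > profile.max_duration:
        violations.append("max_duration")
    if len(lines) > profile.max_lines:
        violations.append("max_lines")
    if any(len(line) > profile.max_line_length for line in lines):
        violations.append("max_line_length")
    # Assumption: CPS counts every character except line breaks.
    if duration > 0 and len(text.replace("\n", "")) / duration > profile.max_cps:
        violations.append("cps")
    return violations
```

Returning rule names rather than a single pass/fail flag mirrors the idea of highlighting each violation so it can be fixed before export.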
Pricing & plans
Credit-based, per-minute, per-language pricing. You pay credits based on the duration of your source media for each language you add.
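The arithmetic can be sketched as duration times languages times a per-minute rate. The Python snippet below is only an illustration: the rate is a placeholder argument, the function name is hypothetical, and any rounding of partial minutes is not specified here, so check the rates shown in your account for real figures.

```python
def estimate_credits(duration_minutes: float, target_languages: int,
                     credits_per_minute: float) -> float:
    """Estimate credits as source duration x number of languages x per-minute rate."""
    if duration_minutes < 0 or target_languages < 0 or credits_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    # Assumption: no rounding of partial minutes.
    return duration_minutes * target_languages * credits_per_minute
```

With a placeholder rate of 2 credits per minute, a 12.5-minute source in 3 languages would come to 75 credits.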
Yes. 2 free dubbing projects with all features included: dubbing, captions, and subtitles. No credit card required.
Per-minute credits, any file length, up to 4K resolution, API access, all voice modes, all export formats.
Volume discounts, team roles, linguist review, custom voices, lip-sync, bulk processing, API integration, invoice billing, SSO, and custom data retention. Contact sales for details.
Yes. Each language you add costs credits based on the source media duration. The rate per language is flat and visible upfront — no hidden multipliers or surprise charges.
No. You pay per minute of processed content, billed in credits.
Files up to 1.5 GB, any duration. No per-file length cap on Pro or Enterprise plans.
Enterprise & teams
Bulk processing, API integration, linguist-reviewed outputs, custom voice creation, role-based team access, invoice billing, lip-sync, SSO, and custom data retention policies. Contact sales for details.
Yes. Pro and Enterprise plans include RESTful API access with per-project endpoints, webhook callbacks, and batch job tracking. Integrate Speechlab into your existing media asset management or content pipeline.
Upload hundreds of video and audio files at once via the dashboard or API. Queue localization jobs across languages. Track progress per file, per language, per speaker. Bulk export dubbed media, captions, and subtitles in one click.
Enterprise accounts include professional linguist review on every output — checking translation accuracy, cultural nuance, and brand voice. Flag segments for re-review, approve inline, and export only when quality clears your bar.
Yes. Assign creator, editor, and reviewer roles. Role-based permissions control who can edit, review, approve, and export. Conflict detection prevents silent overwrites when multiple people work on the same project.
Yes. Enterprise accounts can configure SSO for team authentication. Contact sales for setup details.
Yes. Enterprise accounts use invoice billing — no credit card required. NET-30 terms.
White-label options are available for enterprise localization agencies. Contact sales for details.
Security & privacy
All uploads are encrypted in transit (TLS) and at rest (AES-256). Files are stored in SOC 2-compliant infrastructure.
Only you and team members you've explicitly shared the project with. Speechlab employees do not access customer content unless required for support with your explicit permission.
Enterprise accounts can request custom data retention policies — including automatic deletion after a specified period.
Files are stored in cloud infrastructure with regional availability. Enterprise accounts can request specific data residency requirements. Contact sales for details.
Yes. Speechlab infrastructure is SOC 2-compliant. Enterprise accounts can request compliance documentation.
Workflow & editor
A waveform timeline with draggable, resizable segments. Each segment shows the transcript text, speaker label, and timing. Click to seek, drag to reposition, resize to adjust timing. Translation, captions, and dub status are visible alongside the source.
Yes. Change the translation text, click "Merge Changes to Dub," and only the edited segments re-render. You never have to re-process the whole file.
Yes. Multiple team members can work on the same project simultaneously. Conflict detection shows who's editing which segments. Edits auto-save.
Yes. Import .srt files to skip transcription entirely. Speechlab parses segments, timestamps, and text so you can continue with translation, dubbing, or subtitle editing from where you left off.
Dubbed video (MP4), dubbed audio (MP3, WAV), SRT subtitles, VTT subtitles, burned-in captions (video), plain-text transcripts, and JSON (structured data with timestamps and speakers).
Yes. Every product works standalone. Use Speechlab for transcription only, captions only, or subtitles only — without ever generating a dub. You only pay for what you use.