Frequently asked questions

Everything you need to know about Speechlab's speech-to-speech localization platform. Can't find your answer? Contact us.

General

What is Speechlab?

Speechlab is a speech-to-speech AI platform for video and audio localization and accessibility. Upload spoken content — video, audio, podcasts, audiobooks — and dub, caption, or subtitle it in 50+ languages with a full editor. This is not document or file translation; Speechlab works exclusively on spoken audio and video.

How is Speechlab different from Google Translate or DeepL?

Google Translate and DeepL translate text documents. Speechlab translates spoken content — the input is audio or video, and the output is dubbed audio, captions, or subtitles. The entire pipeline is speech-to-speech: ASR transcribes, AI translates, TTS generates the dubbed voice. You get localized media, not a translated text file.

What file types and formats does Speechlab support?

Input: Video (MP4, MOV, MKV, WebM), audio (MP3, WAV, M4A, FLAC), YouTube link paste, and SRT import. Files up to 1.5 GB.

Output: Dubbed video, dubbed audio, SRT subtitles, VTT subtitles, captions (sidecar or burned-in), and plain-text transcripts.

How many languages does Speechlab support?

50+ languages for dubbing, captioning, and subtitling — including major European, Asian, Middle Eastern, and African languages. Each language has native-voice options. Voice cloning availability varies by language pair.

Do I need to install anything?

No. Speechlab runs entirely in the browser. No desktop software, no plugins, no downloads.

Can I try Speechlab for free?

Yes. 3 free projects of dubbing. All features. No credit card required.

Dubbing

What is AI dubbing?

AI dubbing replaces the spoken audio in a video or audio file with synthesized speech in another language. Unlike subtitles, the audience hears the content — they don't read it. It combines automatic speech recognition (ASR), machine translation, and text-to-speech synthesis (TTS) to produce a dubbed version of the original content without a recording studio or voice actors.

How does AI dubbing work with Speechlab?

The pipeline works in steps: (1) upload your video or audio; (2) ASR transcribes with speaker diarization; (3) AI translates segment by segment; (4) you assign a voice per speaker, either a clone of the original or a native voice from the Voice Library; (5) TTS generates the dubbed audio; (6) you export dubbed video, audio, or subtitles. Every step is editable.

How is dubbing different from subtitles?

Subtitles are text overlaid on video — the audience reads them. Dubbing replaces the audio — the audience hears the content in their language. Dubbing is the only localization option for audio-only content (podcasts, audiobooks) and produces a more natural experience for video where reading and watching compete for attention.

Can AI dubbing clone the original speaker's voice?

Yes. Source-clone mode captures the original speaker's voice characteristics and synthesizes them speaking the target language. For multi-speaker content, each speaker can be cloned independently.

What voice options are available?

Two modes per speaker: (1) Source clone: the AI replicates the original voice in the new language. (2) Native voice from the Voice Library: a natural-sounding voice native to the target language, chosen from the Speechlab catalogue, which keeps a consistent brand voice across projects.

Can I fix one sentence without re-dubbing the entire file?

Yes. Speechlab supports segment-level re-rendering. Change a word in the translation, click "Merge Changes to Dub," and only the affected segments re-render. Your credits pay for the fix, not a full re-run.

How does Speechlab handle multiple speakers?

ASR automatically identifies and labels speakers (diarization). Each speaker gets their own voice assignment. You can rename, merge, or reassign speakers across the project. Voices are controlled per speaker, not per file.

Does Speechlab offer lip-sync?

Lip-sync is available for enterprise accounts on request. Contact sales for details.

What content types work for dubbing?

Any spoken content: video (films, documentaries, YouTube, product demos), audio (podcasts, audiobooks, training modules, lectures), and more. Any supported format, any duration, up to 1.5 GB per file.

Transcription

How does transcription work?

Upload a video or audio file (or paste a YouTube link). ASR produces a diarized transcript with timestamps, speaker labels, and editable segments. The transcript appears in the editor where you can fix errors inline, adjust timing, and lock reviewed segments.

What languages does Speechlab transcribe?

50+ languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, Korean, and many more.

How accurate is AI transcription?

Accuracy depends on audio quality, accent, and background noise. On clean audio, modern ASR models achieve 95%+ word accuracy. Speechlab's inline editor lets you fix any errors directly — no export/re-import cycle.

Can I transcribe a YouTube video?

Yes. Paste the YouTube URL and Speechlab fetches and transcribes it automatically. No need to download the video first.

How long does transcription take?

Most files are transcribed in under a minute per 10 minutes of audio. Longer files and bulk uploads are queued and processed sequentially.

Can I import an existing SRT instead of transcribing?

Yes. Import a .srt file and skip ASR entirely. Speechlab parses the segments, timestamps, and text so you can continue with translation, dubbing, or subtitle editing.

Can multiple people edit the same transcript?

Yes. Speechlab supports concurrent editing with conflict detection — you'll see who else is editing and which segments they're working on. No silent overwrites.

Speech translation

How does translation work in Speechlab?

Translation works on the speech in your video or audio. Speechlab transcribes your media, then AI translates each segment, preserving speaker attribution and timing. You edit inline, then the translation feeds directly into dubbing, captions, or subtitles.

What translation engines does Speechlab use?

Claude, DeepL, and GPT-4 — selected per language pair for best quality. The AI translates segment by segment, preserving the structure of the spoken content.

Can I edit the AI translation before it's dubbed?

Yes. Every translated segment is editable inline. Fix errors, adjust phrasing, match the register your audience expects. Lock segments you've reviewed to protect them from re-processing.

How many languages can I translate into?

50+ target languages. Each language gets its own tab within the project. Add as many target languages as you need from a single source.

Does translating into more languages cost more per language?

Yes. Each language you add costs credits based on the source media duration.

What happens if I edit a translation after generating a dub?

The dub marks the edited segments as out of sync. Click "Merge Changes to Dub" to re-render only the changed segments — not the entire project.
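As a rough illustration of how edited segments could be detected, the sketch below fingerprints the translation text each segment was rendered from and compares it with the current text. This is not Speechlab's implementation; the function names and the segment-ID-to-text mapping are assumptions made for the example.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable fingerprint of a segment's translation text."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def out_of_sync(current: dict[int, str], rendered: dict[int, str]) -> list[int]:
    """Return IDs of segments whose text no longer matches the rendered dub.

    `current` maps segment ID to the current translation text.
    `rendered` maps segment ID to the fingerprint stored at render time.
    Both structures are hypothetical.
    """
    return sorted(
        seg_id
        for seg_id, text in current.items()
        if rendered.get(seg_id) != fingerprint(text)
    )


# Fingerprints captured when the dub was first generated.
translation = {1: "Hola y bienvenidos.", 2: "Hoy hablamos de doblaje.", 3: "Gracias."}
rendered = {seg_id: fingerprint(text) for seg_id, text in translation.items()}

# The user edits one segment afterwards.
translation[2] = "Hoy hablamos sobre el doblaje."

print(out_of_sync(translation, rendered))  # [2]
```

Only the IDs returned here would need to go back through TTS; every other segment keeps its existing audio.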

Is this the same as Google Translate?

No. Google Translate is a text-to-text tool for documents and web pages. Speechlab translates the speech in your video or audio as part of a localization pipeline — the output is dubbed audio, captions, or subtitles, not a translated text file.

Captions & accessibility

How do captions work in Speechlab?

Captions are generated from the transcription. They inherit speaker labels, timestamps, and segment structure. Edit caption text inline, adjust display settings, then export as SRT/VTT sidecar files or burned-in captions.

How accurate are AI-generated captions?

Speechlab generates captions from a full ASR transcription pipeline — not a lightweight caption-specific model — so accuracy is typically higher than platform-native auto-captions (YouTube, TikTok, etc.). You can edit any errors inline before exporting.

Can I generate captions in multiple languages?

Yes. Translate the speech in your video or audio into any target language, then export captions from that translation. A single project can have captions in as many languages as you need.

Do captions include speaker identification?

Yes. Speaker labels carry through from the transcription. Each caption segment knows which speaker is talking — important for accessibility compliance.

Can I use Speechlab for accessibility compliance (WCAG, Section 508, ADA)?

Yes. Speechlab captions include speaker identification, accurate timestamps, and editable text — core requirements for WCAG 2.1 Level AA and Section 508 compliance. Export as SRT/VTT sidecar files for web players that support accessible captions.

What's the difference between captions and SRT subtitles in Speechlab?

Captions are display-ready text synced to your media — adjustable font, position, and styling. SRT subtitles are a separate product surface with broadcast-grade formatting: frame-accurate split/merge, CPS validation, and profile-driven rules for professional distribution. Both export as .srt files, but the SRT product gives you production-level control.

Does Speechlab support burned-in captions?

Yes. Export captions rendered directly into the video file for social media, downloads, and offline viewing where sidecar files aren't supported.

SRT subtitles

What is an SRT file?

An SRT (SubRip Subtitle) file is a plain-text subtitle format used by video players, streaming platforms, and broadcast systems. Each entry contains a sequence number, start/end timestamp, and the subtitle text. It's the most widely supported subtitle format.
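To make the entry layout concrete, here is a minimal Python sketch that parses SRT text into cues. The `Cue` class and function names are illustrative, and the parser assumes well-formed input (no byte-order mark, no positioning data after the timestamps).

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(data: str) -> list[Cue]:
    """Parse SRT text into cues; entries are separated by blank lines."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # skip fragments with no timing line
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:04,000 --> 00:00:06,000
Today we talk
about dubbing.
"""

cues = parse_srt(SAMPLE)
print(len(cues), cues[0].start_ms, cues[0].end_ms)  # 2 1000 3500
```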

What's the difference between SRT and VTT?

VTT (WebVTT) is a web-native subtitle format similar to SRT but with additional styling options. Speechlab exports both. SRT is standard for video editing and broadcast; VTT is preferred for web and podcast players.
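The two formats are close enough that a basic conversion is mostly mechanical: a WebVTT file starts with a `WEBVTT` header line, and its timestamps use a period instead of a comma before the milliseconds. The sketch below shows that conversion in Python. It is a simplified illustration with a hypothetical function name, not Speechlab's exporter, and it ignores styling and positioning.

```python
import re

# SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT by adding the header and fixing timestamps."""
    out = ["WEBVTT", ""]
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        # Only timing lines match; cue numbers and subtitle text pass through.
        out.append(TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"


SAMPLE = "1\n00:00:01,000 --> 00:00:03,500\nHello, and welcome.\n"
print(srt_to_vtt(SAMPLE))
```

The numeric counters from the SRT file are kept because WebVTT accepts them as optional cue identifiers.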

Can I edit subtitle timing in the browser?

Yes. Drag segments on a waveform timeline, edit start/end times inline, split and merge segments, and validate against broadcast profiles — all in the browser. No desktop software required.

What are CPS limits and why do they matter?

CPS (characters per second) measures how fast viewers have to read. Broadcast standards typically cap reading speed somewhere between 15 and 25 CPS. Above the limit, viewers can't finish the subtitle before it disappears. Speechlab validates CPS per segment and highlights violations.
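The arithmetic is simple: divide the number of characters by the time the subtitle stays on screen. A minimal Python sketch follows. It assumes spaces count as characters and line breaks do not, which is one common convention rather than a documented Speechlab rule.

```python
def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Reading speed of one subtitle segment in characters per second."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("segment must have a positive duration")
    # Assumption: spaces count toward the total, line breaks do not.
    visible = text.replace("\n", "")
    return len(visible) / duration_s


# 18 characters shown for 2.5 seconds.
print(chars_per_second("Hello and welcome.", 1000, 3500))  # 7.2
```

A 60-character subtitle shown for 2 seconds comes out at 30 CPS, which would exceed even the upper end of the typical range.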

Can I generate subtitles in multiple languages?

Yes. Translate the speech into any target language, then generate SRT files from each. Every language gets its own formatting rules, CPS calculation, and line-breaking logic.

Does Speechlab handle RTL languages (Arabic, Hebrew)?

Yes. The SRT Generator applies proper RTL formatting, Unicode handling, and script-direction rules for Arabic, Hebrew, and Farsi subtitle files.

What subtitle profiles are available?

Profiles define formatting rules — CPS, max line length, max lines per subtitle, min/max duration. Select a profile per project, validate against it, and fix violations before export. Custom profiles can be added for specific distribution requirements.
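As a sketch of what validating against a profile could involve, the Python below checks one subtitle against the rule types listed above. The `Profile` class, the function name, and the numeric limits are illustrative assumptions, not Speechlab's actual profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    max_cps: float
    max_line_length: int
    max_lines: int
    min_duration_ms: int
    max_duration_ms: int


def validate(text: str, start_ms: int, end_ms: int, profile: Profile) -> list[str]:
    """Return the rule violations for one subtitle (empty list if it passes)."""
    problems = []
    lines = text.split("\n")
    duration_ms = end_ms - start_ms
    if len(lines) > profile.max_lines:
        problems.append("too many lines")
    if any(len(line) > profile.max_line_length for line in lines):
        problems.append("line too long")
    if duration_ms < profile.min_duration_ms:
        problems.append("duration too short")
    elif duration_ms > profile.max_duration_ms:
        problems.append("duration too long")
    if duration_ms > 0:
        cps = len(text.replace("\n", "")) / (duration_ms / 1000)
        if cps > profile.max_cps:
            problems.append("reading speed too high")
    return problems


# Hypothetical limits chosen only for the example.
EXAMPLE = Profile(max_cps=17, max_line_length=42, max_lines=2,
                  min_duration_ms=833, max_duration_ms=7000)

print(validate("Hello and welcome.", 1000, 3500, EXAMPLE))  # []
```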

Pricing & plans

How does pricing work?

Credit-based, per-minute, per-language pricing. You pay credits based on the duration of your source media for each language you add.
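A quick way to estimate a project is to multiply source minutes by the number of target languages and the per-minute rate. The Python sketch below shows that arithmetic. The rate of 1 credit per minute is a placeholder, not a published price, and any rounding Speechlab applies is not modeled.

```python
def credits_needed(duration_seconds: float, languages: int,
                   credits_per_minute: float) -> float:
    """Estimate credits for a project: minutes x languages x rate."""
    if duration_seconds < 0 or languages < 0 or credits_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    minutes = duration_seconds / 60
    return minutes * languages * credits_per_minute


# A 12 min 30 s video into 3 languages at a hypothetical 1 credit per minute.
print(credits_needed(750, 3, 1.0))  # 37.5
```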

Is there a free plan?

Yes. 2 free dubbing projects with all features, including dubbing, captions, and subtitles. No credit card required.

What's included in the Pro plan?

Per-minute credits, any file length, up to 4K resolution, API access, all voice modes, all export formats.

What's included in the Enterprise plan?

Volume discounts, team roles, linguist review, custom voices, lip-sync, bulk processing, API integration, invoice billing, SSO, and custom data retention. Contact sales for details.

Are there per-language charges?

Yes. Each language you add costs credits based on the source media duration. The rate per language is flat and visible upfront, with no hidden multipliers or surprise charges.

Do credits expire?

No. Credits are spent per minute of processed content, for each language you add.

Is there a file length or size limit?

Files up to 1.5 GB, any duration. No per-file length cap on Pro or Enterprise plans.

Enterprise & teams

What enterprise features are available?

Bulk processing, API integration, linguist-reviewed outputs, custom voice creation, role-based team access, invoice billing, lip-sync, SSO, and custom data retention policies. Contact sales for details.

Does Speechlab offer an API?

Yes. Pro and Enterprise plans include RESTful API access with per-project endpoints, webhook callbacks, and batch job tracking. Integrate Speechlab into your existing media asset management or content pipeline.

How does bulk processing work?

Upload hundreds of video and audio files at once via the dashboard or API. Queue localization jobs across languages. Track progress per file, per language, per speaker. Bulk export dubbed media, captions, and subtitles in one click.

What is the human-in-the-loop linguist review?

Enterprise accounts include professional linguist review on every output — checking translation accuracy, cultural nuance, and brand voice. Flag segments for re-review, approve inline, and export only when quality clears your bar.

Can I manage team roles and permissions?

Yes. Assign creator, editor, and reviewer roles. Role-based permissions control who can edit, review, approve, and export. Conflict detection prevents silent overwrites when multiple people work on the same project.

Does Speechlab support SSO?

Yes. Enterprise accounts can configure SSO for team authentication. Contact sales for setup details.

Is NET-30 invoicing available?

Yes. Enterprise accounts use invoice billing — no credit card required. NET-30 terms.

Can I white-label Speechlab for my clients?

White-label options are available for enterprise localization agencies. Contact sales for details.

Security & privacy

Is my content secure?

All uploads are encrypted in transit (TLS) and at rest (AES-256). Files are stored in SOC 2-compliant infrastructure.

Who can access my uploaded content?

Only you and team members you've explicitly shared the project with. Speechlab employees do not access customer content unless required for support with your explicit permission.

Can I control data retention?

Enterprise accounts can request custom data retention policies — including automatic deletion after a specified period.

Where is data stored?

Files are stored in cloud infrastructure with regional availability. Enterprise accounts can request specific data residency requirements. Contact sales for details.

Is Speechlab SOC 2 compliant?

Yes. Speechlab infrastructure is SOC 2-compliant. Enterprise accounts can request compliance documentation.

Workflow & editor

What does the Speechlab editor look like?

A waveform timeline with draggable, resizable segments. Each segment shows the transcript text, speaker label, and timing. Click to seek, drag to reposition, resize to adjust timing. Translation, captions, and dub status are visible alongside the source.

Can I edit after the dub is generated?

Yes. Change the translation text, click "Merge Changes to Dub," and only the edited segments re-render. You never have to re-process the whole file.

Does Speechlab support real-time collaboration?

Yes. Multiple team members can work on the same project simultaneously. Conflict detection shows who's editing which segments. Edits auto-save.

Can I import an existing transcript or SRT?

Yes. Import .srt files to skip transcription entirely. Speechlab parses segments, timestamps, and text so you can continue with translation, dubbing, or subtitle editing from where you left off.

What export formats are available?

Dubbed video (MP4), dubbed audio (MP3, WAV), SRT subtitles, VTT subtitles, burned-in captions (video), plain-text transcripts, and JSON (structured data with timestamps and speakers).

Can I use Speechlab for just one step (e.g., only transcription)?

Yes. Every product works standalone. Use Speechlab for transcription only, captions only, or subtitles only — without ever generating a dub. You only pay for what you use.

Try it.
Hear your content in a new voice.