Gladia — Speech-to-Text & Subtitle Generation
Automate speech-to-text transcription with Gladia in Enveu Flow. Generate subtitles and transcripts as part of your media operations pipeline.
What this integration does
Transcribe audio to text with high accuracy using Gladia, which supports 100+ languages. Use it in the Live to VoD, Auto Multi-Audio, and Multilingual Metadata workflows to generate SRT and VTT subtitle files automatically.
Best for
Generating subtitle files and time-coded transcripts from video audio tracks automatically after ingest or live stream completion.
Inputs & outputs.
Inputs
Audio or video file URL: source media file for transcription
Language hint: optional language code to improve accuracy
Diarization setting: whether to identify multiple speakers in the transcript
Outputs
SRT subtitle file: time-coded subtitle file ready for CMS upload
VTT subtitle file: WebVTT format for web player delivery
Raw transcript: full-text transcript with timestamps and confidence scores
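The three inputs map naturally onto a small request object. The sketch below is a minimal illustration that assumes a hypothetical `TranscriptionRequest` type with field names chosen for this example. It is not Gladia's or Flow's actual request schema. Leaving the language hint unset models the case where the language is detected automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionRequest:
    """Hypothetical container for the three integration inputs (illustrative names)."""

    media_url: str                        # audio or video file URL
    language_hint: Optional[str] = None   # e.g. "en"; None means auto-detect
    diarization: bool = False             # label multiple speakers?

    def to_payload(self) -> dict:
        """Build a plain dict, omitting the language hint when it is not set."""
        if not self.media_url.startswith(("http://", "https://")):
            raise ValueError("media_url must be an http(s) URL")
        payload = {"media_url": self.media_url, "diarization": self.diarization}
        if self.language_hint:
            payload["language_hint"] = self.language_hint
        return payload


req = TranscriptionRequest("https://example.com/recording.mp4", diarization=True)
print(req.to_payload())
```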
Triggers & actions.
Triggers
Audio file ready: fires when an upstream step outputs an audio or video file URL ready for transcription
Live stream ends: fires when the broadcast platform makes the recording URL available, starting subtitle generation
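The two triggers differ mainly in where the media URL comes from. Here is a minimal sketch of how a step might decide whether to start transcription. It assumes a hypothetical event payload with `type`, `media_url`, and `recording_url` keys, which is not Flow's real event schema.

```python
from typing import Optional

# Hypothetical event type names standing in for the two triggers above.
AUDIO_FILE_READY = "audio_file_ready"
LIVE_STREAM_ENDED = "live_stream_ended"


def media_url_for(event: dict) -> Optional[str]:
    """Return the URL to transcribe, or None if the event should not trigger a run."""
    kind = event.get("type")
    if kind == AUDIO_FILE_READY:
        return event.get("media_url")
    if kind == LIVE_STREAM_ENDED:
        # Only fire once the broadcast platform has published the recording.
        return event.get("recording_url")
    return None


print(media_url_for({"type": LIVE_STREAM_ENDED, "recording_url": "https://example.com/rec.mp4"}))
```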
Actions
Transcribe audio: generate a time-coded transcript from an audio or video file
Generate SRT: produce an SRT subtitle file from the transcription output
Generate VTT: produce a WebVTT subtitle file for web player delivery
Detect language: identify the spoken language of the audio track
Translate transcript: translate the transcript into a target language after transcription
Example workflow.
Live stream ends (Broadcast platform) → Transcribe recording (Gladia) → Generate SRT and VTT (Gladia) → Attach subtitles (CMS asset) → Notify (Slack)
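The chain above is a linear pipeline in which each step consumes the previous step's output. The sketch below is minimal and uses stub steps. Every function name, the `asset_id` key, and the file names are hypothetical placeholders, not Flow's API. Real steps would call Gladia, the CMS, and Slack.

```python
from typing import Callable, Dict, List

# Each step takes the shared context dict and returns it, possibly enriched.
Step = Callable[[Dict], Dict]


def transcribe_recording(ctx: Dict) -> Dict:
    # Stand-in for the Gladia transcription call; returns a fixed segment here.
    ctx["transcript"] = [(0.0, 2.0, "Thanks for watching.")]
    return ctx


def generate_subtitles(ctx: Dict) -> Dict:
    # Stand-in for SRT/VTT generation from the transcript segments.
    ctx["subtitles"] = {"srt": "captions.srt", "vtt": "captions.vtt"}
    return ctx


def attach_subtitles(ctx: Dict) -> Dict:
    # Stand-in for attaching the files to the CMS asset record.
    ctx["attached_to"] = ctx["asset_id"]
    return ctx


def notify(ctx: Dict) -> Dict:
    # Stand-in for a Slack message; recorded in the context instead of sent.
    ctx["message"] = f"Subtitles ready for asset {ctx['asset_id']}"
    return ctx


def run_workflow(trigger: Dict, steps: List[Step]) -> Dict:
    """Run steps in order; an exception in any step stops the remaining steps."""
    ctx = dict(trigger)
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_workflow(
    {"asset_id": "asset-123", "recording_url": "https://example.com/rec.mp4"},
    [transcribe_recording, generate_subtitles, attach_subtitles, notify],
)
print(result["message"])
```

Passing a shared context from step to step keeps each step independent, so the notification step could be swapped for another channel without touching the others.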
Frequently asked questions.
How accurate is Gladia transcription?
Gladia achieves high accuracy across 100+ languages. Accuracy varies with audio quality, speaker clarity, and language. For media content with clear audio, transcription accuracy is typically above 95%.
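The answer above does not say which metric the 95% figure uses. A common way to score speech-to-text output against a reference transcript is word error rate (WER). WER is the word-level edit distance divided by the number of reference words, and accuracy is often reported as 1 − WER. The sketch below is a minimal way to check your own content. It assumes simple whitespace tokenization and lowercasing, whereas real evaluations usually also normalize punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the quick brown fox jumps", "the quick brown fox jumped")
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")
```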
Can Gladia identify different speakers?
Yes. With diarization enabled, Gladia identifies and labels different speakers in a recording. This is useful for interviews, panel discussions, and multi-presenter shows where you want speaker-attributed transcripts.
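Speaker-attributed transcripts are usually produced by merging consecutive segments from the same speaker into turns. The sketch below is minimal and assumes that each diarized segment carries a zero-based speaker index and text. That input shape is hypothetical, and Gladia's actual output fields may differ.

```python
from typing import List, Tuple

# (zero-based speaker index, text) pairs in playback order; hypothetical shape.
Segment = Tuple[int, str]


def speaker_turns(segments: List[Segment]) -> List[str]:
    """Merge consecutive segments by the same speaker into labeled lines."""
    turns: List[Tuple[int, List[str]]] = []
    for speaker, text in segments:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    # Number speakers from 1 for display.
    return [f"Speaker {spk + 1}: {' '.join(parts)}" for spk, parts in turns]


lines = speaker_turns([
    (0, "Welcome to the panel."),
    (0, "Let's begin."),
    (1, "Thanks for having me."),
])
print("\n".join(lines))
```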
How does Flow generate subtitle files?
Flow generates SRT and VTT subtitle files from Gladia transcriptions. Both formats carry timestamps taken from the transcription, so each subtitle cue is aligned with the audio. Flow then attaches the files to the asset record in your CMS automatically.
Does Gladia support multiple languages?
Yes. Gladia supports transcription in 100+ languages. Flow detects the source language automatically, or you can specify it explicitly in your workflow configuration.