Gladia
AI Services · Available · Requires secrets

Gladia — Speech-to-Text & Subtitle Generation

What this integration does

Automate speech-to-text transcription with Gladia in Enveu Flow. Generate subtitles and transcripts as part of your media operations pipeline.

Best for

Generating subtitle files and time-coded transcripts from video audio tracks automatically after ingest or live stream completion.

Gladia transcribes audio to text with high accuracy and supports 100+ languages. Use it in Live to VoD, Auto Multi-Audio, and Multilingual Metadata workflows to generate SRT and VTT subtitle files automatically.

Inputs & outputs.

Inputs
Audio or video file URL: Source media file for transcription
Language hint: Optional language code to improve accuracy
Diarization setting: Whether to identify multiple speakers in the transcript
Outputs
SRT subtitle file: Time-coded subtitle file ready for CMS upload
VTT subtitle file: WebVTT format for web player delivery
Raw transcript: Full text transcript with timestamps and confidence scores
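
To illustrate how the outputs listed above relate to each other, the sketch below renders the same time-coded segments as SRT and as WebVTT. The `Segment` structure and the function names are assumptions made for illustration, not the schema Gladia or Enveu Flow actually uses; only the SRT and WebVTT timestamp conventions are standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not Gladia's actual response schema."""
    start: float  # seconds from the start of the media
    end: float
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg.start, ',')} --> {format_timestamp(seg.end, ',')}"
        cues.append(f"{index}\n{timing}\n{seg.text}")
    return "\n\n".join(cues) + "\n"


def to_vtt(segments: list[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, dot before milliseconds, cue numbers omitted."""
    blocks = ["WEBVTT"]
    for seg in segments:
        timing = f"{format_timestamp(seg.start, '.')} --> {format_timestamp(seg.end, '.')}"
        blocks.append(f"{timing}\n{seg.text}")
    return "\n\n".join(blocks) + "\n"
```

The two formats differ mainly in the header line and the millisecond separator, which is why a single transcript can feed both files.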

Triggers & actions.

Triggers
Audio file ready: Fires when an upstream step outputs an audio or video file URL ready for transcription
Live stream ends: Fires when the broadcast platform makes the recording URL available, starting subtitle generation
Actions
Transcribe audio: Generate a time-coded transcript from an audio or video file
Generate SRT: Produce an SRT subtitle file from the transcription output
Generate VTT: Produce a WebVTT subtitle file for web player delivery
Detect language: Identify the spoken language of the audio track
Translate transcript: Translate the transcript into a target language after transcription

Example workflow.

1. Live stream ends (Broadcast platform)
2. Transcribe recording (Gladia)
3. Generate SRT and VTT (Gladia)
4. Attach subtitles (CMS asset)
5. Notify (Slack)
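
The workflow above can be read as an ordered chain in which each step adds to a shared context that the next step consumes. The sketch below models that idea in plain Python. Every handler, key, and value in it is a hypothetical stand-in: none of them calls a real Enveu Flow, Gladia, CMS, or Slack API, and Flow's actual configuration format may look quite different.

```python
from typing import Callable

# Hypothetical types: the context is a flat string-to-string mapping that
# each step copies and extends.
Context = dict[str, str]
Step = tuple[str, Callable[[Context], Context]]


def transcribe_recording(ctx: Context) -> Context:
    # Placeholder: a real step would submit ctx["recording_url"] for transcription.
    return {**ctx, "transcript_id": "transcript-for-" + ctx["recording_url"]}


def generate_subtitles(ctx: Context) -> Context:
    # Placeholder: derives illustrative SRT and VTT file names from the transcript.
    return {**ctx, "srt": ctx["transcript_id"] + ".srt", "vtt": ctx["transcript_id"] + ".vtt"}


def attach_subtitles(ctx: Context) -> Context:
    # Placeholder: records which CMS asset the subtitle files were attached to.
    return {**ctx, "attached_to": ctx["asset_id"]}


def notify(ctx: Context) -> Context:
    # Placeholder: builds the message a notification step might send.
    return {**ctx, "notification": f"Subtitles attached to {ctx['attached_to']}"}


WORKFLOW: list[Step] = [
    ("Transcribe recording", transcribe_recording),
    ("Generate SRT and VTT", generate_subtitles),
    ("Attach subtitles", attach_subtitles),
    ("Notify", notify),
]


def run_workflow(trigger_payload: Context) -> Context:
    """Run each step in order, passing the accumulated context forward."""
    ctx = dict(trigger_payload)
    for _name, handler in WORKFLOW:
        ctx = handler(ctx)
    return ctx
```

Here the "Live stream ends" trigger corresponds to the payload passed to `run_workflow`, for example `run_workflow({"recording_url": "rec.mp4", "asset_id": "asset-1"})`.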

Frequently asked questions.

How accurate is Gladia transcription?
Gladia achieves high accuracy across 100+ languages. Accuracy varies with audio quality, speaker clarity, and language. For media content with clear audio, transcription accuracy is typically above 95%.
Can Gladia distinguish between multiple speakers?
Yes. Gladia can identify and label different speakers in a recording. This is useful for interviews, panel discussions, and multi-presenter shows where you want speaker-attributed transcripts.
Which subtitle formats does Flow generate?
Flow generates SRT and VTT subtitle files from Gladia transcriptions. Both formats include accurate word-level timestamps. The files are attached automatically to the asset record in your CMS.
Does Gladia support languages other than English?
Yes. Gladia supports transcription in 100+ languages. Flow can detect the source language automatically, or you can specify it explicitly in your workflow configuration.
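
As an illustration of the speaker-attributed transcripts mentioned above, the sketch below merges consecutive utterances from the same speaker into one labeled line. The `Utterance` structure, its field names, and the "Speaker N" labels are assumptions for this example, not the format Gladia returns.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical diarized utterance; field names are illustrative only."""
    speaker: int
    text: str


def speaker_attributed_transcript(utterances: list[Utterance]) -> str:
    """Merge consecutive utterances from the same speaker into one labeled line."""
    turns: list[tuple[int, list[str]]] = []
    for utt in utterances:
        if turns and turns[-1][0] == utt.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1][1].append(utt.text)
        else:
            # Speaker changed (or first utterance): start a new turn.
            turns.append((utt.speaker, [utt.text]))
    return "\n".join(f"Speaker {spk}: {' '.join(parts)}" for spk, parts in turns)
```

Grouping by turn rather than by utterance keeps interview and panel transcripts readable, since one speaker often produces several short segments in a row.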