Flow AI Services

Gladia + Enveu

Speech-to-text transcription. Generate subtitle files, transcripts, and time-coded captions from any video audio track.

Best for: generating subtitle files and time-coded transcripts from video audio tracks automatically after ingest or live stream completion.

Transcribe audio to text with high accuracy using Gladia, which supports 100+ languages. Use it in the Live to VoD, Auto Multi-Audio, and Multilingual Metadata workflows to generate SRT and VTT subtitle files automatically.

Inputs & outputs

Inputs

Audio or video file URL: Source media file for transcription.

Language hint: Optional language code to improve accuracy.

Diarization setting: Whether to identify multiple speakers in the transcript.
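As an illustration of how these three inputs might be bundled and checked before a transcription step runs, here is a minimal sketch. The class name, field names (`media_url`, `language_hint`, `diarization`), and payload shape are hypothetical and are not Gladia's or Flow's actual API; the language-code check assumes two-letter ISO 639-1 codes with an optional region suffix.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class TranscriptionRequest:
    """Hypothetical container for the three inputs; not Gladia's or Flow's API."""
    media_url: str                       # audio or video file URL to transcribe
    language_hint: Optional[str] = None  # e.g. "en" or "en-US"; None means auto-detect
    diarization: bool = False            # True to label multiple speakers

    def validate(self) -> None:
        parsed = urlparse(self.media_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"media_url must be an http(s) URL: {self.media_url!r}")
        # Assumption: hints are ISO 639-1 codes, optionally with a region ("en-US").
        if self.language_hint is not None:
            base = self.language_hint.split("-")[0]
            if not (len(base) == 2 and base.isalpha() and base.islower()):
                raise ValueError(f"unexpected language hint: {self.language_hint!r}")

    def to_payload(self) -> dict:
        """Validate, then return a plain dict (hypothetical payload shape)."""
        self.validate()
        payload = {"media_url": self.media_url, "diarization": self.diarization}
        if self.language_hint:
            payload["language_hint"] = self.language_hint
        return payload
```

For example, `TranscriptionRequest("https://cdn.example.com/ep1.mp4", language_hint="en", diarization=True).to_payload()` returns a dict with all three fields, while omitting the hint leaves language detection to the service.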

Outputs

SRT subtitle file: Time-coded subtitle file ready for CMS upload.

VTT subtitle file: WebVTT format for web player delivery.

Raw transcript: Full text transcript with timestamps and confidence scores.

Triggers & actions

Triggers

Audio file ready: Fires when an upstream step outputs an audio or video file URL that is ready for transcription.

Live stream ends: Fires when the broadcast platform makes the recording URL available, triggering subtitle generation.

Actions

Transcribe audio: Generate a time-coded transcript from an audio or video file.

Generate SRT: Produce an SRT subtitle file from the transcription output.

Generate VTT: Produce a WebVTT subtitle file for web player delivery.

Detect language: Identify the spoken language of the audio track.

Translate transcript: Translate the transcript into a target language after transcription.

Example workflow

1. Live stream ends (broadcast platform)
2. Transcribe recording (Gladia)
3. Generate SRT and VTT (Gladia)
4. Attach subtitles (CMS asset)
5. Notify (Slack)
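To make the data flow of this example concrete, here is a minimal sketch in which each step receives a shared context, adds its result, and passes it on. Every function name and context key here is hypothetical, and each step is a stub standing in for the real Gladia, CMS, or Slack call; it does not describe how Flow executes workflows internally.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shared state passed from step to step.
Context = Dict[str, Any]
Step = Callable[[Context], Context]


def transcribe_recording(ctx: Context) -> Context:
    # Stub: a real step would submit ctx["recording_url"] to Gladia
    # and store the returned time-coded segments.
    ctx["segments"] = [(0.0, 2.0, "Welcome back.")]
    return ctx


def generate_subtitles(ctx: Context) -> Context:
    # Stub: a real step would render SRT and VTT files from ctx["segments"].
    ctx["subtitle_formats"] = ["srt", "vtt"]
    return ctx


def attach_subtitles(ctx: Context) -> Context:
    # Stub: a real step would upload the files to the CMS asset.
    ctx["attached_to"] = ctx["asset_id"]
    return ctx


def notify(ctx: Context) -> Context:
    # Stub: a real step would post this message to a Slack channel.
    formats = ", ".join(ctx["subtitle_formats"])
    ctx["message"] = f"Subtitles ({formats}) attached to asset {ctx['asset_id']}"
    return ctx


def run_workflow(steps: List[Step], trigger_event: Context) -> Context:
    """Run each step in order, feeding the output of one into the next."""
    ctx: Context = dict(trigger_event)  # copy so the trigger payload is not mutated
    for step in steps:
        ctx = step(ctx)
    return ctx


# Order mirrors the example workflow: transcribe, generate, attach, notify.
WORKFLOW: List[Step] = [transcribe_recording, generate_subtitles, attach_subtitles, notify]
```

Calling `run_workflow(WORKFLOW, {"recording_url": "https://example.com/rec.mp4", "asset_id": "asset-123"})` would end with a notification message naming the asset and the generated formats.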


Frequently asked questions

What is Gladia + Enveu?

Gladia + Enveu is a speech-to-text transcription integration for Flow. It generates subtitle files, transcripts, and time-coded captions from any video's audio track.

How accurate is Gladia speech-to-text transcription?

Gladia achieves high accuracy across 100+ languages. Accuracy varies by audio quality, speaker clarity, and language. For media content with clear audio, transcription accuracy is typically above 95%.
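Transcription accuracy is commonly reported as 1 - WER, where WER (word error rate) counts substituted, deleted, and inserted words relative to a reference transcript. Under that convention, accuracy above 95% corresponds roughly to a WER below 5%. The page does not say which metric underlies its figure, so this is an assumption; the sketch below simply shows how WER is computed, using a word-level edit distance with naive lowercase-and-split normalization.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as a word-level Levenshtein distance by dynamic programming.
    Normalization is deliberately naive: lowercase and split on whitespace.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds the previous DP row: distances from a reference prefix
    # to every hypothesis prefix (row 0 = empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

One substituted word in a nine-word reference gives a WER of 1/9 (about 11%). Because insertions can push WER above 1, "1 - WER" is only a rough accuracy figure.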

Does Gladia support speaker diarization?

Yes. Gladia can identify and label different speakers in a recording. This is useful for interview content, panel discussions, and multi-presenter shows where you want speaker-attributed transcripts.
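Diarized output typically arrives as many short utterances, each tagged with a speaker. To produce a readable speaker-attributed transcript, consecutive utterances from the same speaker can be merged. The sketch below assumes a hypothetical utterance shape (a zero-based `speaker` index, a `start` time, and `text`); it does not reproduce Gladia's actual response format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    speaker: int  # zero-based speaker index from diarization (assumed shape)
    start: float  # seconds from the start of the media
    text: str


def speaker_transcript(utterances: List[Utterance]) -> str:
    """Merge consecutive utterances from the same speaker into one labelled line."""
    lines: List[str] = []
    last_speaker: Optional[int] = None
    for utt in sorted(utterances, key=lambda u: u.start):
        if utt.speaker == last_speaker:
            lines[-1] += " " + utt.text  # same speaker continues: append to the line
        else:
            lines.append(f"Speaker {utt.speaker + 1}: {utt.text}")
            last_speaker = utt.speaker
    return "\n".join(lines)
```

For an interview, this turns a stream of fragments into alternating "Speaker 1:" and "Speaker 2:" paragraphs, which can later be relabelled with real names.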

What subtitle formats does Flow generate from Gladia transcriptions?

Flow generates SRT and VTT subtitle files from Gladia transcriptions. Both formats carry start and end timestamps for every subtitle cue, and the files are automatically attached to the asset record in your CMS.

Can Flow transcribe audio in multiple languages?

Yes. Gladia supports transcription in 100+ languages. Flow can detect the source language automatically, or you can specify it explicitly in your workflow configuration.

Launch with Enveu + Gladia

See how Enveu works with Gladia to power and automate OTT at scale.
