shell+python scripts that convert the audio track in a video to a subtitle track, using Google's speech recognition

Tessa Nordgren 7e17b15767 first pass, seems to work		2021-11-04 23:03:00 -07:00
LICENSE.md	first pass, seems to work	2021-11-04 23:03:00 -07:00
ml2srt.py	first pass, seems to work	2021-11-04 23:03:00 -07:00
README.md	first pass, seems to work	2021-11-04 23:03:00 -07:00
voice2subs.sh	first pass, seems to work	2021-11-04 23:03:00 -07:00

README.md

voice2subs

a quick and dirty shell+python combo that uses the Google Cloud Speech-to-Text service to convert the audio in a video file to subtitles.

usage

pass one or more video files to voice2subs.sh on the command line:

$ ./voice2subs.sh test.mp4
Processing 'test.mp4'...
------------========----
extracting audio...
converting audio to text...
Waiting for operation [operations/8540494017153580661] to  complete...done.
converting google yaml data to subtitle data...
Finished, result is in: 'test_with_subs.mkv'

pre-reqs

this code requires the following to run:

ffmpeg
gcloud cli tool, configured with a google cloud project
python3-yaml
python3-srt
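
a quick way to confirm the pre-reqs above are present is sketched below. this is not part of the repo: the function name `missing_prereqs` is hypothetical, and it assumes python3-yaml and python3-srt are importable as `yaml` and `srt`. it only checks that the tools and modules can be found, not that gcloud is actually configured with a project.

```python
import importlib.util
import shutil


def missing_prereqs(tools=("ffmpeg", "gcloud"), modules=("yaml", "srt")):
    """Return the names of required tools and modules that cannot be found.

    Hypothetical helper: tools are looked up on PATH, modules via the
    import system. Nothing is executed or imported.
    """
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    absent = missing_prereqs()
    if absent:
        print("missing:", ", ".join(absent))
    else:
        print("all pre-reqs found")
```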

design

ml2srt.py

a small python script that reads the yaml output of a gcloud speech command such as gcloud -q --format yaml ml speech recognize-long-running --include-word-time-offsets, and converts it into an SRT subtitle file.
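
the conversion can be sketched roughly as follows. this is not the actual ml2srt.py, which is not shown here. it assumes the parsed response looks like a Speech-to-Text result (`results` containing `alternatives`, whose `words` carry `startTime` and `endTime` strings such as `1.300s`), and it groups a fixed number of words per subtitle, which is a hypothetical choice. to stay dependency-free the sketch formats SRT by hand and takes already-parsed data; the real script presumably relies on python3-yaml and python3-srt for those steps.

```python
from datetime import timedelta


def parse_offset(value):
    """Turn an offset string such as '1.300s' into a timedelta (assumed format)."""
    return timedelta(seconds=float(str(value).rstrip("s")))


def srt_timestamp(delta):
    """Format a timedelta as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(delta.total_seconds() * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def response_to_srt(response, words_per_cue=8):
    """Build SRT text from a parsed recognition response (assumed shape)."""
    # Collect timed words from the first alternative of every result.
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives") or []
        if alternatives:
            words.extend(alternatives[0].get("words", []))

    # Group words into numbered cues of at most words_per_cue words each.
    cues = []
    for index, start in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[start:start + words_per_cue]
        begin = srt_timestamp(parse_offset(chunk[0]["startTime"]))
        end = srt_timestamp(parse_offset(chunk[-1]["endTime"]))
        text = " ".join(word["word"] for word in chunk)
        cues.append(f"{index}\n{begin} --> {end}\n{text}\n")
    return "\n".join(cues)
```

with PyYAML installed, the result of `yaml.safe_load(sys.stdin)` on the gcloud output could be passed straight to `response_to_srt`, provided the output has the shape assumed above.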

voice2subs.sh

a small shell script that does the following:

rips audio track from a video file
processes the audio track with gcloud ml speech, per above
calls ml2srt.py to convert the google output to a subtitle file
remuxes the original video and the subtitle file into a new file, dropping audio
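
the four steps above can be sketched as a list of commands, written here in python rather than shell. this is not the contents of voice2subs.sh: the ffmpeg options, the flac intermediate file, the `--language-code` value, and the assumption that ml2srt.py reads yaml on stdin and writes SRT to stdout are all guesses for illustration. only the gcloud flags named earlier and the `_with_subs.mkv` output name come from this README.

```python
import subprocess
from pathlib import Path


def plan_commands(video, language="en-US"):
    """Return the commands for one video, without running anything."""
    video = Path(video)
    audio = video.with_suffix(".flac")   # assumed intermediate audio file
    subs = video.with_suffix(".srt")     # assumed intermediate subtitle file
    output = video.with_name(f"{video.stem}_with_subs.mkv")
    return {
        # 1. rip the audio track (-vn drops video, -ac 1 downmixes to mono)
        "extract": ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1",
                    "-c:a", "flac", str(audio)],
        # 2. speech recognition with per-word time offsets, yaml output
        "recognize": ["gcloud", "-q", "--format", "yaml", "ml", "speech",
                      "recognize-long-running", str(audio),
                      "--language-code", language,
                      "--include-word-time-offsets"],
        # 3. yaml to SRT (stdin/stdout interface is an assumption)
        "convert": ["python3", "ml2srt.py"],
        # 4. remux: keep only the video stream plus the new subtitle track
        "remux": ["ffmpeg", "-y", "-i", str(video), "-i", str(subs),
                  "-map", "0:v", "-map", "1:0",
                  "-c:v", "copy", "-c:s", "srt", str(output)],
        "subs": str(subs),
        "output": str(output),
    }


def run(video):
    """Run the planned commands in order; needs ffmpeg, gcloud and ml2srt.py."""
    plan = plan_commands(video)
    subprocess.run(plan["extract"], check=True)
    recognized = subprocess.run(plan["recognize"], check=True,
                                capture_output=True, text=True)
    converted = subprocess.run(plan["convert"], check=True, text=True,
                               input=recognized.stdout, capture_output=True)
    Path(plan["subs"]).write_text(converted.stdout)
    subprocess.run(plan["remux"], check=True)
    return plan["output"]
```

keeping the command construction separate from `run` makes it possible to inspect what would be executed for a given file before spending any speech recognition quota.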