# voice2subs

a quick and dirty shell+python combo that uses the google cloud
speech-to-text service to convert the audio in a video file into subtitles.

## usage
pass one or more video files to `voice2subs.sh` on the cli:
```sh
$ ./voice2subs.sh test.mp4
Processing 'test.mp4'...
------------========----
extracting audio...
converting audio to text...
Waiting for operation [operations/8540494017153580661] to complete...done.
converting google yaml data to subtitle data...
Finished, result is in: 'test_with_subs.mkv'
```

## pre-reqs
this code requires the following to run:
* ffmpeg
* gcloud cli tool, configured with a google cloud (gcp) project
* python3-yaml
* python3-srt
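
a quick way to see whether the pre-reqs above are installed is a small check like the one below. this is only a sketch: the function name `missing_prereqs` is made up for illustration, and it assumes the `python3-yaml` and `python3-srt` packages install the modules `yaml` and `srt`. it does not check whether `gcloud` has a project configured.

```python
import importlib.util
import shutil


def missing_prereqs(tools=("ffmpeg", "gcloud"), modules=("yaml", "srt")):
    """Return the names of cli tools and python modules that cannot be found."""
    # shutil.which returns None when a command is not on PATH
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module is not importable
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prereqs()
    print("missing: " + ", ".join(problems) if problems else "all pre-reqs found")
```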

## design
### ml2srt.py
a small python script that expects the output of a google ml command like
`gcloud -q --format yaml ml speech recognize-long-running --include-word-time-offsets`,
and converts it into an [SRT format](https://en.wikipedia.org/wiki/SubRip)
subtitle file.
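
to make the conversion concrete, here is a minimal sketch of the idea, not the actual `ml2srt.py`. it assumes the yaml has already been parsed into a dict (for example with `yaml.safe_load`), that the dict has a top-level `results` list, and that each word carries `startTime` / `endTime` strings such as `1.500s`. it uses only the standard library rather than `python3-srt`, and the fixed number of words per subtitle is an arbitrary choice; all function names are hypothetical.

```python
from datetime import timedelta


def parse_offset(value):
    """Turn an offset such as '1.500s' into a timedelta (assumed format)."""
    return timedelta(seconds=float(str(value).rstrip("s")))


def srt_timestamp(delta):
    """Format a timedelta as the HH:MM:SS,mmm form used in SRT files."""
    total_ms = round(delta.total_seconds() * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def results_to_srt(data, words_per_cue=8):
    """Group recognised words into fixed-size cues and return SRT text."""
    words = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives") or []
        if alternatives:
            # use only the first alternative of each result
            words.extend(alternatives[0].get("words", []))

    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = srt_timestamp(parse_offset(chunk[0]["startTime"]))
        end = srt_timestamp(parse_offset(chunk[-1]["endTime"]))
        text = " ".join(w["word"] for w in chunk)
        # an SRT cue is: index, time range, text, then a blank line
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # hand-written sample in the assumed shape, not real gcloud output
    sample = {"results": [{"alternatives": [{"words": [
        {"startTime": "0.500s", "endTime": "1s", "word": "hello"},
        {"startTime": "1s", "endTime": "1.700s", "word": "world"},
    ]}]}]}
    print(results_to_srt(sample))
```

grouping by word count keeps the sketch short; a real converter would more likely split on pauses or on a maximum cue duration.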

### voice2subs.sh
a small shell script that does the following:
* rips the audio track from a video file
* processes the audio track with `gcloud ml speech`, per above
* calls `ml2srt.py` to convert the google output to a subtitle file
* remuxes the original video and the subtitle file into a new file, dropping audio
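
the four steps above can be sketched as the command lines they roughly correspond to. this is an illustration written in python, not the contents of `voice2subs.sh`: the function name `build_commands`, the intermediate file names, the flac/mono/16 kHz audio settings, the `en-US` language code, and the idea that `ml2srt.py` reads yaml on stdin and writes srt on stdout are all assumptions. only the `gcloud` flags quoted earlier and the `_with_subs.mkv` output name come from this readme.

```python
from pathlib import Path


def build_commands(video, language="en-US"):
    """Return the command lines for one video, in the order they would run."""
    video = Path(video)
    audio = video.with_suffix(".flac")   # assumed intermediate audio file
    subs = video.with_suffix(".srt")     # assumed intermediate subtitle file
    output = video.with_name(video.stem + "_with_subs.mkv")
    return [
        # 1. rip the audio track: drop video (-vn), mono, 16 kHz
        ["ffmpeg", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", str(audio)],
        # 2. transcribe with word timings; the yaml result goes to stdout
        ["gcloud", "-q", "--format", "yaml", "ml", "speech",
         "recognize-long-running", str(audio),
         "--language-code", language, "--include-word-time-offsets"],
        # 3. convert the yaml to srt (assumed: stdin in, stdout out)
        ["python3", "ml2srt.py"],
        # 4. remux video stream plus subtitles only, so the audio is dropped
        ["ffmpeg", "-i", str(video), "-i", str(subs),
         "-map", "0:v", "-map", "1", "-c:v", "copy", "-c:s", "srt", str(output)],
    ]


if __name__ == "__main__":
    for command in build_commands("test.mp4"):
        print(" ".join(command))
```

the function only builds the lists and runs nothing, so it is safe to try without `ffmpeg` or `gcloud` installed.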