📽️ Transcribe video lessons

9.1.2021 2-minute read

tutorial • GCP

Starting from mp4 videos, extract the audio and transcribe it to text files using Google APIs

Intro

In this tutorial, we will see how to get audio transcriptions (text files) from a batch of mp4 videos.
The aim is to help students get transcripts of their teachers' online courses using the Google Speech-to-Text API, a strong black-box ML service.
Note: better speech-to-text solutions may exist for English, but the example here uses lessons in Italian.

Extract audio from the mp4

We assume that in the video only the teacher speaks, so we will extract a mono channel.

# Get a wav file for each mp4 file found in the current directory
# (16-bit PCM, mono, 16 kHz):
~/Downloads/video/wav
❯ for FILE in *.mp4; do ffmpeg -i "$FILE" -acodec pcm_s16le -ac 1 -ar 16000 "${FILE%.*}.wav"; done
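
The loop above can also be wrapped in a small helper that skips videos already converted, which helps when re-running on a large batch. This is only a sketch: `wav_name_for` and `convert_all` are hypothetical names chosen for illustration, and the ffmpeg flags are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: convert every mp4 in the current directory, skipping existing wav files.
# wav_name_for and convert_all are hypothetical helper names.

# Print the wav file name matching a video file name (lesson.mp4 -> lesson.wav)
wav_name_for() {
  printf '%s\n' "${1%.*}.wav"
}

convert_all() {
  local file out
  for file in *.mp4; do
    [ -e "$file" ] || continue          # no mp4 found: the glob stays literal
    out=$(wav_name_for "$file")
    if [ -e "$out" ]; then
      echo "Skip $file ($out already exists)"
      continue
    fi
    # 16-bit PCM, 1 channel (mono), 16 kHz sample rate
    ffmpeg -i "$file" -acodec pcm_s16le -ac 1 -ar 16000 "$out"
  done
}
```

Source the file and call `convert_all` from the directory that holds the videos.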

Tip: normalize the file names (no spaces or special characters) with detox:
```
~/Downloads/video/wav
❯ detox .
```

Initialize Google cloud platform (GCP)

Get the $300 credit from the free tier link
Create a bucket (here named “example-sbobinate”)
Enable the Speech-to-Text API
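
The last two steps can also be done from the command line instead of the web console. A minimal sketch, assuming the Cloud SDK is installed and a project is already selected; the region `europe-west8` is only an example value, and `run` and `setup_gcp` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1, execute it otherwise (hypothetical helper)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# setup_gcp BUCKET LOCATION (hypothetical helper)
setup_gcp() {
  local bucket="$1" location="$2"
  run gcloud services enable speech.googleapis.com   # enable the Speech-to-Text API
  run gsutil mb -l "$location" "gs://${bucket}"      # create the bucket in that location
}
```

Running `DRY_RUN=1 setup_gcp example-sbobinate europe-west8` prints the two commands without executing them, so they can be reviewed first.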

Upload the wav files to GCS

~/Downloads/video/wav
❯ gsutil -m cp *.wav gs://example-sbobinate/test/

Tip: Slow upload? Make sure the bucket location is close to your region.
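
After the upload it can be useful to check that every wav file actually reached the bucket. The sketch below compares the local files with a bucket listing read from standard input; `missing_uploads` is a hypothetical helper name, and the listing is assumed to contain one `gs://` URI per line, as printed by `gsutil ls`.

```shell
#!/usr/bin/env bash
# missing_uploads PREFIX: read a bucket listing (one gs:// URI per line) on stdin
# and print the local wav files that do not appear under PREFIX.
missing_uploads() {
  local prefix="$1" listing file
  listing=$(cat)
  for file in *.wav; do
    [ -e "$file" ] || continue          # no wav found: the glob stays literal
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    printf '%s\n' "$listing" | grep -qxF "${prefix}${file}" || printf '%s\n' "$file"
  done
}
```

For example: `gsutil ls gs://example-sbobinate/test/ | missing_uploads gs://example-sbobinate/test/`. An empty output means nothing is missing.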

Use the speech to text API

Log in to your GCP account

~/Downloads/video/wav
❯ gcloud init

Call the API and store the transcriptions

# File: `api_call.sh`
# Requires gsutil, gcloud, jq

mkdir -p transcriptions

for FILE_PATH in $(gsutil ls "gs://example-sbobinate/test/"); do
  echo "Submit file $FILE_PATH"
  # --async returns immediately; keep the operation name to wait on it later
  RUN_ID=$(gcloud ml speech recognize-long-running "$FILE_PATH" --language-code=it-IT --async | jq -r .name)

  echo "Run id: $RUN_ID"
  FILENAME=${FILE_PATH##*/}                      # drop the gs://bucket/dir/ prefix
  OUTPUT="./transcriptions/${FILENAME%.*}.json"  # swap the extension for .json
  echo "OUTPUT: $OUTPUT"

  # Block until the operation finishes, then save the JSON result
  gcloud ml speech operations wait "$RUN_ID" >"$OUTPUT"
  echo "-------------"
done
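
The script relies on two shell parameter expansions to turn a bucket URI into a local output path. The sketch below isolates that step so it can be tried without calling the API; `output_path_for` and `is_wav_uri` are hypothetical helper names, and the second one is an optional extra (an assumption, not part of the script) to skip anything in the listing that is not a wav file.

```shell
#!/usr/bin/env bash
# output_path_for GS_URI: map a bucket object to its local json path
output_path_for() {
  local filename="${1##*/}"                              # drop everything up to the last /
  printf './transcriptions/%s.json\n' "${filename%.*}"   # drop the extension, add .json
}

# is_wav_uri GS_URI: succeed only for objects whose name ends in .wav
is_wav_uri() {
  case "$1" in
    *.wav) return 0 ;;
    *)     return 1 ;;
  esac
}
```

For instance, `output_path_for gs://example-sbobinate/test/lesson_01.wav` gives `./transcriptions/lesson_01.json`.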

Parse and store the transcriptions

Parse all the JSON files returned by the Google Speech-to-Text API and keep only the text

# File: `results_parser.sh`
# Requires jq

mkdir -p ./transcriptions/only_text/

# Match only the json files, so the only_text directory itself is skipped
for FILE in ./transcriptions/*.json; do
  echo "Start working on $FILE..."

  FILENAME=${FILE##*/}
  OUTPUT="./transcriptions/only_text/${FILENAME%.*}.txt"
  echo "OUTPUT: $OUTPUT"

  # Keep only the text of the 1st alternative of each result, one per line
  jq -r '.results[]?.alternatives[0].transcript // empty' "$FILE" >"$OUTPUT"
done
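
To see what the parsing step extracts, it helps to try it on a tiny input. The sketch below uses a made-up response whose shape is assumed from the fields the parser reads (`results`, `alternatives`, `transcript`); real responses may carry more fields. `sample_response` and `extract_transcripts` are hypothetical names, and the single jq filter is one compact way to select the first alternative of each result.

```shell
#!/usr/bin/env bash
# sample_response: print a tiny, made-up response with the fields the parser reads
sample_response() {
  cat <<'EOF'
{
  "results": [
    {"alternatives": [{"transcript": "buongiorno a tutti", "confidence": 0.93},
                      {"transcript": "buon giorno a tutti"}]},
    {"alternatives": [{"transcript": "oggi parliamo di derivate", "confidence": 0.9}]}
  ]
}
EOF
}

# extract_transcripts: read a response on stdin and print the 1st alternative
# of each result, one per line; results without a transcript are dropped
extract_transcripts() {
  jq -r '.results[]?.alternatives[0].transcript // empty'
}
```

Try it with `sample_response | extract_transcripts`; the second alternative of the first result is ignored.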

Check the results

Check the video transcriptions under ./transcriptions/only_text/

References

Google recognize-long-running API