Convert Audio to Text with Whisper AI + Google Colaboratory

Some of my most popular videos cover transcription: how to turn audio into text. In today’s tutorial, I’m excited to show you how to convert audio or video files to text completely for free, with no limit on how much you transcribe.

We are going to use Whisper AI, a machine learning model for speech recognition and transcription created by OpenAI, the company behind ChatGPT. Whisper is completely free and supports 99 languages, so this method works for audio or video files in any of them.

Why Use Google Colaboratory?

You can install Whisper on your own computer, but not everyone has a fast, powerful machine. Instead, we will use Google Colaboratory from within our Google Drive account. Colaboratory lets you write and run code directly in your browser, so it works from any computer without installing anything locally.

How to Set Up Google Colaboratory

Step 1: Open Google Drive

All you need is your Gmail account to access Google Drive, and it’s free.
Go to Google Drive and click on New.
Scroll down and click on More.
Click Connect More Apps.

Step 2: Install Google Colaboratory

Search for Colaboratory in the app search bar.
Click on the first result that pops up and click Install.
Click Continue and sign in with your Google account if prompted.
Click Done and close the marketplace window.

Step 3: Open Google Colaboratory

In Google Drive, click New again.
Click on More, and select Google Colaboratory.

Transcribing Audio Files with Whisper AI

Step 1: Set Up Your Colab File

Click the file name at the top of the page (it starts with Untitled) and rename it, keeping the .ipynb extension as it is, then press Enter.
Click on Runtime and select Change runtime type.
Change the hardware accelerator from CPU to T4 GPU and click Save.
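To confirm that the runtime change took effect, you can run a quick check in a code cell. This is a minimal sketch that assumes the nvidia-smi tool is available on GPU runtimes; the function name gpu_summary is made up for this example.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return a one-line description of the attached GPU, or a fallback message."""
    # Assumption: nvidia-smi is on PATH only when an NVIDIA GPU runtime is attached.
    if shutil.which("nvidia-smi") is None:
        return "No GPU detected (nvidia-smi not found)"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Treat a failed call or empty output as "no usable GPU".
    if result.returncode != 0 or not result.stdout.strip():
        return "No GPU detected (nvidia-smi failed)"
    return result.stdout.strip()


print(gpu_summary())
```

If the output mentions a T4, the GPU is attached. If it reports no GPU, repeat the Change runtime type step before continuing.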

Step 2: Install Whisper AI and FFmpeg

You need to install Whisper AI and FFmpeg to work with both audio and video files.
Copy and paste the following code into the cell in Google Colab:

!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg

Click the Run Cell icon to execute the code. This will install Whisper and FFmpeg and should only take a few minutes.
Make sure there are no extra spaces or characters before or after these commands.
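Once the install cell has finished, a short check like the one below can confirm that both pieces are in place. It is only a sketch: the function name check_setup is hypothetical, and it only tests whether the whisper package and the ffmpeg program can be found, not whether they work.

```python
import importlib.util
import shutil


def check_setup() -> dict:
    """Report whether the Whisper package and the ffmpeg binary can be found."""
    return {
        # find_spec returns None when the package is not installed.
        "whisper": importlib.util.find_spec("whisper") is not None,
        # shutil.which returns None when the program is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


print(check_setup())
```

Both values should be True. If either one is False, run the install cell again and watch its output for errors.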

Step 3: Upload Your Audio File

Click on the folder icon on the left side of the screen.
Drag and drop your audio or video file into this section.
Wait until the file has finished uploading before beginning the next step. You can see the file upload progress in the navigation menu on the left.
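If you are not sure the upload finished, or you want the exact file name for the next step, you could list the media files in the working folder. This is a rough sketch: the extension list is illustrative rather than complete, and list_media_files is a name invented for this example.

```python
from pathlib import Path

# Illustrative set of common audio and video extensions (not exhaustive).
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".mov", ".mkv"}


def list_media_files(folder: str = ".") -> list:
    """Return the names of audio and video files found directly in the folder."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS
    )


print(list_media_files())
```

The names it prints include the extension, which is exactly what the transcription command expects.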

Step 4: Transcribe the Audio File

Insert the following code in a new code cell:

!whisper "your-file-name.mp3" --model medium

Replace your-file-name.mp3 with your exact file name including the extension, then click Run Cell.
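The same transcription can also be run from Python instead of the command line, which is handy if you want the text in a variable. The sketch below uses Whisper's load_model and transcribe functions; the wrapper name transcribe_file and the file name are placeholders, and it assumes the install step has already been run.

```python
from pathlib import Path


def transcribe_file(path: str, model_name: str = "medium") -> str:
    """Transcribe one audio or video file and return the recognized text."""
    if not Path(path).is_file():
        raise FileNotFoundError(f"Upload the file first: {path}")
    # Imported here so the file check above works even before Whisper is installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model weights on first use
    result = model.transcribe(path)  # a dict that includes "text" and "segments"
    return result["text"].strip()


# Example with a placeholder file name:
# print(transcribe_file("your-file-name.mp3"))
```

Unlike the command above, this version does not write text or subtitle files on its own. It only returns the transcript as a string.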

Step 5: Download the Transcription

After the transcription is complete, you can download the text file or subtitle file.
In the file panel on the left, hover over the file you want, click the three-dot menu next to it, and click Download.
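Whisper names its output files after the file you transcribed, so a small helper can show which ones are ready to download. This is a sketch under the assumption that the command was run with its default settings, which write the results next to your upload; find_transcripts is a hypothetical name.

```python
from pathlib import Path

# Output types the whisper command writes by default.
OUTPUT_SUFFIXES = (".txt", ".srt", ".vtt", ".tsv", ".json")


def find_transcripts(media_name: str, folder: str = ".") -> list:
    """Return the transcript files that exist for the given media file."""
    stem = Path(media_name).stem  # "talk.mp3" becomes "talk"
    return [
        stem + suffix
        for suffix in OUTPUT_SUFFIXES
        if (Path(folder) / (stem + suffix)).is_file()
    ]


print(find_transcripts("your-file-name.mp3"))
```

The .txt file holds the plain transcript, while .srt and .vtt are subtitle files with timestamps. In Colab you can also download a file from code with files.download from the google.colab module.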

Transcribing Video Files

Step 1: Upload Your Video File

Drag and drop your video file into the Google Colab folder section.
Rename the file if necessary for easier handling.
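File names with spaces or brackets are easy to mistype in the transcription command. One possible way to pick a simpler name is sketched below. The function simplify_name is made up for this example, and it only suggests a new name; you still rename the file yourself in the file panel. Note that it drops non-English letters, so adjust it if your file names rely on them.

```python
import re
from pathlib import Path


def simplify_name(filename: str) -> str:
    """Suggest a lowercase, hyphen-separated name that keeps the extension."""
    path = Path(filename)
    # Collapse every run of characters other than letters and digits into one hyphen.
    stem = re.sub(r"[^A-Za-z0-9]+", "-", path.stem).strip("-").lower()
    return (stem or "file") + path.suffix.lower()


print(simplify_name("My Holiday Video (Final).MP4"))
```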

Step 2: Transcribe the Video File

Use the same command as for audio files, but with your video file’s name and extension:

!whisper "your-file-name.mp4" --model medium

Replace your-file-name.mp4 with your exact video file name including the extension, then click Run Cell.

Step 3: Download the Transcription

Wait for the transcription to complete and download the text or subtitle file as previously shown.

Using Whisper AI and Google Colaboratory, you can easily transcribe audio and video files to text for free. This method is efficient and saves a lot of time compared to manual transcription. If you have any questions, feel free to ask in the comments section. I hope you enjoy using this method!
