Some of my most popular videos are about transcription: how to turn audio into text. In today’s tutorial, I’m excited to show you how to convert audio or video files to text completely free, with no limit on how much you transcribe.
We are going to use something called Whisper AI, a machine learning model for speech recognition and transcription created by OpenAI, the creators of ChatGPT. Whisper is completely free and supports 99 languages, so you can use this method to transcribe audio or video in any of them.
Why Use Google Colaboratory?
There is a way to install Whisper on your computer, but not everyone has a fast, powerful computer. So, instead, we will use Google Colaboratory within our Google Drive account. This method allows you to write and run code directly in your browser, making it accessible from any computer without needing to install anything locally.
How to Set Up Google Colaboratory
Step 1: Open Google Drive
- All you need is your Gmail account to access Google Drive, and it’s free.
- Go to Google Drive and click on New.
- Scroll down and click on More.
- Click Connect More Apps.
Step 2: Install Google Colaboratory
- Search for Colaboratory in the app search bar.
- Click on the first result that pops up and click Install.
- Click Continue and sign in with your Google account if prompted.
- Click Done and close the marketplace window.
Step 3: Open Google Colaboratory
- In Google Drive, click New again.
- Click on More, and select Google Colaboratory.
Transcribing Audio Files with Whisper AI
Step 1: Set Up Your Colab File
- Double-click where it says Untitled to rename the file, keeping the .ipynb extension as it is, then press Enter.
- Click on Runtime and select Change runtime type.
- Change the hardware accelerator from CPU to T4 GPU and click Save.
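If you want to confirm that the GPU runtime is really active before going further, here is a minimal sketch you could run in a code cell. It assumes the `nvidia-smi` tool is present on GPU runtimes; the function name `gpu_available` is my own, not part of Colab or Whisper.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools on this runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime active:", gpu_available())
```

If this prints False, go back to Runtime, select Change runtime type, and check that T4 GPU is selected.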
Step 2: Install Whisper AI and FFmpeg
- You need to install Whisper AI and FFmpeg to work with both audio and video files.
- Copy and paste the following code into the cell in Google Colab:
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg
- Click the Run Cell icon to execute the code. This will install Whisper and FFmpeg and should only take a few minutes.
- Make sure there are no extra spaces or characters before or after these commands.
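Once the install cell has finished, you may want to check that both tools are actually available. This is a small sketch using only the Python standard library; the helper name `check_setup` is hypothetical, and it only checks that the `whisper` package and the `ffmpeg` program can be found, not that they work.

```python
import importlib.util
import shutil


def check_setup() -> dict:
    """Report whether the whisper package and the ffmpeg program can be found."""
    return {
        "whisper": importlib.util.find_spec("whisper") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, found in check_setup().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If either line says MISSING, run the install cell again and look for error messages in its output.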
Step 3: Upload Your Audio File
- Click on the folder icon on the left side of the screen.
- Drag and drop your audio or video file into this section.
- Wait until the file has finished uploading before beginning the next step. You can see the file upload progress in the navigation menu on the left.
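Because the next step needs the exact file name, it can help to print the names of the files you uploaded. The sketch below lists audio and video files in the current folder; the extension list is my own assumption about common formats, so add to it if your file type is missing.

```python
from pathlib import Path

# Assumed list of common audio/video extensions; extend it for your own files.
MEDIA_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg",
    ".mp4", ".mkv", ".mov", ".webm",
}


def list_media_files(folder: str = ".") -> list:
    """Return the names of audio/video files in `folder`, sorted alphabetically."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )


for name in list_media_files():
    print(repr(name))  # repr() makes stray spaces in the name visible
```

Copy the name exactly as printed (without the outer quotes) into the transcription command.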
Step 4: Transcribe the Audio File
- Insert the following code in a new code cell:
!whisper "your-file-name.mp3" --model medium
- Replace your-file-name.mp3 with your exact file name including the extension, keep the quotation marks, then click Run Cell.
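The quotation marks matter most when the file name contains spaces. As an illustration, here is a sketch that assembles the same command in Python and shows how it would be quoted. `build_whisper_command` and the file name `my interview.mp3` are made up for this example; `--model` is the same option used in the command above.

```python
import shlex


def build_whisper_command(file_name: str, model: str = "medium") -> list:
    """Build the argument list for the whisper command-line tool."""
    if not file_name.strip():
        raise ValueError("file_name must not be empty")
    return ["whisper", file_name, "--model", model]


cmd = build_whisper_command("my interview.mp3")  # hypothetical file name
# shlex.join adds quotes around arguments that contain spaces.
print("!" + shlex.join(cmd))
```

To run the command from Python instead of pasting it into a cell, you could pass `cmd` to `subprocess.run(cmd, check=True)`.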
Step 5: Download the Transcription
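- After the transcription is complete, Whisper saves the result in several formats, including a plain text file (.txt) and subtitle files (.srt and .vtt). You can download whichever one you need.
- In the folder section, hover over the file you want to download, click the three-dot icon next to it, and click Download.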
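If you are not sure which output files were created, a short sketch like this can list them. It assumes a recent Whisper version that names each output after the input file without its extension (for example `your-file-name.txt`); `find_transcripts` is a hypothetical helper, not part of Whisper.

```python
from pathlib import Path

# Formats the whisper command writes by default. Assumption: outputs are
# named after the input file with its extension removed.
OUTPUT_SUFFIXES = (".txt", ".srt", ".vtt", ".tsv", ".json")


def find_transcripts(media_name: str, folder: str = ".") -> list:
    """Return the transcript files that exist in `folder` for `media_name`."""
    stem = Path(media_name).stem
    return [
        stem + suffix
        for suffix in OUTPUT_SUFFIXES
        if (Path(folder) / (stem + suffix)).is_file()
    ]


# Placeholder name from the command above; use your own file name here.
print(find_transcripts("your-file-name.mp3"))
```

An empty list means the transcription has not finished yet or the files were written under a different name.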
Transcribing Video Files
Step 1: Upload Your Video File
- Drag and drop your video file into the Google Colab folder section.
- Rename the file if necessary for easier handling.
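Renaming can also be done in a code cell. The sketch below is one possible way to turn a long name with spaces and brackets into a simpler one; the naming rule (lowercase letters, digits, and hyphens) is my own choice, and both function names are hypothetical.

```python
import re
from pathlib import Path


def safe_name(file_name: str) -> str:
    """Lowercase the name and replace runs of other characters with hyphens."""
    path = Path(file_name)
    stem = re.sub(r"[^A-Za-z0-9]+", "-", path.stem).strip("-").lower()
    return (stem or "file") + path.suffix.lower()


def rename_safely(file_name: str, folder: str = ".") -> str:
    """Rename the file inside `folder` and return the new name."""
    new_name = safe_name(file_name)
    source = Path(folder) / file_name
    target = Path(folder) / new_name
    if new_name != file_name:
        if target.exists():
            raise FileExistsError(f"{new_name} already exists")
        source.rename(target)
    return new_name


print(safe_name("My Video (Final Cut).MP4"))  # prints my-video-final-cut.mp4
```

Calling `rename_safely("My Video (Final Cut).MP4")` would rename the uploaded file itself. Note that this rule strips non-English letters, so adjust the pattern if your file names rely on them.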
Step 2: Transcribe the Video File
- Use the same command as for audio files, but with your video file's name and extension:
!whisper "your-file-name.mp4" --model medium
- Replace your-file-name.mp4 with your exact video file name including the extension, keep the quotation marks, then click Run Cell.
Step 3: Download the Transcription
- Wait for the transcription to complete and download the text or subtitle file as previously shown.
Using Whisper AI and Google Colaboratory, you can easily transcribe audio and video files to text for free. This method is efficient and saves a lot of time compared to manual transcription. If you have any questions, feel free to ask in the comments section. I hope you enjoy using this method!