Introduction
As discussed in our blog post Navigating the Maze: A Strategic Approach to Selecting the Ideal AI Transcription Tool, there are several criteria to consider when choosing an AI transcription tool. Each criterion will carry a different weight depending on your use cases and priorities.
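One way to make that weighting concrete is a simple weighted score. The sketch below is only an illustration: the criteria names, the weights, the 0 to 5 ratings, and the tools "Tool A" and "Tool B" are hypothetical placeholders, not results from our analysis.

```python
def weighted_score(ratings, weights):
    """Return the weighted average of the ratings, using the keys in weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical priorities: data residency matters most, speed least.
weights = {"price": 3, "speed": 2, "data_residency": 5}

# Hypothetical 0-5 ratings for two made-up tools.
candidates = {
    "Tool A": {"price": 4, "speed": 5, "data_residency": 2},
    "Tool B": {"price": 3, "speed": 3, "data_residency": 5},
}

# Rank the candidates from best to worst weighted score.
ranked = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```

With these made-up numbers, Tool B ranks first because it scores well on the most heavily weighted criterion. Changing the weights to match your own priorities can change the ranking.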
The Analysis
At a high level, all of these tools and companies offer the same capabilities: transcription that is customizable, scalable, highly accurate, built on cutting-edge natural language processing technology, and deployable on the vendor's servers or self-hosted. But when you look under the hood, many criteria differentiate them and can bring you closer to the right decision. To help you choose, we decided to make public our analysis of the following tools: OpenAI Whisper, Azure Speech to Text, Amazon Web Services (AWS) Transcribe, Deepgram, Google Cloud Universal Speech to Text, IBM Watson, Speechmatics, and Assembly AI.
OpenAI Whisper
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: No
HIPAA Compliance: Must sign BAA
File Size Limit: 25 MB
Speed (For 11 min Audio File): ~35 seconds
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): Contact OpenAI to confirm. They say they support Canada, but do not mention whether processing happens in Canada only.

Azure Speech to Text
Pricing: $1/hr or $0.0166/min (Real-time); $0.36/hr or $0.006/min (Batch)
Automatic Data Redaction: Yes. Text-only, through the Language service, and incurs extra charges. PII limitations apply; see the Azure documentation.
Fine Tuning: Yes, through Custom Speech Model
Commitment Required: Will help reduce price, but not required
HIPAA Compliance: Must sign BAA
File Size Limit: 1 GB
Speed (For 11 min Audio File): 3 min
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): Yes

AWS Transcribe
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: Will help reduce price, but not required
HIPAA Compliance: Must sign BAA
File Size Limit: 2 GB
Speed (For 11 min Audio File): 1 min
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): Yes

Deepgram
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: $10,000/year
HIPAA Compliance: Must sign BAA
File Size Limit: 2 GB
Speed (For 11 min Audio File): 10-15 seconds
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): The model will have to be uploaded to a server. This can lead to higher monthly DevOps costs.

Google Cloud Universal Speech to Text
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: No
HIPAA Compliance: Must sign BAA
File Size Limit: 10 MB
Speed (For 11 min Audio File): Not Available
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): Yes

IBM Watson
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: Requires signing up for a Premium Plan
HIPAA Compliance: Must sign BAA and be on a Premium Plan
File Size Limit: 100 MB
Speed (For 11 min Audio File): 7 minutes
Real-Time Transcription: Yes
Multilingual Support: Yes (in beta mode)
Can Be Deployed In Canada (Data Residency): No

Speechmatics
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: No
HIPAA Compliance: Must sign BAA
File Size Limit: 1 GB
Speed (For 11 min Audio File): 1.25 min
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): The model will have to be uploaded to a server. This can lead to higher monthly DevOps costs.

Assembly AI
Pricing: $0.006/min (One-time)
Automatic Data Redaction: No
Fine Tuning: Yes
Commitment Required: 2,000 to 3,000 hours a month. This amounts to $20k - $36k per year.
HIPAA Compliance: Must sign BAA
File Size Limit: 1 GB
Speed (For 11 min Audio File): 2 min
Real-Time Transcription: Yes
Multilingual Support: Yes
Can Be Deployed In Canada (Data Residency): No, but on the roadmap for Q1 2024
Disclaimer: Please note that this analysis was completed in December 2023. Some of these parameters may have evolved since then.
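To turn a per-minute rate into a budget figure, multiply it by your expected monthly volume. The sketch below uses the two Azure Speech to Text rates listed above ($0.006/min for batch and $0.0166/min for real-time, as of December 2023). The 500 hours per month volume and the function name are assumptions made for illustration, and the estimate ignores commitments, volume discounts, and extra charges such as redaction.

```python
def monthly_cost(hours_per_month, rate_per_minute):
    """Estimate monthly spend in USD (rounded to cents) for a given audio volume."""
    if hours_per_month < 0 or rate_per_minute < 0:
        raise ValueError("hours and rate must be non-negative")
    return round(hours_per_month * 60 * rate_per_minute, 2)


HOURS_PER_MONTH = 500  # hypothetical volume, adjust to your own usage

batch = monthly_cost(HOURS_PER_MONTH, 0.006)      # batch rate from the listing
realtime = monthly_cost(HOURS_PER_MONTH, 0.0166)  # real-time rate from the listing

print(f"Batch:     ${batch:,.2f}/month")     # Batch:     $180.00/month
print(f"Real-time: ${realtime:,.2f}/month")  # Real-time: $498.00/month
```

At this assumed volume, real-time transcription costs a little under three times as much as batch, which is worth weighing if your use case does not actually need live results.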
Models Specific to Technical Fields:
Several tools, such as Deepgram, AWS Transcribe, and Google Cloud Universal Speech to Text, offer specialized models for technical fields. These models may cost more, but the higher accuracy they provide is crucial for industries like healthcare and law. When considering a specialized model, always check its error rate and hallucination rate against your specific use case.
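Word error rate (WER) is one common way to measure error rate: the number of word substitutions, insertions, and deletions needed to turn the tool's transcript into a trusted reference transcript, divided by the number of words in the reference. The sketch below is a minimal version that assumes simple lowercase, whitespace-based tokenization. The sample sentences are invented for illustration, and WER alone does not tell you whether an error is a hallucination.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: a drug name misheard as two ordinary words.
reference = "the patient was prescribed ten milligrams of lisinopril"
hypothesis = "the patient was prescribed ten milligrams of listen april"
print(word_error_rate(reference, hypothesis))  # 0.25
```

Running the same set of domain-specific recordings through a general model and a specialized model, then comparing their WER, is one way to judge whether the specialized model is worth the extra cost.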