Can ChatGPT transcribe audio in 2026? Yes — via Record Mode (macOS desktop app) and the OpenAI speech-to-text API — but both are cloud-based, require internet, and how your data is handled depends on your product tier and settings. If you need guaranteed offline, on-device transcription, VoiceScriber is the alternative: 100% offline in 100+ languages, and it never sends any recording or data to any server.
TL;DR
ChatGPT can transcribe audio via Record Mode (macOS) and the speech-to-text API (models: gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize). It's powerful (summaries, action items, speaker diarization) but remains cloud-based and requires internet. Audio and transcripts are processed on OpenAI's servers. Consumer ChatGPT may use content for training unless you opt out; Enterprise/Edu content is excluded by default. The audio transcription API endpoints (/v1/audio/transcriptions and /v1/audio/translations) are currently not used for training and have no abuse-monitoring or application-state retention. If your requirement is "no audio leaves the device, ever", VoiceScriber is the alternative: 100% offline, on-device transcription in 100+ languages.
Table of contents
- Can ChatGPT transcribe audio?
- Record Mode vs API vs Realtime: what's the difference?
- What data does OpenAI delete, retain, or use for training?
- What changed in 2026?
- What are the current limits?
- ChatGPT transcription: key pros
- Where ChatGPT transcription breaks down
- Who should not use ChatGPT transcription?
- The private offline alternative: VoiceScriber
- At-a-glance comparison table
- If you must use ChatGPT, harden your setup
- How we tested
- Related articles
- FAQs
1. Can ChatGPT transcribe audio?
Yes. As of 2026, OpenAI offers two main ways to transcribe audio with ChatGPT:
- ChatGPT Record Mode: a feature in the macOS desktop app that live-transcribes meetings and notes, then creates a private "canvas" with summaries and action items.
- OpenAI Speech-to-Text API: a cloud API endpoint that accepts audio files and returns transcriptions using models like gpt-4o-transcribe, whisper-1, and others.
Both methods require an internet connection and process audio on OpenAI's servers. Neither works offline.
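To make the API route concrete, here is a minimal sketch of the HTTP request behind a transcription call. It uses only the Python standard library and builds the request without sending it. The host api.openai.com and the `file` and `model` form fields follow OpenAI's public API conventions; the function name, file name, and placeholder key are illustrative assumptions.

```python
import io
import urllib.request
import uuid

# Assumption: the standard OpenAI API host; the path is the endpoint named in this article.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="gpt-4o-transcribe"):
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain form field: which model should transcribe the audio.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: the raw audio bytes.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would upload the audio to OpenAI's servers, so that step needs an internet connection and a valid API key. That is the cloud dependency described above.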
2. Record Mode vs OpenAI API vs Realtime: what's the difference?
| Feature | Record Mode | Speech-to-Text API | Realtime API |
|---|---|---|---|
| Platform | macOS desktop app only | Any platform (REST API) | Any platform (WebSocket) |
| Access | Plus, Pro, Business, Enterprise, Edu | API key (paid usage) | API key (paid usage) |
| Current models | Internal (not user-selectable) | gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize | Realtime-specific models |
| Use case | Live meeting recording with auto-summaries | Batch transcription of audio files | Streaming, low-latency voice apps |
| Speaker diarization | Supports multiple speakers | Via gpt-4o-transcribe-diarize | Depends on implementation |
| Data handling | Audio deleted after transcription; transcripts follow workspace retention settings | No retention for audio endpoints | Per API data controls |
For a detailed comparison of models, supported languages, and file formats, see OpenAI's speech-to-text documentation.
3. What data does OpenAI delete, retain, or use for training?
This is the most common concern — and the answer depends on which product you use:
Record Mode (ChatGPT macOS app)
- Audio recordings are deleted after transcription.
- Transcripts and canvases follow your workspace retention settings and are removed within 30 days after deletion unless legally required to retain them.
- Record history: When enabled, Record Mode can reference past recording transcripts and canvases in later conversations. This is an important privacy nuance — your past recordings may inform future ChatGPT responses within the same workspace.
- Consumer ChatGPT (Plus/Pro): Transcripts may be used to train models unless you opt out via privacy controls.
- Enterprise/Edu: Excluded from training by default; admins can disable Record Mode entirely.
Speech-to-Text API
- The /v1/audio/transcriptions and /v1/audio/translations endpoints are currently not used for training and have no abuse-monitoring or application-state retention, a significant privacy upgrade compared with many other API endpoints that retain data for up to 30 days.
- API data is not used for training by default.
Legal holds
- Active litigation can override normal deletion timelines. For example, litigation-related data preservation orders may require OpenAI to retain data beyond the standard windows described above.
Key takeaway: The API audio endpoints have the cleanest data story (no retention), while Record Mode transcripts persist in your workspace and may feed future conversations. Neither option keeps data fully on your device.
4. What changed in 2026?
- Record Mode is still macOS-only. No Windows, iOS, or web version as of March 2026.
- The 120-minute session cap remains. Longer meetings still need to be split across sessions.
- Record history can now feed future responses. When "reference record history" is enabled, past transcripts and canvases may be used in later ChatGPT conversations — a meaningful privacy consideration.
- The API data story is more nuanced than "30-day retention." The audio transcription endpoints (/v1/audio/transcriptions, /v1/audio/translations) currently show no retention across any category, while other endpoints may still retain data for abuse monitoring.
- New transcription models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-transcribe-diarize join the older whisper-1. Diarization (speaker identification) is now available via the standard API.
5. What are the current limits?
| Limit | Record Mode | Speech-to-Text API |
|---|---|---|
| Session/file cap | 120 minutes per recording | 25 MB per file upload |
| Platform | macOS desktop app only | Any platform (REST) |
| Plan required | Plus, Pro, Business, Enterprise, Edu | API key with billing |
| Language performance | Works best in English today; other languages supported | See supported languages list |
| Translations | N/A | English-only output (translates any language to English) |
| Connectivity | Internet required | Internet required |
For files larger than 25 MB, OpenAI recommends splitting audio or using compressed formats. See the official speech-to-text guide for handling longer inputs. For a broader look at which apps work without connectivity, see our airplane-mode test of 7 popular transcription tools.
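As a rough illustration of working within these limits, the sketch below splits an uncompressed WAV file into chunks that each stay within the upload cap, and computes how many Record Mode sessions a long meeting would need. It assumes "25 MB" means 25 × 1024 × 1024 bytes and relies on the 44-byte header that Python's `wave` module writes for PCM audio. The function names are hypothetical, and WAV is used only because the standard library can read it.

```python
import io
import math
import wave

# Assumption: the 25 MB cap is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
WAV_HEADER_BYTES = 44  # size of the PCM header written by the wave module


def split_wav(source, max_bytes=MAX_UPLOAD_BYTES):
    """Split a WAV (path or file-like object) into in-memory WAV chunks,
    each at most max_bytes long including its header."""
    chunks = []
    with wave.open(source, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.sampwidth * params.nchannels
        frames_per_chunk = (max_bytes - WAV_HEADER_BYTES) // bytes_per_frame
        if frames_per_chunk < 1:
            raise ValueError("max_bytes is too small to hold a single frame")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Write each slice as a complete, standalone WAV file in memory.
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks


def sessions_needed(meeting_minutes, cap_minutes=120):
    """Number of Record Mode sessions needed under the per-recording cap."""
    return math.ceil(meeting_minutes / cap_minutes)
```

This sketch cuts at fixed frame counts, so a boundary can land in the middle of a word; splitting at pauses would probably give cleaner transcripts. Compressed formats fit far more audio into the same 25 MB, which is why OpenAI suggests them as an alternative to splitting.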
6. ChatGPT transcription: key pros
- End-to-end convenience: Record, transcribe, and auto-summarize in one place; outputs can be turned into tasks, emails, or plans.
- Speaker diarization: Record Mode supports multiple speakers; the API offers gpt-4o-transcribe-diarize for speaker identification.
- Enterprise-friendly controls: Business/Enterprise/Edu content is excluded from training by default; admins can control or disable Record Mode.
- API audio endpoints have no retention: The transcription and translation endpoints currently retain no data for training, abuse monitoring, or application state.
- Multilingual: The speech-to-text models support a wide range of languages, though accuracy varies.
7. Where ChatGPT transcription breaks down
- Cloud dependency: Audio and transcripts are processed on OpenAI servers; internet is required. No airplane mode, no dead-zone productivity.
- Record history feeds future responses: When "reference record history" is enabled, past transcripts can be used in later conversations — meaning sensitive content from one meeting could surface in a later context.
- Consumer training risk: Consumer ChatGPT content may be used to train models unless you actively opt out. Many users never change this setting.
- Legal holds override deletion: Active litigation can suspend normal deletion timelines for all user data — a risk that is hard to predict or control.
- Platform lock-in: Record Mode is macOS-only — no Windows, no iOS, no web. The 120-minute cap also means long meetings need multiple sessions.
- File size constraints: The API caps uploads at 25 MB. Translations output English only.
8. Who should not use ChatGPT transcription?
If any of the following apply to your work, cloud-based transcription introduces risk that may outweigh the convenience:
- Lawyers and legal teams: Attorney-client privilege requires strict control over where communications are stored and who can access them. Sending client recordings to third-party servers creates exposure. See our guide on secure offline transcription for lawyers.
- Clinicians and therapists: HIPAA's Security Rule requires administrative, physical, and technical safeguards for electronic protected health information (ePHI). Keeping patient audio entirely on-device simplifies compliance. Read more about on-device transcription for healthcare and therapy notes.
- Journalists: Source protection is foundational to investigative work. Uploading interview recordings to cloud servers creates a discoverable copy outside your control.
- Finance and compliance teams: Regulatory environments (SOX, FINRA, GDPR) often mandate strict data residency and minimization. Any cloud processing adds a third party to your data chain.
- Anyone with unreliable connectivity: If you work in the field, on flights, in hospitals, or in areas with poor internet, a cloud-dependent tool simply won't work when you need it most.
The cost of getting this wrong is real: IBM's 2025 Cost of a Data Breach Report puts the global average at US $4.4 million. OWASP's Mobile Top 10 continues to rank insecure communication and data leakage among the most common mobile security risks.
9. The private offline alternative: VoiceScriber (100% on-device)
If your requirement is "no cloud, ever", choose a tool built for it.
VoiceScriber:
- 100% offline, on-device transcription — works in airplane mode; never sends any recording or data to any server.
- 100+ languages supported offline.
- No session caps, no file size limits — bounded only by your device storage.
- No account required — no sign-up, no cloud sync, no data collection.
VoiceScriber is purpose-built for privacy-critical workflows. All audio and transcripts remain on your iPhone unless you explicitly export or share them. There is no server to breach, no retention policy to worry about, and no third party that a legal hold can compel to retain or hand over your data.
Need guaranteed offline privacy?
VoiceScriber works 100% offline in 100+ languages and never sends any recording or data to cloud servers.
Download VoiceScriber
10. At-a-glance: ChatGPT transcription vs. VoiceScriber
| Factor | ChatGPT (Record Mode / API) | VoiceScriber (Offline Alternative) |
|---|---|---|
| Connectivity | Internet required (cloud) | No internet needed (airplane mode OK) |
| Where processing happens | OpenAI servers | On your device |
| Training usage | Consumer ChatGPT may train on content unless you opt out; Enterprise/Edu excluded by default; API audio endpoints not used for training | Never uploads; nothing to train on |
| Data retention | Audio endpoints: no retention; Record Mode transcripts: workspace settings (~30 days after deletion); legal holds may extend | Local only until you export |
| Current models | gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize | On-device Whisper-based engine |
| Languages | See supported languages; English best for Record Mode today | 100+ languages (all offline) |
| Session/file limits | Record Mode: 120-min cap; API: 25 MB file limit | Device-bound (no server caps) |
| Platform | Record Mode: macOS only; API: any platform | iPhone (iOS) |
| Best for | Integrated AI summaries and teamwork in cloud-friendly orgs | Maximum privacy and offline reliability |
11. If you must use ChatGPT for transcription, harden your setup
- Turn off training (consumer): Opt out via OpenAI's privacy controls in Settings.
- Prefer Enterprise/Edu/Business: These tiers exclude content from training by default; admins can disable Record Mode entirely.
- Disable record history: Turn off "reference record history" so past transcripts don't feed future conversations.
- Use the API for sensitive audio: The /v1/audio/transcriptions endpoint currently has no retention, a better data story than Record Mode for one-off transcriptions.
- Mind legal holds: Understand that active litigation can override all deletion timelines.
- Avoid PHI/PII where possible or de-identify content to reduce risk. HIPAA still applies regardless of the tool you use.
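For text you plan to paste into ChatGPT (for example, a transcript produced offline), the de-identification step can be sketched as a simple pattern-based redaction pass. The patterns and placeholder labels below are illustrative assumptions, not a compliance-grade method: they cover only a few obvious identifiers, they will miss names and free-form details, and the phone pattern can over-redact other digit sequences such as dates.

```python
import re

# Illustrative patterns only; real PHI/PII detection needs far broader coverage.
# Order matters: the SSN pattern runs before the looser phone pattern.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Call Ana at +1 (555) 010-9999 or ana@example.com"))
# prints: Call Ana at [PHONE] or [EMAIL]
```

A pass like this reduces accidental exposure but does not remove the need for review. The name "Ana" survives in the example, which shows why regex redaction alone is not sufficient for regulated data.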
12. How we tested
To write this guide, we tested ChatGPT transcription (Record Mode and the speech-to-text API) and VoiceScriber side by side across five real-world scenarios:
- Quiet English memo — a solo voice note recorded in a silent room, approximately 3 minutes.
- Noisy cafe environment — a voice recording captured in a busy coffee shop with background chatter, music, and espresso machine noise.
- Two-speaker meeting — a simulated two-person meeting to test speaker separation and overlapping speech handling.
- Accented English — recordings from speakers with non-native English accents (Turkish, German) to evaluate robustness.
- Non-English audio — clips in Turkish, Spanish, and Japanese to compare multilingual accuracy and offline language coverage.
For each scenario, we compared accuracy, latency, and whether the tool worked without an internet connection. VoiceScriber was tested in airplane mode throughout. ChatGPT required a stable Wi-Fi connection for every test.
This is not a formal benchmark — it is a practical, hands-on evaluation designed to reflect how these tools perform in the situations most readers actually face.
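If you want to repeat a comparison like this on your own recordings, one common way to score accuracy is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the tool's output into a reference transcript, divided by the reference length. The sketch below is an assumption about how such a score could be computed; it is not necessarily the method behind the evaluation above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference. Lower is better; 0.0 is a perfect match."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the quick brown fox", "the quick fox")` returns 0.25, because one of the four reference words is missing. The sketch only lowercases the text, so punctuation differences count as errors unless you strip them first.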
FAQs
Does ChatGPT transcription work offline?
No. ChatGPT transcription (Record Mode and the API) is cloud-based and requires internet. All processing happens on OpenAI's servers.
Does OpenAI train on my transcripts?
- Consumer ChatGPT: Content may be used to improve models unless you opt out.
- Enterprise/Edu: Excluded from training by default.
- API audio endpoints: Not used for training, with no abuse-monitoring retention for /v1/audio/transcriptions and /v1/audio/translations.
How long does OpenAI keep my audio data?
It depends on the product. Record Mode deletes audio after transcription; transcripts follow workspace retention and are removed within 30 days after deletion. The API audio endpoints (/v1/audio/transcriptions, /v1/audio/translations) currently have no retention across any category. However, legal holds can override all deletion timelines during active litigation.
What languages does ChatGPT transcription support?
The speech-to-text API supports a wide range of languages across its models (gpt-4o-transcribe, whisper-1, etc.). Record Mode works best in English today. Accuracy varies by language and model.
Can ChatGPT Record Mode reference my past recordings?
Yes, when "reference record history" is enabled. Past recording transcripts and canvases can be referenced in later conversations. You can disable this in your settings.
What's the best offline alternative to ChatGPT transcription?
VoiceScriber. It works 100% offline in 100+ languages and never sends any recording or data to any server. All processing happens on your iPhone.