Can ChatGPT Transcribe Audio? Limits, Privacy & the Best Offline Alternative (2026)

Can ChatGPT transcribe audio in 2026? Yes — via Record Mode (macOS desktop app) and the OpenAI speech-to-text API — but both are cloud-based, require internet, and how your data is handled depends on your product tier and settings. If you need guaranteed offline, on-device transcription, VoiceScriber is the alternative: 100% offline in 100+ languages, and it never sends any recording or data to any server.

TL;DR

ChatGPT can transcribe audio via Record Mode (macOS) and the speech-to-text API (models: gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize). It is powerful, offering summaries, action items, and speaker diarization, but it remains cloud-based (internet required): audio and transcripts are processed on OpenAI's servers. Consumer ChatGPT may use content for training unless you opt out; Enterprise/Edu content is excluded by default. Content sent to the API audio endpoints (/v1/audio/transcriptions and /v1/audio/translations) is currently not used for training and is not retained for abuse monitoring or application state. If your requirement is "no audio leaves the device, ever", VoiceScriber is the alternative: 100% offline, on-device transcription in 100+ languages.

Can ChatGPT transcribe audio?
Record Mode vs API vs Realtime: what's the difference?
What data does OpenAI delete, retain, or use for training?
What changed in 2026?
What are the current limits?
ChatGPT transcription: key pros
Where ChatGPT transcription breaks down
Who should not use ChatGPT transcription?
The private offline alternative: VoiceScriber
At-a-glance comparison table
If you must use ChatGPT, harden your setup
How we tested
FAQs

1. Can ChatGPT transcribe audio?

Yes. As of 2026, OpenAI offers two main ways to transcribe audio with ChatGPT:

ChatGPT Record Mode — a feature in the macOS desktop app that live-transcribes meetings and notes, then creates a private "canvas" with summaries and action items.
OpenAI Speech-to-Text API — a cloud API endpoint that accepts audio files and returns transcriptions using models like gpt-4o-transcribe, whisper-1, and others.

Both methods require an internet connection and process audio on OpenAI's servers. Neither works offline.
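As a rough illustration of what "accepts audio files" means in practice, here is a minimal sketch that builds, but does not send, a multipart upload for the /v1/audio/transcriptions endpoint using only the Python standard library. The function name, the placeholder key, and the sample bytes are hypothetical; the `model` and `file` form fields and the Bearer authorization header follow OpenAI's published API reference, which should be checked before relying on this.

```python
import uuid
import urllib.request

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="gpt-4o-transcribe"):
    """Build (but do not send) a multipart/form-data POST request.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Text field: which transcription model to use.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field: headers first, then the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        # Closing boundary.
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Nothing leaves the machine until the request is actually opened.
req = build_transcription_request(b"RIFFfake", "memo.wav", "sk-placeholder")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is the step that uploads the audio to OpenAI's servers, which is why this path cannot work offline.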

2. Record Mode vs OpenAI API vs Realtime: what's the difference?

Feature	Record Mode	Speech-to-Text API	Realtime API
Platform	macOS desktop app only	Any platform (REST API)	Any platform (WebSocket)
Access	Plus, Pro, Business, Enterprise, Edu	API key (paid usage)	API key (paid usage)
Current models	Internal (not user-selectable)	`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `whisper-1`, `gpt-4o-transcribe-diarize`	Realtime-specific models
Use case	Live meeting recording with auto-summaries	Batch transcription of audio files	Streaming, low-latency voice apps
Speaker diarization	Supports multiple speakers	Via `gpt-4o-transcribe-diarize`	Depends on implementation
Data handling	Audio deleted after transcription; transcripts follow workspace retention settings	No retention for audio endpoints	Per API data controls

For a detailed comparison of models, supported languages, and file formats, see OpenAI's speech-to-text documentation.

3. What data does OpenAI delete, retain, or use for training?

This is the most common concern — and the answer depends on which product you use:

Record Mode (ChatGPT macOS app)

Audio recordings are deleted after transcription.
Transcripts and canvases follow your workspace retention settings and are removed within 30 days after deletion unless legally required to retain them.
Record history: When enabled, Record Mode can reference past recording transcripts and canvases in later conversations. This is an important privacy nuance — your past recordings may inform future ChatGPT responses within the same workspace.
Consumer ChatGPT (Plus/Pro): Transcripts may be used to train models unless you opt out via privacy controls.
Enterprise/Edu: Excluded from training by default; admins can disable Record Mode entirely.

Speech-to-Text API

Content sent to the /v1/audio/transcriptions and /v1/audio/translations endpoints is currently not used for training and is not retained for abuse monitoring or application state. That is a notable privacy improvement over the many other API endpoints that retain data for up to 30 days.
API data is not used for training by default.

Legal holds

Active litigation can override normal deletion timelines. For example, litigation-related data preservation orders may require OpenAI to retain data beyond the standard windows described above.

Key takeaway: The API audio endpoints have the cleanest data story (no retention), while Record Mode transcripts persist in your workspace and may feed future conversations. Neither option keeps data fully on your device.

4. What changed in 2026?

Record Mode is still macOS-only. No Windows, iOS, or web version as of March 2026.
The 120-minute session cap remains. Longer meetings still need to be split across sessions.
Record history can now feed future responses. When "reference record history" is enabled, past transcripts and canvases may be used in later ChatGPT conversations — a meaningful privacy consideration.
The API data story is more nuanced than "30-day retention." The audio transcription endpoints (/v1/audio/transcriptions, /v1/audio/translations) currently show no retention across any category, while other endpoints may still retain data for abuse monitoring.
New transcription models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-transcribe-diarize join the older whisper-1. Diarization (speaker identification) is now available via the standard API.
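To make the 120-minute cap concrete, the sketch below splits a planned meeting length into back-to-back recording sessions. The function name and the idea of planning sessions up front are illustrative assumptions; Record Mode itself does not expose anything like this.

```python
RECORD_MODE_CAP_MIN = 120  # per-recording cap as stated in this article


def plan_sessions(meeting_minutes, cap=RECORD_MODE_CAP_MIN):
    """Return (start, end) minute offsets for each recording session."""
    sessions = []
    start = 0
    while start < meeting_minutes:
        end = min(start + cap, meeting_minutes)
        sessions.append((start, end))
        start = end
    return sessions


# A five-hour workshop needs three separate recordings.
print(plan_sessions(300))  # [(0, 120), (120, 240), (240, 300)]
```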

5. What are the current limits?

Limit	Record Mode	Speech-to-Text API
Session/file cap	120 minutes per recording	25 MB per file upload
Platform	macOS desktop app only	Any platform (REST)
Plan required	Plus, Pro, Business, Enterprise, Edu	API key with billing
Language performance	Works best in English today; other languages supported	See supported languages list
Translations	N/A	English-only output (translates any language to English)
Connectivity	Internet required	Internet required

For files larger than 25 MB, OpenAI recommends splitting audio or using compressed formats. See the official speech-to-text guide for handling longer inputs. For a broader look at which apps work without connectivity, see our airplane-mode test of 7 popular transcription tools.
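A quick way to estimate how a long recording would need to be split is sketched below. It assumes a roughly constant bitrate, reads "25 MB" as 25,000,000 bytes (the stricter interpretation), and leaves a 10% safety margin; all three are assumptions, and the function name is hypothetical. The actual cutting would be done with an audio tool, which is not shown here.

```python
import math

# Assumption: "25 MB" interpreted as decimal megabytes.
API_UPLOAD_LIMIT_BYTES = 25 * 1000 * 1000


def plan_chunks(file_size_bytes, duration_seconds,
                limit_bytes=API_UPLOAD_LIMIT_BYTES, safety=0.9):
    """Return (chunk_count, chunk_seconds) keeping each chunk under the limit.

    Assumes size grows linearly with duration (constant bitrate).
    """
    if file_size_bytes <= 0 or duration_seconds <= 0:
        raise ValueError("size and duration must be positive")
    budget = limit_bytes * safety          # leave headroom below the cap
    chunk_count = max(1, math.ceil(file_size_bytes / budget))
    chunk_seconds = duration_seconds / chunk_count
    return chunk_count, chunk_seconds


# A 60 MB, one-hour recording: three chunks of 20 minutes each.
count, seconds = plan_chunks(60_000_000, 3600)
print(count, seconds / 60)
```

Splitting at silence or sentence boundaries rather than fixed offsets would likely give cleaner transcripts, at the cost of a more involved tool chain.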

6. ChatGPT transcription: key pros

End-to-end convenience: Record, transcribe, and auto-summarize in one place; outputs can be turned into tasks, emails, or plans.
Speaker diarization: Record Mode supports multiple speakers; the API offers gpt-4o-transcribe-diarize for speaker identification.
Enterprise-friendly controls: Business/Enterprise/Edu content is excluded from training by default; admins can control or disable Record Mode.
API audio endpoints have no retention: The transcription and translation endpoints currently retain no data for training, abuse monitoring, or application state.
Multilingual: The speech-to-text models support a wide range of languages, though accuracy varies.

7. Where ChatGPT transcription breaks down

Cloud dependency: Audio and transcripts are processed on OpenAI servers; internet is required. No airplane mode, no dead-zone productivity.
Record history feeds future responses: When "reference record history" is enabled, past transcripts can be used in later conversations — meaning sensitive content from one meeting could surface in a later context.
Consumer training risk: Consumer ChatGPT content may be used to train models unless you actively opt out. Many users never change this setting.
Legal holds override deletion: Active litigation can suspend normal deletion timelines for all user data — a risk that is hard to predict or control.
Platform lock-in: Record Mode is macOS-only — no Windows, no iOS, no web. The 120-minute cap also means long meetings need multiple sessions.
File size constraints: The API caps uploads at 25 MB. Translations output English only.

8. Who should not use ChatGPT transcription?

If any of the following apply to your work, cloud-based transcription introduces risk that may outweigh the convenience:

Lawyers and legal teams: Attorney-client privilege requires strict control over where communications are stored and who can access them. Sending client recordings to third-party servers creates exposure. See our guide on secure offline transcription for lawyers.
Clinicians and therapists: HIPAA's Security Rule requires administrative, physical, and technical safeguards for electronic protected health information (ePHI). Keeping patient audio entirely on-device simplifies compliance. Read more about on-device transcription for healthcare and therapy notes.
Journalists: Source protection is foundational to investigative work. Uploading interview recordings to cloud servers creates a discoverable copy outside your control.
Finance and compliance teams: Regulatory frameworks such as SOX, FINRA rules, and GDPR often mandate strict data residency and data minimization. Any cloud processing adds a third party to your data chain.
Anyone with unreliable connectivity: If you work in the field, on flights, in hospitals, or in areas with poor internet, a cloud-dependent tool simply won't work when you need it most.

The cost of getting this wrong is real: IBM's 2025 Cost of a Data Breach Report puts the global average at US $4.4 million. OWASP's Mobile Top 10 continues to rank insecure communication and data leakage among the most common mobile security risks.

9. The private offline alternative: VoiceScriber (100% on-device)

If your requirement is "no cloud, ever", choose a tool built for it.

VoiceScriber:

100% offline, on-device transcription — works in airplane mode; never sends any recording or data to any server.
100+ languages supported offline.
No session caps, no file size limits — bounded only by your device storage.
No account required — no sign-up, no cloud sync, no data collection.

VoiceScriber is purpose-built for privacy-critical workflows. All audio and transcripts remain on your iPhone unless you explicitly export or share them. There is no server to breach, no retention policy to worry about, and no third party that a legal hold could force to keep your data.

10. At-a-glance: ChatGPT transcription vs. VoiceScriber

Factor	ChatGPT (Record Mode / API)	VoiceScriber (Offline Alternative)
Connectivity	Internet required (cloud)	No internet needed (airplane mode OK)
Where processing happens	OpenAI servers	On your device
Training usage	Consumer ChatGPT may train on content unless you opt out; Enterprise/Edu excluded by default; API audio endpoints not used for training	Never uploads; nothing to train on
Data retention	Audio endpoints: no retention; Record Mode transcripts: follow workspace retention settings and are removed within 30 days of deletion; legal holds may extend	Local only until you export
Current models	`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `whisper-1`, `gpt-4o-transcribe-diarize`	On-device Whisper-based engine
Languages	See supported languages; English best for Record Mode today	100+ languages (all offline)
Session/file limits	Record Mode: 120-min cap; API: 25 MB file limit	Device-bound (no server caps)
Platform	Record Mode: macOS only; API: any platform	iPhone (iOS)
Best for	Integrated AI summaries and teamwork in cloud-friendly orgs	Maximum privacy and offline reliability

11. If you must use ChatGPT for transcription, harden your setup

Turn off training (consumer): Opt out via OpenAI's privacy controls in Settings.
Prefer Enterprise/Edu/Business: These tiers exclude content from training by default; admins can disable Record Mode entirely.
Disable record history: Turn off "reference record history" so past transcripts don't feed future conversations.
Use the API for sensitive audio: The /v1/audio/transcriptions endpoint currently has no retention — a better data story than Record Mode for one-off transcriptions.
Mind legal holds: Understand that active litigation can override all deletion timelines.
Minimize PHI/PII: Avoid it where possible, or de-identify content to reduce risk. HIPAA still applies regardless of the tool you use.
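The checklist above can be expressed as a small self-audit. This is only a sketch: the `Setup` fields, tier names, and warning strings are hypothetical stand-ins for the settings described in this section, not anything OpenAI exposes programmatically.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    # Hypothetical description of how a team uses ChatGPT transcription.
    tier: str                              # "consumer", "business", "enterprise", "edu", or "api"
    training_opt_out: bool = False
    reference_record_history: bool = True
    contains_phi_or_pii: bool = False


def audit(setup):
    """Return the hardening steps from this section that still apply."""
    warnings = []
    if setup.tier == "consumer" and not setup.training_opt_out:
        warnings.append("Opt out of training in privacy controls")
    # Record history is a ChatGPT app setting, so it is skipped for API use.
    if setup.tier != "api" and setup.reference_record_history:
        warnings.append('Turn off "reference record history"')
    if setup.contains_phi_or_pii:
        warnings.append("De-identify PHI/PII before uploading")
    # This caveat applies to every tier.
    warnings.append("Legal holds can override deletion timelines")
    return warnings


for item in audit(Setup(tier="consumer", contains_phi_or_pii=True)):
    print("-", item)
```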

12. How we tested

To write this guide, we tested ChatGPT transcription (Record Mode and the speech-to-text API) and VoiceScriber side by side across five real-world scenarios:

Quiet English memo — a solo voice note recorded in a silent room, approximately 3 minutes.
Noisy cafe environment — a voice recording captured in a busy coffee shop with background chatter, music, and espresso machine noise.
Two-speaker meeting — a simulated two-person meeting to test speaker separation and overlapping speech handling.
Accented English — recordings from speakers with non-native English accents (Turkish, German) to evaluate robustness.
Non-English audio — clips in Turkish, Spanish, and Japanese to compare multilingual accuracy and offline language coverage.

For each scenario, we compared accuracy, latency, and whether the tool worked without an internet connection. VoiceScriber was tested in airplane mode throughout. ChatGPT required a stable Wi-Fi connection for every test.

This is not a formal benchmark — it is a practical, hands-on evaluation designed to reflect how these tools perform in the situations most readers actually face.
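For readers who want to put a number on "accuracy" in their own comparisons, one common metric is word error rate (WER): the word-level edit distance between a reference transcript and the tool's output, divided by the reference length. The sketch below is a generic implementation and an assumption on our part about how such a comparison could be scored; it is not a description of the scoring used for the scenarios above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick fox"))  # 0.25
```

Lower is better. Punctuation and casing should be normalized consistently before scoring, since this sketch only lowercases and splits on whitespace.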

FAQs

Does ChatGPT transcription work offline?

No. ChatGPT transcription (Record Mode and the API) is cloud-based and requires internet. All processing happens on OpenAI's servers.

Does OpenAI train on my transcripts?

Consumer ChatGPT: Content may be used to improve models unless you opt out.
Enterprise/Edu: Excluded from training by default.
API audio endpoints: Not used for training, with no abuse-monitoring retention for /v1/audio/transcriptions and /v1/audio/translations.

How long does OpenAI keep my audio data?

It depends on the product. Record Mode deletes audio after transcription; transcripts follow workspace retention and are removed within 30 days after deletion. The API audio endpoints (/v1/audio/transcriptions, /v1/audio/translations) currently have no retention across any category. However, legal holds can override all deletion timelines during active litigation.

What languages does ChatGPT transcription support?

The speech-to-text API supports a wide range of languages across its models (gpt-4o-transcribe, whisper-1, etc.). Record Mode works best in English today. Accuracy varies by language and model.

Can ChatGPT Record Mode reference my past recordings?

Yes, when "reference record history" is enabled. Past recording transcripts and canvases can be referenced in later conversations. You can disable this in your settings.

What's the best offline alternative to ChatGPT transcription?

VoiceScriber. It works 100% offline in 100+ languages and never sends any recording or data to any server. All processing happens on your iPhone.
