Transcription
Audio → text, in-browser, HIPAA-grade. Whisper WASM handles clinical sessions; cloud Whisper handles high-fidelity batch jobs. Audio never leaves the device unless the user explicitly opts into the cloud upgrade.
Three-Path Transcription
| Path | Engine | When | Quality | Privacy |
|---|---|---|---|---|
| In-browser | Whisper WASM (whisper.cpp compiled to WebAssembly) | Default for live dictation and AAC voice input | Excellent for clear English, Spanish, and French; sensitive to background noise | Audio never leaves the device |
| Server cloud | OpenAI Whisper-large-v3 via /api/v1/transcribe | Long-form batch jobs (1 h+ session recordings), only when the user opts in | Best: handles noise, accents, and medical terminology | Audit-logged; HIPAA BAA in place |
| Live SOAP dictation | WhisperX (word-aligned timestamps) | Clinician dictating SOAP notes during a session | Excellent, plus word timestamps for speaker attribution | In-browser by default |
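
As a rough illustration of how a client might pick among the three paths, here is a minimal TypeScript sketch. The job kinds, the `cloudOptIn` flag, and `chooseTranscriptionPath` are hypothetical names, not Synalux code; the function only encodes the table's rules, falling back to on-device transcription whenever the user has not opted in.

```typescript
// Hypothetical names; they mirror the three rows of the table, not a real API.
type TranscriptionPath = "in-browser-whisper" | "server-cloud-whisper" | "in-browser-whisperx";

interface TranscriptionJob {
  kind: "live-dictation" | "aac-voice" | "soap-dictation" | "batch-recording";
  cloudOptIn: boolean; // explicit per-session consent to the cloud upgrade
}

// Route a job to a path. Only opted-in batch recordings leave the device.
function chooseTranscriptionPath(job: TranscriptionJob): TranscriptionPath {
  switch (job.kind) {
    case "soap-dictation":
      // Word-aligned timestamps for speaker attribution, in-browser by default.
      return "in-browser-whisperx";
    case "batch-recording":
      // Long-form recordings go to the server only after explicit opt-in.
      return job.cloudOptIn ? "server-cloud-whisper" : "in-browser-whisper";
    case "live-dictation":
    case "aac-voice":
      return "in-browser-whisper";
  }
}
```

Making on-device the fallback in every branch keeps the privacy default intact even if a caller forgets to check consent.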

Live SOAP Dictation
This is the headline transcription use case. A clinician opens the session note, presses Dictate, and narrates the session aloud. Synalux then:
- Transcribes locally (WASM) with word timestamps.
- Identifies sections (Subjective / Objective / Assessment / Plan) by pattern matching on the transcript.
- Extracts ABC data (Antecedent / Behavior / Consequence) from natural-language descriptions.
- Drafts the structured note for one-click sign-off.
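Section identification by pattern could look something like the following minimal TypeScript sketch. It assumes Whisper punctuates a spoken heading (e.g. "Plan. Continue the program.") and treats a section keyword immediately followed by `:`, `.` or `,` as a cue. `splitSoap` and `SECTION_CUE` are hypothetical; this is not Synalux's parser.

```typescript
type SoapSection = "subjective" | "objective" | "assessment" | "plan";
type SoapDraft = Record<SoapSection, string>;

// Hypothetical cue: a section keyword followed directly by ":", "." or ",".
const SECTION_CUE = /\b(subjective|objective|assessment|plan)\s*[:.,]/gi;

function splitSoap(transcript: string): SoapDraft {
  const draft: SoapDraft = { subjective: "", objective: "", assessment: "", plan: "" };

  // Collect every cue with its position (fresh RegExp so lastIndex starts at 0).
  const re = new RegExp(SECTION_CUE);
  const cues: RegExpExecArray[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(transcript)) !== null) cues.push(m);

  cues.forEach((cue, i) => {
    const section = cue[1].toLowerCase() as SoapSection;
    // A section's body runs from the end of its cue to the start of the next cue.
    const start = cue.index + cue[0].length;
    const end = i + 1 < cues.length ? cues[i + 1].index : transcript.length;
    const body = transcript.slice(start, end).trim();
    // A section dictated twice is appended rather than overwritten.
    draft[section] = draft[section] ? `${draft[section]} ${body}` : body;
  });
  return draft;
}
```

Requiring punctuation after the keyword keeps phrases like "we plan to fade prompts" from splitting the note, but a sentence that ends in "...the plan." would still be misread as a cue, and any text before the first cue is dropped; a production parser would need more context than this.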

See Applied Behavior Analysis and Clinical Notes Documentation for the downstream flow.
AAC Voice Input
For Prism AAC users who can speak some words but use AAC for harder utterances:
- Whisper WASM transcribes speech in-browser and populates the keyboard input.
- Transcripts pass through autocorrect (Gemini 2.5 Flash-Lite) for typo recovery.
- Locale-aware: matches the user's chosen language and supports code-switching (e.g. English words inside a Romanian sentence).
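For the locale-aware behavior, one plausible step is turning the user's chosen locale into the language hint Whisper accepts (Whisper uses short codes such as "en" or "ro"). The sketch below is hypothetical: `whisperLanguageHint` and the short language list are illustrative only, and it does not show how Synalux handles code-switching.

```typescript
// Illustrative subset of Whisper language codes; the model supports many more.
const WHISPER_LANGS = new Set(["en", "es", "fr", "ro", "de", "it", "pt"]);

// Map a BCP 47 style locale ("ro-RO", "en_US") to a Whisper language hint.
// Returns undefined so the caller can let Whisper auto-detect instead.
function whisperLanguageHint(userLocale: string): string | undefined {
  const primary = userLocale.split(/[-_]/)[0].toLowerCase();
  return WHISPER_LANGS.has(primary) ? primary : undefined;
}
```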
Architecture
Server-side cloud path: POST /api/v1/transcribe (long-form, audit-logged).
- Request body: { audio_url | audio_b64, lang?, model? = 'whisper-large-v3' }
- Response: { text, segments[], language, duration_ms }

In-browser path (services/whisperService.ts):
- The Whisper WASM model is loaded lazily on first dictation use and cached in IndexedDB (~30 MB).
- The WhisperX add-on (~10 MB) is loaded for word timestamps when the user enables "speaker tracking".
- Audio is captured with the MediaRecorder API and fed to the WASM model in chunks.
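The chunked hand-off to the WASM model can be sketched as a small buffer that emits fixed-size windows. Whisper models work on 30-second windows of 16 kHz mono audio, but MediaRecorder emits encoded media chunks, so this sketch assumes they have already been decoded and resampled to 16 kHz mono Float32 PCM (for example with the Web Audio API). `PcmChunker` is a hypothetical name, not the code in services/whisperService.ts.

```typescript
// Whisper models consume 16 kHz mono audio in 30-second windows.
const WHISPER_SAMPLE_RATE = 16_000;
const WINDOW_SECONDS = 30;

// Hypothetical helper: buffers decoded PCM samples as they arrive and
// emits complete windows for the transcriber.
class PcmChunker {
  private buffer: Float32Array;
  private filled = 0;

  constructor(private readonly windowSamples = WHISPER_SAMPLE_RATE * WINDOW_SECONDS) {
    this.buffer = new Float32Array(windowSamples);
  }

  // Append samples; returns every window completed by this push.
  push(samples: Float32Array): Float32Array[] {
    const windows: Float32Array[] = [];
    let offset = 0;
    while (offset < samples.length) {
      const take = Math.min(this.windowSamples - this.filled, samples.length - offset);
      this.buffer.set(samples.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.windowSamples) {
        windows.push(this.buffer); // hand off the full window
        this.buffer = new Float32Array(this.windowSamples);
        this.filled = 0;
      }
    }
    return windows;
  }

  // Return the trailing partial window (e.g. when dictation stops), if any.
  flush(): Float32Array | null {
    if (this.filled === 0) return null;
    const tail = this.buffer.slice(0, this.filled);
    this.filled = 0;
    return tail;
  }
}
```

Allocating a fresh buffer after each full window means a window already handed to the model is never overwritten by later pushes.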
HIPAA + Privacy
- In-browser by default: audio bytes never traverse Synalux infrastructure.
- Cloud upgrade requires explicit consent: the UI shows a one-time consent gate per session before any audio is uploaded.
- Audit logging: every cloud transcription writes a row to transcription_audit with the user, session, audio duration, and model used.
- No audio retention: server-side audio bytes are deleted within 24 h of transcription; only the text result and the audit row persist.
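A minimal sketch of the ordering these rules imply: no request body for /api/v1/transcribe is built until the session has passed the consent gate. The request and response types mirror the documented body and return shapes; `grantCloudConsent`, `buildCloudTranscribeBody`, and the in-memory consent set are hypothetical, and a real implementation would presumably persist consent rather than hold it in memory.

```typescript
// Shapes of the documented POST /api/v1/transcribe body and response.
type TranscribeRequest =
  | { audio_url: string; lang?: string; model?: string }
  | { audio_b64: string; lang?: string; model?: string };

interface TranscribeResponse {
  text: string;
  segments: unknown[]; // per-segment schema is not specified on this page
  language: string;
  duration_ms: number;
}

// Hypothetical in-memory record of sessions whose user accepted the
// one-time cloud-consent gate. Illustration only, not a Synalux API.
const consentedSessions = new Set<string>();

function grantCloudConsent(sessionId: string): void {
  consentedSessions.add(sessionId);
}

// Refuse to build an upload body for a session that has not passed the gate.
function buildCloudTranscribeBody(
  sessionId: string,
  audio: { url: string } | { b64: string },
  lang?: string,
): TranscribeRequest {
  if (!consentedSessions.has(sessionId)) {
    throw new Error(`No cloud-transcription consent for session ${sessionId}`);
  }
  // 'whisper-large-v3' is the documented default; sent explicitly here.
  const options = lang ? { lang, model: "whisper-large-v3" } : { model: "whisper-large-v3" };
  return "url" in audio
    ? { audio_url: audio.url, ...options }
    : { audio_b64: audio.b64, ...options };
}

// Sending it (sketch, not run here):
// const res = await fetch("/api/v1/transcribe", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildCloudTranscribeBody(sessionId, { b64 })),
// });
// const result = (await res.json()) as TranscribeResponse;
```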
Plans
| Feature | Free | Standard | Advanced | Enterprise |
|---|---|---|---|---|
| In-browser Whisper (live dictation) | โ | โ | โ | โ |
| AAC voice input (in-browser) | โ | โ | โ | โ |
| Cloud Whisper-large-v3 (long-form) | โ | โ 30 min/mo | โ 5 hr/mo | โ unlimited |
| Speaker-attribution (WhisperX) | โ | โ | โ | โ |
| Custom medical vocabulary boost | โ | โ | โ | โ |
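
The per-plan cloud allowances translate into a simple pre-upload check. This is a hypothetical sketch: the plan keys, `fitsCloudQuota`, and rounding each job up to whole minutes are assumptions about how usage might be metered, not documented behavior. Free is left out because the table lists no cloud quota for it.

```typescript
// Monthly cloud Whisper-large-v3 allowance in minutes, from the Plans table.
const CLOUD_MINUTES_PER_MONTH = {
  standard: 30,
  advanced: 5 * 60,
  enterprise: Number.POSITIVE_INFINITY, // unlimited
} as const;

type PaidPlan = keyof typeof CLOUD_MINUTES_PER_MONTH;

// Would this recording fit in the remaining allowance?
// Assumption: each job is billed in whole minutes, rounded up.
function fitsCloudQuota(plan: PaidPlan, usedMinutesThisMonth: number, audioDurationMs: number): boolean {
  const jobMinutes = Math.ceil(audioDurationMs / 60_000);
  return usedMinutesThisMonth + jobMinutes <= CLOUD_MINUTES_PER_MONTH[plan];
}
```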
Inter-Module Integration
- SOAP / Clinical Notes: primary consumer; live dictation flow.
- Prism AAC: voice input → keyboard pre-fill.
- Telehealth: in-call live captions + recording-time transcription.
- Mail: voice replies (record → transcribe → edit → send).
- Translation: transcribed text can pipe directly into the Translation module.