Service Health Monitoring
Internal monitoring of every Synalux dependency (database, OAuth providers, LiveKit, Inworld TTS, Anthropic, Gemini, OpenRouter, Stripe). Failures email the admin team and surface a status banner to affected users.
🩺 What's Monitored
- Database: Postgres / Supabase reachability, replication lag, RLS policy presence.
- OAuth providers: Google / Microsoft / Telegram / Meta token-refresh path health.
- LiveKit SFU: TURN reachability, room creation success rate.
- TTS: Inworld TTS-2 latency and error rate; Azure Neural fallback availability.
- AI: Anthropic, Gemini, OpenRouter latency and 5xx rate; trips the fallback chain when degraded.
- Stripe: checkout and webhook ingress.
- Storage: Supabase Storage object writes.
- Mail / SMS / chat providers: incoming webhook acceptance rate.
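
The latency and error-rate checks above, and the AI fallback chain they feed, can be illustrated with a minimal sketch. Every name and threshold here (`ProbeSample`, `Slo`, `classify`, `pickProvider`) is hypothetical and not taken from the Synalux codebase; the real SLO values and classification rules may differ.

```typescript
// Hypothetical shapes for illustration only.
type Status = "healthy" | "degraded" | "down";

interface ProbeSample {
  latencyMs: number;
  ok: boolean; // false for 5xx responses or timeouts
}

interface Slo {
  maxP95LatencyMs: number; // assumed threshold
  maxErrorRate: number; // assumed threshold, between 0 and 1
}

// Nearest-rank 95th percentile over a non-empty array of latencies.
function p95(latencies: number[]): number {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Classify a window of probe samples against the SLO.
function classify(samples: ProbeSample[], slo: Slo): Status {
  if (samples.length === 0) return "down"; // no data is treated as an outage
  const errorRate = samples.filter((s) => !s.ok).length / samples.length;
  if (errorRate >= 1) return "down"; // every probe failed
  const latency = p95(samples.map((s) => s.latencyMs));
  if (errorRate > slo.maxErrorRate || latency > slo.maxP95LatencyMs) {
    return "degraded";
  }
  return "healthy";
}

// Walk the fallback chain in order and return the first healthy provider.
function pickProvider(
  chain: string[],
  status: Record<string, Status>
): string | undefined {
  return chain.find((name) => status[name] === "healthy");
}
```

With a chain of Anthropic, Gemini, OpenRouter and Anthropic marked degraded, `pickProvider` would return Gemini; when nothing is healthy it returns `undefined` and the caller decides how to fail.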

🚨 Alert Path
- Email to admin distribution list when a dependency drops below SLO.
- In-app banner to affected users when their experience is degraded, e.g. “Voice cloning is temporarily unavailable; standard voices still work.”
- Status page at synalux.ai/status (planned) for public visibility.
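
As a rough sketch of the alert path, a status change for one dependency could be turned into an admin email plus an optional user banner. The `Alert` shape, the `bannerCopy` mapping, and the rule of alerting only on a status change are assumptions for illustration; only the banner wording comes from the text above.

```typescript
// Hypothetical types and mapping for illustration only.
type Status = "healthy" | "degraded" | "down";

interface Alert {
  channel: "email" | "banner";
  audience: "admins" | "affected-users";
  message: string;
}

// Assumed mapping from dependency key to user-facing banner copy.
const bannerCopy: Record<string, string> = {
  tts: "Voice cloning is temporarily unavailable; standard voices still work.",
};

function routeAlerts(
  dependency: string,
  previous: Status,
  current: Status
): Alert[] {
  // Alert only when the status changes to a non-healthy value,
  // so a long outage does not send one email per probe.
  if (current === "healthy" || current === previous) return [];
  const alerts: Alert[] = [
    {
      channel: "email",
      audience: "admins",
      message: `${dependency} is ${current}`,
    },
  ];
  const copy = bannerCopy[dependency];
  if (copy !== undefined) {
    alerts.push({ channel: "banner", audience: "affected-users", message: copy });
  }
  return alerts;
}
```

Dependencies without banner copy (for example a webhook ingress that users never see directly) produce the admin email only.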

🛠️ Critical Bug History
- Supabase RLS-disabled critical alert: caught when a migration accidentally dropped RLS policies on the patients table. Auto-detected within 60 seconds; admins paged; rolled back the same hour.
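
One way a probe could detect missing RLS is to read the Postgres catalog and flag protected tables with RLS disabled or no policies. This is a sketch under assumptions, not the actual detector: `pg_class.relrowsecurity` and the `pg_policies` view are standard PostgreSQL, while the `public` schema filter, the `PROTECTED_TABLES` list, and the function name are hypothetical.

```typescript
// Catalog query a probe could run. The schema filter is an assumption.
const RLS_AUDIT_SQL = `
  select c.relname as table_name,
         c.relrowsecurity as rls_enabled,
         (select count(*)::int from pg_policies p
           where p.schemaname = n.nspname
             and p.tablename = c.relname) as policy_count
  from pg_class c
  join pg_namespace n on n.oid = c.relnamespace
  where n.nspname = 'public' and c.relkind = 'r'`;

// One row of the query result above.
interface RlsRow {
  table_name: string;
  rls_enabled: boolean;
  policy_count: number;
}

// Hypothetical list of tables that must always be protected.
const PROTECTED_TABLES = ["patients"];

// Return the protected tables that are missing, have RLS disabled,
// or have zero policies attached.
function findRlsViolations(rows: RlsRow[], protectedTables: string[]): string[] {
  const byName = new Map<string, RlsRow>();
  for (const row of rows) byName.set(row.table_name, row);
  return protectedTables.filter((name) => {
    const row = byName.get(name);
    return row === undefined || !row.rls_enabled || row.policy_count === 0;
  });
}
```

Checking the policy count as well as the RLS flag matters for the failure described above: a migration that drops policies can leave `relrowsecurity` set to true.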
🏗️ Architecture
- GET /api/v1/cron/services-health: aggregate health snapshot (cron-driven)
- GET /api/v1/cron/tts-health: TTS provider latency and availability
- GET /api/v1/cron/chain-health-nightly: nightly deep probe of all dependencies
- GET /api/v1/integrations/chain-health: integration health dashboard

Probes run every 60 seconds via Vercel Cron; results are written to service_health_checks with TTL retention.
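
The aggregate snapshot and the TTL-based retention can be sketched as two small pure functions. Only the table name service_health_checks comes from the text above; the column names, the `ttlDays` parameter, and the worst-status aggregation rule are assumptions.

```typescript
// Hypothetical types for illustration only.
type Status = "healthy" | "degraded" | "down";

interface ProbeResult {
  service: string;
  status: Status;
  latencyMs: number;
}

// Assumed row shape for service_health_checks.
interface HealthCheckRow {
  service: string;
  status: Status;
  latency_ms: number;
  checked_at: string; // ISO 8601 timestamp
  expires_at: string; // rows past this time are eligible for cleanup
}

const severity: Record<Status, number> = { healthy: 0, degraded: 1, down: 2 };

// The snapshot is as bad as its worst dependency.
function overallStatus(results: ProbeResult[]): Status {
  return results.reduce<Status>(
    (worst, r) => (severity[r.status] > severity[worst] ? r.status : worst),
    "healthy"
  );
}

// Stamp each result with the probe time and a TTL-based expiry.
function toRows(results: ProbeResult[], now: Date, ttlDays: number): HealthCheckRow[] {
  const expires = new Date(now.getTime() + ttlDays * 24 * 60 * 60 * 1000);
  return results.map((r) => ({
    service: r.service,
    status: r.status,
    latency_ms: r.latencyMs,
    checked_at: now.toISOString(),
    expires_at: expires.toISOString(),
  }));
}
```

Storing an explicit `expires_at` keeps the retention rule in the data, so a periodic cleanup job only needs to delete rows whose expiry has passed.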
💳 Plans
Always on for every workspace. Admin-tier accounts see the full dashboard; other users see degraded-feature banners only.