# FEATURE-002: LiveKit Voice Call with Julia AI

## Summary

A full voice call with Julia AI via LiveKit Cloud. The user taps the "Start Voice Call" button, a phone-style call screen opens, and they can talk to Julia AI by voice.

## Status: 🔴 Not Started (full rework required)

## Priority: Critical

## Problem Statement

The current implementation has the following problems:

1. **STT (Speech-to-Text) is unstable**: the microphone is sometimes detected, sometimes not
2. **TTS works**: Julia's voice is audible
3. **The code is complex and tangled**: lots of legacy code, polyfills, and hacks
4. **No clear architecture**: everything lives in one file, voice-call.tsx

## Root Cause Analysis

### Why the microphone is unstable:

1. **iOS AudioSession**: misconfiguration, or a race condition during setup
2. **registerGlobals()**: the WebRTC polyfills may not finish initializing in time
3. **Permissions**: the microphone may not be granted, or may be held by another process
4. **Event handling**: LiveKit events can get lost

### What works:

- LiveKit Cloud connection ✅
- Token generation ✅
- TTS (Deepgram Asteria) ✅
- Backend agent (Julia AI) ✅

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                        WellNuo Lite App (iOS)                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │  Voice Tab   │───▶│ VoiceCallScreen  │───▶│  LiveKit Room    │   │
│  │  (entry)     │    │  (fullscreen)    │    │  (WebRTC)        │   │
│  └──────────────┘    └──────────────────┘    └──────────────────┘   │
│                               │                       │             │
│                               ▼                       ▼             │
│                        ┌──────────────┐        ┌──────────────┐     │
│                        │useLiveKitRoom│        │ AudioSession │     │
│                        │   (hook)     │        │ (iOS native) │     │
│                        └──────────────┘        └──────────────┘     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ WebSocket + WebRTC
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            LiveKit Cloud                            │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                 │
│  Participants: user + julia-agent                                   │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Agent dispatch
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Julia AI Agent (Python)                      │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘
```

### Data Flow

```
User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                              │
                                                              ▼
                                                     WellNuo API (LLM)
                                                              │
                                                              ▼
Agent receives text → Deepgram TTS (audio) → LiveKit Cloud → WebRTC
                                                              │
                                                              ▼
                                              iOS Speaker → User hears Julia
```

---

## Technical Requirements

### Dependencies (package.json)

```json
{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}
```

### iOS Permissions (app.json)

```json
{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}
```

### Token Server (already exists)

- **URL**: `https://wellnuo.smartlaunchhub.com/julia/token`
- **Method**: POST
- **Body**: `{ "userId": "string" }`
- **Response**: `{ "success": true, "data": { "token", "roomName", "wsUrl" } }`
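The token service already exists; the sketch below is only a reference client for this endpoint. The URL, method, body, and response shape come from the spec above, while the `getJuliaToken` name and `TokenData` type are illustrative.

```typescript
// services/livekitService.ts (reference sketch): fetch LiveKit credentials.
// Endpoint, method, and payload are from the spec; names are illustrative.
interface TokenData {
  token: string;
  roomName: string;
  wsUrl: string;
}

export async function getJuliaToken(userId: string): Promise<TokenData> {
  const res = await fetch('https://wellnuo.smartlaunchhub.com/julia/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId }),
  });
  if (!res.ok) {
    throw new Error(`Token request failed: HTTP ${res.status}`);
  }
  const json = await res.json();
  if (!json.success) {
    throw new Error('Token server returned success: false');
  }
  return json.data as TokenData;
}
```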
---

## Implementation Steps

### Phase 1: Cleanup (DELETE old code)

- [ ] 1.1. Delete `app/voice-call.tsx` (current broken implementation)
- [ ] 1.2. Keep `app/(tabs)/voice.tsx` (entry point) but simplify it
- [ ] 1.3. Keep `services/livekitService.ts` (token fetching)
- [ ] 1.4. Keep `contexts/VoiceTranscriptContext.tsx` (transcript storage)
- [ ] 1.5. Delete `components/VoiceIndicator.tsx` (unused)
- [ ] 1.6. Delete `polyfills/livekit-globals.ts` (not needed with a proper setup)

### Phase 2: New Architecture

- [ ] 2.1. Create `hooks/useLiveKitRoom.ts`: encapsulates all LiveKit logic
- [ ] 2.2. Create `app/voice-call.tsx`: simple UI component using the hook
- [ ] 2.3. Create `utils/audioSession.ts`: iOS AudioSession helper

### Phase 3: useLiveKitRoom Hook

**File**: `hooks/useLiveKitRoom.ts`

```typescript
interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;

  // Call info
  roomName: string | null;
  callDuration: number; // seconds

  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking

  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
```

**Implementation requirements** (a connect/cleanup sketch follows this list):

1. MUST call `registerGlobals()` BEFORE importing `livekit-client`
2. MUST configure the iOS AudioSession BEFORE connecting to the room
3. MUST handle all RoomEvents properly
4. MUST clean up on unmount (disconnect, stop the audio session)
5. MUST handle background/foreground transitions
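A minimal sketch of the connect and cleanup paths, assuming the token helper from `services/livekitService.ts` (here called `getJuliaToken`, an illustrative name) and the audio helpers from Phase 4. `Room`, `RoomEvent`, and `setMicrophoneEnabled` are real `livekit-client` APIs; error reporting, transcripts, mute, and the duration timer are elided.

```typescript
// hooks/useLiveKitRoom.ts (sketch): connect/cleanup skeleton only.
// registerGlobals() is assumed to have already run at the app entry
// (see the entry-point sketch at the end of this document).
import { useCallback, useEffect, useRef, useState } from 'react';
import { Room, RoomEvent } from 'livekit-client';
import { configureAudioForVoiceCall, stopAudioSession } from '../utils/audioSession';
import { getJuliaToken } from '../services/livekitService'; // illustrative name

type CallState = 'idle' | 'connecting' | 'connected' | 'disconnected' | 'error';

export function useLiveKitRoom({ userId }: { userId: string }) {
  const roomRef = useRef<Room | null>(null);
  const [state, setState] = useState<CallState>('idle');

  const connect = useCallback(async () => {
    try {
      setState('connecting');
      await configureAudioForVoiceCall();                   // 1. audio session BEFORE connect
      const { token, wsUrl } = await getJuliaToken(userId); // 2. fetch credentials
      const room = new Room();
      room.on(RoomEvent.Disconnected, () => setState('disconnected'));
      await room.connect(wsUrl, token);                     // 3. join the room
      await room.localParticipant.setMicrophoneEnabled(true); // 4. publish the mic
      roomRef.current = room;
      setState('connected');
    } catch (e) {
      setState('error');
    }
  }, [userId]);

  const disconnect = useCallback(async () => {
    await roomRef.current?.disconnect(); // leave the room first...
    await stopAudioSession();            // ...then release iOS audio
    roomRef.current = null;
    setState('disconnected');
  }, []);

  // Cleanup on unmount
  useEffect(() => () => { void disconnect(); }, [disconnect]);

  return { state, connect, disconnect };
}
```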
### Phase 4: iOS AudioSession Configuration

**Critical for the microphone to work!**

```typescript
// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set the Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure the output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start the session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}
```

### Phase 5: Voice Call Screen UI

**File**: `app/voice-call.tsx`

Simple, clean UI:

- Avatar with the letter "J" for Julia
- Call duration timer
- Status text (Connecting... / Connected / Julia is speaking...)
- Mute button
- End call button
- Debug logs toggle (for development)

**NO complex logic in this file**: all LiveKit logic lives in the hook!

### Phase 6: Testing Checklist

- [ ] 6.1. Fresh app launch → Start call → Can hear Julia's greeting
- [ ] 6.2. Speak → Julia responds → Conversation works
- [ ] 6.3. Mute → Unmute → Still works
- [ ] 6.4. End call → Clean disconnect
- [ ] 6.5. App to background → Audio continues
- [ ] 6.6. App to foreground → Still connected
- [ ] 6.7. Multiple calls in a row → No memory leaks
- [ ] 6.8. No microphone permission → Shows an error

---

## Files to Create/Modify

| File | Action | Description |
|------|--------|-------------|
| `hooks/useLiveKitRoom.ts` | CREATE | Main LiveKit hook with all the logic |
| `utils/audioSession.ts` | CREATE | iOS AudioSession helpers |
| `app/voice-call.tsx` | REPLACE | Simple UI using the hook |
| `app/(tabs)/voice.tsx` | SIMPLIFY | Just the entry point; remove debug UI |
| `services/livekitService.ts` | KEEP | Token fetching (already works) |
| `contexts/VoiceTranscriptContext.tsx` | KEEP | Transcript storage |
| `components/VoiceIndicator.tsx` | DELETE | Not needed |
| `polyfills/livekit-globals.ts` | DELETE | Not needed |

---

## Key Principles

### 1. Separation of Concerns

- **Hook** handles ALL LiveKit/WebRTC logic
- **Screen** only renders UI based on hook state
- **Utils** for platform-specific code (AudioSession)

### 2. Proper Initialization Order

```
1. registerGlobals()                                 - WebRTC polyfills
2. configureAudioForVoiceCall()                      - iOS audio
3. getToken()                                        - fetch from server
4. room.connect()                                    - connect to LiveKit
5. room.localParticipant.setMicrophoneEnabled(true)  - enable mic
```

### 3. Proper Cleanup Order

```
1. room.disconnect()   - leave the room
2. stopAudioSession()  - release iOS audio
3. Clear all refs and state
```

### 4. Error Handling

- Every async operation wrapped in try/catch
- User-friendly error messages
- Automatic retry for network issues
- Graceful degradation

---

## Success Criteria

1. ✅ User can start a voice call and hear Julia's greeting
2. ✅ User can speak and Julia understands (STT works reliably)
3. ✅ Julia responds with voice (TTS works)
4. ✅ Conversation can continue back and forth
5. ✅ Mute/unmute works
6. ✅ Ending the call cleanly disconnects
7. ✅ No console errors or warnings
8. ✅ Works on an iOS device (not just the simulator)

---

## Related Links

- [LiveKit React Native SDK](https://docs.livekit.io/client-sdk-js/react-native/)
- [LiveKit Agents Python](https://docs.livekit.io/agents/)
- [Deepgram STT/TTS](https://deepgram.com/)
- [iOS AVAudioSession](https://developer.apple.com/documentation/avfaudio/avaudiosession)

---

## Notes

### Why the previous approach failed:

1. **Too much code in one file**: voice-call.tsx had 900+ lines with all the logic mixed together
2. **Polyfills applied wrong**: the Event class polyfill lived inside the component
3. **AudioSession configured too late**: sometimes after connect() had already started
4. **No proper error boundaries**: errors failed silently
5. **Race conditions**: multiple async operations without proper sequencing

### What's different this time:

1. **Hook-based architecture**: a single source of truth for state
2. **Proper initialization sequence**: documented and enforced (see the entry-point sketch below)
3. **Clean separation**: the UI knows nothing about WebRTC
4. **Comprehensive logging**: every step is logged for debugging
5. **Test-driven**: write tests before implementation
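Failure cause 2 and fix 2 both come down to where `registerGlobals()` runs. Below is a minimal sketch of the intended placement, assuming an Expo Router root layout at `app/_layout.tsx` (the file path is an assumption; `registerGlobals` is the real `@livekit/react-native` entry point):

```typescript
// app/_layout.tsx (sketch): install the WebRTC polyfills once, at app
// startup, so they exist before any screen that uses livekit-client mounts.
// The file path is an assumption (Expo Router root layout).
import { registerGlobals } from '@livekit/react-native';
import { Stack } from 'expo-router';

// Runs when the root layout module loads, i.e. before voice-call.tsx or
// useLiveKitRoom.ts execute any livekit-client code.
registerGlobals();

export default function RootLayout() {
  return <Stack />;
}
```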