# FEATURE-002: LiveKit Voice Call with Julia AI

## Summary

A full voice call with Julia AI via LiveKit Cloud. The user taps the "Start Voice Call" button, a phone-style call screen opens, and they can talk to Julia AI by voice.

Status: 🔴 Not Started (full rewrite required)
Priority: Critical

## Problem Statement

The current implementation has the following problems:

- STT (Speech-to-Text) is unreliable — the microphone is sometimes detected, sometimes not
- TTS works — Julia's voice is audible
- The code is complex and tangled — lots of legacy code, polyfills, and hacks
- No clear architecture — everything lives in one file, voice-call.tsx
## Root Cause Analysis

Why the microphone is unreliable:

- iOS AudioSession — misconfiguration or a race condition during setup
- registerGlobals() — WebRTC polyfills may not finish initializing in time
- Permissions — microphone access may be denied, or the mic may be held by another process
- Event handling — LiveKit events may be dropped

What works:

- LiveKit Cloud connection ✅
- Token generation ✅
- TTS (Deepgram Asteria) ✅
- Backend agent (Julia AI) ✅
## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                       WellNuo Lite App (iOS)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │  Voice Tab   │───▶│ VoiceCallScreen  │───▶│   LiveKit Room   │   │
│  │   (entry)    │    │   (fullscreen)   │    │     (WebRTC)     │   │
│  └──────────────┘    └──────────────────┘    └──────────────────┘   │
│                              │                        │             │
│                              ▼                        ▼             │
│                      ┌──────────────┐        ┌──────────────┐       │
│                      │useLiveKitRoom│        │ AudioSession │       │
│                      │    (hook)    │        │ (iOS native) │       │
│                      └──────────────┘        └──────────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ WebSocket + WebRTC
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            LiveKit Cloud                            │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                 │
│  Participants: user + julia-agent                                   │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Agent dispatch
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Julia AI Agent (Python)                       │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘
```
### Data Flow

```
User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                              │
                                                              ▼
                                                      WellNuo API (LLM)
                                                              │
                                                              ▼
Agent receives text ← LiveKit Cloud ← WebRTC ← Deepgram TTS (audio)
        │
        ▼
iOS Speaker → User hears Julia
```
## Technical Requirements

### Dependencies (package.json)

```json
{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}
```

### iOS Permissions (app.json)

```json
{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}
```

### Token Server (already exists)

- URL: `https://wellnuo.smartlaunchhub.com/julia/token`
- Method: `POST`
- Body: `{ "userId": "string" }`
- Response: `{ "success": true, "data": { "token", "roomName", "wsUrl" } }`
## Implementation Steps

### Phase 1: Cleanup (DELETE old code)

- 1.1. Delete `app/voice-call.tsx` (current broken implementation)
- 1.2. Keep `app/(tabs)/voice.tsx` (entry point) but simplify it
- 1.3. Keep `services/livekitService.ts` (token fetching)
- 1.4. Keep `contexts/VoiceTranscriptContext.tsx` (transcript storage)
- 1.5. Delete `components/VoiceIndicator.tsx` (unused)
- 1.6. Delete `polyfills/livekit-globals.ts` (not needed with a proper setup)

### Phase 2: New Architecture

- 2.1. Create `hooks/useLiveKitRoom.ts` — encapsulates all LiveKit logic
- 2.2. Create `app/voice-call.tsx` — simple UI component using the hook
- 2.3. Create `utils/audioSession.ts` — iOS AudioSession helper
### Phase 3: useLiveKitRoom Hook

File: `hooks/useLiveKitRoom.ts`

```ts
interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;

  // Call info
  roomName: string | null;
  callDuration: number; // seconds

  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking

  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
```

Implementation requirements (a skeleton sketch follows this list):

- MUST call `registerGlobals()` BEFORE importing `livekit-client`
- MUST configure the iOS AudioSession BEFORE connecting to the room
- MUST handle all RoomEvents properly
- MUST clean up on unmount (disconnect, stop the audio session)
- MUST handle background/foreground transitions
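To make the required ordering concrete, here is a condensed, hypothetical skeleton of the hook. It assumes the `utils/audioSession.ts` helpers from Phase 4 and the `fetchLiveKitToken` placeholder standing in for `services/livekitService.ts`; only the LiveKit calls themselves are real API. Mute, call duration, transcripts, and AppState handling are omitted for brevity:

```ts
// hooks/useLiveKitRoom.ts — condensed sketch, not the full implementation.
import { registerGlobals } from '@livekit/react-native';
import { useCallback, useEffect, useRef, useState } from 'react';
import { Room, RoomEvent } from 'livekit-client';
import { configureAudioForVoiceCall, stopAudioSession } from '../utils/audioSession';
import { fetchLiveKitToken } from '../services/livekitService'; // placeholder name

// WebRTC globals must be registered before any livekit-client object is used
// (ES imports hoist, so in practice this means "before first use").
registerGlobals();

type CallState = 'idle' | 'connecting' | 'connected' | 'disconnected' | 'error';

export function useLiveKitRoom({ userId }: { userId: string }) {
  const roomRef = useRef<Room | null>(null);
  const [state, setState] = useState<CallState>('idle');
  const [error, setError] = useState<string | null>(null);

  const connect = useCallback(async () => {
    try {
      setState('connecting');
      await configureAudioForVoiceCall();                       // 1. iOS audio first
      const { token, wsUrl } = await fetchLiveKitToken(userId); // 2. then the token
      const room = new Room();
      roomRef.current = room;
      room.on(RoomEvent.Disconnected, () => setState('disconnected'));
      await room.connect(wsUrl, token);                         // 3. join the room
      await room.localParticipant.setMicrophoneEnabled(true);   // 4. mic last
      setState('connected');
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e));
      setState('error');
    }
  }, [userId]);

  const disconnect = useCallback(async () => {
    await roomRef.current?.disconnect(); // leave the room first...
    await stopAudioSession();            // ...then release the iOS audio session
    roomRef.current = null;
    setState('disconnected');
  }, []);

  // Cleanup on unmount so a dropped screen never leaks the audio session.
  useEffect(() => () => { void disconnect(); }, [disconnect]);

  return { state, error, connect, disconnect };
}
```

Note how the sketch enforces the initialization and cleanup orders from Key Principles below: the audio session is configured before `room.connect()`, and released only after `room.disconnect()`.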
### Phase 4: iOS AudioSession Configuration

Critical for the microphone to work!

```ts
// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}
```
### Phase 5: Voice Call Screen UI

File: `app/voice-call.tsx`

Simple, clean UI:

- Avatar with Julia "J" letter
- Call duration timer
- Status text (Connecting... / Connected / Julia is speaking...)
- Mute button
- End call button
- Debug logs toggle (for development)

NO complex logic in this file — all LiveKit logic lives in the hook! (See the sketch below.)
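An illustrative sketch of how thin the screen can stay, assuming the full `UseLiveKitRoomReturn` interface from Phase 3. Styling, the avatar treatment, and the `userId` wiring are placeholders:

```tsx
// app/voice-call.tsx — illustrative sketch only.
import React from 'react';
import { Pressable, Text, View } from 'react-native';
import { useLiveKitRoom } from '../hooks/useLiveKitRoom';

export default function VoiceCallScreen() {
  const { state, isSpeaking, isMuted, callDuration, toggleMute, disconnect } =
    useLiveKitRoom({ userId: 'demo-user' }); // userId wiring is app-specific

  const status =
    state === 'connecting' ? 'Connecting...'
    : isSpeaking ? 'Julia is speaking...'
    : 'Connected';
  const mins = Math.floor(callDuration / 60);
  const secs = String(callDuration % 60).padStart(2, '0');

  return (
    <View style={{ flex: 1, alignItems: 'center', justifyContent: 'center', gap: 16 }}>
      <Text style={{ fontSize: 64 }}>J</Text>
      <Text>{status}</Text>
      <Text>{mins}:{secs}</Text>
      <Pressable onPress={toggleMute}>
        <Text>{isMuted ? 'Unmute' : 'Mute'}</Text>
      </Pressable>
      <Pressable onPress={() => void disconnect()}>
        <Text>End call</Text>
      </Pressable>
    </View>
  );
}
```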
### Phase 6: Testing Checklist
- 6.1. Fresh app launch → Start call → Can hear Julia greeting
- 6.2. Speak → Julia responds → Conversation works
- 6.3. Mute → Unmute → Still works
- 6.4. End call → Clean disconnect
- 6.5. App to background → Audio continues
- 6.6. App to foreground → Still connected
- 6.7. Multiple calls in a row → No memory leaks
- 6.8. No microphone permission → Shows error
## Files to Create/Modify

| File | Action | Description |
|---|---|---|
| `hooks/useLiveKitRoom.ts` | CREATE | Main LiveKit hook with all logic |
| `utils/audioSession.ts` | CREATE | iOS AudioSession helpers |
| `app/voice-call.tsx` | REPLACE | Simple UI using the hook |
| `app/(tabs)/voice.tsx` | SIMPLIFY | Just entry point, remove debug UI |
| `services/livekitService.ts` | KEEP | Token fetching (already works) |
| `contexts/VoiceTranscriptContext.tsx` | KEEP | Transcript storage |
| `components/VoiceIndicator.tsx` | DELETE | Not needed |
| `polyfills/livekit-globals.ts` | DELETE | Not needed |
## Key Principles

### 1. Separation of Concerns

- The hook handles ALL LiveKit/WebRTC logic
- The screen only renders UI based on hook state
- Utils hold platform-specific code (AudioSession)

### 2. Proper Initialization Order

1. `registerGlobals()` — WebRTC polyfills
2. `configureAudioForVoiceCall()` — iOS audio
3. `getToken()` — fetch from the server
4. `room.connect()` — connect to LiveKit
5. `room.localParticipant.setMicrophoneEnabled(true)` — enable the mic

### 3. Proper Cleanup Order

1. `room.disconnect()` — leave the room
2. `stopAudioSession()` — release iOS audio
3. Clear all refs and state

### 4. Error Handling

- Every async operation wrapped in try/catch
- User-friendly error messages
- Automatic retry for network issues (see the sketch below)
- Graceful degradation
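The retry point deserves a concrete shape. A hypothetical backoff helper — `withRetry` is not part of the existing codebase — that could wrap the token fetch or the initial connect:

```ts
// Hypothetical retry helper with exponential backoff between attempts.
export async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (e) {
      lastError = e;
      // Wait 500ms, 1s, 2s, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

// Usage: const data = await withRetry(() => fetchLiveKitToken(userId));
```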
## Success Criteria
- ✅ User can start voice call and hear Julia greeting
- ✅ User can speak and Julia understands (STT works reliably)
- ✅ Julia responds with voice (TTS works)
- ✅ Conversation can continue back and forth
- ✅ Mute/unmute works
- ✅ End call cleanly disconnects
- ✅ No console errors or warnings
- ✅ Works on iOS device (not just simulator)
## Related Links

## Notes

Why the previous approach failed:

- Too much code in one file — `voice-call.tsx` had 900+ lines with all the logic mixed together
- Polyfills applied wrong — the Event class polyfill sat inside the component
- AudioSession configured too late — sometimes after `connect()` had already started
- No proper error boundaries — errors failed silently
- Race conditions — multiple async operations without proper sequencing
What's different this time:

- Hook-based architecture — single source of truth for state
- Proper initialization sequence — documented and enforced
- Clean separation — the UI knows nothing about WebRTC
- Comprehensive logging — every step logged for debugging
- Test-driven — tests written before implementation (see the sketch below)
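In that test-first spirit, a hypothetical example of a test that could be written before the hook exists, assuming Jest and @testing-library/react-native (v12+) are set up; a real suite would also mock the LiveKit and audio-session modules:

```ts
// __tests__/useLiveKitRoom.test.ts — hypothetical test-first example.
import { renderHook } from '@testing-library/react-native';
import { useLiveKitRoom } from '../hooks/useLiveKitRoom';

test('starts idle with no error and exposes the call actions', () => {
  const { result } = renderHook(() => useLiveKitRoom({ userId: 'test-user' }));

  expect(result.current.state).toBe('idle');
  expect(result.current.error).toBeNull();
  expect(typeof result.current.connect).toBe('function');
  expect(typeof result.current.disconnect).toBe('function');
});
```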