# FEATURE-002: LiveKit Voice Call with Julia AI
## Summary
A full-featured voice call with Julia AI via LiveKit Cloud. The user taps the "Start Voice Call" button, a phone-style call screen opens, and they can talk to Julia AI by voice.
## Status: 🔴 Not Started (full rework required)
## Priority: Critical
## Problem Statement
The current implementation has the following problems:
1. **STT (Speech-to-Text) is unreliable** — the microphone is sometimes detected, sometimes not
2. **TTS works** — Julia's voice is audible
3. **The code is complex and tangled** — lots of legacy code, polyfills, and hacks
4. **No clear architecture** — everything lives in one file, voice-call.tsx
## Root Cause Analysis
### Why the microphone is unreliable:
1. **iOS AudioSession** — misconfiguration or a race condition during setup
2. **registerGlobals()** — the WebRTC polyfills may not finish initializing in time
3. **Permissions** — microphone access may be denied, or the mic may be held by another process
4. **Event handling** — LiveKit events can be lost
### What works:
- LiveKit Cloud connection ✅
- Token generation ✅
- TTS (Deepgram Asteria) ✅
- Backend agent (Julia AI) ✅
---
## Architecture
### System Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│                       WellNuo Lite App (iOS)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │  Voice Tab   │───▶│ VoiceCallScreen  │───▶│   LiveKit Room   │   │
│  │   (entry)    │    │   (fullscreen)   │    │     (WebRTC)     │   │
│  └──────────────┘    └──────────────────┘    └──────────────────┘   │
│                                │                      │             │
│                                ▼                      ▼             │
│                         ┌──────────────┐       ┌──────────────┐     │
│                         │useLiveKitRoom│       │ AudioSession │     │
│                         │    (hook)    │       │ (iOS native) │     │
│                         └──────────────┘       └──────────────┘     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                  │ WebSocket + WebRTC
┌─────────────────────────────────────────────────────────────────────┐
│                            LiveKit Cloud                            │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                 │
│  Participants: user + julia-agent                                   │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                  │ Agent dispatch
┌─────────────────────────────────────────────────────────────────────┐
│                        Julia AI Agent (Python)                      │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘
```
### Data Flow
```
User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                              ↓
                                                     WellNuo API (LLM)
                                                              ↓
Agent receives text → Deepgram TTS (audio) → LiveKit Cloud → WebRTC → iOS Speaker → User hears Julia
```
---
## Technical Requirements
### Dependencies (package.json)
```json
{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}
```
### iOS Permissions (app.json)
```json
{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}
```
### Token Server (already exists)
- **URL**: `https://wellnuo.smartlaunchhub.com/julia/token`
- **Method**: POST
- **Body**: `{ "userId": "string" }`
- **Response**: `{ "success": true, "data": { "token", "roomName", "wsUrl" } }`
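
For reference, the contract above maps onto a small fetch wrapper. The real implementation already exists in `services/livekitService.ts`; the following is only a sketch of the request/response shape, and the name `fetchVoiceToken` is hypothetical:

```typescript
// Sketch of the token request; production code lives in services/livekitService.ts.
export interface TokenData {
  token: string;
  roomName: string;
  wsUrl: string;
}

// Hypothetical helper name; mirrors the documented POST /julia/token contract.
export async function fetchVoiceToken(userId: string): Promise<TokenData> {
  const res = await fetch('https://wellnuo.smartlaunchhub.com/julia/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId }),
  });
  if (!res.ok) throw new Error(`Token request failed with HTTP ${res.status}`);
  const json = await res.json();
  if (!json.success) throw new Error('Token server reported failure');
  return json.data as TokenData;
}
```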
---
## Implementation Steps
### Phase 1: Cleanup (DELETE old code)
- [ ] 1.1. Delete `app/voice-call.tsx` (current broken implementation)
- [ ] 1.2. Keep `app/(tabs)/voice.tsx` (entry point) but simplify
- [ ] 1.3. Keep `services/livekitService.ts` (token fetching)
- [ ] 1.4. Keep `contexts/VoiceTranscriptContext.tsx` (transcript storage)
- [ ] 1.5. Delete `components/VoiceIndicator.tsx` (unused)
- [ ] 1.6. Delete `polyfills/livekit-globals.ts` (not needed with proper setup)
### Phase 2: New Architecture
- [ ] 2.1. Create `hooks/useLiveKitRoom.ts` — encapsulate all LiveKit logic
- [ ] 2.2. Create `app/voice-call.tsx` — simple UI component using the hook
- [ ] 2.3. Create `utils/audioSession.ts` — iOS AudioSession helper
### Phase 3: useLiveKitRoom Hook
**File**: `hooks/useLiveKitRoom.ts`
```typescript
interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;
  // Call info
  roomName: string | null;
  callDuration: number; // seconds
  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking
  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
```
**Implementation requirements**:
1. MUST call `registerGlobals()` BEFORE importing `livekit-client`
2. MUST configure iOS AudioSession BEFORE connecting to room
3. MUST handle all RoomEvents properly
4. MUST cleanup on unmount (disconnect, stop audio session)
5. MUST handle background/foreground transitions
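
A minimal sketch of the hook satisfying the requirements above, assuming the `fetchVoiceToken` and `configureAudioForVoiceCall` helpers sketched elsewhere in this spec (the `@/` path alias and exact event wiring are illustrative, not final):

```typescript
// hooks/useLiveKitRoom.ts — minimal sketch, not the final implementation.
// NOTE: registerGlobals() from '@livekit/react-native' must already have run
// at app entry, before any module imports 'livekit-client' (see Key Principles).
import { useCallback, useEffect, useRef, useState } from 'react';
import { Room, RoomEvent } from 'livekit-client';
import { configureAudioForVoiceCall, stopAudioSession } from '@/utils/audioSession';
import { fetchVoiceToken } from '@/services/livekitService'; // hypothetical name

type CallState = 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';

export function useLiveKitRoom({ userId }: { userId: string }) {
  const roomRef = useRef<Room | null>(null);
  const [state, setState] = useState<CallState>('idle');
  const [error, setError] = useState<string | null>(null);

  const connect = useCallback(async () => {
    setState('connecting');
    try {
      await configureAudioForVoiceCall(); // iOS audio BEFORE connecting
      const { token, wsUrl } = await fetchVoiceToken(userId);
      const room = new Room();
      room.on(RoomEvent.Disconnected, () => setState('disconnected'));
      room.on(RoomEvent.Reconnecting, () => setState('reconnecting'));
      room.on(RoomEvent.Reconnected, () => setState('connected'));
      roomRef.current = room;
      await room.connect(wsUrl, token);
      await room.localParticipant.setMicrophoneEnabled(true); // mic LAST
      setState('connected');
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e));
      setState('error');
    }
  }, [userId]);

  const disconnect = useCallback(async () => {
    await roomRef.current?.disconnect(); // leave the room first
    roomRef.current = null;
    await stopAudioSession();            // then release iOS audio
    setState('disconnected');
  }, []);

  // Cleanup on unmount: never leave a live room or audio session behind.
  useEffect(() => {
    return () => {
      roomRef.current?.disconnect();
      stopAudioSession();
    };
  }, []);

  // callDuration, isMuted, isSpeaking, toggleMute omitted for brevity.
  return { state, error, connect, disconnect };
}
```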
### Phase 4: iOS AudioSession Configuration
**Critical for the microphone to work!**
```typescript
// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}
```
### Phase 5: Voice Call Screen UI
**File**: `app/voice-call.tsx`
Simple, clean UI:
- Avatar with Julia "J" letter
- Call duration timer
- Status text (Connecting... / Connected / Julia is speaking...)
- Mute button
- End call button
- Debug logs toggle (for development)
**NO complex logic in this file** — all LiveKit logic in the hook!
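
A sketch of how thin the screen can be once the hook owns the logic, assuming the full `UseLiveKitRoomReturn` interface from Phase 3 (styling, avatar, and timer polish omitted; names are illustrative):

```tsx
// app/voice-call.tsx — minimal sketch; the real screen adds layout and styles.
import React from 'react';
import { Pressable, Text, View } from 'react-native';
import { useLiveKitRoom } from '@/hooks/useLiveKitRoom'; // hypothetical path alias

export default function VoiceCallScreen() {
  const { state, error, isMuted, isSpeaking, callDuration, connect, disconnect, toggleMute } =
    useLiveKitRoom({ userId: 'demo-user' }); // userId would come from auth context

  const status =
    state === 'connecting' ? 'Connecting…' :
    isSpeaking ? 'Julia is speaking…' :
    state === 'connected' ? 'Connected' : state;

  return (
    <View>
      <Text>J</Text> {/* Julia avatar placeholder */}
      <Text>{status}</Text>
      <Text>{callDuration}s</Text>
      {error && <Text>{error}</Text>}
      <Pressable onPress={toggleMute}>
        <Text>{isMuted ? 'Unmute' : 'Mute'}</Text>
      </Pressable>
      <Pressable onPress={state === 'connected' ? disconnect : connect}>
        <Text>{state === 'connected' ? 'End Call' : 'Start Call'}</Text>
      </Pressable>
    </View>
  );
}
```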
### Phase 6: Testing Checklist
- [ ] 6.1. Fresh app launch → Start call → Can hear Julia greeting
- [ ] 6.2. Speak → Julia responds → Conversation works
- [ ] 6.3. Mute → Unmute → Still works
- [ ] 6.4. End call → Clean disconnect
- [ ] 6.5. App to background → Audio continues
- [ ] 6.6. App to foreground → Still connected
- [ ] 6.7. Multiple calls in a row → No memory leaks
- [ ] 6.8. No microphone permission → Shows error
---
## Files to Create/Modify
| File | Action | Description |
|------|--------|-------------|
| `hooks/useLiveKitRoom.ts` | CREATE | Main LiveKit hook with all logic |
| `utils/audioSession.ts` | CREATE | iOS AudioSession helpers |
| `app/voice-call.tsx` | REPLACE | Simple UI using the hook |
| `app/(tabs)/voice.tsx` | SIMPLIFY | Just entry point, remove debug UI |
| `services/livekitService.ts` | KEEP | Token fetching (already works) |
| `contexts/VoiceTranscriptContext.tsx` | KEEP | Transcript storage |
| `components/VoiceIndicator.tsx` | DELETE | Not needed |
| `polyfills/livekit-globals.ts` | DELETE | Not needed |
---
## Key Principles
### 1. Separation of Concerns
- **Hook** handles ALL LiveKit/WebRTC logic
- **Screen** only renders UI based on hook state
- **Utils** for platform-specific code (AudioSession)
### 2. Proper Initialization Order
```
1. registerGlobals() — WebRTC polyfills
2. configureAudioForVoiceCall() — iOS audio
3. getToken() — fetch from server
4. room.connect() — connect to LiveKit
5. room.localParticipant.setMicrophoneEnabled(true) — enable mic
```
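Step 1 is easiest to enforce with a dedicated side-effect module, because ES imports are hoisted and evaluated before any statement in the importing file runs. A sketch (the file name is illustrative):

```typescript
// polyfill-setup.ts (illustrative name): evaluating this module installs the
// WebRTC globals (RTCPeerConnection, etc.) that livekit-client expects to find.
import { registerGlobals } from '@livekit/react-native';

registerGlobals();
```

The app entry point then lists `import './polyfill-setup';` as its very first import, before anything that directly or transitively imports `livekit-client`; imported modules evaluate in declaration order, so the globals are in place in time.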
### 3. Proper Cleanup Order
```
1. room.disconnect() — leave room
2. stopAudioSession() — release iOS audio
3. Clear all refs and state
```
### 4. Error Handling
- Every async operation wrapped in try/catch
- User-friendly error messages
- Automatic retry for network issues
- Graceful degradation
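
For the automatic-retry point above, a hypothetical wrapper along these lines could guard idempotent steps such as token fetching (the name `withRetry` and the backoff policy are illustrative):

```typescript
// Hypothetical retry wrapper for transient network failures (sketch).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      // Linear backoff between attempts: 1s, 2s, 3s…
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * (i + 1)));
    }
  }
  throw lastError;
}

// Usage: wrap only idempotent steps, e.g.
// const tokenData = await withRetry(() => fetchVoiceToken(userId));
```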
---
## Success Criteria
1. ✅ User can start voice call and hear Julia greeting
2. ✅ User can speak and Julia understands (STT works reliably)
3. ✅ Julia responds with voice (TTS works)
4. ✅ Conversation can continue back and forth
5. ✅ Mute/unmute works
6. ✅ End call cleanly disconnects
7. ✅ No console errors or warnings
8. ✅ Works on iOS device (not just simulator)
---
## Related Links
- [LiveKit React Native SDK](https://docs.livekit.io/client-sdk-js/react-native/)
- [LiveKit Agents Python](https://docs.livekit.io/agents/)
- [Deepgram STT/TTS](https://deepgram.com/)
- [iOS AVAudioSession](https://developer.apple.com/documentation/avfaudio/avaudiosession)
---
## Notes
### Why the previous approach failed:
1. **Too much code in one file** — voice-call.tsx had 900+ lines with all the logic mixed together
2. **Polyfills applied incorrectly** — the Event class polyfill lived inside the component
3. **AudioSession configured too late** — sometimes after connect() had already started
4. **No proper error boundaries** — errors failed silently
5. **Race conditions** — multiple async operations without proper sequencing
### What's different this time:
1. **Hook-based architecture** — single source of truth for state
2. **Proper initialization sequence** — documented and enforced
3. **Clean separation** — UI knows nothing about WebRTC
4. **Comprehensive logging** — every step logged for debugging
5. **Test-driven** — write tests before implementation