# FEATURE-002: LiveKit Voice Call with Julia AI

## Summary

A full voice call with Julia AI via LiveKit Cloud. The user taps "Start Voice Call", a phone-style call screen opens, and they can talk to Julia AI by voice.

## Status: 🔴 Not Started (full rework required)

## Priority: Critical

## Problem Statement

The current implementation has the following problems:

1. **STT (Speech-to-Text) is unreliable** — the microphone is sometimes detected, sometimes not
2. **TTS works** — Julia's voice is audible
3. **The code is complex and convoluted** — lots of legacy code, polyfills, and hacks
4. **No clear architecture** — everything lives in one file, voice-call.tsx

## Root Cause Analysis

### Why the microphone is unreliable:

1. **iOS AudioSession** — misconfiguration or a race condition during setup
2. **registerGlobals()** — the WebRTC polyfills may not finish initializing in time
3. **Permissions** — the microphone may be denied or held by another process
4. **Event handling** — LiveKit events may get lost

### What works:

- LiveKit Cloud connection ✅
- Token generation ✅
- TTS (Deepgram Asteria) ✅
- Backend agent (Julia AI) ✅

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                       WellNuo Lite App (iOS)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │  Voice Tab   │───▶│ VoiceCallScreen  │───▶│   LiveKit Room   │   │
│  │   (entry)    │    │   (fullscreen)   │    │     (WebRTC)     │   │
│  └──────────────┘    └──────────────────┘    └──────────────────┘   │
│                               │                       │             │
│                               ▼                       ▼             │
│                       ┌──────────────┐        ┌──────────────┐      │
│                       │useLiveKitRoom│        │ AudioSession │      │
│                       │    (hook)    │        │ (iOS native) │      │
│                       └──────────────┘        └──────────────┘      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ WebSocket + WebRTC
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            LiveKit Cloud                            │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                 │
│  Participants: user + julia-agent                                   │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Agent dispatch
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Julia AI Agent (Python)                       │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘
```

### Data Flow

```
User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                              │
                                                              ▼
                                                      WellNuo API (LLM)
                                                              │
                                                              ▼
Agent receives text → Deepgram TTS (audio) → LiveKit Cloud → WebRTC
                                                              │
                                                              ▼
                                         iOS Speaker → User hears Julia
```

---

## Technical Requirements

### Dependencies (package.json)

```json
{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}
```
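
Of these, `@livekit/react-native` and `livekit-client` provide the WebRTC/room layer, while `expo-keep-awake` keeps the screen on during an active call. A minimal keep-awake sketch (the component name is hypothetical; you can also call `useKeepAwake()` directly inside the call screen):

```typescript
import { useKeepAwake } from 'expo-keep-awake';

// Mount this inside the call screen so the display does not
// sleep while a call is in progress.
export function KeepAwakeDuringCall(): null {
  useKeepAwake();
  return null;
}
```
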
### iOS Permissions (app.json)

```json
{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}
```

### Token Server (already exists)

- **URL**: `https://wellnuo.smartlaunchhub.com/julia/token`
- **Method**: POST
- **Body**: `{ "userId": "string" }`
- **Response**: `{ "success": true, "data": { "token", "roomName", "wsUrl" } }`
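
The service that calls this endpoint already exists (`services/livekitService.ts`); the sketch below only illustrates the documented request/response shape, with the exported names assumed:

```typescript
// services/livekitService.ts — shape of the token call (sketch)
export interface TokenData {
  token: string;
  roomName: string;
  wsUrl: string;
}

export async function getToken(userId: string): Promise<TokenData> {
  const res = await fetch('https://wellnuo.smartlaunchhub.com/julia/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId }),
  });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const json = await res.json();
  if (!json.success) throw new Error('Token server reported failure');
  return json.data as TokenData;
}
```
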
---

## Implementation Steps

### Phase 1: Cleanup (DELETE old code)

- [ ] 1.1. Delete `app/voice-call.tsx` (current broken implementation)
- [ ] 1.2. Keep `app/(tabs)/voice.tsx` (entry point) but simplify
- [ ] 1.3. Keep `services/livekitService.ts` (token fetching)
- [ ] 1.4. Keep `contexts/VoiceTranscriptContext.tsx` (transcript storage)
- [ ] 1.5. Delete `components/VoiceIndicator.tsx` (unused)
- [ ] 1.6. Delete `polyfills/livekit-globals.ts` (not needed with proper setup)

### Phase 2: New Architecture

- [ ] 2.1. Create `hooks/useLiveKitRoom.ts` — encapsulate all LiveKit logic
- [ ] 2.2. Create `app/voice-call.tsx` — simple UI component using the hook
- [ ] 2.3. Create `utils/audioSession.ts` — iOS AudioSession helper

### Phase 3: useLiveKitRoom Hook

**File**: `hooks/useLiveKitRoom.ts`

```typescript
interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;

  // Call info
  roomName: string | null;
  callDuration: number; // seconds

  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking

  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
```
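
How the screen might consume this hook — a sketch, not the final UI; the `userId` value is illustrative:

```typescript
import React, { useEffect } from 'react';
import { View, Text, Button } from 'react-native';
import { useLiveKitRoom } from '../hooks/useLiveKitRoom';

export default function VoiceCallScreen() {
  const { state, callDuration, isMuted, connect, disconnect, toggleMute } =
    useLiveKitRoom({ userId: 'demo-user' });

  // Connect on mount; the hook is also expected to clean up on unmount.
  useEffect(() => {
    connect();
    return () => {
      disconnect();
    };
  }, []);

  return (
    <View>
      <Text>{state === 'connected' ? `In call: ${callDuration}s` : state}</Text>
      <Button title={isMuted ? 'Unmute' : 'Mute'} onPress={toggleMute} />
      <Button title="End call" onPress={() => disconnect()} />
    </View>
  );
}
```
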
**Implementation requirements**:

1. MUST call `registerGlobals()` BEFORE importing `livekit-client` (see the entry-point sketch below)
2. MUST configure the iOS AudioSession BEFORE connecting to the room
3. MUST handle all RoomEvents properly
4. MUST clean up on unmount (disconnect, stop the audio session)
5. MUST handle background/foreground transitions

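On requirement 1: ES module imports are hoisted, so calling `registerGlobals()` next to an `import` in the same file does not guarantee ordering. A safer pattern is to call it in the app entry module and load the rest of the app via a non-hoisted `require` (entry file layout assumed):

```typescript
// index.js — app entry point (path assumed)
import { registerGlobals } from '@livekit/react-native';

// Install the WebRTC polyfills before anything imports livekit-client.
registerGlobals();

// require() is not hoisted, so the app loads only after globals exist.
require('expo-router/entry');
```
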
### Phase 4: iOS AudioSession Configuration

**Critical for the microphone to work!**

```typescript
// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}
```
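
The hook must also react to background/foreground transitions (requirement 5). With the `audio`/`voip` background modes from app.json, audio should keep flowing in the background; a minimal listener sketch (helper name hypothetical):

```typescript
import { AppState, AppStateStatus } from 'react-native';

// Subscribe to app state changes; returns an unsubscribe function
// for the hook's cleanup path.
export function watchAppState(
  onChange: (status: AppStateStatus) => void,
): () => void {
  const subscription = AppState.addEventListener('change', onChange);
  return () => subscription.remove();
}
```
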
### Phase 5: Voice Call Screen UI

**File**: `app/voice-call.tsx`

Simple, clean UI:

- Avatar with Julia "J" letter
- Call duration timer
- Status text (Connecting... / Connected / Julia is speaking...)
- Mute button
- End call button
- Debug logs toggle (for development)

**NO complex logic in this file** — all LiveKit logic in the hook!

### Phase 6: Testing Checklist

- [ ] 6.1. Fresh app launch → Start call → Can hear Julia greeting
- [ ] 6.2. Speak → Julia responds → Conversation works
- [ ] 6.3. Mute → Unmute → Still works
- [ ] 6.4. End call → Clean disconnect
- [ ] 6.5. App to background → Audio continues
- [ ] 6.6. App to foreground → Still connected
- [ ] 6.7. Multiple calls in a row → No memory leaks
- [ ] 6.8. No microphone permission → Shows error

---

## Files to Create/Modify

| File | Action | Description |
|------|--------|-------------|
| `hooks/useLiveKitRoom.ts` | CREATE | Main LiveKit hook with all logic |
| `utils/audioSession.ts` | CREATE | iOS AudioSession helpers |
| `app/voice-call.tsx` | REPLACE | Simple UI using the hook |
| `app/(tabs)/voice.tsx` | SIMPLIFY | Just the entry point; remove debug UI |
| `services/livekitService.ts` | KEEP | Token fetching (already works) |
| `contexts/VoiceTranscriptContext.tsx` | KEEP | Transcript storage |
| `components/VoiceIndicator.tsx` | DELETE | Not needed |
| `polyfills/livekit-globals.ts` | DELETE | Not needed |

---

## Key Principles

### 1. Separation of Concerns

- **Hook** handles ALL LiveKit/WebRTC logic
- **Screen** only renders UI based on hook state
- **Utils** for platform-specific code (AudioSession)

### 2. Proper Initialization Order

```
1. registerGlobals() — WebRTC polyfills
2. configureAudioForVoiceCall() — iOS audio
3. getToken() — fetch from server
4. room.connect() — connect to LiveKit
5. room.localParticipant.setMicrophoneEnabled(true) — enable mic
```

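A sketch of steps 2–5 as one helper, using real `livekit-client` calls (`startVoiceCall` and the `getToken` import are assumed names; step 1 happens once at the app entry point, see Phase 3):

```typescript
import { Room } from 'livekit-client';
import { configureAudioForVoiceCall } from '../utils/audioSession';
import { getToken } from '../services/livekitService';

export async function startVoiceCall(userId: string): Promise<Room> {
  await configureAudioForVoiceCall();                     // 2. iOS audio first
  const { token, wsUrl } = await getToken(userId);        // 3. credentials
  const room = new Room();
  await room.connect(wsUrl, token);                       // 4. join the room
  await room.localParticipant.setMicrophoneEnabled(true); // 5. publish mic
  return room;
}
```
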
### 3. Proper Cleanup Order

```
1. room.disconnect() — leave room
2. stopAudioSession() — release iOS audio
3. Clear all refs and state
```

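And the mirror-image teardown (helper name hypothetical):

```typescript
import { Room } from 'livekit-client';
import { stopAudioSession } from '../utils/audioSession';

export async function endVoiceCall(room: Room): Promise<void> {
  await room.disconnect();  // 1. leave the room first
  await stopAudioSession(); // 2. then release iOS audio
  // 3. the hook clears its refs and state after this resolves
}
```
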
### 4. Error Handling

- Every async operation wrapped in try/catch
- User-friendly error messages
- Automatic retry for network issues (see the sketch below)
- Graceful degradation

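A small generic wrapper covers the retry point, e.g. around the token fetch (`withRetry` is a hypothetical helper, not an existing API):

```typescript
// Retry a transient-failure-prone async call with linear backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait a bit longer after each failed attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}

// Usage: const creds = await withRetry(() => getToken(userId));
```
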
---

## Success Criteria

1. ✅ User can start a voice call and hear Julia's greeting
2. ✅ User can speak and Julia understands (STT works reliably)
3. ✅ Julia responds with voice (TTS works)
4. ✅ The conversation continues back and forth
5. ✅ Mute/unmute works
6. ✅ Ending the call disconnects cleanly
7. ✅ No console errors or warnings
8. ✅ Works on a physical iOS device (not just the simulator)

---

## Related Links

- [LiveKit React Native SDK](https://docs.livekit.io/client-sdk-js/react-native/)
- [LiveKit Agents Python](https://docs.livekit.io/agents/)
- [Deepgram STT/TTS](https://deepgram.com/)
- [iOS AVAudioSession](https://developer.apple.com/documentation/avfaudio/avaudiosession)

---

## Notes

### Why the previous approach failed:

1. **Too much code in one file** — voice-call.tsx had 900+ lines with all the logic mixed together
2. **Polyfills applied incorrectly** — the Event class polyfill lived inside the component
3. **AudioSession configured too late** — sometimes after connect() had already started
4. **No proper error boundaries** — errors failed silently
5. **Race conditions** — multiple async operations without proper sequencing

### What's different this time:

1. **Hook-based architecture** — a single source of truth for state
2. **Proper initialization sequence** — documented and enforced
3. **Clean separation** — the UI knows nothing about WebRTC
4. **Comprehensive logging** — every step logged for debugging
5. **Test-driven** — write tests before the implementation (see the sketch below)
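
For the test-driven point, the audio-session helper is an easy starting place because `AudioSession` can be mocked wholesale (the test path and mock shape are assumptions, not existing files):

```typescript
// __tests__/audioSession.test.ts — a starting point, not a full suite
jest.mock('@livekit/react-native', () => ({
  AudioSession: {
    setAppleAudioConfiguration: jest.fn().mockResolvedValue(undefined),
    configureAudio: jest.fn().mockResolvedValue(undefined),
    startAudioSession: jest.fn().mockResolvedValue(undefined),
    stopAudioSession: jest.fn().mockResolvedValue(undefined),
  },
}));
jest.mock('react-native', () => ({ Platform: { OS: 'ios' } }));

import { AudioSession } from '@livekit/react-native';
import { configureAudioForVoiceCall } from '../utils/audioSession';

test('starts the iOS audio session after configuring it', async () => {
  await configureAudioForVoiceCall();
  expect(AudioSession.setAppleAudioConfiguration).toHaveBeenCalled();
  expect(AudioSession.startAudioSession).toHaveBeenCalled();
});
```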