# FEATURE-002: LiveKit Voice Call with Julia AI

## Summary

A full voice call with Julia AI via LiveKit Cloud. The user taps "Start Voice Call", a phone-style call screen opens, and they can talk to Julia AI by voice.

## Status: 🔴 Not Started (full rework required)

## Priority: Critical

## Problem Statement

The current implementation has the following problems:

1. **STT (Speech-to-Text) is unreliable** — the microphone is sometimes detected, sometimes not
2. **TTS works** — Julia's voice is audible
3. **The code is complex and convoluted** — lots of legacy code, polyfills, and hacks
4. **No clear architecture** — everything lives in one file, voice-call.tsx

## Root Cause Analysis

### Why the microphone is unreliable:

1. **iOS AudioSession** — misconfiguration or a race condition during setup
2. **registerGlobals()** — the WebRTC polyfills may not finish initializing in time
3. **Permissions** — the microphone may be denied or held by another process
4. **Event handling** — LiveKit events may get lost

### What works:

- LiveKit Cloud connection ✅
- Token generation ✅
- TTS (Deepgram Asteria) ✅
- Backend agent (Julia AI) ✅

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                       WellNuo Lite App (iOS)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │  Voice Tab   │───▶│ VoiceCallScreen  │───▶│   LiveKit Room   │   │
│  │   (entry)    │    │   (fullscreen)   │    │     (WebRTC)     │   │
│  └──────────────┘    └──────────────────┘    └──────────────────┘   │
│                               │                       │             │
│                               ▼                       ▼             │
│                       ┌──────────────┐        ┌──────────────┐      │
│                       │useLiveKitRoom│        │ AudioSession │      │
│                       │    (hook)    │        │ (iOS native) │      │
│                       └──────────────┘        └──────────────┘      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ WebSocket + WebRTC
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            LiveKit Cloud                            │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                 │
│  Participants: user + julia-agent                                   │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Agent dispatch
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Julia AI Agent (Python)                       │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘
```

### Data Flow

```
User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                              │
                                                              ▼
                                                      WellNuo API (LLM)
                                                              │
                                                              ▼
Agent receives text → Deepgram TTS (audio) → LiveKit Cloud → WebRTC
                                                              │
                                                              ▼
                                         iOS Speaker → User hears Julia
```

---

## Technical Requirements

### Dependencies (package.json)

```json
{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}
```
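
Of these, `@livekit/react-native` and `livekit-client` provide the WebRTC/room layer, while `expo-keep-awake` keeps the screen on during an active call. A minimal keep-awake sketch (the component name is hypothetical; you can also call `useKeepAwake()` directly inside the call screen):

```typescript
import { useKeepAwake } from 'expo-keep-awake';

// Mount this inside the call screen so the display does not
// sleep while a call is in progress.
export function KeepAwakeDuringCall(): null {
  useKeepAwake();
  return null;
}
```
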
### iOS Permissions (app.json)

```json
{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}
```

### Token Server (already exists)

- **URL**: `https://wellnuo.smartlaunchhub.com/julia/token`
- **Method**: POST
- **Body**: `{ "userId": "string" }`
- **Response**: `{ "success": true, "data": { "token", "roomName", "wsUrl" } }`
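
The service that calls this endpoint already exists (`services/livekitService.ts`); the sketch below only illustrates the documented request/response shape, with the exported names assumed:

```typescript
// services/livekitService.ts — shape of the token call (sketch)
export interface TokenData {
  token: string;
  roomName: string;
  wsUrl: string;
}

export async function getToken(userId: string): Promise<TokenData> {
  const res = await fetch('https://wellnuo.smartlaunchhub.com/julia/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId }),
  });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const json = await res.json();
  if (!json.success) throw new Error('Token server reported failure');
  return json.data as TokenData;
}
```
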
---

## Implementation Steps

### Phase 1: Cleanup (DELETE old code)

- [ ] 1.1. Delete `app/voice-call.tsx` (current broken implementation)
- [ ] 1.2. Keep `app/(tabs)/voice.tsx` (entry point) but simplify
- [ ] 1.3. Keep `services/livekitService.ts` (token fetching)
- [ ] 1.4. Keep `contexts/VoiceTranscriptContext.tsx` (transcript storage)
- [ ] 1.5. Delete `components/VoiceIndicator.tsx` (unused)
- [ ] 1.6. Delete `polyfills/livekit-globals.ts` (not needed with proper setup)

### Phase 2: New Architecture

- [ ] 2.1. Create `hooks/useLiveKitRoom.ts` — encapsulate all LiveKit logic
- [ ] 2.2. Create `app/voice-call.tsx` — simple UI component using the hook
- [ ] 2.3. Create `utils/audioSession.ts` — iOS AudioSession helper

### Phase 3: useLiveKitRoom Hook

**File**: `hooks/useLiveKitRoom.ts`

```typescript
interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;

  // Call info
  roomName: string | null;
  callDuration: number; // seconds

  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking

  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
```
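
How the screen might consume this hook — a sketch, not the final UI; the `userId` value is illustrative:

```typescript
import React, { useEffect } from 'react';
import { View, Text, Button } from 'react-native';
import { useLiveKitRoom } from '../hooks/useLiveKitRoom';

export default function VoiceCallScreen() {
  const { state, callDuration, isMuted, connect, disconnect, toggleMute } =
    useLiveKitRoom({ userId: 'demo-user' });

  // Connect on mount; the hook is also expected to clean up on unmount.
  useEffect(() => {
    connect();
    return () => {
      disconnect();
    };
  }, []);

  return (
    <View>
      <Text>{state === 'connected' ? `In call: ${callDuration}s` : state}</Text>
      <Button title={isMuted ? 'Unmute' : 'Mute'} onPress={toggleMute} />
      <Button title="End call" onPress={() => disconnect()} />
    </View>
  );
}
```
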
**Implementation requirements**:

1. MUST call `registerGlobals()` BEFORE importing `livekit-client` (see the entry-point sketch below)
2. MUST configure the iOS AudioSession BEFORE connecting to the room
3. MUST handle all RoomEvents properly
4. MUST clean up on unmount (disconnect, stop the audio session)
5. MUST handle background/foreground transitions

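On requirement 1: ES module imports are hoisted, so calling `registerGlobals()` next to an `import` in the same file does not guarantee ordering. A safer pattern is to call it in the app entry module and load the rest of the app via a non-hoisted `require` (entry file layout assumed):

```typescript
// index.js — app entry point (path assumed)
import { registerGlobals } from '@livekit/react-native';

// Install the WebRTC polyfills before anything imports livekit-client.
registerGlobals();

// require() is not hoisted, so the app loads only after globals exist.
require('expo-router/entry');
```
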
### Phase 4: iOS AudioSession Configuration

**Critical for the microphone to work!**

```typescript
// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}
```
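
The hook must also react to background/foreground transitions (requirement 5). With the `audio`/`voip` background modes from app.json, audio should keep flowing in the background; a minimal listener sketch (helper name hypothetical):

```typescript
import { AppState, AppStateStatus } from 'react-native';

// Subscribe to app state changes; returns an unsubscribe function
// for the hook's cleanup path.
export function watchAppState(
  onChange: (status: AppStateStatus) => void,
): () => void {
  const subscription = AppState.addEventListener('change', onChange);
  return () => subscription.remove();
}
```
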
### Phase 5: Voice Call Screen UI

**File**: `app/voice-call.tsx`

Simple, clean UI:

- Avatar with Julia "J" letter
- Call duration timer
- Status text (Connecting... / Connected / Julia is speaking...)
- Mute button
- End call button
- Debug logs toggle (for development)

**NO complex logic in this file** — all LiveKit logic in the hook!

### Phase 6: Testing Checklist

- [ ] 6.1. Fresh app launch → Start call → Can hear Julia greeting
- [ ] 6.2. Speak → Julia responds → Conversation works
- [ ] 6.3. Mute → Unmute → Still works
- [ ] 6.4. End call → Clean disconnect
- [ ] 6.5. App to background → Audio continues
- [ ] 6.6. App to foreground → Still connected
- [ ] 6.7. Multiple calls in a row → No memory leaks
- [ ] 6.8. No microphone permission → Shows error

---

## Files to Create/Modify

| File | Action | Description |
|------|--------|-------------|
| `hooks/useLiveKitRoom.ts` | CREATE | Main LiveKit hook with all logic |
| `utils/audioSession.ts` | CREATE | iOS AudioSession helpers |
| `app/voice-call.tsx` | REPLACE | Simple UI using the hook |
| `app/(tabs)/voice.tsx` | SIMPLIFY | Just the entry point; remove debug UI |
| `services/livekitService.ts` | KEEP | Token fetching (already works) |
| `contexts/VoiceTranscriptContext.tsx` | KEEP | Transcript storage |
| `components/VoiceIndicator.tsx` | DELETE | Not needed |
| `polyfills/livekit-globals.ts` | DELETE | Not needed |

---

## Key Principles

### 1. Separation of Concerns

- **Hook** handles ALL LiveKit/WebRTC logic
- **Screen** only renders UI based on hook state
- **Utils** for platform-specific code (AudioSession)

### 2. Proper Initialization Order

```
1. registerGlobals() — WebRTC polyfills
2. configureAudioForVoiceCall() — iOS audio
3. getToken() — fetch from server
4. room.connect() — connect to LiveKit
5. room.localParticipant.setMicrophoneEnabled(true) — enable mic
```

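A sketch of steps 2–5 as one helper, using real `livekit-client` calls (`startVoiceCall` and the `getToken` import are assumed names; step 1 happens once at the app entry point, see Phase 3):

```typescript
import { Room } from 'livekit-client';
import { configureAudioForVoiceCall } from '../utils/audioSession';
import { getToken } from '../services/livekitService';

export async function startVoiceCall(userId: string): Promise<Room> {
  await configureAudioForVoiceCall();                     // 2. iOS audio first
  const { token, wsUrl } = await getToken(userId);        // 3. credentials
  const room = new Room();
  await room.connect(wsUrl, token);                       // 4. join the room
  await room.localParticipant.setMicrophoneEnabled(true); // 5. publish mic
  return room;
}
```
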
### 3. Proper Cleanup Order

```
1. room.disconnect() — leave room
2. stopAudioSession() — release iOS audio
3. Clear all refs and state
```

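And the mirror-image teardown (helper name hypothetical):

```typescript
import { Room } from 'livekit-client';
import { stopAudioSession } from '../utils/audioSession';

export async function endVoiceCall(room: Room): Promise<void> {
  await room.disconnect();  // 1. leave the room first
  await stopAudioSession(); // 2. then release iOS audio
  // 3. the hook clears its refs and state after this resolves
}
```
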
### 4. Error Handling

- Every async operation wrapped in try/catch
- User-friendly error messages
- Automatic retry for network issues (see the sketch below)
- Graceful degradation

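A small generic wrapper covers the retry point, e.g. around the token fetch (`withRetry` is a hypothetical helper, not an existing API):

```typescript
// Retry a transient-failure-prone async call with linear backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait a bit longer after each failed attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}

// Usage: const creds = await withRetry(() => getToken(userId));
```
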
---

## Success Criteria

1. ✅ User can start a voice call and hear Julia's greeting
2. ✅ User can speak and Julia understands (STT works reliably)
3. ✅ Julia responds with voice (TTS works)
4. ✅ The conversation continues back and forth
5. ✅ Mute/unmute works
6. ✅ Ending the call disconnects cleanly
7. ✅ No console errors or warnings
8. ✅ Works on a physical iOS device (not just the simulator)

---

## Related Links

- [LiveKit React Native SDK](https://docs.livekit.io/client-sdk-js/react-native/)
- [LiveKit Agents Python](https://docs.livekit.io/agents/)
- [Deepgram STT/TTS](https://deepgram.com/)
- [iOS AVAudioSession](https://developer.apple.com/documentation/avfaudio/avaudiosession)

---

## Notes

### Why the previous approach failed:

1. **Too much code in one file** — voice-call.tsx had 900+ lines with all the logic mixed together
2. **Polyfills applied incorrectly** — the Event class polyfill lived inside the component
3. **AudioSession configured too late** — sometimes after connect() had already started
4. **No proper error boundaries** — errors failed silently
5. **Race conditions** — multiple async operations without proper sequencing

### What's different this time:

1. **Hook-based architecture** — a single source of truth for state
2. **Proper initialization sequence** — documented and enforced
3. **Clean separation** — the UI knows nothing about WebRTC
4. **Comprehensive logging** — every step logged for debugging
5. **Test-driven** — write tests before the implementation (see the sketch below)
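
For the test-driven point, the audio-session helper is an easy starting place because `AudioSession` can be mocked wholesale (the test path and mock shape are assumptions, not existing files):

```typescript
// __tests__/audioSession.test.ts — a starting point, not a full suite
jest.mock('@livekit/react-native', () => ({
  AudioSession: {
    setAppleAudioConfiguration: jest.fn().mockResolvedValue(undefined),
    configureAudio: jest.fn().mockResolvedValue(undefined),
    startAudioSession: jest.fn().mockResolvedValue(undefined),
    stopAudioSession: jest.fn().mockResolvedValue(undefined),
  },
}));
jest.mock('react-native', () => ({ Platform: { OS: 'ios' } }));

import { AudioSession } from '@livekit/react-native';
import { configureAudioForVoiceCall } from '../utils/audioSession';

test('starts the iOS audio session after configuring it', async () => {
  await configureAudioForVoiceCall();
  expect(AudioSession.setAppleAudioConfiguration).toHaveBeenCalled();
  expect(AudioSession.startAudioSession).toHaveBeenCalled();
});
```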