
FEATURE-002: LiveKit Voice Call with Julia AI

Summary

A full-featured voice call with Julia AI via LiveKit Cloud. The user taps the "Start Voice Call" button, a phone-style call screen opens, and they can talk with Julia AI by voice.

Status: 🔴 Not Started (a complete rewrite is required)

Priority: Critical

Problem Statement

The current implementation has the following problems:

  1. STT (Speech-to-Text) is unreliable — the microphone is sometimes detected, sometimes not
  2. TTS works — Julia's voice is audible
  3. The code is complex and tangled — lots of legacy code, polyfills, and hacks
  4. No clear architecture — everything lives in one file, voice-call.tsx

Root Cause Analysis

Why the microphone is unreliable:

  1. iOS AudioSession — incorrect configuration or a race condition during setup
  2. registerGlobals() — WebRTC polyfills may not finish initializing in time
  3. Permissions — microphone access may be denied, or the mic may be in use by another process
  4. Event handling — LiveKit events may be dropped

What works:

  • LiveKit Cloud connection
  • Token generation
  • TTS (Deepgram Asteria)
  • Backend agent (Julia AI)

Architecture

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        WellNuo Lite App (iOS)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐  │
│  │  Voice Tab   │───▶│  VoiceCallScreen │───▶│ LiveKit Room     │  │
│  │  (entry)     │    │  (fullscreen)    │    │ (WebRTC)         │  │
│  └──────────────┘    └──────────────────┘    └──────────────────┘  │
│                              │                        │             │
│                              ▼                        ▼             │
│                      ┌──────────────┐         ┌──────────────┐     │
│                      │useLiveKitRoom│         │ AudioSession │     │
│                      │   (hook)     │         │ (iOS native) │     │
│                      └──────────────┘         └──────────────┘     │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ WebSocket + WebRTC
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        LiveKit Cloud                                 │
├─────────────────────────────────────────────────────────────────────┤
│  Room: wellnuo-{userId}-{timestamp}                                  │
│  Participants: user + julia-agent                                    │
│  Audio Tracks: bidirectional                                        │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ Agent dispatch
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Julia AI Agent (Python)                         │
├─────────────────────────────────────────────────────────────────────┤
│  STT: Deepgram Nova-2                                               │
│  LLM: WellNuo voice_ask API                                         │
│  TTS: Deepgram Aura Asteria                                         │
│  Framework: LiveKit Agents SDK 1.3.11                               │
└─────────────────────────────────────────────────────────────────────┘

Data Flow

User speaks → iOS Mic → WebRTC → LiveKit Cloud → Agent → Deepgram STT
                                                            │
                                                            ▼
                                                    WellNuo API (LLM)
                                                            │
                                                            ▼
Agent receives text ← LiveKit Cloud ← WebRTC ← Deepgram TTS (audio)
                │
                ▼
        iOS Speaker → User hears Julia

Technical Requirements

Dependencies (package.json)

{
  "@livekit/react-native": "^2.x",
  "livekit-client": "^2.x",
  "expo-keep-awake": "^14.x"
}

iOS Permissions (app.json)

{
  "ios": {
    "infoPlist": {
      "NSMicrophoneUsageDescription": "WellNuo needs microphone access for voice calls with Julia AI",
      "UIBackgroundModes": ["audio", "voip"]
    }
  }
}

Token Server (already exists)

  • URL: https://wellnuo.smartlaunchhub.com/julia/token
  • Method: POST
  • Body: { "userId": "string" }
  • Response: { "success": true, "data": { "token", "roomName", "wsUrl" } }
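A minimal client for this endpoint might look like the following sketch. The request/response shapes are taken from the spec above; the error handling and function names are illustrative, not part of the existing services/livekitService.ts.

```typescript
// Hypothetical client for the token endpoint described above.
interface TokenData {
  token: string;
  roomName: string;
  wsUrl: string;
}

// Validates the { success, data } envelope described in the spec.
export function parseTokenResponse(json: any): TokenData {
  if (!json || json.success !== true || !json.data) {
    throw new Error('Token server returned an error response');
  }
  const { token, roomName, wsUrl } = json.data;
  return { token, roomName, wsUrl };
}

export async function fetchLiveKitToken(userId: string): Promise<TokenData> {
  const resp = await fetch('https://wellnuo.smartlaunchhub.com/julia/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId }),
  });
  if (!resp.ok) throw new Error(`Token request failed: ${resp.status}`);
  return parseTokenResponse(await resp.json());
}
```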

Implementation Steps

Phase 1: Cleanup (DELETE old code)

  • 1.1. Delete app/voice-call.tsx (current broken implementation)
  • 1.2. Keep app/(tabs)/voice.tsx (entry point) but simplify
  • 1.3. Keep services/livekitService.ts (token fetching)
  • 1.4. Keep contexts/VoiceTranscriptContext.tsx (transcript storage)
  • 1.5. Delete components/VoiceIndicator.tsx (unused)
  • 1.6. Delete polyfills/livekit-globals.ts (not needed with proper setup)

Phase 2: New Architecture

  • 2.1. Create hooks/useLiveKitRoom.ts — encapsulate all LiveKit logic
  • 2.2. Create app/voice-call.tsx — simple UI component using the hook
  • 2.3. Create utils/audioSession.ts — iOS AudioSession helper

Phase 3: useLiveKitRoom Hook

File: hooks/useLiveKitRoom.ts

interface UseLiveKitRoomOptions {
  userId: string;
  onTranscript?: (role: 'user' | 'assistant', text: string) => void;
}

interface UseLiveKitRoomReturn {
  // Connection state
  state: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error';
  error: string | null;

  // Call info
  roomName: string | null;
  callDuration: number; // seconds

  // Audio state
  isMuted: boolean;
  isSpeaking: boolean; // agent is speaking

  // Actions
  connect: () => Promise<void>;
  disconnect: () => Promise<void>;
  toggleMute: () => void;
}
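Since the hook exposes a small state machine, it can help to make the legal transitions explicit and reject impossible ones (e.g. jumping from 'idle' straight to 'connected'). A sketch of such a guard — the transition table below is an assumption, not part of the spec:

```typescript
type CallState =
  | 'idle' | 'connecting' | 'connected'
  | 'reconnecting' | 'disconnected' | 'error';

// Hypothetical table of allowed transitions for the hook's state field.
const transitions: Record<CallState, CallState[]> = {
  idle: ['connecting'],
  connecting: ['connected', 'disconnected', 'error'],
  connected: ['reconnecting', 'disconnected', 'error'],
  reconnecting: ['connected', 'disconnected', 'error'],
  disconnected: ['connecting'],
  error: ['connecting', 'idle'],
};

// Throws on an illegal transition instead of silently corrupting state.
export function nextState(current: CallState, target: CallState): CallState {
  if (!transitions[current].includes(target)) {
    throw new Error(`Invalid transition: ${current} -> ${target}`);
  }
  return target;
}
```

Using this inside the hook's setState path makes sequencing bugs (a frequent cause of the old race conditions) fail loudly during development.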

Implementation requirements:

  1. MUST call registerGlobals() BEFORE importing livekit-client
  2. MUST configure iOS AudioSession BEFORE connecting to room
  3. MUST handle all RoomEvents properly
  4. MUST cleanup on unmount (disconnect, stop audio session)
  5. MUST handle background/foreground transitions

Phase 4: iOS AudioSession Configuration

Critical for microphone to work!

// utils/audioSession.ts
import { AudioSession } from '@livekit/react-native';
import { Platform } from 'react-native';

export async function configureAudioForVoiceCall(): Promise<void> {
  if (Platform.OS !== 'ios') return;

  // Step 1: Set Apple audio configuration
  await AudioSession.setAppleAudioConfiguration({
    audioCategory: 'playAndRecord',
    audioCategoryOptions: [
      'allowBluetooth',
      'allowBluetoothA2DP',
      'defaultToSpeaker',
      'mixWithOthers',
    ],
    audioMode: 'voiceChat',
  });

  // Step 2: Configure output
  await AudioSession.configureAudio({
    ios: {
      defaultOutput: 'speaker',
    },
  });

  // Step 3: Start session
  await AudioSession.startAudioSession();
}

export async function stopAudioSession(): Promise<void> {
  if (Platform.OS !== 'ios') return;
  await AudioSession.stopAudioSession();
}

Phase 5: Voice Call Screen UI

File: app/voice-call.tsx

Simple, clean UI:

  • Avatar with Julia "J" letter
  • Call duration timer
  • Status text (Connecting... / Connected / Julia is speaking...)
  • Mute button
  • End call button
  • Debug logs toggle (for development)

NO complex logic in this file — all LiveKit logic in the hook!
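In that spirit, even the call duration timer can stay logic-free in the component by delegating to a tiny formatter. A sketch — the m:ss display format is an assumption:

```typescript
// Hypothetical formatter for the call duration timer.
// Converts elapsed seconds (the hook's callDuration) into "m:ss".
export function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${seconds.toString().padStart(2, '0')}`;
}
```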

Phase 6: Testing Checklist

  • 6.1. Fresh app launch → Start call → Can hear Julia greeting
  • 6.2. Speak → Julia responds → Conversation works
  • 6.3. Mute → Unmute → Still works
  • 6.4. End call → Clean disconnect
  • 6.5. App to background → Audio continues
  • 6.6. App to foreground → Still connected
  • 6.7. Multiple calls in a row → No memory leaks
  • 6.8. No microphone permission → Shows error

Files to Create/Modify

| File | Action | Description |
|------|--------|-------------|
| hooks/useLiveKitRoom.ts | CREATE | Main LiveKit hook with all logic |
| utils/audioSession.ts | CREATE | iOS AudioSession helpers |
| app/voice-call.tsx | REPLACE | Simple UI using the hook |
| app/(tabs)/voice.tsx | SIMPLIFY | Just entry point, remove debug UI |
| services/livekitService.ts | KEEP | Token fetching (already works) |
| contexts/VoiceTranscriptContext.tsx | KEEP | Transcript storage |
| components/VoiceIndicator.tsx | DELETE | Not needed |
| polyfills/livekit-globals.ts | DELETE | Not needed |

Key Principles

1. Separation of Concerns

  • Hook handles ALL LiveKit/WebRTC logic
  • Screen only renders UI based on hook state
  • Utils for platform-specific code (AudioSession)

2. Proper Initialization Order

1. registerGlobals() — WebRTC polyfills
2. configureAudioForVoiceCall() — iOS audio
3. getToken() — fetch from server
4. room.connect() — connect to LiveKit
5. room.localParticipant.setMicrophoneEnabled(true) — enable mic
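Each step above must fully complete before the next one starts — overlapping them is exactly the race condition that broke the old implementation. One way to enforce this is a small sequential runner; the helper below is a sketch, and the step names passed to it would be placeholders for the real calls:

```typescript
interface Step {
  name: string;
  run: () => Promise<void>;
}

// Runs steps strictly in order; awaiting each one before starting the
// next prevents overlapping async initialization. A failure in any
// step aborts the rest of the sequence.
export async function runInOrder(
  steps: Step[],
  log: (msg: string) => void = () => {},
): Promise<void> {
  for (const step of steps) {
    log(`starting: ${step.name}`);
    await step.run();
    log(`done: ${step.name}`);
  }
}
```

The same runner works for the cleanup sequence, and the `log` callback gives the comprehensive per-step logging called for in the Notes.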

3. Proper Cleanup Order

1. room.disconnect() — leave room
2. stopAudioSession() — release iOS audio
3. Clear all refs and state

4. Error Handling

  • Every async operation wrapped in try/catch
  • User-friendly error messages
  • Automatic retry for network issues
  • Graceful degradation
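The "automatic retry for network issues" point can be sketched as a generic helper; attempt counts and backoff values below are illustrative defaults, not spec requirements:

```typescript
// Hypothetical retry wrapper for flaky network calls (e.g. token fetch).
// Retries with exponential backoff, rethrowing the last error if all
// attempts fail.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait 1x, 2x, 4x... the base delay between attempts.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```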

Success Criteria

  1. User can start voice call and hear Julia greeting
  2. User can speak and Julia understands (STT works reliably)
  3. Julia responds with voice (TTS works)
  4. Conversation can continue back and forth
  5. Mute/unmute works
  6. End call cleanly disconnects
  7. No console errors or warnings
  8. Works on iOS device (not just simulator)


Notes

Why previous approach failed:

  1. Too much code in one file — voice-call.tsx had 900+ lines with all logic mixed
  2. Polyfills applied wrong — Event class polyfill was inside the component
  3. AudioSession configured too late — sometimes after connect() already started
  4. No proper error boundaries — errors silently failed
  5. Race conditions — multiple async operations without proper sequencing

What's different this time:

  1. Hook-based architecture — single source of truth for state
  2. Proper initialization sequence — documented and enforced
  3. Clean separation — UI knows nothing about WebRTC
  4. Comprehensive logging — every step logged for debugging
  5. Test-driven — write tests before implementation