wellnua-lite-Robert/specs/voice-integration-flow.json
Sergei cde44adc5c Add TTS model metadata and documentation
TTS Model (Piper VITS):
- MODEL_CARD: Voice model information
- tokens.txt: Phoneme tokenization
- onnx.json: Model configuration
- Model: en_US-lessac-medium (60MB ONNX - not in git)

Documentation:
- APP_REVIEW_NOTES.txt: App Store review notes
- specs/: Feature specifications
- plugins/: Expo config plugins

.gitignore updates:
- Exclude large ONNX models (60MB+)
- Exclude espeak-ng-data (phoneme data)
- Exclude credentials.json
- Exclude store-screenshots/

Note: TTS models must be downloaded separately.
See specs/ for setup instructions.
2026-01-14 19:10:13 -08:00

{
"elements": [
{
"id": "legend",
"type": "card",
"title": "LEGEND: Voice Integration",
"borderColor": "gray",
"tags": ["Reference"],
"description": "**Color Coding:**\n\n🔴 `red` = User Action (tap, speak)\n🔵 `blue` = App Logic / Screen\n🟣 `purple` = Native Module\n🟢 `green` = External Service (AI API)\n🟠 `orange` = Warning / Edge Case\n⚫ `gray` = Reference\n\n**States:**\n- `isListening` - Microphone active\n- `isSpeaking` - TTS playing\n- `ttsInitialized` - TTS ready\n- `recognizedText` - Speech transcript",
"x": 50,
"y": 50,
"connections": []
},
{
"id": "step-001",
"type": "card",
"title": "Chat Screen",
"borderColor": "blue",
"tags": ["Screen"],
"description": "**User sees:**\n- Message list\n- Input field\n- 🎤 Microphone button\n- Send button\n\n**Initial state:**\n```\nisListening: false\nisSpeaking: false\nttsInitialized: false\n```\n\n**On mount:** Initialize TTS",
"x": 100,
"y": 200,
"connections": [
{ "to": "step-002" },
{ "to": "step-010" }
]
},
{
"id": "step-002",
"type": "card",
"title": "App: Initialize TTS",
"borderColor": "blue",
"tags": ["App"],
"description": "**useEffect on mount:**\n```javascript\nconst initTTS = async () => {\n const success = await \n sherpaTTS.initialize();\n setTtsInitialized(success);\n};\ninitTTS();\n```\n\n**Cleanup on unmount:**\n```javascript\nsherpaTTS.deinitialize();\n```",
"x": 500,
"y": 200,
"connections": [
{ "to": "step-003" },
{ "to": "step-004" }
]
},
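{
"id": "sketch-tts-init",
"type": "card",
"title": "SKETCH: TTS Init Effect",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch (not from the app code):** the step-002 snippets combined into one `useEffect`, assuming the `sherpaTTS` wrapper from `services/sherpaTTS.ts`. The `mounted` flag is an added guard against setting state after unmount.\n```javascript\nuseEffect(() => {\n let mounted = true;\n (async () => {\n const ok = await sherpaTTS.initialize();\n // Guard: skip setState after unmount\n if (mounted) setTtsInitialized(ok);\n })();\n return () => {\n mounted = false;\n sherpaTTS.deinitialize();\n };\n}, []);\n```",
"x": 100,
"y": 2000,
"connections": []
},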
{
"id": "step-003",
"type": "card",
"title": "SherpaTTS: Load Model",
"borderColor": "purple",
"tags": ["Native"],
"description": "**Native Module: TTSManager**\n\n1. Load Piper ONNX model\n2. Load tokens.txt\n3. Initialize espeak-ng-data\n\n**Model paths (iOS):**\n```\nassets/tts-models/\n vits-piper-en_US-lessac-medium/\n en_US-lessac-medium.onnx\n tokens.txt\n espeak-ng-data/\n```",
"x": 900,
"y": 200,
"connections": [
{ "to": "step-005" },
{ "to": "err-001" }
]
},
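{
"id": "sketch-model-config",
"type": "card",
"title": "SKETCH: Model Config Shape",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch:** one plausible config the wrapper could hand to the native module, built from the step-003 paths. The property names are assumptions, not the actual react-native-sherpa-onnx-offline-tts API.\n```javascript\n// Hypothetical shape -- key names assumed\nconst TTS_CONFIG = {\n modelPath: 'assets/tts-models/' +\n 'vits-piper-en_US-lessac-medium/' +\n 'en_US-lessac-medium.onnx',\n tokensPath: 'assets/tts-models/' +\n 'vits-piper-en_US-lessac-medium/' +\n 'tokens.txt',\n espeakDataDir: 'assets/tts-models/' +\n 'vits-piper-en_US-lessac-medium/' +\n 'espeak-ng-data',\n numThreads: 2\n};\n```",
"x": 500,
"y": 2000,
"connections": []
},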
{
"id": "step-004",
"type": "card",
"title": "Fallback: expo-speech",
"borderColor": "orange",
"tags": ["App", "Fallback"],
"description": "**When SherpaTTS unavailable:**\n- Expo Go mode (no native)\n- Model files missing\n- Device not supported\n\n**Fallback:**\n```javascript\nif (!sherpaTTS.isAvailable()) {\n ExpoSpeech.speak(text, {\n language: 'en-US',\n rate: 0.9\n });\n}\n```",
"x": 500,
"y": 400,
"connections": []
},
{
"id": "step-005",
"type": "card",
"title": "TTS Ready",
"borderColor": "blue",
"tags": ["App"],
"description": "**State updated:**\n```\nttsInitialized: true\n```\n\n**Available voices:**\n| ID | Name | Gender |\n|----|------|--------|\n| lessac | Lessac | Female US |\n| ryan | Ryan | Male US |\n| alba | Alba | Female UK |",
"x": 900,
"y": 400,
"connections": []
},
{
"id": "err-001",
"type": "card",
"title": "ERROR: TTS Init Failed",
"borderColor": "red",
"tags": ["Error"],
"description": "**When:**\n- Native module missing\n- Model files not found\n- Memory allocation failed\n\n**App state:**\n```\nttsInitialized: false\nerror: 'Native module not available'\n```\n\n**Fallback:** Use expo-speech",
"x": 1300,
"y": 200,
"connections": [
{ "to": "step-004" }
]
},
{
"id": "step-010",
"type": "card",
"title": "User: Tap 🎤 Button",
"borderColor": "red",
"tags": ["User"],
"description": "**User taps microphone button**\n\nButton appearance:\n- Default: Outline mic icon\n- Active: Filled mic, primary color\n- Disabled: Grayed out (0.5 opacity)\n\n**Triggers:** `handleVoiceToggle()`",
"x": 100,
"y": 600,
"connections": [
{ "to": "step-011" }
]
},
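{
"id": "sketch-mic-button",
"type": "card",
"title": "SKETCH: Mic Button JSX",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch:** one way the three button states in step-010 could map to props. `PRIMARY` and `GRAY` are hypothetical color constants; icon names are from Ionicons.\n```javascript\n<TouchableOpacity\n onPress={handleVoiceToggle}\n disabled={isSpeaking}\n // Grayed out while TTS is playing\n style={{ opacity: isSpeaking ? 0.5 : 1 }}>\n <Ionicons\n name={isListening ? 'mic' : 'mic-outline'}\n color={isListening ? PRIMARY : GRAY}\n size={24}\n />\n</TouchableOpacity>\n```",
"x": 900,
"y": 2000,
"connections": []
},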
{
"id": "step-011",
"type": "card",
"title": "App: handleVoiceToggle()",
"borderColor": "blue",
"tags": ["App"],
"description": "**Decision logic:**\n```javascript\nif (isListening) {\n stopListening();\n handleVoiceSend();\n} else {\n startListening();\n}\n```\n\n**Check availability:**\n```javascript\nif (!speechRecognitionAvailable) {\n Alert.alert('Not Available');\n return;\n}\n```",
"x": 500,
"y": 600,
"connections": [
{ "to": "step-012" },
{ "to": "step-020" },
{ "to": "err-002" }
]
},
{
"id": "err-002",
"type": "card",
"title": "ERROR: No Mic Permission",
"borderColor": "red",
"tags": ["Error"],
"description": "**When:**\n- User denied microphone access\n- Permission not requested\n\n**App shows:**\n```\nAlert: 'Microphone Access Required'\n\n'Please enable microphone access\nin Settings to use voice input.'\n```\n\n**Resolution:** Open Settings",
"x": 500,
"y": 800,
"connections": []
},
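{
"id": "sketch-mic-permission",
"type": "card",
"title": "SKETCH: Permission Check",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch** of the err-002 flow, assuming expo-speech-recognition's `requestPermissionsAsync()` and React Native's `Linking.openSettings()`; the helper name is made up.\n```javascript\nimport { Alert, Linking } from 'react-native';\nimport { ExpoSpeechRecognitionModule }\n from 'expo-speech-recognition';\n\nconst ensureMicPermission = async () => {\n const { granted } = await\n ExpoSpeechRecognitionModule\n .requestPermissionsAsync();\n if (!granted) {\n Alert.alert(\n 'Microphone Access Required',\n 'Please enable microphone access ' +\n 'in Settings to use voice input.',\n [\n { text: 'Cancel' },\n { text: 'Open Settings',\n onPress: () => Linking.openSettings() }\n ]\n );\n }\n return granted;\n};\n```",
"x": 1300,
"y": 2000,
"connections": []
},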
{
"id": "step-012",
"type": "card",
"title": "App: Start Listening",
"borderColor": "blue",
"tags": ["App"],
"description": "**Actions:**\n1. Reset `recognizedText`\n2. Start pulse animation\n3. Call native speech recognition\n\n```javascript\nsetRecognizedText('');\nAnimated.loop(\n Animated.sequence([...])\n).start();\nawait startListening();\n```",
"x": 900,
"y": 600,
"connections": [
{ "to": "step-013" }
]
},
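{
"id": "sketch-pulse-anim",
"type": "card",
"title": "SKETCH: Pulse Animation",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch:** a plausible completion of the `Animated.sequence([...])` elided in step-012, matching the 1.0 → 1.2 scale described in step-014. `pulseAnim` is an assumed Animated.Value.\n```javascript\nconst pulseAnim =\n useRef(new Animated.Value(1)).current;\n\nAnimated.loop(\n Animated.sequence([\n Animated.timing(pulseAnim, {\n toValue: 1.2,\n duration: 600,\n useNativeDriver: true\n }),\n Animated.timing(pulseAnim, {\n toValue: 1.0,\n duration: 600,\n useNativeDriver: true\n })\n ])\n).start();\n```",
"x": 100,
"y": 2200,
"connections": []
},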
{
"id": "step-013",
"type": "card",
"title": "expo-speech-recognition",
"borderColor": "purple",
"tags": ["Native"],
"description": "**Native Module: ExpoSpeechRecognition**\n\n```javascript\nExpoSpeechRecognitionModule.start({\n lang: 'en-US',\n interimResults: true,\n maxAlternatives: 1,\n continuous: false\n});\n```\n\n**Events:**\n- `start` → setIsListening(true)\n- `result` → setRecognizedText()\n- `end` → setIsListening(false)\n- `error` → handle error",
"x": 1300,
"y": 600,
"connections": [
{ "to": "step-014" }
]
},
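{
"id": "sketch-speech-events",
"type": "card",
"title": "SKETCH: Event Wiring",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch:** the step-013 events wired with the library's `useSpeechRecognitionEvent` hook. The result shape (`event.results[0]?.transcript`) follows the expo-speech-recognition docs but should be verified.\n```javascript\nimport { useSpeechRecognitionEvent }\n from 'expo-speech-recognition';\n\nuseSpeechRecognitionEvent('start',\n () => setIsListening(true));\nuseSpeechRecognitionEvent('end',\n () => setIsListening(false));\nuseSpeechRecognitionEvent('result', (event) => {\n setRecognizedText(\n event.results[0]?.transcript ?? '');\n});\nuseSpeechRecognitionEvent('error', (event) => {\n console.warn('Speech error:', event.error);\n});\n```",
"x": 500,
"y": 2200,
"connections": []
},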
{
"id": "step-014",
"type": "card",
"title": "UI: Listening State",
"borderColor": "blue",
"tags": ["Screen"],
"description": "**Visual indicators:**\n\n1. **Mic button:**\n - Background: Primary color\n - Pulsing animation (scale 1.0 → 1.2)\n\n2. **Status bar:**\n ```\n 🔵 Listening...\n ```\n\n3. **Input field:**\n - Shows real-time transcript\n - Updates on each interim result",
"x": 1300,
"y": 800,
"connections": [
{ "to": "step-015" }
]
},
{
"id": "step-015",
"type": "card",
"title": "User: Speaking",
"borderColor": "red",
"tags": ["User"],
"description": "**User speaks into microphone**\n\n**Real-time transcript:**\n```\n\"Hello, how are you today?\"\n```\n\n**Interim results update:**\n- Partial words appear as spoken\n- Final result when silence detected\n\n**To stop:** Tap mic again OR stop speaking",
"x": 1300,
"y": 1000,
"connections": [
{ "to": "step-020" }
]
},
{
"id": "step-020",
"type": "card",
"title": "App: Stop & Send",
"borderColor": "blue",
"tags": ["App"],
"description": "**handleVoiceSend():**\n```javascript\nconst textToSend = \n recognizedText.trim();\n\nif (textToSend) {\n setInputText(textToSend);\n sendMessage(textToSend);\n setRecognizedText('');\n}\n```\n\n**Validation:**\n- Skip if empty transcript\n- Trim whitespace",
"x": 100,
"y": 1000,
"connections": [
{ "to": "step-021" },
{ "to": "err-003" }
]
},
{
"id": "err-003",
"type": "card",
"title": "WARNING: Empty Transcript",
"borderColor": "orange",
"tags": ["Warning"],
"description": "**When:**\n- User tapped mic but didn't speak\n- Background noise only\n- Recognition failed\n\n**Behavior:**\n- Don't send empty message\n- Return to idle state\n- No error shown to user",
"x": 100,
"y": 1200,
"connections": []
},
{
"id": "step-021",
"type": "card",
"title": "App: Send Message",
"borderColor": "blue",
"tags": ["App", "API"],
"description": "**Add user message to chat:**\n```javascript\nsetMessages(prev => [...prev, {\n role: 'user',\n content: textToSend\n}]);\n```\n\n**Call AI API:**\n```\nPOST /ai/stream\nBody: { messages, beneficiaryId }\n```",
"x": 500,
"y": 1000,
"connections": [
{ "to": "step-022" }
]
},
{
"id": "step-022",
"type": "card",
"title": "AI Backend: Process",
"borderColor": "green",
"tags": ["External", "API"],
"description": "**Server processes request:**\n\n1. Validate JWT token\n2. Get beneficiary context\n3. Call OpenAI/OpenRouter API\n4. Stream response chunks\n\n**Response:**\n```\ndata: {\"delta\":\"Hello\"}\ndata: {\"delta\":\"! How\"}\ndata: {\"delta\":\" can I\"}\ndata: {\"delta\":\" help?\"}\n[DONE]\n```",
"x": 900,
"y": 1000,
"connections": [
{ "to": "step-023" },
{ "to": "err-004" }
]
},
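{
"id": "sketch-sse-parse",
"type": "card",
"title": "SKETCH: Stream Parsing",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch** of reading the step-022 stream. React Native's built-in fetch does not expose `ReadableStream`, so this assumes a streaming-capable fetch (e.g. expo/fetch). `API_URL`, `token`, and `appendChunk` are hypothetical.\n```javascript\nasync function streamChat(\n messages, beneficiaryId, appendChunk) {\n const res = await fetch(\n `${API_URL}/ai/stream`, {\n method: 'POST',\n headers: {\n 'Content-Type': 'application/json',\n Authorization: `Bearer ${token}`\n },\n body: JSON.stringify(\n { messages, beneficiaryId })\n });\n const reader = res.body.getReader();\n const decoder = new TextDecoder();\n let buffer = '';\n while (true) {\n const { done, value } = await reader.read();\n if (done) return;\n buffer += decoder.decode(value,\n { stream: true });\n // Keep any partial line in the buffer\n const lines = buffer.split('\\n');\n buffer = lines.pop() ?? '';\n for (const line of lines) {\n if (line.trim() === '[DONE]') return;\n if (!line.startsWith('data:')) continue;\n const { delta } =\n JSON.parse(line.slice(5));\n appendChunk(delta);\n }\n }\n}\n```",
"x": 900,
"y": 2200,
"connections": []
},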
{
"id": "err-004",
"type": "card",
"title": "ERROR: AI API Failed",
"borderColor": "red",
"tags": ["Error"],
"description": "**When:**\n- Network error\n- API rate limit\n- Invalid token\n- Server error (500)\n\n**App shows:**\n```\n\"Sorry, I couldn't process your \nrequest. Please try again.\"\n```\n\n**TTS:** Speaks error message",
"x": 900,
"y": 1200,
"connections": []
},
{
"id": "step-023",
"type": "card",
"title": "App: Receive AI Response",
"borderColor": "blue",
"tags": ["App"],
"description": "**Stream handling:**\n```javascript\nfor await (const chunk of stream) {\n setMessages(prev => {\n // Append chunk to last message\n const updated = [...prev];\n updated[updated.length-1]\n .content += chunk;\n return updated;\n });\n}\n```\n\n**On complete:** Trigger TTS",
"x": 1300,
"y": 1000,
"connections": [
{ "to": "step-030" }
]
},
{
"id": "step-030",
"type": "card",
"title": "App: speakText(response)",
"borderColor": "blue",
"tags": ["App"],
"description": "**Auto-speak AI response:**\n```javascript\nconst speakText = async (text) => {\n if (!ttsInitialized) {\n // Fallback to expo-speech\n ExpoSpeech.speak(text);\n return;\n }\n \n setIsSpeaking(true);\n await sherpaTTS.speak(text, {\n speed: 1.0,\n onDone: () => setIsSpeaking(false)\n });\n};\n```",
"x": 100,
"y": 1400,
"connections": [
{ "to": "step-031" }
]
},
{
"id": "step-031",
"type": "card",
"title": "SherpaTTS: Generate Audio",
"borderColor": "purple",
"tags": ["Native"],
"description": "**Native TTS processing:**\n\n1. Text → phonemes (espeak-ng)\n2. Phonemes → audio (Piper VITS)\n3. Audio → device speaker\n\n**Parameters:**\n```javascript\nTTSManager.generateAndPlay(\n text,\n speakerId: 0,\n speed: 1.0\n);\n```\n\n**Model:** ~20MB neural network",
"x": 500,
"y": 1400,
"connections": [
{ "to": "step-032" }
]
},
{
"id": "step-032",
"type": "card",
"title": "UI: Speaking State",
"borderColor": "blue",
"tags": ["Screen"],
"description": "**Visual indicators:**\n\n1. **Status bar:**\n ```\n 🟢 Speaking... [⏹ Stop]\n ```\n\n2. **Stop button:**\n - Red stop circle icon\n - Tapping interrupts speech\n\n3. **Mic button:**\n - Disabled while speaking\n - Prevents overlap",
"x": 900,
"y": 1400,
"connections": [
{ "to": "step-033" },
{ "to": "step-040" }
]
},
{
"id": "step-033",
"type": "card",
"title": "TTS: Playback Complete",
"borderColor": "blue",
"tags": ["App"],
"description": "**On done callback:**\n```javascript\nonDone: () => {\n setIsSpeaking(false);\n}\n```\n\n**State reset:**\n```\nisSpeaking: false\n```\n\n**User can:**\n- Start new voice input\n- Type manually\n- Scroll chat history",
"x": 1300,
"y": 1400,
"connections": []
},
{
"id": "step-040",
"type": "card",
"title": "User: Tap Stop",
"borderColor": "red",
"tags": ["User"],
"description": "**User interrupts speech:**\n\nTaps stop button (⏹) to cancel TTS playback immediately.\n\n**Use cases:**\n- Response too long\n- User wants to ask follow-up\n- Wrong response",
"x": 900,
"y": 1600,
"connections": [
{ "to": "step-041" }
]
},
{
"id": "step-041",
"type": "card",
"title": "App: stopSpeaking()",
"borderColor": "blue",
"tags": ["App"],
"description": "**Stop playback:**\n```javascript\nconst stopSpeaking = () => {\n if (ttsInitialized) {\n sherpaTTS.stop();\n } else {\n ExpoSpeech.stop();\n }\n setIsSpeaking(false);\n};\n```\n\n**Immediate effect:**\n- Audio stops\n- UI returns to idle",
"x": 1300,
"y": 1600,
"connections": []
},
{
"id": "state-machine",
"type": "card",
"title": "STATE MACHINE: Voice",
"borderColor": "gray",
"tags": ["Reference"],
"description": "```\n ┌─────────────┐\n │ IDLE │\n │ isListening:│\n │ false │\n │ isSpeaking: │\n │ false │\n └──────┬──────┘\n │ tap mic\n ┌──────▼──────┐\n │ LISTENING │\n │ isListening:│\n │ true │\n │ (pulsing) │\n └──────┬──────┘\n │ stop/send\n ┌──────▼──────┐\n │ PROCESSING │\n │ isSending: │\n │ true │\n └──────┬──────┘\n │ AI responds\n ┌──────▼──────┐\n │ SPEAKING │\n │ isSpeaking: │\n │ true │\n └──────┬──────┘\n │ done/stop\n ┌──────▼──────┐\n │ IDLE │\n └─────────────┘\n```",
"x": 50,
"y": 1800,
"connections": []
},
{
"id": "files-ref",
"type": "card",
"title": "FILES: Voice Integration",
"borderColor": "gray",
"tags": ["Reference"],
"description": "**Modified files:**\n\n📄 `package.json`\n- expo-speech\n- expo-speech-recognition\n- react-native-sherpa-onnx-offline-tts\n\n📄 `services/sherpaTTS.ts`\n- Initialize, speak, stop\n- Voice selection\n- Native bridge\n\n📄 `hooks/useSpeechRecognition.ts`\n- Start/stop listening\n- Event handlers\n- Permission request\n\n📄 `app/(tabs)/chat.tsx`\n- Voice states\n- UI integration\n- Handlers",
"x": 500,
"y": 1800,
"connections": []
},
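{
"id": "sketch-speech-hook",
"type": "card",
"title": "SKETCH: useSpeechRecognition",
"borderColor": "gray",
"tags": ["Reference", "Sketch"],
"description": "**Hedged sketch:** a skeleton matching the responsibilities files-ref assigns to `hooks/useSpeechRecognition.ts`. The internals are assumptions; see the permission and event-wiring sketch cards for the pieces it composes.\n```javascript\nexport function useSpeechRecognition() {\n const [isListening, setIsListening] =\n useState(false);\n const [recognizedText, setRecognizedText] =\n useState('');\n\n useSpeechRecognitionEvent('start',\n () => setIsListening(true));\n useSpeechRecognitionEvent('end',\n () => setIsListening(false));\n useSpeechRecognitionEvent('result', (e) =>\n setRecognizedText(\n e.results[0]?.transcript ?? ''));\n\n const startListening = async () => {\n const ok = await ensureMicPermission();\n if (!ok) return;\n ExpoSpeechRecognitionModule.start({\n lang: 'en-US',\n interimResults: true,\n maxAlternatives: 1,\n continuous: false\n });\n };\n\n const stopListening = () =>\n ExpoSpeechRecognitionModule.stop();\n\n return { isListening, recognizedText,\n startListening, stopListening };\n}\n```",
"x": 1300,
"y": 2200,
"connections": []
},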
{
"id": "voices-ref",
"type": "card",
"title": "VOICES: Piper Models",
"borderColor": "gray",
"tags": ["Reference"],
"description": "**Available neural voices:**\n\n| Voice | Gender | Accent | Quality |\n|-------|--------|--------|--------|\n| Lessac | Female | US | Natural |\n| Ryan | Male | US | Natural |\n| Alba | Female | UK | Clear |\n\n**Model size:** ~20MB each\n\n**Audio:** 22kHz mono\n\n**Location:**\n```\nassets/tts-models/\n vits-piper-en_US-lessac-medium/\n vits-piper-en_US-ryan-medium/\n vits-piper-en_GB-alba-medium/\n```",
"x": 900,
"y": 1800,
"connections": []
},
{
"id": "build-ref",
"type": "card",
"title": "BUILD REQUIREMENTS",
"borderColor": "orange",
"tags": ["Reference"],
"description": "**Native build required!**\n\n⚠ Will NOT work in Expo Go\n\n**Steps:**\n1. `npm install`\n2. `npx expo prebuild --clean`\n3. `npx expo run:ios`\n4. Test on simulator/device\n\n**iOS:** Native modules bridged\n**Android:** JNI/Kotlin bindings\n\n**Permissions:**\n- iOS: `NSMicrophoneUsageDescription`\n- Android: `RECORD_AUDIO`",
"x": 1300,
"y": 1800,
"connections": []
}
]
}