{
  "elements": [
    {
      "id": "legend",
      "type": "card",
      "title": "LEGEND: Voice Integration",
      "borderColor": "gray",
      "tags": ["Reference"],
      "description": "**Color Coding:**\n\n🔴 `red` = User Action (tap, speak)\n🔵 `blue` = App Logic / Screen\n🟣 `purple` = Native Module\n🟢 `green` = External Service (AI API)\n🟠 `orange` = Warning / Edge Case\n⚫ `gray` = Reference\n\n**States:**\n- `isListening` - Microphone active\n- `isSpeaking` - TTS playing\n- `isSending` - Awaiting AI response\n- `ttsInitialized` - TTS ready\n- `recognizedText` - Speech transcript",
      "x": 50,
      "y": 50,
      "connections": []
    },
    {
      "id": "step-001",
      "type": "card",
      "title": "Chat Screen",
      "borderColor": "blue",
      "tags": ["Screen"],
      "description": "**User sees:**\n- Message list\n- Input field\n- 🎤 Microphone button\n- Send button\n\n**Initial state:**\n```\nisListening: false\nisSpeaking: false\nttsInitialized: false\n```\n\n**On mount:** Initialize TTS",
      "x": 100,
      "y": 200,
      "connections": [
        { "to": "step-002" },
        { "to": "step-010" }
      ]
    },
    {
      "id": "step-002",
      "type": "card",
      "title": "App: Initialize TTS",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**useEffect on mount:**\n```javascript\nconst initTTS = async () => {\n const success = await \n sherpaTTS.initialize();\n setTtsInitialized(success);\n};\ninitTTS();\n```\n\n**Cleanup on unmount:**\n```javascript\nsherpaTTS.deinitialize();\n```",
      "x": 500,
      "y": 200,
      "connections": [
        { "to": "step-003" },
        { "to": "step-004" }
      ]
    },
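    {
      "id": "sketch-tts-effect",
      "type": "card",
      "title": "SKETCH: useEffect Wiring",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** the init code from step-002 wired into one complete `useEffect`, including the unmount cleanup. Only the wrapper is new; the `sherpaTTS` method names come from the cards above.\n```javascript\nuseEffect(() => {\n  const initTTS = async () => {\n    const success = await sherpaTTS.initialize();\n    setTtsInitialized(success);\n  };\n  initTTS();\n  // Deinitialize when the screen unmounts\n  return () => {\n    sherpaTTS.deinitialize();\n  };\n}, []);\n```",
      "x": 100,
      "y": 400,
      "connections": []
    },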
    {
      "id": "step-003",
      "type": "card",
      "title": "SherpaTTS: Load Model",
      "borderColor": "purple",
      "tags": ["Native"],
      "description": "**Native Module: TTSManager**\n\n1. Load Piper ONNX model\n2. Load tokens.txt\n3. Initialize espeak-ng-data\n\n**Model paths (iOS):**\n```\nassets/tts-models/\n vits-piper-en_US-lessac-medium/\n en_US-lessac-medium.onnx\n tokens.txt\n espeak-ng-data/\n```",
      "x": 900,
      "y": 200,
      "connections": [
        { "to": "step-005" },
        { "to": "err-001" }
      ]
    },
    {
      "id": "step-004",
      "type": "card",
      "title": "Fallback: expo-speech",
      "borderColor": "orange",
      "tags": ["App", "Fallback"],
      "description": "**When SherpaTTS unavailable:**\n- Expo Go mode (no native)\n- Model files missing\n- Device not supported\n\n**Fallback:**\n```javascript\nif (!sherpaTTS.isAvailable()) {\n ExpoSpeech.speak(text, {\n language: 'en-US',\n rate: 0.9\n });\n}\n```",
      "x": 500,
      "y": 400,
      "connections": []
    },
    {
      "id": "step-005",
      "type": "card",
      "title": "TTS Ready",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**State updated:**\n```\nttsInitialized: true\n```\n\n**Available voices:**\n| ID | Name | Gender / Accent |\n|----|------|-----------------|\n| lessac | Lessac | Female US |\n| ryan | Ryan | Male US |\n| alba | Alba | Female UK |",
      "x": 900,
      "y": 400,
      "connections": []
    },
    {
      "id": "err-001",
      "type": "card",
      "title": "ERROR: TTS Init Failed",
      "borderColor": "red",
      "tags": ["Error"],
      "description": "**When:**\n- Native module missing\n- Model files not found\n- Memory allocation failed\n\n**App state:**\n```\nttsInitialized: false\nerror: 'Native module not available'\n```\n\n**Fallback:** Use expo-speech",
      "x": 1300,
      "y": 200,
      "connections": [
        { "to": "step-004" }
      ]
    },
    {
      "id": "step-010",
      "type": "card",
      "title": "User: Tap 🎤 Button",
      "borderColor": "red",
      "tags": ["User"],
      "description": "**User taps microphone button**\n\nButton appearance:\n- Default: Outline mic icon\n- Active: Filled mic, primary color\n- Disabled: Grayed out (0.5 opacity)\n\n**Triggers:** `handleVoiceToggle()`",
      "x": 100,
      "y": 600,
      "connections": [
        { "to": "step-011" }
      ]
    },
    {
      "id": "step-011",
      "type": "card",
      "title": "App: handleVoiceToggle()",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**Decision logic:**\n```javascript\nif (isListening) {\n stopListening();\n handleVoiceSend();\n} else {\n startListening();\n}\n```\n\n**Check availability:**\n```javascript\nif (!speechRecognitionAvailable) {\n Alert.alert('Not Available');\n return;\n}\n```",
      "x": 500,
      "y": 600,
      "connections": [
        { "to": "step-012" },
        { "to": "step-020" },
        { "to": "err-002" }
      ]
    },
    {
      "id": "err-002",
      "type": "card",
      "title": "ERROR: No Mic Permission",
      "borderColor": "red",
      "tags": ["Error"],
      "description": "**When:**\n- User denied microphone access\n- Permission not requested\n\n**App shows:**\n```\nAlert: 'Microphone Access Required'\n\n'Please enable microphone access\nin Settings to use voice input.'\n```\n\n**Resolution:** Open Settings",
      "x": 500,
      "y": 800,
      "connections": []
    },
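    {
      "id": "sketch-permission-alert",
      "type": "card",
      "title": "SKETCH: Permission Alert",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** one plausible way to show the alert in err-002 with an Open Settings action. `Linking.openSettings()` is standard React Native; the button labels are assumptions.\n```javascript\nimport { Alert, Linking } from 'react-native';\n\nAlert.alert(\n  'Microphone Access Required',\n  'Please enable microphone access in Settings to use voice input.',\n  [\n    { text: 'Cancel', style: 'cancel' },\n    // Deep-link into this app's settings page\n    { text: 'Open Settings', onPress: () => Linking.openSettings() },\n  ]\n);\n```",
      "x": 100,
      "y": 800,
      "connections": []
    },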
    {
      "id": "step-012",
      "type": "card",
      "title": "App: Start Listening",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**Actions:**\n1. Reset `recognizedText`\n2. Start pulse animation\n3. Call native speech recognition\n\n```javascript\nsetRecognizedText('');\nAnimated.loop(\n Animated.sequence([...])\n).start();\nawait startListening();\n```",
      "x": 900,
      "y": 600,
      "connections": [
        { "to": "step-013" }
      ]
    },
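    {
      "id": "sketch-pulse-animation",
      "type": "card",
      "title": "SKETCH: Pulse Animation",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** a plausible body for the `Animated.sequence([...])` elided in step-012, matching the 1.0 → 1.2 scale pulse described in step-014. The `pulseAnim` ref and durations are assumptions.\n```javascript\nconst pulseAnim = useRef(new Animated.Value(1)).current;\n\nAnimated.loop(\n  Animated.sequence([\n    // Scale the mic button up...\n    Animated.timing(pulseAnim, {\n      toValue: 1.2,\n      duration: 500,\n      useNativeDriver: true,\n    }),\n    // ...and back down, then repeat\n    Animated.timing(pulseAnim, {\n      toValue: 1.0,\n      duration: 500,\n      useNativeDriver: true,\n    }),\n  ])\n).start();\n```",
      "x": 900,
      "y": 800,
      "connections": []
    },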
    {
      "id": "step-013",
      "type": "card",
      "title": "expo-speech-recognition",
      "borderColor": "purple",
      "tags": ["Native"],
      "description": "**Native Module: ExpoSpeechRecognition**\n\n```javascript\nExpoSpeechRecognitionModule.start({\n lang: 'en-US',\n interimResults: true,\n maxAlternatives: 1,\n continuous: false\n});\n```\n\n**Events:**\n- `start` → setIsListening(true)\n- `result` → setRecognizedText()\n- `end` → setIsListening(false)\n- `error` → handle error",
      "x": 1300,
      "y": 600,
      "connections": [
        { "to": "step-014" }
      ]
    },
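    {
      "id": "sketch-recognition-events",
      "type": "card",
      "title": "SKETCH: Recognition Events",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** one plausible wiring of the four events listed in step-013 via the library's `useSpeechRecognitionEvent` hook; the handler bodies are assumptions.\n```javascript\nimport { useSpeechRecognitionEvent } from 'expo-speech-recognition';\n\nuseSpeechRecognitionEvent('start', () => setIsListening(true));\nuseSpeechRecognitionEvent('end', () => setIsListening(false));\nuseSpeechRecognitionEvent('result', (event) => {\n  // Interim and final transcripts both arrive here\n  setRecognizedText(event.results[0]?.transcript ?? '');\n});\nuseSpeechRecognitionEvent('error', (event) => {\n  console.warn(event.error, event.message);\n});\n```",
      "x": 1300,
      "y": 400,
      "connections": []
    },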
    {
      "id": "step-014",
      "type": "card",
      "title": "UI: Listening State",
      "borderColor": "blue",
      "tags": ["Screen"],
      "description": "**Visual indicators:**\n\n1. **Mic button:**\n - Background: Primary color\n - Pulsing animation (scale 1.0 → 1.2)\n\n2. **Status bar:**\n ```\n 🔵 Listening...\n ```\n\n3. **Input field:**\n - Shows real-time transcript\n - Updates on each interim result",
      "x": 1300,
      "y": 800,
      "connections": [
        { "to": "step-015" }
      ]
    },
    {
      "id": "step-015",
      "type": "card",
      "title": "User: Speaking",
      "borderColor": "red",
      "tags": ["User"],
      "description": "**User speaks into microphone**\n\n**Real-time transcript:**\n```\n\"Hello, how are you today?\"\n```\n\n**Interim results update:**\n- Partial words appear as spoken\n- Final result when silence detected\n\n**To stop:** Tap mic again OR stop speaking",
      "x": 1300,
      "y": 1000,
      "connections": [
        { "to": "step-020" }
      ]
    },
    {
      "id": "step-020",
      "type": "card",
      "title": "App: Stop & Send",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**handleVoiceSend():**\n```javascript\nconst textToSend = \n recognizedText.trim();\n\nif (textToSend) {\n setInputText(textToSend);\n sendMessage(textToSend);\n setRecognizedText('');\n}\n```\n\n**Validation:**\n- Skip if empty transcript\n- Trim whitespace",
      "x": 100,
      "y": 1000,
      "connections": [
        { "to": "step-021" },
        { "to": "err-003" }
      ]
    },
    {
      "id": "err-003",
      "type": "card",
      "title": "WARNING: Empty Transcript",
      "borderColor": "orange",
      "tags": ["Warning"],
      "description": "**When:**\n- User tapped mic but didn't speak\n- Background noise only\n- Recognition failed\n\n**Behavior:**\n- Don't send empty message\n- Return to idle state\n- No error shown to user",
      "x": 100,
      "y": 1200,
      "connections": []
    },
    {
      "id": "step-021",
      "type": "card",
      "title": "App: Send Message",
      "borderColor": "blue",
      "tags": ["App", "API"],
      "description": "**Add user message to chat:**\n```javascript\nsetMessages(prev => [...prev, {\n role: 'user',\n content: textToSend\n}]);\n```\n\n**Call AI API:**\n```\nPOST /ai/stream\nBody: { messages, beneficiaryId }\n```",
      "x": 500,
      "y": 1000,
      "connections": [
        { "to": "step-022" }
      ]
    },
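    {
      "id": "sketch-stream-request",
      "type": "card",
      "title": "SKETCH: Stream Request",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** a plausible shape for the `POST /ai/stream` call in step-021; `API_URL`, the token source, and the header set are assumptions.\n```javascript\nconst res = await fetch(`${API_URL}/ai/stream`, {\n  method: 'POST',\n  headers: {\n    'Content-Type': 'application/json',\n    // JWT validated server-side (step-022)\n    Authorization: `Bearer ${token}`,\n  },\n  body: JSON.stringify({ messages, beneficiaryId }),\n});\n```",
      "x": 500,
      "y": 1200,
      "connections": []
    },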
    {
      "id": "step-022",
      "type": "card",
      "title": "AI Backend: Process",
      "borderColor": "green",
      "tags": ["External", "API"],
      "description": "**Server processes request:**\n\n1. Validate JWT token\n2. Get beneficiary context\n3. Call OpenAI/OpenRouter API\n4. Stream response chunks\n\n**Response:**\n```\ndata: {\"delta\":\"Hello\"}\ndata: {\"delta\":\"! How\"}\ndata: {\"delta\":\" can I\"}\ndata: {\"delta\":\" help?\"}\n[DONE]\n```",
      "x": 900,
      "y": 1000,
      "connections": [
        { "to": "step-023" },
        { "to": "err-004" }
      ]
    },
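    {
      "id": "sketch-sse-parse",
      "type": "card",
      "title": "SKETCH: SSE Frame Parsing",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** parsing the `data:` frames shown in step-022; the function and variable names are hypothetical.\n```javascript\n// One line of the SSE body,\n// e.g. 'data: {\"delta\":\"Hello\"}'\nfunction parseSSELine(line) {\n  if (!line.startsWith('data: ')) return null;\n  const payload = line.slice('data: '.length);\n  if (payload === '[DONE]') return { done: true };\n  const { delta } = JSON.parse(payload);\n  return { delta };\n}\n```",
      "x": 1300,
      "y": 1200,
      "connections": []
    },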
    {
      "id": "err-004",
      "type": "card",
      "title": "ERROR: AI API Failed",
      "borderColor": "red",
      "tags": ["Error"],
      "description": "**When:**\n- Network error\n- API rate limit\n- Invalid token\n- Server error (500)\n\n**App shows:**\n```\n\"Sorry, I couldn't process your \nrequest. Please try again.\"\n```\n\n**TTS:** Speaks error message",
      "x": 900,
      "y": 1200,
      "connections": []
    },
    {
      "id": "step-023",
      "type": "card",
      "title": "App: Receive AI Response",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**Stream handling:**\n```javascript\nfor await (const chunk of stream) {\n  setMessages(prev => {\n    // Replace the last message with a\n    // copy — never mutate React state\n    const updated = [...prev];\n    const last =\n      updated[updated.length - 1];\n    updated[updated.length - 1] = {\n      ...last,\n      content: last.content + chunk\n    };\n    return updated;\n  });\n}\n```\n\n**On complete:** Trigger TTS",
      "x": 1300,
      "y": 1000,
      "connections": [
        { "to": "step-030" }
      ]
    },
    {
      "id": "step-030",
      "type": "card",
      "title": "App: speakText(response)",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**Auto-speak AI response:**\n```javascript\nconst speakText = async (text) => {\n if (!ttsInitialized) {\n // Fallback to expo-speech\n ExpoSpeech.speak(text);\n return;\n }\n \n setIsSpeaking(true);\n await sherpaTTS.speak(text, {\n speed: 1.0,\n onDone: () => setIsSpeaking(false)\n });\n};\n```",
      "x": 100,
      "y": 1400,
      "connections": [
        { "to": "step-031" }
      ]
    },
    {
      "id": "step-031",
      "type": "card",
      "title": "SherpaTTS: Generate Audio",
      "borderColor": "purple",
      "tags": ["Native"],
      "description": "**Native TTS processing:**\n\n1. Text → phonemes (espeak-ng)\n2. Phonemes → audio (Piper VITS)\n3. Audio → device speaker\n\n**Parameters:**\n```javascript\nTTSManager.generateAndPlay(\n text, // input string\n 0,    // speakerId\n 1.0   // speed\n);\n```\n\n**Model:** ~60MB ONNX neural network",
      "x": 500,
      "y": 1400,
      "connections": [
        { "to": "step-032" }
      ]
    },
    {
      "id": "step-032",
      "type": "card",
      "title": "UI: Speaking State",
      "borderColor": "blue",
      "tags": ["Screen"],
      "description": "**Visual indicators:**\n\n1. **Status bar:**\n ```\n 🟢 Speaking... [⏹ Stop]\n ```\n\n2. **Stop button:**\n - Red stop circle icon\n - Tapping interrupts speech\n\n3. **Mic button:**\n - Disabled while speaking\n - Prevents overlap",
      "x": 900,
      "y": 1400,
      "connections": [
        { "to": "step-033" },
        { "to": "step-040" }
      ]
    },
    {
      "id": "step-033",
      "type": "card",
      "title": "TTS: Playback Complete",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**On done callback:**\n```javascript\nonDone: () => {\n setIsSpeaking(false);\n}\n```\n\n**State reset:**\n```\nisSpeaking: false\n```\n\n**User can:**\n- Start new voice input\n- Type manually\n- Scroll chat history",
      "x": 1300,
      "y": 1400,
      "connections": []
    },
    {
      "id": "step-040",
      "type": "card",
      "title": "User: Tap Stop",
      "borderColor": "red",
      "tags": ["User"],
      "description": "**User interrupts speech:**\n\nTaps stop button (⏹) to cancel TTS playback immediately.\n\n**Use cases:**\n- Response too long\n- User wants to ask follow-up\n- Wrong response",
      "x": 900,
      "y": 1600,
      "connections": [
        { "to": "step-041" }
      ]
    },
    {
      "id": "step-041",
      "type": "card",
      "title": "App: stopSpeaking()",
      "borderColor": "blue",
      "tags": ["App"],
      "description": "**Stop playback:**\n```javascript\nconst stopSpeaking = () => {\n if (ttsInitialized) {\n sherpaTTS.stop();\n } else {\n ExpoSpeech.stop();\n }\n setIsSpeaking(false);\n};\n```\n\n**Immediate effect:**\n- Audio stops\n- UI returns to idle",
      "x": 1300,
      "y": 1600,
      "connections": []
    },
    {
      "id": "state-machine",
      "type": "card",
      "title": "STATE MACHINE: Voice",
      "borderColor": "gray",
      "tags": ["Reference"],
      "description": "```\n ┌─────────────┐\n │ IDLE │\n │ isListening:│\n │ false │\n │ isSpeaking: │\n │ false │\n └──────┬──────┘\n │ tap mic\n ┌──────▼──────┐\n │ LISTENING │\n │ isListening:│\n │ true │\n │ (pulsing) │\n └──────┬──────┘\n │ stop/send\n ┌──────▼──────┐\n │ PROCESSING │\n │ isSending: │\n │ true │\n └──────┬──────┘\n │ AI responds\n ┌──────▼──────┐\n │ SPEAKING │\n │ isSpeaking: │\n │ true │\n └──────┬──────┘\n │ done/stop\n ┌──────▼──────┐\n │ IDLE │\n └─────────────┘\n```",
      "x": 50,
      "y": 1800,
      "connections": []
    },
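    {
      "id": "sketch-voice-reducer",
      "type": "card",
      "title": "SKETCH: Reducer Form",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** the state machine above written as a plain reducer. The phases mirror the diagram; the event names are assumptions.\n```javascript\nconst initial = { phase: 'IDLE' };\n\nfunction voiceReducer(state, event) {\n  switch (`${state.phase}:${event}`) {\n    case 'IDLE:TAP_MIC':\n      return { phase: 'LISTENING' };   // isListening: true\n    case 'LISTENING:STOP_OR_SEND':\n      return { phase: 'PROCESSING' };  // isSending: true\n    case 'PROCESSING:AI_RESPONDS':\n      return { phase: 'SPEAKING' };    // isSpeaking: true\n    case 'SPEAKING:DONE':\n    case 'SPEAKING:TAP_STOP':\n      return { phase: 'IDLE' };\n    default:\n      return state; // Ignore events that don't apply\n  }\n}\n```",
      "x": 50,
      "y": 2700,
      "connections": []
    },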
    {
      "id": "files-ref",
      "type": "card",
      "title": "FILES: Voice Integration",
      "borderColor": "gray",
      "tags": ["Reference"],
      "description": "**Modified files:**\n\n📄 `package.json`\n- expo-speech\n- expo-speech-recognition\n- react-native-sherpa-onnx-offline-tts\n\n📄 `services/sherpaTTS.ts`\n- Initialize, speak, stop\n- Voice selection\n- Native bridge\n\n📄 `hooks/useSpeechRecognition.ts`\n- Start/stop listening\n- Event handlers\n- Permission request\n\n📄 `app/(tabs)/chat.tsx`\n- Voice states\n- UI integration\n- Handlers",
      "x": 500,
      "y": 1800,
      "connections": []
    },
    {
      "id": "voices-ref",
      "type": "card",
      "title": "VOICES: Piper Models",
      "borderColor": "gray",
      "tags": ["Reference"],
      "description": "**Available neural voices:**\n\n| Voice | Gender | Accent | Quality |\n|-------|--------|--------|--------|\n| Lessac | Female | US | Natural |\n| Ryan | Male | US | Natural |\n| Alba | Female | UK | Clear |\n\n**Model size:** ~60MB each (medium quality)\n\n**Audio:** 22kHz mono\n\n**Location:**\n```\nassets/tts-models/\n vits-piper-en_US-lessac-medium/\n vits-piper-en_US-ryan-medium/\n vits-piper-en_GB-alba-medium/\n```",
      "x": 900,
      "y": 1800,
      "connections": []
    },
    {
      "id": "build-ref",
      "type": "card",
      "title": "BUILD REQUIREMENTS",
      "borderColor": "orange",
      "tags": ["Reference"],
      "description": "**Native build required!**\n\n⚠️ Will NOT work in Expo Go\n\n**Steps:**\n1. `npm install`\n2. `npx expo prebuild --clean`\n3. `npx expo run:ios`\n4. Test on simulator/device\n\n**iOS:** Native modules bridged\n**Android:** JNI/Kotlin bindings\n\n**Permissions:**\n- iOS: `NSMicrophoneUsageDescription`\n- Android: `RECORD_AUDIO`",
      "x": 1300,
      "y": 1800,
      "connections": []
    },
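    {
      "id": "sketch-permissions-config",
      "type": "card",
      "title": "SKETCH: Permissions Config",
      "borderColor": "gray",
      "tags": ["Reference", "Sketch"],
      "description": "**Sketch (added for illustration):** a plausible `app.json` entry for the microphone permission via the expo-speech-recognition config plugin; the prompt wording is an assumption.\n```json\n{\n  \"expo\": {\n    \"plugins\": [\n      [\n        \"expo-speech-recognition\",\n        {\n          \"microphonePermission\":\n            \"Allow $(PRODUCT_NAME) to use the microphone.\"\n        }\n      ]\n    ]\n  }\n}\n```",
      "x": 1300,
      "y": 2200,
      "connections": []
    }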
  ]
}