Add local Whisper speech input

Ismail Ali
2026-04-28 16:35:59 +02:00
parent ec69117e6f
commit d54aae7bac
10 changed files with 614 additions and 7 deletions


@@ -109,6 +109,26 @@ npm run start
## Speech Input And Playback
Speech input can run through a local Whisper model on the laptop. The iPhone records audio and sends it to the local STT (speech-to-text) server. Whisper transcribes it into English text, and the app then sends that text to Ollama automatically.
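On the client, the flow above amounts to two HTTP calls. This is a minimal sketch, not the app's actual code. The `/transcribe` endpoint, its `{ audioBase64 }` request and `{ text }` response shapes, and the `llama3.2` model name are hypothetical. Only Ollama's `POST /api/generate` with `stream: false` is Ollama's own API. `fetchFn` is injected so the flow can be exercised without running servers.

```typescript
// Minimal sketch of: recorded audio -> local STT server (Whisper) -> Ollama.
// ASSUMPTIONS (hypothetical, not from this repo):
//   - STT server: POST {sttBaseUrl}/transcribe, JSON { audioBase64 } -> { text }
//   - Ollama: POST {ollamaBaseUrl}/api/generate (default port 11434), stream: false
//   - "llama3.2" is only an example model name

type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

// POST a JSON body and return the parsed JSON response, failing on non-2xx status.
async function postJson(fetchFn: FetchLike, url: string, payload: unknown): Promise<unknown> {
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`${url} failed with HTTP ${res.status}`);
  return res.json();
}

async function speechToReply(
  fetchFn: FetchLike,
  sttBaseUrl: string,
  ollamaBaseUrl: string,
  audioBase64: string,
  model = "llama3.2",
): Promise<{ transcript: string; reply: string }> {
  // 1. Whisper transcription on the laptop (hypothetical endpoint).
  const stt = (await postJson(fetchFn, `${sttBaseUrl}/transcribe`, { audioBase64 })) as { text: string };
  const transcript = stt.text.trim();
  // 2. Forward the transcript to Ollama; with stream: false the reply arrives as one JSON object.
  const gen = (await postJson(fetchFn, `${ollamaBaseUrl}/api/generate`, {
    model,
    prompt: transcript,
    stream: false,
  })) as { response: string };
  return { transcript, reply: gen.response };
}
```

In the app, `fetchFn` would simply be the global `fetch`. Injecting it keeps the pipeline logic testable.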
Start the STT server in a third terminal:
```bash
npm run stt:start
```
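When the app runs in Expo Go on the iPhone, `EXPO_PUBLIC_STT_BASE_URL` in `.env` must use the laptop's LAN IP address, because `localhost` on the phone refers to the phone itself. Replace the address below with your laptop's IP: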
```text
EXPO_PUBLIC_STT_BASE_URL=http://192.168.10.33:3334
```
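A small validation step can catch the most common Expo Go mistake, which is a `localhost` URL. The sketch assumes the app reads the value as `process.env.EXPO_PUBLIC_STT_BASE_URL`, since Expo inlines `EXPO_PUBLIC_` variables into the bundle. `resolveSttBaseUrl` is a hypothetical helper. It uses a regular expression instead of `URL` parsing, which has been incomplete in some React Native versions.

```typescript
// Hypothetical helper: validate the STT base URL before the app uses it.
// In the app it would be called as resolveSttBaseUrl(process.env.EXPO_PUBLIC_STT_BASE_URL).
function resolveSttBaseUrl(raw: string | undefined): string {
  if (!raw || raw.trim() === "") {
    throw new Error("EXPO_PUBLIC_STT_BASE_URL is not set in .env");
  }
  // scheme://host[:port] with optional trailing slashes, e.g. http://192.168.10.33:3334
  const match = /^(https?):\/\/([^/:]+)(?::(\d+))?\/*$/.exec(raw.trim());
  if (!match) {
    throw new Error(`Expected http://<laptop-ip>:<port>, got "${raw}"`);
  }
  const [, scheme, host, port] = match;
  // On the iPhone, localhost is the phone itself, not the laptop running Whisper.
  if (host === "localhost" || host === "127.0.0.1") {
    throw new Error(`"${raw}" points at the phone itself; use the laptop's LAN IP`);
  }
  // Normalized form without a trailing slash, so paths can be appended as `${base}/...`.
  return port ? `${scheme}://${host}:${port}` : `${scheme}://${host}`;
}
```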
The default local Whisper model is `tiny.en`. It is downloaded on first use and then runs locally without API costs. To use a different model, set `STT_MODEL` before starting the server (PowerShell):
```powershell
$env:STT_MODEL="base.en"; npm run stt:start
```
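The STT server presumably reads `STT_MODEL` at startup. The helper below is a hypothetical sketch of that selection. Only the `STT_MODEL` variable name and the `tiny.en` default come from this README. Restricting the choice to Whisper's English-only checkpoints is an assumption of the sketch.

```typescript
// Hypothetical sketch: choose the Whisper model from STT_MODEL, defaulting to tiny.en.
// Allowing only the English-only checkpoints is an assumption, not this server's rule.
const ENGLISH_WHISPER_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"] as const;
type EnglishWhisperModel = (typeof ENGLISH_WHISPER_MODELS)[number];

function pickWhisperModel(raw: string | undefined): EnglishWhisperModel {
  // Unset or blank -> the README's default.
  const name = (raw ?? "").trim() || "tiny.en";
  const known = ENGLISH_WHISPER_MODELS.find((m) => m === name);
  if (!known) {
    throw new Error(`Unsupported STT_MODEL "${name}"; expected one of ${ENGLISH_WHISPER_MODELS.join(", ")}`);
  }
  return known;
}
// On the server: const model = pickWhisperModel(process.env.STT_MODEL);
```

Larger English checkpoints such as `small.en` are generally more accurate than `tiny.en`, but they are slower and need more disk space and memory.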
Playback uses a local MP3 TTS (text-to-speech) server on the laptop. AI replies are sent to the laptop, converted to MP3 with a Microsoft neural English voice, and then played on the iPhone. This avoids the robotic-sounding iPhone system voice.
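On the phone side, the app could fetch the spoken reply by building a URL that returns the MP3 and passing it to an audio player. In this sketch, the `/speak` path, its `text` and `voice` query parameters, and the default voice `en-US-AriaNeural` are all assumptions. They are not this project's documented interface.

```typescript
// Hypothetical: build a URL for a TTS endpoint that responds with an MP3 (audio/mpeg).
// The /speak path, the query parameter names and the default voice are assumptions.
function buildTtsUrl(ttsBaseUrl: string, replyText: string, voice = "en-US-AriaNeural"): string {
  const text = replyText.trim();
  if (text === "") {
    throw new Error("Nothing to speak: the AI reply is empty");
  }
  const base = ttsBaseUrl.replace(/\/+$/, ""); // avoid "//speak"
  // encodeURIComponent escapes spaces, &, ? and non-ASCII characters in the reply.
  return `${base}/speak?text=${encodeURIComponent(text)}&voice=${encodeURIComponent(voice)}`;
}
```

The resulting URL could then be handed to an audio player on the iPhone, for example `Audio.Sound.createAsync({ uri })` from `expo-av`. This README does not say which audio library the app actually uses. A GET URL keeps playback simple, but very long replies may be better sent with POST.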
Start the TTS server in a second terminal: