Add local Whisper speech input
20
README.md
@@ -109,6 +109,26 @@ npm run start
## Speech Input And Playback

Speech input can run through local Whisper on the laptop. The iPhone records audio and sends it to the local STT server, Whisper transcribes it to English text, and the app then sends that text to Ollama automatically.

Start the STT server in a third terminal:
```bash
npm run stt:start
```
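The speech-input flow described above (record, transcribe, forward to Ollama) could look roughly like the sketch below. It is an illustration under stated assumptions, not the app's actual code: the `/transcribe` path, the `{ text }` response shape, and the names `FetchLike`, `SpeechPipelineConfig`, `transcribe`, `askOllama`, and `speechToReply` are hypothetical. Only the Ollama `/api/generate` request with `stream: false` and its `response` field follow Ollama's documented HTTP API.

```typescript
// Minimal fetch-like signature so the flow can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers?: Record<string, string>; body: string | Uint8Array },
) => Promise<{ ok: boolean; status: number; json(): Promise<Record<string, unknown>> }>;

interface SpeechPipelineConfig {
  sttBaseUrl: string;    // e.g. the value of EXPO_PUBLIC_STT_BASE_URL
  ollamaBaseUrl: string; // ASSUMPTION: Ollama's HTTP API reachable on the laptop
  ollamaModel: string;   // ASSUMPTION: whichever model the app is configured to use
}

// Step 1: send the recorded audio to the STT server and return the transcript.
// ASSUMPTION: the server accepts raw audio via POST /transcribe and replies { text }.
async function transcribe(
  audio: Uint8Array,
  cfg: SpeechPipelineConfig,
  fetchFn: FetchLike,
): Promise<string> {
  const res = await fetchFn(`${cfg.sttBaseUrl}/transcribe`, {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: audio,
  });
  if (!res.ok) throw new Error(`STT request failed: ${res.status}`);
  const data = await res.json();
  return String(data["text"] ?? "").trim();
}

// Step 2: forward the transcript to Ollama as a non-streaming generate request.
async function askOllama(
  prompt: string,
  cfg: SpeechPipelineConfig,
  fetchFn: FetchLike,
): Promise<string> {
  const res = await fetchFn(`${cfg.ollamaBaseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: cfg.ollamaModel, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return String(data["response"] ?? "");
}

// Full flow: audio in, transcript and reply out. Returns null when Whisper
// produced no text, so the app does not send an empty prompt to Ollama.
async function speechToReply(
  audio: Uint8Array,
  cfg: SpeechPipelineConfig,
  fetchFn: FetchLike,
): Promise<{ text: string; reply: string } | null> {
  const text = await transcribe(audio, cfg, fetchFn);
  if (text.length === 0) return null;
  const reply = await askOllama(text, cfg, fetchFn);
  return { text, reply };
}
```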
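For Expo Go on iPhone, `EXPO_PUBLIC_STT_BASE_URL` in `.env` must point to the laptop's LAN IP address, because `localhost` would resolve to the phone itself. The address below is an example; use your own laptop's IP: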
```text
EXPO_PUBLIC_STT_BASE_URL=http://192.168.10.33:3334
```
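On the app side, a small guard can catch a missing or loopback value early. The sketch below is only an illustration: `resolveSttBaseUrl` is a hypothetical helper, not something the project is known to contain. It relies on Expo inlining `EXPO_PUBLIC_`-prefixed variables into `process.env` and assumes the value has the form `http://<host>:<port>` with no path.

```typescript
// Hypothetical helper: validate the STT base URL read from .env.
// Typical call site in an Expo app (not executed here):
//   const STT_BASE_URL = resolveSttBaseUrl(process.env.EXPO_PUBLIC_STT_BASE_URL);
function resolveSttBaseUrl(raw: string | undefined): string {
  // Trim whitespace and trailing slashes so paths can be appended safely.
  const value = (raw ?? "").trim().replace(/\/+$/, "");

  // ASSUMPTION: scheme, host, optional port, and no path component.
  const match = /^https?:\/\/([^/:]+)(:\d+)?$/i.exec(value);
  if (!match) {
    throw new Error(
      `EXPO_PUBLIC_STT_BASE_URL must look like http://<laptop-ip>:<port>, got "${value}"`,
    );
  }

  // On a physical phone, loopback addresses refer to the phone, not the laptop.
  const host = (match[1] ?? "").toLowerCase();
  if (host === "localhost" || host === "127.0.0.1") {
    throw new Error("EXPO_PUBLIC_STT_BASE_URL points at the phone itself; use the laptop IP");
  }
  return value;
}
```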
The default local Whisper model is `tiny.en`. It is downloaded on first use and then runs locally, with no API costs. To use a different model, set `STT_MODEL` before starting the server (PowerShell syntax shown):
```powershell
$env:STT_MODEL="base.en"; npm run stt:start
```
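The default-plus-override behaviour amounts to a simple environment lookup. How the STT server actually reads `STT_MODEL` is not shown in this README, so the sketch below is an assumption about the general pattern; `resolveSttModel` and `DEFAULT_STT_MODEL` are hypothetical names.

```typescript
// Default model named in this README.
const DEFAULT_STT_MODEL = "tiny.en";

// Hypothetical helper: pick the Whisper model from an environment map,
// falling back to the default when STT_MODEL is unset or blank.
function resolveSttModel(env: Record<string, string | undefined>): string {
  const requested = (env["STT_MODEL"] ?? "").trim();
  return requested.length > 0 ? requested : DEFAULT_STT_MODEL;
}
```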
Playback uses a local MP3 TTS server on the laptop. AI replies are sent to the laptop, converted to an MP3 with a Microsoft neural English voice, and then played on the iPhone. This avoids the robotic iPhone system voice.

Start the TTS server in a second terminal: