Speech-to-Text Input
Feature Detail
Description
Speech-to-Text Input allows peer mentors to dictate free-text fields — such as activity summaries and notes — rather than typing them. A microphone widget appears inline within the activity wizard and any text-input field that opts in. The user taps to start recording, speaks their notes, and the recognized text is inserted into the field for review and editing before submission. Recording is explicitly designed for post-conversation note-taking, not ambient recording during peer mentor interactions, in line with the strong organizational preference against live recording expressed by Blindeforbundet.
Analysis
Both Blindeforbundet and HLF explicitly requested speech-to-text for report writing. For visually impaired users and those with motor difficulties, dictation can mean the difference between completing a registration independently and needing assistance. Reducing typing load also lowers cognitive effort for all users, supporting the overarching WCAG and accessibility goals. From an engagement standpoint, removing friction from the summary field — the most free-form and effortful part of the wizard — is likely to improve the quality and completeness of submitted notes, which directly benefits coordinator oversight and Bufdir documentation quality.
The feature is implemented with Flutter's speech_to_text package, which delegates to the platform's native recognition engine (SpeechRecognizer on Android, SFSpeechRecognizer on iOS). This avoids third-party API costs and data-residency concerns for sensitive health conversations. The Voice Recording Control widget manages microphone permission requests via permission_handler, displays animated waveform feedback during recognition, and handles error states (no permission, no speech detected, recognition timeout) with accessible error messages. Audio is intended to be processed entirely on-device, with no audio data transmitted to external servers; because both native engines can use server-based recognition by default, on-device recognition should be requested explicitly and verified per platform and locale. The Speech Input Widget is a composable overlay that injects recognized text into any Flutter TextEditingController, so a single widget wrapper makes it reusable across all text fields in the app.
Dependencies
Definition of Done
Components (6)
User Stories (8)
As a Peer Mentor (Likeperson)
I want speech dictation to be available in any text field across the app that has opted in to the Speech Input Widget
So that I can dictate not just the activity summary but also contact notes, event descriptions, and other free-text areas without switching to a separate tool
- Given a text field has been wrapped with the Speech Input Widget, when the peer mentor views the field, then a microphone icon is visible adjacent to the field
- Given multiple text fields on the same screen are wrapped with Speech Input Widget, when the peer mentor taps the mic on one field, then only that field's session is activated and the others remain inactive
- Given a recognition session completes on a wrapped field, when the transcript is inserted, then the text is injected into the correct TextEditingController for that field
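One way to meet the criterion that only the tapped field's session is active is a small coordinator shared by all wrapped fields. The following is a minimal sketch under stated assumptions: `DictationCoordinator`, its method names, and the string field identifiers are hypothetical, and the sketch rejects a second session instead of preempting the first, which the specification does not decide.

```dart
/// Hypothetical coordinator that lets only one wrapped field record at a time.
class DictationCoordinator {
  String? _activeFieldId;

  /// The field that currently owns the microphone, or null when idle.
  String? get activeFieldId => _activeFieldId;

  /// Starts a session for [fieldId]; returns false if another field is active.
  bool tryStart(String fieldId) {
    if (_activeFieldId != null && _activeFieldId != fieldId) return false;
    _activeFieldId = fieldId;
    return true;
  }

  /// Ends the session, but only if [fieldId] is the current owner.
  void finish(String fieldId) {
    if (_activeFieldId == fieldId) _activeFieldId = null;
  }

  /// A transcript is delivered only to the field that owns the session.
  bool shouldDeliver(String fieldId) => _activeFieldId == fieldId;
}
```

Each wrapped field would call `tryStart` when its microphone icon is tapped and check `shouldDeliver` before writing to its own TextEditingController, so a transcript cannot land in the wrong field.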
As a Peer Mentor (Likeperson)
I want to configure my speech input settings, including preferred recognition language and whether speech input is globally enabled, and have these settings persist across app sessions
So that the feature works in my preferred language and I can disable it entirely if I prefer typing or if I am in an environment where speaking aloud is not practical
- Given the peer mentor navigates to Settings, when they select Speech Input, then a settings panel is shown with locale, noise gate, and enable/disable options
- Given the peer mentor selects a different recognition locale, when they start a new recording session, then the Speech Recognition Service uses the selected locale for recognition
- Given the peer mentor disables speech input globally, when they view text fields that previously showed a microphone icon, then no microphone icon is displayed
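Persisting these settings across sessions comes down to a serializable model with safe defaults. The sketch below is illustrative only: the class name, the JSON keys, the `nb_NO` default locale, and the decibel unit for the noise gate are all assumptions, and the storage backend that would hold the resulting string is left open.

```dart
import 'dart:convert';

/// Hypothetical settings model; names, keys, and defaults are assumptions.
class SpeechInputSettings {
  const SpeechInputSettings({
    this.enabled = true,
    this.localeId = 'nb_NO',
    this.noiseGateDb = -40.0,
  });

  final bool enabled;
  final String localeId;
  final double noiseGateDb;

  /// Serializes the settings to a JSON string for any key-value store.
  String toJson() => jsonEncode({
        'enabled': enabled,
        'localeId': localeId,
        'noiseGateDb': noiseGateDb,
      });

  /// Restores settings, falling back to defaults for bad or missing values.
  static SpeechInputSettings fromJson(String source) {
    const defaults = SpeechInputSettings();
    Object? decoded;
    try {
      decoded = jsonDecode(source);
    } on FormatException {
      return defaults;
    }
    if (decoded is! Map<String, dynamic>) return defaults;
    final enabled = decoded['enabled'];
    final localeId = decoded['localeId'];
    final noiseGateDb = decoded['noiseGateDb'];
    return SpeechInputSettings(
      enabled: enabled is bool ? enabled : defaults.enabled,
      localeId: localeId is String ? localeId : defaults.localeId,
      noiseGateDb:
          noiseGateDb is num ? noiseGateDb.toDouble() : defaults.noiseGateDb,
    );
  }
}
```

Falling back to defaults on malformed input means a corrupted stored value cannot leave the peer mentor without a working configuration.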
As a Peer Mentor (Likeperson)
I want to explicitly stop a recording when I am done speaking or cancel it without inserting text
So that I have full control over when recognition ends and can discard an unsuccessful attempt without polluting the text field
- Given a recording session is active, when the peer mentor taps Stop, then the session finalizes, the last recognized transcript is inserted into the text field, and the recording UI closes
- Given a recording session is active, when the peer mentor taps Cancel, then the session is aborted, no text is inserted or modified, and the field retains its previous content
- Given a recording session is active and the peer mentor stops speaking, when silence continues for the configured timeout (e.g., 3 seconds), then the session auto-stops and inserts the recognized transcript
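The stop, cancel, and silence-timeout criteria can be read as a small state machine. The sketch below is one possible shape, not the app's actual implementation: `DictationSession` and its members are hypothetical, and the 3-second default mirrors the example value in the criteria. In the real widget the speech_to_text package's `pauseFor` parameter on `listen` may take over the silence timing.

```dart
enum SessionState { idle, listening }

/// Hypothetical session model: stop and silence timeout yield the transcript
/// to insert, while cancel yields nothing so the field stays unchanged.
class DictationSession {
  DictationSession({this.silenceTimeout = const Duration(seconds: 3)});

  final Duration silenceTimeout;
  SessionState state = SessionState.idle;
  String _latestTranscript = '';
  Duration _silence = Duration.zero;

  void start() {
    state = SessionState.listening;
    _latestTranscript = '';
    _silence = Duration.zero;
  }

  /// Called with each recognition result; any speech resets the silence clock.
  void onTranscript(String words) {
    if (state != SessionState.listening) return;
    _latestTranscript = words;
    _silence = Duration.zero;
  }

  /// Called while no speech is heard. Returns the transcript to insert once
  /// the accumulated silence reaches the timeout, otherwise null.
  String? onSilence(Duration elapsed) {
    if (state != SessionState.listening) return null;
    _silence += elapsed;
    return _silence >= silenceTimeout ? stop() : null;
  }

  /// Finalizes the session and returns the last recognized transcript.
  String? stop() {
    if (state != SessionState.listening) return null;
    state = SessionState.idle;
    return _latestTranscript;
  }

  /// Aborts the session and discards the transcript.
  void cancel() {
    state = SessionState.idle;
    _latestTranscript = '';
  }
}
```

Because `cancel` never returns text, the caller has nothing to write and the field keeps its previous content.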
As a Peer Mentor (Likeperson)
I want the app to request microphone permission the first time I try to use speech input, with a clear explanation of why it is needed
So that I understand what I am consenting to and can grant permission confidently without feeling surveilled, and so the feature activates without friction on subsequent uses
- Given the peer mentor has never used speech input, when they tap the microphone icon for the first time, then a rationale dialog appears before the system permission prompt explaining the dictation-only use case
- Given the rationale dialog is shown, when the peer mentor dismisses it and the system prompt appears, then they can grant or deny microphone access
- Given the peer mentor grants microphone permission, when they tap the mic icon on the same or future sessions, then recording begins immediately without repeating the permission flow
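The permission criteria amount to a decision taken each time the microphone icon is tapped. A minimal sketch follows, assuming a simplified three-value permission state; both enums and the function name are hypothetical. In the app the underlying status would presumably come from permission_handler (for example `Permission.microphone.status`), whose status values are richer than the ones shown here.

```dart
/// Hypothetical, simplified view of the microphone permission.
enum MicPermission { notRequested, granted, denied }

/// What the Voice Recording Control does when the mic icon is tapped.
enum MicTapAction { showRationaleThenPrompt, startRecording, showAccessRequired }

/// Chooses the next step from the current permission state.
MicTapAction onMicTapped(MicPermission permission) {
  if (permission == MicPermission.granted) {
    return MicTapAction.startRecording; // no repeated permission flow
  }
  if (permission == MicPermission.notRequested) {
    return MicTapAction.showRationaleThenPrompt; // rationale precedes system prompt
  }
  return MicTapAction.showAccessRequired; // denied or revoked: point to Settings
}
```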
As a Peer Mentor (Likeperson)
I want to see an animated waveform and a clear recording indicator while I am speaking
So that I know the app is actively listening, can tell when my speech is being detected versus silence, and can judge when to stop speaking
- Given a recording session is active, when the peer mentor is speaking, then the waveform animates with amplitude proportional to input volume
- Given a recording session is active, when the environment is quiet or the mentor is not speaking, then the waveform animation reduces or pauses indicating silence detection
- Given the recording indicator is displayed, when a screen reader is active, then the widget announces 'Recording in progress' or equivalent accessible label
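An amplitude that is proportional to input volume and drops to zero in silence can be sketched as a pure mapping from a sound level to a bar height. The function name, the decibel scale, and the default gate and ceiling values are assumptions; native engines report sound levels on platform-dependent scales, so the real range would need calibration per platform.

```dart
/// Maps a sound level in decibels to a waveform bar height between 0 and 1.
/// Levels at or below [noiseGateDb] are treated as silence and render as 0.
double waveformAmplitude(
  double levelDb, {
  double noiseGateDb = -40.0,
  double maxDb = 0.0,
}) {
  if (levelDb <= noiseGateDb) return 0.0;
  final normalized = (levelDb - noiseGateDb) / (maxDb - noiseGateDb);
  // Clamp so unexpectedly loud input cannot overflow the waveform area.
  return normalized.clamp(0.0, 1.0).toDouble();
}
```

With the defaults, a level of -20 dB maps to half height, anything at or below -40 dB flattens the waveform, and levels above 0 dB are capped at full height.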
As a Peer Mentor (Likeperson)
I want the recognized text to be inserted into the text field as editable content so I can correct any recognition errors before saving
So that I can submit accurate activity notes even when the speech engine mishears a word, maintaining the quality of documentation without needing to retype everything from scratch
- Given a recording session completes successfully, when the transcript is inserted, then the text field displays the recognized text as editable content with the cursor at the end
- Given the transcript has been inserted, when the peer mentor taps any word in the field, then the cursor moves to that position for correction
- Given the peer mentor edits the transcript, when they submit the activity form, then the edited (not original) transcript is saved as the activity summary
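Inserting the transcript as editable text with the cursor at the end can be expressed as a pure function over the field's text. This sketch assumes the transcript is appended to any existing content with a separating space, which the criteria do not specify; `FieldValue` is a hypothetical stand-in for the text and cursor offset a controller holds.

```dart
/// Hypothetical stand-in for the text and cursor offset of a text field.
class FieldValue {
  const FieldValue(this.text, this.cursorOffset);
  final String text;
  final int cursorOffset;
}

/// Appends [transcript] to [existing] and places the cursor at the end.
/// An empty transcript leaves the text unchanged.
FieldValue insertTranscript(String existing, String transcript) {
  final addition = transcript.trim();
  if (addition.isEmpty) return FieldValue(existing, existing.length);
  final needsSpace = existing.isNotEmpty &&
      !existing.endsWith(' ') &&
      !existing.endsWith('\n');
  final text = '$existing${needsSpace ? ' ' : ''}$addition';
  return FieldValue(text, text.length);
}
```

In Flutter the result would map onto the controller as a `TextEditingValue` whose selection is `TextSelection.collapsed(offset: cursorOffset)`. The text stays ordinary editable content, so later corrections are what gets submitted.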
As a Peer Mentor (Likeperson)
I want to receive clear, accessible error messages when speech recognition fails
So that I understand what went wrong and know how to recover, whether that means retrying, adjusting my environment, or falling back to typing
- Given the peer mentor taps the mic but does not speak within the timeout window, when the session expires, then a 'No speech detected' message is shown with a retry prompt
- Given microphone permission was revoked in device Settings between sessions, when the peer mentor taps the mic, then a 'Microphone access required' message is shown with a link to device Settings
- Given the native speech engine fails to initialize (e.g., locale not supported), when recording is attempted, then an accessible error message is shown and the field remains unchanged
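The error cases above can be kept consistent by mapping each failure to one message and one recovery action. This is a sketch only: the enum, the class, and the action labels are hypothetical, and the wording for the engine failure case is an assumption because the criteria do not give it. The other two messages are taken from the criteria.

```dart
/// Hypothetical failure categories drawn from the acceptance criteria.
enum SpeechError { noSpeechDetected, permissionRevoked, engineUnavailable }

/// Message and recovery action shown to the user and to screen readers.
class ErrorPresentation {
  const ErrorPresentation(this.message, this.actionLabel);
  final String message;
  final String actionLabel;
}

const Map<SpeechError, ErrorPresentation> _presentations = {
  SpeechError.noSpeechDetected:
      ErrorPresentation('No speech detected', 'Try again'),
  SpeechError.permissionRevoked:
      ErrorPresentation('Microphone access required', 'Open Settings'),
  // Wording assumed; the criteria only require an accessible message.
  SpeechError.engineUnavailable:
      ErrorPresentation('Speech input is unavailable', 'Type instead'),
};

/// Looks up the presentation for [error]; every enum value has an entry.
ErrorPresentation presentError(SpeechError error) => _presentations[error]!;
```

None of these paths touches the text field, which matches the requirement that the field remains unchanged when recognition fails.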
As a Peer Mentor (Likeperson)
I want to tap a microphone icon next to the activity summary field and speak my notes aloud
So that I can complete the free-text summary after a peer support visit without typing on a small phone screen, reducing effort and increasing the likelihood I will submit a complete report
- Given the peer mentor is on the activity summary step of the wizard, when they tap the microphone icon, then the Voice Recording Control activates and a recording session begins with clear visual indication
- Given a recording session is active, when the peer mentor speaks aloud, then the Speech Recognition Service processes audio via the native engine and produces a recognized transcript
- Given recognition produces a transcript, when the session ends, then the full transcript is inserted into the activity summary text field replacing any placeholder text