Speech-to-Text Input

Medium complexity | High priority | Should have | v1.1 | Extracted | Activity Registration | Confidence: 100%

Components: Shared

User Stories: Yes

Analyzed

Description

Speech-to-Text Input allows peer mentors to dictate free-text fields — such as activity summaries and notes — rather than typing them. A microphone widget appears inline within the activity wizard and any text-input field that opts in. The user taps to start recording, speaks their notes, and the recognized text is inserted into the field for review and editing before submission. Recording is explicitly designed for post-conversation note-taking, not ambient recording during peer mentor interactions, in line with the strong organizational preference against live recording expressed by Blindeforbundet.

Analysis

Business Value

Both Blindeforbundet and HLF explicitly requested speech-to-text for report writing. For visually impaired users and those with motor difficulties, dictation can mean the difference between completing a registration independently and needing assistance. Reducing typing load also lowers cognitive effort for all users, supporting the overarching WCAG and accessibility goals. From an engagement standpoint, removing friction from the summary field — the most free-form and effortful part of the wizard — is likely to improve the quality and completeness of submitted notes, which directly benefits coordinator oversight and Bufdir documentation quality.

Implementation Notes

Implemented using Flutter's speech_to_text package, which delegates to the platform's native recognition engine (SpeechRecognizer on Android, SFSpeechRecognizer on iOS), avoiding third-party API costs and data-residency concerns for sensitive health conversations. The Voice Recording Control widget manages microphone permission requests via permission_handler, displays animated waveform feedback during recognition, and handles error states (no permission, no speech detected, recognition timeout) with accessible error messages. Audio is processed entirely on-device; no audio data is transmitted to external servers. The Speech Input Widget is a composable overlay that injects recognized text into any Flutter TextEditingController, making it reusable across all text fields in the app with a single widget wrapper.
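A minimal sketch of how these pieces might fit together, assuming recent versions of speech_to_text and permission_handler. The DictationSession class, its method names, and the default 'nb_NO' locale identifier are hypothetical; real locale identifiers should be read from the plugin's locales() list, and this sketch appends to the end of the field for brevity rather than inserting at the cursor.

```dart
import 'package:flutter/widgets.dart';
import 'package:permission_handler/permission_handler.dart';
import 'package:speech_to_text/speech_recognition_result.dart';
import 'package:speech_to_text/speech_to_text.dart';

/// Hypothetical helper that dictates into [controller] via the native engine.
class DictationSession {
  DictationSession(this.controller);

  final TextEditingController controller;
  final SpeechToText _speech = SpeechToText();

  /// Returns false if permission is denied or the engine is unavailable.
  Future<bool> start({String localeId = 'nb_NO'}) async {
    // Ask for microphone access (no-op prompt if already granted).
    final status = await Permission.microphone.request();
    if (!status.isGranted) return false;

    // initialize() reports whether recognition is available on this device.
    final available = await _speech.initialize();
    if (!available) return false;

    await _speech.listen(
      localeId: localeId, // assumed identifier, see lead-in
      pauseFor: const Duration(seconds: 3), // stop after this much silence
      onResult: _onResult,
    );
    return true;
  }

  void _onResult(SpeechRecognitionResult result) {
    if (!result.finalResult) return; // ignore partial hypotheses
    final words = result.recognizedWords;
    if (words.isEmpty) return;
    final existing = controller.text;
    final joined = existing.isEmpty ? words : '$existing $words';
    // Replace the field value and place the caret at the end for editing.
    controller.value = TextEditingValue(
      text: joined,
      selection: TextSelection.collapsed(offset: joined.length),
    );
  }

  /// Finalizes recognition; a final result may still be delivered.
  Future<void> stop() => _speech.stop();

  /// Aborts recognition without delivering a final result.
  Future<void> cancel() => _speech.cancel();
}
```

Whether recognition actually stays on-device depends on the platform engine and installed language packs, so the offline requirement would still need to be verified per device.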

Dependencies

activity-logging-wizard

design-token-system

supabase-backend-core

Definition of Done

Microphone widget available in activity summary field and free-text note fields

Recognized text inserted at cursor position, editable before save

Explicit start/stop control — no continuous ambient recording

Permission request follows platform guidelines with clear rationale copy

Works offline (on-device recognition only, no network dependency)

Accessible: widget has semantic label, activation announces state via screen reader

Norwegian language recognition accuracy validated with test phrases from org workshops
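The "inserted at cursor position" item could be implemented roughly as follows. This is a sketch in plain Dart so it can run without Flutter; insertTranscript and SpliceResult are hypothetical names, and the space-padding rule is an assumption rather than a stated requirement. A start or end of -1 is treated as "no cursor", in which case the transcript is appended.

```dart
/// New field text plus the caret offset after the inserted transcript.
typedef SpliceResult = ({String text, int cursor});

/// Inserts [transcript] over the selection [start]..[end] of [text].
/// Out-of-range or negative offsets fall back to appending at the end.
SpliceResult insertTranscript(
    String text, int start, int end, String transcript) {
  var s = start;
  var e = end;
  if (s < 0 || e < 0 || s > text.length || e > text.length) {
    s = text.length;
    e = text.length;
  }
  if (s > e) {
    // Normalize a reversed selection.
    final tmp = s;
    s = e;
    e = tmp;
  }
  final before = text.substring(0, s);
  final after = text.substring(e);
  // Pad with single spaces so dictated words do not fuse with neighbors.
  final lead = before.isNotEmpty && !before.endsWith(' ') ? ' ' : '';
  final trail = after.isNotEmpty && !after.startsWith(' ') ? ' ' : '';
  return (
    text: '$before$lead$transcript$trail$after',
    cursor: before.length + lead.length + transcript.length,
  );
}
```

In a Flutter widget, the returned text and cursor would be written back through a TextEditingValue with a collapsed selection, leaving the content editable before save.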

Components (6)

User Stories (8)

Use Speech Input Across All Opted-In Text Fields

medium 5 pts

As a Peer Mentor (Likeperson)

I want speech dictation to be available in any text field across the app that has opted in to the Speech Input Widget

So that I can dictate not just the activity summary but also contact notes, event descriptions, and other free-text areas without switching to a separate tool

Acceptance Criteria

Given a text field has been wrapped with the Speech Input Widget, when the peer mentor views the field, then a microphone icon is visible adjacent to the field
Given multiple text fields on the same screen are wrapped with Speech Input Widget, when the peer mentor taps the mic on one field, then only that field's session is activated and the others remain inactive
Given a recognition session completes on a wrapped field, when the transcript is inserted, then the text is injected into the correct TextEditingController for that field
+2 more
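One way to satisfy the "only that field's session is activated" criterion is a small coordinator shared by all wrapped fields. The sketch below is plain Dart; SpeechSessionCoordinator and its methods are hypothetical, and it assumes a second tap on another field is rejected while a session is active (switching sessions instead would be an equally valid policy).

```dart
/// Hypothetical coordinator: at most one wrapped field owns the microphone.
class SpeechSessionCoordinator {
  String? _activeFieldId;

  /// The field currently recording, or null when idle.
  String? get activeFieldId => _activeFieldId;

  /// Returns true if [fieldId] acquired the session, false if another
  /// session is already active.
  bool tryStart(String fieldId) {
    if (_activeFieldId != null) return false;
    _activeFieldId = fieldId;
    return true;
  }

  /// Releases the session, but only if [fieldId] is the current owner.
  void end(String fieldId) {
    if (_activeFieldId == fieldId) _activeFieldId = null;
  }

  /// A transcript should be injected only into the owning field's controller.
  bool shouldReceive(String fieldId) => _activeFieldId == fieldId;
}
```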


Configure Speech Input Preferences

low 3 pts

As a Peer Mentor (Likeperson)

I want to configure my speech input settings, including preferred recognition language and whether speech input is globally enabled, and have these settings persist across app sessions

So that the feature works in my preferred language and I can disable it entirely if I prefer typing or if I am in an environment where speaking aloud is not practical

Acceptance Criteria

Given the peer mentor navigates to Settings, when they select Speech Input, then a settings panel is shown with locale, noise gate, and enable/disable options
Given the peer mentor selects a different recognition locale, when they start a new recording session, then the Speech Recognition Service uses the selected locale for recognition
Given the peer mentor disables speech input globally, when they view text fields that previously showed a microphone icon, then no microphone icon is displayed
+2 more
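The settings named in these criteria (locale, noise gate, enable/disable) could be modelled as a small value object serialized to JSON for persistence. This is a sketch only: SpeechInputPreferences, the default values, and the JSON key names are assumptions, and the storage mechanism (local preferences or a backend profile) is left open.

```dart
import 'dart:convert';

/// Hypothetical preferences model for speech input.
class SpeechInputPreferences {
  const SpeechInputPreferences({
    this.enabled = true,
    this.localeId = 'nb_NO', // assumed default locale identifier
    this.noiseGate = 0.1, // assumed normalized threshold in 0..1
  });

  /// Restores preferences from a JSON string, using defaults for missing keys.
  factory SpeechInputPreferences.fromJson(String source) {
    final map = jsonDecode(source) as Map<String, dynamic>;
    return SpeechInputPreferences(
      enabled: map['enabled'] as bool? ?? true,
      localeId: map['localeId'] as String? ?? 'nb_NO',
      noiseGate: (map['noiseGate'] as num?)?.toDouble() ?? 0.1,
    );
  }

  final bool enabled;
  final String localeId;
  final double noiseGate;

  /// Serializes the preferences so they survive app restarts.
  String toJson() => jsonEncode(
      {'enabled': enabled, 'localeId': localeId, 'noiseGate': noiseGate});

  /// The mic icon is shown only when the feature is enabled globally
  /// and the field has opted in.
  bool showMicIcon({required bool fieldOptedIn}) => enabled && fieldOptedIn;
}
```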


Stop or Cancel a Recording Session

high 2 pts

As a Peer Mentor (Likeperson)

I want to explicitly stop a recording when I am done speaking or cancel it without inserting text

So that I have full control over when recognition ends and can discard an unsuccessful attempt without polluting the text field

Acceptance Criteria

Given a recording session is active, when the peer mentor taps Stop, then the session finalizes, the last recognized transcript is inserted into the text field, and the recording UI closes
Given a recording session is active, when the peer mentor taps Cancel, then the session is aborted, no text is inserted or modified, and the field retains its previous content
Given a recording session is active and the peer mentor stops speaking, when silence continues for the configured timeout (e.g., 3 seconds), then the session auto-stops and inserts the recognized transcript
+2 more
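The three outcomes above (Stop, Cancel, silence timeout) can be sketched as a small session object that remembers the field's original content. The RecordingSession class and its method names are hypothetical, and appending with a single space is an assumption made to keep the example short.

```dart
/// Hypothetical session model covering Stop, Cancel, and silence timeout.
class RecordingSession {
  RecordingSession({
    required this.fieldText,
    this.silenceTimeout = const Duration(seconds: 3),
  });

  /// Field content captured when recording started.
  final String fieldText;
  final Duration silenceTimeout;

  String _transcript = '';
  bool _active = true;

  bool get isActive => _active;

  /// Keeps the most recent recognition hypothesis while recording.
  void onPartialResult(String words) {
    if (_active) _transcript = words;
  }

  /// Stop: ends the session and returns the field text with the last
  /// recognized transcript appended.
  String stop() {
    _active = false;
    if (_transcript.isEmpty) return fieldText;
    return fieldText.isEmpty ? _transcript : '$fieldText $_transcript';
  }

  /// Cancel: ends the session and returns the untouched original text.
  String cancel() {
    _active = false;
    _transcript = '';
    return fieldText;
  }

  /// Auto-stops once [elapsed] silence reaches the timeout; returns the new
  /// field text, or null if the session should keep listening.
  String? onSilence(Duration elapsed) =>
      _active && elapsed >= silenceTimeout ? stop() : null;
}
```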


Grant Microphone Permission on First Use

high 3 pts

As a Peer Mentor (Likeperson)

I want the app to request microphone permission the first time I try to use speech input, with a clear explanation of why it is needed

So that I understand what I am consenting to and can grant permission confidently without feeling surveilled, and so the feature activates without friction on subsequent uses

Acceptance Criteria

Given the peer mentor has never used speech input, when they tap the microphone icon for the first time, then a rationale dialog appears before the system permission prompt explaining the dictation-only use case
Given the rationale dialog is shown, when the peer mentor dismisses it and the system prompt appears, then they can grant or deny microphone access
Given the peer mentor grants microphone permission, when they tap the mic icon in the same or future sessions, then recording begins immediately without repeating the permission flow
+2 more
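The permission flow can be reduced to a pure decision function, which keeps it easy to test apart from the plugin. The enums and function below are hypothetical and independent of permission_handler's own types; treating a plain denial like a first request, and a permanent denial as a prompt to open device Settings, is an assumption since the remaining criteria are not shown here.

```dart
/// Hypothetical app-level view of the microphone permission state.
enum MicPermission { notRequested, granted, denied, permanentlyDenied }

/// What the UI should do when the mic icon is tapped.
enum MicTapAction { showRationaleThenRequest, startRecording, showSettingsLink }

/// Decides the next step for a mic tap based on the current permission state.
MicTapAction actionForTap(MicPermission status) => switch (status) {
      // Already granted: start immediately, no repeated permission flow.
      MicPermission.granted => MicTapAction.startRecording,
      // First use (or a soft denial): explain the dictation-only use case
      // before the system prompt appears.
      MicPermission.notRequested ||
      MicPermission.denied =>
        MicTapAction.showRationaleThenRequest,
      // The system will no longer prompt; point the user to device Settings.
      MicPermission.permanentlyDenied => MicTapAction.showSettingsLink,
    };
```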


See Animated Waveform Feedback During Recording

high 3 pts

As a Peer Mentor (Likeperson)

I want to see an animated waveform and a clear recording indicator while I am speaking

So that I know the app is actively listening, can tell when my speech is being detected versus silence, and can judge when to stop speaking

Acceptance Criteria

Given a recording session is active, when the peer mentor is speaking, then the waveform animates with amplitude proportional to input volume
Given a recording session is active, when the environment is quiet or the mentor is not speaking, then the waveform animation reduces or pauses indicating silence detection
Given the recording indicator is displayed, when a screen reader is active, then the widget announces 'Recording in progress' or equivalent accessible label
+2 more
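For the amplitude and silence criteria, a raw sound level from the recognizer can be normalized to a 0..1 bar height with a noise gate. This is a sketch: waveformAmplitude is a hypothetical name, and the default minLevel, maxLevel, and noiseGate values are placeholders, since the range of reported levels differs between platforms and would need calibrating on real devices.

```dart
/// Maps a raw sound [level] to a waveform amplitude in 0..1.
/// Values below [noiseGate] (after normalization) are treated as silence.
double waveformAmplitude(
  double level, {
  double minLevel = -2.0, // placeholder lower bound of reported levels
  double maxLevel = 10.0, // placeholder upper bound of reported levels
  double noiseGate = 0.1,
}) {
  // Linear normalization, clamped so out-of-range readings stay in 0..1.
  final normalized =
      ((level - minLevel) / (maxLevel - minLevel)).clamp(0.0, 1.0).toDouble();
  return normalized < noiseGate ? 0.0 : normalized;
}
```

Returning exactly 0.0 under the gate gives the animation an unambiguous signal to pause, which matches the silence-detection behavior described above.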


Review and Edit Recognized Transcript Before Saving

high 3 pts

As a Peer Mentor (Likeperson)

I want the recognized text to be inserted into the text field as editable content so I can correct any recognition errors before saving

So that I can submit accurate activity notes even when the speech engine mishears a word, maintaining the quality of documentation without needing to retype everything from scratch

Acceptance Criteria

Given a recording session completes successfully, when the transcript is inserted, then the text field displays the recognized text as editable content with the cursor at the end
Given the transcript has been inserted, when the peer mentor taps any word in the field, then the cursor moves to that position for correction
Given the peer mentor edits the transcript, when they submit the activity form, then the edited (not original) transcript is saved as the activity summary
+2 more


Receive Accessible Error Messages for Recognition Failures

high 5 pts

As a Peer Mentor (Likeperson)

I want to receive clear, accessible error messages when speech recognition fails

So that I understand what went wrong and know how to recover, whether that means retrying, adjusting my environment, or falling back to typing

Acceptance Criteria

Given the peer mentor taps the mic but does not speak within the timeout window, when the session expires, then a 'No speech detected' message is shown with a retry prompt
Given microphone permission was revoked in device Settings between sessions, when the peer mentor taps the mic, then a 'Microphone access required' message is shown with a link to device Settings
Given the native speech engine fails to initialize (e.g., locale not supported), when recording is attempted, then an accessible error message is shown and the field remains unchanged
+2 more
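The failure cases above can be mapped to user-facing messages in one place so that every wrapped field presents them consistently. The types below are hypothetical; the first two message strings come from the criteria, while the engine-unavailable wording and the action labels are assumptions. In the app these strings would go through localization and be exposed to screen readers via semantic labels.

```dart
/// Hypothetical app-level failure categories.
enum RecognitionFailure { noSpeech, permissionRevoked, engineUnavailable }

/// What the error UI should show for a failure.
class ErrorPresentation {
  const ErrorPresentation(this.message, this.actionLabel,
      {this.opensDeviceSettings = false});

  final String message;
  final String actionLabel;
  final bool opensDeviceSettings;
}

/// Maps a failure to its message and recovery action.
ErrorPresentation presentFailure(RecognitionFailure failure) =>
    switch (failure) {
      RecognitionFailure.noSpeech =>
        const ErrorPresentation('No speech detected', 'Try again'),
      RecognitionFailure.permissionRevoked => const ErrorPresentation(
          'Microphone access required', 'Open Settings',
          opensDeviceSettings: true),
      // Assumed wording; the field content is left unchanged in this case.
      RecognitionFailure.engineUnavailable => const ErrorPresentation(
          'Speech recognition is not available', 'Type instead'),
    };
```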


Dictate Activity Summary Using Speech

critical 8 pts

As a Peer Mentor (Likeperson)

I want to tap a microphone icon next to the activity summary field and speak my notes aloud

So that I can complete the free-text summary after a peer support visit without typing on a small phone screen, reducing effort and increasing the likelihood I will submit a complete report

Acceptance Criteria

Given the peer mentor is on the activity summary step of the wizard, when they tap the microphone icon, then the Voice Recording Control activates and a recording session begins with clear visual indication
Given a recording session is active, when the peer mentor speaks aloud, then the Speech Recognition Service processes audio via the native engine and produces a recognized transcript
Given recognition produces a transcript, when the session ends, then the full transcript is inserted into the activity summary text field replacing any placeholder text
+2 more



Components (6)

User Interface (2)

Service Layer (2)

Data Layer (1)

Infrastructure (1)
