HIGH story-duplicate-activity-detection-coordinator-004 5 pts
Story Points: 5
Priority: High
Feature: Duplicate Activity Detection

User Story

As a Coordinator
I want each individual activity record in a bulk registration batch to be checked for duplicates before the batch is committed to the database
So that bulk or proxy registrations, which carry a higher risk of overlap with activities already submitted by peer mentors, do not silently introduce duplicate records into the system

Acceptance Criteria

  • Given a coordinator submits a bulk registration batch, when the batch is processed, then each record is individually evaluated by the Duplicate Detection Service before any record is committed
  • Given a bulk batch contains one or more records with confidence scores above the duplicate threshold, when detection completes, then the coordinator is shown the flagged records with details of each conflicting existing activity
  • Given flagged records are shown, when the coordinator reviews them, then the coordinator can selectively exclude specific flagged records from the batch while allowing the rest to proceed
  • Given flagged records are shown, when the coordinator chooses to override for a specific record, then that record is saved with an audit marker indicating coordinator-confirmed override in a bulk context
  • Given a bulk batch where no records exceed the duplicate threshold, when detection completes, then the entire batch is committed without interruption
  • Given a bulk batch of N records, when duplicate checking runs, then all N records are checked before the first record is saved — partial commits do not occur before all checks complete
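
Implementation sketch

The criteria above can be read as a two-phase flow: check every record first, then commit only what the coordinator has resolved. The following is a minimal sketch under stated assumptions, not the actual service contract. The names check_batch, commit_batch, Decision, the audit_marker field, and the 0.8 threshold are all hypothetical; the Duplicate Detection Service is represented by a detect callable that returns a confidence score and the conflicting activity, if any.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical value; the story does not specify the actual threshold.
DUPLICATE_THRESHOLD = 0.8


@dataclass
class Decision:
    record: dict
    score: float               # confidence that the record is a duplicate
    conflict: Optional[dict]   # conflicting existing activity, if any

    @property
    def flagged(self) -> bool:
        return self.score > DUPLICATE_THRESHOLD


def check_batch(batch: list, detect: Callable) -> list:
    """Phase 1: evaluate all N records. Nothing is saved here."""
    decisions = []
    for record in batch:
        score, conflict = detect(record)  # stand-in for the detection service
        decisions.append(Decision(record, score, conflict))
    return decisions


def commit_batch(decisions: list, save: Callable,
                 excluded=frozenset(), overridden=frozenset()) -> list:
    """Phase 2: commit after review.

    excluded and overridden hold indexes into decisions, chosen by the
    coordinator. Every flagged record must be in one of the two sets.
    """
    unresolved = [i for i, d in enumerate(decisions)
                  if d.flagged and i not in excluded and i not in overridden]
    if unresolved:
        raise ValueError(f"flagged records need review: {unresolved}")

    rows = []
    for i, d in enumerate(decisions):
        if i in excluded:
            continue  # coordinator dropped this record from the batch
        row = dict(d.record)
        if d.flagged and i in overridden:
            # Assumed field name for the coordinator-confirmed override marker.
            row["audit_marker"] = "coordinator_override_bulk"
        rows.append(row)

    save(rows)  # single call, so there is no partial commit in this sketch
    return rows
```

In this sketch a batch with no flagged records passes straight through commit_batch with the default empty sets, which matches the "committed without interruption" criterion. A real implementation would likely wrap the save in a database transaction.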

Business Value

Bulk and proxy registration workflows are specifically identified as high-risk for duplication because coordinators registering on behalf of peer mentors may be unaware of what the peer mentor has already submitted directly. A per-record check in bulk flows prevents silent data corruption at scale, where the impact on Bufdir statistics and grant calculations would be proportionally larger than that of a single duplicate.