Development BRD — Visual Asset Similarity Search Utility

Relationship to research BRD: Visual_Asset_Similarity_Search_BRD.md defines the broad problem space and research plan. This document narrows the scope to a concrete, buildable utility and provides enough specification for a developer to implement it.

1) Purpose & Scope

What this utility does

A web-based tool that lets a user upload an image and find the most visually similar images from a pre-indexed repository. Similarity is decomposed into four independent dimensions — Color, Shape, Composition, and Style — that the user can select in any combination to control the search.

What this document covers

The end-to-end user workflow (upload → configure → search → review)
Functional requirements for the search interface, results display, and indexing pipeline
Stack-agnostic system architecture and API contracts
Performance and scale targets

What this document does NOT cover

See Section 7 — Out of Scope.

2) User Workflow

┌──────────────────────────────────────────────────────────────┐
│  1. Upload Image                                             │
│     User drags, pastes, or file-picks a single image.        │
│                                                              │
│  2. Select Similarity Dimensions  (optional)                 │
│     Multi-select dropdown: Color | Shape | Composition |     │
│     Style.  Default = all four selected.                     │
│                                                              │
│  3. Click "Search"                                           │
│     System generates query embedding(s) for the selected     │
│     dimensions, queries the vector index, and computes       │
│     per-dimension + overall scores.                          │
│                                                              │
│  4. Review Results                                           │
│     Grid of matching assets with thumbnail previews.         │
│     When >1 dimension is selected, a sortable table view     │
│     shows individual dimension scores and an overall score.  │
│     User can click any result to see a larger preview        │
│     and metadata.                                            │
└──────────────────────────────────────────────────────────────┘

Step-by-step detail

Step	User action	System behavior
1 — Upload	Drag-drop, paste, or file-pick an image (PNG, JPG/JPEG, WEBP, SVG, or PDF; only the first page of a PDF is used, rendered to raster).	Validate file type and size (max 20 MB). Display a thumbnail preview of the uploaded image.
2 — Configure	Optionally open the similarity dropdown and select/deselect dimensions (Color, Shape, Composition, Style).	Update the UI to reflect the active dimensions. Default state: all four selected.
3 — Search	Click the "Search" button.	Generate embedding(s) for the query image across the selected dimensions. Execute an ANN query against the index for each selected dimension. Compute per-dimension similarity scores and an overall (combined) score. Return ranked results.
4 — Results	Browse the results grid. Sort by any score column. Click a result for detail view.	Display results as a grid of thumbnail previews. Show sortable score columns when multiple dimensions are active. Provide a detail panel with full-size preview and metadata.

3) Functional Requirements

3.1 Search Interface

ID	Requirement	Details
SI-1	Image upload	Accept a single image via drag-drop, clipboard paste, or file picker. Supported formats: PNG, JPG/JPEG, WEBP, SVG, PDF (first page). Max file size: 20 MB.
SI-2	Upload preview	Display a thumbnail of the uploaded image before search is executed.
SI-3	Similarity dimension selector	Multi-select dropdown with options: Color, Shape, Composition, Style. Default: all four selected. At least one must be selected to search.
SI-4	Search action	A "Search" button that submits the query. Disabled until an image is uploaded and at least one dimension is selected.
SI-5	Loading state	Show a progress indicator while the search is in progress.
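The checks implied by SI-1 and SI-4 can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function and constant names are hypothetical, and it assumes "20 MB" means 20 × 1024 × 1024 bytes and that the file type is judged by extension only.

```python
from pathlib import Path

# Formats and size limit come from SI-1; all names here are illustrative.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".svg", ".pdf"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is read as 20 MiB


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return validation errors; an empty list means the upload is acceptable."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        errors.append("File is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File exceeds the 20 MB limit")
    return errors


def can_search(image_uploaded: bool, selected_dimensions: set[str]) -> bool:
    """SI-4: Search is enabled only with an image and at least one dimension."""
    return image_uploaded and len(selected_dimensions) >= 1
```

A production check would likely also inspect the file's content (magic bytes) rather than trusting the extension.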

3.2 Similarity Dimensions

Each dimension captures a distinct aspect of visual similarity. The implementation must produce an independent score for each dimension: a float in [0.0, 1.0], where 1.0 = maximum similarity on that dimension (see Score semantics in Section 5.1).

Dimension	Definition	Examples of what it captures
Color	Similarity of color palette, distribution, and dominant hues.	Two icons that both use the same blue-and-white palette score high even if their shapes differ.
Shape	Similarity of geometric forms, contours, and silhouettes.	A circle-based logo and another circle-based logo score high regardless of color.
Composition	Similarity of spatial layout — how elements are arranged within the frame.	Two images with a centered subject over a bottom bar score high even if the subjects differ.
Style	Similarity of visual treatment — line weight, shading, gradients, flat vs. skeuomorphic, corner radii, texture.	Two flat-design icons with thin strokes score high; a flat icon vs. a 3D-rendered icon scores low.
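This document does not specify how a per-dimension score is derived from two embeddings. One possible approach, shown here only as a sketch, is to take the cosine similarity of the two vectors and rescale it from [-1, 1] to [0, 1]. The choice of cosine similarity, the linear rescaling, and the function names are all assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dimension_score(query: list[float], candidate: list[float]) -> float:
    """Rescale cosine similarity from [-1, 1] to a score in [0, 1]."""
    score = (cosine_similarity(query, candidate) + 1.0) / 2.0
    # Clamp to guard against floating-point drift just outside the range.
    return min(1.0, max(0.0, score))
```

Under this mapping, orthogonal embeddings score 0.5 rather than 0, so any minimum-similarity cutoff would need to be chosen with the mapping in mind.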

3.3 Results Display

ID	Requirement	Details
RD-1	Results grid	Display matching assets as a grid of thumbnail previews, ordered by overall similarity score (descending).
RD-2	Score columns	When more than one similarity dimension is selected, display a sortable table/column for each active dimension's score plus an Overall score column. Clicking a column header sorts results by that score.
RD-3	Overall score	Computed as the mean of the active dimension scores (equal weighting).
RD-4	Result count	Display the total number of results. Return a maximum of 100 results per query (default 50; see max_results in Section 5.1).
RD-5	Detail view	Clicking a result opens a detail panel showing: full-size preview, all dimension scores, and available metadata (filename, file path, file type, image dimensions, indexed timestamp).
RD-6	No-results state	If no results meet the minimum similarity threshold (a configurable cutoff; this document does not fix its value), display a clear "No similar assets found" message.

3.4 Indexing Pipeline

ID	Requirement	Details
IX-1	Batch folder scan	Accept a root directory path. Recursively scan for image files (PNG, JPG/JPEG, WEBP, SVG, PDF).
IX-2	Embedding generation	For each discovered image, generate embeddings for all four similarity dimensions and store them in the vector index along with file metadata.
IX-3	Idempotent re-indexing	If an image has already been indexed and its content has not changed (e.g., same file hash), skip re-processing. If content has changed, update the index entry.
IX-4	Scale target	Index 10,000 images in under 4 hours on commodity hardware (reference configuration in Section 6). The pipeline should support parallelism/batching to meet this target.
IX-5	Progress reporting	Expose progress metrics during indexing: total files found, files processed, files skipped, errors encountered.
IX-6	Error handling	Log and skip unreadable/corrupt files without halting the batch. Produce a summary report at completion listing any failures.
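The folder scan in IX-1 could look roughly like the sketch below. The function name and the sorted traversal order are illustrative choices, and eligibility is decided by file extension only, which is an assumption.

```python
from pathlib import Path
from typing import Iterator

# Extensions from IX-1; matching is case-insensitive.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".svg", ".pdf"}


def scan_images(root: str, recursive: bool = True) -> Iterator[Path]:
    """Yield eligible image files under root in a stable, sorted order."""
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            yield path
```

Sorting makes repeated runs deterministic, which helps when comparing progress reports, at the cost of listing the whole tree before the first file is yielded.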

4) System Architecture (Stack-Agnostic)

The system is composed of five logical components. This section defines their responsibilities and interfaces without prescribing specific technologies.

┌─────────────┐       ┌─────────────────┐       ┌────────────────────┐
│  Frontend    │──────▶│  Backend API    │──────▶│  Embedding Engine  │
│  (Web UI)    │◀──────│                 │◀──────│                    │
└─────────────┘       └────────┬────────┘       └────────────────────┘
                               │
                               ▼
                      ┌─────────────────┐
                      │  Vector Index /  │
                      │  Storage         │
                      └─────────────────┘
                               ▲
                               │
                      ┌─────────────────┐
                      │  Indexing        │
                      │  Pipeline        │
                      └─────────────────┘

4.1 Frontend (Web UI)

Responsibility: Provide the browser-based interface for uploading images, configuring search parameters, and viewing results.

Single-page application served over HTTPS.
Communicates with the Backend API over REST (or equivalent).
Handles image upload, dimension selection, result rendering, and sorting.
No business logic — all scoring and retrieval logic lives server-side.

4.2 Backend API

Responsibility: Orchestrate search and indexing requests. Acts as the single entry point for the frontend.

Receives search requests (image + selected dimensions).
Delegates embedding generation to the Embedding Engine.
Queries the Vector Index for nearest neighbors per dimension.
Computes per-dimension and overall similarity scores.
Returns ranked, scored results to the frontend.
Exposes indexing endpoints (trigger scan, check status).
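The search path described above can be sketched as one function. The BRD does not say how candidates from separate per-dimension queries are merged, so this sketch assumes the simplest option: take the union of each dimension's nearest neighbors, then score every candidate on every active dimension. The callables `ann_query` and `score` are hypothetical stand-ins for the Vector Index and the scoring logic.

```python
from typing import Callable


def search(
    query_vectors: dict[str, list[float]],
    ann_query: Callable[[str, list[float], int], list[str]],
    score: Callable[[str, list[float], str], float],
    max_results: int = 50,
) -> list[dict]:
    """Merge per-dimension neighbors, score them on all active dimensions, and rank."""
    # 1. One ANN query per active dimension; collect the union of image IDs.
    candidates: set[str] = set()
    for dimension, vector in query_vectors.items():
        candidates.update(ann_query(dimension, vector, max_results))

    # 2. Score each candidate on every active dimension, including dimensions
    #    whose ANN query did not return it.
    results = []
    for image_id in candidates:
        scores = {d: score(d, v, image_id) for d, v in query_vectors.items()}
        overall = sum(scores.values()) / len(scores)  # equal-weight mean (RD-3)
        results.append({"image_id": image_id, "scores": scores, "overall_score": overall})

    # 3. Rank by overall score, breaking ties by image ID for stable output.
    results.sort(key=lambda r: (-r["overall_score"], r["image_id"]))
    return results[:max_results]
```

A candidate that ranks high on one dimension but low on another is still scored on both, so the overall score is always the mean over the same set of dimensions.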

4.3 Embedding Engine

Responsibility: Convert an image into one or more embedding vectors, one per similarity dimension.

Accepts a raw image (or pre-processed tensor) and a list of requested dimensions.
Returns a map of { dimension → embedding vector }.
May use a single model with dimension-specific projections, or separate models per dimension — this is an implementation decision.
Must be callable both from the search path (single image, low latency) and the indexing path (batch, high throughput).
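As a sketch of the interface described above, the engine can be expressed as a Python protocol with a single-image method for the search path and a batch method for the indexing path. The method names are assumptions, and `PlaceholderEngine` is a stand-in that hashes the image bytes so that callers can be tested without a model; it produces no meaningful visual features.

```python
import hashlib
from typing import Protocol

DIMENSIONS = ("color", "shape", "composition", "style")


class EmbeddingEngine(Protocol):
    """Hypothetical interface: maps an image to one vector per requested dimension."""

    def embed(self, image: bytes, dimensions: list[str]) -> dict[str, list[float]]: ...

    def embed_batch(
        self, images: list[bytes], dimensions: list[str]
    ) -> list[dict[str, list[float]]]: ...


class PlaceholderEngine:
    """Deterministic stand-in for tests. NOT a real embedding model."""

    def embed(self, image: bytes, dimensions: list[str]) -> dict[str, list[float]]:
        unknown = set(dimensions) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
        result = {}
        for dimension in dimensions:
            digest = hashlib.sha256(dimension.encode() + image).digest()
            # Use the first 8 digest bytes as an 8-element vector in [0, 1].
            result[dimension] = [byte / 255 for byte in digest[:8]]
        return result

    def embed_batch(
        self, images: list[bytes], dimensions: list[str]
    ) -> list[dict[str, list[float]]]:
        return [self.embed(image, dimensions) for image in images]
```

A real engine would replace the hashing with model inference and would typically batch images through the model in `embed_batch` rather than looping.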

4.4 Vector Index / Storage

Responsibility: Persist embedding vectors and support fast approximate nearest-neighbor (ANN) retrieval.

Stores one embedding vector per dimension per image (i.e., up to 4 vectors per indexed image).
Supports ANN queries filtered by dimension.
Stores associated metadata alongside each indexed image (filename, path, file type, image dimensions, file hash, indexed timestamp).
Supports insert, update (by image ID), and delete operations.
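To make the storage contract concrete, the sketch below implements the same operations with an exact, brute-force search held in memory. It is a test double under stated assumptions (cosine similarity as the metric, hypothetical class and method names), not an ANN index, and would not meet the latency target at 100,000 images.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class InMemoryVectorIndex:
    """Exact-search stand-in for the Vector Index; one vector per dimension per image."""

    vectors: dict[str, dict[str, list[float]]] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=dict)

    def upsert(self, image_id: str, embeddings: dict[str, list[float]], meta: dict) -> None:
        """Insert a new image or replace an existing entry by image ID."""
        self.vectors[image_id] = dict(embeddings)
        self.metadata[image_id] = dict(meta)

    def delete(self, image_id: str) -> None:
        self.vectors.pop(image_id, None)
        self.metadata.pop(image_id, None)

    def query(self, dimension: str, vector: list[float], k: int) -> list[tuple[str, float]]:
        """Return up to k (image_id, similarity) pairs for one dimension, best first."""
        scored = [
            (image_id, _cosine(vector, by_dim[dimension]))
            for image_id, by_dim in self.vectors.items()
            if dimension in by_dim
        ]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]
```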

4.5 Indexing Pipeline

Responsibility: Scan a directory of images, generate embeddings, and populate the vector index.

Runs as a batch/background process triggered via the Backend API or CLI.
Walks the target directory recursively.
For each eligible image: compute file hash → check for existing index entry → generate embeddings if new/changed → upsert into vector index.
Reports progress and errors.
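The per-image steps above (hash, check, embed, upsert) can be sketched as a loop that also collects the counters from IX-5 and skips failures per IX-6. Several details are assumptions made for illustration: SHA-256 as the file hash, the file path as the image ID, and the `embed` and `upsert` callables standing in for the Embedding Engine and Vector Index.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class IndexingProgress:
    total_files_found: int = 0
    files_processed: int = 0
    files_skipped: int = 0
    error_details: list[dict] = field(default_factory=list)

    @property
    def errors(self) -> int:
        return len(self.error_details)


def file_sha256(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large images are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_files(
    paths: Iterable[Path],
    known_hashes: dict[str, str],
    embed: Callable[[Path], dict[str, list[float]]],
    upsert: Callable[[str, str, dict[str, list[float]]], None],
) -> IndexingProgress:
    """Index each file once; unchanged files are skipped, failures are logged and skipped."""
    progress = IndexingProgress()
    for path in paths:
        progress.total_files_found += 1
        image_id = str(path)  # assumption: the path doubles as the image ID
        try:
            digest = file_sha256(path)
            if known_hashes.get(image_id) == digest:
                progress.files_skipped += 1
                continue
            upsert(image_id, digest, embed(path))
            known_hashes[image_id] = digest
            progress.files_processed += 1
        except Exception as exc:  # IX-6: never halt the batch on one bad file
            progress.error_details.append({"file_path": image_id, "error": str(exc)})
    return progress
```

The loop is sequential for clarity; meeting the IX-4 throughput target would likely require batching the `embed` calls or running several workers.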

5) API Contracts

Stack-agnostic interface definitions. Implementations may use REST, gRPC, or any protocol that satisfies these contracts.

5.1 Search

Request: Search

{
  "image": "<binary image data or base64-encoded string>",
  "dimensions": ["color", "shape", "composition", "style"],  // 1–4 selected
  "max_results": 50           // optional, default 50, max 100
}
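A server-side check of the Search request could be sketched as below. The names `SearchRequest` and `parse_search_request` are hypothetical, string images are assumed to be base64, and the lower bound of 1 on max_results is an assumption; the allowed dimensions, the default of 50, and the cap of 100 come from the contract above.

```python
import base64
from dataclasses import dataclass

ALLOWED_DIMENSIONS = ("color", "shape", "composition", "style")
DEFAULT_MAX_RESULTS = 50
MAX_RESULTS_LIMIT = 100


@dataclass(frozen=True)
class SearchRequest:
    image: bytes
    dimensions: tuple[str, ...]
    max_results: int = DEFAULT_MAX_RESULTS


def parse_search_request(payload: dict) -> SearchRequest:
    """Validate a decoded request body; raise ValueError on any contract violation."""
    image = payload.get("image")
    if isinstance(image, str):
        # binascii.Error raised on bad input is a subclass of ValueError.
        image = base64.b64decode(image, validate=True)
    if not isinstance(image, (bytes, bytearray)) or len(image) == 0:
        raise ValueError("image is required")

    # De-duplicate while preserving the caller's order.
    dimensions = tuple(dict.fromkeys(payload.get("dimensions") or ()))
    if not dimensions:
        raise ValueError("at least one dimension must be selected")
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    max_results = payload.get("max_results", DEFAULT_MAX_RESULTS)
    if not isinstance(max_results, int) or not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}")
    return SearchRequest(bytes(image), dimensions, max_results)
```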

Response: SearchResults

{
  "query_image_preview": "<url or data URI of uploaded image thumbnail>",
  "dimensions_searched": ["color", "shape", "composition", "style"],
  "total_results": 42,
  "results": [
    {
      "image_id": "abc-123",
      "filename": "logo_v2.png",
      "file_path": "/assets/logos/logo_v2.png",
      "file_type": "png",
      "image_width": 512,
      "image_height": 512,
      "thumbnail_url": "<url or data URI>",
      "scores": {
        "color": 0.92,
        "shape": 0.78,
        "composition": 0.85,
        "style": 0.64
      },
      "overall_score": 0.7975,
      "indexed_at": "2026-01-15T10:30:00Z"
    }
    // ... more results
  ]
}

Score semantics:

Each dimension score is a float in [0.0, 1.0] where 1.0 = maximum similarity.
overall_score = arithmetic mean of the active dimension scores.
Results are ordered by overall_score descending by default.
Only dimensions included in the request appear in the scores map.
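The score semantics above reduce to a small amount of arithmetic, sketched here with hypothetical function names. The `sort_by` parameter illustrates the sortable score columns from RD-2 and accepts either "overall_score" or the name of an active dimension.

```python
def overall_score(scores: dict[str, float]) -> float:
    """Arithmetic mean of the active dimension scores (equal weighting)."""
    if not scores:
        raise ValueError("at least one dimension score is required")
    return sum(scores.values()) / len(scores)


def rank_results(results: list[dict], sort_by: str = "overall_score") -> list[dict]:
    """Attach overall_score to each result and sort descending by the chosen column."""
    ranked = [dict(r, overall_score=overall_score(r["scores"])) for r in results]

    def sort_key(result: dict) -> float:
        if sort_by == "overall_score":
            return result["overall_score"]
        return result["scores"][sort_by]

    return sorted(ranked, key=sort_key, reverse=True)
```

For the sample response above, the mean of 0.92, 0.78, 0.85, and 0.64 is 3.19 / 4 = 0.7975, matching the overall_score shown.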

5.2 Indexing — Trigger

Request: StartIndexing

{
  "directory_path": "/path/to/asset/folder",
  "recursive": true             // optional, default true
}

Response: IndexingJob

{
  "job_id": "job-456",
  "status": "running",
  "directory_path": "/path/to/asset/folder",
  "started_at": "2026-02-10T14:00:00Z"
}

5.3 Indexing — Status

Request: GetIndexingStatus

{
  "job_id": "job-456"
}

Response: IndexingStatus

{
  "job_id": "job-456",
  "status": "running",            // one of "running", "completed", "failed"
  "total_files_found": 10000,
  "files_processed": 6500,
  "files_skipped": 120,
  "errors": 3,
  "started_at": "2026-02-10T14:00:00Z",
  "completed_at": null,           // null while running
  "error_details": [
    {
      "file_path": "/path/to/corrupt.png",
      "error": "Unable to decode image"
    }
    // ... more entries, one per failed file
  ]
}
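A client of the status contract might poll until the job reaches a terminal state and derive a progress fraction from the counters. This is a sketch under two assumptions: `get_status` is a hypothetical callable wrapping the GetIndexingStatus request, and the processed, skipped, and error counts are disjoint, so their sum is the number of files handled so far.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def progress_fraction(status: dict) -> float:
    """Share of discovered files already handled (processed, skipped, or failed)."""
    total = status["total_files_found"]
    if total == 0:
        return 0.0
    handled = status["files_processed"] + status["files_skipped"] + status["errors"]
    return min(1.0, handled / total)


def wait_for_job(
    get_status: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, then return the final status."""
    while True:
        status = get_status(job_id)
        if status["status"] in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
```

With the sample response above, the fraction is (6500 + 120 + 3) / 10000 = 0.6623. The `sleep` parameter is injectable so the loop can be tested without waiting.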

6) Non-Functional Requirements

Category	Requirement	Target
Search latency	Time from "Search" click to first results rendered.	< 3 seconds for a 10K-image index.
Indexing throughput	Time to index a folder of images.	10,000 images in < 4 hours on commodity hardware (4-core CPU, 16 GB RAM, SSD).
Concurrent users	Number of simultaneous search users supported.	At least 5 concurrent searches while still meeting the search latency target above.
Image size handling	Maximum input image dimensions and file size.	Up to 20 MB file size; images larger than 2048px on the longest edge are resized before embedding.
Index size	Number of images the index supports without architectural changes.	Up to 100,000 images.
Availability	Uptime target for the search service.	Best-effort (internal tool); graceful error messaging when service is unavailable.
Browser support	Supported browsers for the frontend.	Latest versions of Chrome, Firefox, Edge, and Safari.
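Two of the targets above translate into simple arithmetic, sketched here with hypothetical function names. The resize helper assumes oversized images are scaled down so the longest edge becomes exactly 2048 px with the aspect ratio preserved; the table states only that they are resized.

```python
def fit_longest_edge(width: int, height: int, max_edge: int = 2048) -> tuple[int, int]:
    """Return target dimensions so the longest edge is at most max_edge pixels."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # small images are left untouched
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def seconds_per_image_budget(images: int = 10_000, hours: float = 4.0) -> float:
    """Average wall-clock time available per image to meet the indexing target."""
    return hours * 3600 / images
```

For example, a 4096 × 2048 image would be reduced to 2048 × 1024, and the indexing target leaves an average of 1.44 seconds per image, covering decoding, hashing, four embeddings, and the index write.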

7) Out of Scope

The following are explicitly deferred and are not part of this development phase. They may be addressed in future iterations (see the research BRD's Phase 2/3).

Item	Why deferred
Access control / ACL-filtered results	Adds significant complexity; not needed for an internal prototype.
User authentication	Same as above; prototype operates as a single-user or trusted-network tool.
Region/crop-based search	Valuable but adds UX and algorithmic complexity; better suited for Phase 2.
"More like this" refinement	Requires session state and re-query logic; deferred to Phase 2.
Metadata / keyword search	This BRD covers image-only similarity; hybrid search is a Phase 2 concern.
Continuous / real-time indexing	Batch indexing is sufficient for the initial version.
Feedback loop ("good match / bad match")	Requires a data collection pipeline and model retraining strategy; Phase 3.
Usage analytics and monitoring dashboards	Phase 3 concern.
Multi-user collaboration features	Not needed for prototype.
Deployment automation (CI/CD, containerization)	Operational concern, not a functional requirement for the prototype.

8) Glossary

Term	Definition
ANN	Approximate Nearest Neighbor: a family of search algorithms that find vectors close to a query vector in sub-linear time by accepting results that are approximately, rather than provably, the nearest.
Embedding	A fixed-length numeric vector that represents an image's visual properties in a high-dimensional space.
Dimension (similarity)	One of the four independent axes of visual similarity defined in this BRD: Color, Shape, Composition, Style.
Overall score	The arithmetic mean of the active similarity dimension scores for a query–result pair.
Vector index	A data structure optimized for storing and querying embedding vectors via ANN search.