Development BRD — Visual Asset Similarity Search Utility

Relationship to research BRD: Visual_Asset_Similarity_Search_BRD.md defines the broad problem space and research plan. This document narrows the scope to a concrete, buildable utility and provides enough specification for a developer to implement it.


1) Purpose & Scope

What this utility does

A web-based tool that lets a user upload an image and find the most visually similar images from a pre-indexed repository. Similarity is decomposed into four independent dimensions — Color, Shape, Composition, and Style — that the user can select in any combination to control the search.

What this document covers

What this document does NOT cover

See Section 7 — Out of Scope.


2) User Workflow

┌──────────────────────────────────────────────────────────────┐
│  1. Upload Image                                             │
│     User drags, pastes, or file-picks a single image.        │
│                                                              │
│  2. Select Similarity Dimensions  (optional)                 │
│     Multi-select dropdown: Color | Shape | Composition |     │
│     Style.  Default = all four selected.                     │
│                                                              │
│  3. Click "Search"                                           │
│     System generates query embedding(s) for the selected     │
│     dimensions, queries the vector index, and computes       │
│     per-dimension + overall scores.                          │
│                                                              │
│  4. Review Results                                           │
│     Grid of matching assets with thumbnail previews.         │
│     When >1 dimension is selected, a sortable table view     │
│     shows individual dimension scores and an overall score.  │
│     User can click any result to see a larger preview        │
│     and metadata.                                            │
└──────────────────────────────────────────────────────────────┘

Step-by-step detail

| Step | User action | System behavior |
| --- | --- | --- |
| 1: Upload | Drag-drop, paste, or file-pick an image (PNG, JPG, WEBP, SVG, or PDF rendered to raster). | Validate file type and size. Display a thumbnail preview of the uploaded image. |
| 2: Configure | Optionally open the similarity dropdown and select/deselect dimensions (Color, Shape, Composition, Style). | Update the UI to reflect the active dimensions. Default state: all four selected. |
| 3: Search | Click the "Search" button. | Generate embedding(s) for the query image across the selected dimensions. Execute ANN query against the index. Compute per-dimension similarity scores and an overall (combined) score. Return ranked results. |
| 4: Results | Browse the results grid. Sort by any score column. Click a result for detail view. | Display results as a grid of thumbnail previews. Show sortable score columns when multiple dimensions are active. Provide a detail panel with full-size preview and metadata. |

3) Functional Requirements

3.1 Search Interface

| ID | Requirement | Details |
| --- | --- | --- |
| SI-1 | Image upload | Accept a single image via drag-drop, clipboard paste, or file picker. Supported formats: PNG, JPG/JPEG, WEBP, SVG, PDF (first page). Max file size: 20 MB. |
| SI-2 | Upload preview | Display a thumbnail of the uploaded image before search is executed. |
| SI-3 | Similarity dimension selector | Multi-select dropdown with options: Color, Shape, Composition, Style. Default: all four selected. At least one must be selected to search. |
| SI-4 | Search action | A "Search" button that submits the query. Disabled until an image is uploaded and at least one dimension is selected. |
| SI-5 | Loading state | Show a progress indicator while the search is in progress. |
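
The checks implied by SI-1, SI-3, and SI-4 could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function and constant names are hypothetical, the extension check stands in for real content sniffing, and "20 MB" is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from SI-1; all names below are hypothetical.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".svg", ".pdf"}
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # 20 MB, assuming binary megabytes
VALID_DIMENSIONS = ("color", "shape", "composition", "style")


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 20 MB limit")
    return problems


def search_enabled(image_uploaded: bool, selected_dimensions: set[str]) -> bool:
    """SI-4: enable Search only with an uploaded image and at least one valid dimension."""
    return image_uploaded and len(selected_dimensions & set(VALID_DIMENSIONS)) >= 1
```

A production version would likely also inspect the file contents, since an extension alone does not prove the format.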

3.2 Similarity Dimensions

Each dimension captures a distinct aspect of visual similarity. The implementation must produce an independent score (0–1, where 1 = identical) for each dimension.

| Dimension | Definition | Examples of what it captures |
| --- | --- | --- |
| Color | Similarity of color palette, distribution, and dominant hues. | Two icons that both use the same blue-and-white palette score high even if their shapes differ. |
| Shape | Similarity of geometric forms, contours, and silhouettes. | A circle-based logo and another circle-based logo score high regardless of color. |
| Composition | Similarity of spatial layout: how elements are arranged within the frame. | Two images with a centered subject over a bottom bar score high even if the subjects differ. |
| Style | Similarity of visual treatment: line weight, shading, gradients, flat vs. skeuomorphic, corner radii, texture. | Two flat-design icons with thin strokes score high; a flat icon vs. a 3D-rendered icon scores low. |
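
This BRD does not say how a 0–1 score is derived from a pair of embeddings. One common option, assumed here purely for illustration, is cosine similarity with negative values clamped to 0. The function names are hypothetical, and a real embedding engine might use a different metric or normalization.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dimension_score(query_vec: list[float], candidate_vec: list[float]) -> float:
    """Map cosine similarity into the 0-1 range required by Section 3.2.

    Assumption: negative cosine values (opposed vectors) are treated as 0.
    """
    return max(0.0, min(1.0, cosine_similarity(query_vec, candidate_vec)))
```

With this mapping, identical vectors score 1 and orthogonal or opposed vectors score 0, which matches the "1 = identical" convention above.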

3.3 Results Display

| ID | Requirement | Details |
| --- | --- | --- |
| RD-1 | Results grid | Display matching assets as a grid of thumbnail previews, ordered by overall similarity score (descending). |
| RD-2 | Score columns | When more than one similarity dimension is selected, display a sortable table/column for each active dimension's score plus an Overall score column. Clicking a column header sorts results by that score. |
| RD-3 | Overall score | Computed as the mean of the active dimension scores (equal weighting). |
| RD-4 | Result count | Display the total number of results. Return a maximum of 100 results per query. |
| RD-5 | Detail view | Clicking a result opens a detail panel showing: full-size preview, all dimension scores, and available metadata (filename, file path, file type, image dimensions, indexed timestamp). |
| RD-6 | No-results state | If no results meet the minimum similarity threshold, display a clear "No similar assets found" message. |
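
RD-1 through RD-4 can be illustrated with a short sketch: the overall score is the equal-weight mean of the active dimension scores, results are sorted in descending order by the chosen column, and the list is capped at 100. The function names and the shape of the result dictionaries are assumptions made for this example.

```python
from statistics import fmean

MAX_RESULTS = 100  # RD-4 cap


def overall_score(scores: dict[str, float], active: list[str]) -> float:
    """RD-3: equal-weight mean of the scores for the active dimensions."""
    if not active:
        raise ValueError("at least one dimension must be active")
    return fmean([scores[d] for d in active])


def rank_results(results: list[dict], active: list[str],
                 sort_by: str = "overall", limit: int = MAX_RESULTS) -> list[dict]:
    """Attach overall_score, sort descending by the chosen column, and apply the cap."""
    ranked = [{**r, "overall_score": overall_score(r["scores"], active)} for r in results]
    if sort_by == "overall":
        ranked.sort(key=lambda r: r["overall_score"], reverse=True)
    else:
        ranked.sort(key=lambda r: r["scores"][sort_by], reverse=True)
    return ranked[:min(limit, MAX_RESULTS)]
```

For example, scores of 0.92, 0.78, 0.85, and 0.64 across all four dimensions give an overall score of 0.7975.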

3.4 Indexing Pipeline

| ID | Requirement | Details |
| --- | --- | --- |
| IX-1 | Batch folder scan | Accept a root directory path. Recursively scan for image files (PNG, JPG/JPEG, WEBP, SVG, PDF). |
| IX-2 | Embedding generation | For each discovered image, generate embeddings for all four similarity dimensions and store them in the vector index along with file metadata. |
| IX-3 | Idempotent re-indexing | If an image has already been indexed and its content has not changed (e.g., same file hash), skip re-processing. If content has changed, update the index entry. |
| IX-4 | Scale target | Index 10,000 images in under 4 hours on commodity hardware (see Section 6 for the reference hardware). The pipeline should support parallelism/batching to meet this target. |
| IX-5 | Progress reporting | Expose progress metrics during indexing: total files found, files processed, files skipped, errors encountered. |
| IX-6 | Error handling | Log and skip unreadable/corrupt files without halting the batch. Produce a summary report at completion listing any failures. |
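
A rough sketch of IX-1, IX-3, and IX-6 follows: a recursive scan, a content hash to decide whether a file needs (re-)processing, and per-file error capture that does not halt the batch. SHA-256 is an assumed choice of hash, `known_hashes` is a hypothetical stand-in for whatever the index stores about previously seen files, and embedding generation is left out.

```python
import hashlib
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".svg", ".pdf"}  # IX-1


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_indexing(root: Path, known_hashes: dict[str, str]) -> dict[str, list[Path]]:
    """Classify every image under root as new, changed, unchanged, or unreadable.

    known_hashes maps a file path (as a string) to the hash stored at last indexing.
    """
    plan: dict[str, list[Path]] = {"new": [], "changed": [], "unchanged": [], "errors": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        try:
            current = file_sha256(path)
        except OSError:
            plan["errors"].append(path)  # IX-6: record the failure and keep going
            continue
        previous = known_hashes.get(str(path))
        if previous is None:
            plan["new"].append(path)
        elif previous == current:
            plan["unchanged"].append(path)  # IX-3: skip re-processing
        else:
            plan["changed"].append(path)  # IX-3: update the index entry
    return plan
```

The lengths of the four lists map directly onto the IX-5 progress metrics. Only the "new" and "changed" lists would be passed on to embedding generation.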

4) System Architecture (Stack-Agnostic)

The system is composed of five logical components. This section defines their responsibilities and interfaces without prescribing specific technologies.

┌─────────────┐       ┌─────────────────┐       ┌────────────────────┐
│  Frontend    │──────▶│  Backend API    │──────▶│  Embedding Engine  │
│  (Web UI)    │◀──────│                 │◀──────│                    │
└─────────────┘       └────────┬────────┘       └────────────────────┘
                               │
                               ▼
                      ┌─────────────────┐
                      │  Vector Index /  │
                      │  Storage         │
                      └─────────────────┘
                               ▲
                               │
                      ┌─────────────────┐
                      │  Indexing        │
                      │  Pipeline        │
                      └─────────────────┘

4.1 Frontend (Web UI)

Responsibility: Provide the browser-based interface for uploading images, configuring search parameters, and viewing results.

4.2 Backend API

Responsibility: Orchestrate search and indexing requests. Acts as the single entry point for the frontend.

4.3 Embedding Engine

Responsibility: Convert an image into one or more embedding vectors, one per similarity dimension.

4.4 Vector Index / Storage

Responsibility: Persist embedding vectors and support fast approximate nearest-neighbor (ANN) retrieval.

4.5 Indexing Pipeline

Responsibility: Scan a directory of images, generate embeddings, and populate the vector index.
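
To make the component boundaries concrete, the interfaces might be expressed as below. This is an illustrative sketch only: the method names, signatures, and the `search` helper are hypothetical and are not part of the contract defined in this BRD.

```python
from typing import Protocol


class EmbeddingEngine(Protocol):
    """4.3: converts an image into one vector per requested similarity dimension."""

    def embed(self, image_bytes: bytes, dimensions: list[str]) -> dict[str, list[float]]: ...


class VectorIndex(Protocol):
    """4.4: persists vectors and answers nearest-neighbor queries per dimension."""

    def upsert(self, image_id: str, dimension: str, vector: list[float]) -> None: ...

    def query(self, dimension: str, vector: list[float], k: int) -> list[tuple[str, float]]: ...


def search(engine: EmbeddingEngine, index: VectorIndex, image_bytes: bytes,
           dimensions: list[str], k: int = 50) -> dict[str, dict[str, float]]:
    """4.2: backend orchestration. Embed the query, then query the index once per dimension.

    Returns {image_id: {dimension: score}} for every image found in any per-dimension query.
    """
    query_vectors = engine.embed(image_bytes, dimensions)
    scores_by_image: dict[str, dict[str, float]] = {}
    for dimension in dimensions:
        for image_id, score in index.query(dimension, query_vectors[dimension], k):
            scores_by_image.setdefault(image_id, {})[dimension] = score
    return scores_by_image
```

One design point this sketch leaves open: an image can appear in the top-k for one dimension but not another, so a real backend would need to fetch the missing scores before computing an overall score.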


5) API Contracts

Stack-agnostic interface definitions. Implementations may use REST, gRPC, or any protocol that satisfies these contracts.

5.1 Search

Request: Search

{
  "image": <binary image data or base64-encoded string>,
  "dimensions": ["color", "shape", "composition", "style"],  // 1–4 selected
  "max_results": 50           // optional, default 50, max 100
}
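
Server-side handling of the Search request above might look like the following sketch. The function name is hypothetical, and two behaviors are assumptions rather than requirements from this document: a missing "dimensions" field falls back to all four (mirroring the UI default), and an out-of-range "max_results" is rejected rather than clamped.

```python
VALID_DIMENSIONS = ("color", "shape", "composition", "style")
DEFAULT_MAX_RESULTS = 50
MAX_RESULTS_LIMIT = 100


def normalize_search_request(payload: dict) -> dict:
    """Validate a Search payload and fill in defaults; raise ValueError on bad input."""
    if not payload.get("image"):
        raise ValueError("image is required")

    # Assumption: omitting "dimensions" means "all four", as in the UI default.
    dimensions = payload.get("dimensions", list(VALID_DIMENSIONS))
    if not isinstance(dimensions, list) or not 1 <= len(dimensions) <= 4:
        raise ValueError("dimensions must list 1 to 4 entries")
    if len(set(dimensions)) != len(dimensions):
        raise ValueError("dimensions must not repeat")
    unknown = [d for d in dimensions if d not in VALID_DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    max_results = payload.get("max_results", DEFAULT_MAX_RESULTS)
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("max_results must be an integer")
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        # Assumption: reject rather than silently clamp to the limit.
        raise ValueError("max_results must be between 1 and 100")

    return {"image": payload["image"], "dimensions": dimensions, "max_results": max_results}
```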

Response: SearchResults

{
  "query_image_preview": "<url or data URI of uploaded image thumbnail>",
  "dimensions_searched": ["color", "shape", "composition", "style"],
  "total_results": 42,
  "results": [
    {
      "image_id": "abc-123",
      "filename": "logo_v2.png",
      "file_path": "/assets/logos/logo_v2.png",
      "file_type": "png",
      "image_width": 512,
      "image_height": 512,
      "thumbnail_url": "<url or data URI>",
      "scores": {
        "color": 0.92,
        "shape": 0.78,
        "composition": 0.85,
        "style": 0.64
      },
      "overall_score": 0.7975,
      "indexed_at": "2026-01-15T10:30:00Z"
    }
    // ... more results
  ]
}

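Score semantics: each value in "scores" lies in the range 0–1, where 1 = identical (Section 3.2). "overall_score" is the arithmetic mean of the scores for the dimensions searched (RD-3); in the example response, (0.92 + 0.78 + 0.85 + 0.64) / 4 = 0.7975.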

5.2 Indexing — Trigger

Request: StartIndexing

{
  "directory_path": "/path/to/asset/folder",
  "recursive": true             // optional, default true
}

Response: IndexingJob

{
  "job_id": "job-456",
  "status": "running",
  "directory_path": "/path/to/asset/folder",
  "started_at": "2026-02-10T14:00:00Z"
}

5.3 Indexing — Status

Request: GetIndexingStatus

{
  "job_id": "job-456"
}

Response: IndexingStatus

{
  "job_id": "job-456",
  "status": "running" | "completed" | "failed",
  "total_files_found": 10000,
  "files_processed": 6500,
  "files_skipped": 120,
  "errors": 3,
  "started_at": "2026-02-10T14:00:00Z",
  "completed_at": null,           // null while running
  "error_details": [
    {
      "file_path": "/path/to/corrupt.png",
      "error": "Unable to decode image"
    }
  ]
}
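
A client could combine the two indexing contracts by polling GetIndexingStatus until the job reaches a terminal state. The sketch below assumes a hypothetical `get_status` callable that returns the IndexingStatus structure as a dictionary; the polling interval and timeout values are illustrative defaults, not requirements.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_status: Callable[[str], dict], job_id: str,
                 poll_interval: float = 2.0, timeout: float = 4 * 3600) -> dict:
    """Poll until the job is completed or failed, then return the final status.

    get_status is a stand-in for whatever transport implements GetIndexingStatus.
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing job {job_id} still running after {timeout} s")
        time.sleep(poll_interval)
```

The caller can then inspect "errors" and "error_details" in the returned status to build the summary report.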

6) Non-Functional Requirements

| Category | Requirement | Target |
| --- | --- | --- |
| Search latency | Time from "Search" click to first results rendered. | < 3 seconds for a 10K-image index. |
| Indexing throughput | Time to index a folder of images. | 10,000 images in < 4 hours on commodity hardware (4-core CPU, 16 GB RAM, SSD). |
| Concurrent users | Number of simultaneous search users supported. | At least 5 concurrent searches without degradation. |
| Image size handling | Maximum input image dimensions and file size. | Up to 20 MB file size; images larger than 2048px on the longest edge are resized before embedding. |
| Index size | Number of images the index supports without architectural changes. | Up to 100,000 images. |
| Availability | Uptime target for the search service. | Best-effort (internal tool); graceful error messaging when service is unavailable. |
| Browser support | Supported browsers for the frontend. | Latest versions of Chrome, Firefox, Edge, and Safari. |
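
Two of these targets reduce to simple arithmetic, sketched below with hypothetical function names. The first computes the size an image would be resized to under the 2048px rule, assuming the aspect ratio is preserved (the BRD does not say so explicitly). The second converts the indexing target into an average per-image time budget.

```python
MAX_EDGE_PX = 2048  # from the "Image size handling" target


def target_size(width: int, height: int, max_edge: int = MAX_EDGE_PX) -> tuple[int, int]:
    """Return the (width, height) to embed at, shrinking only if the longest edge is too large."""
    longest = max(width, height)
    if longest <= max_edge:
        return (width, height)
    scale = max_edge / longest
    # Assumption: aspect ratio is preserved; each edge is kept at 1px or more.
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def per_image_budget_seconds(image_count: int, hours: float) -> float:
    """Average wall-clock time available per image to meet an indexing target."""
    return hours * 3600 / image_count
```

For the indexing throughput target, 4 hours for 10,000 images works out to 14,400 s / 10,000 = 1.44 s per image on average, covering all four embeddings plus I/O.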

7) Out of Scope

The following are explicitly deferred and are not part of this development phase. They may be addressed in future iterations (see the research BRD's Phase 2/3).

| Item | Why deferred |
| --- | --- |
| Access control / ACL-filtered results | Adds significant complexity; not needed for an internal prototype. |
| User authentication | Same as above; prototype operates as a single-user or trusted-network tool. |
| Region/crop-based search | Valuable but adds UX and algorithmic complexity; better suited for Phase 2. |
| "More like this" refinement | Requires session state and re-query logic; deferred to Phase 2. |
| Metadata / keyword search | This BRD covers image-only similarity; hybrid search is a Phase 2 concern. |
| Continuous / real-time indexing | Batch indexing is sufficient for the initial version. |
| Feedback loop ("good match / bad match") | Requires a data collection pipeline and model retraining strategy; Phase 3. |
| Usage analytics and monitoring dashboards | Phase 3 concern. |
| Multi-user collaboration features | Not needed for prototype. |
| Deployment automation (CI/CD, containerization) | Operational concern, not a functional requirement for the prototype. |

8) Glossary

| Term | Definition |
| --- | --- |
| ANN | Approximate Nearest Neighbor: an algorithm that finds vectors close to a query vector in sub-linear time. |
| Embedding | A fixed-length numeric vector that represents an image's visual properties in a high-dimensional space. |
| Dimension (similarity) | One of the four independent axes of visual similarity defined in this BRD: Color, Shape, Composition, Style. |
| Overall score | The arithmetic mean of the active similarity dimension scores for a query–result pair. |
| Vector index | A data structure optimized for storing and querying embedding vectors via ANN search. |