krites: design spec
Status: DRAFT / intent (scaffold generated, build not started)
Owner: Matt Cockayne
Last updated: 2026-06-21
Read me first. This is the master design spec and the source of truth for what krites is and how it is built. The CLI + studio web-UI contracts (the R-* requirements that anchor the tests) live in 0002-interface-contracts.md. Open design questions are collected in §13; resolve or explicitly defer each before the code it gates is written.
1. Purpose
krites takes the 3,000–4,000 frames from a full day's wedding shoot and judges them — culling the unusable, ranking the rest, then auto-correcting and lightly retouching the keepers — so a photographer hands off an album-ready set in a fraction of the time. It is driven from a local browser "studio", not a command line.
The name is Greek — κριτής, kritēs, "the judge". That is the product in one word: the thing that appraises every frame and rules on it. The human stays the final arbiter; krites does the first, exhausting pass.
It is built for Hailey (semi-pro wedding photographer) first, but designed so it could be offered to other photographers later (§2 non-goals, §12). The reference point in the market is aftershoot.com (AI culling + editing); krites is the bespoke, self-hosted, privacy-keeping alternative tailored to Hailey's own taste and criteria.
It is a tool built on go-tool-base (GTB) — a downstream consumer of that framework, shipping as a single Go binary, the same way keryx is. Where keryx is CLI-/CI-first with a stretch studio, krites inverts that: the studio is the primary surface and the CLI is for low-level/scripted operation.
2. Goals / non-goals
Goals
- Cull at scale. Ingest a day's shoot and, per frame, detect and flag: out-of-focus / motion blur, bad exposure (blown / crushed), closed eyes & blinks, and near-duplicate bursts — picking the best frame of each cluster. Output a keep / maybe / reject verdict with reasons, for a human to confirm. This is the core value and Phase 1.
- A taste that grows. The cull criteria are not hardcoded — they live in editable cull profiles (§6) that Hailey tunes, and over time krites adapts to her own keep/reject decisions (§11, Phase 4). "Suitable for a wedding album" is her definition, learned, not ours.
- Auto-correct the keepers. Auto-straighten (level the horizon / verticals) and auto-crop to a pleasing composition; apply a colour profile / look (white balance + tone + grade); minor retouching (blemish / skin / small fixes).
- Remove unwanted objects from a scene (a stray guest, a bin, an exit sign) by inpainting.
- Non-destructive, always. Originals are never modified. Every verdict, crop, and edit is a reversible instruction stored beside the shoot; pixels are only rendered on export (§3, §4.6). This is non-negotiable for a working photographer.
- Local-first & private. A wedding's photos are sensitive. The default pipeline runs on Hailey's own machine; any cloud/AI backend is opt-in, per-capability, and disclosed (§5).
- Fit the existing workflow. Photographers live in Lightroom — Hailey finishes there ~90% of the time (§13-Q5). krites reads and writes XMP sidecars (ratings, picks/labels, and — Phase 2 — crop/straighten/white balance) so a krites cull and its corrections show up in her Lightroom catalog. Cull-in-krites → finish-in-Lightroom is the primary path (§4.7); krites end-to-end is the supported ~10% for shoots she keeps entirely in krites.
- Single Go binary, GTB ecosystem; studio web UI as the primary surface, CLI for low-level/scripted use, MCP as a third surface mirroring the CLI.
Non-goals
- Not a full raster/RAW editor. krites does coarse, automatable corrections and a bounded retouch set — it is not Lightroom/Photoshop/Capture One and will not grow a curves panel, masks-everywhere, or a layers stack. For deep finishing, export to a real editor. (Analogy: keryx composes, it is not a video timeline; krites judges + auto-corrects, it is not a develop module.)
- Not a catalogue / DAM. It operates on a shoot (a directory handed to it); it is not a permanent library, keyword manager, or asset database across all of someone's work. Lightroom/Photo Mechanic own that.
- Not a delivery / gallery / proofing platform. It produces an album-ready set on disk; client galleries, downloads, and print fulfilment are out of scope (Pic-Time / Pixieset own that).
- Not multi-tenant (Phase 1–4). Single user, localhost. A hosted multi-user service is a possible future (§12), explicitly out of the initial scope and not to be designed-for prematurely.
- Not a phone app. Culling 4,000 frames is a desktop, big-screen, keyboard-driven task; the studio is desktop-first (the inverse of keryx's phone-first studio).
3. The shoot workspace (state model)
Like keryx, krites holds no state of its own — it operates on a workspace
handed to it. Here the workspace is a shoot: a directory of imported images
plus a hidden .krites/ sidecar tree that records every decision. This is what
keeps krites reusable (it works on any folder) and non-destructive (originals
are inputs, never outputs).
    <shoot>/
      originals/            # the imported frames; READ ONLY, never written after ingest
        DSC0001.CR3 …
      .krites/
        shoot.yaml          # manifest: id, name, date, source, counts, profile/look in use
        analysis/           # per-frame analysis cache (scores, detections); regenerable
          DSC0001.json …
        verdicts.yaml       # per-frame keep/maybe/reject + rating + cluster id + reasons
        edits/              # per-frame non-destructive edit instructions (straighten/crop/look/retouch/removals)
          DSC0001.yaml …
        previews/           # generated JPEG/embedded previews for fast studio browsing; regenerable
      export/               # rendered, album-ready output (written by `krites export`)
      DSC0001.xmp …         # XMP sidecars beside originals (Lightroom interop, §4.7); opt-in
- Originals are immutable. Nothing under originals/ is ever modified after ingest. Verdicts and edits are instructions; export renders them into export/. Re-running analysis or regenerating previews never touches them.
- Everything reversible. A verdict or edit is a change to a YAML/JSON record, so undo/redo and "reset frame" are cheap and total.
- Caches are regenerable. analysis/ and previews/ can be deleted and rebuilt; only verdicts.yaml, edits/, and the XMP are durable decisions.
- The studio can hold many shoots. A shoot is self-contained; the studio's library is just a list of shoot directories it knows about (§10). The CLI, by contrast, scopes to the current shoot directory.
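To make "edits are instructions" concrete, here is a minimal sketch of a per-frame edit record with undo and reset. The type and field names (`EditRecord`, `Instruction`, `Op`, `Params`) are assumptions for illustration, not the krites on-disk schema, and JSON stands in for YAML only to keep the example within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Instruction is one reversible edit step. Field names are illustrative,
// not the krites schema.
type Instruction struct {
	Op     string             `json:"op"`     // e.g. "straighten", "crop"
	Params map[string]float64 `json:"params"` // op-specific values
}

// EditRecord is the kind of per-frame record that could live under
// .krites/edits/. It only ever references the original by name.
type EditRecord struct {
	Frame        string        `json:"frame"`
	Instructions []Instruction `json:"instructions"`
}

// Apply appends an instruction; no pixels are read or written.
func (r *EditRecord) Apply(in Instruction) {
	r.Instructions = append(r.Instructions, in)
}

// Undo removes the most recent instruction and reports whether one existed.
func (r *EditRecord) Undo() bool {
	if len(r.Instructions) == 0 {
		return false
	}
	r.Instructions = r.Instructions[:len(r.Instructions)-1]
	return true
}

// Reset drops every instruction, returning the frame to its ingested state.
func (r *EditRecord) Reset() {
	r.Instructions = nil
}

func main() {
	rec := &EditRecord{Frame: "DSC0001.CR3"}
	rec.Apply(Instruction{Op: "straighten", Params: map[string]float64{"angle": -1.5}})
	rec.Apply(Instruction{Op: "crop", Params: map[string]float64{"x": 0.05, "y": 0.05, "w": 0.9, "h": 0.9}})
	rec.Undo() // drops the crop, keeps the straighten

	out, err := json.MarshalIndent(rec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the record is the only thing that changes, undo and "reset frame" never need to touch anything under originals/.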
4. The pipeline
A frame flows: ingest → cull (judge) → develop (correct) → retouch → remove → export. Each stage is a CLI command (low-level, §9) and a studio panel (§10); they share the workspace files, so studio and CLI are interchangeable. Stages are non-destructive until export.
4.1 Ingest
krites ingest <dir> registers a shoot: copies/links the source frames into
originals/, decodes embedded previews (or generates them), reads EXIF (camera,
lens, time, orientation), and writes shoot.yaml. RAW formats decode via libvips
/ libraw (§5); for speed, culling runs against embedded JPEG previews where
present, full-decode only on demand.
4.2 Cull: the judge (the core)
The analysis pass scores every frame on a set of signals, then a cull profile (§6) turns the signals into a verdict. Signals (Phase 1 set, extensible):
| Signal | What it measures | Default technique |
|---|---|---|
| Sharpness / focus | global + subject-region blur; distinguishes nice bokeh from a missed-focus / motion-blurred frame | variance-of-Laplacian + edge energy on the subject region; (later) a learned focus model |
| Exposure | clipped highlights %, crushed shadows %, median EV deviation | histogram analysis |
| Eyes / blinks | per-face: eyes open/closed, partial blink | face + eye-state model (provider, §5) |
| Expression (Phase ≥2) | smile / neutral / awkward, looking-at-camera | face attribute model (provider) |
| Near-duplicate clustering | groups burst/near-identical frames, ranks within the cluster | perceptual hash + time proximity → cluster; rank by sharpness+eyes+exposure |
| Aesthetic score (Phase ≥2) | a general "is this a strong frame" score | aesthetic model (provider) |
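The two pure-Go signals in the table can be sketched briefly. The example below computes a variance-of-Laplacian sharpness value and histogram-based clipping fractions over a whole greyscale image; the `Gray` type, the function names, and the clipping cut-offs are assumptions for illustration, and a real analyzer would restrict the sharpness measure to the subject region as the table describes.

```go
package main

import "fmt"

// Gray is a row-major 8-bit luminance image; a hypothetical stand-in for
// whatever pixel type the Decoder seam returns.
type Gray struct {
	W, H int
	Pix  []uint8
}

func (g Gray) at(x, y int) float64 { return float64(g.Pix[y*g.W+x]) }

// LaplacianVariance convolves interior pixels with the 4-neighbour
// Laplacian kernel and returns the variance of the responses.
// Low values suggest few sharp edges (a soft or blurred frame).
func LaplacianVariance(g Gray) float64 {
	if g.W < 3 || g.H < 3 {
		return 0
	}
	var sum, sumSq float64
	n := 0
	for y := 1; y < g.H-1; y++ {
		for x := 1; x < g.W-1; x++ {
			l := g.at(x-1, y) + g.at(x+1, y) + g.at(x, y-1) + g.at(x, y+1) - 4*g.at(x, y)
			sum += l
			sumSq += l * l
			n++
		}
	}
	mean := sum / float64(n)
	return sumSq/float64(n) - mean*mean
}

// Clipping returns the fraction of pixels at or below lo (crushed shadows)
// and at or above hi (blown highlights), read off a 256-bin histogram.
func Clipping(g Gray, lo, hi uint8) (shadows, highlights float64) {
	if len(g.Pix) == 0 {
		return 0, 0
	}
	var hist [256]int
	for _, p := range g.Pix {
		hist[p]++
	}
	var s, h int
	for v := 0; v <= int(lo); v++ {
		s += hist[v]
	}
	for v := int(hi); v <= 255; v++ {
		h += hist[v]
	}
	total := float64(len(g.Pix))
	return float64(s) / total, float64(h) / total
}

// flat builds a uniform image (no edges at all).
func flat(w, h int, v uint8) Gray {
	pix := make([]uint8, w*h)
	for i := range pix {
		pix[i] = v
	}
	return Gray{W: w, H: h, Pix: pix}
}

// checker builds a 1-pixel black/white checkerboard (maximal edge energy).
func checker(w, h int) Gray {
	pix := make([]uint8, w*h)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			if (x+y)%2 == 1 {
				pix[y*w+x] = 255
			}
		}
	}
	return Gray{W: w, H: h, Pix: pix}
}

func main() {
	soft, crisp := flat(8, 8, 128), checker(8, 8)
	fmt.Printf("flat:    variance=%.0f\n", LaplacianVariance(soft))
	fmt.Printf("checker: variance=%.0f\n", LaplacianVariance(crisp))
	s, h := Clipping(crisp, 5, 250)
	fmt.Printf("checker: shadows=%.2f highlights=%.2f\n", s, h)
}
```

The raw numbers mean little on their own; what counts as "too soft" or "too clipped" is a threshold that belongs in the cull profile (§6).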
Output per frame: a structured analysis record (the raw signals) and a verdict
(keep / maybe / reject) with human-readable reasons ("rejected: motion
blur; both subjects' eyes closed") and a cluster id where it is one of a burst.
krites proposes; the human disposes — verdicts are confirmable/overridable in
the studio, never silently final.
The default flow surfaces the best frame per cluster as keep, demotes the
rest of the cluster to maybe, and rejects anything failing the profile's hard
gates (e.g. badly out of focus). Thresholds and which signals are hard gates vs
soft penalties are entirely the cull profile's business (§6) — so "suitable
for a wedding album" stays Hailey's definition.
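The default flow for bursts can be sketched as follows: group time-sorted frames whose perceptual hashes and capture times are both close, then propose the top-ranked frame of each group as keep and the rest as maybe. The `Frame` type, the single-link grouping rule, and the gap and bit-distance limits are assumptions for illustration; the hash and rank score are taken as given, and hard-gate rejection is left to the profile.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
	"time"
)

// Frame carries only what clustering needs. Hash and Score are assumed to
// be computed elsewhere (perceptual hash; sharpness+eyes+exposure rank).
type Frame struct {
	Name  string
	Taken time.Time
	Hash  uint64
	Score float64
}

// burstGap is an illustrative limit; a real value would come from the profile.
const burstGap = 2 * time.Second

// Cluster sorts frames by capture time; a frame joins the previous frame's
// cluster when the time gap is at most maxGap and the Hamming distance
// between the two hashes is at most maxBits.
func Cluster(frames []Frame, maxGap time.Duration, maxBits int) [][]Frame {
	sorted := append([]Frame(nil), frames...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Taken.Before(sorted[j].Taken)
	})
	var out [][]Frame
	for i, f := range sorted {
		if i > 0 {
			prev := sorted[i-1]
			near := f.Taken.Sub(prev.Taken) <= maxGap
			alike := bits.OnesCount64(f.Hash^prev.Hash) <= maxBits
			if near && alike {
				out[len(out)-1] = append(out[len(out)-1], f)
				continue
			}
		}
		out = append(out, []Frame{f})
	}
	return out
}

// Propose marks the top-scoring frame of each cluster "keep", the rest "maybe".
func Propose(clusters [][]Frame) map[string]string {
	verdicts := make(map[string]string)
	for _, c := range clusters {
		best := 0
		for i := range c {
			if c[i].Score > c[best].Score {
				best = i
			}
		}
		for i, f := range c {
			if i == best {
				verdicts[f.Name] = "keep"
			} else {
				verdicts[f.Name] = "maybe"
			}
		}
	}
	return verdicts
}

// sampleBurst returns two near-identical frames shot 300 ms apart and one
// unrelated frame ten seconds later.
func sampleBurst() []Frame {
	t0 := time.Date(2026, 6, 20, 14, 0, 0, 0, time.UTC)
	const h = uint64(0xFFFF0000FFFF0000)
	return []Frame{
		{Name: "DSC0001", Taken: t0, Hash: h, Score: 0.6},
		{Name: "DSC0002", Taken: t0.Add(300 * time.Millisecond), Hash: h ^ 0x3, Score: 0.9},
		{Name: "DSC0003", Taken: t0.Add(10 * time.Second), Hash: ^h, Score: 0.7},
	}
}

func main() {
	clusters := Cluster(sampleBurst(), burstGap, 6)
	verdicts := Propose(clusters)
	for id, c := range clusters {
		for _, f := range c {
			fmt.Printf("cluster %d  %s  %s\n", id, f.Name, verdicts[f.Name])
		}
	}
}
```

Every verdict produced this way is still a proposal: the studio shows the cluster together so the pick can be swapped.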
4.3 Develop: auto-correct
Applied to keepers (batchable):
- Straighten: detect the horizon / dominant verticals and compute a level rotation; clamp to a sane max angle; never upside-down-guess.
- Crop: propose a crop that improves composition (aspect target, rule-of-thirds / subject placement, headroom, removing edge distractions) while respecting the frame's subject. Always a proposal with a preview.
- Colour profile / look: apply white-balance correction + a tonal/colour look from the look catalog (§6): Hailey's signature grade, a camera/profile match, per-scene presets (ceremony / golden-hour / reception). LUT- and curve-expressible looks, applied as develop settings, not baked pixels.
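The straighten rule can be sketched as a small pure function. Given the measured angle of a dominant line, it finds the deviation from the nearest axis (horizontal or vertical) and proposes the opposite rotation, refusing when the deviation exceeds a maximum. The function name, the sign convention, and the choice to refuse rather than partially rotate are assumptions for illustration, and the 10-degree limit in the usage is an invented value.

```go
package main

import (
	"fmt"
	"math"
)

// LevelRotation takes the angle of a detected dominant line in degrees and
// returns the rotation that would bring it onto the nearest axis.
// ok is false when the needed correction is larger than maxDeg, in which
// case no rotation is proposed at all.
func LevelRotation(lineDeg, maxDeg float64) (rot float64, ok bool) {
	// Deviation from the nearest multiple of 90 degrees, within ±45.
	dev := lineDeg - 90*math.Round(lineDeg/90)
	if math.Abs(dev) > maxDeg {
		return 0, false
	}
	return -dev, true
}

func main() {
	for _, angle := range []float64{1.8, 91.2, -0.4, 30} {
		rot, ok := LevelRotation(angle, 10)
		if !ok {
			fmt.Printf("line at %6.1f°: no proposal (beyond max angle)\n", angle)
			continue
		}
		fmt.Printf("line at %6.1f°: rotate by %+.1f°\n", angle, rot)
	}
}
```

Because the correction is always measured against the nearest axis and capped, the function can never propose a quarter or half turn, which is one way to honour "never upside-down-guess".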
4.4 Retouch (Phase ≥2)
A bounded set of automatic, low-touch fixes: blemish/spot removal, gentle skin smoothing, optional teeth/eye cleanup — conservative defaults, every effect toggle-able and dialled per frame. The hard ceiling: anything needing real local masking and artistry goes to Photoshop. (Mirrors keryx's "AI coarse edits only" ceiling.)
4.5 Object removal (Phase ≥3)
Remove an unwanted object/person/distraction: the user marks a region (brush / box in the studio, or a coordinate/region on the CLI), and an inpainting provider (§5) fills it. Stored as a removal instruction (mask + result reference) on the frame's edit record so it stays non-destructive and reversible.
4.6 Export
krites export renders the current verdicts + edits into export/: keepers only
(or a chosen verdict set), with straighten/crop/look/retouch/removals baked,
to JPEG (or TIFF), at a chosen size/quality, with EXIF/copyright carried over.
This is the only stage that produces new pixels. Originals remain untouched.
4.7 Lightroom / XMP interop (the primary workflow, §13-Q5)
Finish-in-Lightroom is Hailey's main path (~90%); krites end-to-end is the supported minority (~10%). So the XMP bridge is a first-class deliverable, not an afterthought — and because her corrections must survive the jump to Lightroom, it carries geometry/WB, not just verdicts. krites reads/writes XMP sidecars next to originals:
- Write (Phase 1): cull verdicts → star rating + pick/reject flag + colour label. A krites cull then appears in her Lightroom catalog on import / read-metadata, which is the core of the 90% workflow.
- Write (Phase 2): develop geometry/WB → crop, straighten angle, white balance into Camera-Raw XMP, so keepers open in Lightroom already leveled, cropped and WB-corrected. A real requirement, since otherwise krites' auto-correction is lost the moment she opens Lightroom.
- Never written: looks / retouch / object-removal — XMP can't express them; they serve krites' own export (the 10% end-to-end path) only. No fragile, half-applied develop mapping.
- Read: respect existing ratings/labels so krites and Lightroom round-trip without clobbering each other.
5. Backends are pluggable (provider interfaces)
The heavy lifting — RAW decode, ML detections, generative inpainting — sits behind narrow Go interfaces, with the implementation chosen from config at construction (mirrors keryx's provider/Publisher pattern). Adding a backend is purely additive (implement + register a constructor; no call-site changes). Requests are provider-neutral (krites' intent, not a vendor payload).
| Capability | Interface (intent) | Default (local-first) | Optional |
|---|---|---|---|
| Decode | Decoder: RAW/JPEG → pixels + EXIF + preview | libvips/libraw (govips) | none |
| Image ops | Pipeline: rotate/crop/resize/colour | libvips / Go imaging | none |
| Blur/exposure | QualityAnalyzer | pure-Go (Laplacian, histogram) | none |
| Faces / eyes / expression | FaceAnalyzer | local model via ONNX Runtime | cloud vision |
| Aesthetic score | AestheticScorer | local model (ONNX) | cloud |
| Dedup | Clusterer | pure-Go perceptual hash | none |
| Inpaint / object-removal | Inpainter | local LaMa via ONNX Runtime (CoreML) | cloud (SD-inpaint etc.) |
ML runtime (RESOLVED, §13-Q1). Go is a fine orchestrator and does the
deterministic image maths (sharpness, histograms, hashing, rotate/crop/resize,
export) in pure Go. All models run through ONNX Runtime with the CoreML
execution provider — loaded via purego (no CGO) against a bundled ONNX
Runtime dylib — covering the discriminative models (face/eye, aesthetic) and
the generative inpainter (LaMa, exported to ONNX). No Python sidecar.
krites stays a single Go binary + the ORT dylib. Every cloud backend is
opt-in behind the same interface (§13-Q3), never the default.
Local-by-default, cloud as a first-class choice (RESOLVED, §13-Q3). Every
capability defaults to its local backend (privacy-respecting out of the box),
but a cloud backend is a genuine, user-selectable option per capability (the
"Optional" column), chosen in config for best-in-class quality — not a buried
fallback. Whenever a cloud backend is active, the studio/CLI disclose that a
frame would leave the machine (R-PRIV-2, R-UI-19) and secrets stay in the
keychain/env (R-PRIV-3).
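As a rough illustration of "implement + register a constructor; no call-site changes", the sketch below selects an inpainting backend by name from config, defaults to local, and surfaces whether frames would leave the machine. The `Registry`, `Constructor`, and `Remote()` names, the `backend` config key, and the backend names are all hypothetical; the real Inpainter interface and GTB config wiring may look quite different.

```go
package main

import (
	"fmt"
	"sort"
)

// Inpainter stands in for the provider seam; this method set is a guess
// made for illustration only.
type Inpainter interface {
	Name() string
	Remote() bool // true when frames would leave the machine
}

// Constructor builds a backend from its config section.
type Constructor func(cfg map[string]string) (Inpainter, error)

// Registry maps a backend name from config to its constructor.
type Registry struct{ ctors map[string]Constructor }

func NewRegistry() *Registry { return &Registry{ctors: map[string]Constructor{}} }

// Register adds a backend; adding one never changes existing call sites.
func (r *Registry) Register(name string, c Constructor) { r.ctors[name] = c }

// Build picks the backend named in cfg["backend"], defaulting to "local".
func (r *Registry) Build(cfg map[string]string) (Inpainter, error) {
	name := cfg["backend"]
	if name == "" {
		name = "local"
	}
	c, ok := r.ctors[name]
	if !ok {
		return nil, fmt.Errorf("unknown inpainter backend %q (known: %v)", name, r.names())
	}
	return c(cfg)
}

func (r *Registry) names() []string {
	out := make([]string, 0, len(r.ctors))
	for n := range r.ctors {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// backend is a trivial placeholder implementation.
type backend struct {
	name   string
	remote bool
}

func (b backend) Name() string { return b.name }
func (b backend) Remote() bool { return b.remote }

// demoRegistry registers one local and one cloud placeholder.
func demoRegistry() *Registry {
	reg := NewRegistry()
	reg.Register("local", func(map[string]string) (Inpainter, error) {
		return backend{name: "lama-onnx", remote: false}, nil
	})
	reg.Register("cloud", func(map[string]string) (Inpainter, error) {
		return backend{name: "cloud-diffusion", remote: true}, nil
	})
	return reg
}

func main() {
	reg := demoRegistry()
	for _, cfg := range []map[string]string{{}, {"backend": "cloud"}, {"backend": "nope"}} {
		p, err := reg.Build(cfg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if p.Remote() {
			fmt.Printf("%s: frames would leave this machine\n", p.Name())
		} else {
			fmt.Printf("%s: runs locally\n", p.Name())
		}
	}
}
```

Exposing something like `Remote()` on the interface is one way the studio and CLI could drive the disclosure that R-PRIV-2 and R-UI-19 require, without knowing which vendor sits behind the seam.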
6. Config-driven catalogs (cull profiles & looks)
The judgement and the aesthetic are config, never Go constants, exactly
as keryx makes themes config-driven. Two catalogs, both managed through the GTB
config layer (init seeds house defaults; krites profile|look list|show|add|edit|rm;
hot-reloaded):
- Cull profile. A named, self-contained ruleset: per-signal thresholds, which signals are hard gates (auto-reject) vs soft penalties, dedup aggressiveness, how aggressive keep-vs-maybe is. e.g. wedding-default, ceremony-strict, reception-loose. --profile <name> per run; falls back to the configured default.
- Look. A named colour/tone profile: white-balance policy, tone curve, grade/LUT reference, per-scene variants. e.g. hailey-signature, golden-hour, bw-classic. --look <name> on develop/export.
krites init seeds a starter wedding-default profile and a neutral look; these
are starting points Hailey tunes, and the Phase 4 learning loop (§11) feeds
adjustments back into her profile. Never reintroduce hardcoded thresholds /
grades in Go — wire them through the catalog.
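To show how a profile's hard gates and soft penalties might turn signals into a verdict with reasons, here is a minimal sketch. The `Profile` and `Rule` shapes, the signal names, and every number are invented for illustration; in krites the values would be loaded from the catalog rather than written as literals, which is exactly the rule this section sets.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule describes how one signal is judged. Shape is illustrative.
type Rule struct {
	Min      float64 // signal values below Min fail the rule
	HardGate bool    // a failed hard gate rejects the frame outright
	Penalty  float64 // a failed soft rule subtracts this from the score
}

// Profile is a stand-in for a cull profile loaded from the catalog.
type Profile struct {
	Name      string
	Rules     map[string]Rule // keyed by signal name
	KeepAbove float64         // score >= KeepAbove means keep, else maybe
}

// Resolve applies the profile to one frame's signals. Signals the frame
// does not have are skipped. Rules are visited in name order so the
// reasons come out in a stable order.
func Resolve(p Profile, signals map[string]float64) (string, []string) {
	names := make([]string, 0, len(p.Rules))
	for n := range p.Rules {
		names = append(names, n)
	}
	sort.Strings(names)

	score := 1.0
	rejected := false
	var reasons []string
	for _, name := range names {
		rule := p.Rules[name]
		v, ok := signals[name]
		if !ok || v >= rule.Min {
			continue
		}
		reasons = append(reasons, fmt.Sprintf("%s %.2f below %.2f", name, v, rule.Min))
		if rule.HardGate {
			rejected = true
		} else {
			score -= rule.Penalty
		}
	}
	switch {
	case rejected:
		return "reject", reasons
	case score >= p.KeepAbove:
		return "keep", reasons
	default:
		return "maybe", reasons
	}
}

// demoProfile uses made-up thresholds purely to exercise Resolve.
func demoProfile() Profile {
	return Profile{
		Name: "wedding-default",
		Rules: map[string]Rule{
			"sharpness": {Min: 0.4, HardGate: true},
			"eyes_open": {Min: 0.5, Penalty: 0.3},
			"exposure":  {Min: 0.3, Penalty: 0.2},
		},
		KeepAbove: 0.8,
	}
}

func main() {
	p := demoProfile()
	frames := map[string]map[string]float64{
		"DSC0001": {"sharpness": 0.9, "eyes_open": 0.9, "exposure": 0.8},
		"DSC0002": {"sharpness": 0.9, "eyes_open": 0.2, "exposure": 0.8},
		"DSC0003": {"sharpness": 0.1, "eyes_open": 0.9, "exposure": 0.8},
	}
	for _, name := range []string{"DSC0001", "DSC0002", "DSC0003"} {
		verdict, reasons := Resolve(p, frames[name])
		fmt.Printf("%s: %s %v\n", name, verdict, reasons)
	}
}
```

Keeping resolution a pure function of (profile, signals) is what would let the studio re-resolve verdicts cheaply when a threshold is edited.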
7. Planned package layout
pkg/ is the default home, not the exception. krites' real value is its
engine — the analysis, culling, develop, workspace, and provider seams — and
that engine is meant to be reusable: it could power a hosted product, a Wails
app, or another tool, and others may import it (§12). So the engine lives in
pkg/ (a deliberate public API, with generated mocks in mocks/).
internal/ is reserved for genuinely krites-CLI-specific glue that is not
part of the reusable surface (CLI output formatting, command-only orchestration).
The RunX command bodies live in pkg/cmd/<name> (the GTB convention). When in
doubt, prefer pkg/. Indicative:
| Package | Responsibility |
|---|---|
| pkg/shoot | the workspace: manifest, verdicts, edit records, sidecar IO, undo/redo |
| pkg/ingest | import, EXIF, preview generation |
| pkg/analyze/quality | sharpness + exposure (pure-Go) |
| pkg/analyze/face | face / eye / expression (provider-backed) |
| pkg/analyze/dedup | perceptual-hash clustering + in-cluster ranking |
| pkg/analyze/aesthetic | aesthetic scoring (provider-backed) |
| pkg/cull | signals + cull profile → verdict + reasons |
| pkg/develop/straighten | horizon/vertical detection → level angle |
| pkg/develop/crop | composition-aware crop proposal |
| pkg/develop/look | white balance + tone + grade/LUT |
| pkg/retouch | bounded auto-retouch set |
| pkg/remove | object-removal orchestration (mask + Inpainter) |
| pkg/render | export: apply edits → write pixels |
| pkg/xmp | XMP sidecar read/write (Lightroom interop) |
| pkg/imaging | decode/ops abstraction over libvips (the Decoder/Pipeline seams) |
| pkg/catalog | cull-profile + look catalogs over the GTB config layer |
| pkg/studio | the web UI: GTB service lifecycle, HTTP, SPA serving, SSE |
| internal/... | CLI-only glue (output formatting, command orchestration) |
The provider interfaces (Decoder, QualityAnalyzer, FaceAnalyzer,
AestheticScorer, Clusterer, Inpainter) are the seams that let each
capability land — and swap local↔cloud — independently.
8. Testing approach (TDD + BDD, the GTB pattern)
- Spec-first, test-first. Behaviour, error cases, and edge cases derive from this spec and the R-* requirements in 0002. Write failing tests first.
- BDD with godog for user-facing CLI commands and multi-step workflows (Gherkin in features/, steps in test/e2e/steps/, driven by cmd/e2e), the GTB pattern keryx uses. A command isn't done without scenarios.
- Deterministic core unit-tested directly: sharpness/exposure maths, hashing + clustering, verdict resolution from a profile, straighten-angle maths, crop geometry, XMP round-trip, render correctness. These must be deterministic and fast.
- Providers faked behind their interfaces. The ML/generative backends (FaceAnalyzer, AestheticScorer, Inpainter) and libvips are injected and faked in unit tests; no package-level mocking hooks (they race under t.Parallel()); inject via options/fields/config instead, the keryx rule.
- Golden-image fixtures. A small, committed corpus of representative frames (sharp/blurry, blinks, bursts, over/under) pins analyzer behaviour. Real model + libvips runs are env-var-gated integration tests (INT_TEST=1, *_integration_test.go), not run in the default suite.
- github.com/cockroachdb/errors for errors; no //nolint (fix the root cause); table-driven tests with t.Parallel().
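The "inject, don't patch package-level hooks" rule might look like the sketch below: the provider arrives through a functional option, so each test case builds its own fake and nothing is shared between parallel cases. The `Culler` type, `WithFaceAnalyzer` option, and `EyesOpen` method are hypothetical names, and the sketch uses the standard library `errors` and `fmt` packages (rather than github.com/cockroachdb/errors) and a plain `main` (rather than a `_test.go` file) only to stay self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// FaceAnalyzer is a cut-down stand-in for the provider seam.
type FaceAnalyzer interface {
	EyesOpen(frame string) (bool, error)
}

// Culler holds its dependencies as fields; nothing is package-level.
type Culler struct{ faces FaceAnalyzer }

// Option configures a Culler at construction.
type Option func(*Culler)

// WithFaceAnalyzer injects the provider, real or fake.
func WithFaceAnalyzer(f FaceAnalyzer) Option {
	return func(c *Culler) { c.faces = f }
}

func NewCuller(opts ...Option) *Culler {
	c := &Culler{}
	for _, o := range opts {
		o(c)
	}
	return c
}

// Verdict demotes a frame to "maybe" when eyes are closed.
func (c *Culler) Verdict(frame string) (string, error) {
	if c.faces == nil {
		return "", errors.New("no face analyzer configured")
	}
	open, err := c.faces.EyesOpen(frame)
	if err != nil {
		return "", fmt.Errorf("analyze %s: %w", frame, err)
	}
	if !open {
		return "maybe", nil
	}
	return "keep", nil
}

// fakeFaces is a per-test fake: a lookup table of canned answers.
type fakeFaces map[string]bool

func (f fakeFaces) EyesOpen(frame string) (bool, error) {
	open, ok := f[frame]
	if !ok {
		return false, fmt.Errorf("unknown frame %q", frame)
	}
	return open, nil
}

func main() {
	// Table-driven cases; each builds its own Culler with its own fake.
	cases := []struct {
		frame string
		fake  fakeFaces
		want  string
	}{
		{"DSC0001", fakeFaces{"DSC0001": true}, "keep"},
		{"DSC0002", fakeFaces{"DSC0002": false}, "maybe"},
	}
	for _, tc := range cases {
		got, err := NewCuller(WithFaceAnalyzer(tc.fake)).Verdict(tc.frame)
		fmt.Printf("%s: got=%q want=%q err=%v\n", tc.frame, got, tc.want, err)
	}
}
```

In a real suite the loop body would sit inside `t.Run` with `t.Parallel()`; because each case owns its fake, there is no shared state to race on.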
9. CLI surface (low-level)
The CLI is the scriptable, low-level face; the studio (§10) is primary. Each
pipeline stage is a command operating on the current shoot directory. Sketch
(full contract in 0002 §2–3):
krites ingest <dir> [--name] [--copy|--link] # register a shoot workspace
krites cull [--profile <name>] [--reanalyze] # run analysis → verdicts + reasons
krites verdict <frame> keep|maybe|reject # override a verdict
krites dedup [--profile <name>] # (re)cluster bursts, pick best
krites straighten [<frame|--all>] # propose level angles
krites crop [<frame|--all>] [--aspect] # propose crops
krites develop [--look <name>] [<frame|--all>] # white balance + tone + grade
krites retouch [<frame|--all>] # bounded auto-retouch (Phase ≥2)
krites remove <frame> --region <…> # object removal (Phase ≥3)
krites export [--verdict keep] [--look] [--size] # render album-ready set
krites xmp write|read # Lightroom sidecar sync
krites profile|look list|show|add|edit|rm # manage catalogs (§6)
krites studio [--port N] # the browser studio (primary, §10)
Inherited GTB defaults: init, update, docs, doctor, changelog,
config, keychain, mcp. MCP exposes krites' commands as tools (a third
surface mirroring the CLI); side-effecting / destructive-feeling commands are
gated off MCP (export, remove, and anything that writes pixels or deletes)
via setup.ExcludeFromMCP, the keryx pattern — present on the CLI, absent from
the assistant surface.
10. The studio (web UI): the primary surface
Full interface contract: the studio's functional requirements, UX, and the HTTP + SSE API are specified in 0003-studio.md. This section is the design rationale; 0003 is the contract.
krites studio [--port N] starts a local, single-user, localhost-bound web
server (GTB service lifecycle via pkg/controls, server via pkg/http,
frontend go:embed-ed so the binary stays one file) — the same architecture as
keryx's studio (its spec §10), but this is krites' front door, not a stretch
goal, and it is desktop-first (not phone-first): culling thousands of frames
is a big-screen, keyboard-driven job.
What it does:
- Shoot library: list / open / create / rename / remove shoots; each shows its date, counts (total / keep / maybe / reject), and the profile/look in use.
- Cull review (the heart): a fast grid + loupe over the shoot, with keep/maybe/reject and 1–5 star on keyboard flags (Lightroom muscle memory: P pick / X reject / 1–5 / arrows), filter by verdict or reason, and a reasons overlay ("why did krites reject this"). Near-duplicate clusters are shown together with the proposed best highlighted and a side-by-side compare to swap the pick. This panel must feel instant on 4,000 frames (preview-backed, virtualised grid).
- Develop: straighten/crop preview with handles, look picker, retouch toggles; before/after.
- Object removal: brush/box a region → preview the inpaint → accept/reject.
- Export: choose verdict set, look, size/quality, destination → render.
- Settings: edit cull profiles, looks, and providers (writes the project config via the GTB config layer, hot-reloaded; secrets stay in keychain/env, never in config). A clear privacy indicator when a cloud provider is on.
Frontend stack (RESOLVED, §13-Q2). A full Svelte SPA over a thin Go JSON +
SSE API, go:embed-ed so the binary stays one file. krites' cull review is
heavy — a virtualised grid over thousands of images, drag-select, side-by-side
compare, crop handles, a paint mask for object removal — so the whole studio is
one coherent client app rather than a server-rendered/HTMX straddle. Svelte
(compiles to tiny vanilla JS, no virtual-DOM runtime — not Electron) keeps the
footprint and memory low and go:embeds cleanly. The renderer is
swappable: start with krites studio → local server →
browser (GTB model, fastest iteration, DevTools debugging), and later wrap the
same SPA in Wails for a native macOS .app (system WKWebView) with near-zero
rewrite. The memory-critical bit is the 4,000-thumbnail grid — solved by
virtualisation + server-sized preview JPEGs + off-screen eviction, not by the
framework choice. Avoid a heavyweight React/Next stack for a localhost tool.
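For the "thin Go JSON + SSE API", a progress stream could be as small as the handler below, which writes each update as one Server-Sent Event with a JSON payload. The `Progress` type, the `progress` event name, and the `/events` path are assumptions for illustration; the actual endpoints and payloads are whatever 0003 specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Progress is a hypothetical payload for a long-running cull.
type Progress struct {
	Done  int `json:"done"`
	Total int `json:"total"`
}

// progressHandler streams each update as one Server-Sent Event and returns
// when the client disconnects or the updates channel is closed.
func progressHandler(updates <-chan Progress) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case p, open := <-updates:
				if !open {
					return
				}
				data, err := json.Marshal(p)
				if err != nil {
					return
				}
				// An SSE message is "event:" and "data:" lines ended by a blank line.
				fmt.Fprintf(w, "event: progress\ndata: %s\n\n", data)
				flusher.Flush()
			}
		}
	}
}

// render runs the handler against an in-memory recorder and returns the
// raw stream, which is handy for tests and for this demo.
func render(ps ...Progress) string {
	updates := make(chan Progress, len(ps))
	for _, p := range ps {
		updates <- p
	}
	close(updates)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/events", nil)
	progressHandler(updates)(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(render(Progress{Done: 1000, Total: 4000}, Progress{Done: 4000, Total: 4000}))
}
```

On the browser side the SPA would consume this with the standard `EventSource` API, so progress needs no polling and no WebSocket dependency.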
11. Phased roadmap
- Phase 0: scaffold. ✅ GTB project generated; builds. (This spec.)
- Phase 1: cull MVP (the whole point). Ingest → quality (blur/exposure) + eye/blink + near-duplicate clustering → verdicts with reasons → studio cull review → export keepers + XMP write. Cull profile catalog (wedding-default). This alone delivers Aftershoot's core value. Settle §13-Q1 (ML runtime) and §13-Q2 (studio stack) first (both are now marked resolved in §13).
- Phase 2: develop. Auto-straighten, composition crop, looks/colour profiles, bounded retouch; expression/aesthetic signals feed the cull.
- Phase 3: object removal. Inpainter provider + studio brush/preview.
- Phase 4: learns Hailey's taste. A feedback loop: krites learns from her accept/override history and proposes cull-profile/look adjustments (and, longer-term, a personalised model). This is the feature that turns "a good culler" into "her culler": Aftershoot's headline, done privately and bespoke.
Build order rule: culling before correcting before generating — the value and the certainty both decrease in that order, and the privacy/cost surface increases.
12. Future: could it be a product?
Designed single-user/local first, but the seams (provider interfaces, config-driven catalogs, a self-contained shoot workspace, a studio over an HTTP API) don't preclude a later hosted, multi-user offering for other photographers, the market krites is benchmarked against (aftershoot.com). That is explicitly out of scope for Phases 1–4 and must not drive premature design (no multi-tenancy, no accounts, no billing now). Captured only so the boundaries above are deliberate, not accidental.
12.1 WebAssembly (anticipated requirement: protect the option now)
WASM is a first-class anticipated direction (likely a requirement as the tool grows), not a maybe. Two distinct fits, both already supported by the architecture:
- Go → WASM: reuse the deterministic engine client-side. The pkg/ engine (quality, dedup, cull verdict-resolution, xmp) is pure Go and compiles to GOOS=js GOARCH=wasm today (verified). Running it in the browser lets the studio re-resolve all ~4,000 verdicts instantly client-side when a profile threshold changes, with no server round-trip, and enables an offline/PWA mode. (Go's WASM runtime is multi-MB; TinyGo shrinks it with stdlib limits, a bundle-size trade-off to weigh per surface.)
- ONNX Runtime Web + WebGPU: client-side inference, the multi-tenant unlock. The same exported ONNX models we run natively (CoreML) can run in the browser via ONNX Runtime Web (WASM/WebGPU). For a hosted service this moves inference into each user's browser, which both eliminates per-tenant GPU cost and keeps the privacy promise even in SaaS (photos never leave the browser). It slots in as another FaceAnalyzer / Inpainter / Decoder adapter: additive, no call-site changes. (Note: the current native ONNX path uses purego + a dylib, which is not WASM-compatible; the browser path is ONNX Runtime Web on the JS/Svelte side fulfilling the same interface role. Clean split: Go-WASM does the deterministic maths, ORT-Web does the models.)
Load-bearing constraint (enforce from now): the deterministic pkg/ engine
MUST stay pure-Go and cgo-free so it remains WASM-compilable. Any native/cgo or
purego+dylib backend (libvips decode, ONNX inference) lives only in its own
adapter package behind a provider interface, never imported by quality,
dedup, cull, xmp, or shoot. A CI check (GOOS=js GOARCH=wasm go build
./pkg/... over the engine packages) should guard this. When WASM becomes a
concrete requirement it earns its own spec (à la 0003); §13-Q7 frames the
strategy.
13. Open questions (resolve or defer before the gated work)
1. ML runtime strategy. ✅ RESOLVED (2026-06-21): ONNX Runtime for everything, no Python. Deterministic maths (sharpness, exposure, perceptual-hash dedup) stays pure Go. All models, both discriminative (face/eye-state, aesthetic) and generative inpainting (LaMa, which exports to ONNX), run through ONNX Runtime with the CoreML execution provider, loaded via purego (so no CGO) against a bundled ONNX Runtime dylib. krites stays a single Go binary + that dylib; no Python environment. Cloud is opt-in only, never the default (Q3). Consequence to note in design: inpainting is LaMa-class, which is excellent for removing objects/distractions but is not a full generative scene editor; if a future need outgrows LaMa, the Inpainter interface lets a cloud diffusion backend slot in without call-site changes.
2. Studio frontend stack. ✅ RESOLVED (2026-06-21): a full SPA over a thin Go JSON + SSE API, served by krites studio in the browser to start, built Wails-ready. Framework: Svelte (+ Vite, Svelte stores, TanStack Virtual / svelte-virtual), settled on after weighing Vue: Svelte's smallest footprint (compiles to vanilla JS, no virtual-DOM runtime), cleanest go:embed, and best fit for the bespoke culling UI won out, and the API contract is framework-agnostic so the choice stays reversible (0003 §0). The frontend is a self-contained SPA go:embed-ed into the binary; the renderer is a swappable detail. We start with the GTB model (krites studio → local server → browser tab: fastest iteration, DevTools debugging, negligible memory), and because the SPA talks to a clean API we can later wrap the same frontend in Wails (native macOS .app over the system WKWebView, light) with near-zero rewrite, so the web-vs-native choice is deferred at no cost. The real memory concern is not the framework but holding ~4,000 thumbnails: handled by a virtualised grid (render only visible rows), server-generated right-sized preview JPEGs, and client-side off-screen eviction, all designed in from day one (R-UI-3). Pure-Go-native (Gio/Fyne) was rejected: unfamiliar stack, hand-built grid/canvas, slower loop.
3. Cloud providers: any allowed at all? ✅ RESOLVED (2026-06-21): cloud is a first-class, user-choosable backend (not just a fallback). Each capability (FaceAnalyzer, AestheticScorer, Inpainter, …) MAY be pointed at a cloud backend for best-in-class quality, selected per-capability in config. Local remains the default for every capability (privacy-respecting out of the box), and the disclosure contract is mandatory: R-PRIV-2 (a command says when it would send data off-machine) and R-UI-19 (the studio's persistent privacy indicator + which operation leaves the Mac) still hold, and secrets stay in the keychain (R-PRIV-3). So: local-by-default, cloud as a genuine opt-in choice with eyes open, not an escape hatch buried behind a flag.
4. RAW handling depth. ✅ RESOLVED (2026-06-21): previews to cull, RAW to develop. Culling runs against the embedded JPEG preview every RAW already carries (sufficient for blur / exposure / eye / dedup), so all ~4,000 frames cull fast. Full libvips/libraw RAW decode happens only for keepers that are developed/exported (a few hundred frames), giving real RAW latitude (white balance, highlight recovery, full bit depth) exactly where it matters. The Decoder provider exposes both a cheap Preview() and a full Decode() path; ingest/cull use the former, develop/export the latter.
5. Lightroom round-trip fidelity. ✅ RESOLVED (2026-06-21): finish-in-Lightroom is the primary workflow (~90%); krites end-to-end is the supported minority (~10%). Both are accommodated. Consequence: krites' geometric/WB corrections are only valuable to the 90% if they travel via XMP, so the XMP bridge is richer than verdicts alone:
   - Phase 1: cull verdicts → XMP: star rating + pick/reject + colour label, read by every Lightroom. This is the 90% workflow's core deliverable.
   - Phase 2: crop / straighten / white-balance → Camera-Raw XMP: a real requirement (not an optional nicety), so keepers open in Lightroom already leveled, cropped and WB-corrected.
   - Never in XMP: looks / retouch / object-removal. Adobe can't represent them, so they serve the krites end-to-end export path (the 10%) only; no fragile half-applied mapping is shipped.
   This makes krites' auto-correction useful to her main Lightroom workflow while keeping the full-fidelity finisher path for the cases she stays in krites.
6. Hardware target. ✅ RESOLVED (2026-06-21): Apple Silicon Mac (M-series).
krites targets macOS / arm64 first. Local ML runs through ONNX Runtime
with the CoreML execution provider (Neural Engine / Metal-accelerated);
pure-Go maths needs no acceleration. Local generative inpainting (Phase 3)
is feasible on a higher-RAM M-Pro/Max (≥32 GB) but should degrade to an opt-in
cloud backend on leaner machines — so the Inpainter provider must support
both (see Q1). Sub-detail to confirm: exact chip + RAM, which only affects how
aggressively Phase 3 inpaint runs locally.
7. WASM strategy (anticipated requirement, §12.1). ⏳ OPEN — protect the
option now, decide the shape when it lands. The enabling constraint is
already in force (the deterministic pkg/ engine stays cgo-free / WASM-
compilable, guarded by a CI build). The decisions to make when WASM becomes a
concrete requirement: which surfaces run the engine client-side (instant
profile re-resolve, offline/PWA) and whether TinyGo is needed for bundle
size; whether client-side inference uses ONNX Runtime Web + WebGPU (and the
model-download size budget over a network); and, for a hosted multi-tenant
service, how much of the pipeline moves into the browser to preserve privacy +
cap GPU cost. Earns its own spec at that point.