
0005 — decision capture (the Phase-4 learning substrate)

Status: IN PROGRESS (2026-06-22) — substrate implemented (FrameVerdict Proposed/DecidedAt, the analysis cache, Profile.Fingerprint); the learner that consumes it is Phase 4. Companion to 0001-krites.md (master design, esp. §6, §11 Phase 4), 0002-interface-contracts.md (R-CULL-*, R-VRD-*, R-GLOBAL-6), and 0004-face-eye.md.

Scope: persist, per frame, the labelled decision record — the signals the machine measured, the verdict it proposed, and the verdict Hailey settled on — so that the Phase-4 learning loop (0001 §11) has accumulated history to learn from. This spec builds the data substrate, not the learner. It is the cheap, early move that turns Phase 4 from "start collecting data" into "train on a year of real labelled examples."


1. Why now, when the learner is Phase 4

0001 §1 ("a taste that grows") and §11 (Phase 4) promise a system that adapts to Hailey's keep/reject history and auto-tunes the cull profile rather than leaving her to hand-twiddle every knob. As Phases 1–2 add tunable knobs (EyeOpenSoft, the EAR anchors, sharpness floors, dedup distance), the payoff for that promise grows — but a learner can only learn from data we have been capturing.

Today krites captures the override flag (R-VRD-1), but it does not capture the structured signals behind a decision, keep the machine's original proposal once a human overrides it, or persist any analysis across runs. If Phase 4 shipped tomorrow, it would start from zero history. Capturing the decision record now, long before the learner exists, means the learner arrives at a populated dataset.

The capture is also independently useful before any learning: it backs the analysis cache (R-GLOBAL-6, fast re-cull) and makes verdicts explainable ("the machine proposed reject because sharpness 38 < 50; you kept it").

2. What a decision is

The labelled example the learner needs, per frame:

| Part | Source | Purpose |
|---|---|---|
| Signals | cull.Signals at cull time | the features (sharpness, exposure, faceCount, eye-open) |
| Proposed verdict | the machine's Resolve result | the model's label |
| Final verdict | the effective verdict after any human override | the ground-truth label |
| Overridden? | Override flag | whether the human corrected the machine (the highest-signal events) |
| Profile id | the cull profile's name (+ a content hash) | the thresholds in force, since they change over time |
| Decided at | timestamp of the last change | ordering + recency weighting |

A frame where the machine proposed reject and Hailey set keep is one labelled correction; a frame she left alone is a (weaker) agreement signal. Both matter.
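
As a rough illustration of how these parts combine into one labelled example, the sketch below flattens a record and sorts it into "correction" or "agreement". The Decision type, the Kind function, and the Verdict constants are hypothetical stand-ins, not krites types; the real feature set is cull.Signals, of which only one field is mimicked here.

```go
package main

import "fmt"

// Verdict stands in for cull.Verdict, whose definition is not shown in this spec.
type Verdict string

const (
	Keep   Verdict = "keep"
	Reject Verdict = "reject"
)

// Decision is a hypothetical flattened view of one labelled example (§2).
type Decision struct {
	Frame     string
	Sharpness float64 // one illustrative signal only
	Proposed  Verdict // what the machine said
	Final     Verdict // the effective verdict after any override
	Override  bool
}

// Kind sorts a record into the two cases §2 describes: a correction
// (the human changed the machine's verdict) or a weaker agreement.
// Treating "override set but verdict unchanged" as agreement is an assumption.
func Kind(d Decision) string {
	if d.Override && d.Proposed != d.Final {
		return "correction"
	}
	return "agreement"
}

func main() {
	corrected := Decision{Frame: "IMG_0001", Sharpness: 38, Proposed: Reject, Final: Keep, Override: true}
	untouched := Decision{Frame: "IMG_0002", Sharpness: 71, Proposed: Keep, Final: Keep}
	fmt.Println(corrected.Frame, Kind(corrected))
	fmt.Println(untouched.Frame, Kind(untouched))
}
```

How much more weight a correction deserves than an agreement is a Phase-4 question; the sketch only shows that both cases are recoverable from the stored fields.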

3. Where it lives — per-shoot, krites stays stateless

0001 §3 keeps krites stateless: the shoot holds the state. §2 makes a cross-shoot catalogue a non-goal (that's Lightroom's job). Decision capture must honour both:

  • R-CAP-1 (MUST) Decision records live in the shoot's .krites/ sidecar, per-shoot, non-destructive (R-ND-1/2). krites owns no global decision store.
  • R-CAP-2 (MUST) The Phase-4 learner (future) aggregates history by being pointed at a set of shoots and reading each one's records; the union is the cross-shoot history, without krites owning a catalogue. Discovering and iterating over that set is the learner's concern, specced in Phase 4, not here.

This keeps the load-bearing statelessness while still enabling cross-shoot learning at train time.
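
One way to picture R-CAP-2 is a learner-side helper that is handed a list of shoot directories and unions their records. This is only a sketch under stated assumptions: Record, Loader, and Aggregate are hypothetical names, and the actual on-disk parsing is left to the injected loader because the record format is specified elsewhere.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Record is a placeholder for one frame's decision record.
type Record struct {
	Shoot string
	Frame string
}

// Loader reads one shoot's records from its sidecar directory.
// It is injected so this sketch does not assume a file format.
type Loader func(sidecarDir string) ([]Record, error)

// Aggregate builds the cross-shoot history as the union of per-shoot
// records. Nothing is written back, so no global store is created.
func Aggregate(shoots []string, load Loader) ([]Record, error) {
	var all []Record
	for _, shoot := range shoots {
		recs, err := load(filepath.Join(shoot, ".krites"))
		if err != nil {
			return nil, fmt.Errorf("shoot %s: %w", shoot, err)
		}
		all = append(all, recs...)
	}
	return all, nil
}

func main() {
	// A fake loader that returns one record per shoot.
	fake := func(dir string) ([]Record, error) {
		return []Record{{Shoot: filepath.Dir(dir), Frame: "IMG_0001"}}, nil
	}
	all, err := Aggregate([]string{"shoots/a", "shoots/b"}, fake)
	fmt.Println(len(all), err)
}
```

The caller owns the list of shoots, which is what keeps krites itself free of any catalogue.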

4. The data model

Two cooperating records, both per-shoot:

4.1 Analysis cache — the signals (also serves R-GLOBAL-6)

  • R-CAP-3 (MUST) Per frame, the structured cull.Signals are persisted under .krites/analysis/ keyed by frame + an analysis-version tag (bumped when a signal's computation changes, so stale rows are detectable). This is the feature vector — numbers, not the reasons-as-text we keep today.
  • R-CAP-4 (SHOULD) cull --reanalyze recomputes and overwrites; without it, cached signals are reused (R-GLOBAL-6) and only the verdict is re-resolved.
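
The reuse-versus-recompute behaviour of R-CAP-3 and R-CAP-4 can be sketched as an in-memory map keyed by frame plus version tag. The Key, Cache, and SignalsFor names, the string version tags, and the Signals fields are assumptions made for illustration; the real cache is persisted under .krites/analysis/ and its layout is still open (§7-Q1).

```go
package main

import "fmt"

// Signals mimics the feature vector named in §2; field names are assumptions.
type Signals struct {
	Sharpness float64
	Exposure  float64
	FaceCount int
	EyeOpen   float64
}

// Key pairs a frame with the analysis-version tag, so rows written by
// an older computation never match a lookup made with the current tag.
type Key struct {
	Frame   string
	Version string
}

// Cache is an in-memory stand-in for the on-disk analysis cache.
type Cache map[Key]Signals

// SignalsFor reuses a cached row unless it is missing for this version
// or reanalyze is set, in which case it recomputes and overwrites.
func (c Cache) SignalsFor(frame, version string, reanalyze bool, analyze func(frame string) Signals) Signals {
	k := Key{Frame: frame, Version: version}
	if s, ok := c[k]; ok && !reanalyze {
		return s
	}
	s := analyze(frame)
	c[k] = s
	return s
}

func main() {
	calls := 0
	analyze := func(string) Signals { calls++; return Signals{Sharpness: 38} }

	c := Cache{}
	c.SignalsFor("IMG_0001", "v1", false, analyze) // first cull: computed
	c.SignalsFor("IMG_0001", "v1", false, analyze) // re-cull: reused
	c.SignalsFor("IMG_0001", "v2", false, analyze) // version bumped: recomputed
	c.SignalsFor("IMG_0001", "v2", true, analyze)  // reanalyze: recomputed
	fmt.Println("analyze calls:", calls)
}
```

On the reuse path only the verdict needs re-resolving against the current profile, which is what makes a re-cull fast.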

4.2 Enriched verdict — the labels

The verdict record gains the machine's proposal so a human override no longer erases what the machine thought:

type FrameVerdict struct {
    Verdict  cull.Verdict // the EFFECTIVE verdict (machine, or human if overridden)
    Reasons  []string
    Rating   int
    Cluster  string
    Override bool
    // Proposed is the machine's own verdict from the last cull, preserved even
    // when a human overrides Verdict — so (Proposed, Verdict) is the labelled
    // (machine-said, human-said) pair the learner trains on (0005 §2).
    Proposed cull.Verdict `yaml:"proposed,omitempty"`
    // DecidedAt is when Verdict last changed (RFC3339); recency for the learner.
    DecidedAt string `yaml:"decided_at,omitempty"`
}

  • R-CAP-5 (MUST) On cull, Proposed is set to the machine verdict and equals Verdict (no override yet). On a human override (verdict CLI / studio), Verdict changes and Override=true, but Proposed is preserved, yielding the (Proposed → Verdict) correction pair.
  • R-CAP-6 (MUST) The records stay human-readable YAML (0001 §3) and deterministic given the same inputs (R-GLOBAL-7) — except DecidedAt, the one intentionally time-varying field.
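
The R-CAP-5 state transitions can be sketched with two small functions. RecordCull and ApplyOverride are hypothetical names (the plan in §6 puts the real logic in pipeline.Cull and Verdicts.Override), the struct is trimmed to the fields involved, and stamping DecidedAt on every override is an assumption. The timestamp is passed in as an RFC3339 string rather than read from the clock, so the result is deterministic.

```go
package main

import "fmt"

// Verdict stands in for cull.Verdict.
type Verdict string

// FrameVerdict is trimmed to the fields R-CAP-5 touches.
type FrameVerdict struct {
	Verdict   Verdict
	Override  bool
	Proposed  Verdict
	DecidedAt string // RFC3339, supplied by the caller
}

// RecordCull stores the machine's verdict as both the effective and the
// proposed verdict; no human has touched the frame yet.
func RecordCull(machine Verdict, decidedAt string) FrameVerdict {
	return FrameVerdict{Verdict: machine, Proposed: machine, DecidedAt: decidedAt}
}

// ApplyOverride changes the effective verdict and marks the override,
// leaving Proposed exactly as the machine set it.
func ApplyOverride(fv FrameVerdict, human Verdict, decidedAt string) FrameVerdict {
	fv.Verdict = human
	fv.Override = true
	fv.DecidedAt = decidedAt
	return fv
}

func main() {
	fv := RecordCull("reject", "2026-06-22T10:00:00Z")
	fv = ApplyOverride(fv, "keep", "2026-06-22T11:30:00Z")
	fmt.Printf("proposed=%s final=%s override=%t at=%s\n",
		fv.Proposed, fv.Verdict, fv.Override, fv.DecidedAt)
}
```

Because both functions are pure and take the timestamp as an argument, the same inputs always produce the same record, which is the property R-GLOBAL-7 asks for.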

5. Non-goals (this spec)

  • The learner itself — model, profile auto-tuning, the learn command: Phase 4.
  • A cross-shoot catalogue / global store — explicitly out (0001 §2; R-CAP-1).
  • Full event sourcing — we keep the latest decision per frame (signals + proposed + final + decided-at), not every intermediate flip. An append-only event log is a possible future enrichment if the learner wants trajectories; flagged, not built (§7-Q2).
  • PII / consent for training — the records are local, per-shoot, never leave the machine; revisit only if learning ever moves cloud-side.

6. Implementation plan (alongside the eye wiring)

The minimal substrate, built test-first, with the deterministic core unit-tested:

  1. shoot.FrameVerdict gains Proposed + DecidedAt; Verdicts.Override preserves Proposed, stamps DecidedAt (passed in, not read from the clock, to stay testable — R-GLOBAL-7). Round-trip + override unit tests.
  2. pipeline.Cull sets Proposed = the resolved verdict and persists the per-frame Signals to the analysis cache.
  3. shoot gains analysis-cache read/write (.krites/analysis/), keyed by frame + analysis-version. Pure (de)serialisation, unit-tested.
  4. Docs: a short "what krites records, and why (Phase-4 ready)" page.

Layered so it lands with the eye-signal wiring — the first real culls then bank labelled data including the new eye signal.

7. Open questions

  • Q1 — analysis-cache granularity. One file per frame (.krites/analysis/<frame>.yaml) vs one analysis.yaml map for the shoot. A per-frame file diffs cleanly and resumes well on 4,000 frames; a single map is simpler. Recommend: one map file now (matches verdicts.yaml), revisit if it gets heavy.
  • Q2 — latest-state vs event log. Keep only the latest decision per frame, or append every change? Recommend: latest-state now (§5); event log deferred to Phase 4 if the learner wants trajectories.
  • Q3 — profile identity. Store just the profile name, or name + a content hash of its thresholds? Recommend: name + hash — the thresholds move, and the learner needs to know which were in force.
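
To make the Q3 recommendation concrete, here is one possible way to derive a content hash from a profile's thresholds. It is a sketch only and not the behaviour of the existing Profile.Fingerprint: the map-of-floats representation, the 16-character truncation, and the SharpnessFloor key are assumptions chosen for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Fingerprint hashes threshold names and values in sorted-key order, so
// two profiles with the same thresholds always get the same hash no
// matter how the map was built.
func Fingerprint(thresholds map[string]float64) string {
	keys := make([]string, 0, len(thresholds))
	for k := range thresholds {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(strconv.FormatFloat(thresholds[k], 'g', -1, 64))
		b.WriteByte('\n')
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:8]) // short prefix, enough to tell profiles apart
}

func main() {
	before := map[string]float64{"SharpnessFloor": 50, "EyeOpenSoft": 0.2}
	after := map[string]float64{"SharpnessFloor": 45, "EyeOpenSoft": 0.2}
	fmt.Println("default", Fingerprint(before))
	fmt.Println("default", Fingerprint(after)) // same name, different thresholds
}
```

Storing the name alongside the hash lets a learner group records by the thresholds actually in force, even when a profile keeps its name while its values move.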